Big Data Analysis module (AC51011)
About the Module
This module covers storing, manipulating and analysing big data. We define big data as data that is high in volume, is captured at high velocity and has high variety (a mix of structured and unstructured parts). We will investigate the tools and techniques for storing this data, such as distributed databases and filesystems, and for processing it, such as Apache Hadoop, Apache Spark (and its machine learning library, MLlib) and Apache Hive. Part of the module is dedicated to programming in Scala, a functional language with built-in concurrency support that allows rapid development of parallel data analytics, and to some fundamental algorithms for extracting knowledge from data, such as frequent pattern mining, classification and clustering.
There are 20 SCQF points available on this module.
|4||Introduction to Big Data and Big Data Analysis|
|5||Big Data Analysis Methodologies|
|6||Big Data Storage|
|8||Concurrent and Distributed Functional Programming using Scala|
|9||Big Data Processing Frameworks (Hadoop, Spark, Flink)|
|10||Data Mining Using Spark MLlib, Frequent Pattern Mining|
|11||Classification and Clustering|
Assessment and Coursework
Coursework counts for 40% of the final module mark.
The written exam counts for 60% of the final module mark.
Marking criteria are provided on My Dundee for all assignments so that you know what we are looking for when we are marking your coursework. Please ensure that you refer to these when completing assignments.
|Title||Week Given||Week Due||Effort Expected (hours)||Value (%)|
|Big Data Processing||8||12||20||20|
All course material is available on My Dundee. This includes copies of lecture materials, practical exercises, and assignments. The reading list for this module can be accessed from My Dundee and provides recommended materials for completing the module.