PAPER NO. 11 (E1) BIG DATA MANAGEMENT
UNIT DESCRIPTION
To equip the candidate with the knowledge, skills and attitudes that will enable him/her to understand the role of policy in the Big Data ecosystem and to apply current tools and technologies for managing and processing Big Data within a business environment.
LEARNING OUTCOMES
A candidate who passes this paper should be able to:
• Identify the technological and business needs for Big Data management
• Explore the infrastructure and architectures for Big Data
• Describe the Big Data and Hadoop ecosystem and its management
• Utilise relevant technologies to deploy structured and unstructured data
• Design and manage Big Data storage structures
• Apply data regulatory frameworks and policy trends for Big Data management
CONTENT
1. Introduction to Big Data
1.1 Big Data concepts, drivers and techniques
1.2 Trends in Big Data management
1.3 Big Data applications
1.4 Challenges and opportunities for Big Data
2. Big Data Architectures
2.1 Relational systems architecture
2.2 Data warehousing architectures
2.3 Service-oriented architecture
2.4 The Lambda architecture
3. Big Data Acquisition, Cleaning and Storage
3.1 Big Data gathering
3.2 Big Data filtering and cleaning
3.3 Big Data quality considerations
3.4 Extract, Transform, Load (ETL) tools
4. Big Data Mining and Warehousing
4.1 Exploration of massive datasets
4.2 Big Data mining
4.3 Volume management
4.4 Velocity management
4.5 Case study in Data Mining and Data Warehousing
5. Big Data and Hadoop Ecosystem
5.1 Overview of Hadoop and differences between Hadoop and traditional data management systems
5.2 Data storage: Hadoop Distributed File System (HDFS) and HBase
5.3 Data processing: MapReduce and YARN
5.4 Data access: Hive, Pig, Mahout, Avro and Sqoop
5.5 Data management: Oozie, Flume and ZooKeeper
5.6 Spark framework
5.7 Big Data exploration and visualization
5.8 Case study
6. Pattern Mining over Big Data
6.1 Candidate generation
6.2 Identification of patterns and growth
6.3 Sequential data
6.4 Temporal data
6.5 Understanding uncertainty in data
6.6 Case study in pattern mining
7. Big Data Processing Pipelines
7.1 Pipelining and parallelism
7.2 Big Data synchronization
7.3 Multi-tenancy schemes
7.4 Resilient data sets
8. Big Data Design
8.1 Schema-less database design
8.2 Wide column structures
8.3 Document stores
8.4 NoSQL data stores (Hive, MongoDB)
8.5 Case study in Big Data design
9. In-memory Data Management
9.1 Columnar data storage
9.2 Late reconstruction
9.3 Light-weight compression
10. Distributed Computing
10.1 Features and reference model
10.2 Capacity requirement for distributed systems
10.3 Concurrency control and mutual exclusion mechanisms
10.4 Security issues for distributed applications
10.5 Integration of distributed applications
10.6 Case study in distributed computing
11. Big Data Policy Frameworks
11.1 Policy, law and institutions
11.2 Data privacy and protection
11.3 Big Data ethics
11.4 Selected case studies on policies
12. Deployment of Big Data
12.1 Information technology (IT) infrastructures for Big Data
12.2 Dedicated versus shared resources
12.3 On-premises versus public cloud services
12.4 Open-source versus proprietary software deployment
12.5 Selected case studies on Big Data deployment