Big Data Analytics notes

PAPER NO.12 (E1) BIG DATA ANALYTICS
UNIT DESCRIPTION
This paper is intended to equip the candidate with the knowledge, skills and attitude that will enable him/her to apply relevant Big Data analytical tools and techniques to generate insight from large volumes of data for informed decision making.

LEARNING OUTCOMES
A candidate who passes this paper should be able to:
• Explore Big Data analytics and machine learning approaches
• Utilize exploratory and predictive data analysis techniques for massive datasets
• Apply Big Data analytics and related technologies and present visual outcomes for decision making
• Deploy applications that leverage Big Data analytics for sustainable impact
• Evaluate ethical or public policy concerns and emerging issues in Big Data analytics
CONTENT
1. Introduction to Big Data Analytics
1.1 Concepts of Big Data analytics
1.2 Artificial intelligence
1.3 Challenges and opportunities
1.4 Applications of Big Data analytics
1.5 Introduction to Python programming language
2. Mathematics for Big Data Analytics
2.1 Linear algebra
2.2 Tensor and matrix implementation
2.3 Transposition and matrix multiplication
2.4 Eigenvalue and eigenvector
2.5 Determinant and singular value decomposition
2.6 Probability theory
2.7 Least squares
2.8 Gradient descent implementation
2.9 Implementation with Python

3. Exploration of massive datasets in Python
3.1 Working with Pandas and Dask
3.2 Predictor and target feature identification
3.3 Techniques for handling missing values, noise and outliers
3.4 Feature engineering (Feature transformation and creation)
3.5 Case study in Data Exploration

4. Social Networks Analysis in Python
4.1 Introduction to Graph Theory
4.2 Modelling networks
4.3 Network metrics
4.4 Network library
4.5 Graph databases
4.6 Neo4j
4.7 Case studies in Social Networks Analysis

5. Machine Learning Pipeline
5.1 Big Data collection
5.2 Big Data pre-processing
5.3 Feature extraction and selection (labelling and dimensionality reduction)
5.4 Model validation
5.5 Data visualization
5.6 Using Python’s NumPy, SciPy, Matplotlib and Scikit-learn

6. Unsupervised (Clustering) Machine Learning in Python
6.1 Dimension reduction techniques
6.2 K-means clustering
6.3 K-Nearest Neighbour
6.4 Hierarchical clustering (Divisive and Agglomerative)
6.5 Case studies with Python
7. Supervised (Classification) Machine Learning in Python
7.1 Linear regression
7.2 Logistic regression
7.3 Decision trees, rules and random forests
7.4 K-nearest neighbour algorithm
7.5 Support vector machines
7.6 Naive Bayes
7.7 Linear discriminant analysis
7.8 Case studies with Python
8. Association Rule Mining in Python
8.1 Overview of association rule mining
8.2 Apriori algorithm
8.3 Evaluation of candidate rules
8.4 Applications of association rules
8.5 Validation and testing
9. Deep learning implementation in Python
9.1 Concepts of deep learning
9.2 Introduction to biological neurons
9.3 Artificial neural network
9.4 Network topology
9.5 Convolutional neural networks and architecture
9.6 Activation functions
9.7 Recurrent neural networks
9.8 Case studies with Python

10. Natural language processing in Python
10.1 Sentiment analysis
10.2 Topic modelling
10.3 Text analytics
10.4 Social media analytics
10.5 Recommender systems
10.6 Case studies with Python
11. Big Data visualization in Python
11.1 Line plots, bar, pie and donut charts
11.2 Scatter plots and biplots
11.3 Word clouds
11.4 Kernel density estimation and Histogram plots
11.5 Box and whisker plots
11.6 Correlation matrix and heatmaps
11.7 Clustering visualization

12. The law and ethics in Big Data analytics
12.1 Principles of data processing
12.2 Professional ethics and legal frameworks for the data profession
