Data Science: Life Cycle, Tools, Applications; Big Data: Types, Characteristics, Tools and Applications; Data Analytics: Types, Tools and Applications; Data and Relations: Data Set - Data Scales - Set and Matrix Representations - Relations - Similarity Measures - Dissimilarity Measures - Sequence Relations - Sampling and Quantization.
Lab Practice:
1. Read input from the user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) in R or Python.
2. Perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R or Python.
3. Perform statistical operations (mean, median, mode and standard deviation) in R or Python.
4. Read an input matrix from the user and perform matrix addition, subtraction, multiplication, inverse, transpose and division operations using the vector concept in R or Python.
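Lab practices 1 and 3 can be approached with the Python standard library alone. The following is a minimal sketch, not a prescribed solution: the function names `parse_numbers` and `summarize` are illustrative, and the standard deviation shown is the sample standard deviation (`statistics.stdev`).

```python
import math
import statistics


def parse_numbers(text):
    """Parse a whitespace- or comma-separated string into a list of floats."""
    return [float(token) for token in text.replace(",", " ").split()]


def summarize(values):
    """Return basic numerical and statistical summaries of a list of numbers."""
    total = sum(values)
    average = total / len(values)
    return {
        "max": max(values),
        "min": min(values),
        "sum": total,
        "avg": average,
        "sqrt_of_sum": math.sqrt(total),
        "rounded_avg": round(average, 2),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Fixed example string; in the lab this text would come from input()
numbers = parse_numbers("2, 4 4,5 10")
for name, value in summarize(numbers).items():
    print(f"{name}: {value}")
```

To take the values from the user, the fixed string can be replaced with `input("Enter numbers: ")`.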
Data Pre-processing: Error Types - Error Handling - Filtering - Data Transformation - Data Merging; Data Visualization: Diagrams - Principal Component Analysis - Multidimensional Scaling - Sammon Mapping - Auto Associator - Histograms - Spectral Analysis.
Lab Practice:
1. Perform data pre-processing operations in R or Python: i) handling missing data, ii) Min-Max normalization.
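The two pre-processing operations in this lab practice can be sketched in plain Python. This is one possible approach under stated assumptions: missing values are represented as `None` and filled with the mean of the observed values (other strategies, such as dropping rows or using the median, are equally valid), and the helper names and the toy `ages` list are illustrative only.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to impute from")
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values to the interval [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: map every value to the lower bound
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Toy column with two missing entries (illustrative data only)
ages = [20, None, 40, 30, None]
filled = fill_missing_with_mean(ages)
normalized = min_max_normalize(filled)
print(filled)
print(normalized)
```

Min-Max normalization maps the smallest value to `new_min` and the largest to `new_max`, so it is sensitive to outliers; filling missing values first keeps the column length unchanged.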
Correlation: Linear Correlation - Correlation and Causality - Chi-Square Test for Independence; Regression: Linear Regression - Non-Linear Substitution - Robust Regression - Neural Networks - Radial Basis Function Networks - Cross Validation - Feature Selection; Forecasting: Finite State Machines - Recurrent Models - Autoregressive Models.
Lab Practice:
1. Perform dimensionality reduction using PCA for the Houses Data Set.
2. Perform Simple Linear Regression with R.
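Linear correlation and simple linear regression both reduce to a few sums over the data. The sketch below uses the ordinary least-squares formulas; the lab practice names R, and Python is used here only for illustration. The function names and the toy data are assumptions, not part of the syllabus.

```python
import math


def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_correlation(x, y):
    """Return the Pearson linear correlation coefficient of x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Toy data lying exactly on the line y = 2x + 1 (illustrative only)
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
slope, intercept = fit_simple_linear_regression(x, y)
print(slope, intercept, pearson_correlation(x, y))
```

In R, the equivalent fit is typically obtained with `lm(y ~ x)`.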
Classification: Classification Criteria - Naive Bayes Classifier - Linear Discriminant Analysis - Support Vector Machine - Nearest Neighbor Classifier - Learning Vector Quantization - Decision Trees; Clustering: Cluster Partitions - Sequential - Prototype-Based - Fuzzy - Relational - Cluster Tendency Assessment - Cluster Validity - Self-Organizing Maps.
Lab Practice:
1. Perform K-Means clustering on the iris data set and visualize the result.
2. Write an R script to diagnose a disease using KNN classification and plot the results.
3. Perform market basket analysis using Association Rules (Apriori).
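The nearest neighbor classifier behind lab practice 2 can be sketched in a few lines. This is a minimal illustration in Python rather than the R script the exercise asks for; the function name `knn_predict`, the two-feature toy points and the "healthy"/"disease" labels are all hypothetical, and plotting is left out.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Pair each training point's Euclidean distance to the query with its label
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most frequent one
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-feature measurements with made-up labels
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 6.5), (7.0, 6.0), (6.5, 7.0)]
labels = ["healthy", "healthy", "healthy", "disease", "disease", "disease"]

print(knn_predict(points, labels, (1.8, 1.6)))
print(knn_predict(points, labels, (6.4, 6.4)))
```

Because KNN relies on distances, features measured on different scales are usually normalized first, for example with Min-Max normalization.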
Lambda Architecture - NoSQL Stores: Key-Value - Columnar - Document - Graph. Case Studies: Riak - HBase - MongoDB - Neo4j. MapReduce - Graph Processing - Event Processing - Hadoop - Giraph - Storm. Recommendation Systems - Time Series Analysis - Text Analysis.
Lab Practice:
1. Set up a pseudo-distributed, single-node Hadoop cluster backed by the Hadoop Distributed File System, running on Ubuntu Linux. After successful installation on one node, configure a multi-node Hadoop cluster (one master and multiple slaves).
2. Write a MapReduce application for word counting on the Hadoop cluster.
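Before writing the Hadoop job, the word-count logic can be traced with an in-memory sketch of the three MapReduce stages. This is plain Python and does not use the Hadoop API; the function names are illustrative, and the shuffle step that Hadoop performs between mappers and reducers is simulated here with a dictionary.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Toy input standing in for the lines of a file stored in HDFS
lines = ["big data big ideas", "Data science"]
print(word_count(lines))
```

On a real cluster the mapper and reducer run as separate tasks on different nodes, and the framework handles grouping by key and moving data between them.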
Reference Books:
1. Dean J, "Big Data, Data Mining and Machine Learning", Wiley Publications, 2014.
2. Provost F and Fawcett T, "Data Science for Business", O'Reilly Media Inc, 2013.
3. Janert PK, "Data Analysis with Open Source Tools", O'Reilly Media Inc, 2011.
4. Weiss SM, Indurkhya N and Zhang T, "Fundamentals of Predictive Text Mining", Springer-Verlag London Limited, 2010.
Text Books:
1. Runkler TA, "Data Analytics: Models and Algorithms for Intelligent Data Analysis", Springer, 2012.
2. Marz N and Warren J, "Big Data", Manning Publications, 2015.