Subject Details
Dept       : AIML
Sem        : 5
Regulation : R2019
Faculty    : Mr. A. Stephan Rufus
Phone      : NIL
E-mail     : srufus.a.aiml@snsct.org

Syllabus

UNIT 1: INTRODUCTION

Data Science - Life Cycle, Tools, Applications; Big Data - Types, Characteristics, Tools and Applications; Data Analytics - Types, Tools and Applications; Data and Relations: Data Set - Data Scales - Set and Matrix Representations - Relations - Similarity Measures - Dissimilarity Measures - Sequence Relations - Sampling and Quantization.

Lab Practice:
1. To get input from the user and perform numerical operations (MAX, MIN, AVG, SUM, SQRT, ROUND) using R or Python.
2. To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R or Python.
3. To perform statistical operations (mean, median, mode and standard deviation) using R or Python.
4. To get an input matrix from the user and perform matrix addition, subtraction, multiplication, inverse, transpose and division operations using the vector concept in R or Python.
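Lab practices 1 and 3 of this unit can be sketched in Python as follows. This is only an illustrative outline, not the prescribed lab solution: it uses the standard library `math` and `statistics` modules, and the helper name `summarize` and the fixed `sample` list (used in place of user input so the sketch runs unattended) are assumptions made here.

```python
import math
import statistics


def summarize(values):
    """Return basic numerical and statistical summaries for a list of numbers."""
    total = sum(values)
    average = statistics.mean(values)
    return {
        "max": max(values),
        "min": min(values),
        "sum": total,
        "avg": average,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "sqrt_of_sum": math.sqrt(total),
        "rounded_avg": round(average, 2),
    }


# Hypothetical sample data standing in for values typed by the user.
sample = [4, 8, 15, 16, 23, 42, 8]
summary = summarize(sample)
for name, value in summary.items():
    print(f"{name}: {value}")
```

To read the numbers from the user instead, the `sample` list could be built with something like `[float(x) for x in input("Numbers: ").split()]`.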

UNIT 2: PREPROCESSING AND VISUALIZATION

Data Pre-processing: Error Types - Error Handling - Filtering - Data Transformation - Data Merging; Data Visualization: Diagrams - Principal Component Analysis - Multidimensional Scaling - Sammon Mapping - Auto Associator - Histograms - Spectral Analysis.

Lab Practice:
1. To perform data pre-processing operations using R or Python: i) handling missing data, ii) min-max normalization.
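The two pre-processing operations named in this unit's lab practice can be sketched in plain Python. The sketch assumes that missing values are marked as `None` and are replaced by the mean of the observed values (one common choice among several), and that min-max normalization rescales to the range [0, 1] via (v - min) / (max - min). The function names and the `raw` data are hypothetical.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed (non-missing) values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column with two missing readings.
raw = [10.0, None, 30.0, 20.0, None, 40.0]
filled = fill_missing_with_mean(raw)
scaled = min_max_normalize(filled)
print(filled)
print(scaled)
```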

UNIT 3: CORRELATION, REGRESSION AND FORECASTING

Correlation: Linear Correlation - Correlation and Causality - Chi-Square Test for Independence; Regression: Linear Regression - Non-Linear Substitution - Robust Regression - Neural Networks - Radial Basis Function Networks - Cross Validation - Feature Selection; Forecasting: Finite State Machines - Recurrent Models - Autoregressive Models.

Lab Practice:
1. To perform dimensionality reduction using PCA for the Houses data set.
2. To perform simple linear regression with R.
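The syllabus asks for simple linear regression in R; purely as an illustration of the underlying least-squares idea, here is a small Python sketch. It fits y = slope * x + intercept using slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The function name `fit_line` and the toy data are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
slope, intercept = fit_line(xs, ys)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```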

UNIT 4: CLASSIFICATION AND CLUSTERING

Classification: Classification Criteria - Naive Bayes Classifier - Linear Discriminant Analysis - Support Vector Machine - Nearest Neighbor Classifier - Learning Vector Quantization - Decision Trees; Clustering: Cluster Partitions - Sequential - Prototype-Based - Fuzzy - Relational - Cluster Tendency Assessment - Cluster Validity - Self Organizing Maps.

Lab Practice:
1. To perform K-Means clustering and visualize the result for the iris data set.
2. Write an R script to diagnose a disease using KNN classification and plot the results.
3. To perform market basket analysis using association rules (Apriori).
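The K-Means clustering named in lab practice 1 can be illustrated with a minimal plain-Python sketch of the usual assign-then-update loop. It is not the lab solution: it runs on a tiny hypothetical 2-D data set rather than iris, omits the visualization, and initializes the centroids with the first k points so that the run is deterministic (random initialization is more common in practice). It needs Python 3.8+ for `math.dist`.

```python
import math


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (centroids, assignments)."""
    centroids = [points[i] for i in range(k)]  # deterministic init: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, assignments


# Hypothetical toy data with two well-separated groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```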

UNIT 5: SYSTEM ARCHITECTURE AND APPLICATIONS

Lambda Architecture - NoSQL Stores: Key-Value - Columnar - Document - Graph. Case Studies: Riak - HBase - MongoDB - Neo4j. MapReduce - Graph Processing - Event Processing - Hadoop - Giraph - Storm. Recommendation Systems - Time Series Analysis - Text Analysis.

Lab Practice:
1. Set up a pseudo-distributed, single-node Hadoop cluster backed by the Hadoop Distributed File System, running on Ubuntu Linux. After successful installation on one node, configure a multi-node Hadoop cluster (one master and multiple slaves).
2. Write a MapReduce application for word counting on the Hadoop cluster.
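The word-count exercise in lab practice 2 follows the map, shuffle and reduce pattern. The sketch below only simulates that pattern inside a single Python process to show the data flow; it does not use Hadoop or any Hadoop API, and the function names and input lines are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one word."""
    return key, sum(values)


# Hypothetical input standing in for lines of a file stored in HDFS.
lines = ["big data big ideas", "data science with big data"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

On a real cluster the map and reduce functions run on different nodes, and the framework performs the shuffle between them.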

Reference Book:

1. Dean J, "Big Data, Data Mining and Machine Learning", Wiley publications, 2014.
2. Provost F and Fawcett T, "Data Science for Business", O'Reilly Media Inc, 2013.
3. Janert PK, "Data Analysis with Open Source Tools", O'Reilly Media Inc, 2011.
4. Weiss SM, Indurkhya N and Zhang T, "Fundamentals of Predictive Text Mining", Springer-Verlag London Limited, 2010.

Text Book:

1. Runkler TA, "Data Analytics: Models and Algorithms for Intelligent Data Analysis", Springer, 2012.
2. Marz N and Warren J, "Big Data", Manning Publications, 2015.