Subject Details
Dept       : AIML
Sem        : 6
Regulation : R2023
Faculty    : Mr. A. Stephan Rufus
Phone      : NIL
E-mail     : srufus.a.aiml@snsct.org
Syllabus

UNIT 1: INTRODUCTION

The NLP Landscape: History and Applications - Text Preprocessing: Tokenization, Stemming, Lemmatization, Stopword Removal - Regular Expressions for Text Pattern Matching - WordNet and Lexical Resources - Basic String Matching and Edit Distance

Lab Practical:
1. Text Cleaning: Write a function to clean a raw text corpus (lowercasing, removing non-alphanumeric characters and extra whitespace).
2. Tokenization & Lemmatization: Use NLTK/spaCy to tokenize a paragraph and lemmatize the tokens; compare the results with stemming.
3. Regex for Pattern Matching: Write regular expressions to extract email addresses, phone numbers, and hashtags from a text document.
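Labs 1 and 3 above can be sketched with Python's standard re module alone; the exact cleaning rules and the phone/hashtag patterns below are illustrative choices, not prescribed by the syllabus:

```python
import re

def clean_text(text: str) -> str:
    """Lowercase, drop non-alphanumeric characters, collapse extra whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # keep only letters, digits, whitespace
    return re.sub(r"\s+", " ", text).strip()   # collapse runs of whitespace

# Simple illustrative patterns for Lab 3: emails, phone numbers, hashtags.
EMAIL_RE   = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE   = re.compile(r"\+?\d[\d -]{7,}\d")
HASHTAG_RE = re.compile(r"#\w+")

doc = "Contact srufus.a.aiml@snsct.org or +91 98765 43210. #NLP #AIML rocks!"
print(clean_text("  Hello, WORLD!!  "))   # -> hello world
print(EMAIL_RE.findall(doc))
print(PHONE_RE.findall(doc))
print(HASHTAG_RE.findall(doc))
```

Real-world phone and email formats vary widely, so these patterns should be tightened for any particular corpus.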

UNIT 2: STATISTICAL NLP & WORD REPRESENTATIONS

Language Modeling: N-grams and Smoothing Techniques - Basic Text Classification: Naive Bayes, Logistic Regression - Vector Space Models: Bag-of-Words (BoW), TF-IDF - Word Embeddings: Word2Vec (Skip-gram, CBOW), GloVe - Evaluation Metrics: Precision, Recall, F1-Score

Lab Practical:
1. N-gram Language Model: Build a bigram language model with add-one smoothing to generate random text.
2. Text Classification with BoW: Implement a Naive Bayes classifier using Scikit-learn on an SMS Spam dataset with a BoW representation.
3. Word2Vec Visualization: Train a Word2Vec model on a sample corpus using Gensim and visualize the embeddings using PCA/t-SNE.
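The bigram model with add-one smoothing from Lab 1 fits in a few lines of plain Python; the toy corpus and sampling seed here are illustrative:

```python
import random
from collections import Counter

corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(corpus))
V = len(vocab)

# Count unigrams and bigrams over the corpus.
unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def prob(w2, w1):
    """Add-one (Laplace) smoothed bigram probability P(w2 | w1)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + V)

def generate(start, n=5, seed=0):
    """Sample a short word sequence from the smoothed bigram distribution."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        weights = [prob(w, out[-1]) for w in vocab]
        out.append(rng.choices(vocab, weights=weights)[0])
    return " ".join(out)

print(prob("cat", "the"))   # (1+1)/(4+8)
print(generate("the"))
```

Because of the +1 in every numerator, unseen bigrams get small but nonzero probability, and the distribution over the vocabulary still sums to one for each history word.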

UNIT 3: SEQUENCE MODELING & NEURAL NLP

Neural Networks for NLP - Recurrent Neural Networks (RNNs), LSTMs, and GRUs - Sequence-to-Sequence (Seq2Seq) Models and the Attention Mechanism - Introduction to the Transformer Architecture: Self-Attention, Encoder-Decoder Structure - Transfer Learning in NLP

Lab Practical:
1. Sentiment Analysis with LSTM: Build a binary sentiment analysis model using an LSTM network in PyTorch/TensorFlow.
2. Attention Mechanism Implementation: Implement a basic attention mechanism from scratch for a sequence-to-sequence task.
3. Text Generation with RNN: Train a character-level RNN to generate text in the style of a given author.
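For a single query vector, the from-scratch attention of Lab 2 reduces to scaled dot-product attention: score each key against the query, softmax the scores, and take the weighted sum of the values. A minimal list-based sketch (the toy vectors are illustrative):

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]   # subtract max for numerical stability
    s = sum(exps)
    return [e / s for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attention(query, keys, values):
    """Scaled dot-product attention for one query:
    scores_i = (q . k_i) / sqrt(d); output = sum_i softmax(scores)_i * v_i."""
    d = len(query)
    scores = [dot(query, k) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return context, weights

# Toy example: the query aligns with the first key, so the first value dominates.
q  = [1.0, 0.0]
ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[10.0, 0.0], [0.0, 10.0]]
ctx, w = attention(q, ks, vs)
print(w)
print(ctx)
```

In a full Seq2Seq model the same computation runs for every decoder step, with queries, keys, and values produced by learned projections.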

UNIT 4: PRE-TRAINED LANGUAGE MODELS & GENERATIVE AI

Transformer-based Models: BERT, GPT, and T5 - Fine-Tuning Strategies for Downstream Tasks (Text Classification, NER, Q&A) - Prompt Engineering and In-Context Learning - Introduction to Text Generation - Hugging Face Ecosystem: Transformers Library, Datasets, Model Hub

Lab Practical:
1. Fine-Tuning BERT for Sentiment: Use the Hugging Face transformers library to fine-tune a pre-trained BERT model on a custom dataset.
2. Named Entity Recognition (NER): Fine-tune a pre-trained model (DistilBERT) for an NER task using the CoNLL-2003 dataset.
3. Prompt Engineering with GPT: Use the OpenAI API or an open-source GPT model to perform tasks like summarization and question answering through prompt engineering.
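Before any model is called, the in-context-learning half of Lab 3 is careful prompt assembly: an instruction, a few labelled demonstrations, then the new input. A sketch of a few-shot sentiment template (the template wording and example reviews are illustrative; the resulting string would be sent to whichever GPT-style model the lab uses):

```python
def build_few_shot_prompt(examples, query, instruction):
    """Assemble an in-context-learning prompt: instruction, then labelled
    demonstrations, then the new input left for the model to complete."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [
    ("The plot was gripping from start to finish.", "positive"),
    ("A dull, forgettable film.", "negative"),
]
prompt = build_few_shot_prompt(
    examples,
    "I loved every minute of it.",
    "Classify the sentiment of each movie review as positive or negative.",
)
print(prompt)
```

Ending the prompt at "Sentiment:" is the key design choice: it constrains the model's continuation to be exactly the label being asked for.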

UNIT 5: ADVANCED APPLICATIONS & RESPONSIBLE AI

Advanced Tasks: Named Entity Recognition (NER), Sentiment Analysis, Machine Translation, Text Summarization - Building Retrieval-Augmented Generation (RAG) Applications - Bias and Fairness in NLP Models - Explainability and Interpretability for NLP - Current Trends and Ethics

Lab Practical:
1. Build a RAG System: Create a simple Retrieval-Augmented Generation system using LangChain, a vector database (Chroma), and an open-source LLM.
2. Bias Detection in Models: Use a fairness toolkit to audit a pre-trained sentiment analysis model for bias across different demographics.
3. Model Interpretability: Use LIME or SHAP to explain the predictions of a text classification model.
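The retrieval half of Lab 1 can be prototyped before touching LangChain or Chroma: a bag-of-words cosine retriever stands in for the vector database, and its top passages are pasted into the prompt sent to the LLM. The toy documents and query below are illustrative:

```python
import math
from collections import Counter

docs = [
    "BERT is an encoder-only transformer pre-trained with masked language modeling.",
    "GloVe learns word vectors from global co-occurrence statistics.",
    "RAG augments a language model with passages retrieved from a vector store.",
]

def bow(text):
    """Bag-of-words vector as a token -> count mapping."""
    return Counter(text.lower().split())

def cosine(a, b):
    num = sum(a[t] * b[t] for t in a)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den

def retrieve(query, k=1):
    """Return the k documents most similar to the query."""
    q = bow(query)
    return sorted(docs, key=lambda d: cosine(q, bow(d)), reverse=True)[:k]

def build_rag_prompt(query):
    """Augment the question with retrieved context before calling an LLM."""
    context = "\n".join(retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_rag_prompt("which passages are retrieved in RAG"))
```

A production system would swap the bag-of-words retriever for dense embeddings in a vector store, but the retrieve-then-augment flow is identical.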

Reference Book:

1. Eisenstein, J. Introduction to Natural Language Processing. MIT Press, 2019.
2. Hugging Face Team. Hugging Face NLP Course. 2023.
3. Goldberg, Y. Neural Network Methods for Natural Language Processing. Morgan & Claypool, 2017.
4. Lewis, P. Building Advanced RAG Applications. Manning, 2024.
5. Bender, E.M. Linguistic Fundamentals for Natural Language Processing II: 100 Essentials from Semantics and Pragmatics. Morgan & Claypool, 2020.

Text Book:

1. Jurafsky, D. & Martin, J.H. Speech and Language Processing (3rd ed.). Pearson, 2024.
2. Tunstall, L., von Werra, L., & Wolf, T. Natural Language Processing with Transformers. O'Reilly, 2022.