Introduction to NLP and Text Mining

Overview

In this course you will be introduced to the essential techniques of natural language processing (NLP) and text mining with Python. The course will discuss how to apply unsupervised and supervised modeling techniques to text, and devote considerable attention to data preparation and data handling methods required to transform unstructured text into a form in which it can be mined.

  • Intermediate, Advanced
  • 4 Weeks
  • Expert Instructor
  • Tuition-Back Guarantee
  • 100% Online
  • TA Support

Learning Outcomes

This course focuses on learning key concepts, tools and methodologies for natural language processing with an emphasis on hands-on learning through guided tutorials and real-world examples.  You will learn how to:

  • Process text data and strings, and perform pattern matching with regular expressions in Python
  • Preprocess and wrangle noisy text data via stemming, lemmatization, tokenization, removal of stop-words and more
  • Represent text data in structured and easy-to-consume formats for machine learning and text mining
  • Represent text documents using features related to text word frequency, parts of speech and sentiment
  • Represent text documents using vectorized features like bag-of-words, TF-IDF, and document similarity
  • Use the concepts of information retrieval and document similarity (e.g. in applications like recommender systems)
  • Perform unsupervised NLP using techniques like keyphrase extraction, topic modeling and text summarization
  • Leverage pre-trained models for part-of-speech (POS) tagging and named entity recognition (NER)
  • Develop supervised models to classify documents
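As a rough illustration of several of these outcomes (regular-expression tokenization, stop-word removal, TF-IDF weighting and document similarity), the following is a minimal sketch using only the Python standard library. It is not taken from the course materials: the stop-word list, the sample documents and the function names `tokenize`, `tfidf_vectors` and `cosine_similarity` are all assumptions made for this example, and the TF-IDF variant shown (term frequency times log(N / document frequency)) is only one of several common formulations.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "and", "to", "in", "on"}


def tokenize(text):
    """Lowercase the text, pull out alphabetic tokens with a regex, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up sample documents for illustration only.
docs = [
    "The cat sat on the mat.",
    "A cat and a dog sat in the garden.",
    "Stock prices fell on Monday.",
]
vectors = tfidf_vectors(docs)
sim_cats = cosine_similarity(vectors[0], vectors[1])       # documents share "cat" and "sat"
sim_unrelated = cosine_similarity(vectors[0], vectors[2])  # no shared terms after filtering
print(sim_cats > sim_unrelated)
```

In practice, libraries such as NLTK, spaCy or scikit-learn would typically handle these steps; the sketch only shows the underlying idea of turning unstructured text into comparable numeric vectors.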

Who Should Take This Course

Data scientists and aspiring data scientists who want to analyze text data and build models that use text data.

Our Instructors

Mr. Dipanjan Sarkar

Course Developer

This course was developed in partnership with Dipanjan (DJ) Sarkar. Dipanjan is a Data Science Lead and published author who was recognized by Google as a Google Developer Expert in Machine Learning in 2019. He was also named one of the Top Ten Data Scientists in India for 2020 by a few leading technology magazines and publishing houses. Dipanjan has led advanced analytics initiatives with several Fortune 500 companies, such as Applied Materials and Intel, and with open source organizations such as Red Hat (now IBM). He primarily works on leveraging data science, machine learning and deep learning to build large-scale intelligent systems.

Kuber Deokar – UpThink

Teaching Assistant

Individual sections of this course are taught by teaching assistants. This team is managed by:
Mr. Kuber Deokar is Data Science Lead at UpThink EduTech Services Pvt. Ltd. (Pune, India). He holds a master's degree in Statistics from the University of Pune, India, where he also taught undergraduate statistics. He is a co-author of Machine Learning for Business Analytics with Galit Shmueli, Peter Bruce and Nitin Patel.

Course Syllabus

Week 1

Introduction and Text Data Preparation

  • Introduction to NLP & NLP applications
  • Python for NLP
  • NLP basics – Parsing Text and Exploring Text Corpora
  • Tokenization and POS Tags
  • Shallow Parsing
  • Constituency Parsing
  • Corpus Analysis
  • WordNet & Synsets
  • Working with Text and Regular Expressions

Week 2

Feature Engineering and Representation

  • Introduction to text pre-processing and wrangling
  • Text pre-processing and wrangling – methodologies
  • Build your own text pre-processor
  • Non-vectorized text feature engineering
  • Vectorized representations of text features
  • Keyphrase Extraction – Concepts and Methodologies

Week 3

Unsupervised Natural Language Processing

  • Introduction to text pre-processing and wrangling
  • Text pre-processing and wrangling – methodologies
  • Build your own text pre-processor
  • Non-vectorized text feature engineering
  • Vectorized representations of text features
  • Keyphrase Extraction – Concepts and Methodologies

Week 4

Information Extraction

  • Introduction to Supervised natural language processing
  • Text Classification – concepts and methodologies
  • Machine Learning for Text Classification
  • Sequential Tagging Models
  • Parts of Speech Tagging
  • Named Entity Recognition
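To give a feel for the text classification topics listed for this week, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written with only the Python standard library. It is an illustration under stated assumptions, not the method taught in the course: the training sentences and labels are invented, and the names `train_naive_bayes` and `predict` are hypothetical helpers for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and extract alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train_naive_bayes(labeled_docs):
    """Count documents per class and word occurrences per class."""
    class_doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_doc_counts[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_doc_counts, word_counts, vocabulary


def predict(model, text):
    """Return the class with the highest log-posterior score for the text."""
    class_doc_counts, word_counts, vocabulary = model
    total_docs = sum(class_doc_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_doc_counts.items():
        # Log prior: fraction of training documents in this class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Add-one smoothing avoids zero probabilities for unseen class/word pairs.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy training data, for illustration only.
training_data = [
    ("great movie with a wonderful cast", "positive"),
    ("wonderful story and great acting", "positive"),
    ("boring plot and terrible acting", "negative"),
    ("terrible movie with a boring script", "negative"),
]
model = train_naive_bayes(training_data)
print(predict(model, "a great and wonderful film"))
print(predict(model, "terrible and boring"))
```

A real project would normally use a tested library implementation and a held-out evaluation set; the sketch only shows how word counts per class can drive a document-level prediction.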

Class Dates

2026

01/09/2026 to 02/06/2026
Instructors: Mr. Kuber Deokar – UpThink
05/08/2026 to 06/05/2026
Instructors: Mr. Kuber Deokar – UpThink
09/04/2026 to 10/02/2026
Instructors: Mr. Kuber Deokar – UpThink

Prerequisites

Predictive Analytics 1 – Machine Learning Tools

This online course introduces the basic paradigm of predictive modeling: classification and prediction.
  • Skill: Intermediate, Advanced
  • Credit Options: ACE, CAP, CEU

Frequently Asked Questions

  • What is your satisfaction guarantee and how does it work?

  • Can I transfer or withdraw from a course?

  • Who are the instructors at Statistics.com?

Visit our knowledge base and learn more.

Register For This Course

Introduction to NLP and Text Mining

Additional Information

Homework

Homework in this course consists of short answer questions to test concepts and guided data analysis problems using software.

In addition to assigned readings, this course also has a get started guide, and supplemental readings available online.

Course Text

The text used for the practical work in this course is Text Analytics with Python (Apress, 2019) by Dipanjan Sarkar, chosen for its wealth of hands-on Python illustrations and code. The code for these illustrations is organized here:

https://github.com/dipanjanS/text-analytics-with-python/tree/master/New-Second-Edition

Note: this text is also used in the follow-on course, NLP and Deep Learning.

For a well-written guide to foundational concepts and context, you may wish to consider Fundamentals of Predictive Text Mining (Springer, 2015) by Weiss, Indurkhya and Zhang.

Software

This course provides problems and illustrations in Python, and assumes some familiarity with that language.

Supplemental Information

Literacy, Accessibility, and Dyslexia

At Statistics.com, we aim to provide a learning environment suitable for everyone. To help you get the most out of your learning experience, we have researched and tested several assistance tools. For students with dyslexia, colorblindness, or reading difficulties, we recommend the following web browser add-ons and extensions:

 

Chrome

Firefox

Safari

  • Navidys (for colorblindness, dyslexia, and reading difficulties)
  • HelperBird for Safari (for colorblindness, dyslexia, and reading difficulties)
