Introduction to NLP and Text Mining

In this course you will be introduced to the essential techniques of natural language processing (NLP) and text mining with Python.

$729

Overview

In this course you will be introduced to the essential techniques of natural language processing (NLP) and text mining with Python. The course will discuss how to apply unsupervised and supervised modeling techniques to text, and devote considerable attention to data preparation and data handling methods required to transform unstructured text into a form in which it can be mined.

Intermediate, Advanced
4 Weeks
Expert Instructor
Tuition-Back Guarantee
100% Online
TA Support

Learning Outcomes

This course focuses on learning key concepts, tools and methodologies for natural language processing with an emphasis on hands-on learning through guided tutorials and real-world examples. You will learn how to:

Process text data and strings, and perform pattern matching with regular expressions in Python
Preprocess and wrangle noisy text data via stemming, lemmatization, tokenization, removal of stop-words and more
Represent text data in structured and easy-to-consume formats for machine learning and text mining
Represent text documents using features related to word frequency, parts of speech and sentiment
Represent text documents using vectorized features like bag-of-words, TF-IDF, and document similarity
Use the concepts of information retrieval and document similarity (e.g. in applications like recommender systems)
Perform unsupervised NLP using techniques like keyphrase extraction, topic modeling and text summarization
Leverage pre-trained models for part-of-speech (POS) tagging and named entity recognition (NER)
Develop supervised models to classify documents
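
As a rough illustration of the first few outcomes above (regular expressions, tokenization, stop-word removal and a bag-of-words representation), here is a minimal sketch using only the Python standard library. The stop-word list and the function names `tokenize`, `preprocess` and `bag_of_words` are assumptions made for this example; they are not taken from the course materials, which may use different tools.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in", "on"}


def tokenize(text):
    """Lowercase the text and extract runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def preprocess(text):
    """Tokenize the text and drop stop words."""
    return [tok for tok in tokenize(text) if tok not in STOP_WORDS]


def bag_of_words(text):
    """Count how often each remaining token occurs in the text."""
    return Counter(preprocess(text))


doc = "The cat sat on the mat, and the cat slept."
counts = bag_of_words(doc)
print(preprocess(doc))
print(counts["cat"])
```

The regular expression here keeps only ASCII letters, which is a deliberate simplification; text with digits, accents or contractions would need a more careful pattern.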

Who Should Take This Course

Data scientists and aspiring data scientists who want to analyze text data and build models that use text data.

Our Instructors

Mr. Dipanjan Sarkar

Dipanjan (DJ) Sarkar is a Data Science Lead and published author who was recognized by Google as a Google Developer Expert in Machine Learning in 2019. He has also been named one of the Top Ten Data Scientists in India for 2020 by a few leading technology magazines and publishing houses. Dipanjan has led advanced analytics initiatives with several Fortune 500 companies, such as Applied Materials and Intel, and with open source organizations such as Red Hat (now IBM). He primarily works on leveraging data science, machine learning and deep learning to build large-scale intelligent systems.

He holds a Master of Technology degree from IIIT Bangalore, with specializations in data science and software engineering, and completed a postgraduate diploma in machine learning and artificial intelligence at Columbia University in the City of New York.

Dipanjan has been an analytics practitioner and consultant for several years, specializing in machine learning, natural language processing, computer vision and deep learning. With a passion for data science and education, he also serves as an AI advisor, subject matter expert and instructor at organizations such as Springboard, Propulsion Academy and Statistics.com (The Institute for Statistics Education), where he helps people build their skills in data science and artificial intelligence. Dipanjan also beta-tests new data science courses for the MOOC platform Coursera before they are released. He has authored several books on R, Python, machine learning, natural language processing and deep learning, including Text Analytics with Python, 2nd ed.

Course Syllabus

Week 1

Introduction and Text Data Preparation

Introduction to NLP & NLP applications
Python for NLP
NLP basics – Parsing Text and Exploring Text Corpora
Tokenization and POS Tags
Shallow Parsing
Constituency Parsing
Corpus Analysis
WordNet & Synsets
Working with Text and Regular Expressions

Week 2

Feature Engineering and Representation

Introduction to text pre-processing and wrangling
Text pre-processing and wrangling – methodologies
Build your own text pre-processor
Non-vectorized text feature engineering
Vectorized representations of text features
Keyphrase Extraction – Concepts and Methodologies
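
To make "vectorized representations of text features" a little more concrete, the following is a minimal sketch of TF-IDF weighting and cosine similarity in plain Python. It assumes documents are already tokenized, and it uses raw term counts with a smoothed inverse document frequency, which is one common variant among several. The function names and the three toy documents are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term count * (log((1 + N) / (1 + df)) + 1), a smoothed
    IDF variant; other weighting schemes are equally valid.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        term_counts = Counter(doc)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in term_counts.items()
        })
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus.
docs = [
    "text mining with python".split(),
    "text mining and topic modeling".split(),
    "cooking pasta at home".split(),
]
vecs = tf_idf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # shares "text" and "mining"
print(cosine_similarity(vecs[0], vecs[2]))  # no terms in common
```

Storing each vector as a dictionary keeps the sketch short and sparse; a practical pipeline would more likely use a library vectorizer and matrix operations.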

Week 3

Unsupervised Natural Language Processing

Introduction to text pre-processing and wrangling
Text pre-processing and wrangling – methodologies
Build your own text pre-processor
Non-vectorized text feature engineering
Vectorized representations of text features
Keyphrase Extraction – Concepts and Methodologies

Week 4

Information Extraction

Introduction to Supervised natural language processing
Text Classification – concepts and methodologies
Machine Learning for Text Classification
Sequential Tagging Models
Parts of Speech Tagging
Named Entity Recognition
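
For a sense of what machine learning for text classification can look like, here is a small sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing, written with the standard library only. Naive Bayes is used here as a familiar baseline, not because the syllabus names it; the training sentences, labels and function names are all made up for this example.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labeled_docs):
    """Collect class and word counts from (tokens, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict(model, tokens):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data.
train = [
    ("great course clear examples".split(), "positive"),
    ("excellent instructor great material".split(), "positive"),
    ("boring lectures poor examples".split(), "negative"),
    ("poor pacing confusing material".split(), "negative"),
]
model = train_naive_bayes(train)
print(predict(model, "great instructor".split()))
print(predict(model, "confusing boring lectures".split()))
```

Four training sentences are far too few for a meaningful model; the point is only to show the counting and scoring steps in one place.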

Class Dates

2025

01/10/2025 to 02/07/2025
Instructors: Mr. Dipanjan Sarkar

05/09/2025 to 06/06/2025
Instructors: Mr. Dipanjan Sarkar

09/12/2025 to 10/10/2025
Instructors: Mr. Dipanjan Sarkar

Prerequisites

Predictive Analytics 1 – Machine Learning Tools

This online course introduces the basic paradigm of predictive modeling: classification and prediction.

Skill: Intermediate, Advanced
Credit Options: ACE, CAP, CEU

The Statistics.com courses have helped me a lot, pushing me to the limit and making me learn much more than I expected I could. The knowledge I gained I could immediately leverage in my job … then eventually led to landing a job in my dream company – Amazon.

Karolis Urbonas

This program has been a life and work game changer for me. Within 2 weeks of taking this class, I was able to produce far more than I ever had before.

Susan Kamp

The material covered in the Analytics for Data Science Certificate will be indispensable in my work. I can’t wait to take other courses. Great work!

Stephen McAllister

I learned more in the past 6 weeks than I did taking a full semester of statistics in college, and 10 weeks of statistics in graduate school. Seriously.

Amir Aminimanizani

This is the best online course I have ever taken. Very well prepared. Covers a lot of real-life problems. Good job, thank you very much!

Elena Rose

The more courses I take at Statistics.com, the more appreciation I have for the smart approach, quality of instructors, assistants, admin and program. Well done!

Leonardo Nagata

This course greatly benefited me because I am interested in working in AI. It has given me solid foundational knowledge…After completing this last course, I feel I have gained valuable skills that will enhance my employability in Data Science, opening up diverse career opportunities.

Richard Jackson

Frequently Asked Questions

What is your satisfaction guarantee and how does it work?

We offer a “Student Satisfaction Guarantee” that includes a tuition-back guarantee, so go ahead and take our courses risk free. That’s our commitment to student satisfaction. Students may cancel, transfer, or withdraw from a course under certain conditions. If you’re not satisfied with a course, you may withdraw from the course and receive a tuition refund.

Please see our knowledge center for more information.

Can I transfer or withdraw from a course?

We have a flexible transfer and withdrawal policy that recognizes that circumstances may arise to prevent you from taking a course as planned. You may transfer or withdraw from a course under certain conditions.

- Students are entitled to a full refund if a course they are registered for is canceled.
- You can transfer your tuition to another course at any time prior to the course start date or the drop date; a transfer is not permitted after the drop date.
- Students who withdraw on or after the first day of class are entitled to a percentage refund of tuition.

Please see this page for more information.

Who are the instructors at Statistics.com?

Statistics.com has more than 60 instructors who are recruited based on their expertise in various areas of statistics. Our faculty members are:

- Authors of well-regarded texts in their area;
- Advisory board members;
- Senior faculty; and
- Educators who have made important contributions to the field of statistics or online education in statistics.

The majority of our instructors have more than five years of teaching experience online at the Institute.

Please visit our faculty page for more information on each instructor at Statistics.com.

Additional Information

Homework

Homework in this course consists of short-answer questions that test concepts and guided data analysis problems using software.

In addition to assigned readings, this course has a getting-started guide and supplemental readings available online.

Course Text

The text used for the practical work in this course is Text Analytics with Python (Apress, 2019) by Dipanjan Sarkar, chosen for its wealth of hands-on Python illustrations and code. The code for these illustrations is organized here:

https://github.com/dipanjanS/text-analytics-with-python/tree/master/New-Second-Edition

Note: this text is also used in the follow-on course, NLP and Deep Learning.

For a well-written guide to foundational concepts and context, you may wish to consider Fundamentals of Predictive Text Mining (Springer, 2015) by Weiss, Indurkhya and Zhang.

Software

This course provides problems and illustrations in Python, and assumes some familiarity with that language.

Course Fee & Information

Enrollment
Courses may fill up at any time and registrations are processed in the order in which they are received. Your registration will be confirmed for the first available course date unless you specify otherwise.

Transfers and Withdrawals
We have flexible policies to transfer to another course or withdraw if necessary.

Group Rates
Contact us to get information on group rates.

Discounts
Academic affiliation? In most courses you are eligible for a discount at checkout.

New to Statistics.com? Click here for a special introductory discount code.

Invoice or Purchase Order
A $50 service fee applies if you require a prior invoice, need to submit a purchase order or voucher, pay by wire transfer or EFT, or need a prior payment refunded and reprocessed.

Supplemental Information

At Statistics.com, we aim to provide a learning environment suitable for everyone. To help you get the most out of your learning experience, we have researched and tested several assistance tools. For students with dyslexia, colorblindness, or reading difficulties, we recommend the following web browser add-ons and extensions:

Chrome

Color Enhancer (for colorblindness)
HelperBird (for colorblindness, dyslexia, and reading difficulties)

Firefox

Mobile Dyslexic
Color Vision Simulation (native accessibility feature)
Instructions for other native accessibility features

Safari

Navidys (for colorblindness, dyslexia, and reading difficulties)
HelperBird for Safari (for colorblindness, dyslexia, and reading difficulties)
