Applied Data Science

Explore the theory, languages, and concepts of data science while acquiring the Python programming knowledge you need to solve real-world data challenges. *The course requires undergraduate-level knowledge of statistics, linear algebra, and probability.


Course Duration


5 months, online
6-8 hours per week


Why enroll in the Applied Data Science course?

Data has been called the new global currency, and its meteoric rise is transforming entire industries and driving demand for practitioners who can wield its power. From health care and finance to entertainment, cybersecurity, and beyond, the need for data scientists continues to grow, in tandem with opportunities for career advancement within the field.

To help fill this talent gap and further the use of data science to solve real-world problems, Columbia Engineering Executive Education has partnered with Emeritus to create the Applied Data Science course.

Because the Applied Data Science material assumes an intermediate knowledge of Python, you will spend the first part of this course in Python for Data Analytics, taught by Emeritus. This gives you the programming knowledge required for the assignments and application projects in the Applied Data Science course, so no prior programming knowledge is required.

The course requires an undergraduate knowledge of statistics (descriptive statistics, regression, sampling distributions, hypothesis testing, interval estimation, etc.), linear algebra, and probability.


Postings for data scientist jobs jumped by 65% from Jan 2015 to Jan 2018



Expected wage growth for data scientists vs. <2% average wage increase across all occupations

SOURCE: U.S. Bureau of Labor Statistics


Increase in data science jobs by 2020


Course Highlights

200+ Faculty Video Lectures

50 Quizzes / Assignments

24 Moderated Discussion Boards

20 Q&A Sessions with Course Leaders

12 Assignments

Includes Live Online Teaching

Key Takeaways

Explore the theory, languages and concepts of this in-demand field while acquiring the Python programming knowledge you need to solve real-world data challenges. At the end of the course, you will be able to:

  • Use the Python programming language to code your own algorithms or analytical models to examine data.
  • Manage and manipulate large amounts of data using Python packages.
  • Reveal hidden yet important characteristics of any dataset using Python visualization tools such as Matplotlib.
  • Formulate explanations for past events and test whether they hold true using available data.
  • Discover hidden trends and draw useful insights using Python packages such as NumPy and Pandas.
  • Extract high-quality information from text data, such as Facebook or Twitter posts and comments, for analysis.
  • Classify data points in a large dataset; for example, assigning genres to one billion songs.
  • Identify relationships between data points to form groups in a large dataset; for example, grouping customers into segments based on their previous buying patterns.


  • Python for Data Analytics

    Module 1: Introduction to Data Science

    Module 2: Working with Data Types and Operators in Python

    Module 3: Writing Functions in Python

    Module 4: Popular Data Science Packages in Python

    Module 5: Advanced Functions

    Module 6: Data Manipulation and Analysis with Pandas

    Module 7: Data Visualization with Matplotlib

    Module 8: Random Variables and Statistical Inferences

    Module 9: Statistical Distributions and Hypothesis Testing

  • Applied Data Science

    Module 1: Data Analysis & Visualization – Part 1

    Organizing and Analyzing Data with NumPy and Pandas

    Module 2: Data Analysis & Visualization – Part 2

    Cleaning and Visualizing Data with Pandas and Matplotlib

    Module 3: Statistical Distributions

    Understand the shape of data

    Module 4: Statistical Sampling

    What to do when you don’t have or need all the data

    Module 5: Hypothesis Testing

    How to answer common questions about your data

    Module 6: Regression Models in Python

    Introduction to modeling and interpretation

    Module 7: Evaluating Data Models

    Determine and evaluate the right model for your data

    Module 8: Classification with K-Nearest Neighbors

    Classify data points by comparing them with their nearest neighbors

    Module 9: Decision Tree Models

    Deeper dive into classification methods

    Module 10: Clustering Models

    Machine learning methods for representation

    Module 11: Text Mining in Python – Part 1

    Analyzing Sentiment

    Module 12: Text Mining in Python – Part 2

    Topic Modeling


Data Wrangling using CNC Mill Tool Wear Data

  • Practice using Python’s data frames to process and manipulate the CNC Mill Tool Wear dataset.
  • Hone your data wrangling and munging skills using Python’s pandas and NumPy libraries with the CNC Mill Tool Wear dataset.

Hypothesis Testing using Cancer Atlas Data

  • Statistically test how health factors relate to cancer rates around the globe.

Data Exploration using Lending Club Loan Data

  • Use Python’s NumPy library to explore and uncover insights in Lending Club’s loan data.
  • Use Python’s powerful Pandas library to wrangle and munge Lending Club’s loan data.

Natural Language Processing (NLP) using Amazon Product Reviews

  • Implement NLP techniques to automate the understanding of Amazon product reviews.

Note: All product and company names are trademarks™ or registered® trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them.


Vineet Goyal

Associate Professor, Industrial Engineering and Operations Research

Professor Vineet Goyal has a Bachelor's degree in Computer Science from Indian Institute of Technology, Delhi and a Ph.D. from Carnegie Mellon University...

Costis Maglaras

Dean of Columbia Business School, David and Lyn Silfen Professor of Business

Costis Maglaras is the David and Lyn Silfen Professor of Business at Columbia University. His research lies at the interface of stochastic modeling and operations management...

Hardeep Johar

Senior Lecturer of Industrial Engineering and Operations Research

Hardeep Johar received an M.A. in Economics from the Birla Institute of Technology and Science and is a Fellow of the Indian Institute of Management Calcutta...

Participants Speak

“What I liked about the course was the clarity in concepts and good use of examples. The course leaders, support team and professors gave a superb explanation and provided overall support during webinars, office hours and on mail.”

— Shahid Mohsin Tanwar, Manager, Mahindra and Mahindra Ltd

“The highlight of this course was the use of grading platform - the ability to create cells and try out ideas and get immediate feedback was a huge help.”

— Steve Greig, Founder, New Machine Factory

“The best part of the course was the practical examples, and detailed explanations, supported by documented material which was helpful.”

— Alejandro Suarez, CIO, Sigma Alimentos Corporativo, S.A. De C.V.



Upon successful completion of the course, participants will receive a verified digital certificate from Emeritus in collaboration with Columbia Engineering Executive Education.



Early registrations are encouraged. Seats fill up quickly!