CS 4786/5786 — Machine Learning for Data Science

General Information

An introductory machine learning course focused on data modeling and on the related methods and learning algorithms used in data science.

Prerequisites


  • Probability Theory (BTRY 3080, ECON 3130, MATH 4710, or strong performance in ENGRD 2700 or equivalent)
  • Linear Algebra (MATH 2940 or equivalent)
  • CS2110 or equivalent programming proficiency

Topics Covered

CS 4786 focuses on Unsupervised Learning. Topics include:

  • Dimensionality Reduction (PCA, CCA, Random Projection)
  • Clustering (K-Means, Single-Link, Spectral)
  • Probabilistic Modeling (Mixture Models, Hidden Markov Models, EM Algorithms)

Workload


6 assignments, 2 Kaggle projects, 0 prelims, 0 finals

If you have a background in ML or a strong grasp of probability, the assignments may feel light; otherwise, start early just in case. The projects can be a heavy workload, but at least you work in a team.

General Advice

  • Go to office hours for assignments
  • Start early on everything, especially the projects!
  • If Kaggle’s format hasn’t changed, you get a number of submissions per day. Use them!
  • Even if you do great on Kaggle, don’t skimp on the writing quality of your report.
  • Even though there aren’t exams, try to review the lectures so you can get the most out of the class!
  • Python has a lot of awesome libraries for ML, like NumPy and scikit-learn.
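
To give a feel for what those libraries look like in practice, here is a minimal sketch that chains two of the course topics, PCA and K-Means, using NumPy and scikit-learn. The data is synthetic and made up purely for illustration (three Gaussian blobs in 10 dimensions), and choices like `n_components=2` and `n_clusters=3` are assumptions for this toy example, not anything taken from the course assignments.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic data (an assumption for illustration): 3 Gaussian blobs in 10 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(100, 10)) for c in centers])

# Dimensionality reduction: project onto the top 2 principal components.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: run K-Means with k=3 on the reduced data.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_2d)
labels = kmeans.labels_

print(X_2d.shape)            # one 2-D point per original sample
print(np.bincount(labels))   # number of points assigned to each cluster
```

On real project data you would probably swap the synthetic `X` for your own feature matrix and experiment with the number of components and clusters rather than hard-coding them.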


The class was hard for me, but I enjoyed its format. At one point I spent two days straight locked in my room trying to get a Hidden Markov Model working, yet I was learning a lot about machine learning the whole time.

Past Offerings

Semester | Time | Professor | Median Grade | Enrollment | Course Page
Fall 2017 | TR 11:40 AM - 12:55 PM | Karthik Sridharan | - | - | http://www.cs.cornell.edu/courses/cs4786/2017fa/
Fall 2016 | TR 11:40 AM - 12:55 PM | Karthik Sridharan | A | 92 | http://www.cs.cornell.edu/courses/cs4786/2016fa/
