CS 4786/5786 - Machine Learning for Data Science

General Information

An introductory course in machine learning, with a focus on data modeling and related methods and learning algorithms for data sciences.

Prerequisites

Probability Theory (BTRY 3080, ECON 3130, MATH 4710, or strong performance in ENGRD 2700 or equivalent)
Linear Algebra (MATH 2940 or equivalent)
CS2110 or equivalent programming proficiency

Topics Covered

CS 4786 focuses on Unsupervised Learning. Topics include:

Dimensionality Reduction (PCA, CCA, Random Projection)
Clustering (K-Means, Single-Link, Spectral)
Probabilistic Modeling (Mixture Models, Hidden Markov Models, EM Algorithms)

Workload

6 assignments, 2 Kaggle projects, 0 prelims, 0 finals

If you have a background in ML or strong knowledge of probability, the assignments may feel light; otherwise, start early to be safe. The projects can be a heavy workload, but at least you work on them with a team.

General Advice

Go to office hours for assignments
Start early on everything, especially the projects!
If Kaggle’s format hasn’t changed, you get a limited number of submissions per day. Use them!
Even if you do great on Kaggle, don’t skimp on the writing quality of your report.
Even though there aren’t exams, try to review the lectures so you can get the most out of the class!
Python has a lot of awesome libraries for ML, like NumPy and scikit-learn.
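
To give a feel for what working with these libraries looks like, here is a minimal sketch of PCA (one of the dimensionality reduction topics listed above) written with NumPy only. It is not taken from the course materials: the function name `pca`, the toy data, and the parameter choices are all made up for illustration, and scikit-learn offers its own PCA implementation that you would more likely use in practice.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.

    Hypothetical helper for illustration; not course-provided code.
    """
    # PCA works on zero-mean data, so subtract the per-feature mean.
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]
    # Variance of the data along each of the kept directions.
    explained_variance = (S[:k] ** 2) / (X.shape[0] - 1)
    return X_centered @ components.T, components, explained_variance

# Made-up toy data: 200 points in 5-D that vary mostly along one direction.
rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, 0.0, -1.0, 0.5])
X = rng.normal(size=(200, 1)) * direction + 0.05 * rng.normal(size=(200, 5))

Z, components, variance = pca(X, k=2)
print(Z.shape)    # reduced data: one row per point, k columns
print(variance)   # the first component should carry almost all the variance
```

Because the toy data lies close to a single line, the first principal component should point (up to sign) along `direction`, which is an easy sanity check when you write your own version for an assignment.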

Testimonials

While the class was hard for me, the format of the class was enjoyable. Even though at one point I spent two days straight locked in my room trying to implement a working Hidden Markov Model, I learned a lot about machine learning the whole time.

Past Offerings

Semester	Time	Professor	Median Grade	Enrollment	Course Page
Fall 2017	TR 11:40 AM - 12:55 PM	Karthik Sridharan	-	-	http://www.cs.cornell.edu/courses/cs4786/2017fa/
Fall 2016	TR 11:40 AM - 12:55 PM	Karthik Sridharan	A	92	http://www.cs.cornell.edu/courses/cs4786/2016fa/