General Information
This class studies the mathematical theory behind modern-day big data applications.
Talks an extensive amount about the mathematics behind machine learning. The book (which can be found here) is used at Princeton and CMU for graduate level courses; this course is taught at a lower level.
Prerequisites
In general, the more math classes that you have previously taken, the better prepared you will be for this class. If you are not very comfortable with statistics, this class will be challenging. Having completed a class such as ORIE 3500 (Engineering stats II) or ECE 3100 is very helpful. The math topics that you really should know cold are:
- CS 2800
- Linear algebra
- Basic probability (EngrD 2700 is fine)
-
Big-Oh notation Topics that will come up include:
- Multivariable Calculus
- Generating functions
- Difference equations (Math 2930)
Topics Covered
- High-dimensional space
- Random graphs (G(n,p) model) and threshold probabilities
- Branching processes
- Singular value decomposition
- Random Walks and Markov chains
- Machine Learning (Linear SVMs / Batch Perceptron, Ensemble Learning, Clustering)
- Learning Theory
- Randomized Algorithms (Time permitting, most likely not)
Workload
3 problem sets a week consisting of 2 to 3 problems each.
In Spring 2014, there was one problem set each week consisting of ~6 problems from the textbook. The difficulty of the problems varies greatly, but the overall difficulty of each problem set is approximately equal. There were two in class midterms and a final.
General Advice
The grading scheme is “holistic” meaning there is no set breakdown of how much each assignment is weighted, which means it is hard to predict what grade you will recieve before the course is done. Exams are often difficult.
The grading scheme is completely up in the air, and he may as well pick grades out of a hat. The exams can cover material never gone over in class. However, usually the exams do not have arduous or time-consuming problems.
If you put an honest effort and do the homework assignments, attend lecture, and go to office hours if you need help understanding the material, you’ll do fine. Most people stopped attending lecture and stopped doing homework because Hopcroft had no clear grading rubric. Don’t fall into this trap. Theres a lot of interesting material in this course. Hopcroft is not bad as a lecturer, but he does make some assumptions on how much background in math/stats students have. Its VERY easy to get behind in this class. It would helpful to read bits of the textbook before each lecture.
Hopcroft sometimes expects a level of mathematical maturity the typical undergraduate does not yet have. He does welcome all questions during lecture, even about basic pre-req material.
Clear grading rubrics are a reasonable thing to expect from a professor and Hopcroft is remiss in his duties in not providing one.
If you accept an organized class, where you have some idea of what your grade will be before you get it, and what standard you will be graded on, then don’t take Hopcroft. He is of the philosophical belief that people should not know what standards they are being held to (although he wouldn’t phrase it like that), or get feedback on whether they’re successfully completing their work. He also rejects answers that are correct on tests because they weren’t the answer he was looking for, etc. There were also several homework problems that were actually impossible, or where the answer Hopcroft had in mind was actually wrong.
The basic problem is, while Hopcroft is extremely smart, he has a disdain for things like concrete grading standards and similar, recognizing them as somewhat artificial and tangential to learning but not valuing their practical merit in making the class easy to reckon with. It’s easy to blame people for not doing homework, but students have busy lives they need to prioritize, and when you get literally no value for each homework assignment (no feedback, no guarantee that it will even register that you did it, and no indication of whether you got it right or wrong) but still have to work very hard and long to arrive at solutions (which you are going to be uncertain of because of the nature of the course), YES, people will stop doing it. Hopcroft could have easily prevented this. … I say this as someone who was more than adequately prepared for the material. Professors have duties, too.
The only problem with the course is that while Hopcroft is extremely smart and passionate about teaching, he does not implement concrete course structure and grading standards. This makes it easy for students to get distracted, because 4850 never seems like the most pressing class.
Testimonials
Please take the “advice” on this page with a grain of salt. While the reviews were generally negative in Spring 2013 (the first time it was offered), I found the experience in Spring 2014 to be quite good. Part of the problem in 2013 was that no one had taken the course before the TA’s were generally unprepared compared to other classes in CS where most of the TA’s have taken the class before. In addition, Hopcroft himself was still learning the best way to teach the course and there was no precedent for what would work. An obvious example is the homework policy. In 2013, they were due every class (3 times a week) and mid-way through the semester, he stopped collecting them (though he still assigned them; also, no assignment was ever graded or given comments on). Both of these policies were very unpopular and they were changed in 2014 to 1 hw a week, each of which was graded by hand and returned the following week. I thought grading on both homeworks and exams was done quite fairly. I had quite a positive experience so I would suggest you try it out if the subject interests you.
Nobody seems to know what’s going on. <--- Spring 2013
Requires quite a bit of mathematical intuition to solve the problems, and by quite a bit, I mean A LOT.
The topics were very interesting!
Took it in the spring of 2018 and don’t let the earlier reviews turn you off the class! Class is very chill and my friends and I had a good time. Hopcroft is a great lecturer and the material is interesting. If you attend all the lectures and do the homeworks, then you can expect the homeworks each week to take around 6 hours each and you only really have to study for an hour or two before the exams as they are quite easy.
Past Offerings
Semester | Time | Professor | Median Grade | Course Page | |
---|---|---|---|---|---|
Spring 2013 | MWF 1:25-2:15 | John Hopcroft | B | 110 | http://www.cs.cornell.edu/courses/CS4850/2013sp/ |
Spring 2018 | MWF 1:25-2:15 | John Hopcroft | B+ | 110 | http://www.cs.cornell.edu/courses/cs4850/2018sp/ |
A note
Although this may only apply to a small number of students, it’s probably worth mentioning:
If you’ve taken CS 4780, ORIE 3500, ORIE 3510, and a have a strong intuition of Linear Algebra you’ll know the majority (> 90%) of the material covered in the class and most likely find the class too easy (as this is an undergraduate version).
Specifically, if you’ve groked certain supervised learning approaches (finding maximum-margin hyperplanes via hard-margin linear svms and batch perceptron), ensemble learning, unsupervised learning approaches (clustering, dimension reduction (SVD/Spectral decomp/various matrix factorizations)), “curse of dimensionality”, and learning theory (VC Dimension) you’ll be more than fine.
The homework problems at the 4850 level test basic understanding of the concepts covered in class, but there are a few other unassigned problems that cover more interesting research problems. The exams are extremely straightforward; if you’ve gone to class consistently and understand the deeper ideas, the prelims should take no more than 20-30 minutes as a lot of them are one line answers.
’’’'’If you’ve been involved within the math competition space in high school (usamo/arml/pumac/etc.) or regularly attend the Putnam practices on campus, you’ll have developed the mathematical intuition that’s required to solve a lot of the harder problems in the book. Unfortunately, the vast majority of kids who wish to take the class haven’t and have trouble constructing basic proofs, which hinders their ability to reason about the material.’’’’’
Additionally, a basic understanding of non-measure theoretic probability is required.