Closing Remarks for Unit 2
We want to close Unit 2 by noting that we have barely scratched the surface of the classical machine learning models that have been developed. A number of specialized topics are well represented in sklearn and worthy of study.
Two specific classes of models and algorithms that are directly applicable to the classification problems we have focused on are Gradient Boosting and Support Vector Machines (SVM).
These two are arguably the most important model types that we will not be covering, as they have excellent properties, including the ability to model high-dimensional and/or non-linear data as well as to perform relatively well on small datasets.
The rise of deep learning in general, and of the transformer architecture behind large language models in particular, has led some to tout transformers as an architecture that can potentially outperform all others on all tasks. For that reason, we have decided to spend some portion of class time on these ideas. However, at least at the time of this writing, some of these classical algorithms (Gradient Boosting algorithms in particular) are still considered benchmarks and/or state of the art for some classification problems. It is still an open question whether transformers can outperform them.
There are many excellent online sources for additional reading. Here we list just a few:
Gradient Boosting Classifier: algorithms ([3]), in sklearn ([4]).
Support Vector Machines: algorithms ([1]), in sklearn ([2]).
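For readers who want to try these two model types before doing the additional reading, here is a minimal sketch that fits both on a small classification problem. The dataset (sklearn's built-in breast cancer data), the train/test split, and the default hyperparameters are our own assumptions chosen for illustration; they are not tuned and are not meant as a benchmark. The SVM is wrapped in a pipeline with feature scaling because SVMs are generally sensitive to feature scale.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Small built-in binary classification dataset (an assumption for illustration).
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Both models use default hyperparameters; nothing here is tuned.
models = {
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
    # Scale features first, then fit an SVM with an RBF kernel.
    "svm_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# Fit each model and record its accuracy on the held-out test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

Both estimators follow the same fit/score interface used throughout this unit, so either one can be dropped into an existing sklearn workflow in place of the classifiers we have already covered.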
Teams for Project 3 and 4
Finally, we would like you all to start thinking about what teams you would like to work in for Projects 3 and 4. Please talk to your fellow students and let us know who you would like to pair up with. If you would like our help getting paired up with another student, please let us know. Of course, it is also fine if you prefer to work alone.
References and Additional Resources
[1] Lecture 4 (January 30): The support vector classifier, aka soft-margin support vector machine (SVM); UC Berkeley CS189/289A: Introduction to Machine Learning. https://people.eecs.berkeley.edu/~jrs/189/lec/04.pdf
[2] SVM in sklearn. https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
[3] Huan Zhang, Si Si and Cho-Jui Hsieh. “GPU Acceleration for Large-scale Tree Boosting.” SysML Conference, 2018.
[4] Gradient Boosting Classifier in sklearn. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html