Project 2 Summary
Overall the class did an excellent job!
20/25 grades were 19 or higher (again)!
Highlights
The Type of Naive Bayes
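We were looking for you to use a Multinomial or Bernoulli Naive Bayes type, not a Gaussian Naive Bayes. The reason is that the feature variables were discrete/categorical, not continuous. Gaussian Naive Bayes models each feature as normally distributed within a class, which does not fit categorical features (the class label is discrete in any classifier, so it does not determine the type).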
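As a rough illustration of the point above, here is a minimal sketch that fits Multinomial and Bernoulli Naive Bayes on one-hot encoded categorical features. It assumes scikit-learn is the library in use; the toy rows, category labels, and variable names are made up for illustration and are not the project dataset.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.preprocessing import OneHotEncoder

# Toy categorical data (hypothetical, not the project dataset).
# Columns: an age bin and a size label.
X = np.array([
    ["30-39", "small"],
    ["40-49", "large"],
    ["30-39", "large"],
    ["50-59", "small"],
    ["40-49", "small"],
    ["50-59", "large"],
])
y = np.array([0, 1, 1, 0, 0, 1])  # 1 = recurrence (made-up labels)

# One-hot encoding turns each category into its own 0/1 column.
X_onehot = OneHotEncoder().fit_transform(X).toarray()

# Both variants model discrete features, unlike GaussianNB,
# which assumes each feature is normally distributed per class.
bernoulli = BernoulliNB().fit(X_onehot, y)
multinomial = MultinomialNB().fit(X_onehot, y)

pred_bernoulli = bernoulli.predict(X_onehot)
pred_multinomial = multinomial.predict(X_onehot)
```

On 0/1 one-hot columns the two variants usually behave similarly; the main difference is that Bernoulli also penalizes the absence of a category, while Multinomial only uses the columns that are present.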
Ordinal Encoding
Technically, many of the feature variables were ordinal categoricals, i.e., their values had a natural ordering (for example, age and tumor_size). Ideally, therefore, one should have used an ordinal encoding instead of one-hot.
This is not a topic we covered in much detail, so we didn’t take points off if you used one-hot, but a couple of people did use ordinal encoding, so nice job!
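For reference, a minimal sketch of ordinal encoding, assuming scikit-learn's `OrdinalEncoder`. The bin labels below are hypothetical stand-ins for the age and tumor_size categories; the key step is passing the categories explicitly, from lowest to highest, so the integer codes follow the natural order.

```python
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical bin labels, listed from lowest to highest.
age_order = ["30-39", "40-49", "50-59"]
tumor_size_order = ["0-4", "5-9", "10-14"]

# Without explicit categories the encoder sorts labels as strings,
# which would put "10-14" before "5-9".
encoder = OrdinalEncoder(categories=[age_order, tumor_size_order])

rows = [
    ["40-49", "10-14"],
    ["30-39", "0-4"],
    ["50-59", "5-9"],
]
encoded = encoder.fit_transform(rows)
# Each value is now the position of its label in the ordered list.
```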
Hyperparameter Tuning with Recall as the Scoring Metric
Just a quick note to say that if you used hyperparameter tuning (i.e., with grid search) and recall as your target metric, you likely improved your results by a pretty significant margin.
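A minimal sketch of what this tuning could look like, assuming scikit-learn's `GridSearchCV` with `scoring="recall"`. The synthetic data, the choice of `BernoulliNB`, and the parameter grid are illustrative assumptions, not the values used in the project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB

# Synthetic binary features and labels (made up for illustration).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 6))
noise = rng.integers(0, 2, size=200)
y = ((X[:, 0] + X[:, 1] + noise) >= 2).astype(int)

# Hypothetical grid over the smoothing strength and the class prior setting.
param_grid = {
    "alpha": [0.01, 0.1, 1.0, 10.0],
    "fit_prior": [True, False],
}

# scoring="recall" makes the search pick the setting with the best
# cross-validated recall on the positive class, rather than accuracy.
search = GridSearchCV(BernoulliNB(), param_grid, scoring="recall", cv=5)
search.fit(X, y)

best_params = search.best_params_
best_recall = search.best_score_
```

Because the search optimizes recall alone, it is worth checking precision on held-out data as well, since a model can reach high recall by predicting the positive class too often.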
Bonus
For the bonus, we were pretty strict. To get a full two points you needed to actually implement a new model using some strategy (e.g., modification to the decision threshold) and compute its performance on train and test (i.e., )
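To make the decision-threshold strategy concrete, here is a small dependency-free sketch. The probabilities and labels are made up; in practice the probabilities would come from the fitted model (for example, a `predict_proba`-style output), and the same two functions would be applied to both the train and test sets. The function names are hypothetical.

```python
def predict_with_threshold(probabilities, threshold=0.5):
    """Predict 1 when the positive-class probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def recall(y_true, y_pred):
    """Fraction of actual positives that were predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    actual_pos = sum(1 for t in y_true if t == 1)
    return true_pos / actual_pos if actual_pos else 0.0


# Made-up labels and positive-class probabilities for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
probs = [0.9, 0.45, 0.35, 0.6, 0.2, 0.1]

default_preds = predict_with_threshold(probs)        # threshold 0.5
lowered_preds = predict_with_threshold(probs, 0.3)   # lower threshold favors recall

default_recall = recall(y_true, default_preds)
lowered_recall = recall(y_true, lowered_preds)
```

In this toy case lowering the threshold catches all three positives without adding a false positive, but in general a lower threshold trades precision for recall.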
One person used an ensemble method and referenced a paper on the topic: the basic idea is to use multiple models at the same time, and if any of them predicts recurrence then the overall model predicts recurrence. Very nice!!
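The "any model predicts recurrence" rule can be sketched in a few lines. This is only an illustration of the combining step, not the student's actual implementation; the function name and the per-model predictions are hypothetical.

```python
def any_vote(predictions_per_model):
    """Combine binary predictions: 1 if any model predicts 1 for that sample."""
    return [int(any(votes)) for votes in zip(*predictions_per_model)]


# Made-up predictions from three models on four samples (1 = recurrence).
model_a = [0, 1, 0, 0]
model_b = [0, 0, 1, 0]
model_c = [0, 1, 0, 0]

combined = any_vote([model_a, model_b, model_c])
```

Since a single positive vote is enough, this rule can only keep or raise recall relative to each individual model, at the cost of more false positives.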