The Gap: Where Machine Learning Education Falls Short

As machine learning has grown ever more popular, a litany of online courses has emerged claiming to teach the skills necessary to “build a career in AI”. But before signing up for such a course, you should ask whether the skills it teaches will actually help you apply machine learning better. The question is not limited to online courses; it applies just as much to the machine learning classes that have begun to fill lecture halls at many universities. Are the classes that students flock to actually helping them achieve their practical goals?

The Current State of Machine Learning Education
Having taken the main slate of seminal machine learning courses at one of the top universities for AI, I have found that most classes follow a common template. First, they tend to start with linear classifiers, introducing both regression and classification along with loss functions and optimization. Next, a week or two is spent honing the skill of backpropagation, after which they dive fully into neural networks. If the course focuses on deep learning, it tends to spend most of the remaining time on the different forms of neural networks (RNNs, LSTMs, CNNs, etc.) and on recently published seminal architectures (ResNet, BERT, etc.). If the course instead focuses on more general machine learning principles, it introduces other areas such as unsupervised and reinforcement learning.
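
To make the backpropagation weeks concrete, here is a minimal sketch of the kind of exercise such courses typically assign: a one-hidden-layer ReLU network with a single logit output, its hand-derived gradients, and a numerical gradient check. The layer sizes, the binary cross-entropy loss, and the helper names (`forward_backward`, `numerical_grad`) are assumptions chosen for illustration, not taken from any particular course.

```python
import numpy as np

def forward_backward(params, X, y):
    """Mean binary cross-entropy loss and its gradients for a
    one-hidden-layer ReLU network with a single logit output."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    n = X.shape[0]
    # Forward pass
    h_pre = X @ W1 + b1            # (n, H) hidden pre-activations
    h = np.maximum(h_pre, 0.0)     # ReLU
    z = h @ W2 + b2                # (n,) logits
    # log(1 + e^z) - y*z is BCE written in terms of logits (numerically stable)
    loss = np.mean(np.logaddexp(0.0, z) - y * z)
    # Backward pass: apply the chain rule layer by layer
    dz = (1.0 / (1.0 + np.exp(-z)) - y) / n    # dL/dz = (sigmoid(z) - y) / n
    grads = {"W2": h.T @ dz, "b2": np.array([dz.sum()])}
    dh_pre = np.outer(dz, W2) * (h_pre > 0)    # gradient through the ReLU
    grads["W1"] = X.T @ dh_pre
    grads["b1"] = dh_pre.sum(axis=0)
    return loss, grads

def numerical_grad(params, X, y, name, eps=1e-5):
    """Central-difference estimate of dL/d(params[name])."""
    grad = np.zeros_like(params[name])
    for idx in np.ndindex(params[name].shape):
        old = params[name][idx]
        params[name][idx] = old + eps
        loss_plus, _ = forward_backward(params, X, y)
        params[name][idx] = old - eps
        loss_minus, _ = forward_backward(params, X, y)
        params[name][idx] = old    # restore the original value
        grad[idx] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 3))
y = (rng.random(8) > 0.5).astype(float)
params = {
    "W1": rng.standard_normal((3, 4)),
    "b1": rng.standard_normal(4),
    "W2": rng.standard_normal(4),
    "b2": np.zeros(1),
}

_, analytic = forward_backward(params, X, y)
for name in params:
    numeric = numerical_grad(params, X, y, name)
    rel_err = np.max(np.abs(analytic[name] - numeric)) / max(
        np.max(np.abs(analytic[name]) + np.abs(numeric)), 1e-12)
    print(f"{name}: relative error {rel_err:.2e}")
```

Tiny relative errors indicate that the hand-derived backward pass matches the loss it claims to differentiate.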

Thus we see that the key topics covered in these courses can be distilled into the following: an overview of supervised learning, a brief introduction to the mathematical foundations underlying supervised learning and neural networks, and then either an introduction to deep learning methodologies or to other areas of machine learning.

Additionally, looking at the topics covered in these courses' assignments helps us ascertain their main learning goals. Assignments are often structured as follows:
1) students are provided with a well-structured dataset;
2) a model or core machine learning idea is introduced, and students work through the underpinnings of the concept;
3) students implement the concept;
4) students run the implemented model on the given dataset and do some light hyperparameter tuning;
5) students plot the results to see how the idea performs.
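
The assignment workflow above can be sketched end to end. This is a minimal, hypothetical example: a synthetic two-blob dataset stands in for the provided clean dataset (step 1), logistic regression trained with batch gradient descent is the implemented concept (step 3), and a small learning-rate sweep on a held-out split is the light tuning (step 4). Plotting (step 5) is replaced by printing so the sketch needs only NumPy.

```python
import numpy as np

def make_blobs(n=400, seed=0):
    """Step 1 stand-in: a clean, synthetic two-class dataset (hypothetical)."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(loc=-1.0, scale=1.0, size=(n // 2, 2))
    X1 = rng.normal(loc=+1.0, scale=1.0, size=(n // 2, 2))
    X = np.vstack([X0, X1])
    y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
    perm = rng.permutation(n)
    return X[perm], y[perm]

def train_logreg(X, y, lr, epochs=200):
    """Step 3: batch gradient descent on the mean logistic loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        grad = p - y
        w -= lr * X.T @ grad / len(y)
        b -= lr * grad.mean()
    return w, b

def accuracy(w, b, X, y):
    return np.mean(((X @ w + b) > 0) == (y == 1))

X, y = make_blobs()
X_train, y_train, X_val, y_val = X[:300], y[:300], X[300:], y[300:]

# Step 4: light hyperparameter tuning over the learning rate.
results = {}
for lr in [0.001, 0.01, 0.1, 1.0]:
    w, b = train_logreg(X_train, y_train, lr)
    results[lr] = accuracy(w, b, X_val, y_val)

best_lr = max(results, key=results.get)
# Step 5 would plot `results`; here we just print them.
for lr, acc in results.items():
    print(f"lr={lr:<6} val_acc={acc:.3f}")
print("best lr:", best_lr)
```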

Having examined both the content covered in courses and the content of their assignments, we now have a basis for understanding what students are expected to learn. Machine learning courses aim to impart knowledge of the key models used in the area the class focuses on. They do this by briefly covering the theoretical underpinnings of those models and having students implement their key features in assignments.

Skills Needed For Applying Machine Learning
Talking with peers who have worked in machine learning positions in industry, I have found that a couple of key skills are necessary to be successful. The first is knowing how to properly clean and analyze data. One classmate told me that his recent internship required him to spend his first 8 weeks collecting and preprocessing data before he could even begin to apply a model to the dataset. Because machine learning models are extremely data dependent, mastering the skills that let you take advantage of a dataset's key features is extremely important.
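
As a small illustration of what that preprocessing can involve, here is a minimal cleaning pass in NumPy. The raw table, the `-999` sentinel for missing values, and the choice of median imputation are all hypothetical; a real pipeline needs choices driven by the specific dataset.

```python
import numpy as np

SENTINEL = -999.0  # hypothetical code the data source uses for "missing"

def clean(raw):
    """Deduplicate rows, mark sentinel values as missing, drop rows with no
    recorded features, and impute remaining gaps with the column median."""
    # 1. Drop exact duplicate rows, keeping the first occurrence.
    seen, keep = set(), []
    for i, row in enumerate(map(tuple, raw.tolist())):
        if row not in seen:
            seen.add(row)
            keep.append(i)
    X = raw[keep].astype(float)
    # 2. Replace the sentinel with NaN so NumPy treats it as missing.
    X[X == SENTINEL] = np.nan
    # 3. Drop rows where every feature is missing.
    X = X[~np.isnan(X).all(axis=1)]
    # 4. Impute each remaining gap with its column's median.
    medians = np.nanmedian(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = medians[cols]
    return X

raw = np.array([
    [1.0, 10.0],
    [2.0, SENTINEL],
    [1.0, 10.0],             # exact duplicate of the first row
    [SENTINEL, SENTINEL],    # nothing recorded
    [3.0, 30.0],
])
print(clean(raw))  # expected rows: [1, 10], [2, 20], [3, 30]
```

Median imputation is only one option; whether to impute, drop, or flag missing values is exactly the kind of dataset-specific judgment described above.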

Next, at the industry level, large datasets are not available for most tasks. Because of this, many deep learning techniques cannot be applied without risking overfitting and poor generalization. As a result, simpler models such as random forests or logistic regression, which don’t require large amounts of data, are often used instead. Being able to properly apply such models with appropriate libraries like scikit-learn is therefore a valuable skill. In fact, a friend told me that his machine learning internship at Microsoft over the summer involved nothing but variations of logistic regression. Additionally, with the advent of large pre-trained models for both computer vision and NLP tasks, deep learning can be incorporated in certain scenarios by fine-tuning those models. This further increases the importance of familiarity with seminal models.
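
In practice, applying such models well mostly means evaluating them honestly. The sketch below compares logistic regression and a random forest with 5-fold cross-validation in scikit-learn on the library's bundled breast-cancer dataset (569 samples), used here only as a stand-in for a modest-sized tabular problem; the hyperparameters are illustrative assumptions, not tuned values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

models = {
    # Scaling inside the pipeline means each CV fold fits its own scaler.
    "logistic regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    cv = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    scores[name] = cv.mean()
    print(f"{name}: {cv.mean():.3f} +/- {cv.std():.3f}")
```

Reporting the spread across folds, not just a single split, matters most precisely when data is scarce.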

Yet at the research level, where larger datasets are often easily accessible and time constraints are less pressing, we can train larger deep learning models. Consider, for instance, OpenAI’s GPT-3 model with 175 billion parameters. To create such large architectures, the key skill needed is knowing how to engineer a large-scale deep learning system. This requires intimate knowledge of PyTorch or TensorFlow, which allows a researcher to quickly and effectively implement theorized models.
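
A back-of-the-envelope calculation shows why engineering becomes the bottleneck at this scale. Assuming 4 bytes per parameter in 32-bit floats and 2 bytes in 16-bit formats, and counting only the weights (no activations, gradients, or optimizer state), 175 billion parameters already exceed a single accelerator's memory. The 80 GB device size is an illustrative assumption.

```python
import math

N_PARAMS = 175e9          # GPT-3 parameter count quoted above
DEVICE_MEMORY_GB = 80     # illustrative single-accelerator memory (assumption)

def weight_memory_gb(n_params, bytes_per_param):
    """Memory needed just to store the weights, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9

for label, nbytes in [("fp32", 4), ("fp16/bf16", 2)]:
    gb = weight_memory_gb(N_PARAMS, nbytes)
    devices = math.ceil(gb / DEVICE_MEMORY_GB)
    print(f"{label:>9}: {gb:,.0f} GB of weights -> split across at least {devices} devices")
```

Even the lower figure means the model has to be sharded across devices before training begins, which is where framework-level engineering skill comes in.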

While being able to implement the needed architectures is important, most models do not perform well without hyperparameter tuning. Thus, when creating applied machine learning systems, it is crucial not only to tune hyperparameters but also to have intuition about how certain design decisions can help or hurt. Take, for example, a friend of mine who recently interned at Nvidia. He was having trouble tuning his model's hyperparameters before realizing that the initialization region he was considering caused the majority of the ReLU activations in the architecture to die, so learning stagnated.
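
One common way for an initialization choice to kill ReLUs can be reproduced in a few lines. The sketch below is a hypothetical stand-in, not the model from the anecdote: it initializes a single ReLU layer with He-scaled weights and a constant bias, and counts the units that output zero for every sample in a batch. Such units also pass zero gradient back to their weights, so they stop learning.

```python
import numpy as np

def dead_relu_fraction(bias_init, n_in=100, n_units=512, n_samples=256, seed=0):
    """Fraction of ReLU units that never activate on a random input batch."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, n_in))
    # He-style weight init (std = sqrt(2 / fan_in)); only the bias init varies.
    W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_units))
    b = np.full(n_units, bias_init)
    pre_activation = X @ W + b
    # A unit whose pre-activation is <= 0 for every sample outputs 0 everywhere,
    # and the ReLU passes zero gradient back to its weights: it is "dead".
    dead = (pre_activation <= 0).all(axis=0)
    return dead.mean()

for bias in [0.0, -1.0, -5.0, -10.0]:
    print(f"bias init {bias:6.1f}: {dead_relu_fraction(bias):.1%} of units dead")
```

With zero bias essentially no unit is dead, while a strongly negative bias silences almost all of them; in a real network, similar effects can also come from other initialization or learning-rate choices.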

The Gap
Having analyzed both the current state of machine learning education and the skills needed to build important applied machine learning systems, we can now comment on the gap between the two. Based on what classes cover and what applications require, it is clear that students are not taught enough about how to properly manage the data they work with. Not only do classes provide students with clean datasets that have already been neatly preprocessed, they also don’t encourage much exploration beyond visualizing a couple of data points. This lack of hands-on experience in normalizing and exploring datasets hurts students’ practical ability to do ML.
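
Here is a minimal NumPy sketch of the kind of exploration and normalization meant here. The two features (age and income), the roughly 20% positive-label rate, and the helper names are hypothetical. The design choice worth noting is that the scaling statistics are fitted on the training split only and then reused for the test split, so no information from the test data leaks into preprocessing.

```python
import numpy as np

def describe(X, y):
    """Quick exploration: per-feature summary statistics and class balance."""
    print("feature      mean       std       min       max")
    for j in range(X.shape[1]):
        col = X[:, j]
        print(f"{j:>7} {col.mean():9.2f} {col.std():9.2f} {col.min():9.2f} {col.max():9.2f}")
    classes, counts = np.unique(y, return_counts=True)
    print("class balance:", dict(zip(classes.tolist(), (counts / counts.sum()).round(3).tolist())))

def fit_standardizer(X_train):
    """Per-feature mean and std, computed on training data only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0   # constant features: avoid division by zero
    return mean, std

def standardize(X, mean, std):
    return (X - mean) / std

rng = np.random.default_rng(0)
n = 200
age = rng.normal(40, 12, n)               # hypothetical feature on a small scale
income = rng.normal(60_000, 15_000, n)    # hypothetical feature on a large scale
X = np.column_stack([age, income])
y = (rng.random(n) < 0.2).astype(int)     # imbalanced labels, roughly 20% positive

X_train, X_test, y_train = X[:150], X[150:], y[:150]
describe(X_train, y_train)
mean, std = fit_standardizer(X_train)
Z_train = standardize(X_train, mean, std)
Z_test = standardize(X_test, mean, std)   # reuse training statistics
```

Without the standardization step, the income feature would dominate any distance- or gradient-based model simply because of its units.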

Additionally, while classes provide basic intuition about the mathematical background of key frameworks, not enough is done to expose students to the theory of why a given model performs well on a certain task while others don’t. Although students are familiar with a variety of models, they cannot discern which would be best for a given dataset and task. Without understanding the mathematical underpinnings of key models and techniques in full detail, students aren’t able to quickly choose the right model for a given scenario.
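
A classic example of this kind of reasoning is XOR-style data: a linear classifier cannot separate it no matter how well it is tuned, while the same model becomes nearly perfect once an interaction feature is added. The sketch below trains logistic regression by gradient descent on both versions of some hypothetical synthetic data and reports training accuracy.

```python
import numpy as np

def train_logreg(X, y, lr=1.0, epochs=2000):
    """Batch gradient descent on the mean logistic loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

def accuracy(X, y, w, b):
    return np.mean(((X @ w + b) > 0) == (y == 1))

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(float)   # XOR-style labels

raw_acc = accuracy(X, y, *train_logreg(X, y))
X_aug = np.column_stack([X, X[:, 0] * X[:, 1]])   # add the interaction feature
aug_acc = accuracy(X_aug, y, *train_logreg(X_aug, y))
print(f"linear on raw features: {raw_acc:.2f}")
print(f"linear + x1*x2 feature: {aug_acc:.2f}")
```

No amount of learning-rate tuning fixes the first model; knowing why it fails (the classes are not linearly separable in the raw features) points directly to the fix.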

What is Already Done Well? What Can We Do To Make it Better?
As we analyze the gap between classroom knowledge and the skills needed to build effective applied systems, it is clear that most courses do a great job of imparting fundamental machine learning knowledge to their students. The concepts covered in these courses ensure that you understand how a learning algorithm works and what it needs in order to converge. The classes ensure that students are familiar with state-of-the-art algorithms for a variety of tasks and have been exposed to enough breadth of material to continue further study in the area if desired.

However, there is plenty of room for improvement. Many classes spend their first couple of weeks on the same material: linear classifiers and backpropagation. While these are undoubtedly key topics, blocking out almost the first third of a class for material that should already have been introduced in prerequisite classes is not the best use of class time. Instead, it would be more useful to have a clear separation of content between undergraduate and graduate classes. Graduate courses should strongly encourage students to be familiar with these concepts before taking the class, or direct them to first take the undergraduate equivalent. This leaves more room to teach relevant material, such as how to analyze data for the specific field the class focuses on, or to give further exposure to key frameworks in the area. By doing so, I believe machine learning classes can inch toward the ideal relationship between the knowledge gained in class and the knowledge needed to apply it. Of course, a single machine learning class within a whole slew of courses cannot make these changes in isolation; it must be a coordinated effort by the community to ensure that students receive the most novel and impactful knowledge possible.

Why Should Universities Care?
While addressing the gap described in this article is crucial for increasing students’ applied skills, most courses may simply state that teaching aspects of machine learning used mostly in industry is outside their scope. However, these skills transfer far beyond industry; in fact, they are useful in almost any setting. For instance, when developing new models in research, you still need to know how to properly preprocess data and discern which techniques are likely to give more promising results. Moreover, these skills are so fundamental that they make students better machine learning practitioners in general. For a course, anything that can benefit students’ knowledge should be considered crucial and thus be taught.

Based on the current state of machine learning courses, it is clear that AI courses will get your foot in the door, whether your goal is to perform cutting-edge research or to land a machine learning job, but they won’t teach you everything you need to know. To fill the remaining knowledge gaps, you will have to put in effort on your own.

Author Bio
Jupinder Parmar is a Mathematics and Computer Science student at Stanford, where he concentrates his studies on probability and machine learning, advised by Amir Dembo. At Stanford he has explored a variety of areas, but in each his goal has been to learn how to leverage his understanding of random phenomena and artificial intelligence to develop exciting ideas and products that will revolutionize our world. He believes that these two concepts can not only guide us to a deeper understanding of the problems that challenge our lives but also give rise to robust, innovative solutions.

The main image from this piece is taken from this article.

For attribution in academic contexts or books, please cite this work as

Jupinder Parmar, "The Gap: Where Machine Learning Education Falls Short", The Gradient, 2020.

BibTeX citation:

author = {Parmar, Jupinder},
title = {The Gap: Where Machine Learning Education Falls Short},
journal = {The Gradient},
year = {2020},
howpublished = {\url{} },

If you enjoyed this piece and want to hear more, subscribe to the Gradient and follow us on Twitter.