How to pick a successful AI project, part 3: Working with models

This post is part of a series on ‘how to pick a successful AI project’. Part 1, Part 2.

Here I’ll cover what I’ve learned working with models and how this matters to pick up an AI project that will succeed. I mentored over >165 of them and counting at Data Science Retreat.

Not much theory

If you ask a meetup presenter ‘how did you pick your architecture?’ the most likely answer is something similar to ‘I copied it from a blog post or paper, then tweaked it.’ There’s little theory that guides how to pick an architecture. The field seems to be at the stage of medieval apprenticeships, where apprentices copy the work of masters. Although the field produces 1000s of papers per year, there’s very little in terms of theory. Practitioners generate ‘rules of thumb’ to pick architectures and train models. One good example is Andrew Ng’s book ‘machine learning yearning.’

This book is dense with the closest thing that we have to ‘theory’ to pick up architectures and finetune models.

You need to have a gold standard

Your model must improve a KPI that matters. That means that there’s something observable, measurable that the model does better than the baseline (which often means no model at all).

Supervised learning is better than unsupervised in that you can justify what your model does.

If you use cluster analysis, your boss could always say ‘you show me 3 clusters, why not 5? I think there are 5.’ there’s no correct answer to this. A supervised model has clear performance measures, plus it can often be checked ‘by eye’ (hey, your dog classifier missed this dog that looks like a cranberry cupcake.)

Use a pretrained model

With transfer learning, instead of starting the learning process from scratch, you start from patterns learned when solving a different problem. This way, you leverage previous learning and avoid starting from scratch.

When you’re repurposing a pre-trained model for your own needs, you start by removing the original classifier, and then you add a new classifier that fits your purposes. You save time (in weeks when dealing with big deep networks with millions of parameters).

There are repositories of models in the public domain, for example:

Try also As the perfect name indicates, it’s a searchable collection of papers that have a public implementation, an excellent place to start.

If you have done or many of the other ML courses out there, you know more than enough to start reusing pretrained models. Even if you cannot find a pretrained model that matches your problem, using one that is barely related usually works better than starting from scratch. More so if your data is complex and your architecture will be more than a dozen layers. It takes a long time (and big hardware!) to train big architectures.

It’s good to stay somewhat on top of new models; tracking state of the art is not necessary, but for sure, it’s now easier than ever. Twitter will tell you if anything significant has just popped up. If someone made a great demo, Twitter would catch fire. Just follow a few people who post about these things often.

To navigate arXiV, try arxiv sanity (this is good to pick up trends, I don’t recommend you to make paper reading a priority if you want to be an ML practitioner. You will likely need to move so fast to deliver results that reading papers becomes a luxury you cannot afford.) About talk videos: has now videos for most talks. ‘Processing’ NeurIPS is a giant job, so it’s easier to read summaries from people soon after they attended.

Most projects I supervised (at least in the last year or two) used transfer learning. Think about your former self, 10 years ago. Would you be surprised if you told your past self that in the future, anyone could download a state-of-the-art ML model that took weeks to train? And use it to build anything you want?

Published papers in all scientific disciplines use ML, but their models are not very strong; improving them is a quick win

Take, for example, this paper on how to predict battery life from discharge patterns — published in Nature, one of the best scientific journals. Their machine learning is rudimentary at best; this is actually to be expected, as the authors are in electric engineering, not machine learning. The team focused more on their domain knowledge in electrical engineering than on the machine learning part. A very astute team of participants at Data Science Retreat batch 18 (Hannes Knobloch, Adem Frenk, and Wendy Chang) saw an opportunity: what if we make better predictions with more sophisticated models? They not only managed to beat the performance of the model on the paper; they got an offer from Bosch to continue working on it for 6 months (paid! no equity stake). They refused the offer because they all had better plans after graduation.

There’s an entire world of opportunity doing what Hannes, Adem, and Wendy did; so many papers out there provide data and a (low) benchmark to beat. Forget about doing Kaggle competitions; there’s more opportunity in these high profile papers!

Avoid the gorilla problem

What follows only applies to models that produce results that a user sees. If your models’ end-user is another machine (for example you produce an API that other machines consume) you can skip this section.

Your ML model provides value to your users, but only as long as they trust the results. That trust is fragile, as you will see.

In 2015, Google photos used machine learning to tag the contents of the pictures and improve search. While the algorithm had accuracy levels that made Google execs to approve it for production, it had ‘catastrophic mislabeling.’ You can imagine the PR disaster this was, both for google and for machine learning as a field. Google issued a fix, but the first fix was not sufficient, so Google ultimately decided not to give any photos a “gorilla” tag.

What do we learn from this? If your problem depends on an algo that has any chance of misclassifying something that breaks trust: pick another problem.

In the 200 projects I supervised, when a team brought up an idea that had the ‘gorilla problem,’ I steered them away from it. You can spend months doing stellar ML work that is invalidated by the gorilla problem. Another example is tagging ‘fake news’: if your algo tags one of my opinion leaders (one I trust blindly) as ‘fake news,’ you have lost me forever.

Multiple models doing a single task (example: picking up cigarette butts)

Making the self-driving car work is a constellation of engineering problems. Many different ML models work in unison to take you to where you want to go (pardon the pun).

An example from our labs: Emily, a self-driving toy car that finds and picks up cigarette butts we mentioned before, is doing 3 subtasks:

– Identify cigarette butts

– Move the car close enough so that the cigarette butts are within reach

– Pick the cigarette butt (stabbing)

Each subtask is a model.

Note that cigarette butts are incredibly poisonous (one can contaminate 40 liters of water) and hard to pick with a broom because they are so light). As a result, they accumulate in public areas. A swarm of these robots could have a serious ecological impact. Of course, it’s still early days, and plenty of practical problems remain: would people steal the cars for parts? Even if they don’t, would they share the street with autonomous robots that will make plenty of mistakes and may block their path at crucial times?

One lesson to learn is that combining 3 models lets you solve a problem that was unreachable otherwise. Each problem in isolation may not be that tough; in fact, it might be a solved problem.

Understanding context

What problem is this model trying to solve? You know these ‘product guys’? They think people “hire” products and services to get a job done. What is the job that your model is getting done?

This might be obvious at times, but not so obvious some other times, and there lies opportunity.

Imagine that you work for a hospital. After lots of deliberation, your boss has decided that your next task will be to build a model that predicts when an intensive care patient is going to crash. What is the ‘job’ of this model?

One way to look at it: the job is to save lives.

One other way to look at it: the job is to use the resources of the hospital optimally. When a patient crashes, it takes lots of people to try to get her to be stable again. Every nurse and doctor that have anything to do with this patient rushes to the room, and abandons any task they are doing. Retaking the task is costly. Task switching is very inefficient; chronometer in hand try doing two tasks A and B multiple times, AAAABBBBB vs ABABABAB. The second takes longer, for pretty much any tasks A and B. This is why getting distracted by a notification is so damaging for productivity.

In any case, whether you think your model is saving lives (period) or allocating hospital resources optimally (to save more lives!) makes all the difference.

Because ‘bosses’ who are not ‘close to the metal’ cannot really estimate what the right job for the model is, you will have to do it. It’s a good fit for the analytical mind of the data scientist.

And there you have it; a complete manual to pick a successful AI project, in three installments. I hope this was useful, and that it helps you solve problems real people have with machine learning. There’s never been a better time to be alive. So much low hanging fruit, so much leverage thanks to this flavor of technology, so much productivity gains if we manage to educate a few more thousand people in AI. If this manual has helped, I’d love to hear from you and witness the project you built. Send it to me on twitter at @quesada, my DMs are open.