Back in February, when we were in our third lockdown, my team and I regrouped to think about our next steps. We are in the fortunate position of meeting with dozens of leading data science teams every week to brainstorm and discuss their challenges with scaling ML, and we realized there was a need to give structure to these voices and to create a repository of best practices and “stories from the field.”
At Superwise, we see ourselves as a team of engineers and data scientists who bear the scars of putting ML models into the real world and learning from our mistakes. Those scars led us to create a solution that automates the assurance of ML models, helping others scale their use of AI in a way that is safer and easier. The ML Talks initiative is a continuation of those efforts. While the data science and MLOps community is a vocal one, with a wealth of information out there in the shape of blogs, Git repos, and Slack channels, there is still a real need to consolidate the experience from the trenches: the real stories of the women and men who have been awake at 3 AM on a Saturday trying to understand what is really happening with their models.
So far, we have interviewed five (and counting!) rockstars in the ML world and have learned something new from every conversation.
Here are some of our key takeaways:
Scaling AI is about making sure that everyone is on board with it. Each and every one of our interviewees mentioned the need to facilitate adoption by being transparent with downstream users. As Maya Bercovitch, Director of Engineering & Data Science at Anaplan, notes: “we create a glass box, not a black box.” Clearly, scaling AI is about making it accessible to all stakeholders. What’s more, in our discussion with Matt Shump, VP of Data at ChowNow, he notes: “I have not met a sales leader or a marketing leader who's willing for me to black box automate a lead scoring model for them. They want to know what's going on underneath the hood.” From data scientists and data engineers to operational users, every stakeholder in the organization needs to be aligned on how the models are doing to facilitate adoption and ROI.
To avoid delays and errors, the ability to understand how the data fluctuates and how the model behaves is paramount. Yet in-house tools, or solutions that are not dedicated to machine learning tasks, often fail to deliver the right results, especially as the number of models grows. As Maxim Khalilov, Head of R&D at Glovo, notes: “The nearest priority in terms of time is monitoring, because we don't have enough visibility into the technical characteristics of the models, but primarily into what happens with the data, how the data flows through our pipeline, and most importantly, how our model behaves and how it reacts to changes in the data.”
When asked what was at the top of her mind, Nufar Gaspar, Head of Operational AI, Product & Strategy at Intel Corporation, answers: “A lot of MLOps, as everyone. [...] The ability to have one MLOps across different verticals and different organizations and to ease the access to MLOps for teams without high proficiency in machine learning is key.”
One of the top best practices shared by Dino Bernicchi, Head of Data Science at Homechoice, is: “Develop your own AutoML pipelines and systems to deploy and manage solutions in production. This will allow you to rapidly test and deploy models.”
I hope you enjoy reading these interviews as much as we enjoyed conducting them, and I want to thank everyone who participated. We are only just getting started, so please feel free to contact me if you want to take part in ML Talks, recommend a co-worker, or share questions you would like us to investigate!