“We already have one!” That’s the first thing most of our customers said when we met to discuss AI assurance solutions. Most AI-savvy organizations today have some form of monitoring. Yet, as they scale their activities, they find themselves at a crossroads: should they invest more in their homegrown solution or turn to a vendor solution? And if they do choose to invest more, for how long will their DIY solution be “good enough”?
In this blog, we explore how far homegrown solutions can take you and what you need to think about when planning to scale your use of machine learning.
Data science teams spend months researching and training their best models. The production phase and the necessary MLOps/monitoring phase sometimes come only as an afterthought. In this context, many data science and engineering teams develop initial AI monitoring tools in-house. But while DIY tools may be a decent approach for businesses with a contained use of AI, when the time comes to expand the use of modeling, homegrown tools fall short of supporting the diversity and complexity of the models and the data used. Here is a shortlist of the lessons we have seen customers learn while scaling their AI.
Guess what? Homegrown solutions don’t scale in sync with the models and require more and more maintenance, tweaks, and attention. This is especially true as organizations adopt AI for various use cases: from marketing to core activities embedded in their product.
Model monitoring is not a one-off task. As organizations adopt new models, they need to create a new monitoring paradigm that caters to the different types of data (structured, text, image, video, etc.), all of which require different measures and techniques to analyze the incoming data. In other words, what works for a classification model probably won't work for a regression or clustering one, and a new set of tools will need to be developed. And even for specific structured use cases, different feature types (numerical, categorical, time, etc.) require different KPIs to analyze the health of the process.
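To make the point about feature types concrete, here is a minimal sketch of how even a simple drift check has to branch by type. The metric choices (a two-sample Kolmogorov-Smirnov statistic for numerical features, total variation distance for categorical ones) and all function names are our own assumptions for illustration, not a description of any particular tool.

```python
from bisect import bisect_right
from collections import Counter
from itertools import chain


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (0 = identical, 1 = disjoint)."""
    ref, cur = sorted(reference), sorted(current)
    max_gap = 0.0
    for x in set(ref) | set(cur):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(cdf_ref - cdf_cur))
    return max_gap


def total_variation(reference, current):
    """Half the summed absolute difference in category frequencies."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    categories = set(ref_counts) | set(cur_counts)
    return 0.5 * sum(
        abs(ref_counts[c] / len(reference) - cur_counts[c] / len(current))
        for c in categories
    )


def feature_drift(reference, current):
    """Pick a metric by feature type; returns (metric_name, score)."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in chain(reference, current)
    )
    if numeric:
        return "ks", ks_statistic(reference, current)
    return "tv", total_variation(reference, current)
```

Text, image, and time-based features would each need yet another branch, which is exactly where the maintenance burden starts to grow.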
Regardless of the sophistication of the models, monitoring is an ongoing task that requires 25%-40% of a data science team’s time. The inefficiency and frustration that come with the heavy investment in homegrown monitoring solutions are among the first reasons that push organizations to turn to vendor solutions, along with the fact that they would much rather have their teams focus on creating models that have an impact on the business.
This is perhaps the most critical point. Organizations that have already engineered a solution that computes specific KPIs for their models still find themselves struggling to proactively understand when concept drift happens or when biases start to develop. More often than not, homegrown solutions look at the things that are already known and the issues that were already anticipated, and so they notice too late when events occur that are beyond this scope. This is often the point where organizations realize the limitations of their own solution, however sophisticated they engineered it to be, as it fails to bring value to the whole ML process.
In environments where data is extremely dynamic, assuring the health of models in production is about leveraging the expertise and best practices to be proactive: be alerted on issues that pertain to the health of the models, gain insights, and diagnose issues promptly.
As mentioned in a previous post, scaling AI poses the question of who owns it when it’s in production: data science teams? data engineering? business analysts? hybrid creatures? Ultimately, as AI use grows, the stakeholders involved also change, regardless of the number of models. Think about the fraud detection and cybersecurity space where analysts are the predominant users of the AI predictions and need to make sure the models are always tuned to a very dynamic data landscape.
For a monitoring solution to be useful, all the stakeholders involved need to derive insights and an understanding of the health of the predictions.
To do so, organizations need to create and maintain a view of the ML predictions that everyone involved can access and extract value from, without creating unnecessary noise. Beyond determining if there are sufficient resources, there is also a matter of skill set, as stakeholders often have different perspectives that need to be bridged under one enterprise-wide view. Ultimately, the complexity of these tasks is what drives AI practitioners scaling their activities to select a best-of-breed solution for assuring their models in production.
In industries such as Adtech, where models process terabytes of data each day, the velocity of the data makes it hard to obtain a clear picture. Do you have the time and tools necessary to continuously extract, compare, and analyze statistical metrics for your ML process, without impacting your core activities?
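As a rough illustration of what continuous extraction involves, here is a minimal sketch that tracks the mean and variance of a feature over a stream without storing the raw values, using Welford's online algorithm. The class name, the method names, and the idea of comparing against a training-time baseline are assumptions we make for the example.

```python
import math


class RunningStats:
    """Streaming mean and variance via Welford's online algorithm."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one new observation into the running statistics."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Population variance of everything seen so far."""
        return self._m2 / self.count if self.count else 0.0

    @property
    def std(self):
        return math.sqrt(self.variance)

    def shift_from(self, baseline_mean, baseline_std):
        """Mean shift versus a baseline, in units of the baseline std."""
        return (self.mean - baseline_mean) / baseline_std


# Hypothetical usage: compare live traffic with a training-time baseline.
stats = RunningStats()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(value)
shift = stats.shift_from(baseline_mean=4.0, baseline_std=2.0)
```

This covers one statistic for one feature. Doing the same across every feature, segment, and model, and deciding which shifts deserve an alert, is where the real engineering effort goes.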
Here’s a quick list of considerations you may want to think over as you weigh the best way to assure the health of your models in production. At the end of the day, it boils down to a question of resource management and efficiency: how much time should you invest in developing a set of tools to monitor your models in production today? And what will it cost you tomorrow as you add more and more models and use cases?
At Superwise, we specialize in accompanying our customers as they transition from homegrown solutions, or even nothing at all, to a rich AI assurance solution that helps them achieve business impact and grow their AI practice. This enables them to focus on what they do best: developing and deploying models that help their business grow.