To better understand the landscape of available tools for machine learning production, I decided to look up every AI/ML tool I could find. The resources I used include:
After filtering out application companies (e.g. companies that use ML to provide business analytics), tools that aren’t being actively developed, and tools that nobody uses, I got 202 tools. See the list.
Disclaimer
This post consists of 6 parts:
I. Overview
II. The landscape over time
III. The landscape is under-developed
IV. Problems facing MLOps
V. Open source and open-core
VI. Conclusion
One generalization of the ML production flow that I agree with consists of 4 steps:
I categorize the tools by the step of the workflow that each one supports. I don’t include Project setup since it requires project management tools, not ML tools. Categorizing isn’t always straightforward since one tool might help with more than one step. The tools’ ambiguous descriptions don’t make it any easier: “we push the limits of data science”, “transforming AI projects into real-world business outcomes”, “allows data to move freely, like the air you breathe”, and my personal favorite: “we lived and breathed data science”.
I put the tools that cover more than one step of the pipeline into the category that they are best known for. If they’re known for multiple categories, I put them in the All-in-one category. I also add an Infrastructure category for companies that provide infrastructure for training and storage. Most of these are cloud providers.
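The assignment rule above can be written down as a tiny function. This is only a sketch of the rule as I described it, not the process I actually ran (the categorization was done by hand). The function name `assign_category` and the step labels in the example are hypothetical placeholders.

```python
def assign_category(known_for):
    """Pick one category for a tool.

    `known_for` is the list of categories the tool is best known for
    (a hand-made judgment call, supplied by the person doing the survey).
    """
    unique = set(known_for)
    if not unique:
        raise ValueError("a tool must be known for at least one category")
    if len(unique) == 1:
        # Known for a single category: use it, even if the tool
        # technically touches other steps of the pipeline.
        return next(iter(unique))
    # Known for several categories: group it under All-in-one.
    return "All-in-one"


# Placeholder labels, not the real step names.
single = assign_category(["step A"])
several = assign_category(["step A", "step B"])
```

Here `single` comes out as `"step A"` and `several` as `"All-in-one"`. The hard part, deciding what a tool is "best known for", stays a manual call.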
I tracked the year each tool was launched. If it’s an open-source project, I looked at the first commit to see when the project first became public. If it’s a company, I looked at the year it started, according to Crunchbase. Then I plotted the number of tools in each category over time.
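For anyone who wants to reproduce this kind of count, here is a minimal sketch. It assumes the plot shows cumulative counts, that is, the number of tools in a category launched in or before each year. The tool names and years below are made-up placeholders, not entries from my list, and `cumulative_counts` is a hypothetical helper.

```python
from collections import defaultdict


def cumulative_counts(tools):
    """Return {category: {year: tools launched in or before that year}}.

    `tools` is an iterable of (name, category, launch_year) tuples.
    """
    tools = list(tools)
    first = min(year for _, _, year in tools)
    last = max(year for _, _, year in tools)

    # Count launches per category per year.
    launched = defaultdict(lambda: defaultdict(int))
    for _, category, year in tools:
        launched[category][year] += 1

    # Turn per-year launches into a running total over the full year range.
    result = {}
    for category, per_year in launched.items():
        running = 0
        result[category] = {}
        for year in range(first, last + 1):
            running += per_year.get(year, 0)
            result[category][year] = running
    return result


# Placeholder data for illustration only.
sample_tools = [
    ("tool_a", "Infrastructure", 2012),
    ("tool_b", "All-in-one", 2014),
    ("tool_c", "Infrastructure", 2014),
]
counts = cumulative_counts(sample_tools)
```

Each inner dictionary maps year to count, so it can be passed to any plotting library as one line per category.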