The CNCF Technical Oversight Committee (TOC) has voted to accept Kubeflow as a CNCF incubating project.
Kubeflow is an open source, community-driven project for deploying and managing a machine learning (ML) stack on Kubernetes. The Kubeflow community actively develops and supports Kubernetes-native MLOps for users who develop and deploy distributed ML workloads in popular frameworks, including TensorFlow, PyTorch, XGBoost, Apache MXNet, and more.
Kubeflow was created in 2017 by Google. Today, there are ten commercial distributions based on Kubeflow. These distribution teams help feed Kubeflow’s base of hundreds of contributors who support the project and its thousands of users. The Kubeflow Community, which includes over 150 companies, has actively worked to support its users by delivering fifteen major releases since its initial release over five years ago. These organizations leverage Kubeflow’s Kubernetes-native scalability, security, resource allocation, and declarative operations to deliver models faster and more cost-effectively.
“Kubernetes environments provide repeatability, scalability, and fast delivery, making them the perfect place to run AI and ML initiatives,” said Ricardo Rocha, TOC sponsor. “Kubeflow helps fill a gap by delivering machine learning pipelines and MLOps while working closely with its extensive community and other tools and initiatives to create a more cohesive ecosystem. We’re excited to watch the Kubeflow project grow within CNCF and to see the advancements that come in the MLOps space.”
The project is already well integrated with the CNCF and ML communities. To enhance its Kubernetes foundation, Kubeflow simplifies its installation, scalability, service mesh, security, and workflow management by integrating and packaging the benefits of Kustomize, Knative, Istio, Certificate Manager, and Argo. It also integrates with gRPC, Prometheus, and other communities, and work is underway to integrate Kubeflow with KubeRay and MLflow.
Main Components:
Kubeflow integrates software from five semi-independent working groups, simplifying the end-to-end process of developing and deploying machine learning models using Kubernetes-native efficiencies. Working groups include:
- The Notebooks Working Group builds an interactive development environment in Jupyter, VS Code, and RStudio notebooks, which speeds up model development and experimentation. This Working Group also develops Kubeflow’s central dashboard and web applications, which provide users with easier visualization of data.
- The Training Operator Working Group develops Training Operator software to enable distributed ML training on Kubernetes. It leverages various distributed strategies to train large-scale deep neural network (DNN) models across multiple GPUs. Training Operator allows you to use various scheduling techniques (e.g. Volcano) and elastic training to save compute resources for ML training. It supports all major ML frameworks and provides simple SDKs for data scientists to train their models on Kubernetes.
- The AutoML Working Group develops automated model development software called Katib, which includes hyperparameter tuning and other model optimization features like neural architecture search (NAS). Katib offers many optimization algorithms to evaluate the best parameters for ML models and saves compute resources by using various early stopping techniques. It also allows users to test many variations of a model’s configuration parameters and evaluate the results via the experiment tracking UI or SDK to find the best-performing option.
- The Kubeflow Pipelines Working Group develops software that converts Python ML scripts into stable workflow templates. Workflow templates are reusable, and Kubeflow Pipelines enables easy experimentation and management of your workloads. During execution, Kubeflow Pipelines simplifies distributed workflow automation through advanced workflow management, monitoring, and efficient Kubernetes operations.
- The Manifests Working Group develops the installation process for Kubeflow, both for individual components and for the complete set of Kubeflow components. As Kubeflow runs on a Kubernetes foundation, it uses Kustomize for its installation process.
- The KServe Project develops a highly scalable and standards-based model inference platform on Kubernetes. Although KServe is an independent project, it is incorporated into Kubeflow’s installation and testing processes. KServe plays a vital role in streamlined end-to-end MLOps workflows, significantly simplifying the serving of machine learning models in production.
The project can be deployed as independent components or a complete end-to-end system.
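To give a rough sense of the kind of work the AutoML component automates, the following is a minimal, standard-library-only sketch of random-search hyperparameter tuning with a simple early stopping rule. It is not Katib's API and does not run on Kubernetes; the names `objective`, `run_trial`, and `random_search`, the toy loss curve, and the stopping margin are all hypothetical and chosen purely for illustration.

```python
import random


def objective(learning_rate, step):
    """Hypothetical loss curve (an assumption for this sketch, not a real model).

    The loss is lowest near learning_rate = 0.1 and shrinks as training proceeds.
    """
    return (learning_rate - 0.1) ** 2 + 1.0 / (step + 1)


def run_trial(learning_rate, max_steps, best_so_far, margin=0.5, warmup=3):
    """Run one trial, abandoning it early if it lags far behind the best loss seen."""
    loss = float("inf")
    for step in range(max_steps):
        loss = objective(learning_rate, step)
        # Early stopping: after a short warm-up, stop trials that are clearly worse.
        if step >= warmup and loss > best_so_far + margin:
            return loss, step + 1
    return loss, max_steps


def random_search(num_trials=20, max_steps=50, seed=0):
    """Sample learning rates at random and keep the one with the lowest final loss."""
    rng = random.Random(seed)
    best_lr, best_loss, steps_used = None, float("inf"), 0
    for _ in range(num_trials):
        lr = rng.uniform(0.001, 1.0)
        loss, steps = run_trial(lr, max_steps, best_loss)
        steps_used += steps
        if loss < best_loss:
            best_lr, best_loss = lr, loss
    return best_lr, best_loss, steps_used


if __name__ == "__main__":
    lr, loss, steps = random_search()
    print(f"best learning rate: {lr:.4f}, loss: {loss:.4f}, steps used: {steps}")
```

In this sketch, `steps_used` stands in for compute cost: trials that are stopped early consume fewer steps than the full budget of `num_trials * max_steps`. A real deployment would express the search space, algorithm, and stopping rule declaratively and run each trial as a Kubernetes workload.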
Notable Milestones:
- 28K+ GitHub Stars
- 150+ companies contributing
- 15K+ total committers
- 55K+ total GitHub contributions
- 9,000+ Slack Members
- 15 releases since 2017
Looking forward, the Kubeflow project will focus on implementing its v1.8 roadmap, with the release scheduled for October 2023. New features include Kubeflow Pipelines 2.0 GA, new AutoML experiment features that improve scalability, and Training Operator enhancements for advanced model parallelism techniques and support for custom gang schedulers. The 1.8 release will be tested with defined dependency versions of Kubernetes, Kustomize, Istio, Certificate Manager, Argo, and Knative.
As a CNCF-hosted project, Kubeflow is part of a neutral foundation aligned with its technical interests, as well as the larger Linux Foundation, which provides governance, marketing support, and community outreach. The project joins 38 other incubating technologies, including Backstage, Cilium, Istio, Knative, OpenTelemetry, and more. For more information on maturity requirements for each level, please view the CNCF Graduation Criteria.
Supporting Quotes
“At Microsoft, we are committed to open source and cloud native technologies, especially when they can enable a new generation of AI-powered intelligent apps. We have seen the power of Kubeflow in enabling data scientists and engineers to quickly and easily develop, deploy, and manage machine learning workloads on Kubernetes, particularly with the Kubeflow on AKS project. Now that Kubeflow is part of CNCF, we look forward to continuing our collaboration with the community to further drive innovation with Kubeflow.”
– Lachie Evenson, Principal PDM Manager, Microsoft
“I am thrilled that Kubeflow has been accepted into the Cloud Native Computing Foundation (CNCF) as an incubating project. When we began Kubeflow, we envisioned creating the easiest, most reliable, and most scalable platform for machine learning on Kubernetes. With the help of the community, we have made great strides. However, we can only truly succeed if we have a platform that is governed openly and transparently. With the support of the CNCF, we are now confident that this will be the case. I am looking forward to all the ways that this next step in the platform’s growth and maturity will take it.”
– David Aronchick, CEO, Expanso
“When we started Kubeflow in 2018, one of the core principles was being an open platform for ML. Joining the CNCF as an incubating project is a huge step towards that goal. I’m deeply grateful to the many contributors who have worked tirelessly to push the project to this milestone.”
– Jeremy Lewi, Software Engineer
“Kubeflow is a project that enables professionals to run AI at scale, using open source tooling. It is designed to address pressing industry challenges like workflow automation, portability and code reproducibility. It allows its users to develop and deploy machine learning models, simplifying operations. Canonical is excited to see Kubeflow move to CNCF. We look forward to growing product adoption through our Charmed Kubeflow distribution and making MLOps available to more people around the world, working together with the community.”
– Andreea Munteanu, MLOps Product Manager, Canonical
“Kubeflow joining the CNCF today was enabled by the dedication of many contributors over the last five years and is the beginning of an exciting new chapter for the project! I am eager to watch how the community grows and matures under a vendor-neutral governance model.”
– Mathew Wicks, Kubeflow maintainer
“The Kubeflow project was created to simplify deploying and managing the AI/ML lifecycle on Kubernetes. With the flexibility to choose which components of the Kubeflow ecosystem to leverage and the ability to build your own distributions, Kubeflow enables a wide range of machine learning use cases. As a new CNCF incubating project, I am excited about the opportunity to grow the Kubeflow community and bring even more users, contributors, and leaders to the project.”
– Andrey Velichkevich, Co-Chair of the Kubeflow AutoML and Training Working Groups
“Kubeflow has evolved tremendously over the years to become the de facto MLOps platform on Kubernetes. With the rapid advances in Generative AI (ChatGPT), there is a huge demand to train, tune, and deploy custom large language models (LLMs), which require enormous computing and data power. Kubeflow provides a high-performance, optimized platform to handle the entire ML lifecycle management of these foundation models. It can scale seamlessly across large clusters of GPU machines using state-of-the-art data and model parallelism mechanisms. I am excited about the Kubeflow incubation journey and looking forward to its growth in the CNCF ecosystem.”
– Johnu George, Co-Chair of the Kubeflow Training and AutoML Working Groups and Staff Engineer, Nutanix
“In 2018, we believed Kubeflow would be the de facto choice for running distributed machine learning workloads on Kubernetes and migrated the vast majority of our distributed model training tasks to Kubeflow. Looking back, we made the right decision to adopt and maintain Kubeflow. Congratulations to the entire Kubeflow community on this momentous achievement! Our heartfelt appreciation goes out to the countless contributors and maintainers who dedicated endless hours over the years to push the project to this milestone. We look forward to seeing the project continue thriving within the CNCF ecosystem.”
– Yuan Tang, Project Lead of Kubeflow and Argo, Founding Engineer at Akuity
“For us at MavenCode, Kubeflow has helped to foster a culture of collaboration, enabling seamless interaction among data scientists, engineers, and other stakeholders. It provides the framework for elastic compute resource scalability needed by our team to handle large-scale ML workflows and datasets, thereby making it easy to quickly unlock new possibilities in our AI journey. It’s great to see how Kubeflow has grown over the years, and we are even more excited about the possibilities ahead now that Kubeflow is part of the CNCF community.”
– Charles Adetiloye, Lead AI & MLOps Platform Engineer, MavenCode
“Kubeflow has been a central part of open source MLOps in the enterprise space for many years. We’re excited to see Kubeflow grow further with the CNCF as it matures in the MLOps ecosystem.”
– Josh Patterson, CEO, Patterson Consulting
“Congratulations to the Kubeflow community for moving into the CNCF. IBM has been a long-term contributor since the early days of the project, and has adopted Kubeflow into our Watson Pipelines product. We have helped to grow the Kubeflow community by supporting its adoption on the Red Hat OpenShift platform and on IBM Power.”
– Brad Topol, Distinguished Engineer and Director of Open Technologies, IBM
“In collaboration with other Kubeflow community members, VMware is delighted to be part of the project’s journey in joining the CNCF. This shared effort reflects our commitment to open collaboration and innovation in the machine learning field. By continuing our involvement in the community, we are excited to be part of the project’s next milestone and the positive impact it will have in shaping the future of machine learning.”
– Anna Jung, Sr. ML Open Source Engineer, VMware
“The goal of Kubeflow is to become the Kubernetes of machine learning and democratize ML Platforms by making them available to everyone, just as Kubernetes did with container platforms. Not only research institutions but also regulated industries like finance and medical and privacy-oriented organizations. I am working to make the project secure by default to ensure user data stays private.”
– Julius von Kohout, Kubeflow Developer
“The feedback from the active Kubeflow Community and users about the project joining CNCF has been overwhelmingly positive. This will give the project a neutral space to live and grow and will allow for more contributors, both in code and leadership. It will also give the project more operational organization and guidance and allow for other Kubeflow-based distributions and related technologies to grow among the Kubernetes ecosystem.”
– Amber Graner, Kubeflow Community Member and Open Source Evangelist and Community Growth and Leadership Specialist
“As AI plays an increasingly important role in Bloomberg’s products, it is critical that we provide our engineers with internal infrastructure that enables distributed training and production-serving, especially over GPUs. Kubeflow’s training operator and integration with KServe play an important role in our cloud native machine learning infrastructure, which manages the lifecycle of machine learning models for our data scientists and AI engineers as they train and deploy models at scale. We are proud to have been members of the Kubeflow community since its inception and continue to make contributions that further develop the Kubeflow ecosystem so it provides a comprehensive solution that will optimize and make the end-to-end ML workflow easier for everyone.”
– Yuzhui Liu & Dan Sun, Team Leads with Bloomberg’s Cloud Native Compute Runtimes Engineering Group
“Aurora adopted Kubeflow because we saw how feature-rich and flexible the platform was for building training pipelines. We are delighted to be part of the Kubeflow community and are excited to celebrate Kubeflow’s acceptance into the Cloud Native Computing Foundation (CNCF) as an incubation project. This is a major milestone for Kubeflow, and we look forward to Kubeflow’s further maturation as the de facto platform for MLOps in the cloud.”
– Vinay Anantharaman, MLOps Lead
“DKube was one of the early adopters of Kubeflow to provide an enterprise-grade MLOps product for its customers globally for on-premise, air-gapped or hybrid installations. Over the past few years, DKube-based Kubeflow implementations have been commercially deployed in biopharma, aerospace, defense, government, retail analytics and many other industries. The DKube team is excited to see Kubeflow becoming a CNCF project, which will further increase adoption.”
– Ajay Tyagi, Senior Director, DKube.IO
“Kubeflow has helped customers to accelerate their Kubernetes-native AI/ML platform development by allowing them to maintain the portability of their workloads, build hybrid workflows to take advantage of cloud-specific innovations, and maximize the flexibility of their deployments with customized configurations that meet their needs. Kubeflow on AWS enables customers who choose to standardize on Kubernetes across their organization to build such an ML platform on top of Amazon EKS and solve a wide range of machine learning use cases. Kubeflow has made great progress in the last few years and I am thrilled to see Kubeflow join the CNCF ecosystem and grow further.”
– Suraj Kota, Sr. Software Development Engineer, AWS