Propose Kubeflow to CNCF Incubation Stage #950

Closed
wants to merge 5 commits into from
223 changes: 223 additions & 0 deletions proposals/incubation/kubeflow.md
@@ -0,0 +1,223 @@
# Kubeflow CNCF Incubator Project Proposal

## Name of Project:

Kubeflow

## Description

The Kubeflow project is an extensible Machine Learning (ML) platform made up of multiple distinct components that address specific stages of the ML lifecycle, including:

* **Mode Development** - includes services to create and manage interactive Jupyter notebooks. Users can customize their notebook deployment and their compute resources to suit their data science needs. Supports local workflows with cloud deployment when ready.

Review comment (Contributor): I believe you intended to title this section "Model Development" - with an "L" at the end.

* **Model Training**
* **Training Operator** - custom TensorFlow training job operator that can be used to train ML models. Supports distributed jobs. Configures the training controller to use CPUs or GPUs and to suit various cluster sizes.
* **Pipelines** - comprehensive solution for deploying and managing end-to-end ML workflows. Supports rapid and reliable experimentation. Allows users to schedule and compare runs, and examine detailed reports on each run.
* **Multi-framework** - supports models beyond TensorFlow, including PyTorch, Apache MXNet, MPI, XGBoost, Chainer, and more. Integrates with Istio and Ambassador for ingress, Nuclio as a fast multi-purpose serverless framework, and Pachyderm for managing your data science pipelines.

## Statement on alignment with CNCF mission

Since its inception in 2017, when Kubeflow was first [open sourced by Google](https://kubernetes.io/blog/2017/12/introducing-kubeflow-composable/), the project has striven to make ML on Kubernetes simple, portable and scalable. In March 2020, with the [v1.0](https://blog.kubeflow.org/releases/2020/03/02/kubeflow-1-0-cloud-native-ml-for-everyone.html) release, Kubeflow reached maturity across a core set of its stable applications, and graduated Kubeflow Serving as an independent project ([KServe](https://github.com/kserve/kserve)), which is now incubating in the [Linux Foundation AI & Data](https://lfaidata.foundation/) foundation.

Today, the community has close to 200 contributors from over 30 organizations, and hosts regular contributor summits, community meetings, and meetups across the world. The broader Kubeflow ecosystem today also includes a number of [distributions](https://www.kubeflow.org/docs/distributions/) across multiple cloud service providers and on-prem environments. Kubeflow’s powerful development experience helps data scientists build, train and deploy their ML models while also enabling ML operation teams in the enterprise to deploy and scale advanced workflows on a variety of infrastructures.

To meet the evolving needs of its community, Kubeflow needs a vendor-neutral foundation. This application to become a CNCF incubating project is the next big milestone for the Kubeflow community, and we believe that CNCF is the right foundation for the project given Kubeflow's roots in the Kubernetes community and the cloud native ecosystem (see: Kubernetes, Argo, Istio, Knative, Kustomize). Being part of CNCF will bring Kubeflow closer to this project ecosystem, with the potential to increase the velocity of feedback and innovation over time. In turn, Kubeflow can bring years of community expertise in enabling the use of containers for running AI/ML workloads.


## Roadmap

* Kubeflow - [https://github.com/kubeflow/kubeflow/blob/master/ROADMAP.md](https://github.com/kubeflow/kubeflow/blob/master/ROADMAP.md)
* Pipelines - [https://github.com/kubeflow/pipelines/blob/master/ROADMAP.md](https://github.com/kubeflow/pipelines/blob/master/ROADMAP.md)
* Katib - [https://github.com/kubeflow/katib/blob/master/ROADMAP.md](https://github.com/kubeflow/katib/blob/master/ROADMAP.md)
* Notebooks -

## Sponsors from TOC

No sponsor, yet.

## Preferred maturity level

Incubating

## License

Kubeflow is licensed under the Apache 2.0 license.

## Source control

[https://github.com/kubeflow](https://github.com/kubeflow) (everything in the Kubeflow org)


## External Dependencies

Kubeflow has external dependencies licensed under the following CNCF-approved licenses:

* Apache 2.0
* BSD License
* ISC License
* MIT License

The dependencies of each sub-project and their license information are listed in:

* Kubeflow - [https://github.com/kubeflow/kubeflow/search?q=in%3Apath+third_party+license](https://github.com/kubeflow/kubeflow/search?q=in%3Apath+third_party+license)
* Pipelines - [https://github.com/kubeflow/pipelines/tree/master/third_party](https://github.com/kubeflow/pipelines/tree/master/third_party)
* Training-operator - [https://github.com/kubeflow/training-operator/tree/master/third_party_licenses](https://github.com/kubeflow/training-operator/tree/master/third_party_licenses)
* Katib - [https://github.com/kubeflow/katib/blob/master/go.mod](https://github.com/kubeflow/katib/blob/master/go.mod)

When possible, sub-projects in Kubeflow will be brought in line with CNCF guidelines before submission or, in extreme circumstances, the Kubeflow project will seek CNCF governing board exceptions.

## Maintainers

Kubeflow welcomes commits from anyone in the community. Due to its multi-repo structure with distinct contributor roles, the project uses "maintainer" as the noun for a member with sufficient permissions to merge PRs in a specific repo. That role can be defined at any level of each repo using the OWNERS file, which in turn grants that contributor permission to “/LGTM” and “/APPROVE” PRs. These comments on a PR subsequently trigger automation that actually merges that PR.
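
The merge flow described above can be sketched roughly as follows. This is a simplified illustration, not the project's actual automation: the `Owners` class, the `is_mergeable` function, and the user names are hypothetical, and the sketch assumes that an OWNERS file lists `approvers` and `reviewers`, that either group may "/lgtm", and that only approvers may "/approve".

```python
from dataclasses import dataclass, field


@dataclass
class Owners:
    """Hypothetical in-memory stand-in for one OWNERS file."""
    approvers: set[str] = field(default_factory=set)
    reviewers: set[str] = field(default_factory=set)


def is_mergeable(owners: Owners, comments: list[tuple[str, str]]) -> bool:
    """Return True once the PR has an /lgtm and an /approve from permitted users.

    `comments` is a list of (author, comment_body) pairs in posting order.
    Commands are matched case-insensitively, so "/LGTM" and "/lgtm" are equal.
    """
    has_lgtm = False
    has_approve = False
    for author, body in comments:
        command = body.strip().lower()
        # Assumption: anyone listed in OWNERS may /lgtm.
        if command == "/lgtm" and author in owners.reviewers | owners.approvers:
            has_lgtm = True
        # Assumption: only approvers may /approve.
        elif command == "/approve" and author in owners.approvers:
            has_approve = True
    return has_lgtm and has_approve


# Hypothetical users: "alice" is an approver, "bob" is a reviewer.
owners = Owners(approvers={"alice"}, reviewers={"bob"})
print(is_mergeable(owners, [("bob", "/lgtm"), ("alice", "/approve")]))
```

Because OWNERS files can live at any level of a repo, a fuller model would resolve the closest OWNERS file for each changed path; the sketch above deliberately ignores that.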

The following is a current snapshot of the maintainers as defined above (company affiliation is based on each user's public GitHub profile):

* Kubeflow
    * [pdmack](https://github.com/pdmack) - Pete MacKinnon (NVIDIA)
    * [james-jwu](https://github.com/james-jwu) - James Wu (Google)
    * [zijianjoy](https://github.com/zijianjoy) - James Liu (Google)
* Kubeflow Notebooks & Common Kubeflow Components
    * [elikatsis](https://github.com/elikatsis) - Ilias Katsakioris (Arrikto)
    * [kimwnasptd](https://github.com/kimwnasptd) - Kimonas Sotirchos (Arrikto)
    * [StefanoFioravanzo](https://github.com/StefanoFioravanzo) - Stefano Fioravanzo (Arrikto)
    * [thesuperzapper](https://github.com/thesuperzapper) - Mathew Wicks
    * [yanniszark](https://github.com/yanniszark) - Yannis Zarkadas (Arrikto)
* Kubeflow Manifests
    * [elikatsis](https://github.com/elikatsis) - Ilias Katsakioris (Arrikto)
    * [kimwnasptd](https://github.com/kimwnasptd) - Kimonas Sotirchos (Arrikto)
    * [PatrickXYS](https://github.com/PatrickXYS) - Yao Xiao (Google)
    * [StefanoFioravanzo](https://github.com/StefanoFioravanzo) - Stefano Fioravanzo (Arrikto)
    * [yanniszark](https://github.com/yanniszark) - Yannis Zarkadas (Arrikto)
* Pipelines
    * [Adrian555](https://github.com/adrian555) - Adrian Zhuang (IBM)
    * [Akartsky](https://github.com/akartsky) - Kartik Kalamadi (AWS)
    * [alinakuz](https://github.com/alinakuz) - Alina Kuzhiyil (Google)
    * [Andreyvelich](https://github.com/andreyvelich) - Andrey Velichkevich (Apple)
    * [Animeshsingh](https://github.com/animeshsingh) - Animesh Singh (IBM)
    * [Ark-kun](https://github.com/Ark-kun) - Alexey Volkov (Google)
    * [Aronchick](https://github.com/aronchick) - David Aronchick (Protocol Labs)
    * [Berndverst](https://github.com/berndverst) - Bernd Verst (Microsoft)
    * [Chauhang](https://github.com/chauhang) - Geeta Chauhan
    * [chensun](https://github.com/chensun) - Chen Sun (Google)
    * [connor-mccarthy](https://github.com/connor-mccarthy) - Connor McCarthy (Google)
    * [Dtzar](https://github.com/dtzar) - David Tesar (Microsoft)
    * [Dushyanthsc](https://github.com/dushyanthsc) -
    * [Eedorenko](https://github.com/eedorenko) - Eugene Fedorenko
    * [Gaocegege](https://github.com/gaocegege) - Ce Gao (TensorChord)
    * [Gaoning777](https://github.com/gaoning777) - Ning
    * [gkcalat](https://github.com/gkcalat) - Ablai Akhazhanov (Google)
    * [hilcj](https://github.com/hilcj) - Desmond Liu (Google)
    * [Hongye-sun](https://github.com/hongye-sun) -
    * [Hougangliu](https://github.com/hougangliu) - Hougang Liu (IBM)
    * [IronPan](https://github.com/IronPan) - Yang Pan (Google)
    * [Jagadeeshi2i](https://github.com/jagadeeshi2i) - Jagadeesh J (Ideas2IT Technologies)
    * [james-jwu](https://github.com/james-jwu) - James Wu (Google)
    * [Jeffwan](https://github.com/Jeffwan) - Jiaxin Shan (ByteDance)
    * [ji-yaqi](https://github.com/ji-yaqi) -
    * [jlyaoyuli](https://github.com/jlyaoyuli) - Joe Li (Google)
    * [JOCSTAA](https://github.com/JOCSTAA) - Josh Ogbonda (Google)
    * [joeliedtke](https://github.com/joeliedtke) - Joe Liedtke (Google)
    * [Johnugeorge](https://github.com/johnugeorge) - Johnu George (Nutanix)
    * [Linchin](https://github.com/Linchin) - Lingqing Gan (Google)
    * [Luotigerlsx](https://github.com/luotigerlsx) - Shixin
    * [Mameshini](https://github.com/mameshini) - Igor Mameshin
    * [Mbaijal](https://github.com/mbaijal) - Meghna Baijal (Amazon)
    * [Neuromage](https://github.com/neuromage) - Ajay Gopinathan (Google)
    * [NikeNano](https://github.com/NikeNano) - Niklas Hansson (Northvolt)
    * [Numerology](https://github.com/numerology) - Jiaxiao Zheng (ByteDance)
    * [Ojarjur](https://github.com/ojarjur) - Omar Jarjur
    * [PatrickXYS](https://github.com/PatrickXYS) - Yao Xiao (Google)
    * [RedbackThomson](https://github.com/RedbackThomson) - Nicholas Thomson (Amazon)
    * [Rileyjbauer](https://github.com/rileyjbauer) - Riley Bauer
    * [rui5i](https://github.com/rui5i) - Rui Fang (Google)
    * [sasha-gitg](https://github.com/sasha-gitg) - Sasha Sobran (Google)
    * [Saurabh24292](https://github.com/saurabh24292) -
    * [Shrinath-suresh](https://github.com/shrinath-suresh) -
    * [SinaChavoshi](https://github.com/SinaChavoshi) - Sina Chavoshi (Google)
    * [StefanoFioravanzo](https://github.com/StefanoFioravanzo) - Stefano Fioravanzo (Arrikto)
    * [Sudivate](https://github.com/sudivate) - Sushant Divate
    * [Surajkota](https://github.com/surajkota) - Suraj Kota (Amazon)
    * [Tomcli](https://github.com/Tomcli) - Tommy Li (IBM)
    * [Yhwang](https://github.com/yhwang) - Yihong Wang (IBM)
    * [Yilun-msft](https://github.com/yilun-msft) -
    * [zichuan-scott-xu](https://github.com/zichuan-scott-xu) -
    * [zijianjoy](https://github.com/zijianjoy) - James Liu (Google)
    * [zpChris](https://github.com/zpChris) -
* Training-operator
    * [Gaocegege](https://github.com/Gaocegege) - Ce Gao (TensorChord)
    * [Jeffwan](https://github.com/Jeffwan) - Jiaxin Shan (ByteDance)
    * [Johnugeorge](https://github.com/Johnugeorge) - Johnu George (Nutanix)
    * [Terrytangyuan](https://github.com/Terrytangyuan) - Yuan Tang (Akuity)
    * [zw0610](https://github.com/zw0610) - Wang Zhang (Tencent)
* Katib
    * [Andreyvelich](https://github.com/Andreyvelich) - Andrey Velichkevich (Apple)
    * [Gaocegege](https://github.com/Gaocegege) - Ce Gao (TensorChord)
    * [Hougangliu](https://github.com/Hougangliu) - Hougang Liu (IBM)
    * [Johnugeorge](https://github.com/Johnugeorge) - Johnu George (Nutanix)

Comment on lines +151 to +155
@tenzen-y, Dec 10, 2022:

@mchmarny, could you update this section as follows, since we recently updated the OWNERS file?

https://github.com/kubeflow/katib/blob/f941ec61e5fe8aa17919c51a8b347914ae70b876/OWNERS

    * [Andreyvelich](https://github.com/Andreyvelich) - Andrey Velichkevich (Apple)
    * [Gaocegege](https://github.com/Gaocegege) - Ce Gao (TensorChord)
    * [Johnugeorge](https://github.com/Johnugeorge) - Johnu George (Nutanix)
    * [tenzen-y](https://github.com/tenzen-y) - Yuki Iwai (CyberAgent)


## Infrastructure requests (CI/CNCF Cluster)

Kubeflow uses AWS for its CI needs. The community is already working on streamlining the infrastructure provisioning and operations ([https://github.com/kubeflow/testing/issues/1006](https://github.com/kubeflow/testing/issues/1006)). If Kubeflow does get accepted for incubation in CNCF, the community will work towards migrating the ownership of this infrastructure to the CNCF.

Each distribution offering Kubeflow as a service operates its own test/build/release infrastructure. This infrastructure will remain the responsibility of each provider and will not transition to CNCF ownership. The maintainers of each infrastructure can be reached in the platform-specific Slack channels:

* AWS - [https://kubeflow.slack.com/archives/CKBA5D0MU](https://kubeflow.slack.com/archives/CKBA5D0MU)
* Azure - [https://kubeflow.slack.com/archives/CUW6SLCPR](https://kubeflow.slack.com/archives/CUW6SLCPR)
* Platform - [https://kubeflow.slack.com/archives/CKH7V1M7F](https://kubeflow.slack.com/archives/CKH7V1M7F)

## Communication Channels

The Kubeflow community maintains a variety of Slack channels for general discussion, feature requests, and job postings, as well as sub-project, platform, and geo-specific channels. Popular Kubeflow Slack channels are listed here - [https://www.kubeflow.org/docs/about/community/#kubeflow-slack](https://www.kubeflow.org/docs/about/community/#kubeflow-slack)


## Issue tracker

* Kubeflow - [https://github.com/kubeflow/kubeflow/issues](https://github.com/kubeflow/kubeflow/issues)
* Pipelines - [https://github.com/kubeflow/pipelines/issues](https://github.com/kubeflow/pipelines/issues)
* Training-operator - [https://github.com/kubeflow/training-operator/issues](https://github.com/kubeflow/training-operator/issues)
* Katib - [https://github.com/kubeflow/katib/issues](https://github.com/kubeflow/katib/issues)

## Community

* Repo - [https://github.com/kubeflow/community](https://github.com/kubeflow/community)
* Website - [https://www.kubeflow.org/docs/about/community/](https://www.kubeflow.org/docs/about/community/)
* Workgroups & SIGs - [https://github.com/kubeflow/community/blob/master/wg-list.md](https://github.com/kubeflow/community/blob/master/wg-list.md)
* Kubeflow Blog - [https://blog.kubeflow.org](https://blog.kubeflow.org)

## Code of Conduct

[https://github.com/kubeflow/community/blob/master/CODE_OF_CONDUCT.md](https://github.com/kubeflow/community/blob/master/CODE_OF_CONDUCT.md)

## Social media accounts

* YouTube - [https://www.youtube.com/kubeflow](https://www.youtube.com/kubeflow)
* Twitter - [https://twitter.com/kubeflow](https://twitter.com/kubeflow)
* LinkedIn - [https://www.linkedin.com/company/kubeflow](https://www.linkedin.com/company/kubeflow)
* Google Groups - [https://groups.google.com/g/kubeflow-discuss](https://groups.google.com/g/kubeflow-discuss)
* Google Workspace (used to manage community calendar, Drive, etc.)

## Existing sponsorship

None

## Adopters

* Kubeflow - [https://github.com/kubeflow/kubeflow/network/dependents](https://github.com/kubeflow/kubeflow/network/dependents)
* Pipelines - [https://github.com/kubeflow/pipelines/network/dependents](https://github.com/kubeflow/pipelines/network/dependents)
* Training-operator - [https://github.com/kubeflow/training-operator/network/dependents](https://github.com/kubeflow/training-operator/network/dependents)
* Katib - [https://github.com/kubeflow/katib/network/dependents](https://github.com/kubeflow/katib/network/dependents)

## Releases

* The release handbook - [https://github.com/kubeflow/community/blob/master/releases/handbook.md](https://github.com/kubeflow/community/blob/master/releases/handbook.md)
* Kubeflow releases - [https://github.com/kubeflow/community/tree/master/releases](https://github.com/kubeflow/community/tree/master/releases)

## Community size

The Kubeflow project now has close to 200 contributors from over 30 organizations, and the Kubeflow community has hosted a number of summits and contributor meetups across the world. The broader Kubeflow ecosystem includes a number of distributions across multiple cloud service providers and on-prem environments. For more information, see [https://www.kubeflow.org/docs/about/community/](https://www.kubeflow.org/docs/about/community/).

The Kubeflow project repo-level metrics:

* [Kubeflow](https://github.com/kubeflow/kubeflow) - 11.9k stars, 2k forks
* [Pipelines](https://github.com/kubeflow/pipelines) - 3k stars, 1.3k forks
* [Training-operator](https://github.com/kubeflow/training-operator) - 1.2k stars, 500 forks
* [Katib](https://github.com/kubeflow/katib) - 1.2k stars, 362 forks