From 5ee010a5d40ba472c229ad90690a354faa06b781 Mon Sep 17 00:00:00 2001 From: Matt Young Date: Mon, 2 Mar 2020 21:18:22 -0500 Subject: [PATCH 1/5] SIG Observability Charter MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This SIG focuses on topics pertaining to the observation of cloud native workloads. Additionally, it produces supporting material and best practices for end-users and provides guidance and coordination for CNCF projects working within the SIG’s scope. This document is the result of working sessions: 2/14/2020, 8:00-9:00 am (US Pacific), 11:00-12:00 (US East) 2/21/2020, 9:00-10:00 am (US Pacific), 12:00-1:00 (US East) 2/28/2020, 8:00-9:00 am (US Pacific), 11:00-12:00 (US East) --- observability-charter.md | 202 +++++++++++++++++++++++++++++++++++++++ 1 file changed, 202 insertions(+) create mode 100644 observability-charter.md diff --git a/observability-charter.md b/observability-charter.md new file mode 100644 index 0000000..b9b954f --- /dev/null +++ b/observability-charter.md @@ -0,0 +1,202 @@ +# CNCF SIG Observability Charter + +- [CNCF SIG Observability Charter](#cncf-sig-observability-charter) + - [Introduction](#introduction) + - [Mission](#mission) + - [Areas considered in Scope](#areas-considered-in-scope) + - [Areas considered out of Scope](#areas-considered-out-of-scope) + - [Roadmap & Initial Efforts](#roadmap--initial-efforts) + - [Governance](#governance) + - [Operations](#operations) + +*Initially authored by [Matthew Young][matthew young] with grateful review and +contributions from: +[Alex Nauda][Alex Nauda], +[Alois Reitbauer][Alois Reitbauer], +[Bartłomiej (Bartek) Płotka][Bartłomiej (Bartek) Płotka], +[Daniel Khan][Daniel Khan], +[Daniel Prata][Daniel Prata], +[Lincoln Sward][Lincoln Sward], +[Matthias Loibl][Matthias Loibl], +[Michael Hausenblas][Michael Hausenblas], +[Ricardo Aravena][Ricardo Aravena], +[Richard Hartmann][Richard Hartmann], +[Sergey Kanzhelev][Sergey Kanzhelev], +[Steve 
Flanders][Steve Flanders], +[Ted Young][Ted Young], +[Tigran Najaryan][Tigran Najaryan], +[Tommy Chong][Tommy Chong], +and [Umair Ishaq][Umair Ishaq].* + + +[Matthew Young]: https://github.com/halcyondude +[Alex Nauda]: @ +[Alois Reitbauer]: https://github.com/aloisreitbauer +[Bartłomiej (Bartek) Płotka]: https://github.com/bwplotka +[Daniel Khan]: @ +[Daniel Prata]: @ +[Lincoln Sward]: @ +[Matthias Loibl]: https://github.com/metalmatze +[Michael Hausenblas]: https://github.com/mhausenblas +[Ricardo Aravena]: https://github.com/raravena80 +[Richard Hartmann]: https://github.com/RichiH +[Sergey Kanzhelev]: @ +[Steve Flanders]: https://github.com/flands +[Ted Young]: @ +[Tigran Najaryan]: @ +[Tommy Chong]: https://github.com/techietommy +[Umair Ishaq]: https://github.com/umairishaq + +## Introduction + +This document describes the purpose and operations of the Cloud Native +Computing Foundation ([CNCF]) Special Interest Group ([SIG]) on Observability. + +This [SIG] focuses on topics pertaining to the observation +of [cloud native][cn-def] workloads. Additionally, it produces supporting +material and best practices for end-users and provides guidance and +coordination for CNCF projects working within the SIG’s scope. + +A full list of [CNCF projects][projs] can be found at [landscape.cncf.io]. + +[cncf]: https://www.cncf.io +[projs]: https://www.cncf.io/projects +[landscape.cncf.io]: https://landscape.cncf.io +[sig]: https://github.com/cncf/toc/blob/master/sigs/cncf-sigs.md +[cn-def]: https://github.com/cncf/toc/blob/master/DEFINITION.md + +## Mission + +Consistent with the CNCF [SIG] definition, the mission of SIG Observability +is to: + +- Foster and grow the ecosystem of observability related projects, users, and + maintainers. +- Identify and report gaps in the CNCF's project portfolio on topics of + observability to the TOC and the wider CNCF community. 
+- Collect, curate, champion, and disseminate patterns and current best practices + related to the observation of cloud-native systems that are effective and + actionable. +- Educate and inform users with unbiased, accurate, and pertinent information. +- Educate and help other CNCF projects regarding observability techniques and + practices available within the CNCF. +- Provide and maintain a vendor-neutral venue for relevant thought validation, + discussion, and project feedback. +- Provide a ladder for community members to become involved with the technical + oversight of projects within the SIG's scope in an open, transparent, and + inclusive way. + +## Areas considered in Scope + +Observability focuses on patterns, projects, tools, and techniques related to +topics such as: + +- Methodologies for instrumenting, collecting, processing, storing, querying, + curating, and correlating observational data such as metrics, logging/events, + trace spans, and profiling of cloud native workloads. +- Using distributed trace tooling to observe a series of calls between + microservices to understand where time is being spent. +- Managing the complexity, operational cost, and resource consumption of + observability tools and systems at enterprise scale. +- Best practices for meaningful alerting, queries, and operational dashboards + including how to manage things including rules, definitions, thresholds and + policies. +- How developers, operators, SRE, IT, and other actors comprehend, process, and + reason on distributed cloud-native systems. +- Projects that incorporate novel & insightful approaches to utilizing + observability data such as: + - ML, model training, Bayesian networks, and other data science techniques + that enable anomaly & intrusion detection.
+ - correlating resource consumption with costing data to reduce the total cost + of cloud native infrastructure + - Using observability data exposed by service meshes, orchestrators, and other + metric sources to inform continuous deployment tooling (e.g. Canary + Predicates/Judges). +- Objective curation and generation of case studies pertaining to delivering + observability tools/systems to end users. +- Best practices around observability and its continuous improvement, e.g. post + mortems, runbooks +- Provide guidance around and foster interoperability between observability + solutions without trying to enforce one specific standard +- Foster understanding of the prerequisites and corner-stones of observability + like SLI/KPI, service objectives, and internal/external commitments. + +The following is a non-exhaustive sample list of activities and deliverables +that are in-scope for this SIG + +- Summary and overview of projects available in the community. +- Catalog of reference architectures that draw from CNCF projects, combining + them in useful and novel ways. +- Definitions of implementations and patterns for best practices for + delivering observability tooling at enterprise scale. +- Tooling composition and tool chain creation based on existing projects. +- Best practices for operations and monitoring workflows using CNCF Projects. +- Organizing and helping to provide visibility to Meetups, Blogs, and Podcasts + related to the scope of the SIG. +- Guidance for application development and architecture that is observable. +- Replicable reference architectures. +- Patterns for observing application delivery pipelines. +- Education regarding instrumentation cloud native workloads. +- Processing and Accessing relevant observability data at scale. +- Policy and security controls for observabilty data. +- Creating artifacts as part of CI/CD pipelines that facilitate observation of + services.
Concrete examples might be: + - service profiles for Linkerd + - debug binaries or other diagnostic metadata. + - representative trace spans from failing CI tests. + +## Areas considered out of Scope + +Anything not explicitly considered in the scope above. + +Examples include: + +- Datastores that are not primarily used for observability. Those datastores + might be in the scope of SIG Storage. +- Security aspects that need to be present when setting up cloud native + infrastructure, these might be more relevant for SIG Security. +- How cloud native applications that need observability are deployed; this would + fall in the scope of SIG App Delivery +- Tools and projects that are used to run cloud native workloads that in some + cases need observability would fall under the scope of SIG-Runtime. + +## Roadmap & Initial Efforts + +- Contribute to [due diligence reports][ddr] to assist the CNCF TOC for projects + in the scope of the SIG. +- Facilitate webinars and presentations from CNCF projects and domain experts in + the scope of the SIG. +- Formation of [SIG working group(s)][sigwg] as resource capacity and member + contribution allows. + + > _SIGs may choose to spawn focussed and time-limited working groups to achieve some of their responsibilities (for example, to produce a specific educational white paper, or portfolio gap analysis report). Working groups should have a clearly documented charter, timeline (typically a few quarters at most), and set of deliverables. Once the timeline has elapsed, or the deliverables delivered, the working group dissolves, or is explicitly re-chartered._ + +[ddr]: https://github.com/cncf/toc/blob/master/process/due-diligence-guidelines.md +[sigwg]: https://github.com/cncf/toc/blob/master/sigs/cncf-sigs.md#responsibilities--empowerment-of-sigs + +## Governance + +- This SIG follows the [standard operating model][som] provided by the TOC + unless otherwise stated here. 
+ +[som]: https://github.com/cncf/toc/blob/master/sigs/cncf-sigs.md#operating-model + +## Operations + +- Formation of the SIG follows the [documented process][sigform]. +- [Roles][sigroles] for SIG Observability + - TOC Liaison: *Jeff Brewer*\* + - SIG Chairs: Matt Young, *Ricardo Aravena*\* + - Tech Leads: Michael Hausenblas, Bartłomiej Płotka, *Richard Hartmann*\* + +\*_**(TODO: need confirmation)**_ + +[sigform]: https://github.com/cncf/toc/tree/master/sigs#sig-formation-process +[sigroles]: https://github.com/cncf/toc/blob/master/sigs/cncf-sigs.md#sig-member-roles + +- Contact + - Slack channel: #sig-observability @ [https://cloud-native.slack.com](https://cloud-native.slack.com) + - Email List: [cncf-sig-observability@lists.cncf.io](mailto:cncf-sig-observability@lists.cncf.io) +- Meeting Schedule: + - TBD - pending feedback from SIG members + - [https://www.cncf.io/community/calendar](https://www.cncf.io/community/calendar/) From 73267373085c4fce82f2de4dca4f421c31ffd67c Mon Sep 17 00:00:00 2001 From: Matt Young Date: Mon, 2 Mar 2020 23:34:33 -0500 Subject: [PATCH 2/5] Update Proposed Roles with latest updates --- observability-charter.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/observability-charter.md b/observability-charter.md index b9b954f..83a3f82 100644 --- a/observability-charter.md +++ b/observability-charter.md @@ -184,12 +184,17 @@ Examples include: ## Operations - Formation of the SIG follows the [documented process][sigform]. 
-- [Roles][sigroles] for SIG Observability + +_Note: all of the roles below are initial proposals and must accepted by the TOC per formation process_ + +- Proposed [Roles][sigroles] for SIG Observability - TOC Liaison: *Jeff Brewer*\* - - SIG Chairs: Matt Young, *Ricardo Aravena*\* - - Tech Leads: Michael Hausenblas, Bartłomiej Płotka, *Richard Hartmann*\* + - SIG Chairs: pending TOC guidance + - _Nominated by charter working group or self:_ Matt Young, Richard Hartmann, *Ricardo Aravena*\* + - Tech Leads: pending TOC guidance + - _Nominated by charter working group or self:_ Bartłomiej Płotka, *Richard Hartmann*\*, Michael Hausenblas, Alois Reitbauer -\*_**(TODO: need confirmation)**_ +\*_**(need confirmation)**_ [sigform]: https://github.com/cncf/toc/tree/master/sigs#sig-formation-process [sigroles]: https://github.com/cncf/toc/blob/master/sigs/cncf-sigs.md#sig-member-roles From 3e1ff41f5a4a82bd57f037d8456b446603e8cde3 Mon Sep 17 00:00:00 2001 From: Matt Young Date: Tue, 3 Mar 2020 11:09:33 -0500 Subject: [PATCH 3/5] Add Steve Flanders in list of folks interested in Chair role --- observability-charter.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/observability-charter.md b/observability-charter.md index 83a3f82..659697f 100644 --- a/observability-charter.md +++ b/observability-charter.md @@ -190,7 +190,7 @@ _Note: all of the roles below are initial proposals and must accepted by the TOC - Proposed [Roles][sigroles] for SIG Observability - TOC Liaison: *Jeff Brewer*\* - SIG Chairs: pending TOC guidance - - _Nominated by charter working group or self:_ Matt Young, Richard Hartmann, *Ricardo Aravena*\* + - _Nominated by charter working group or self:_ Matt Young, Richard Hartmann, Steve Flanders, *Ricardo Aravena*\* - Tech Leads: pending TOC guidance - _Nominated by charter working group or self:_ Bartłomiej Płotka, *Richard Hartmann*\*, Michael Hausenblas, Alois Reitbauer From 653b33b076e202c91806670a30a61e971f5eaaba Mon Sep 17 00:00:00 2001 
From: Matt Young Date: Tue, 3 Mar 2020 11:24:42 -0500 Subject: [PATCH 4/5] Nits and minor formatting updates. Thanks @bwplotka! --- observability-charter.md | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-) diff --git a/observability-charter.md b/observability-charter.md index 659697f..f2dd639 100644 --- a/observability-charter.md +++ b/observability-charter.md @@ -101,13 +101,14 @@ topics such as: - Best practices for meaningful alerting, queries, and operational dashboards including how to manage things including rules, definitions, thresholds and policies. -- How developers, operators, SRE, IT, and other actors comprehend, process, and - reason on distributed cloud-native systems. +- How developers, operators, Site Reliability Engineers (SRE), IT Engineers, and + other actors comprehend, process, and reason on distributed cloud-native + systems. - Projects that incorporate novel & insightful approaches to utilizing observability data such as: - ML, model training, Bayesian networks, and other data science techniques that enable anomaly & intrusion detection. - - correlating resource consumption with costing data to reduce the total cost + - Correlating resource consumption with costing data to reduce the total cost of cloud native infrastructure - Using observability data exposed by service meshes, orchestrators, and other metric sources to inform continuous deployment tooling (e.g. Canary @@ -117,19 +118,19 @@ topics such as: - Best practices around observability and its continuous improvement, e.g. post mortems, runbooks - Provide guidance around and foster interoperability between observability - solutions without trying to enforce one specific standard + solutions without trying to enforce one specific standard. - Foster understanding of the prerequisites and corner-stones of observability like SLI/KPI, service objectives, and internal/external commitments. 
The following is a non-exhaustive sample list of activities and deliverables -that are in-scope for this SIG +that are **in-scope** for this SIG - Summary and overview of projects available in the community. - Catalog of reference architectures that draw from CNCF projects, combining them in useful and novel ways. - Definitions of implementations and patterns for best practices for delivering observability tooling at enterprise scale. -- Tooling composition and tool chain creation based on existing projects. +- Tooling composition and toolchain creation based on existing projects. - Best practices for operations and monitoring workflows using CNCF Projects. - Organizing and helping to provide visibility to Meetups, Blogs, and Podcasts related to the scope of the SIG. @@ -138,11 +139,11 @@ that are in-scope for this SIG - Patterns for observing application delivery pipelines. - Education regarding instrumentation cloud native workloads. - Processing and Accessing relevant observability data at scale. -- Policy and security controls for observabilty data. +- Policy and security controls for observability data. - Creating artifacts as part of CI/CD pipelines that facilitate observation of - services. Concrete examples might be: - - service profiles for Linkerd - - debug binaries or other diagnostic metadata. + services. Concrete examples might be: + - Service profiles for Linkerd. + - Debug binaries or other diagnostic metadata. - representative trace spans from failing CI tests. ## Areas considered out of Scope @@ -156,7 +157,7 @@ Examples include: - Security aspects that need to be present when setting up cloud native infrastructure, these might be more relevant for SIG Security. 
- How cloud native applications that need observability are deployed; this would - fall in the scope of SIG App Delivery + fall in the scope of SIG App Delivery. - Tools and projects that are used to run cloud native workloads that in some cases need observability would fall under the scope of SIG-Runtime. @@ -183,16 +184,16 @@ Examples include: ## Operations -- Formation of the SIG follows the [documented process][sigform]. +- The formation of the SIG follows the [documented process][sigform]. -_Note: all of the roles below are initial proposals and must accepted by the TOC per formation process_ +_Note: all of the roles below are initial proposals and must be accepted by the TOC per formation process_ - Proposed [Roles][sigroles] for SIG Observability - TOC Liaison: *Jeff Brewer*\* - SIG Chairs: pending TOC guidance - - _Nominated by charter working group or self:_ Matt Young, Richard Hartmann, Steve Flanders, *Ricardo Aravena*\* + - _Nominated by the charter working group or self:_ Matt Young, Richard Hartmann, Steve Flanders, *Ricardo Aravena*\* - Tech Leads: pending TOC guidance - - _Nominated by charter working group or self:_ Bartłomiej Płotka, *Richard Hartmann*\*, Michael Hausenblas, Alois Reitbauer + - _Nominated by the charter working group or self:_ Bartłomiej Płotka, *Richard Hartmann*\*, Michael Hausenblas, Alois Reitbauer \*_**(need confirmation)**_ From 94d29b2aa3eb857457a64a533c19fe13789398b8 Mon Sep 17 00:00:00 2001 From: Matt Young Date: Wed, 1 Apr 2020 00:42:23 -0400 Subject: [PATCH 5/5] Add TOC Liaison: Brendan Burns --- observability-charter.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/observability-charter.md b/observability-charter.md index f2dd639..c469628 100644 --- a/observability-charter.md +++ b/observability-charter.md @@ -189,14 +189,12 @@ Examples include: _Note: all of the roles below are initial proposals and must be accepted by the TOC per formation process_ - Proposed [Roles][sigroles] for SIG
Observability - - TOC Liaison: *Jeff Brewer*\* + - TOC Liaison: [Brendan Burns](https://github.com/brendandburns) (bburns@microsoft.com) - SIG Chairs: pending TOC guidance - _Nominated by the charter working group or self:_ Matt Young, Richard Hartmann, Steve Flanders, *Ricardo Aravena*\* - Tech Leads: pending TOC guidance - _Nominated by the charter working group or self:_ Bartłomiej Płotka, *Richard Hartmann*\*, Michael Hausenblas, Alois Reitbauer -\*_**(need confirmation)**_ - [sigform]: https://github.com/cncf/toc/tree/master/sigs#sig-formation-process [sigroles]: https://github.com/cncf/toc/blob/master/sigs/cncf-sigs.md#sig-member-roles