Elafros resources: the missing link #412

Closed
steren opened this issue Mar 17, 2018 · 13 comments

@steren
Contributor

steren commented Mar 17, 2018

Here is an illustration showing a very common use case: within one cluster, developers deployed, and later updated, two different "components of their architecture", for example a "function that implements an API endpoint" and a "web frontend".
As a result, from an API resource perspective, there are two Routes, each of them pointing to a different Revision that has been created from a Configuration:

[Diagram: Elafros API resources]

Notice the orange boxes:
In our example, they represent the "function" and the "web frontend". They are what Google Cloud Functions or AWS Lambda would today call a "Function", what App Engine would call a "Service", and what Cloud Foundry would call an "Application".

We are currently performing some user research to understand how users think about these and how they would name them.

Problem statement

Problem 1: A missing higher level concept

Conceptually, neither the Route nor the Configuration is the top-level entity that maps exactly to the concept used in my introduction (the "function" and the "web frontend"). In most cases, we want developers to think first about the orange boxes and only later, if needed, about the Route and Configuration(s).

Saying that an orange box is equivalent to a Route is not strictly correct, because Elafros enforces no parent-child or ownership relationship between a Route and its Configuration(s). A Configuration can live completely detached from a Route.

Problem 2: Name

Neither "Route" nor "Configuration" is a good name for this "orange box". These terms should not be the top-level "thing" to create, nor the main collection of things our end users operate against.

I am opening this issue to evaluate solutions to these problems.

@mikehelmick mikehelmick self-assigned this Mar 18, 2018
@mattmoor mattmoor added the area/API API objects and controllers label Mar 19, 2018
@mikehelmick
Contributor

mikehelmick commented Mar 23, 2018

Problem Summary

In summary, there is a desire to have a more structured grouping than provided in the Elafros API. This is appealing from the standpoint of product simplicity and potentially reduced cognitive load on our users, leading to a more elegant CLI/UI experience.

This is a proposed solution to Problem 1 as stated above.

Problem 2 will be addressed in a separate comment; for that reason, we will refer to this resource simply as “Steve” for the purposes of this proposal.

Proposed Solution

Steve will exist as a higher-level resource that users interact with in order to make changes to their compute services, rather than interacting with Route and Configuration directly. Whether to restrict mutate access to Route and Configuration is up to the cluster operator.

Steve is a composition of two pieces of information: a name and a rollout policy. Several kinds of rollout policy will be defined, but a Steve can only accept one at a time. This is inspired by the k8s core Volume and the associated kinds that appear in it.

Steve is a new CRD that will be hosted at github.com/elafros/steve along with a full OSS implementation of Steve. Use of Steve is optional in Elafros.
While we envision many possibilities for interaction with Steve, we will start with two basic scenarios.

Atomic Rollout Policy

There is a single name for Steve, which creates a corresponding Route, and a single Configuration object. When the Steve object is created, a Route and Configuration will be created, and the appropriate metadata.ownerReferences values will be set.

In this scenario, the user creates a Steve object with a simple rollout type (runLatest) and a single Configuration. In the example below, this results in 4 objects being created: The specified Steve object, a route named “my-function”, a configuration named "my-function", and the first revision. When the first revision is created, it will begin serving.

POST /apis/elafros.dev/v1alpha1/namespaces/default/steve

apiVersion: elafros.dev/v1alpha1
kind: Steve
metadata:
  name: my-function 
spec:
  # With runLatest set, the latest ready revision from the configuration
  # will always be set to 100% traffic allocation in the route. 
  runLatest:
    configuration:
      ...
    
status:
  ... 

Manual

If a user finds themselves in a broken state, they can switch their Steve to have a rollout policy of manual. Manual requires a specific revision, which will upon reconciliation be immediately set to 100% in the route, and no further automatic changes will happen until the Steve is switched to a different kind.

Upon reconciliation, the Route is adjusted to point 100% to the specified revisionName. The revisionName provided must belong to the configuration contained by Steve.

apiVersion: elafros.dev/v1alpha1
kind: Steve
metadata:
  name: my-function 
spec:
  manualRollout:
    revisionName: abc
    configuration:
      ...
status:
  ... 

Changing Between Steves

In general, changing between different specializations of Steve (different rollout policies) is a fine and normal thing to do.

Consider a user in the simple case, using runLatest, who has a bad push. To recover from the bad push, they switch the rollout policy to “manual” and pin their Steve to a specific revision. Once they have sufficiently patched their configuration, they simply change Steve back to runLatest and resume the release train.

Future Expansion

These are just two example rollout policies. There are other rollout policies that will be proposed as individual issues once the new repository is created.

@mattmoor
Member

I don't like the name Manual in this context. I would suggest Pinned or something along those lines.

To elaborate on that a bit, I think that Steve represents a higher-level "easy-mode" that groups a whole bunch of concepts together. I see Manual as the user shedding Steve and taking control of the lower-level resources themselves, but even here Steve presents value as a grouping construct (e.g. UX, cascaded deletion, ...).

Ordinarily, my expectation is that while Steve is managing the Route and Configuration, the Steve controller would quickly correct any change made by something reaching around Steve to manipulate those resources.

I'd propose an alternate concept of Manual (perhaps in addition to Pinned) that users can change to/from and that lets them take control of the underlying resources:

apiVersion: elafros.dev/v1alpha1
kind: Steve
metadata:
  name: my-function 
spec:
  # The underlying resources become the source of truth.
  manual: {}

If a user changes to Manual, then Steve simply stops managing them and edits are allowed.

If a user changes from Manual, then Steve simply makes the state of the world that of the new specification.

This is a bit weird for Kubernetes objects, but I find this variation surprisingly appealing. In particular, the capacity to switch between this model (Some Ops) and a more managed model (No Ops).

To be clear: I would expect UX to degrade in Manual mode, since it is a lower-level of abstraction.

@vaikas-google @mikehelmick @steren WDYT?

@steren
Contributor Author

steren commented Mar 27, 2018

I support allowing access to the underlying resources. This is probably better for tooling compatibility.

I am convinced that the only thing we need is better grouping.

We could consider Steve to be a new resource that has label selectors on Configurations and Routes. (like Kubernetes' Services do with Pods)
This would allow us to easily list Steves, while keeping Routes and Configurations as independent resources.

@asciimike

asciimike commented Mar 28, 2018

Chatted w/ @mikehelmick and I think we have a nice way to represent automatic and manual Steve that should meet these requirements.

Automatic Steve

This is a copy from @mikehelmick's design above, with the addition of label selectors, and labels being required on resources being created. If the Steve controller creates additional resources, those must have steve labels as well.

apiVersion: elafros.dev/v1alpha1
kind: Steve
metadata:
  name: my-function 
spec:
  selector:
    steve: my-function
  runLatest:
    configuration:
      metadata:
        labels:
          steve: my-function
      spec:
        ...

Manual Steve

Manual mode is just the label selector piece of automatic Steve. Note that resources will still have the steve label, such that they can be grouped in a Steve.

apiVersion: elafros.dev/v1alpha1
kind: Steve
metadata:
  name: my-function 
spec:
  selector:
    steve: my-function

---

apiVersion: elafros.dev/v1alpha1
kind: Configuration
metadata:
  name: my-function 
  labels:
    steve: my-function
spec:
  ...
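
To round out the manual-mode picture, a Route grouped into the same Steve would presumably carry the same label. The following is only a sketch: the `traffic` layout (a `revisionName` with a `percent`) is an assumption about the Route schema, and the revision name `abc` is borrowed from the earlier example rather than taken from a real cluster.

```yaml
# Hypothetical Route that the Steve selector above would pick up.
apiVersion: elafros.dev/v1alpha1
kind: Route
metadata:
  name: my-function
  labels:
    # Same label the selector matches on.
    steve: my-function
spec:
  traffic:
  # Assumed traffic shape: send all requests to a single revision.
  - revisionName: abc
    percent: 100
```

With both the Configuration and the Route labeled this way, tooling could list everything belonging to a Steve by selecting on `steve=my-function`, even though no controller is managing the resources.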

Switching between modes

Automatic to manual: Automatic Steve will create labels (and/or owner references) on all resources, thus Steve as a grouping concept will remain even after Steve as a simplification concept is no longer needed.

Manual to automatic: Developers will need to re-provide the appropriate Route(s)/Configuration(s), and any resources changed in manual mode will be re-set to the desired configuration.

In either case, the same UI and CLI tools should be able to view both flavors of Steve.

@tliberman

Last week I conducted research with 47 developers to assess names for Steve.

As context for the naming exercises, participants were given a walkthrough of a basic UI, CLI, and the diagram @steren shared above. They were then asked to provide their own name for Steve:

[Chart 1: names participants suggested for Steve]

After providing their own suggestions, participants completed a MaxDiff survey in order to demonstrate their preferences toward some names that were being considered. MaxDiff analysis was performed using this code from E. Bahna, & C. Chapman (2018). "Constructed, Augmented MaxDiff." In B. Orme, ed. (forthcoming), Proceedings of the 2018 Sawtooth Software Conference, Orlando, FL. March 2018.

Result of the MaxDiff Steve naming exercise:
[Screenshot: MaxDiff results, 2018-03-28]

Overall, “service” and “component” were the top two candidates both when names were solicited as well as in the MaxDiff preference exercise, with “service” performing slightly better.

@steren
Contributor Author

steren commented Mar 28, 2018

Thanks for driving this study @tliberman.

This is a strong case for Service.
But, as we know, Kubernetes already defines the concept of Service.

After consulting with @kelseyhightower, we suggest using Service under a namespace:

  • Kubernetes resource namespaces have been introduced for that
  • we prefer to start with the name that makes more sense based on user data, and leave room for community feedback.

This means that Steve's full name is elafros.dev/Service.
And when the Elafros context is obvious (e.g. in our tooling or docs), we can refer to it as Service.

This solves Problem 2: Name described in my original issue.

@mikehelmick
Contributor

I spoke with several of you offline.

The proposed solution still stands, with a name change to service (described above).

What I had previously named manual will become pinned. We will also specify a new manual mode in which the user is free to interact directly with the resources created by the service object, and the controller for the service will not try to correct things.
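
As a rough sketch of the naming described in this comment, the three modes might look like the following. The field name `pinned` and the placement of `revisionName` under it are assumptions extrapolated from the earlier `manualRollout` and `manual: {}` examples; the thread does not settle the final schema, and the configuration bodies are left empty here just as they were elided above.

```yaml
# Default mode: the latest ready revision receives 100% of traffic.
apiVersion: elafros.dev/v1alpha1
kind: Service
metadata:
  name: my-function
spec:
  runLatest:
    configuration: {}  # body elided, as in the earlier examples
---
# Formerly "manual": pin traffic to one revision.
# The field name `pinned` is assumed, not confirmed by the thread.
apiVersion: elafros.dev/v1alpha1
kind: Service
metadata:
  name: my-function
spec:
  pinned:
    revisionName: abc
    configuration: {}  # body elided
---
# New manual mode: the controller stops correcting the underlying
# Route and Configuration, which become the source of truth.
apiVersion: elafros.dev/v1alpha1
kind: Service
metadata:
  name: my-function
spec:
  manual: {}
```

Only one of the three keys would be set on a given object at a time, matching the earlier statement that the resource accepts a single rollout policy.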

We will be setting up this resource in a new repository, likely elafros/service.

I will close this issue once that repository is created.

@mattmoor
Member

One challenge of the name-collision is the kubectl experience, where typing kubectl get services won't WAI (I think you'd need to fully-qualify elafros.dev/v1alpha1/Service). Something I'd suggest here (and possibly for the other CRDs) is that we adopt "shortNames" in our CRD descriptors, which allow you to alias the resource:

    # shortNames allow shorter string to match your resource on the CLI
    shortNames:
    - ela-svc

(not sure if hyphens are allowed)
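
For context, the fragment above would sit under `spec.names` in the CRD descriptor. A minimal sketch, assuming the `apiextensions.k8s.io/v1beta1` API that was current at the time and the `elafros.dev` group used elsewhere in this thread; the alias is the one suggested above, so the open question about hyphens applies to it as well.

```yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # CRD names take the form <plural>.<group>.
  name: services.elafros.dev
spec:
  group: elafros.dev
  version: v1alpha1
  scope: Namespaced
  names:
    kind: Service
    singular: service
    plural: services
    # shortNames allow a shorter string to match the resource on the CLI.
    # Whether a hyphenated alias is accepted is unverified here.
    shortNames:
    - ela-svc
```

If the alias is accepted, `kubectl get ela-svc` would list the Elafros resource without colliding with the core `services` resource.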

I've been debating revision => rev and configuration => cfg for some time to save myself some typing, but here I think it's especially useful.

vaikas pushed a commit that referenced this issue Apr 3, 2018
* add types for Service and generate related code. Issue #412

* add types for Service and generate related code. Issue #412

* Convert tabs to spaces, reformat service_types.go

* add tests for service types

* convert tabs to spaces

* fix spelling mistakes in service definition

* rerun codegen after fixing spelling in json types
@bobcatfish
Contributor

I've created #591 to add this functionality to the conformance tests.

@vaikas
Contributor

vaikas commented Apr 5, 2018 via email

@mattmoor
Member

The initial type definition and skeleton controller have gone in, and we're starting to break out smaller issues for missing things (e.g. docs), so I'm going to optimistically close this issue.

If there is some reason for it to still exist, please reopen with a comment explaining why.

@ergannon

To expand on the research @tliberman shared and further assess our object model, we conducted a study with 19 participants: 9 with mostly Operator responsibilities, 8 with mostly Developer responsibilities, and 2 with an even split of both.

Qualitative data demonstrated that participants define a service as an entity that operates independently, is opaque to the consumer, sends and receives requests, and performs a defined business function. A diagram labeling exercise revealed that service was the most commonly chosen term to describe the Knative service|steve concept, and load balancer was the most common term for the Kubernetes service concept. These findings were consistent when sampling moderate to advanced Kubernetes users in a follow-up study. See the full methodology and results here.

@evankanderson
Member

evankanderson commented Mar 21, 2019

Has any of this research been shared with the Kubernetes sig-apps group?


markusthoemmes pushed a commit to markusthoemmes/knative-serving that referenced this issue Mar 11, 2020
This patch contains following changes:
- to remove istio related resources
- to bump serverless-operator to v1.5.0