New Function Execution Model #30

Merged

Conversation

PineappleIOnic
Member

This is a draft of the RFC that will form the basis of the new Function Execution Model.

Member

@lohanidamodar lohanidamodar left a comment

Some queries and feedback

The executor will also act as a router itself: when API requests intended for functions are received, it will be the executor's job to correlate them with the relevant runtimes and forward them for processing. When the runtime's web server finishes processing the request and returns a response, the executor will return that response to whichever client made the request.

#### Scaling
Scaling this solution will need to be worked out in the RFC, but one solution I thought of would be to have the executor keep track of the resources used by runtimes; when the total resources in use hit a certain threshold, another instance of the runtime will be spun up and requests will be load-balanced across the multiple runtime instances.
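The threshold idea above can be sketched as a pure decision function. This is a sketch only, not the proposed implementation; the function names and the meaning of the threshold are assumptions for illustration.

```python
# Sketch only: the executor would tally resource usage reported for each
# runtime instance of a function and decide whether to spin up another one.
# `usage_per_instance` holds e.g. CPU fractions; `threshold` is an assumed knob.

def should_scale_up(usage_per_instance: list, threshold: float) -> bool:
    """True when total usage across a function's runtime instances crosses the threshold."""
    return sum(usage_per_instance) >= threshold

def pick_instance(n_instances: int, request_counter: int) -> int:
    """Trivial round-robin load balancing across the instances, as described above."""
    return request_counter % n_instances
```

With two instances at 0.7 and 0.6 CPU and a threshold of 1.2, `should_scale_up([0.7, 0.6], 1.2)` is true, so a third instance would be started and `pick_instance` would spread subsequent requests across all three.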
Member

So the executor will act as both router and load balancer as well?

Contributor

Looks like Fission's executor has two execution types, depending on whether the serverless function needs to scale due to load. I think this was their solution to avoid having the executor also act as a load balancer.
ref: https://docs.fission.io/docs/architecture/executor/#new-deployment


The executor will have multiple tasks, but the main ones are to spin up runtimes when they are needed and to update their code whenever functions are updated. Runtimes will now be web servers based on their respective languages, and when code is added it will be registered with the runtime web server's router.

The executor will also act as a router itself: when API requests intended for functions are received, it will be the executor's job to correlate them with the relevant runtimes and forward them for processing. When the runtime's web server finishes processing the request and returns a response, the executor will return that response to whichever client made the request.
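As a rough sketch of the routing role described above (all names here are hypothetical, not from the RFC), the executor could keep a table from function IDs to runtime web-server addresses and forward each request to the matching runtime:

```python
# Hypothetical routing table: functionId -> base URL of the runtime web server.
routes = {}

def register_runtime(function_id, base_url):
    """Recorded when the executor spins up a runtime for a function."""
    routes[function_id] = base_url

def resolve(function_id):
    """Correlate an incoming function request with its runtime, or fail clearly."""
    if function_id not in routes:
        raise LookupError("no runtime registered for function " + function_id)
    return routes[function_id]

register_runtime("fn_abc", "http://runtime-fn-abc:3000")
```

An incoming request for `fn_abc` would then be proxied to `resolve("fn_abc")`, and the runtime's response relayed back to the original client.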
Member

I think we need a more technical description here of how the executor itself will look: maybe API signatures, how the worker and executor will communicate, some sample structures, etc.


I agree here


- A new process will be introduced to execute functions
- All current runtimes will be updated to use a web server as their core
- The Functions worker will be stripped of all direct docker interactions and will instead call the executor
Member

Maybe we can also talk about how performance will be impacted by this change?


With this redesign of how cloud functions work, the functions worker will no longer have to deal with spinning up Docker containers; it will instead use the executor server, which will handle spinning up containers on the functions worker's behalf.

The executor will have multiple tasks, but the main ones are to spin up runtimes when they are needed and to update their code whenever functions are updated. Runtimes will now be web servers based on their respective languages, and when code is added it will be registered with the runtime web server's router.
Contributor

Should the executor also be responsible for cleaning up function containers when they're no longer necessary?

Member

I think it should, but with a simple timer that removes any runtime that hasn't received a call in the last X minutes. The check itself can also run on an interval of every X minutes.
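The timer-based cleanup suggested above can be sketched as a pure decision function (the names and the X-minute TTL are assumptions, not a settled design):

```python
# Pure decision function so the policy is easy to see: given the last time
# each runtime served a call, return the ones idle longer than the TTL.

def runtimes_to_remove(last_used, ttl_seconds, now):
    """last_used maps runtime IDs to the timestamp of their most recent call."""
    return [runtime_id for runtime_id, ts in last_used.items() if now - ts > ttl_seconds]

# The executor would run this every X minutes and stop/remove the
# returned containers via its Docker interactions.
```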


- A new process will be introduced to execute functions
- All current runtimes will be updated to use a web server as their core
- The Functions worker will be stripped of all direct docker interactions and will instead call the executor
Contributor

If the Functions worker is stripped of all its direct Docker work, what will be left?

Member

Scheduled and async executions?

Contributor

Looking at the Architecture diagram, there's no significant workflow difference between sync and async functions - in fact, the Functions worker doesn't appear to have any function at all if we follow that document.

With so few responsibilities, I think it's confusing to have both a Functions worker (responsible for very little) alongside the Executor, which does basically everything else. Maybe there's a more appropriate name for the Functions worker after this change?

Comment on lines 55 to 60
3. If the request is asynchronous, then respond to the API request now with an execution ID, like it does now.
4. When we receive a response back from the runtime we will do one of two things:
If the request is synchronous:
- Wait for the response from the runtime and return it as the response to the API request.
If the request is asynchronous:
- Store the result for later use through the Get Execution API endpoint.
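The sync/async split in the quoted steps could look roughly like this; the in-memory `results` store and the injected `call_runtime` callable are stand-ins for illustration, not Appwrite APIs:

```python
# Results kept for the Get Execution API endpoint in the async case.
results = {}

def handle_execution(execution_id, is_sync, call_runtime):
    """Sync: return the runtime's response directly to the API caller.
    Async: store the result and return nothing (the caller already
    received an execution ID)."""
    response = call_runtime()
    if is_sync:
        return response
    results[execution_id] = response
    return None
```

A later `GET` on the execution would then read `results[execution_id]` for the async case.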
Contributor

Where is a function defined as synchronous or asynchronous? Would this be an extra param on POST /v1/functions?

Member

The only thing this endpoint should do is either execute the function directly using the executor (sync HTTP request) or send the execution to the workers queue (async, similar to the current implementation).

There's no need for this endpoint to know whether the runtime is up or not; the executor will make sure the execution succeeds for both the Appwrite API and the worker.

Member Author

@PineappleIOnic PineappleIOnic Aug 6, 2021

Yes, this will be an extra parameter on POST /v1/functions/{id}/execute. I'll update the RFC to reflect this.

5. Finally bring the runtime back up and start the server again.

#### Watchdog
Watchdog will be a part of the executor, running in the background, managing all running runtimes and monitoring their health.
Contributor

So how many processes will be running in the Executor container? Best Docker practice is to limit each container to one process.

Member

Agree with @kodumbeats. This is a big overhead and probably not the best pattern for a Docker container.

Member Author

Watchdog could be split out into its own process if that is the case 👍

Member Author

I've removed watchdog from the RFC for now, so we can better focus on the core of the implementation

Watchdog will be a part of the executor, running in the background, managing all running runtimes and monitoring their health.
Every X minutes (defaults will need to be determined), watchdog will call a health endpoint built into all environment web servers.
If the web server returns a `200`, the runtime is healthy and watchdog will continue on to the next runtime. However, if the runtime does not respond within a set time, or responds incorrectly, it will be restarted and watchdog will continue monitoring it.
If, after a certain number of restarts, the runtime still does not respond correctly, watchdog will log an error and inform the user that something is wrong.
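The monitoring loop described above, sketched with the HTTP probe and the restart action injected so that only the policy is visible (the restart budget and all names are assumed placeholders):

```python
def monitor_runtime(probe, restart, max_restarts):
    """probe() -> True if the health endpoint answered 200 within the time limit;
    restart() restarts the runtime container."""
    restarts = 0
    while True:
        if probe():
            return "healthy"
        if restarts >= max_restarts:
            return "failed"  # watchdog would log an error and inform the user
        restart()
        restarts += 1
```

A runtime that fails its probe twice and then recovers would be restarted twice and reported healthy; one that never recovers is reported failed after the budget is exhausted.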
Contributor

How does this watchdog error propagate back to the user? Will we need additional endpoints for v1/functions/:functionId/executions for this?

- Scaling this solution even more
- Using things such as [Amazon Firecracker](https://firecracker-microvm.github.io/)
- Will the executor handle load balancing itself, or will we use another service to deal with it?
- More in-depth technical discussion is needed to figure out the APIs
Contributor

Why are we leaving this as an unresolved question? This is pretty core to the RFC overall.

The executor will also act as a router itself: when API requests intended for functions are received, it will be the executor's job to correlate them with the relevant runtimes and forward them for processing. When the runtime's web server finishes processing the request and returns a response, the executor will return that response to whichever client made the request.

#### Scaling
Scaling this solution will need to be worked out in the RFC, but one solution I thought of would be to have the executor keep track of the resources used by runtimes; when the total resources in use hit a certain threshold, another instance of the runtime will be spun up and requests will be load-balanced across the multiple runtime instances.
Contributor

Will the executor store any of this data, or will we just be querying the Docker API for stats? Will we want to monitor with another solution?
ref: https://docs.docker.com/engine/api/v1.41/#operation/ContainerStats

Member

I would try to avoid dealing with the scaling problem at this layer, although it might have some advantages. I think this can be part of a completely separate PR/RFC for autoscaling Cloud Functions or other Appwrite components.

Member Author

I have removed the Scaling section for now

- Use Cases
- Goals
- Deliverables
- Changes to documentation
Contributor

Can we outline the necessary documentation we'll want to release alongside this feature?


#### Flowchart Visualisation:

![Flowchart Visualisation](flowchart.png)
Contributor

Why does this flowchart not include how the Functions worker fits into the new execution model?

#### Flowchart Visualisation:

![Flowchart Visualisation](flowchart.png)

Member

(screenshot of the quoted flowchart section omitted)
This part is also not correct. We don't necessarily need to start the runtime when a tag is activated; it's easier to delegate this responsibility to the first execution. We also don't need to stop the previous runtime: it can be stopped automatically by the executor after inactivity.

The executor will also act as a router itself: when API requests intended for functions are received, it will be the executor's job to correlate them with the relevant runtimes and forward them for processing. When the runtime's web server finishes processing the request and returns a response, the executor will return that response to whichever client made the request.

#### Scaling
Scaling this solution will need to be worked out in the RFC, but one solution I thought of would be to have the executor keep track of the resources used by runtimes; when the total resources in use hit a certain threshold, another instance of the runtime will be spun up and requests will be load-balanced across the multiple runtime instances.
Member

I would try to avoid dealing with the scaling problem at this layer, although it might have some advantages. I think this can be part of a completely separate PR/RFC for autoscaling Cloud Functions or other Appwrite components.

Comment on lines 55 to 60
3. If the request is asynchronous, then respond to the API request now with an execution ID, like it does now.
4. When we receive a response back from the runtime we will do one of two things:
If the request is synchronous:
- Wait for the response from the runtime and return it as the response to the API request.
If the request is asynchronous:
- Store the result for later use through the Get Execution API endpoint.
Member

The only thing this endpoint should do is either execute the function directly using the executor (sync HTTP request) or send the execution to the workers queue (async, similar to the current implementation).

There's no need for this endpoint to know whether the runtime is up or not; the executor will make sure the execution succeeds for both the Appwrite API and the worker.

2. Check if the server is running; if it is, stop the server.
3. Also check if the runtime exists; if it doesn't, create the runtime.
4. Map the new code into the runtime environment using a volume.
5. Finally, bring the runtime back up and start the server again.
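The quoted steps 2–5 can be sketched as one ordered routine, with the Docker interactions injected as callables; every name here is a stand-in for illustration, not a proposed API:

```python
def deploy_tag(ops):
    """ops maps step names to callables wrapping the real Docker work."""
    performed = []
    if ops["server_running"]():                               # step 2
        ops["stop_server"](); performed.append("stop_server")
    if not ops["runtime_exists"]():                           # step 3
        ops["create_runtime"](); performed.append("create_runtime")
    ops["mount_code"](); performed.append("mount_code")       # step 4
    ops["start_server"](); performed.append("start_server")   # step 5
    return performed
```

Writing it this way makes the ordering constraint explicit: the server must be down before the code volume is remapped, and is only started again at the end.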
Member

Again, the Appwrite API shouldn't care whether a runtime is up or not. Actually, this endpoint shouldn't change at all within the scope of the RFC.

The executor is the only component interacting with runtimes; it is responsible for starting a runtime, checking its status, and bringing it down.

4. Map the new code into the runtime environment using a volume
5. Finally bring the runtime back up and start the server again.

#### Watchdog
Member

This is out of the context of this RFC. Using watchdog as a component in Appwrite will create tight coupling between the application and the monitoring layer, and Docker can potentially handle these failures itself. It is also not clear how watchdog is supposed to fit into the bigger picture of our architecture. Implementing something like watchdog is not something I think we should discuss as part of this RFC.

Contributor

@kodumbeats kodumbeats left a comment

Let's get a new architecture diagram in place for the executor, as it contains many of the implementation details we want for this RFC :)


`POST /v1/functions/{id}/executions/` - Execute function API endpoint

The standard execution endpoint for functions will gain support for synchronous executions, as well as a new parameter in the JSON body to determine whether an execution is synchronous or asynchronous; this parameter will be called `isSync` and will be a boolean.
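For illustration, a request body carrying the proposed parameter might look like this; only `isSync` comes from the RFC text, and the `data` field is a placeholder, not part of the spec:

```python
import json

# Hypothetical JSON body for POST /v1/functions/{id}/executions/
body = {
    "data": "payload passed to the function",  # placeholder field
    "isSync": True,  # the new boolean parameter proposed above
}
encoded = json.dumps(body)
```

A client would POST `encoded` to the endpoint and, with `isSync` set to true, block until the runtime's response comes back instead of polling the execution by ID.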
Contributor

I'm not sure that this param should be on the POST /executions route, but rather on the POST /tags endpoint. Since sync and async serverless functions will be written differently (async scripts write to the console log, sync scripts handle requests and responses), their execution models can't be interchanged. I think we should store this fact in the same data structure as the rest of the tag info.

Member Author

Alright, that sounds like a good idea

Comment on lines 58 to 60
1. Extract the code tarball into the environment.
2. Also check if the runtime exists; if it doesn't, create the runtime.
3. Map the new code into the runtime environment using a volume.
Contributor

This work happens on first execution for async functions as of now. Let's keep this behavior for sync functions as well?

@PineappleIOnic PineappleIOnic marked this pull request as ready for review November 29, 2021 10:16
@PineappleIOnic PineappleIOnic changed the title [DRAFT] New Function Execution Model New Function Execution Model Jun 9, 2022
@TorstenDittmann TorstenDittmann merged commit 4a2bc63 into appwrite:main Jun 9, 2022
This pull request was closed.