
[New Scheduler] Run scheduler #5194

Merged
merged 12 commits into from Feb 11, 2022

Conversation

style95
Member

@style95 style95 commented Jan 18, 2022

Description

This PR includes the changes needed to run the new scheduler.
With these changes, I was able to run the scheduler on macOS.

My changes affect the following components:

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Scheduler
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@style95 style95 changed the title Run scheduler [New Scheduler] Run scheduler Jan 18, 2022
@style95 style95 added the wip label Jan 18, 2022
@style95
Member Author

style95 commented Jan 18, 2022

I am trying to deploy this version to a distributed environment and add a how-to-deploy document.

remote {
  artery {
    maximum-frame-size = 3151796b
    receive-buffer-size = 3151796b
Member Author

Since the Akka version was upgraded, remoting now uses Artery rather than the deprecated Netty-based classic transport.
https://doc.akka.io/docs/akka/current/remoting.html#classic-remoting-deprecated-
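
For anyone following along, here is a minimal, self-contained sketch of turning Artery on (assuming Akka 2.6; only the frame size mirrors the value above, everything else is illustrative):

```scala
import akka.actor.ActorSystem
import com.typesafe.config.ConfigFactory

object ArterySketch extends App {
  // Illustrative Artery settings; only maximum-frame-size mirrors the PR's value.
  val config = ConfigFactory.parseString("""
    akka {
      actor.provider = cluster
      remote.artery {
        enabled = on
        transport = tcp
        canonical.hostname = "127.0.0.1"
        canonical.port = 25520
        advanced.maximum-frame-size = 3151796b
      }
    }
  """).withFallback(ConfigFactory.load())

  val system = ActorSystem("artery-sketch", config)
}
```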

BasicHttpService.startHttpService(FPCSchedulerServer.instance(scheduler).route, port, httpsConfig)(actorSystem)
Http()
  .newServerAt("0.0.0.0", port = rpcPort)
  .bind(scheduler.serviceHandlers)
Member Author

This binds the gRPC port that invokers use to communicate with schedulers to fetch activations.
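
The shape of that binding, as a self-contained sketch — the real code binds the Akka gRPC generated serviceHandlers, so the hand-written handler and the port below are stand-ins:

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import scala.concurrent.Future

object RpcBindSketch extends App {
  implicit val system: ActorSystem = ActorSystem("rpc-bind-sketch")

  // Stand-in for scheduler.serviceHandlers (the generated gRPC handlers).
  val handler: HttpRequest => Future[HttpResponse] =
    _ => Future.successful(HttpResponse(entity = "ActivationService placeholder"))

  Http().newServerAt("0.0.0.0", port = 13001).bind(handler)
}
```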

vars:
  akka_env:
    "CONFIG_akka_actor_provider": "{{ scheduler.akka.provider }}"
    "CONFIG_akka_remote_artery_canonical_hostname":
Member Author

These are required to specify the Artery configuration.

"JMX_REMOTE": "{{ jmx.enabled }}"
"PORT": "8080"

"WHISK_SCHEDULER_ENDPOINTS_HOST": "{{ ansible_host }}"
Member Author

These three values are used to register a scheduler endpoint in etcd so that other components can refer to it.
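
As an illustration of that registration (using the plain jetcd client rather than OpenWhisk's own EtcdClient wrapper; the key layout and TTL are hypothetical):

```scala
import java.nio.charset.StandardCharsets.UTF_8
import io.etcd.jetcd.{ByteSequence, Client}
import io.etcd.jetcd.options.PutOption

object RegisterSchedulerEndpointSketch extends App {
  val client = Client.builder().endpoints("http://127.0.0.1:2379").build()

  // A lease makes the endpoint disappear automatically if the scheduler dies.
  val leaseId = client.getLeaseClient.grant(10L).get().getID

  val key   = ByteSequence.from("whisk/scheduler/0/endpoint", UTF_8)
  val value = ByteSequence.from("10.0.0.5:8080", UTF_8)

  client.getKVClient
    .put(key, value, PutOption.newBuilder().withLeaseId(leaseId).build())
    .get()

  client.close()
}
```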

@@ -435,8 +442,9 @@ metrics:

user_events: "{{ user_events_enabled | default(false) | lower }}"

durationChecker:
  timeWindow: "{{ duration_checker_time_window | default('1 d') }}"
zerodowntimeDeployment:
Member Author

This configuration is used downstream to deploy all components without any downtime.
Since there are no corresponding Ansible steps on the controller and invoker sides yet, it does not work as is, but I kept it.
We can add the required changes in the future.

Contributor

Suggested change:
- zerodowntimeDeployment:
+ zeroDowntimeDeployment:

Contributor

How complicated are the required changes? And how exactly does the zero-downtime deployment work?

Member Author

In this PR I just want to focus on running the new scheduler.
I believe it's not that difficult.

The procedure is as follows (sketched in code after the list):

  1. Disable some invokers.
  2. Check whether any activations are running on them.
  3. If so, wait until they complete; if not, redeploy the invokers.
  4. Disable some schedulers.
  5. Check whether any activations remain in the schedulers.
  6. If so, wait until they are forwarded to other schedulers; if not, redeploy the schedulers.
  7. Remove some controllers from Nginx.
  8. Disable those controllers.
  9. Wait until all activations finish.
  10. Redeploy the controllers.
  11. Repeat steps 1-10 until most of the components are redeployed.
  12. For the last controller, scheduler, and invoker, change the order to controller -> scheduler -> invoker.
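
A compact sketch of that ordering (purely illustrative; drain and redeploy stand in for the disable/wait/Ansible steps above):

```scala
object RollingRedeploySketch extends App {
  val order = Seq("invokers", "schedulers", "controllers")

  def drain(c: String): Unit    = println(s"draining $c (disable, wait for in-flight activations)")
  def redeploy(c: String): Unit = println(s"redeploying $c")

  // Steps 1-11: batch by batch, invokers -> schedulers -> controllers.
  for (batch <- 1 to 2; c <- order) { drain(c); redeploy(c) }

  // Step 12: the last instances go in reverse: controller -> scheduler -> invoker.
  for (c <- order.reverse) { drain(c); redeploy(c) }
}
```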

@@ -40,6 +40,10 @@ dependencies {
compile "com.typesafe.akka:akka-actor_${gradle.scala.depVersion}:${gradle.akka.version}"
compile "com.typesafe.akka:akka-stream_${gradle.scala.depVersion}:${gradle.akka.version}"
compile "com.typesafe.akka:akka-slf4j_${gradle.scala.depVersion}:${gradle.akka.version}"
compile "com.typesafe.akka:akka-cluster_${gradle.scala.depVersion}:${gradle.akka.version}"
Member Author

One thing to clarify here: in our (downstream) production system, we still use akka-2.5.26 and the Netty-based transport for akka-remote, so there can be some differences.

queueManager:
  maxSchedulingTime: "{{ scheduler_maxSchedulingTime | default('20 second') }}"
  maxRetriesToGetQueue: "{{ scheduler_maxRetriesToGetQueue | default(13) }}"
queue:
  # The queue's Running-state timeout: if no activation arrives while the queue is Running,
  # its state changes from Running to Idle and the decision-algorithm actor is deleted.
  idleGrace: "{{ scheduler_queue_idleGrace | default('20 seconds') }}"
Contributor

I thought you said the default was to remove an idle queue after 24 hours?

Member Author

I think 24 hours does not fit all cases; I mentioned it because we are using 24 hours.

I am OK with changing the default value, but I believe each downstream will choose a proper timeout of its own rather than the default anyway.

The default configuration here means a queue becomes idle after 20 seconds and is terminated after another 20 seconds.
In the idle state, the queue runs no decision-making to add containers, and it goes back to running whenever a new activation comes in (see the sketch below).
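
A minimal sketch of that lifecycle with classic Akka timers — the names idleGrace/stopGrace follow the config above, but the actor itself is hypothetical, not the scheduler's actual queue code:

```scala
import akka.actor.{Actor, Props, Timers}
import scala.concurrent.duration._

object QueueLifecycleSketch {
  case object Activation
  private case object IdleTick
  private case object StopTick

  class Queue(idleGrace: FiniteDuration, stopGrace: FiniteDuration) extends Actor with Timers {
    timers.startSingleTimer("idle", IdleTick, idleGrace)

    def running: Receive = {
      case Activation => // decision-making runs here; re-arm the idle timer
        timers.startSingleTimer("idle", IdleTick, idleGrace)
      case IdleTick => // no activation within idleGrace: stop decision-making
        timers.startSingleTimer("stop", StopTick, stopGrace)
        context.become(idle)
    }

    def idle: Receive = {
      case Activation => // a new activation revives the queue
        timers.cancel("stop")
        timers.startSingleTimer("idle", IdleTick, idleGrace)
        context.become(running)
      case StopTick => context.stop(self) // idle for another stopGrace: terminate
    }

    override def receive: Receive = running
  }

  def props(idleGrace: FiniteDuration = 20.seconds, stopGrace: FiniteDuration = 20.seconds): Props =
    Props(new Queue(idleGrace, stopGrace))
}
```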

"LIMITS_ACTIONS_SEQUENCE_MAXLENGTH": "{{ limits.sequenceMaxLength }}"

"CONFIG_whisk_couchdb_protocol": "{{ db.protocol }}"
"CONFIG_whisk_couchdb_host": "{{ db.host }}"
Contributor

I can't remember what the scheduler needs to communicate with CouchDB for, if anything?

Contributor

Going through the code, I see the scheduler has to fetch the concurrentInvocations limit to track throttling. Also, the scheduler sometimes fails an activation while it is still in the queue, which writes the activation record and acks back to the controller before anything is ever sent to an invoker.

"{{ kafka.ssl.keystore.password }}"
"ZOOKEEPER_HOSTS": "{{ zookeeper_connect_string }}"

"LIMITS_ACTIONS_INVOKES_PERMINUTE": "{{ limits.invocationsPerMinute }}"
Contributor

Do these throttling limits need to be configured for the scheduler component?


- name: populate environment variables for scheduler
  set_fact:
    env:
Contributor

I feel like there are a lot of configs in here that aren't used by the scheduler, but I could be wrong. It's just confusing which settings are actually used by the scheduler and which are copy-paste from other components that never get used.

Member Author

I will review all configurations again.

Member Author

I have reviewed all configurations and removed unnecessary things.

One thing I am uncertain about is this one.
c2e57c1#diff-168a0313e0b46601cf796cd58db3a422a426015b91de821de1951bc88c62db7cR186

It is also being used in a controller, but I couldn't find any reference.
https://github.com/apache/openwhisk/blob/master/ansible/roles/controller/tasks/deploy.yml#L234

@bdoyle0182 Do you have any idea?

Contributor

Thanks for taking the time to update this; it will make my life easier digesting what we need to set up in config, since we don't use Ansible.

@@ -197,6 +209,11 @@ whisk {
retention-bytes = 1073741824
retention-ms = 3600000
}
actions {
Contributor

What produces to and consumes from this topic? And what is produced?

Edit: looking at the Scheduler.scala file on line 332, it seems this config is just used as the configuration for the scheduler[idx] topics, not for actual topics named actions. With that said, should this config be renamed?

Member Author
@style95 style95 Jan 23, 2022

Yes, this configuration is for the schedulerN topics, which are used to forward activations from controllers to schedulers.
I feel it would be better to rename it to something like activations or schedulers. (A sketch of that flow follows.)
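
To make the flow concrete, here is a hedged sketch of a controller-side producer publishing to one of those per-scheduler topics; the topic name and payload are illustrative, not OpenWhisk's actual wire format:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object ForwardActivationSketch extends App {
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  val producer = new KafkaProducer[String, String](props)
  // "scheduler0" stands in for the schedulerN topic owning this action's queue.
  producer.send(new ProducerRecord("scheduler0", "activationId-123", """{"action":"noop"}"""))
  producer.close()
}
```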

@@ -46,7 +46,8 @@ class FPCPoolBalancer(config: WhiskConfig,
  extends LoadBalancer {

  private implicit val executionContext: ExecutionContext = actorSystem.dispatcher
- private implicit val requestTimeout: Timeout = Timeout(5.seconds)
+ // This value is given according to the total waiting time at QueueManager for a new queue to be created.
+ private implicit val requestTimeout: Timeout = Timeout(8.seconds)
Contributor

Was this raised because queue creation took longer than 5 seconds for you? If so, should it be raised quite a bit higher?

Member Author

I just increased this to align it with the retry timeout at the queue manager.
https://github.com/apache/openwhisk/blob/master/core/scheduler/src/main/scala/org/apache/openwhisk/core/scheduler/queue/QueueManager.scala#L421

In the default configuration, the queue manager waits around 8 seconds for a new queue to be created.

Regarding this value, it should not be too high: if a target scheduler becomes unresponsive, the controller should try to create the queue against another healthy scheduler instead.

This change might need to be reverted, because controllers would retry creating queues after 8 seconds, but by then the activations would already have been dropped by the queue manager.
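
The trade-off reads roughly like this sketch (the names are hypothetical; the point is that the ask timeout matches the queue manager's ~8-second retry window, and a timeout triggers failover to another scheduler):

```scala
import akka.actor.ActorRef
import akka.pattern.{ask, AskTimeoutException}
import akka.util.Timeout
import scala.concurrent.{ExecutionContext, Future}
import scala.concurrent.duration._

object CreateQueueSketch {
  final case class CreateQueue(fqn: String)

  // 8s matches the queue manager's retry window; a much larger value would keep
  // the controller stuck on an unresponsive scheduler instead of failing over.
  def createQueue(schedulers: List[ActorRef], fqn: String)(implicit ec: ExecutionContext): Future[Any] = {
    implicit val requestTimeout: Timeout = Timeout(8.seconds)
    schedulers match {
      case head :: tail =>
        (head ? CreateQueue(fqn)).recoverWith {
          case _: AskTimeoutException => createQueue(tail, fqn) // fail over to another scheduler
        }
      case Nil =>
        Future.failed(new IllegalStateException("no healthy scheduler available"))
    }
  }
}
```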

@style95
Member Author

style95 commented Jan 24, 2022

I quickly ran benchmarks with the new scheduler and got the following results.

Test Environments

  • 3 controllers (VM): 8 cores, 16 GB memory
  • 3 schedulers (PM): 40 cores, 128 GB memory
  • 10 invokers (VM): 8 cores, 16 GB memory
    • UserMemory: 10240 MB
    • Total of 400 containers when using 256 MB-memory containers

I invoked different numbers of actions as follows:

100 actions:

[benchmark results image]

1 action:

[benchmark results image]

There are some differences between the two.
For the record, the upstream version now differs from our downstream version: it uses different versions of the Akka family, and there are subtle differences in the code base as well.

In our downstream version, I observed around 14,000 TPS in the same environment in both cases (1 action / 100 actions).

In the upstream version, as you can see, TPS is higher in the 100-actions case while performance is poor in the 1-action case.
In the 100-actions case all containers were utilized, while only a few containers were utilized in the 1-action case.

I feel there is still room for improvement, especially in terms of performance.
But in any case, I could confirm it works as expected in terms of functionality.

I will also run the same benchmark with the old scheduler and compare the performance.

@style95
Member Author

style95 commented Jan 28, 2022

I ran the same benchmark against the upstream master.
The test environment is the same.

Test Environments

  • 3 controllers (VM): 8 cores, 16 GB memory
  • 3 DBs (PM): 40 cores, 128 GB memory
  • 10 invokers (VM): 8 cores, 16 GB memory
    • UserMemory: 10240 MB
    • Total of 400 containers when using 256 MB-memory containers

1 action:

[benchmark results image]

100 actions:

[benchmark results image]

There are a couple of errors because blocking calls fall back to non-blocking after 60 seconds, so the response code was 202 rather than 200.

2022-01-28 15:24:45,640 INFO  https://10.105.188.179/api/v1/namespaces/guest/actions/noop44?blocking=true&result=true&volatile=true -> 202 Accepted, 51 bytes
2022-01-28 15:24:45,641 ERROR 
Expected: is <200>
     got: <202>

java.lang.AssertionError: 
Expected: is <200>
     got: <202>
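
If these 202s are just the documented fallback from blocking to non-blocking after 60 seconds, a benchmark assertion could tolerate both codes — a minimal sketch, not the project's actual test code:

```scala
object StatusCheckSketch {
  // 202 = the blocking invoke fell back to non-blocking after the 60s limit.
  private val acceptable = Set(200, 202)

  def checkResponse(statusCode: Int): Unit =
    assert(acceptable.contains(statusCode), s"unexpected status code: $statusCode")
}
```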

Member Author
@style95 style95 left a comment

@jiangpengcheng @ningyougang
Could you cross-check the unnecessary configurations?
I could run the scheduler without those configurations.

c2e57c1



@style95 style95 removed the wip label Feb 3, 2022
@style95
Member Author

style95 commented Feb 3, 2022

It's ready for review.

@bdoyle0182
Contributor

I'm not sure about the config you mentioned me on. As far as I know, we don't use it anywhere. My guess is that it was a prefix for finding runtime images in your registry based on what's defined in your manifest. So maybe it's needed in both? Or it's an outdated config that isn't used anymore.

**common/scala/src/main/resources**
```
whisk.spi {
  ArtifactStoreProvider = org.apache.openwhisk.core.database.CouchDbStoreProvider
```
Contributor

MongoDB is technically supported for this now too, right? It's just a documentation example, so not a big deal.

Member Author

Yes, I think so; there is no specific dependency on CouchDB.
But I haven't tried it with MongoDB. (A possible config swap is sketched below.)
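
For reference, pointing the SPI at the MongoDB artifact store should just be a config swap along these lines (the provider class name is taken from the OpenWhisk tree, but worth verifying against the version in use):

```
whisk.spi {
  ArtifactStoreProvider = org.apache.openwhisk.core.database.mongodb.MongoDBArtifactStoreProvider
}
```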


**common/scala/src/main/resources**
```
whisk.spi {
```
Contributor
@bdoyle0182 bdoyle0182 Feb 3, 2022

Shouldn't a provider for the scheduler DB be required here, i.e. an ETCDProvider?

Member Author

Do we have an ETCDProvider?
If we ever need to use a key-value store other than etcd, we could introduce such a layer too (a hypothetical sketch follows).
But for now, the code sticks with etcd.
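
If such a layer were introduced, it could follow the existing Spi pattern; a purely hypothetical sketch (none of these names exist in the tree today):

```scala
import scala.concurrent.{ExecutionContext, Future}

// Hypothetical abstraction over the key-value store the scheduler uses.
trait KeyValueStore {
  def put(key: String, value: String): Future[Unit]
  def get(key: String): Future[Option[String]]
}

// Would extend org.apache.openwhisk.spi.Spi so it can be selected via whisk.spi,
// e.g. KeyValueStoreProvider = ...EtcdStoreProvider in application.conf.
trait KeyValueStoreProvider /* extends Spi */ {
  def makeStore()(implicit ec: ExecutionContext): KeyValueStore
}
```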

You can enable the new scheduler of OpenWhisk.
It will run one more component called "scheduler", along with etcd.

#### Configure service providers for the scheduler
Contributor

Will there be any guide on setting up etcd?

Member Author

If you enable the scheduler by setting enable_scheduler: true, etcd will be automatically deployed by this:
https://github.com/apache/openwhisk/pull/5194/files#diff-2356bb62c87e471ef37b7973eb51e82282ef1131ee7ab4b62d909102de96967cR23

@codecov-commenter

codecov-commenter commented Feb 4, 2022

Codecov Report

Merging #5194 (8da0f4d) into master (e172168) will increase coverage by 28.10%.
The diff coverage is 14.28%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #5194       +/-   ##
===========================================
+ Coverage   44.49%   72.59%   +28.10%     
===========================================
  Files         237      238        +1     
  Lines       13922    13937       +15     
  Branches      593      581       -12     
===========================================
+ Hits         6194    10117     +3923     
+ Misses       7728     3820     -3908     
Impacted Files Coverage Δ
.../scala/org/apache/openwhisk/core/WhiskConfig.scala 95.75% <ø> (+0.05%) ⬆️
...rg/apache/openwhisk/core/scheduler/Scheduler.scala 9.45% <0.00%> (+0.24%) ⬆️
.../openwhisk/core/loadBalancer/FPCPoolBalancer.scala 33.73% <33.33%> (-0.14%) ⬇️
...whisk/connector/kafka/KafkaConsumerConnector.scala 59.15% <0.00%> (-22.54%) ⬇️
...pache/openwhisk/core/invoker/InvokerReactive.scala 57.26% <0.00%> (-20.52%) ⬇️
.../scala/org/apache/openwhisk/utils/Exceptions.scala 20.00% <0.00%> (-20.00%) ⬇️
...ache/openwhisk/core/database/ActivationStore.scala 78.26% <0.00%> (-14.05%) ⬇️
...a/org/apache/openwhisk/http/BasicHttpService.scala 79.36% <0.00%> (-11.12%) ⬇️
...pache/openwhisk/core/entity/ConcurrencyLimit.scala 88.23% <0.00%> (-5.89%) ⬇️
...pache/openwhisk/core/containerpool/Container.scala 85.05% <0.00%> (-5.75%) ⬇️
... and 136 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@jiangpengcheng
Contributor

I checked that CONFIG_whisk_runtimes_defaultImagePrefix is an outdated configuration.

timeWindow: "{{ duration_checker_time_window | default('1 d') }}"
zeroDowntimeDeployment:
  enabled: "{{ zerodowntime_deployment_switch | default(true) }}"
  solution: "{{ zerodowntime_deployment_solution | default('apicall') }}"
Contributor

It seems apicall is not readable here; how about using scroll or another more readable word? Anyway, it means deploying in a rolling (scroll) fashion.

Another deployment solution is half, which means a small blue/green deployment.

Member Author

Currently, this configuration does not take effect due to some missing parts.
I believe we can address what you mentioned in a subsequent PR, when we introduce the feature for zero-downtime deployment.

Contributor

Got it.

@style95
Member Author

style95 commented Feb 9, 2022

I will merge this PR in 24 hours.
Please share your comments if you have any.

elastic_index_pattern: <your elasticsearch index pattern>
elastic_base_volume: <your elasticsearch volume directory>
elastic_username: <your elasticsearch username>
elastic_password: <your elasticsearch password>
Contributor

Can we use an external Elasticsearch?

Member Author

Yes.
It works exactly the same way CouchDB does with OW.

@ningyougang
Contributor

LGTM, just with two small questions.

@style95 style95 merged commit 5332e6d into apache:master Feb 11, 2022
@style95 style95 mentioned this pull request Jul 31, 2022