Inference autoscaling #109667
Squashed commit of the following:

- commit d98bd3d39d833329ab83a8274885473db41ed08a (Jan Kuipers, Mon May 13 17:27:38 2024 +0200): Increase measurement interval to 10secs
- commit e808ae5be52c5ea4d5ff8ccb881a4a80de0254f9 (Jan Kuipers, Mon May 13 17:09:33 2024 +0200): jump -> jumps
- commit c38cbdebfcec43e6982bb8bd1670519293161154 (Jan Kuipers, Mon May 13 14:32:42 2024 +0200): Remove unused estimator
- commit 16101f32b539481cd4d648ebb5637a3309853552 (Jan Kuipers, Mon May 13 14:31:30 2024 +0200): Measure latency periodically + documentation
- commit bc73bf29fde1d772701f0b71a7c8a0908669eb0f (Jan Kuipers, Mon May 13 12:53:19 2024 +0200): Init variance to None
- commit 0e73fa836fa9deec6ba55ef1161cc0dd71f35044 (Jan Kuipers, Mon May 13 11:18:21 2024 +0200): No autodetection of dynamics changes for latency
- commit 75924a744d26a72835529598a6df1a2d22bdaddc (Jan Kuipers, Mon May 13 10:10:34 2024 +0200): Move autoscaling code to own class
- commit 23553bb8cccd6ed80ac667b12ec38a6d5562dd29 (Jan Kuipers, Wed May 8 18:01:55 2024 +0200): Improved autoscaling simulation
- commit 2db606b2bba69d741fa231f369c633ea793294d5 (Tom Veasey, Tue Apr 30 15:01:40 2024 +0100): Correct the dependency on allocations
- commit 0e45cfbaf901cf9d440efa9b404058a67d000653 (Tom Veasey, Tue Apr 30 11:11:05 2024 +0100): Tweak
- commit a0f23a4a05875cd5df3863e5ad067b46a67c8cda (Tom Veasey, Tue Apr 30 11:09:30 2024 +0100): Correction
- commit f9cdb140d298bd99c64c79f020c058d60bfba134 (Tom Veasey, Tue Apr 30 09:57:59 2024 +0100): Allow extrapolation
- commit 57eb1a661a2b97412f479606c23c54dfb7887f52 (Tom Veasey, Tue Apr 30 09:55:17 2024 +0100): Simplify and estimate average duration rather than rate
- commit 36dff17194f2bcf816013b112cf07d70c9eec161 (Tom Veasey, Mon Apr 29 21:42:25 2024 +0100): Kalman filter for simple state model for average inference duration as a function of time and allocation count
- commit a1b85bd0deeabd5162f2ccd5a28672299025cee5 (Jan Kuipers, Mon Apr 29 12:15:59 2024 +0200): Improvements
- commit 51040655fcfbfd221f2446542a955fb0f19fb145 (Jan Kuipers, Mon Apr 29 09:33:10 2024 +0200): Account for virtual cores / hyperthreading
- commit 7a93407ecae6b6044108299a1d05f72cdf0d752a (Jan Kuipers, Fri Apr 26 16:58:25 2024 +0200): Simulator for inference autoscaling.
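The commit log above mentions a Kalman filter tracking average inference duration. As a rough illustration only (this is a hypothetical sketch, not the PR's actual implementation, and all names here are invented), a scalar Kalman filter update for a slowly drifting latency estimate could look like:

```java
// Hypothetical sketch: a scalar Kalman filter tracking a slowly drifting
// average inference duration. NOT the code from this PR.
public final class ScalarKalmanFilter {
    private double estimate;                  // current estimate of avg duration
    private double variance;                  // uncertainty of that estimate
    private final double processVariance;     // how fast the true value drifts
    private final double measurementVariance; // noise in each latency sample

    public ScalarKalmanFilter(double initialEstimate, double initialVariance,
                              double processVariance, double measurementVariance) {
        this.estimate = initialEstimate;
        this.variance = initialVariance;
        this.processVariance = processVariance;
        this.measurementVariance = measurementVariance;
    }

    /** Incorporate one latency measurement and return the updated estimate. */
    public double update(double measurement) {
        // Predict step: the true value may have drifted since the last sample.
        variance += processVariance;
        // Correct step: blend prediction and measurement by their uncertainties.
        double gain = variance / (variance + measurementVariance);
        estimate += gain * (measurement - estimate);
        variance *= (1.0 - gain);
        return estimate;
    }

    public double estimate() {
        return estimate;
    }
}
```

Each update pulls the estimate toward the new measurement by a gain that shrinks as the estimate's uncertainty shrinks, which smooths out noisy per-request latencies.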
… changes to the next iteration.
Resolved review threads:
- .../src/main/java/org/elasticsearch/xpack/core/ml/action/StartTrainedModelDeploymentAction.java
- ...src/main/java/org/elasticsearch/xpack/core/ml/action/UpdateTrainedModelDeploymentAction.java
- .../src/main/java/org/elasticsearch/xpack/core/ml/inference/assignment/AutoscalingSettings.java
- x-pack/plugin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/autoscaling/Autoscaler.java
- ...gin/ml/src/main/java/org/elasticsearch/xpack/ml/inference/autoscaling/AutoscalerService.java
Looks great, left some minor comments
```diff
@@ -307,6 +334,12 @@ public ActionRequestValidationException validate() {
        if (threadsPerAllocation < 1) {
            validationException.addValidationError("[" + THREADS_PER_ALLOCATION + "] must be a positive integer");
        }
        ActionRequestValidationException autoscaleException = adaptiveAllocationsSettings == null
            ? null
            : adaptiveAllocationsSettings.validate();
```
Are there any invalid combinations of the adaptive allocation settings and the existing allocations/threads settings that should be checked for here? For example, if `min_number_of_allocations` > `number_of_allocations`.
For simplicity, the behaviour is:
- the initial number of allocations is `number_of_allocations` (the existing setting)
- the scaler will adjust it to something in the range `[min, max]`

I don't think it's worth the effort to enforce `num` to be in the range `[min, max]` in case `enabled=true`.
I don't know the exact syntax and spelling, but a start deployment request looks something like:

```json
{
  "number_of_allocations": 1,
  "threads_per_allocation": 1,
  "autoscaling": {
    "enabled": true,
    "min": 1,
    "max": 8
  }
}
```
If the user were to make a request where `number_of_allocations` is outside the `[min, max]` range, then they have made an error and should be told that the values are contradictory. This helps the user avoid mistakes.
```json
{
  "number_of_allocations": 12,
  "threads_per_allocation": 1,
  "autoscaling": {
    "enabled": true,
    "min": 1,
    "max": 8
  }
}
```
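The contradiction check being suggested here could be sketched roughly as follows. This is a hypothetical illustration with invented names, not the validation code in `StartTrainedModelDeploymentAction`:

```java
// Hypothetical sketch of the suggested check: reject requests where
// number_of_allocations falls outside the adaptive [min, max] range.
public final class AllocationsRangeCheck {
    /** Returns an error message, or null if the settings are consistent. */
    public static String validate(int numberOfAllocations, boolean autoscalingEnabled, int min, int max) {
        if (autoscalingEnabled && (numberOfAllocations < min || numberOfAllocations > max)) {
            return "[number_of_allocations] must be within ["
                + min + ", " + max + "] when autoscaling is enabled";
        }
        return null; // no validation error
    }
}
```

Under this sketch, the second request above (12 allocations with a [1, 8] range) would be rejected, while the first would pass.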
OK, I guess you're right. I've updated the API like this: when adaptive allocations is enabled, setting `number_of_allocations` is not allowed. It's useless anyway, because the system will undo your changes within 10 seconds.
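The rule described here (an explicit `number_of_allocations` is rejected outright when adaptive allocations is enabled) could be sketched like this; again a hypothetical illustration with invented names, not the actual implementation:

```java
// Hypothetical sketch of the adopted rule: when adaptive allocations is
// enabled, explicitly setting number_of_allocations is rejected.
public final class ExplicitAllocationsCheck {
    /** Returns an error message, or null if the request is valid. */
    public static String validate(Integer numberOfAllocations, boolean adaptiveEnabled) {
        if (adaptiveEnabled && numberOfAllocations != null) {
            return "[number_of_allocations] cannot be set when adaptive allocations is enabled";
        }
        return null; // no validation error
    }
}
```

Compared with the range check discussed earlier, this is stricter but simpler: there is no combination of explicit and adaptive settings left to reconcile.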
Resolved review threads:
- ...src/main/java/org/elasticsearch/xpack/core/ml/action/UpdateTrainedModelDeploymentAction.java
- ...java/org/elasticsearch/xpack/ml/inference/adaptiveallocations/AdaptiveAllocationsScaler.java
- ...g/elasticsearch/xpack/ml/inference/adaptiveallocations/AdaptiveAllocationsScalerService.java
- ...va/org/elasticsearch/xpack/ml/inference/assignment/TrainedModelAssignmentClusterService.java
@davidkyle Processed your comments. PTAL
LGTM
Notes:
- `x-pack/dev-tools` contains some scripts for testing / experimentation. The most important script (`simulate-autoscaling`) comes from this PR.
- Tests are missing (I'm working on that right now).
- It's named "inference autoscaling" now, but I think some people want to rebrand it; I'll postpone renaming until the name is final.
- An integration test is missing (I'll discuss with QA).