
Inference autoscaling #109667

Merged
merged 53 commits into from
Jul 9, 2024

Conversation

jan-elastic
Contributor

@jan-elastic jan-elastic commented Jun 13, 2024

Notes:

  • x-pack/dev-tools contains some scripts for testing / experimentation. The most important script (simulate-autoscaling) comes from this PR.
  • tests are missing (I'm working on that right now)
  • it's named "inference autoscaling" now, but I think some people want to rebrand it; I'll postpone renaming until the name is final
  • API documentation is missing (can go to follow-up PR)
  • telemetry is missing (can go to follow-up PR)
  • integration test is missing (I'll discuss with QA)

Squashed commit of the following:

commit d98bd3d39d833329ab83a8274885473db41ed08a
Author: Jan Kuipers <[email protected]>
Date:   Mon May 13 17:27:38 2024 +0200

    Increase measurement interval to 10secs

commit e808ae5be52c5ea4d5ff8ccb881a4a80de0254f9
Author: Jan Kuipers <[email protected]>
Date:   Mon May 13 17:09:33 2024 +0200

    jump -> jumps

commit c38cbdebfcec43e6982bb8bd1670519293161154
Author: Jan Kuipers <[email protected]>
Date:   Mon May 13 14:32:42 2024 +0200

    Remove unused estimator

commit 16101f32b539481cd4d648ebb5637a3309853552
Author: Jan Kuipers <[email protected]>
Date:   Mon May 13 14:31:30 2024 +0200

    Measure latency periodically + documentation

commit bc73bf29fde1d772701f0b71a7c8a0908669eb0f
Author: Jan Kuipers <[email protected]>
Date:   Mon May 13 12:53:19 2024 +0200

    Init variance to None

commit 0e73fa836fa9deec6ba55ef1161cc0dd71f35044
Author: Jan Kuipers <[email protected]>
Date:   Mon May 13 11:18:21 2024 +0200

    No autodetection of dynamics changes for latency

commit 75924a744d26a72835529598a6df1a2d22bdaddc
Author: Jan Kuipers <[email protected]>
Date:   Mon May 13 10:10:34 2024 +0200

    Move autoscaling code to own class

commit 23553bb8cccd6ed80ac667b12ec38a6d5562dd29
Author: Jan Kuipers <[email protected]>
Date:   Wed May 8 18:01:55 2024 +0200

    Improved autoscaling simulation

commit 2db606b2bba69d741fa231f369c633ea793294d5
Author: Tom Veasey <[email protected]>
Date:   Tue Apr 30 15:01:40 2024 +0100

    Correct the dependency on allocations

commit 0e45cfbaf901cf9d440efa9b404058a67d000653
Author: Tom Veasey <[email protected]>
Date:   Tue Apr 30 11:11:05 2024 +0100

    Tweak

commit a0f23a4a05875cd5df3863e5ad067b46a67c8cda
Author: Tom Veasey <[email protected]>
Date:   Tue Apr 30 11:09:30 2024 +0100

    Correction

commit f9cdb140d298bd99c64c79f020c058d60bfba134
Author: Tom Veasey <[email protected]>
Date:   Tue Apr 30 09:57:59 2024 +0100

    Allow extrapolation

commit 57eb1a661a2b97412f479606c23c54dfb7887f52
Author: Tom Veasey <[email protected]>
Date:   Tue Apr 30 09:55:17 2024 +0100

    Simplify and estimate average duration rather than rate

commit 36dff17194f2bcf816013b112cf07d70c9eec161
Author: Tom Veasey <[email protected]>
Date:   Mon Apr 29 21:42:25 2024 +0100

    Kalman filter for simple state model for average inference duration as a function of time and allocation count

commit a1b85bd0deeabd5162f2ccd5a28672299025cee5
Author: Jan Kuipers <[email protected]>
Date:   Mon Apr 29 12:15:59 2024 +0200

    Improvements

commit 51040655fcfbfd221f2446542a955fb0f19fb145
Author: Jan Kuipers <[email protected]>
Date:   Mon Apr 29 09:33:10 2024 +0200

    Account for virtual cores / hyperthreading

commit 7a93407ecae6b6044108299a1d05f72cdf0d752a
Author: Jan Kuipers <[email protected]>
Date:   Fri Apr 26 16:58:25 2024 +0200

    Simulator for inference autoscaling.
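One of the commits above introduces a Kalman filter for a simple state model of average inference duration. As a rough illustration only, a one-dimensional Kalman filter for tracking a slowly drifting average duration could look like the sketch below (class name, noise parameters, and the single-variable state are assumptions for illustration; the actual model in this PR also depends on allocation count):

```java
// Minimal 1D Kalman filter sketch for tracking average inference duration.
// Illustrative only: the PR's real state model is richer (it accounts for
// the allocation count) and uses its own noise parameters.
public class DurationKalmanFilter {
    private double estimate;               // current estimate of the mean duration
    private double variance;               // uncertainty of that estimate
    private final double processNoise;     // how fast the true value may drift per step
    private final double measurementNoise; // noise in each observed duration

    public DurationKalmanFilter(double initialEstimate, double initialVariance,
                                double processNoise, double measurementNoise) {
        this.estimate = initialEstimate;
        this.variance = initialVariance;
        this.processNoise = processNoise;
        this.measurementNoise = measurementNoise;
    }

    /** Incorporate one observed inference duration. */
    public void update(double observedDuration) {
        variance += processNoise;                         // predict: uncertainty grows
        double gain = variance / (variance + measurementNoise);
        estimate += gain * (observedDuration - estimate); // correct toward observation
        variance *= (1.0 - gain);
    }

    public double estimate() {
        return estimate;
    }
}
```

With a large initial variance the filter trusts early observations heavily and converges quickly; the process-noise term keeps it responsive to genuine changes in inference duration later on.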
@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v8.15.0 labels Jun 13, 2024
@jan-elastic jan-elastic marked this pull request as draft June 13, 2024 11:49
@jan-elastic jan-elastic added :ml Machine learning Team:ML Meta label for the ML team and removed needs:triage Requires assignment of a team area label labels Jun 19, 2024
@mark-vieira mark-vieira removed the request for review from a team July 2, 2024 18:52
Member

@davidkyle davidkyle left a comment


Looks great, left some minor comments

@@ -307,6 +334,12 @@ public ActionRequestValidationException validate() {
if (threadsPerAllocation < 1) {
    validationException.addValidationError("[" + THREADS_PER_ALLOCATION + "] must be a positive integer");
}
ActionRequestValidationException autoscaleException = adaptiveAllocationsSettings == null
    ? null
    : adaptiveAllocationsSettings.validate();
Member

Are there any invalid combinations of the adaptive allocation settings and the existing allocations/threads settings that should be checked for here? For example, if min_number_of_allocations > number_of_allocations.

Contributor Author

For simplicity, the behaviour is:

  • the initial number of allocations is number_of_allocations (the existing setting)
  • the scaler will adjust it to something in the range [min, max]

I don't think it's worth the effort to enforce number_of_allocations to be in the range [min, max] when enabled=true.

Member

I don't know the exact syntax and spelling, but a start deployment request looks something like:

{
    "number_of_allocations": 1,
    "threads_per_allocation": 1,
    "autoscaling": {
        "enabled": true,
        "min": 1,
        "max": 8
    }
}

If the user were to make a request where number_of_allocations is outside the [min, max] range, then they have made an error and should be told that the values are contradictory. This helps the user avoid mistakes.

{
    "number_of_allocations": 12,
    "threads_per_allocation": 1,
    "autoscaling": {
        "enabled": true,
        "min": 1,
        "max": 8
    }
}
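The cross-field check suggested here could be sketched as follows (class and method names are hypothetical, not the actual Elasticsearch code; the real validation builds an ActionRequestValidationException rather than returning a message string):

```java
// Hypothetical sketch of the suggested cross-field validation: reject a
// request whose fixed number_of_allocations falls outside the adaptive
// [min, max] range. Names are assumptions for illustration only.
public class AdaptiveAllocationsValidation {

    /** Returns an error message, or null if the combination is valid. */
    public static String validate(int numberOfAllocations, boolean adaptiveEnabled, int min, int max) {
        if (!adaptiveEnabled) {
            return null; // fixed allocation count, nothing to cross-check
        }
        if (min > max) {
            return "[min_number_of_allocations] must be less than or equal to [max_number_of_allocations]";
        }
        if (numberOfAllocations < min || numberOfAllocations > max) {
            return "[number_of_allocations] must be within [" + min + ", " + max
                + "] when adaptive allocations is enabled";
        }
        return null;
    }
}
```

Under this check, the first request above (number_of_allocations: 1 with range [1, 8]) would pass, while the second (number_of_allocations: 12) would be rejected as contradictory.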

Contributor Author

OK, I guess you're right. I've updated the API like this: when adaptive allocations is enabled, setting number_of_allocations is not allowed. It would be useless anyway, because the system would undo your change within 10 seconds.
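The final behaviour described here could be sketched as a simpler check (again with hypothetical names, not the actual implementation):

```java
// Hypothetical sketch of the agreed behaviour: an explicit
// number_of_allocations is rejected outright whenever adaptive allocations
// is enabled. Names are assumptions for illustration only.
public class StartDeploymentValidation {

    /** Returns an error message, or null if the request is valid. */
    public static String validate(Integer numberOfAllocations, boolean adaptiveEnabled) {
        if (adaptiveEnabled && numberOfAllocations != null) {
            return "[number_of_allocations] may not be set when adaptive allocations is enabled";
        }
        return null;
    }
}
```

This sidesteps the contradictory-range question entirely: with adaptive allocations enabled, the scaler alone controls the allocation count.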

@jan-elastic
Contributor Author

@davidkyle Processed your comments. PTAL

Member

@davidkyle davidkyle left a comment


LGTM

@jan-elastic jan-elastic merged commit 4cfe6cc into main Jul 9, 2024
17 checks passed
@jan-elastic jan-elastic deleted the inference-autoscaling branch July 9, 2024 09:51
Labels
cloud-deploy Publish cloud docker image for Cloud-First-Testing >feature :ml Machine learning Team:ML Meta label for the ML team v8.16.0