[Serve] Add Multiplex metrics into dashboard #37722
Signed-off-by: Sihan Wang <[email protected]>
Force-pushed from 7f0b5f0 to 6ba887e.
Looks like a great start!
@@ -149,6 +149,108 @@
stack=False,
grid_pos=GridPos(16, 2, 8, 8),
),
Panel(
For most users these panels will be useless.
Is it possible to put them behind a separate Grafana dashboard, or maybe a collapsed-by-default grouping?
sgtm
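The collapsed-by-default grouping suggested above maps onto a Grafana row panel: in the dashboard JSON model, a row with `collapsed` set to true carries its child panels in a nested `panels` list. The following is a minimal sketch of that shape only. The helper name `collapsed_row`, the ids, and the example child panel are hypothetical and are not taken from this PR.

```python
import json
from typing import Any, Dict, List


def collapsed_row(
    title: str, panels: List[Dict[str, Any]], row_id: int, y: int
) -> Dict[str, Any]:
    """Wrap panel dicts in a Grafana row that starts out collapsed.

    Hypothetical helper: it only builds the JSON structure and does not
    talk to Grafana or to the Ray dashboard code.
    """
    return {
        "id": row_id,
        "type": "row",
        "title": title,
        "collapsed": True,  # hidden until the user expands the row
        # A row spans the full 24-column grid and is one unit tall.
        "gridPos": {"x": 0, "y": y, "w": 24, "h": 1},
        # Collapsed rows keep their children nested inside the row.
        "panels": panels,
    }


# Example child panel; the id and position are placeholders.
example_panel = {
    "id": 101,
    "type": "graph",
    "title": "Cache hit rate",
    "gridPos": {"x": 0, "y": 17, "w": 8, "h": 8},
}
row = collapsed_row("Multiplexing", [example_panel], row_id=100, y=16)
row_json = json.dumps(row, indent=2)
```

Whether the dashboard generator in this repository can emit row panels is not shown in this thread, so treat this as an illustration of the Grafana side only.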
Mostly copy changes to make graph titles seem more like titles
dashboard/modules/metrics/dashboards/serve_deployment_dashboard_panels.py
@edoakes ping for merge ^^
Why are these changes needed?
Number of models per replica:
![image](https://github.com/ray-project/ray/assets/6515354/6f45f047-f02e-453c-914a-96098739c3a3)
Number of times models loaded
![image](https://github.com/ray-project/ray/assets/6515354/e5bb0f9a-6958-4e73-9a6a-294aa02891cc)
Number of times models unloaded
![image](https://github.com/ray-project/ray/assets/6515354/717a2f45-29a0-4204-8dcf-b3caf3f53035)
Model load latency p99
![image](https://github.com/ray-project/ray/assets/6515354/69729506-6f37-4ee7-93c9-979a09c7c234)
Model unloaded latency p99
![image](https://github.com/ray-project/ray/assets/6515354/f8ac2a83-42c3-4c49-95b4-2089ca94ff5c)
Registered model
![image](https://github.com/ray-project/ray/assets/6515354/d1cd94e1-46f4-4431-a78c-b8177291d948)
Cache hit rate
![image](https://github.com/ray-project/ray/assets/6515354/b38472f4-57e7-45d7-aebb-375a0f96cbd1)
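For readers who want a feel for how seven panels like these might be declared, here is a rough sketch. It borrows the `Panel` and `GridPos` names visible in the review diff, but the dataclasses below are local stand-ins with assumed fields, and every metric name and PromQL expression is an illustrative guess rather than what this PR actually ships.

```python
from dataclasses import dataclass
from typing import List


# Local stand-ins for the helpers seen in the diff. The real classes in the
# Ray dashboard may have different fields; these are assumptions.
@dataclass
class GridPos:
    x: int
    y: int
    w: int
    h: int


@dataclass
class Target:
    expr: str  # PromQL expression
    legend: str


@dataclass
class Panel:
    id: int
    title: str
    unit: str
    targets: List[Target]
    grid_pos: GridPos
    stack: bool = False


# (title, unit, PromQL). Metric names here are hypothetical placeholders.
MULTIPLEX_PANELS = [
    ("Number of models per replica", "models",
     "sum(num_multiplexed_models) by (replica)"),
    ("Number of times models loaded", "times",
     "sum(multiplexed_models_load_counter) by (replica)"),
    ("Number of times models unloaded", "times",
     "sum(multiplexed_models_unload_counter) by (replica)"),
    ("Model load latency p99", "ms",
     "histogram_quantile(0.99, sum(rate(model_load_latency_ms_bucket[5m])) by (le))"),
    ("Model unloaded latency p99", "ms",
     "histogram_quantile(0.99, sum(rate(model_unload_latency_ms_bucket[5m])) by (le))"),
    ("Registered model", "models",
     "count(registered_multiplexed_model_id) by (model_id)"),
    # Hit rate as the share of get-model requests that did not need a load.
    ("Cache hit rate", "percentunit",
     "1 - sum(rate(multiplexed_models_load_counter[5m]))"
     " / sum(rate(multiplexed_get_model_requests_counter[5m]))"),
]


def build_panels(start_id: int = 100, start_y: int = 0) -> List[Panel]:
    """Lay panels out three per row on Grafana's 24-column grid."""
    panels = []
    for i, (title, unit, expr) in enumerate(MULTIPLEX_PANELS):
        row, col = divmod(i, 3)
        panels.append(
            Panel(
                id=start_id + i,
                title=title,
                unit=unit,
                targets=[Target(expr=expr, legend="{{replica}}")],
                grid_pos=GridPos(col * 8, start_y + row * 8, 8, 8),
            )
        )
    return panels


panels = build_panels()
```

The 8-wide, 8-tall cells match the `GridPos(16, 2, 8, 8)` call in the diff context, where the third panel in a row starts at column 16.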
Related issue number
Closes: #37517
Checks
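- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.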