-
Notifications
You must be signed in to change notification settings - Fork 13.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FLINK-21794][metrics] Support retrieving slot details via rest api #15249
Conversation
Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community Automated ChecksLast check on commit 2dc50c5 (Wed Mar 17 05:53:20 UTC 2021) Warnings:
Mention the bot in a comment to re-run the automated checks. Review Progress
Please see the Pull Request Review Guide for a full explanation of the review process. The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commandsThe @flinkbot bot supports the following commands:
|
.../org/apache/flink/runtime/resourcemanager/slotmanager/ClusterResourceStatisticsProvider.java
Outdated
Show resolved
Hide resolved
...c/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/DeclarativeSlotManager.java
Outdated
Show resolved
Hide resolved
...-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/slotmanager/SlotManager.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @KarmaGYZ, LGTM.
@vthinkxie, could you help double-check on this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @KarmaGYZ
LGTM
...untime/src/main/java/org/apache/flink/runtime/rest/messages/taskmanager/TaskManagerInfo.java
Outdated
Show resolved
Hide resolved
return taskManagerTracker.getRegisteredTaskManager(instanceID) | ||
.map(TaskManagerInfo::getAllocatedSlots).map(Map::values) | ||
.orElse(Collections.emptyList()).stream() | ||
.map(slot -> new SlotInfo(slot.getJobId(), slot.getResourceProfile())) | ||
.collect(Collectors.toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will scale terribly. Do we really need information for every single slot?
Why not slots + number of slots of that type? Or total acquired resources? Or available/total resources of the TM?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we already have the available/total resources of the TM. For fine-grained resource management, slots are dynamically created with different resource profiles. It would be good to let user know the what type of slot is allocated and which job it is allocated for. I don't think it will help a lot to reorganize it to slots + number of slots of that type. In the worst case, that will increase the data size instead.
After an offline discussion with @vthinkxie , I think we can add only this information to TaskManagerDetailsInfo
. Then, that information will only be returned when user goes into a specific task manager page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But why bother with slots in the first place? Isn't the whole point for fine-grained resource management to not really bother with slots but resources instead, so why then expose it as slots again in the UI?
Why not just say "Job A has allocated X memory and Y CPUs on TM Z to run N tasks."
That is some really digestible information.
What value does it bring to know this instead:
"Job as has N slots on TM Z, where slot 1 uses X_1 memory any Y_1 cpus, and slot 2 uses X_2 memory and Y_2 cpus, and ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Job as has N slots on TM Z, where slot 1 uses X_1 memory any Y_1 cpus, and slot 2 uses X_2 memory and Y_2 cpus, and ..." could be aggregated and get "Job A has allocated X memory and Y CPUs on TM Z to run N tasks."
while it cannot be pushed backward.
Since TM and resources are implemented on a slots basis, the interface layer should expose atomic properties
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think how to expose that information to the user is depends on the WebUI and might be out of the scope of this PR. In current interface, we give the UI flexibility to choose how to aggregate it.
Also, user might want to know the explicit resource of each task, especially for external resources.
flink-runtime/src/main/java/org/apache/flink/runtime/rest/messages/taskmanager/SlotInfo.java
Outdated
Show resolved
Hide resolved
Thanks for the comments @zentol . PR updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
@zentol, would you like to have another look at this?
What is the purpose of the change
(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)
Brief change log
(for example:)
Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
@Public(Evolving)
: (yes / no)Documentation