-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cri: get pid count from container metrics #10375
Conversation
Hi @linxiulei. Thanks for your PR. I'm waiting for a containerd member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
f509acd
to
6f8cadc
Compare
/retest |
3 similar comments
/retest |
/retest |
/retest |
default: | ||
return 0, fmt.Errorf("unexpected metrics type: %T from %s", metrics, reflect.TypeOf(metrics).Elem().PkgPath()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at the code further up in linuxContainerMetrics (where this function is called), it looks like that function accepts wstats.Statistics
, but here we produce an error for it.
containerd/internal/cri/server/container_stats_list.go
Lines 362 to 370 in faf06a3
case typeurl.Is(stats.Data, (*cg1.Metrics)(nil)): | |
data = &cg1.Metrics{} | |
case typeurl.Is(stats.Data, (*cg2.Metrics)(nil)): | |
data = &cg2.Metrics{} | |
case typeurl.Is(stats.Data, (*wstats.Statistics)(nil)): | |
data = &wstats.Statistics{} | |
default: | |
return nil, errors.New("cannot convert metric data to cgroups.Metrics or windows.Statistics") | |
} |
So probably we should either add the type here (and ignore), or change that switch above to return an error if we can't accept it.
I'm also wondering if we should roll up the unmarshaling inside those branches at the cost of duplicating 3 lines of code, so that we don't have to do the type-assertions multiple times;
if err := typeurl.UnmarshalTo(stats.Data, data); err != nil {
return containerStats{}, fmt.Errorf("failed to extract container metrics: %w", err)
}
In that case we can inline the code to get the number of pids;
var data interface{}
var pids int
switch {
case typeurl.Is(stats.Data, (*cg1.Metrics)(nil)):
data = &cg1.Metrics{}
if err := typeurl.UnmarshalTo(stats.Data, data); err != nil {
return nil, fmt.Errorf("failed to extract container metrics: %w", err)
}
pids = metrics.GetPids().GetCurrent()
case typeurl.Is(stats.Data, (*cg2.Metrics)(nil)):
data = &cg2.Metrics{}
if err := typeurl.UnmarshalTo(stats.Data, data); err != nil {
return nil, fmt.Errorf("failed to extract container metrics: %w", err)
}
pids = metrics.GetPids().GetCurrent()
case typeurl.Is(stats.Data, (*wstats.Statistics)(nil)):
// FIXME: should this be an error on Linux??
data = &wstats.Statistics{}
if err := typeurl.UnmarshalTo(stats.Data, data); err != nil {
return nil, fmt.Errorf("failed to extract container metrics: %w", err)
}
default:
return nil, errors.New("cannot convert metric data to cgroups.Metrics or windows.Statistics")
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case typeurl.Is(stats.Data, (*wstats.Statistics)(nil)):
data = &wstats.Statistics{}
Actually, I am not sure if this code is dead as I think this is for windows platform. At least internal/cri/server/sandbox_stats_linux.go
should never care about wstats.Statistics
since it assumes Linux.
But anyway, I think it's safe to have pids as 0 without returning an error here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, I had a second look and found cpuContainerStats
doesn't accept wstats.Statistics
at all
containerd/internal/cri/server/container_stats_list.go
Lines 475 to 498 in 5258d9f
func (c *criService) cpuContainerStats(ID string, isSandbox bool, stats interface{}, timestamp time.Time) (*runtime.CpuUsage, error) { | |
switch metrics := stats.(type) { | |
case *cg1.Metrics: | |
metrics.GetCPU().GetUsage() | |
if metrics.CPU != nil && metrics.CPU.Usage != nil { | |
return &runtime.CpuUsage{ | |
Timestamp: timestamp.UnixNano(), | |
UsageCoreNanoSeconds: &runtime.UInt64Value{Value: metrics.CPU.Usage.Total}, | |
}, nil | |
} | |
case *cg2.Metrics: | |
if metrics.CPU != nil { | |
// convert to nano seconds | |
usageCoreNanoSeconds := metrics.CPU.UsageUsec * 1000 | |
return &runtime.CpuUsage{ | |
Timestamp: timestamp.UnixNano(), | |
UsageCoreNanoSeconds: &runtime.UInt64Value{Value: usageCoreNanoSeconds}, | |
}, nil | |
} | |
default: | |
return nil, fmt.Errorf("unexpected metrics type: %T from %s", metrics, reflect.TypeOf(metrics).Elem().PkgPath()) | |
} | |
return nil, nil |
It should be safe to return an error early here.
// pids is only valid in linux platform | ||
pids uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if this would be a potential candidate for inclusion in runtime.ContainerStats
, so that we don't need to define a wrapper in future 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was tempted to do so as well, but pids doesn't seem a common field for all platforms (i.e. windows) so not sure how feasible it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I was curious about the Windows case; maybe there's still an equivalent somewhere, but if not, then it could possibly still be fine as long as it's documented "not used on Windows platforms".
@kevpar @kiashok do you know if there's an equivalent on Windows that could be implemented in future (and/or if it's relevant for Windows?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW; I don't think the current wrapper is too problematic; it's not exported, so only used internally. I merely went looking if we could reduce code-churn somehow, but as the runtime.ContainerStats
is not owned by containerd, that would require upstream changes first, so in the meantime having the wrapper isn't a huge issue (for me at least 😄)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to make sure I'm understanding: pids
is the number of processes in the container, and the question is "does Windows support querying for this as well, so that it would make sense to make it a common field?"?
HCS does support querying for an array of info on each process in the container, so you could use that to determine the count. At the HCS level it is returned as a JSON object with an array of process entries, rather than a single count, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, we can create new issue to track this question. This patch is to reduce api calls. It's not intend to introduce new field in CRI-API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @kevpar yes, I think "yes to both" - if having this information makes sense to both Windows and Linux, possibly it could be included in the type.
Definitely not a blocker (as mentioned; changes in this PR are internal, so perfectly fine for now).
804fc2e
to
a53f4c0
Compare
@thaJeztah would you mind LGTM-ing this? Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Left one comment on test. Thank you for the fix.
// pids is only valid in linux platform | ||
pids uint64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, we can create new issue to track this question. This patch is to reduce api calls. It's not intend to introduce new field in CRI-API.
This reduces latency of calling ListPodSandboxStats() by avoiding calling shim API Task(). Signed-off-by: Eric Lin <[email protected]>
/retest |
1 similar comment
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This reduces latency of ListPodSandboxStats() by avoiding calling shim API Task().
Prior to this commit, the latency of calling ListPodSandboxStats() with 35 sandboxes (one container per sandbox) took 450ms. Mostly it was due to calling Task() to each container's shim, which took 10ms.
However, the pid count has already been present in collected cgroup metrics so this commit reuses that info to improve performance. It also means Windows platform would still need to call Task().
After this commit, the latency of ListPodSandboxStats() is about 120ms.