Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: metric name formatting #14

Closed
2 tasks
olix0r opened this issue Jan 20, 2016 · 2 comments
Closed
2 tasks

RFC: metric name formatting #14

olix0r opened this issue Jan 20, 2016 · 2 comments
Assignees
Labels

Comments

@olix0r
Copy link
Member

olix0r commented Jan 20, 2016

By default, Finagle clients format metrics as follows:

clnt/<label>/<metric>.<sfx>
srv/<label>/<metric>.<sfx>

Router stats don't exactly fit into the model--for instance the bindcache stats are currently unscoped (and therefore lossy/confusing when multiple routers exist). I'd like to propose a new naming scheme that more closely reflects reality and carves out space for furhther expansion:

rt/<router-label>/bindcache/...
rt/<router-label>/dst/path/<unbound-dst-path>/...
rt/<router-label>/dst/id/<bound-dst-id>/...
rt/<router-label>/srv/<server-ip>/<server-port/...
rt/<router-label>/...

This allows us to potentially expose multiple scopes of client stats (e.g. logical sr vs cluster sr -- logical is what you monitor, cluster is how you compare code versions).

Later, various stats scopes should be controllable should be controllable via configuration parameter (because for any large site, there may be many many unbound-dst-path values).

In order to complete this, the following things must happen:

  • we need to craft stats scopes during router initialization
  • the admin page must be updated to map client and server stat names.

Note: It is technically feasible to preserve the existing stat name structure, however to do so incurs complexity on both initialization and consumers for marginal value. For example, we could try to construct names like:

clnt/<router-label>/<bound-dst-id>/...
srv/<router-label>/<ip>/<port>/...
rt/<router-label>/bindcache
rt/<router-label/...

Regardless of implementation details, I think it's preferable to establish a single naming hierarchy for router stats. It also happens to be the case that it's much simpler to implement hierarchical stats scoping in router initialization.

Namer and Tracer stats should be scoped by namer/<label> and tracer/<label>, respectively.

This change will render some parts of TwitterServer's Admin page obsolete, but we intend to replace TwitterServer in favor of a config-driven admin page anyway. I don't believe that this change breaks any other APIs that we care about.

@siggy
Copy link
Member

siggy commented Jan 20, 2016

This sounds good. I think we should complete #16 prior to this work. Specifically, booting our own admin server, and serving our own metrics.json endpoint.

@olix0r
Copy link
Member Author

olix0r commented Jan 20, 2016

@siggy I'd like to move forward with this independently of #16, since we need stats receivers to become sane to complete #6. I don't think that there are actually any dependencies we care about between these issues, so they should be able to happen in more-or-less parallel.

olix0r added a commit that referenced this issue Jan 26, 2016
- all router stats are scoped under rt/$label
- client stats are scoped under rt/$label/dst/id/$id
- server stats are scoped under rt/$label/srv/$host/$port
- later, additional client stats will be scoped under rt/$label/dst/path/$path.

The admin interface has been changed so that metrics processing results in
information about routers (rather than just clients and servers).  This data is
used to power the same UI without changes, although UI changes should follow
this in a followup review.

If a Router is configured with a StatsReceiver or Tracer, this configuration is
not currently propagated to Servers.  This Router configuration should be
shared with its servers so that defaults may be specified and so that, later,
these may be configured in a per-router fashion.  This change allows Tracers
and scoped StatsReceivers to be shared from a Router to its clients and
servers.

Fixes #14
@olix0r olix0r removed the reviewable label Jan 26, 2016
Tim-Brooks pushed a commit to Tim-Brooks/linkerd that referenced this issue Dec 20, 2018
Problem:
Simulate proxy would seemingly hang when used.
In simulate-proxy we were using rand.Uint32() to generate Count. This is way too big (in telemetry/server.go we call latencyStat.observe() Count times, so this loop was taking fovever).

Solution:
Use a count of 1 (as the surrounding loop will generate count requests)

Validation:
Script now works without hanging.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants