Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PrometheusLayer: Add a new metrics called bytes_duration_seconds #4644

Closed
Xuanwo opened this issue May 27, 2024 · 4 comments · Fixed by #4691
Closed

PrometheusLayer: Add a new metrics called bytes_duration_seconds #4644

Xuanwo opened this issue May 27, 2024 · 4 comments · Fixed by #4691
Labels
core good first issue Good for newcomers help wanted Extra attention is needed

Comments

@Xuanwo
Copy link
Member

Xuanwo commented May 27, 2024

          > Another blocker is that the layer observes the timer before returning `Writer`/`Reader`. So it doesn't record the time spent by the `Writer`/`Reader`.

The idea of tracking the duration for a reader and writer seems illogical to me because it might include irrelevant time periods. For instance, if you create a new writer and have them write every 10 seconds, measuring their duration doesn't make sense.

The current time only measures the cost of read/write, users can use this to know if they spent too much time on opening a reader/writer.

I think we can add a new metrics called bytes_duration_seconds, so we can observe the time we spent on reading/writing bytes.

Originally posted by @Xuanwo in GreptimeTeam/greptimedb#4041 (comment)

@Xuanwo Xuanwo added good first issue Good for newcomers help wanted Extra attention is needed core labels May 27, 2024
@evenyag
Copy link
Contributor

evenyag commented May 27, 2024

The idea of tracking the duration for a reader and writer seems illogical to me because it might include irrelevant time periods. For instance, if you create a new writer and have them write every 10 seconds, measuring their duration doesn't make sense.

Understood. I also have the same concerns.

The current time only measures the cost of read/write, users can use this to know if they spent too much time on opening a reader/writer.

I think we can add a new metrics called bytes_duration_seconds, so we can observe the time we spent on reading/writing bytes.

I found that the three different metrics layer in OpenDAL observes the request duration in different ways.

MetricsLayer

The MetricsLayer observes the time spent on reading/writing bytes to opendal_requests_duration_seconds. It doesn't observe the time creating the reader/writer.

impl<R: oio::Write> oio::Write for MetricWrapper<R> {
async fn write(&mut self, bs: Buffer) -> Result<usize> {
let start = Instant::now();
self.inner
.write(bs)
.await
.map(|n| {
self.bytes_counter.increment(n as u64);
self.requests_duration_seconds
.record(start.elapsed().as_secs_f64());
n
})
.map_err(|err| {
self.handle.increment_errors_total(self.op, err.kind());
err
})
}

There is one exception: The block_write
fn blocking_write(&self, path: &str, args: OpWrite) -> Result<(RpWrite, Self::BlockingWriter)> {
self.handle.requests_total_blocking_write.increment(1);
let start = Instant::now();
let result = self.inner.blocking_write(path, args);
let dur = start.elapsed().as_secs_f64();
self.handle
.requests_duration_seconds_blocking_write
.record(dur);

PrometheusLayer

The PrometheusLayer only records the time creating the reader/writer to requests_duration_seconds

let timer = self
.stats
.requests_duration_seconds
.with_label_values(&labels)
.start_timer();
let write_res = self
.inner
.write(path, args)
.map(|v| {
v.map(|(rp, r)| {
(
rp,
PrometheusMetricWrapper::new(
r,
Operation::Write,
self.stats.clone(),
self.scheme,
&path.to_string(),
),
)
})
})
.await;
timer.observe_duration();

PrometheusClientLayer

The PrometheusClientLayer records both the writer/reader creation time and bytes reading/writing time to opendal_request_duration_seconds

async fn write(&mut self, bs: Buffer) -> Result<usize> {
let start = Instant::now();
self.inner
.write(bs)
.await
.map(|n| {
self.metrics.observe_bytes_total(self.scheme, self.op, n);
self.metrics
.observe_request_duration(self.scheme, self.op, start.elapsed());
n
})

impl<R> Drop for PrometheusMetricWrapper<R> {
fn drop(&mut self) {
self.metrics
.observe_bytes_total(self.scheme, self.op, self.bytes_total);
self.metrics
.observe_request_duration(self.scheme, self.op, self.start_time.elapsed());
}

Document

According to the document, users might expect to observe the time spent on reading/writing bytes from requests_duration_seconds.

/// requests_duration_seconds records the duration seconds of successful
/// requests.
///
/// # NOTE
///
/// this metric will track the whole lifetime of this request:
///
/// - Building request
/// - Sending request
/// - Receiving response
/// - Consuming response
static METRIC_REQUESTS_DURATION_SECONDS: &str = "opendal_requests_duration_seconds";

/// | requests_duration_seconds | Histogram | Histogram of the time spent on specific operation | scheme, operation |

Questions

  • Maybe we should add a new metric or label to record the cost of opening a reader/writer? e.g. label value open_writer/open_blocking_writer
  • Should we use a consistent way to record the request duration?

@Xuanwo
Copy link
Member Author

Xuanwo commented May 27, 2024

Yes, there are historical reasons why the two layers did not evolve simultaneously. And we do need to use a consistent way to record the request duration.

Our pattern will be:

  • opendal_requests_duration_seconds{operation}: operation could be Operation or ReadOperation, like read and Read::read.

We don't need seperate metric for open_writer or open_blocking_writer, they should be covered by opendal_requests_duration_seconds{operation="write"} and opendal_requests_duration_seconds{operation="blocking_write"}.

@evenyag
Copy link
Contributor

evenyag commented May 29, 2024

Do we still need bytes_duration_seconds?

opendal_requests_duration_seconds{operation}: operation could be Operation or ReadOperation, like read and Read::read.

Do you mean that opendal_requests_duration_seconds{operation="read"} measures the cost of opening a reader, and opendal_requests_duration_seconds{operation="Read::read"} measures the time of reading bytes?

Will we have a metric opendal_requests_duration_seconds{operation="Write::write"} to record the time of writing bytes?

@Xuanwo
Copy link
Member Author

Xuanwo commented Jun 5, 2024

Hi, @evenyag, your comment is correct. I have fixed in PR #4691, welcome to check it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core good first issue Good for newcomers help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants