Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[receiver/oracledb] Fix incorrect values for a couple of metrics #32028

Merged
merged 2 commits into from
Mar 28, 2024

Conversation

crobert-1
Copy link
Member

Description:

Values were being scraped incorrectly for the metrics oracledb.tablespace_size.limit and oracledb.tablespace_size.usage. The changes these metrics to be scraped from the DBA_TABLESPACE_USAGE_METRICS table. This results in a slight loss of granularity in these metrics, as values will always be in multiples of the respective tablespace's block size, but I think the clarity and simplicity is worth the trade off.

Note: The value of the usage metric was generally close to the expected value, but the limit was being calculated as potential theoretical capacity, unbound by server capacity. For example, in testing in a docker container on my local machine, limit was set to 17TB. This doesn't line up with user expectations.

Link to tracking Issue:
Fixes #31451

Testing:
Updated existing tests, added a couple new ones.

Also, the original issue filed was comparing DBA_TABLESPACE_USAGE_METRICS output for percent used to what we got from usage/limit * 100. Here's the local testing outputs compared to show they now line up.

2024-03-27T16:31:57.938-0700    info    oracledbreceiver/scraper.go:285 DBA_TABLESPACE_USAGE_METRICS: Tablespace name: SYSTEM, used space: 111288, tablespace size: 3518587, percent used: 3.16286054600895188892586711654422641816    {"kind": "receiver", "name": "oracledb", "data_type": "metrics"}
Metric #20
Descriptor:
     -> Name: oracledb.tablespace_size.usage
     -> Description: Used tablespace in bytes.
     -> Unit: By
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> tablespace_name: Str(SYSTEM)
StartTimestamp: 2024-03-27 23:31:56.873576 +0000 UTC
Timestamp: 2024-03-27 23:32:12.523295 +0000 UTC
Value: 911671296
Metric #19
Descriptor:
     -> Name: oracledb.tablespace_size.limit
     -> Description: Maximum size of tablespace in bytes, -1 if unlimited.
     -> Unit: By
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> tablespace_name: Str(SYSTEM)
StartTimestamp: 2024-03-27 23:31:56.873576 +0000 UTC
Timestamp: 2024-03-27 23:32:12.523295 +0000 UTC
Value: 28824264704

Doing the same calculation, we get:

(911671296 / 28824264704) * 100 = ~3.16%

@@ -233,7 +233,6 @@ metrics:
enabled: true
gauge:
value_type: int
input_type: string
Copy link
Member Author

@crobert-1 crobert-1 Mar 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic for setting this metric's value now requires two DB values to be multiplied by each other. This means we have to manually convert both values to int64, multiply them, then use that result to set the value here. Converting back to string just to convert to int64 again is unnecessary, so I've removed this option.

@@ -38,8 +38,11 @@ const (
consistentGets = "consistent gets"
sessionCountSQL = "select status, type, count(*) as VALUE FROM v$session GROUP BY status, type"
systemResourceLimitsSQL = "select RESOURCE_NAME, CURRENT_UTILIZATION, LIMIT_VALUE, CASE WHEN TRIM(INITIAL_ALLOCATION) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(INITIAL_ALLOCATION) END as INITIAL_ALLOCATION, CASE WHEN TRIM(LIMIT_VALUE) LIKE 'UNLIMITED' THEN '-1' ELSE TRIM(LIMIT_VALUE) END as LIMIT_VALUE from v$resource_limit"
tablespaceUsageSQL = "select TABLESPACE_NAME, BYTES from DBA_DATA_FILES"
tablespaceMaxSpaceSQL = "select TABLESPACE_NAME, (BLOCK_SIZE*MAX_EXTENTS) AS VALUE FROM DBA_TABLESPACES"
tablespaceUsageSQL = `
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feedback on indentation here would be helpful, not sure what's the most clear here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was readable enough. I had a bit of hesitation that the query itself is 2 separate subqueries, joined via a where statement with a bigger query. It feels like an inner join could do the job, but my SQL is rusty and this is an improvement over the broken functionality. Maybe something to explore if it becomes a performance flare up, which I doubt.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point. I added a benchmark test locally to compare this query vs. inner join, here are the results.
Inner join:

BenchmarkSQLUsageMetrics-16    	     150	   6881141 ns/op
BenchmarkSQLUsageMetrics-16    	     180	   6503225 ns/op
BenchmarkSQLUsageMetrics-16    	     172	   6594906 ns/op
BenchmarkSQLUsageMetrics-16    	     154	   6556032 ns/op

Two separate queries:

BenchmarkSQLUsageMetrics-16    	     132	   8800545 ns/op
BenchmarkSQLUsageMetrics-16    	     122	   8808408 ns/op
BenchmarkSQLUsageMetrics-16    	     118	   9132146 ns/op
BenchmarkSQLUsageMetrics-16    	     120	   8762194 ns/op

I'll update to use the inner join 👍

@atoulme atoulme changed the title [recevier/oracledb] Fix incorrect values for a couple of metrics [receiver/oracledb] Fix incorrect values for a couple of metrics Mar 28, 2024
@dmitryax dmitryax merged commit ac74006 into open-telemetry:main Mar 28, 2024
142 checks passed
dmitryax pushed a commit that referenced this pull request Apr 16, 2024
**Description:** 
An additional permission is now required in some cases of running the
Oracle DB receiver, as a result of
#32028.
We should document it in the README.

I also upgraded this change to a breaking change to make it clear to
users that new permissions are needed.

**Link to tracking Issue:** Resolves #32373
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this pull request May 8, 2024
…n-telemetry#32028)

**Description:** 
Values were being scraped incorrectly for the metrics
`oracledb.tablespace_size.limit` and `oracledb.tablespace_size.usage`.
The changes these metrics to be scraped from the
[`DBA_TABLESPACE_USAGE_METRICS`](https://docs.oracle.com/en/database/oracle/oracle-database/19/refrn/DBA_TABLESPACE_USAGE_METRICS.html#GUID-FE479528-BB37-4B55-92CF-9EC19EDF4F46)
table. This results in a slight loss of granularity in these metrics, as
values will always be in multiples of the respective tablespace's block
size, but I think the clarity and simplicity is worth the trade off.

Note: The value of the usage metric was generally close to the expected
value, but the limit was being calculated as potential theoretical
capacity, unbound by server capacity. For example, in testing in a
docker container on my local machine, limit was set to **17TB**. This
doesn't line up with user expectations.

**Link to tracking Issue:**
Fixes
open-telemetry#31451

**Testing:** 
Updated existing tests, added a couple new ones.

Also, the original issue filed was comparing
`DBA_TABLESPACE_USAGE_METRICS` output for percent used to what we got
from `usage/limit * 100`. Here's the local testing outputs compared to
show they now line up.
```
2024-03-27T16:31:57.938-0700    info    oracledbreceiver/scraper.go:285 DBA_TABLESPACE_USAGE_METRICS: Tablespace name: SYSTEM, used space: 111288, tablespace size: 3518587, percent used: 3.16286054600895188892586711654422641816    {"kind": "receiver", "name": "oracledb", "data_type": "metrics"}
```

```
Metric open-telemetry#20
Descriptor:
     -> Name: oracledb.tablespace_size.usage
     -> Description: Used tablespace in bytes.
     -> Unit: By
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> tablespace_name: Str(SYSTEM)
StartTimestamp: 2024-03-27 23:31:56.873576 +0000 UTC
Timestamp: 2024-03-27 23:32:12.523295 +0000 UTC
Value: 911671296
```

```
Metric open-telemetry#19
Descriptor:
     -> Name: oracledb.tablespace_size.limit
     -> Description: Maximum size of tablespace in bytes, -1 if unlimited.
     -> Unit: By
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> tablespace_name: Str(SYSTEM)
StartTimestamp: 2024-03-27 23:31:56.873576 +0000 UTC
Timestamp: 2024-03-27 23:32:12.523295 +0000 UTC
Value: 28824264704
```
Doing the same calculation, we get:
```
(911671296 / 28824264704) * 100 = ~3.16%
```
rimitchell pushed a commit to rimitchell/opentelemetry-collector-contrib that referenced this pull request May 8, 2024
…elemetry#32389)

**Description:** 
An additional permission is now required in some cases of running the
Oracle DB receiver, as a result of
open-telemetry#32028.
We should document it in the README.

I also upgraded this change to a breaking change to make it clear to
users that new permissions are needed.

**Link to tracking Issue:** Resolves open-telemetry#32373
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants