JMX Metric Insight #6573

Merged
merged 21 commits into from
Nov 16, 2022
Changes from 10 commits
Commits
21 commits
d496304
JMX Metric Insight
PeterF778 Sep 9, 2022
58f9e19
JMX Metric Insight
PeterF778 Sep 9, 2022
5f29c94
JMX Metric Insight
PeterF778 Sep 23, 2022
9dfe6aa
Updating the documentation to clarify that examples are just for illu…
PeterF778 Sep 26, 2022
7cebb63
Code refactoring and changes, moving to own module :instrumentation:jmx.
PeterF778 Sep 28, 2022
0ca7a0d
Cleanup in runtime-metrics.
PeterF778 Sep 28, 2022
009dff6
Removing kafka-consumer and kafka-producer metrics from the list of s…
PeterF778 Oct 4, 2022
b2ea55d
Fixing the issue with always null logger in AttributeValueExtractor.
PeterF778 Oct 5, 2022
5c0783d
Correcting a typo
PeterF778 Oct 5, 2022
7f89168
Merging libraries jmx-engine and jmx-yaml into one.
PeterF778 Oct 6, 2022
7e151ae
Cleanup - removing leftover file.
PeterF778 Oct 6, 2022
46198c2
Making changes to the accepted YAML syntax: replacing 'label' with 'a…
PeterF778 Oct 7, 2022
112a04d
Using "metricAttribute" in place of "label" or "attribute", "beanattr…
PeterF778 Oct 28, 2022
252be45
Refining comments
PeterF778 Oct 31, 2022
239b42a
Merge remote-tracking branch 'upstream/main' into jmx-metric-insight
trask Oct 31, 2022
830a18f
Post-review code changes
PeterF778 Nov 3, 2022
fcd6ca0
Revert "Post-review code changes"
PeterF778 Nov 3, 2022
5afd3ce
Post-review code changes
PeterF778 Nov 3, 2022
87f166a
Post-review code changes, renaming, more comments.
PeterF778 Nov 7, 2022
a3b40f3
Post-review code changes, more renaming.
PeterF778 Nov 9, 2022
c509c8b
Post-review change: class renaming
PeterF778 Nov 15, 2022
272 changes: 272 additions & 0 deletions instrumentation/jmx-metrics/javaagent/README.md

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions instrumentation/jmx-metrics/javaagent/activemq.md
@@ -0,0 +1,17 @@
# ActiveMQ Metrics

Here is the list of metrics based on MBeans exposed by ActiveMQ.

| Metric Name | Type | Attributes | Description |
| ---------------- | --------------- | ---------------- | --------------- |
| activemq.ProducerCount | UpDownCounter | destination, broker | The number of producers attached to this destination |
| activemq.ConsumerCount | UpDownCounter | destination, broker | The number of consumers subscribed to this destination |
| activemq.memory.MemoryPercentUsage | Gauge | destination, broker | The percentage of configured memory used |
| activemq.message.QueueSize | UpDownCounter | destination, broker | The current number of messages waiting to be consumed |
| activemq.message.ExpiredCount | Counter | destination, broker | The number of messages not delivered because they expired |
| activemq.message.EnqueueCount | Counter | destination, broker | The number of messages sent to this destination |
| activemq.message.DequeueCount | Counter | destination, broker | The number of messages acknowledged and removed from this destination |
| activemq.message.AverageEnqueueTime | Gauge | destination, broker | The average time a message was held on this destination |
| activemq.connections.CurrentConnectionsCount | UpDownCounter | | The total number of current connections |
| activemq.disc.StorePercentUsage | Gauge | | The percentage of configured disk used for persistent messages |
| activemq.disc.TempPercentUsage | Gauge | | The percentage of configured disk used for non-persistent messages |
9 changes: 9 additions & 0 deletions instrumentation/jmx-metrics/javaagent/build.gradle.kts
@@ -0,0 +1,9 @@
plugins {
  id("otel.javaagent-instrumentation")
}

dependencies {
  implementation(project(":instrumentation:jmx-metrics:library"))

  compileOnly("io.opentelemetry:opentelemetry-sdk-extension-autoconfigure")
}
15 changes: 15 additions & 0 deletions instrumentation/jmx-metrics/javaagent/hadoop.md
@@ -0,0 +1,15 @@
# Hadoop Metrics

Here is the list of metrics based on MBeans exposed by Hadoop.

| Metric Name | Type | Attributes | Description |
|-----------------------------------|---------------|------------------|-------------------------------------------------------|
| hadoop.capacity.CapacityUsed | UpDownCounter | node_name | Current used capacity across all data nodes |
| hadoop.capacity.CapacityTotal | UpDownCounter | node_name | Current raw capacity of data nodes |
| hadoop.block.BlocksTotal | UpDownCounter | node_name | Current number of allocated blocks in the system |
| hadoop.block.MissingBlocks | UpDownCounter | node_name | Current number of missing blocks |
| hadoop.block.CorruptBlocks | UpDownCounter | node_name | Current number of blocks with corrupt replicas |
| hadoop.volume.VolumeFailuresTotal | UpDownCounter | node_name | Total number of volume failures across all data nodes |
| hadoop.file.FilesTotal | UpDownCounter | node_name | Current number of files and directories |
| hadoop.file.TotalLoad | UpDownCounter | node_name | Current number of connections |
| hadoop.datanode.Count | UpDownCounter | node_name, state | The number of data nodes |
16 changes: 16 additions & 0 deletions instrumentation/jmx-metrics/javaagent/jetty.md
@@ -0,0 +1,16 @@
# Jetty Metrics

Here is the list of metrics based on MBeans exposed by Jetty.

| Metric Name | Type | Attributes | Description |
|--------------------------------|---------------|--------------|------------------------------------------------------|
| jetty.session.sessionsCreated | Counter | resource | The number of sessions established in total |
| jetty.session.sessionTimeTotal | Counter | resource | The total time sessions have been active |
| jetty.session.sessionTimeMax | Gauge | resource | The maximum amount of time a session has been active |
| jetty.session.sessionTimeMean | Gauge | resource | The mean time sessions remain active |
| jetty.threads.busyThreads | UpDownCounter | | The current number of busy threads |
| jetty.threads.idleThreads | UpDownCounter | | The current number of idle threads |
| jetty.threads.maxThreads | UpDownCounter | | The maximum number of threads in the pool |
| jetty.threads.queueSize | UpDownCounter | | The current number of threads in the queue |
| jetty.io.selectCount | Counter | resource, id | The number of select calls |
| jetty.logging.LoggerCount | UpDownCounter | | The number of registered loggers by name |
32 changes: 32 additions & 0 deletions instrumentation/jmx-metrics/javaagent/kafka-broker.md
@@ -0,0 +1,32 @@
# Kafka Broker Metrics

Here is the list of metrics based on MBeans exposed by the Kafka broker. <br /><br />
Broker metrics:

| Metric Name | Type | Attributes | Description |
|------------------------------------|---------------|------------|----------------------------------------------------------------------|
| kafka.message.count | Counter | | The number of messages received by the broker |
| kafka.request.count | Counter | type | The number of requests received by the broker |
| kafka.request.failed | Counter | type | The number of requests to the broker resulting in a failure |
| kafka.request.time.total | Counter | type | The total time the broker has taken to service requests |
| kafka.request.time.50p | Gauge | type | The 50th percentile time the broker has taken to service requests |
| kafka.request.time.99p | Gauge | type | The 99th percentile time the broker has taken to service requests |
| kafka.request.queue | UpDownCounter | | Size of the request queue |
| kafka.network.io | Counter | direction | The bytes received or sent by the broker |
| kafka.purgatory.size | UpDownCounter | type | The number of requests waiting in purgatory |
| kafka.partition.count | UpDownCounter | | The number of partitions on the broker |
| kafka.partition.offline | UpDownCounter | | The number of partitions offline |
| kafka.partition.underReplicated | UpDownCounter | | The number of under replicated partitions |
| kafka.isr.operation.count | UpDownCounter | operation | The number of in-sync replica shrink and expand operations |
| kafka.lag.max | Gauge | | The max lag in messages between follower and leader replicas |
| kafka.controller.active.count | UpDownCounter | | The number of controllers active on the broker |
| kafka.leaderElection.count | Counter | | The leader election count |
| kafka.leaderElection.unclean.count | Counter | | Unclean leader election count - increasing indicates broker failures |
<br />
Log metrics:

| Metric Name | Type | Attributes | Description |
|---------------------------|---------|------------|----------------------------------|
| kafka.logs.flush.count | Counter | | Log flush count |
| kafka.logs.flush.time.50p | Gauge | | Log flush time - 50th percentile |
| kafka.logs.flush.time.99p | Gauge | | Log flush time - 99th percentile |
@@ -0,0 +1,102 @@
/*
 * Copyright The OpenTelemetry Authors
 * SPDX-License-Identifier: Apache-2.0
 */

package io.opentelemetry.instrumentation.javaagent.jmx;

import static java.util.logging.Level.CONFIG;
import static java.util.logging.Level.FINE;

import com.google.auto.service.AutoService;
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.instrumentation.jmx.engine.JmxMetricInsight;
import io.opentelemetry.instrumentation.jmx.engine.MetricConfiguration;
import io.opentelemetry.instrumentation.jmx.yaml.RuleParser;
import io.opentelemetry.javaagent.extension.AgentListener;
import io.opentelemetry.sdk.autoconfigure.AutoConfiguredOpenTelemetrySdk;
import io.opentelemetry.sdk.autoconfigure.spi.ConfigProperties;
import java.io.File;
import java.io.InputStream;
import java.nio.file.Files;

/** An {@link AgentListener} that enables JMX metrics during agent startup. */
@AutoService(AgentListener.class)
public class JmxMetricInsightInstaller implements AgentListener {

  @Override
  public void afterAgent(AutoConfiguredOpenTelemetrySdk autoConfiguredSdk) {
    ConfigProperties config = autoConfiguredSdk.getConfig();

    if (config.getBoolean("otel.jmx.enabled", true)) {
      JmxMetricInsight service =
          JmxMetricInsight.createService(GlobalOpenTelemetry.get(), beanDiscoveryDelay(config));
      MetricConfiguration conf = buildMetricConfiguration();
      service.start(conf);
    }
  }

  private static long beanDiscoveryDelay(ConfigProperties configProperties) {
    Long discoveryDelay = configProperties.getLong("otel.jmx.discovery.delay");
    if (discoveryDelay != null) {
      return discoveryDelay;
    }

    // If discovery delay has not been configured, have a peek at the metric export interval.
    // It makes sense for both of these values to be similar.
    long exportInterval = configProperties.getLong("otel.metric.export.interval", 60000);
    return exportInterval;
  }

  private static String resourceFor(String platform) {
    return "/jmx/rules/" + platform + ".yaml";
  }

  private static void addRulesForPlatform(String platform, MetricConfiguration conf) {
    String yamlResource = resourceFor(platform);
    try (InputStream inputStream =
        JmxMetricInsightInstaller.class.getResourceAsStream(yamlResource)) {
      if (inputStream != null) {
        JmxMetricInsight.getLogger().log(FINE, "Opened input stream {0}", yamlResource);
        RuleParser parserInstance = RuleParser.get();
        parserInstance.addMetricDefs(conf, inputStream);
      } else {
        JmxMetricInsight.getLogger().log(CONFIG, "No support found for {0}", platform);
      }
    } catch (Exception e) {
      JmxMetricInsight.getLogger().warning(e.getMessage());
    }
  }

  private static void buildFromDefaultRules(MetricConfiguration conf) {
    String targetSystem = System.getProperty("otel.jmx.target.system", "").trim();
    String[] platforms = targetSystem.length() == 0 ? new String[0] : targetSystem.split(",");

    for (String platform : platforms) {
      addRulesForPlatform(platform, conf);
    }
  }

  private static void buildFromUserRules(MetricConfiguration conf) {
    String jmxDir = System.getProperty("otel.jmx.config");
    if (jmxDir != null) {
      JmxMetricInsight.getLogger().log(CONFIG, "JMX config file name: {0}", jmxDir);
      RuleParser parserInstance = RuleParser.get();
      try (InputStream inputStream = Files.newInputStream(new File(jmxDir.trim()).toPath())) {
        parserInstance.addMetricDefs(conf, inputStream);
      } catch (Exception e) {
        JmxMetricInsight.getLogger().warning(e.getMessage());
      }
    }
  }

  private static MetricConfiguration buildMetricConfiguration() {
    MetricConfiguration conf = new MetricConfiguration();

    buildFromDefaultRules(conf);

    buildFromUserRules(conf);

    return conf;
  }
}
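
The installer above makes two configuration decisions: it turns the comma-separated `otel.jmx.target.system` value into bundled rule resources, and it picks the bean discovery delay. Both can be tried in isolation with the minimal sketch below. `InstallerConfigSketch` and its methods are hypothetical names and are not part of the PR. Unlike the installer, which trims only the whole property value, this sketch also trims each entry; that is an assumption made here so that a value such as `jetty, hadoop` resolves cleanly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the PR. */
final class InstallerConfigSketch {

  private InstallerConfigSketch() {}

  /** Maps a comma-separated target system list to classpath rule resources. */
  static List<String> resourcePaths(String targetSystem) {
    List<String> paths = new ArrayList<>();
    if (targetSystem == null) {
      return paths;
    }
    for (String platform : targetSystem.split(",")) {
      // Assumption: trim every entry and skip empty ones.
      String name = platform.trim();
      if (!name.isEmpty()) {
        // Same path layout as resourceFor() in the installer.
        paths.add("/jmx/rules/" + name + ".yaml");
      }
    }
    return paths;
  }

  /** An explicit delay wins; otherwise fall back to the metric export interval. */
  static long discoveryDelay(Long configuredDelay, long exportIntervalMillis) {
    return configuredDelay != null ? configuredDelay : exportIntervalMillis;
  }

  public static void main(String[] args) {
    System.out.println(resourcePaths("jetty, hadoop"));
    System.out.println(discoveryDelay(null, 60000L));
  }
}
```

Passing the values explicitly rather than reading system properties keeps the sketch easy to test; the real installer reads them from `ConfigProperties` and `System.getProperty`.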
@@ -0,0 +1,66 @@
---
rules:

  - beans:
      - org.apache.activemq:type=Broker,brokerName=*,destinationType=Queue,destinationName=*
      - org.apache.activemq:type=Broker,brokerName=*,destinationType=Topic,destinationName=*
    label:
      destination: param(destinationName)
      broker: param(brokerName)
    prefix: activemq.
    mapping:
      ProducerCount:
        unit: '{producers}'
        type: updowncounter
        desc: The number of producers attached to this destination
      ConsumerCount:
        unit: '{consumers}'
        type: updowncounter
        desc: The number of consumers subscribed to this destination
      MemoryPercentUsage:
        metric: memory.MemoryPercentUsage
        unit: '%'
        type: gauge
        desc: The percentage of configured memory used
      QueueSize:
        metric: message.QueueSize
        unit: '{messages}'
        type: updowncounter
        desc: The current number of messages waiting to be consumed
      ExpiredCount:
        metric: message.ExpiredCount
        unit: '{messages}'
        type: counter
        desc: The number of messages not delivered because they expired
      EnqueueCount:
        metric: message.EnqueueCount
        unit: '{messages}'
        type: counter
        desc: The number of messages sent to this destination
      DequeueCount:
        metric: message.DequeueCount
        unit: '{messages}'
        type: counter
        desc: The number of messages acknowledged and removed from this destination
      AverageEnqueueTime:
        metric: message.AverageEnqueueTime
        unit: ms
        type: gauge
        desc: The average time a message was held on this destination

  - bean: org.apache.activemq:type=Broker,brokerName=*
    prefix: activemq.
    unit: '%'
    type: gauge
    mapping:
      CurrentConnectionsCount:
        metric: connections.CurrentConnectionsCount
        type: updowncounter
        unit: '{connections}'
        desc: The total number of current connections
      StorePercentUsage:
        metric: disc.StorePercentUsage
        desc: The percentage of configured disk used for persistent messages
      TempPercentUsage:
        metric: disc.TempPercentUsage
        desc: The percentage of configured disk used for non-persistent messages
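
Read together with the metric table in `activemq.md`, these rules suggest two conventions: the metric name is the rule's `prefix` followed by the `metric` value (or by the MBean attribute name when `metric` is absent), and `unit` and `type` set at rule level act as defaults that a mapping entry can override. The sketch below spells that out. It is inferred from the YAML and the table only, since the engine code is not shown in this diff, and `MetricNamingSketch` and its methods are hypothetical names.

```java
/** Hypothetical illustration of the naming and defaulting conventions; not the engine code. */
final class MetricNamingSketch {

  private MetricNamingSketch() {}

  /** prefix + metric override, or prefix + attribute name when no override is given. */
  static String metricName(String prefix, String attribute, String metricOverride) {
    String base = metricOverride != null ? metricOverride : attribute;
    return (prefix == null ? "" : prefix) + base;
  }

  /** A mapping-level value (unit, type) wins over the rule-level default. */
  static String inherit(String ruleLevel, String mappingLevel) {
    return mappingLevel != null ? mappingLevel : ruleLevel;
  }

  public static void main(String[] args) {
    // No "metric" key: the attribute name is appended to the prefix.
    System.out.println(metricName("activemq.", "ProducerCount", null));
    // "metric" key present: it replaces the attribute name.
    System.out.println(metricName("activemq.", "MemoryPercentUsage", "memory.MemoryPercentUsage"));
    // StorePercentUsage has no own type, so it takes the rule-level "gauge".
    System.out.println(inherit("gauge", null));
    // CurrentConnectionsCount overrides the rule-level type.
    System.out.println(inherit("gauge", "updowncounter"));
  }
}
```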
@@ -0,0 +1,63 @@
---
rules:
  - bean: Hadoop:service=NameNode,name=FSNamesystem
    unit: 1
    prefix: hadoop.
    label:
      node_name: param(tag.Hostname)
    mapping:
      CapacityUsed:
        metric: capacity.CapacityUsed
        type: updowncounter
        unit: By
        desc: Current used capacity across all data nodes
      CapacityTotal:
        metric: capacity.CapacityTotal
        type: updowncounter
        unit: By
        desc: Current raw capacity of data nodes
      BlocksTotal:
        metric: block.BlocksTotal
        type: updowncounter
        unit: '{blocks}'
        desc: Current number of allocated blocks in the system
      MissingBlocks:
        metric: block.MissingBlocks
        type: updowncounter
        unit: '{blocks}'
        desc: Current number of missing blocks
      CorruptBlocks:
        metric: block.CorruptBlocks
        type: updowncounter
        unit: '{blocks}'
        desc: Current number of blocks with corrupt replicas
      VolumeFailuresTotal:
        metric: volume.VolumeFailuresTotal
        type: updowncounter
        unit: '{volumes}'
        desc: Total number of volume failures across all data nodes
      FilesTotal:
        metric: file.FilesTotal
        type: updowncounter
        unit: '{files}'
        desc: Current number of files and directories
      TotalLoad:
        metric: file.TotalLoad
        type: updowncounter
        unit: '{operations}'
        desc: Current number of connections
      NumLiveDataNodes:
        metric: datanode.Count
        type: updowncounter
        unit: '{nodes}'
        desc: The number of data nodes
        label:
          state: live
      NumDeadDataNodes:
        metric: datanode.Count
        type: updowncounter
        unit: '{nodes}'
        desc: The number of data nodes
        label:
          state: dead
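
The last two mappings report two different MBean attributes, `NumLiveDataNodes` and `NumDeadDataNodes`, under one metric name and tell them apart with a constant `state` label; this is why `hadoop.md` lists `hadoop.datanode.Count` with the attributes `node_name, state`. The sketch below shows the resulting shape of the data as two series of the same metric. It is an illustration with plain collections under that reading of the rules, not the engine's implementation, and `SharedMetricSketch` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical illustration: one metric name, two series separated by a constant label. */
final class SharedMetricSketch {

  private SharedMetricSketch() {}

  /** Builds a readable series key from the metric name and its two attributes. */
  static String key(String metric, String nodeName, String state) {
    return metric + "{node_name=" + nodeName + ",state=" + state + "}";
  }

  /** Records both MBean attribute values under the shared metric name. */
  static Map<String, Long> collect(String nodeName, long liveNodes, long deadNodes) {
    Map<String, Long> points = new LinkedHashMap<>();
    // NumLiveDataNodes carries the constant label state=live.
    points.put(key("hadoop.datanode.Count", nodeName, "live"), liveNodes);
    // NumDeadDataNodes carries the constant label state=dead.
    points.put(key("hadoop.datanode.Count", nodeName, "dead"), deadNodes);
    return points;
  }

  public static void main(String[] args) {
    collect("nn1", 3, 1).forEach((series, value) -> System.out.println(series + " = " + value));
  }
}
```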
@@ -0,0 +1,56 @@
---
rules:

  - bean: org.eclipse.jetty.server.session:context=*,type=sessionhandler,id=*
    unit: s
    prefix: jetty.session.
    type: updowncounter
    label:
      resource: param(context)
    mapping:
      sessionsCreated:
        unit: '{sessions}'
        type: counter
        desc: The number of sessions established in total
      sessionTimeTotal:
        type: counter
        desc: The total time sessions have been active
      sessionTimeMax:
        type: gauge
        desc: The maximum amount of time a session has been active
      sessionTimeMean:
        type: gauge
        desc: The mean time sessions remain active

  - bean: org.eclipse.jetty.util.thread:type=queuedthreadpool,id=*
    prefix: jetty.threads.
    unit: '{threads}'
    type: updowncounter
    mapping:
      busyThreads:
        desc: The current number of busy threads
      idleThreads:
        desc: The current number of idle threads
      maxThreads:
        desc: The maximum number of threads in the pool
      queueSize:
        desc: The current number of threads in the queue

  - bean: org.eclipse.jetty.io:context=*,type=managedselector,id=*
    prefix: jetty.io.
    label:
      resource: param(context)
      id: param(id)
    mapping:
      selectCount:
        type: counter
        unit: 1
        desc: The number of select calls

  - bean: org.eclipse.jetty.logging:type=jettyloggerfactory,id=*
    prefix: jetty.logging.
    mapping:
      LoggerCount:
        type: updowncounter
        unit: 1
        desc: The number of registered loggers by name
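
Each `bean` value in these rules is a JMX object name pattern, and `param(context)` or `param(id)` refers to a key property of the concrete MBean that matched. The standard `javax.management.ObjectName` class can show both steps: `apply` tests a name against a pattern, and `getKeyProperty` reads one key. The sketch below is a minimal illustration under the assumption that `param(...)` resolves to that key property; the engine code is not shown in this diff. `BeanParamSketch` is a hypothetical name and the sample bean name is invented.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/** Hypothetical illustration of pattern matching and param() lookup; not the engine code. */
final class BeanParamSketch {

  private BeanParamSketch() {}

  /** Returns the key property of beanName if it matches beanPattern, otherwise null. */
  static String param(String beanPattern, String beanName, String key) {
    try {
      ObjectName pattern = new ObjectName(beanPattern);
      ObjectName name = new ObjectName(beanName);
      // apply() checks whether the concrete name matches the wildcard pattern.
      return pattern.apply(name) ? name.getKeyProperty(key) : null;
    } catch (MalformedObjectNameException e) {
      throw new IllegalArgumentException("Invalid object name", e);
    }
  }

  public static void main(String[] args) {
    String pattern = "org.eclipse.jetty.io:context=*,type=managedselector,id=*";
    // Invented sample name for the demonstration.
    String bean = "org.eclipse.jetty.io:context=app,type=managedselector,id=0";
    System.out.println(param(pattern, bean, "context"));
    System.out.println(param(pattern, bean, "id"));
  }
}
```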