feat(performance-metrics): system resource tracker configuration (#15598)

# Overview

Adds the ability to configure the behavior of the system resource
tracker monitoring script through environment variables.

Closes [EXEC-597](https://opentrons.atlassian.net/browse/EXEC-597)
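
For example, the configuration the tracker reads at startup can also be loaded directly (a minimal sketch; the override values are illustrative, and the module path is the new `performance_metrics.system_resource_tracker._config` added in this PR):

```
import os

from performance_metrics.system_resource_tracker._config import (
    SystemResourceTrackerConfiguration,
)

# Override two settings through the environment, then build the config.
os.environ["OT_SYSTEM_RESOURCE_TRACKER_REFRESH_INTERVAL"] = "15.0"
os.environ["OT_SYSTEM_RESOURCE_TRACKER_PROCESS_FILTERS"] = "/opt/opentrons*,python3*"

config = SystemResourceTrackerConfiguration.from_env()
assert config.refresh_interval == 15.0
assert config.process_filters == ("/opt/opentrons*", "python3*")
```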

# Test Plan

- [x] Pull this branch
- [x] Go to the `performance-metrics` project
- [x] Run `make teardown && make setup` to ensure the Python env is up to date
- [x] Ensure the performance-metrics feature flag is enabled by running `make set-performance-metrics-ff host=<robot_ip>`
- [x] Push changes to a Flex or Dev Kit by running `make setup-remote-flex host=10.10.10.151 ssh_key=~/.ssh/flex_ssh_key`
- [ ] SSH into the robot
- [x] Wipe any existing performance-metrics data: `rm -fr /data/performance_metrics_data`
- [x] Go to `/opt/opentrons-robot-server`
- [x] Run `python3 -m performance_metrics.system_resource_tracker`
  - [x] Verify you get a log message that looks like the following
```
2024-07-09 14:31:40,347 - __main__ - main() - INFO - Running with the following configuration: 
enabled=False
process_filters=('/opt/opentrons*', 'python3*')
refresh_interval=10.0
storage_dir=/data/performance_metrics_data
logging_level=INFO
```
  - [x] Kill the process with CTRL + C 
- [ ] Ensure that system resources are captured by looking at the system_resource_tracker files in `/data/performance_metrics_data`
- [ ] Run `OT_SYSTEM_RESOURCE_TRACKER_REFRESH_INTERVAL=15.0 OT_SYSTEM_RESOURCE_TRACKER_LOGGING_LEVEL=DEBUG python3 -m performance_metrics.system_resource_tracker`
- [ ] Verify the initial log message about the configuration has changed to something like:
```
2024-07-09 14:31:40,347 - __main__ - main() - INFO - Running with the following configuration: 
enabled=False
process_filters=('/opt/opentrons*', 'python3*')
refresh_interval=15.0
storage_dir=/data/performance_metrics_data
logging_level=DEBUG
```
- [ ] Verify you are now getting many more log messages (DEBUG level) but less frequently (every 15 seconds instead of 10)
  - [ ] Kill the process with CTRL + C
- [ ] Ensure that more system resource captures have been added by looking at the system_resource_tracker files in `/data/performance_metrics_data` (see the sketch after this list)
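
A quick way to eyeball the captured data after these runs (a sketch only; the exact capture file names are an assumption, while the storage directory is the configured default):

```
from pathlib import Path

# List system_resource_tracker capture files in the configured storage
# directory and print how many CSV rows each one holds.
storage_dir = Path("/data/performance_metrics_data")
for capture_file in sorted(storage_dir.glob("*system_resource_tracker*")):
    with capture_file.open() as f:
        print(capture_file.name, sum(1 for _ in f))
```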

# Changelog

- Add make targets to assist in setting up performance-metrics on the robot
- Add `performance_metrics.system_resource_tracker._config`, containing the `SystemResourceTrackerConfiguration` class, which holds all values that can be configured for the `SystemResourceTracker`
- Update `SystemResourceTracker` to use `SystemResourceTrackerConfiguration`
- Add pulling the config from environment variables and using it in `__main__.py`
- Add logging for debugging purposes
- Add tests

# Review requests

I configured all the performance-metrics loggers to use the same name so
I can update the log level of all of them inside of `main()`. Is this the
correct way to ensure all of them get updated, or is there a better way?
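
For context, a minimal sketch of the shared-logger-name pattern in question (only the `LOGGER_NAME` value comes from this PR; the rest is illustrative):

```
import logging

LOGGER_NAME = "performance_metrics"

# Each module asks for the logger by the same name...
module_a_logger = logging.getLogger(LOGGER_NAME)
module_b_logger = logging.getLogger(LOGGER_NAME)

# ...so they are the same object, and one setLevel() call affects them all.
logging.getLogger(LOGGER_NAME).setLevel(logging.DEBUG)
assert module_a_logger is module_b_logger
assert module_a_logger.level == logging.DEBUG
```

An alternative with the same effect is giving each module a child logger such as `performance_metrics.<module>` and setting the level only on the parent, since child loggers fall back to their ancestor's effective level.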

# Risk assessment

Low


[EXEC-597]:
https://opentrons.atlassian.net/browse/EXEC-597?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ
DerekMaggio committed Jul 10, 2024
1 parent deb34b0 commit 3a750e7
Showing 9 changed files with 386 additions and 41 deletions.
29 changes: 28 additions & 1 deletion performance-metrics/Makefile
@@ -112,4 +112,31 @@ unset-performance-metrics-ff:

.PHONY: test
test:
	$(pytest) tests
	$(pytest) tests

.PHONY: setup-remote-flex
setup-remote-flex:
	@echo "Setting up remote Flex..."
	@echo "Pushing performance_metrics to Flex"
	@$(MAKE) push-no-restart-ot3 host=$(host) ssh_key=$(ssh_key) 2>&1 | grep -v "Permanently added" > /dev/null

	@echo "Pushing api to Flex"
	@$(MAKE) -C ../api push-no-restart-ot3 host=$(host) ssh_key=$(ssh_key) 2>&1 | grep -v "Permanently added" > /dev/null

	@echo "Pushing robot-server to Flex"
	@$(MAKE) -C ../robot-server push-ot3 host=$(host) ssh_key=$(ssh_key) 2>&1 | grep -v "Permanently added" > /dev/null


.PHONY: start-remote-system-resource-tracker
start-remote-system-resource-tracker:
	@echo "Triggering system resource tracker on host $(host)..."
	@ssh -i $(ssh_key) root@$(host) \
		"cd /opt/opentrons-robot-server; \
		OT_SYSTEM_RESOURCE_TRACKER_ENABLED=true \
		$${refresh_interval:+OT_SYSTEM_RESOURCE_TRACKER_REFRESH_INTERVAL=$$refresh_interval} \
		$${process_filters:+OT_SYSTEM_RESOURCE_TRACKER_PROCESS_FILTERS=$$process_filters} \
		$${storage_dir:+OT_SYSTEM_RESOURCE_TRACKER_STORAGE_DIR=$$storage_dir} \
		$${logging_level:+OT_SYSTEM_RESOURCE_TRACKER_LOGGING_LEVEL=$$logging_level} \
		python3 -m performance_metrics.system_resource_tracker"
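
With this target, the overridden run from the Test Plan can also be started from a dev machine, e.g. `make start-remote-system-resource-tracker host=10.10.10.151 ssh_key=~/.ssh/flex_ssh_key refresh_interval=15.0 logging_level=DEBUG` (hypothetical invocation; only the variables you set are forwarded, thanks to the `$${var:+...}` expansions above).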
performance-metrics/src/performance_metrics/_logging_config.py
@@ -3,15 +3,17 @@
import logging
import logging.config

LOGGER_NAME = "performance_metrics"


def log_init(level_value: int = logging.INFO) -> None:
    """Initialize logging for the system resource tracker."""
    """Initialize logging for performance-metrics."""
    logging_config = {
        "version": 1,
        "disable_existing_loggers": False,
        "formatters": {
            "standard": {
                "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
                "format": "%(asctime)s - %(module)s - %(funcName)s() - %(levelname)s - %(message)s"
            },
        },
        "handlers": {
10 changes: 10 additions & 0 deletions performance-metrics/src/performance_metrics/_metrics_store.py
@@ -2,8 +2,12 @@

import csv
import typing
import logging
from ._data_shapes import MetricsMetadata
from ._types import SupportsCSVStorage
from ._logging_config import LOGGER_NAME

logger = logging.getLogger(LOGGER_NAME)

T = typing.TypeVar("T", bound=SupportsCSVStorage)

@@ -26,6 +30,9 @@ def add_all(self, context_data: typing.Iterable[T]) -> None:

    def setup(self) -> None:
        """Set up the data store."""
        logger.info(
            f"Setting up metrics store for {self.metadata.name} at {self.metadata.storage_dir}"
        )
        self.metadata.storage_dir.mkdir(parents=True, exist_ok=True)
        self.metadata.data_file_location.touch(exist_ok=True)
        self.metadata.headers_file_location.touch(exist_ok=True)
@@ -37,5 +44,8 @@ def store(self) -> None:
        self._data.clear()
        rows_to_write = [context_data.csv_row() for context_data in stored_data]
        with open(self.metadata.data_file_location, "a") as storage_file:
            logger.debug(
                f"Writing {len(rows_to_write)} rows to {self.metadata.data_file_location}"
            )
            writer = csv.writer(storage_file, quoting=csv.QUOTE_ALL)
            writer.writerows(rows_to_write)
1 change: 1 addition & 0 deletions performance-metrics/src/performance_metrics/_types.py
@@ -9,6 +9,7 @@
_UnderlyingFunctionParameters, _UnderlyingFunctionReturn
]


RobotContextState = typing.Literal[
"ANALYZING_PROTOCOL",
"GETTING_CACHED_ANALYSIS",
performance-metrics/src/performance_metrics/system_resource_tracker/__main__.py
@@ -2,30 +2,35 @@

import logging
import time
from pathlib import Path
from ._logging_config import log_init
from .._logging_config import log_init, LOGGER_NAME
from ._config import SystemResourceTrackerConfiguration
from ._system_resource_tracker import SystemResourceTracker

log_init()
logger = logging.getLogger(__name__)

def main() -> None:
    """Main function."""
    config = SystemResourceTrackerConfiguration.from_env()

if __name__ == "__main__":
    logger.info("Starting system resource tracker...")
    tracker = SystemResourceTracker(
        storage_dir=Path("/data/performance_metrics_data"),
        process_filters=("/opt/opentrons*", "python3*"),
        should_track=True,
        refresh_interval=5,
    )
    log_init(logging._nameToLevel[config.logging_level])
    logger = logging.getLogger(LOGGER_NAME)
    logger.setLevel(config.logging_level)

    logger.info(f"Running with the following configuration: {config}")

    tracker = SystemResourceTracker(config)

    logger.info("Starting system resource tracker...")
    try:
        while True:
            tracker.get_and_store_system_data_snapshots()
            time.sleep(tracker.refresh_interval)
            time.sleep(tracker.config.refresh_interval)
    except KeyboardInterrupt:
        logger.info("Manually stopped.")
    except Exception:
        logger.error("Exception occurred: ", exc_info=True)
    finally:
        logger.info("System resource tracker is stopping.")


if __name__ == "__main__":
    main()
performance-metrics/src/performance_metrics/system_resource_tracker/_config.py
@@ -0,0 +1,177 @@
import os
import typing
import dataclasses
from pathlib import Path, PurePosixPath
import logging
from .._logging_config import LOGGER_NAME


logger = logging.getLogger(LOGGER_NAME)

_ENV_VAR_PREFIX: typing.Final[str] = "OT_SYSTEM_RESOURCE_TRACKER"

ENABLED_ENV_VAR_NAME: typing.Final[str] = f"{_ENV_VAR_PREFIX}_ENABLED"
PROCESS_FILTERS_ENV_VAR_NAME: typing.Final[str] = f"{_ENV_VAR_PREFIX}_PROCESS_FILTERS"
REFRESH_INTERVAL_ENV_VAR_NAME: typing.Final[str] = f"{_ENV_VAR_PREFIX}_REFRESH_INTERVAL"
STORAGE_DIR_ENV_VAR_NAME: typing.Final[str] = f"{_ENV_VAR_PREFIX}_STORAGE_DIR"
LOGGING_LEVEL_ENV_VAR_NAME: typing.Final[str] = f"{_ENV_VAR_PREFIX}_LOGGING_LEVEL"


def default_filters() -> typing.Tuple[str, str]:
    """Get default filters."""
    return ("/opt/opentrons*", "python3*")


class EnvironmentParseError(Exception):
    """An error occurred while parsing an environment variable."""

    ...


def _eval_enabled(value: str) -> bool:
    """Parse the enabled environment variable.
    Returns:
        bool: The parsed value or None if the environment variable is not set.
    """
    if (coerced_enabled := value.lower()) not in ("true", "false"):
        raise EnvironmentParseError(
            f"{ENABLED_ENV_VAR_NAME} environment variable must be 'true' or 'false.' "
            f"You specified: {value}"
        )

    enabled = coerced_enabled == "true"
    logger.debug(f"Enabled: {enabled}")
    return enabled


def _eval_process_filters(value: str) -> typing.Tuple[str, ...]:
    """Parse the process filters environment variable.
    Returns:
        typing.Tuple[str, ...]: The parsed value or None if the environment variable is not set.
    """
    coerced_process_filters = tuple(
        [filter.strip() for filter in value.split(",") if filter.strip() != ""]
    )

    if len(coerced_process_filters) == 0:
        raise EnvironmentParseError(
            f"{PROCESS_FILTERS_ENV_VAR_NAME} environment variable must be a comma-separated list of process names (globbing is supported) to monitor. "
            f"You specified: {value}"
        )

    logger.debug(f"Process filters: {coerced_process_filters}")
    return coerced_process_filters


def _eval_refresh_interval(value: str) -> float:
    """Parse the refresh interval environment variable.
    Returns:
        float | None: The parsed value or None if the environment variable is not set.
    """
    try:
        coerced_refresh_interval = float(value)
    except ValueError:
        raise EnvironmentParseError(
            f"{REFRESH_INTERVAL_ENV_VAR_NAME} environment variable must be a number. "
            f"You specified: {value}"
        )

    if coerced_refresh_interval < 1.0:
        raise EnvironmentParseError(
            f"{REFRESH_INTERVAL_ENV_VAR_NAME} environment variable must be greater than or equal to 1.0. "
            f"You specified: {value}"
        )

    logger.debug(f"Refresh interval: {coerced_refresh_interval}")
    return coerced_refresh_interval


def _eval_storage_dir(value: str) -> PurePosixPath:
    """Parse the storage directory environment variable.
    Returns:
        PurePosixPath: The parsed value or None if the environment variable is not set.
    """
    coerced_storage_dir = PurePosixPath(value)

    if not coerced_storage_dir.is_absolute():
        raise EnvironmentParseError(
            f"{STORAGE_DIR_ENV_VAR_NAME} environment variable must be an absolute path to a directory.\n"
            f"You specified: {coerced_storage_dir}."
        )

    logger.debug(f"Storage dir: {coerced_storage_dir}")
    return coerced_storage_dir


def _eval_logging_level(value: str) -> str:
    """Parse the logging level environment variable.
    Returns:
        str: The parsed value or None if the environment variable is not set.
    """
    if value not in logging._nameToLevel:
        raise EnvironmentParseError(
            f"{LOGGING_LEVEL_ENV_VAR_NAME} environment variable must be one of {list(logging._nameToLevel.keys())}. "
            f"You specified: {value}"
        )

    logger.debug(f"Logging level: {value}")
    return value


@dataclasses.dataclass(frozen=True)
class SystemResourceTrackerConfiguration:
    """Environment variables for the system resource tracker."""

    enabled: bool = False
    process_filters: typing.Tuple[str, ...] = dataclasses.field(
        default_factory=default_filters
    )
    refresh_interval: float = 10.0
    storage_dir: Path = Path("/data/performance_metrics_data/")
    logging_level: str = "INFO"

    def __str__(self) -> str:
        """Get a string representation of the configuration."""
        return (
            "\n"
            f"enabled={self.enabled}\n"
            f"process_filters={self.process_filters}\n"
            f"refresh_interval={self.refresh_interval}\n"
            f"storage_dir={self.storage_dir}\n"
            f"logging_level={self.logging_level}\n"
        )

    @classmethod
    def from_env(cls) -> "SystemResourceTrackerConfiguration":
        """Create a SystemResourceTrackerConfiguration instance from environment variables.
        Returns:
            SystemResourceTrackerConfiguration: An instance of SystemResourceTrackerConfiguration.
        """
        kwargs: typing.Dict[str, typing.Any] = {}

        if (enabled := os.environ.get(ENABLED_ENV_VAR_NAME)) is not None:
            kwargs["enabled"] = _eval_enabled(enabled)

        if (
            process_filters := os.environ.get(PROCESS_FILTERS_ENV_VAR_NAME)
        ) is not None:
            kwargs["process_filters"] = _eval_process_filters(process_filters)

        if (
            refresh_interval := os.environ.get(REFRESH_INTERVAL_ENV_VAR_NAME)
        ) is not None:
            kwargs["refresh_interval"] = _eval_refresh_interval(refresh_interval)

        if (storage_dir := os.environ.get(STORAGE_DIR_ENV_VAR_NAME)) is not None:
            kwargs["storage_dir"] = _eval_storage_dir(storage_dir)

        if (logging_level := os.environ.get(LOGGING_LEVEL_ENV_VAR_NAME)) is not None:
            kwargs["logging_level"] = _eval_logging_level(logging_level)

        return cls(**kwargs)