feat(cu): introduce an EventVacuum that parses well-formatted event logs for transport to other services #1017

Open
wants to merge 15 commits into
base: main

arielmelendez

Motivation:

Processes are essentially applications, and applications need observability tooling, where "observability" can be defined as "the ability to answer novel, open-ended questions about a system". The AO team is continuing to develop solutions for monitoring generic performance metrics for Processes, but there is currently no way to capture richer contextual information from the internals of a Process.

Message handling in Processes is similar in many ways to handling HTTP requests on a server. A great way to get observability over a system like that is to use wide "Structured Events" that are rich with relevant information about the inner workings of the Process. This is information you wouldn't be able to get from Process inputs (which are already searchable and aggregatable via GQL data) or from generic performance metrics. For more background reading on this approach and its benefits see:
https://charity.wtf/2022/08/15/live-your-best-life-with-structured-events/
and
https://docs.honeycomb.io/get-started/basics/observability/concepts/events-metrics-logs/

The challenge with extracting this type of information from Processes is that they run in a sandboxed environment with no access to a network or file system that can be connected to the outside world. Existing intra-AO-Process solutions such as AO subscribables therefore don't quite fit this model, and they would also require gas for the messaging needed to facilitate it. However, AO CUs have direct access to Process memory and outputs, including Process log streams. As such, log streams can be used as a transport mechanism to shuttle observability data out of AO and to the outside world.

Technical Contributions

This pull request introduces:

  • An EventVacuum class that:
    • is opt-in via environment variable settings
    • parses newline-delimited JSON (ndjson) events that contain a _e: 1 key/value flag out of Process log streams and sends them off to a transport layer
  • A set of event transport implementations:
    • CompositeTransport: takes a list of transports and fans the events out to each of them
    • ConsoleTransport: prints events out to the CU's logger
    • HoneycombTransport:
      • sends structured events to Honeycomb for analysis
      • provides a SQLite database integration to avoid sending duplicate events to Honeycomb in subsequent runs of the CU
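
To make the data flow above concrete, here is a minimal sketch in Python. It is not the code in this PR. The function names (parse_events, composite_transport, deduping_transport), the sent_events table, and the choice to identify a duplicate by the event's canonical JSON are all assumptions made for illustration; the description above does not say how the real SQLite integration identifies duplicates.

```python
import json
import sqlite3
from typing import Callable, Iterable

Event = dict
# Hypothetical transport interface: any callable that accepts a batch of events.
Transport = Callable[[list], None]


def parse_events(log_stream: str) -> list:
    """Return the JSON objects in a log stream that carry the `_e: 1` flag."""
    events = []
    for line in log_stream.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # ordinary log output, not an event candidate
        try:
            candidate = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed JSON is skipped rather than treated as fatal
        if isinstance(candidate, dict) and candidate.get("_e") == 1:
            events.append(candidate)
    return events


def composite_transport(transports: Iterable[Transport]) -> Transport:
    """Fan each batch of events out to every wrapped transport."""
    targets = list(transports)

    def send(events: list) -> None:
        for target in targets:
            target(events)

    return send


def deduping_transport(db: sqlite3.Connection, inner: Transport) -> Transport:
    """Forward only events that have not been recorded in the database yet.

    Assumption: an event is identified by its canonical (sorted-key) JSON.
    """
    db.execute("CREATE TABLE IF NOT EXISTS sent_events (event_key TEXT PRIMARY KEY)")

    def send(events: list) -> None:
        fresh = []
        for event in events:
            key = json.dumps(event, sort_keys=True)
            cursor = db.execute(
                "INSERT OR IGNORE INTO sent_events (event_key) VALUES (?)", (key,)
            )
            if cursor.rowcount == 1:  # a row was inserted, so the event is new
                fresh.append(event)
        db.commit()
        if fresh:
            inner(fresh)

    return send
```

In this sketch, wrapping a transport in deduping_transport with a file-backed SQLite connection means a batch that is re-parsed on a later run is dropped before it reaches the inner transport.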

Results From Preliminary Testing

I created a utility module to produce and print compliant ndjson events and instrumented a new AO token via the token.lua blueprint with it. You can find the code for those here: permaweb/aos#350
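
For readers who want a feel for what a compliant event looks like, here is a rough Python sketch of the producing side. It is not the Lua utility module linked above. The format_event helper and the field names (action, quantity) are hypothetical; the only properties taken from this PR are that each event is a single line of JSON and carries the _e: 1 flag.

```python
import json


def format_event(fields: dict) -> str:
    """Serialize one event as a single ndjson line flagged with `_e: 1`."""
    # The flag is applied last so caller-supplied fields cannot override it.
    event = {**fields, "_e": 1}
    # json.dumps escapes embedded newlines, so the event stays on one line.
    return json.dumps(event, separators=(",", ":"))


# Printing the line places it in the log stream alongside ordinary output.
print(format_event({"action": "Transfer", "quantity": 5}))
```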
Preliminary test results using the Honeycomb Transport have been great. Here are some examples of what you can do with the integration:

List the errors that have been raised during processing, grouping by nonce, sender, Action, and error reason:
image

Aggregate the total value of the token that has been transferred in the last 48 hours:
image

Surface internal analytics for how many times each specific handler has been successfully triggered on the Process:
image

... and that's just the start of what's possible.

I strongly believe that when other builders see that this kind of open-ended introspection is possible with these kinds of tools, they will want to give it a try! I'm also open to discussing other means of achieving this form of event transportation in AO.
