
IHE Gateway V2 Monitoring #2200

Draft · wants to merge 3 commits into base: develop

Conversation

@jonahkaye (Member) commented Jun 4, 2024

Ticket: #1667

TODO

  • Also add analysis of DQs and DRs to the report

Description

  • Scheduled lambda for evaluating IHE GW v2 success rates.
  • Removes noisy captures/Sentry alerts and adds logging.
  • Introduces an infra pattern of using the read-only DB endpoint.
  • Adds an index on created_at for patient_discovery_result, document_query_result, and document_retrieval_result (see the migration sketch below).
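
The index migration itself isn't shown in this thread; a minimal sketch of what it could look like as a Sequelize migration (the up/down harness and index names are assumptions):

import { QueryInterface } from "sequelize";

// Hypothetical migration adding the created_at indexes listed above.
const tables = [
  "patient_discovery_result",
  "document_query_result",
  "document_retrieval_result",
];

export async function up(queryInterface: QueryInterface): Promise<void> {
  for (const table of tables) {
    // Speeds up the "WHERE created_at >= ..." queries used by the report lambda.
    await queryInterface.addIndex(table, ["created_at"], {
      name: `${table}_created_at_idx`,
    });
  }
}

export async function down(queryInterface: QueryInterface): Promise<void> {
  for (const table of tables) {
    await queryInterface.removeIndex(table, `${table}_created_at_idx`);
  }
}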

Testing

  • Local
    • Ran the reporting script with a test utils script
    • Performed the SQL migration and ran a speed test: 140x speed improvement in the local env with the index on the queries in this PR
  • Staging
    • Branch to staging - logs of it working on a 10 min increment schedule
    • Tested on staging

Release Plan

  • Merge this

@jonahkaye jonahkaye marked this pull request as ready for review June 4, 2024 17:17
@jonahkaye jonahkaye changed the title 1667 report monitoring IHE Gateway V2 Monitoring Jun 4, 2024
Commit (truncated): …ces added on created_at

Refs: #1667
Signed-off-by: Jonah Kaye <[email protected]>
@jonahkaye (Member Author):

rename this file and folder to monitor

@leite08 (Member) left a comment:

I think we should avoid adding more code that uses Sequelize on lambdas. We have this ticket to move to pg so we're lighter when hitting the DB from lambdas: https://github.com/metriport/metriport-internal/issues/1561

WRT the reports themselves, I'd try to store more raw results in a .csv and send/log a link to it. By putting it in a spreadsheet we have more flexibility to work with the data. Example: we're sending the percentage of failure/success, but we might want to see the amounts over time as well.

But NABD, just trying to avoid back-and-forth with code updates, PRs, and releases just to change how we see the data.

packages/infra/lib/api-stack.ts (resolved)
dbCredsSecret.grantRead(scheduledReportLambda);

const rule = new events.Rule(this, "ScheduledReportRule", {
schedule: events.Schedule.rate(Duration.hours(12)),
Reviewer (Member):

Will this run every 12h from the time it's deployed? If so, wouldn't it be better to have specific times of day so we have more predictable Ops?

@jonahkaye (Member Author):

I changed it to run at 12am and 2pm UTC (5pm and 7am PST).
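
A sketch of what that could look like with fixed times, using CDK's Schedule.cron (the construct name mirrors the snippet above; the hours are from the comment):

// Hypothetical: run at 00:00 and 14:00 UTC instead of "every 12h from deploy time".
const rule = new events.Rule(this, "ScheduledReportRule", {
  schedule: events.Schedule.cron({ minute: "0", hour: "0,14" }),
});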

@jonahkaye (Member Author) left a comment:

Made the requested changes, especially switching from sequelize to pg.
WRT the csv, I think I'm going to hold off on doing that for now.
The format this is in works for me, and I don't see the value in overcomplicating things with csvs since this is really just for us to track things and isn't for outside reporting or analytics.

Comment on lines +79 to +82
SELECT request_id
FROM ${tableName}
WHERE created_at >= $1
ORDER BY created_at DESC
Reviewer (Member):

We should be referencing the TableColumns enum, right?

Reviewer (Member):

As for the query, wouldn't it be better to randomize over the request IDs filtered by created_at?
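
A sketch of that variant, reusing the query from the snippet above (assuming Postgres's random() is acceptable at these table sizes):

// Hypothetical: sample randomly within the created_at window
// instead of always taking rows in created_at order.
const query = `
  SELECT request_id
  FROM ${tableName}
  WHERE created_at >= $1
  ORDER BY random()
  LIMIT $2
`;
const { rows } = await pool.query(query, [since, limit]);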

`;
const values = [since, limit];
const { rows } = await pool.query(query, values);
return rows.map(row => row.request_id);
Reviewer (Member):

ditto: row is probably any, right? We could use this instead, to at least make sure we're accessing the correct property:

... row[TableColumns.requestId]);
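
Alternatively, node-postgres's query() accepts a row type parameter, so the rows could be typed at the call site (a sketch):

// Give pg a row shape so `row` isn't `any`.
const { rows } = await pool.query<{ request_id: string }>(query, values);
return rows.map(row => row.request_id);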

Comment on lines +63 to +64
request_id = "request_id",
created_at = "created_at",
Reviewer (Member):

NABD, but the prop for the column name could follow camelCase

Comment on lines +57 to +59
PatientDiscoveryResult = "patient_discovery_result",
DocumentQueryResult = "document_query_result",
DocumentRetrievalResult = "document_retrieval_result",
Reviewer (Member):

nit: props/enum items should be camelCase; PascalCase is for Classes/Types only.
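
Applying both naming nits, the enums might look like this (a sketch; the table enum's name is an assumption, TableColumns is from the snippets above):

// Hypothetical enum name; values stay as the snake_case DB identifiers.
enum IheResultTable {
  patientDiscoveryResult = "patient_discovery_result",
  documentQueryResult = "document_query_result",
  documentRetrievalResult = "document_retrieval_result",
}

enum TableColumns {
  requestId = "request_id",
  createdAt = "created_at",
}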

@leite08 (Member) left a comment:

"I don't see the value in overcomplicating things with csvs"

I actually think this code is more complicated than appending a line at the end of 3-6 files. This PR calculates percentages and summaries, while we could just add lines w/ raw data to files on S3.

Second, this assumes we're not changing how we assess the result/success/usage of the IHE GW. As we all know, things tend to change after we ship them. 😄

Lastly, how do you plan to look at the data over time? Looking at logs and calculating in your head, or copy/pasting and parsing the results? If we have a CSV where we add a line for each new run, we already have the data over time there.

It's not a blocker for me, but my experience tells me that we'd be better served with a handful of CSVs instead of logs on CloudWatch.
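
A sketch of the suggested approach: one raw-data line appended per run to a CSV in S3 (bucket, key, and columns are hypothetical, and since S3 has no native append this reads then rewrites the object):

import { S3Client, GetObjectCommand, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});
const bucket = "ihe-gw-reports"; // hypothetical bucket
const key = "patient-discovery-results.csv"; // hypothetical key
const header = "run_at,request_count,success_count,failure_count\n";

async function appendRunToCsv(line: string): Promise<void> {
  let existing = header;
  try {
    const res = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
    existing = (await res.Body?.transformToString()) ?? header;
  } catch {
    // First run: the object doesn't exist yet, so start with the header only.
  }
  await s3.send(
    new PutObjectCommand({ Bucket: bucket, Key: key, Body: existing + line + "\n" })
  );
}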

Comment on lines +156 to +164
result.patientMatch === true ||
result.operationOutcome?.issue?.some(issue => issue.code === "not-found")
);

const failures = results.filter(
(result: OutboundPatientDiscoveryResp) =>
!(
result.patientMatch === true ||
result.operationOutcome?.issue?.some(issue => issue.code === "not-found")
Reviewer (Member):

Since we're replicating the condition verbatim, we might as well write a small function, isSuccess or something, that we can use in both places, to avoid updating one and forgetting the other.
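
A sketch of that helper, lifted from the condition in the snippet (OutboundPatientDiscoveryResp is the type already used above):

function isSuccess(result: OutboundPatientDiscoveryResp): boolean {
  return (
    result.patientMatch === true ||
    (result.operationOutcome?.issue?.some(issue => issue.code === "not-found") ?? false)
  );
}

const successes = results.filter(isSuccess);
const failures = results.filter(result => !isSuccess(result));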

);

const errorOids = failures.map((error: OutboundPatientDiscoveryResp) => ({
oid: (error.gateway as XCPDGateway).oid,
Reviewer (Member):

No need to cast here, right?

);

const errorOids = failures.map((error: OutboundDocumentQueryResp) => ({
oid: (error.gateway as XCAGateway).homeCommunityId,
Reviewer (Member):

ditto: no need to cast

);

const errorOids = failures.map((error: OutboundDocumentRetrievalResp) => ({
oid: (error.gateway as XCAGateway).homeCommunityId,
Reviewer (Member):

ditto: cast

Comment on lines +57 to 59
const errorString = errorToString(error);
log(errorString);
capture.error("Failed to send PD response to Internal Carequality Endpoint", {
Reviewer (Member):

Can we also send the message to the log? Just the error w/o context is not as helpful.

We have a "pattern" for this:

      const msg = "Failed to send PD response to Internal Carequality Endpoint";
      log(`${msg} - ${errorToString(error)}`);
      capture.error(msg, {

]);

fs.writeFileSync(
"./runs/carequality-report/patient-discovery-report.json",
Reviewer (Member):

NABD, but carequality-report is tied to a specific HIE, while this is the IHE GW, which in theory could serve other HIEs in the mid-term.

Reviewer (Member):

Also, no date on the file name?
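
A sketch addressing both comments, with an HIE-agnostic folder name and a timestamp in the file name (the report variable and the exact naming are assumptions):

// Hypothetical: folder renamed from "carequality-report", date added to the file name.
const timestamp = new Date().toISOString();
fs.writeFileSync(
  `./runs/ihe-gateway-report/patient-discovery-report-${timestamp}.json`,
  JSON.stringify(report, null, 2)
);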

Comment on lines +14 to +16
async function main() {
const sqlDBCreds = getEnvVarOrFail("DB_CREDS");
const readReplicaEndpoint = getEnvVarOrFail("DB_READ_REPLICA_ENDPOINT");
Reviewer (Member):

Missing initRunsFolder() - see other scripts that use the "runs" folder.

const dbReadOnlyEndpoint = getEnvVarOrFail("DB_READ_REPLICA_ENDPOINT");
const region = getEnvVarOrFail("AWS_REGION");

capture.setExtra({ lambdaName: "scheduled-report-lambda" });
Reviewer (Member):

There's an env var set by AWS for this, see other lambdas - example:

// Automatically set by AWS
const lambdaName = getEnvOrFail("AWS_LAMBDA_FUNCTION_NAME");
const region = getEnvOrFail("AWS_REGION");
// Set by us
...

Comment on lines +2 to +3
import { DbCreds, dbCredsSchema, dbReadReplicaEndpointSchema } from "./sequelize";
// Initialize the main DB pool
Reviewer (Member):

supernit: new line

Comment on lines +6 to +12
const pool = new Pool({
user: parsedDbCreds.username,
host: parsedDbCreds.host,
database: parsedDbCreds.dbname,
password: parsedDbCreds.password,
port: parsedDbCreds.port,
});
Reviewer (Member):

It would be good to have a const and set the pool size (property max). Even though the default of 10 connections is fine for this use case, we're building something that might be reused afterwards, so being explicit about this, with a link to the docs, is helpful - I had to look it up online to make sure there was more than 1 connection by default.
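
A sketch of that suggestion (per the node-postgres docs, max defaults to 10: https://node-postgres.com/apis/pool):

// Explicit pool size so future readers don't have to look up the default.
const DB_POOL_MAX_CONNECTIONS = 10;

const pool = new Pool({
  user: parsedDbCreds.username,
  host: parsedDbCreds.host,
  database: parsedDbCreds.dbname,
  password: parsedDbCreds.password,
  port: parsedDbCreds.port,
  max: DB_POOL_MAX_CONNECTIONS,
});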

@jonahkaye jonahkaye marked this pull request as draft July 3, 2024 14:01