Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix minimum_id algorithm #65

Merged
merged 23 commits into from
Nov 28, 2022
Merged

Fix minimum_id algorithm #65

merged 23 commits into from
Nov 28, 2022

Conversation

jhlee-mitre
Copy link
Contributor

Summary

New algorithm to examine if a FHIR resource is "minimum" of the other FHIR resource

New behavior

Given two FHIR resource instances A and B, A is considered a minimum of B if and only if:
B has all the elements of A, exactly the same element names and values (codes). No consideration of range or pre/post-coordination.
Hierarchies and levels of the elements from A must be the same in B. Otherwise it is considered as different element.
Order doesn’t matter.
Duplication doesn’t matter. If B has two duplicate sets (which isn’t realistically the case of FHIR though), the algorithm stops when it find the first set.

Code changes

Added minimum_id.rb under lib. To be included into assertion.rb after testing.

Testing guidance

ruby lib/testscript_engine/minimum_id.rb

@karlnaden
Copy link
Contributor

I added example files for test cases from the ticket I raised: #62. These appear to pass, which is a good sign.

Granted that I'm reading the code too late at night to develop a deep understanding, I'm not fully understand the implementation approach you've taken here. I'd be interested in a narrative description of what steps you intend the algorithm to take because I think that would help me understand it better.

Initial observations:

  • you seem to take this approach of doing a depth-first traversal on the specification instance (min_obj) and each time you find an individual value, then you do a depth-first search on the actual instance being checked (tar_obj). This feels inefficient, compared with say recursing through both instances together. Additionally, the way you've implemented, you don't take into account values in fields at higher levels that provide meaning to lower levels. I added two examples that try to elucidate the problems: patient_two_names_min.json indicates a "official" name (Joseph) and a "usual" name (Joey) but patient_two_names_jumbled.json lists Joey as the "official" name and vice-versa - this should fail because even if you find "Joey" at the right level, it isn't correct if it is found as a part of the official name entry (note based on output, the logic may not even be comparing list entries that are primitive values; a capability statement example mCODE_CapabilityStatement_exampleServer_shouldFail.json I moved the "code:in" SearchParameter into the Observation resource entry instead of the Condition resource entry - this should fail for the same reason as above, but code:in is still found and validated even though it is not in the right place.
  • You take what I would call a state-full or imperative approach to the implementation, where you have a flag variable that gets set. I think we will be better served here by a stateless / functional approach where the functions return the answer to a minimumId check on a specific part of the tree and then for evaluating e.g., a hash / json object, you check the minimumId property on all of the keys in the min_obj and if any return false, the whole function returns false. That will I think help make the code easier to understand and easier to test.

@jhlee-mitre
Copy link
Contributor Author

I added example files for test cases from the ticket I raised: #62. These appear to pass, which is a good sign.

Granted that I'm reading the code too late at night to develop a deep understanding, I'm not fully understand the implementation approach you've taken here. I'd be interested in a narrative description of what steps you intend the algorithm to take because I think that would help me understand it better.

Initial observations:

  • you seem to take this approach of doing a depth-first traversal on the specification instance (min_obj) and each time you find an individual value, then you do a depth-first search on the actual instance being checked (tar_obj). This feels inefficient, compared with say recursing through both instances together. Additionally, the way you've implemented, you don't take into account values in fields at higher levels that provide meaning to lower levels. I added two examples that try to elucidate the problems: patient_two_names_min.json indicates a "official" name (Joseph) and a "usual" name (Joey) but patient_two_names_jumbled.json lists Joey as the "official" name and vice-versa - this should fail because even if you find "Joey" at the right level, it isn't correct if it is found as a part of the official name entry (note based on output, the logic may not even be comparing list entries that are primitive values; a capability statement example mCODE_CapabilityStatement_exampleServer_shouldFail.json I moved the "code:in" SearchParameter into the Observation resource entry instead of the Condition resource entry - this should fail for the same reason as above, but code:in is still found and validated even though it is not in the right place.
  • You take what I would call a state-full or imperative approach to the implementation, where you have a flag variable that gets set. I think we will be better served here by a stateless / functional approach where the functions return the answer to a minimumId check on a specific part of the tree and then for evaluating e.g., a hash / json object, you check the minimumId property on all of the keys in the min_obj and if any return false, the whole function returns false. That will I think help make the code easier to understand and easier to test.

Great comments! It will need a few iterations to be quality algorithm, let me go back and try second round of polishing the algorithm/code.

@jhlee-mitre
Copy link
Contributor Author

Short description about the new commit:

What the algorithm does:

  1. For given two sets of FHIR resources, it is flatten with path. For example
{
  "resourceType": "Patient",
  "identifier": [
    {
      "use": "usual",
    }
  ]
}

may be converted to a Hash:

{ 
"resourceType" => "Patient",
"identifier.0.use" -> "usual"
}

For now this algorithm is "order-unaware". If is easy to change it order-aware. Later we can make it configurable.

  1. exam_minimum() compares two flatten Hashes to evaluate whether one is minimum of the other.

@jhlee-mitre jhlee-mitre changed the title WIP Fix algorithm to examine minimum_id (WIP) Fix algorithm to examine minimum_id Nov 16, 2022
@jhlee-mitre jhlee-mitre self-assigned this Nov 16, 2022
@jhlee-mitre jhlee-mitre changed the title (WIP) Fix algorithm to examine minimum_id (WIP) Fix minimum_id algorithm Nov 16, 2022
@jhlee-mitre jhlee-mitre changed the title (WIP) Fix minimum_id algorithm Fix minimum_id algorithm Nov 21, 2022
@jhlee-mitre jhlee-mitre marked this pull request as ready for review November 21, 2022 18:53
@jhlee-mitre
Copy link
Contributor Author

Integrated the recursive minimum_id check into the engine. The previous one (deep_merge) was replaced.

@jhlee-mitre
Copy link
Contributor Author

It doesn't contain a TestScript to test, yet.

@karlnaden
Copy link
Contributor

Added some TestScripts for minimumId that uses just fixtures including

  • a reflexive TestScript that should succeed (a minimumId a)
  • a symmetric TestScript that should fail (a minimumId b, but not b minimumId a)

These didn't quite work as expected and the logic wasn't giving any hints as to why the unexpected failures, so I started to add some basic error details and ended up fixing a few things while doing that:

  1. Needed to convert FHIR object representations to hashes before passing into the check_minimum_id function
  2. Both check_minimum_id_hash and check_minimum_id_array were returning the result for the last key/index checked, rather than keeping track of whether any failures had occurred
  3. Both check_minimum_id_hash and check_minimum_id_array were returning false when the check succeeded instead of true to be consistent with the top-level check_minimum_id function

Finally, note that rather than try to use the template Jack created for the compare results, I created a minimumId-specific return.

Please take a look and let me know what you think.

A last thing I'd like to do somewhere as a part of this commit is provide brief documentation of the logic behind our implementation of the minimumId assertion. It is possible that this should live in a FHIR implementation guide eventually, but I think that would be something we tackle in the new year, so I'd probably just put it in the README for simplicity.

@jhlee-mitre
Copy link
Contributor Author

I tested on my end and it looks good. Just added one more test for fun.

  1. Minor thing: if an error occurs, it's always when a resource in the minimum fixture is missing in the actual fixture. So it's about difference, but one direction. Would this be even clear wording for error msg:

Actual resource differed from content in fixture '#{assert.minimumId}'
-->
Actual resource from content in fixture '#{assert.minimumId}' wasn't found in the response '#{assert.sourceId}'

  1. For the templates for error message handling, we may consider refactoring it to accommodate more flexibility. I agree with handling them individually meanwhile.
  2. It sounds good to add the documentation into README. If it's overwhelming, we may use Wiki for more details (e.g. algorithm) and keep link from README.

@karlnaden
Copy link
Contributor

Minor thing: if an error occurs, it's always when a resource in the minimum fixture is missing in the actual fixture. So it's about difference, but one direction.

Tried some new wording, see what you think. For each location in the minimumId fixture, there are two failure cases:

  1. Corresponding data found at that location in the actual resource, but it is different
  2. No data found at that location in the actual resource
    I collapse that to "differences". While I agree that could potentially be misleading, I think realistically, the feedback we're giving for minimumId right now likely still needs to be improved for realistic use. However, I don't think it is worth spending further time on it now.

For the templates for error message handling, we may consider refactoring it to accommodate more flexibility. I agree with handling them individually meanwhile.

Agreed - could be done better, but not a priority now

It sounds good to add the documentation into README. If it's overwhelming, we may use Wiki for more details (e.g. algorithm) and keep link from README.

Agreed. Recommend we avoid setting up a new thing for documentation at this point

jhlee-mitre and others added 2 commits November 26, 2022 11:26
Polished wording; Added minimum_id logic explanation.
Copy link
Contributor Author

@jhlee-mitre jhlee-mitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found "TestScript Engine" and "About the project" were overlapping and merged. Thanks for adding another polish. It looks great! If you think the PR is good, approve and I will merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants