Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ExpandValueSet is called aggressively #840

Open
EvanMachusak opened this issue Nov 8, 2022 · 3 comments
Open

ExpandValueSet is called aggressively #840

EvanMachusak opened this issue Nov 8, 2022 · 3 comments
Labels
cql The issue relates to or requires a change or clarification on the CQL specification enhancement

Comments

@EvanMachusak
Copy link

EvanMachusak commented Nov 8, 2022

Consider this CQL:

define function "code in value set"(code Code, codes List<Code>):
    code in codes

When this function is called with a value set, e.g.:

valueset "Value set": 'url'
define "call it":
    "code in value set"(Errors."Unit mismatch", "Value set")

The operand in the ELM to the "code in value set" call is an ExpandValueSet.

The value set does not in fact need to be expanded, because this is valid:

define function "example"(code Code):
    code in "Value set"

The expression of "example" is an InValueSet.

Eagerly expanding the value set ahead of time introduces a performance penalty for this pattern.

It's possible to declare the parameter as a System.ValueSet which will avoid the ExpandValueSet operand, but because of the automatic expansion of value sets with this usage, these two signatures become ambiguous:

define "call it":
    "overload"(Errors."Unit mismatch", "Value set")

define function "overload"(code Code, codes List<Code>):
    code in codes

define function "overload"(code Code, valueSet System.ValueSet):
    code in valueSet

CQL-to-ELM picks the second overload in this case. That's probably correct, but unintuitive.

We use the pattern of passing value sets to functions taking a List of Code often. It's a more flexible signature since it lets us pass either a value set or a manually constructed list of codes.

This is a significant performance penalty when run on the CQF ruler as prematurely expanding value sets which can be large (50k+ rows) results in linear checking of a list of codes instead of an optimized hash lookup of valueset membership.

@JPercival JPercival added enhancement cql The issue relates to or requires a change or clarification on the CQL specification labels Jan 21, 2024
@brynrhodes
Copy link
Member

brynrhodes commented Feb 7, 2024

I think it's important to call out here that code in List<Code> is a different operator than code in ValueSet, and that there's a really important distinction there. I wonder if it would make more sense to always use the ValueSet overload, and to support a way to convert a List<Code> to a ValueSet, so something like:

define fluent function toValueSet(List<Code>): ValueSet

Then you can 1) always use the optimization of lazy expansion, and 2) ensure that you're always using terminology semantics for the in operator.

@brynrhodes
Copy link
Member

Having said that, there is, by design, no way to construct a ValueSet with the codes that it contains. So let me ask the question, why not have first-class value set definitions for all the lists of codes you want to support?

@EvanMachusak
Copy link
Author

EvanMachusak commented Mar 8, 2024

Because we can implicitly convert a ValueSetRef to a List<Code>, it makes List<Code> the more attractive choice for all APIs because we can either do:

define function Foo(codes List<Code>): ...`

define Bar: Foo("My ValueSet")
define Baz: Foo({ "DRC1", "DRC2" })

The issue I raised here is related to the fact that ExpandValueSet is an expensive operation. Because there's an implicit conversion, the programmer isn't aware it's happening.

In our SDK, we implemented an implicit conversion between ValueSet to List<Code> in such a way that we preserve the fact that the source object is still a ValueSet, e.g. you can think of it like our ValueSet representation implements IList<Code> so we can use it anywhere that a CQL List<Code> (represented in .NET as IList<Code>) is expected. We actually do this with composition and a facade, but it amounts to the same result.

This lets us implement the In operator for CQL's List<Code> more efficiently if that List<Code> is actually a ValueSet.

We currently don't implement ExpandValueSet the way I think it's intended to be (e.g., you reach out to the terminology server and ask for an expansion). We don't support terminology servers in general at the moment, because HEDIS measures would never allow you to expand your own value sets. We provide the explicit expansion that you must use, and you are not allowed to use any other codes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cql The issue relates to or requires a change or clarification on the CQL specification enhancement
Projects
None yet
Development

No branches or pull requests

3 participants