Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cannot convert nested resource from protobuf to JSON #25

Closed
anthem-bmk opened this issue Nov 5, 2020 · 15 comments
Closed

Cannot convert nested resource from protobuf to JSON #25

anthem-bmk opened this issue Nov 5, 2020 · 15 comments

Comments

@anthem-bmk
Copy link

There are likely other places where this would also be the case, but I found it when using resource in Parameters.Parameter. Trying to pass a resource (e.g. Parameters) into resource throws an error, and if you do it directly mypy will give you an error as well. From what I can tell the proper way is to pack it into an Any protobuf, but then when printing to JSON it prints invalid JSON.

Here's some sample code that reproduces the issue:

from google.fhir.r4 import json_format
from proto.google.fhir.proto.r4.core.resources.bundle_and_contained_resource_pb2 import (  # noqa: E501
    Bundle,
    ContainedResource,
)
from google.protobuf.any_pb2 import Any as AnyPB
from proto.google.fhir.proto.r4.core.resources.parameters_pb2 import Parameters
from proto.google.fhir.proto.r4.core.datatypes_pb2 import Uri

from uuid import uuid4

raw_nested_resource = Parameters()

packed_nested_resource = AnyPB()
packed_nested_resource.Pack(raw_nested_resource)


def convert_nested_resource_to_json(nested_resource) -> None:
    entry = Bundle.Entry(
        full_url=Uri(value=f"urn:uuid:{uuid4()}"),
        resource=ContainedResource(
            parameters=Parameters(
                parameter=[Parameters.Parameter(resource=nested_resource)]
            )
        ),
    )

    print(json_format.pretty_print_fhir_to_json_string(entry))


print("Converting packed nested resource to json:")
convert_nested_resource_to_json(packed_nested_resource)

print("\nConverting raw nested resource to json:")
convert_nested_resource_to_json(raw_nested_resource)

and the output:

$ python3 test_fhir.py
Converting packed nested resource to json:
{
  "fullUrl": "urn:uuid:eca063c6-48da-4619-9ae6-89bde23b689a",
  "resource": {
    "resourceType": "Parameters",
    "parameter": [
      {
        "resource":
      }
    ]
  }
}

Converting raw nested resource to json:
Traceback (most recent call last):
  File "test_fhir.py", line 35, in <module>
    convert_nested_resource_to_json(raw_nested_resource)
  File "test_fhir.py", line 23, in convert_nested_resource_to_json
    parameter=[Parameters.Parameter(resource=nested_resource)]
TypeError: Parameter to MergeFrom() must be instance of same class: expected google.protobuf.Any got google.fhir.r4.core.Parameters.

If you take out the variable and hardcode nested_resource to Parameters(), mypy will also give you an error:

Argument "resource" to "Parameter" has incompatible type "Parameters"; expected "Optional[google.protobuf.any_pb2.Any]"

@anthem-bmk
Copy link
Author

Doing a little more research, I printed the (non FHIR compliant) JSON representation of the protobuf that worked with Any, and got this:

PB JSON:

full_url {
  value: "urn:uuid:c5112836-2038-4ae6-a9b5-89ce3cdff627"
}
resource {
  parameters {
    parameter {
      name {
        value: "nested-resource"
      }
      resource {
        type_url: "type.googleapis.com/google.fhir.r4.core.Parameters"
      }
    }
  }
}

FHIR JSON:
{
  "fullUrl": "urn:uuid:c5112836-2038-4ae6-a9b5-89ce3cdff627",
  "resource": {
    "resourceType": "Parameters",
    "parameter": [
      {
        "name": "nested-resource",
        "resource":
      }
    ]
  }
}

I was able to go directly from FHIR JSON to protobuf and back just fine, and the resource type looks like this instead:

PB JSON:

full_url {
  value: "urn:uuid:a22007a5-3fe1-4fc8-b16c-19bd3ebac92e"
}
resource {
  parameters {
    parameter {
      name {
        value: "nested-resource"
      }
      resource {
        type_url: "type.googleapis.com/google.fhir.r4.core.ContainedResource"
        value: "\262\006\000"
      }
    }
  }
}

FHIR JSON:
{
  "fullUrl": "urn:uuid:a22007a5-3fe1-4fc8-b16c-19bd3ebac92e",
  "resource": {
    "resourceType": "Parameters",
    "parameter": [
      {
        "name": "nested-resource",
        "resource": {
          "resourceType": "Parameters",

        }
      }
    ]
  }
}

code snippet:

def convert_from_json_to_pb_and_back():
    entry_dict = {
        "fullUrl": f"urn:uuid:{uuid4()}",
        "resource": {
            "resourceType": "Parameters",
            "parameter": [
                {"name": "nested-resource", "resource": {"resourceType": "Parameters"}}
            ],
        },
    }

    entry_json = json.dumps(entry_dict)
    entry_pb = json_format.json_fhir_string_to_proto(entry_json, Bundle.Entry)

    print(entry_pb)
    print(json_format.pretty_print_fhir_to_json_string(entry_pb))

So even though the original protobuf seemed to work OK, it looks like it should have been a ContainedResource packed into the Any. I'm not sure why it was Any though, maybe it should be ContainedResource?

@anthem-bmk
Copy link
Author

Confirmed, packing ContainedResource worked:

raw_nested_resource = Parameters()

packed_nested_resource = AnyPB()
packed_nested_resource.Pack(ContainedResource(parameters=raw_nested_resource))
Converting packed nested resource to json:
full_url {
  value: "urn:uuid:ce0b4f84-c8bf-499f-8c0e-3e423db73169"
}
resource {
  parameters {
    parameter {
      name {
        value: "nested-resource"
      }
      resource {
        type_url: "type.googleapis.com/google.fhir.r4.core.ContainedResource"
        value: "\262\006\000"
      }
    }
  }
}

{
  "fullUrl": "urn:uuid:ce0b4f84-c8bf-499f-8c0e-3e423db73169",
  "resource": {
    "resourceType": "Parameters",
    "parameter": [
      {
        "name": "nested-resource",
        "resource": {
          "resourceType": "Parameters",

        }
      }
    ]
  }
}

This might still be considered a bug though with the protobuf types (e.g. Parameters.Parameter.resource should be ContainedResource, not Any). Let me know what you think

@anthem-bmk
Copy link
Author

Ah, actually it's still invalid JSON - notice the trailing comma in "resourceType": "Parameters",, both in the pb->json as well as the json->pb->json conversions.

This could be because the Parameters type is empty, but it's unclear to me if that's valid or not - Parameters only requires a list of parameter, which can be empty, but is a [] still required or can it be null/undefined/not included? It's also interesting that the fhir->pb->fhir passed validation if the [] is truly required.

https://hl7.org/fhir/parameters.html

@anthem-bmk
Copy link
Author

And with some further testing, using Parameters(parameter=[]) or { "resourceType": "Parameters", "parameter": [] } produces the same invalid JSON, without a parameter field.

@anthem-bmk
Copy link
Author

I believe this should fix it:

diff --git a/py/google/fhir/json_format/_json_printer.py b/py/google/fhir/json_format/_json_printer.py
index ea224565..19f1e617 100644
--- a/py/google/fhir/json_format/_json_printer.py
+++ b/py/google/fhir/json_format/_json_printer.py
@@ -364,14 +364,18 @@ class JsonPrinter:
     """Prints the representation of message."""
     self.generator.open_json_object()
 
+    set_fields = msg.ListFields()
+
     # Add the resource type preamble if necessary
     if (annotation_utils.is_resource(msg) and
         self.json_format == _FhirJsonFormat.PURE):
-      self.generator.add_field('resourceType', f'"{msg.DESCRIPTOR.name}",')
-      self.generator.add_newline()
+      self.generator.add_field('resourceType', f'"{msg.DESCRIPTOR.name}"')
+
+      if len(set_fields):
+        self.generator.push(',')
+        self.generator.add_newline()
 
     # print all fields
-    set_fields = msg.ListFields()
     for (i, (set_field, value)) in enumerate(set_fields):
       if (annotation_utils.is_choice_type_field(set_field) and
           self.json_format == _FhirJsonFormat.PURE):

I hope you don't mind I didn't make a PR because wrapping my head around the test suite and getting things to pass will take a bit more time, but if there's no movement on this for a while I'll look into doing a complete PR, maybe next week.

@nickgeorge
Copy link
Collaborator

nickgeorge commented Nov 5, 2020

Awesome, great find and great fix. Unfortunately we're not really set up to accept PRs quite yet, so we'll handle on our end, but I really appreciate you digging in and making our job easier :)

@anthem-bmk
Copy link
Author

Sounds good. And actually that diff should just fix the JSON parsing, but I still wonder if resource should be of type ContainedResource rather than Any... that might be a breaking change though.

@nickgeorge
Copy link
Collaborator

So, short story:
In STU3, all Resource fields were typed as ContainedResources. But, since every resource has a contained field of type Resource, and protos don't have C++-like header files, this means that every resource depends on every other resource, there's a massive circular dependency, and the end result is all proto compilation is done in serial rather than in parallel, in resources.proto. This worked out OK in STU3 but was slow. In R4, it was untenable. So, we made the decision to inline ContainedResource fields as "Any" so that each individual resource can be compiled independent of other resources, and in parallel. So in general, anytime you see an Any in R4, it's a ContainedResource.

@anthem-bmk
Copy link
Author

OK, that's interesting. So it sounds like, aside from the bug I found a fix for, there's still a bug in converting to JSON (If you give it an Any it doesn't know how to convert to a contained resource and just ends up being empty), and another one converting to proto (it converts the contained resource to ContainedResource instead of Any).

@nickgeorge
Copy link
Collaborator

FYI the fix to this change has now been pushed to HEAD - we'll cut a release this afternoon that contains this bugfix.

@nickgeorge
Copy link
Collaborator

RE: If you give it an Any it doesn't know how to convert to a contained resource and just ends up being empty

This is actually working as intended. The usage of Any here is not to allow an arbitrary resource, there's still only ONE correct proto type to use here, ContainedResource. The only reason we type it as Any rather than ContainedResource is to avoid the dependency loop I described above. If we allow that interdependency via ContainedResource, it took like 15 minutes to build the resources :(

@anthem-bmk
Copy link
Author

anthem-bmk commented Nov 11, 2020

Ah, that's awesome, thanks for fixing that! OK, thanks for the clarification on Any vs ContainedResource - that makes sense.

@anthem-bmk
Copy link
Author

@nickgeorge I was looking into updating to get this bug fix, but it looks like 0.6.2 is still the latest version, which still had that bug in it. Was there something holding back the release?

@nickgeorge
Copy link
Collaborator

Uh sorry, we wanted to finalize some testing procedures for the pypi distribution so got delayed a bit. Shld be today if things go according to plan...

@anthem-bmk
Copy link
Author

Thanks, no worries. I'll keep an eye out

Cam2337 pushed a commit that referenced this issue Jun 21, 2021
…ith incorrect trailing comma. See #25

PiperOrigin-RevId: 380668911
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants