Observation Redesign #148

Open
chanskw opened this issue Nov 1, 2017 · 7 comments

Comments

@chanskw
Collaborator

chanskw commented Nov 1, 2017

I'd like to start a discussion on how to redesign the Observation Type.

The Observation Type is our universal data type and is currently defined as follows:

{
  "patientId" : string,
  "device" : {
    "id" : string,
    "locationId" : string
  },
  "readingSource" : {
    "id" : string,
    "sourceType" : string,
    "deviceId" : string
  },
  "reading" : {
    "ts" : numeric,
    "readingType" : {
      "system" : string,
      "code" : string
    },
    "value" : numeric,
    "uom" : string
  }
}

The challenge is that the Observation type is too numeric-centric. In some cases, such as when we ingest data from clinical notes, the values are non-numeric. There is currently no way to really represent those values. In the FHIR specification, an Observation component can have any of the following value types:

 "component" : [{ // Component results
    "code" : { CodeableConcept }, // R!  Type of component observation (code / type)
    // value[x]: Actual component result. One of these 10:
    "valueQuantity" : { Quantity },
    "valueCodeableConcept" : { CodeableConcept },
    "valueString" : "<string>",
    "valueRange" : { Range },
    "valueRatio" : { Ratio },
    "valueSampledData" : { SampledData },
    "valueAttachment" : { Attachment },
    "valueTime" : "<time>",
    "valueDateTime" : "<dateTime>",
    "valuePeriod" : { Period },
    "dataAbsentReason" : { CodeableConcept }, // C? Why the component result is missing
    "interpretation" : { CodeableConcept }, // High, low, normal, etc.
    "referenceRange" : [{ Content as for Observation.referenceRange }] // Provides guide for interpretation of component result

The question is how to represent all these different value types in Streams. Should we have one data type per value type? Should we extend Observation to handle all the different value types? Or should we add valueString to Observation and put anything that cannot be represented as a numeric value in a string?

The simplest thing to do is to change Observation to the following:

{
  "patientId" : string,
  "device" : {
    "id" : string,
    "locationId" : string
  },
  "readingSource" : {
    "id" : string,
    "sourceType" : string,
    "deviceId" : string
  },
  "reading" : {
    "ts" : numeric,
    "readingType" : {
      "system" : string,
      "code" : string,
      "valueType" : string         // represents the value type of the reading
    },
    "value" : numeric,
    "valueString" : string,      // anything that cannot be represented as a numeric gets stored here
    "uom" : string
  }
}

Or should we be more elaborate and simply duplicate the FHIR observation specification?
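
For illustration, here is a minimal Python sketch of the simpler option: a reading that carries either a numeric value or a valueString, selected by valueType. The allowed valueType values ("numeric" and "string"), the validation rule, and the system/code strings are assumptions made for this sketch; none of them are defined anywhere in the toolkit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadingType:
    system: str
    code: str
    # Assumed vocabulary for this sketch: "numeric" or "string".
    value_type: str = "numeric"


@dataclass
class Reading:
    ts: int
    reading_type: ReadingType
    value: Optional[float] = None        # set when value_type == "numeric"
    value_string: Optional[str] = None   # set when value_type == "string"
    uom: Optional[str] = None

    def __post_init__(self) -> None:
        # Reject readings whose payload does not match the declared value type.
        kind = self.reading_type.value_type
        if kind not in ("numeric", "string"):
            raise ValueError(f"unknown value_type: {kind!r}")
        if kind == "numeric" and self.value is None:
            raise ValueError("numeric reading requires 'value'")
        if kind == "string" and self.value_string is None:
            raise ValueError("string reading requires 'value_string'")


# Hypothetical system/code strings, for illustration only.
numeric_reading = Reading(
    ts=1509494400000,
    reading_type=ReadingType("example-system", "heart-rate"),
    value=72.0,
    uom="bpm",
)
note_reading = Reading(
    ts=1509494400000,
    reading_type=ReadingType("example-system", "clinical-note", "string"),
    value_string="patient reports mild dizziness",
)
```

The trade-off is that every consumer has to check valueType before reading value, which is the cost of keeping a single Observation type.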

@ddebrunner
Member

Can you expand on what you mean by this:

There is currently no way to really represent those values.

JSON allows any values, so is it solely the conversion to an SPL schema that causes problems?

@chanskw
Collaborator Author

chanskw commented Nov 1, 2017

Yes, this is mainly a problem in SPL. We define the same type in Java, SPL and Python.

@chanskw
Collaborator Author

chanskw commented Nov 1, 2017

  • How would someone interpret this value when all the data comes in on the same data stream? That is, some values can be numeric while others are not, yet they can all arrive on the same topic.

@ddebrunner
Member

Doesn't it depend on context? That is, whether a field is a number or text is determined by the values of other fields in the object.
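
As a rough sketch of what "depends on context" could look like for a consumer, assuming the valueType field proposed earlier in this thread takes the values "numeric" or "string" (an assumption, nothing in the toolkit defines this vocabulary), a hypothetical helper could pick the right field per message:

```python
from typing import Union


def interpret_reading(reading: dict) -> Union[float, str]:
    """Return the numeric or textual value, chosen by readingType.valueType.

    Hypothetical helper: the "numeric"/"string" vocabulary and the
    default of "numeric" are assumptions made for this sketch.
    """
    value_type = reading["readingType"].get("valueType", "numeric")
    if value_type == "numeric":
        return float(reading["value"])
    if value_type == "string":
        return reading["valueString"]
    raise ValueError(f"unsupported valueType: {value_type!r}")


# Two messages that could arrive on the same topic.
messages = [
    {"ts": 1, "readingType": {"system": "s", "code": "hr"}, "value": 72, "uom": "bpm"},
    {"ts": 2, "readingType": {"system": "s", "code": "note", "valueType": "string"},
     "valueString": "patient resting"},
]
interpreted = [interpret_reading(m) for m in messages]
```

This works in JSON or Python, where a field can be absent; the difficulty raised above is that a fixed SPL schema has to declare every attribute and its type up front.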

@YuanChiChang

If one treats JSON (or XML, if you belong to that camp) as nothing but a hierarchy of name-value pairs, then from the point of view of Streams' internal representation, why don't we just model them as such? For example, a sensor record normally converted to int32 device_id, rstring notes, int32 heart_beat, int32 temperature can instead be converted to intValues[enum.device_id], intValues[enum.heart_beat], intValues[enum.temperature], stringValues[enum.notes]. Here intValues and stringValues are declared as map<enum, int32> and map<enum, rstring>.

This dumbed-down data model, much like JSON or XML, gives maximal flexibility at the cost of poor runtime efficiency.
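
A minimal Python sketch of this idea, standing in for the SPL maps: one dictionary per primitive type, keyed by an enum of known attribute names. The class and function names (Key, GenericTuple, from_json) are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Key(Enum):
    DEVICE_ID = "device_id"
    HEART_BEAT = "heart_beat"
    TEMPERATURE = "temperature"
    NOTES = "notes"


@dataclass
class GenericTuple:
    # Analogues of map<enum, int32> and map<enum, rstring>.
    int_values: Dict[Key, int] = field(default_factory=dict)
    string_values: Dict[Key, str] = field(default_factory=dict)


def from_json(obj: dict) -> GenericTuple:
    """Route each name-value pair into the map matching its type."""
    out = GenericTuple()
    for name, value in obj.items():
        key = Key(name)  # raises ValueError for names not in the enum
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool):
            out.int_values[key] = value
        elif isinstance(value, str):
            out.string_values[key] = value
        else:
            raise TypeError(f"unsupported type for {name!r}: {type(value).__name__}")
    return out


record = from_json(
    {"device_id": 7, "heart_beat": 64, "temperature": 37, "notes": "patient resting"}
)
```

Every access becomes a map lookup and a presence check rather than a direct attribute read, which is where the runtime cost comes from.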

@chanskw
Collaborator Author

chanskw commented Nov 17, 2017

To prototype NLP support, I added a valueStr attribute to Reading to allow representation of non-numeric observations.

@YuanChiChang

Part of NLP support is also about the flexibility of carrying tags throughout the processing pipeline. Tags may be further organized as flat lists or lexical trees. The key, however, is a flexible data payload structure.
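
To make the two tag layouts concrete, here is a small Python sketch of a tag tree that can be flattened into a list of paths, so a stage that only understands flat lists can still consume it. The TagNode type, the "/" path separator, and the example tag names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TagNode:
    name: str
    children: List["TagNode"] = field(default_factory=list)


def flatten(node: TagNode, prefix: str = "") -> List[str]:
    """Return every tag in the tree as a '/'-joined path, parents first."""
    path = f"{prefix}/{node.name}" if prefix else node.name
    paths = [path]
    for child in node.children:
        paths.extend(flatten(child, path))
    return paths


# Hypothetical tags extracted from a clinical note.
tree = TagNode("symptom", [TagNode("pain", [TagNode("chest")]), TagNode("fever")])
flat_tags = flatten(tree)
```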

3 participants