Observation Redesign #148

chanskw · 2017-11-01T20:40:06Z

I'd like to start a discussion on how to redesign the Observation Type.

The Observation Type is our universal data type and is currently defined as follows:

{
  "patientId" : string,
  "device" : {
    "id" : string,
    "locationId" : string
  },
  "readingSource" : {
    "id" : string,
    "sourceType" : string,
    "deviceId" : string
  },
  "reading" : {
    "ts" : numeric,
    "readingType" : {
      "system" : string,
      "code" : string
    },
    "value" : numeric,
    "uom" : string
  }
}

The challenge is that the Observation type is too numeric-centric. In some cases, such as when we ingest data from clinical notes, the values are non-numeric. There is currently no way to really represent those values. If we look at the FHIR specification, an Observation component can have any of the following value types:

"component" : [{ // Component results
  "code" : { CodeableConcept }, // R!  Type of component observation (code / type)
  // value[x]: Actual component result. One of these 10:
  "valueQuantity" : { Quantity },
  "valueCodeableConcept" : { CodeableConcept },
  "valueString" : "<string>",
  "valueRange" : { Range },
  "valueRatio" : { Ratio },
  "valueSampledData" : { SampledData },
  "valueAttachment" : { Attachment },
  "valueTime" : "<time>",
  "valueDateTime" : "<dateTime>",
  "valuePeriod" : { Period },
  "dataAbsentReason" : { CodeableConcept }, // C? Why the component result is missing
  "interpretation" : { CodeableConcept }, // High, low, normal, etc.
  "referenceRange" : [{ Content as for Observation.referenceRange }] // Provides guide for interpretation of component result
}]
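In FHIR, `value[x]` is a choice element: a component carries at most one of the listed `value...` keys. A minimal Python sketch of how a consumer might find out which variant a component holds, working on plain dicts parsed from FHIR JSON; `component_value` and `VALUE_KEYS` are hypothetical names, not part of any FHIR library.

```python
from typing import Any, Optional, Tuple

# The ten value[x] variants listed in the FHIR excerpt above.
VALUE_KEYS = (
    "valueQuantity", "valueCodeableConcept", "valueString", "valueRange",
    "valueRatio", "valueSampledData", "valueAttachment", "valueTime",
    "valueDateTime", "valuePeriod",
)


def component_value(component: dict) -> Optional[Tuple[str, Any]]:
    """Return (variant_key, value) for the value[x] present, or None if absent."""
    present = [k for k in VALUE_KEYS if k in component]
    if len(present) > 1:
        # value[x] is a choice element: at most one variant may be populated.
        raise ValueError(f"expected at most one value[x], got {present}")
    if not present:
        return None  # the caller can then inspect dataAbsentReason
    key = present[0]
    return key, component[key]


heart_rate = {"code": {"text": "heart rate"},
              "valueQuantity": {"value": 72, "unit": "beats/min"}}
smoking = {"code": {"text": "smoking status"}, "valueString": "former smoker"}
print(component_value(heart_rate))
print(component_value(smoking))  # ('valueString', 'former smoker')
```

Mirroring this fully in a fixed schema would mean one optional field per variant, which is the trade-off behind the questions that follow.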

The question is how to represent all these different value types in Streams. Should we have one data type per value type? Should we extend Observation to handle all the different value types? Should we add valueString to Observation and store anything that cannot be represented numerically as a string?

The simplest thing to do is to change Observation to the following:

{
  "patientId" : string,
  "device" : {
    "id" : string,
    "locationId" : string
  },
  "readingSource" : {
    "id" : string,
    "sourceType" : string,
    "deviceId" : string
  },
  "reading" : {
    "ts" : numeric,
    "readingType" : {
      "system" : string,
      "code" : string,
      "valueType" : string        // represents the value type of the reading
    },
    "value" : numeric,
    "valueString" : string,      // anything that cannot be represented numerically is stored here
    "uom" : string
  }
}
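A minimal Python sketch of the reading part of this proposal, showing how a consumer could use `valueType` to decide which field to read. It omits `patientId`, `device`, and `readingSource` for brevity; the proposal does not define the `valueType` vocabulary, so `"numeric"` and `"string"` are assumptions, as are `reading_value` and the `example-nlp` code system.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class ReadingType:
    system: str
    code: str
    valueType: str  # hypothetical vocabulary: "numeric" or "string"


@dataclass
class Reading:
    ts: int
    readingType: ReadingType
    value: float = 0.0     # meaningful when valueType == "numeric"
    valueString: str = ""  # meaningful for anything non-numeric
    uom: str = ""


def reading_value(r: Reading) -> Union[float, str]:
    """Return whichever field valueType says carries the reading."""
    kind = r.readingType.valueType
    if kind == "numeric":
        return r.value
    if kind == "string":
        return r.valueString
    raise ValueError(f"unknown valueType: {kind!r}")


hr = Reading(1509568806000, ReadingType("http://loinc.org", "8867-4", "numeric"),
             value=72.0, uom="/min")
note = Reading(1509568806000, ReadingType("example-nlp", "smoking-status", "string"),
               valueString="former smoker")
print(reading_value(hr))    # 72.0
print(reading_value(note))  # former smoker
```

The cost of this flat design is that every reading carries both fields, and one of them is always unused.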

Or should we be more elaborate and mirror the FHIR Observation specification directly?


ddebrunner · 2017-11-01T21:00:55Z

Can you expand on what you mean by this:

There is currently no way to really represent those values.

JSON allows values of any type, so is it only the conversion to an SPL schema that causes problems?

chanskw · 2017-11-01T21:01:33Z

Yes, this is mainly a problem in SPL. We define the same type in Java, SPL, and Python.
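JSON itself happily holds a string where a number usually goes; the trouble only appears once the data is mapped into a fixed, typed record. A minimal Python illustration, assuming a hypothetical `NumericReading` record and `to_numeric_reading` converter that mirror the numeric-only `value` field (the actual Java, SPL, and Python definitions are not shown in this thread):

```python
import json
from dataclasses import dataclass


@dataclass
class NumericReading:
    ts: int
    code: str
    value: float  # numeric-only, like the current Observation "value"
    uom: str


def to_numeric_reading(obj: dict) -> NumericReading:
    """Convert a parsed JSON reading into the fixed, numeric-only record."""
    return NumericReading(ts=int(obj["ts"]), code=obj["readingType"]["code"],
                          value=float(obj["value"]), uom=obj.get("uom", ""))


vital = json.loads('{"ts": 1, "readingType": {"system": "s", "code": "hr"},'
                   ' "value": 72, "uom": "/min"}')
note = json.loads('{"ts": 2, "readingType": {"system": "s", "code": "smoking"},'
                  ' "value": "former smoker"}')
print(to_numeric_reading(vital).value)  # 72.0
try:
    to_numeric_reading(note)
except ValueError as err:
    # JSON held the string without complaint; the typed record cannot.
    print("cannot convert:", err)
```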

chanskw · 2017-11-01T21:02:59Z

How would someone interpret this value when all the data is coming in from the same data stream? That is, some values may be numeric while others are not, yet they can all arrive on the same topic.

ddebrunner · 2017-11-01T21:33:33Z

Doesn't it depend on context? Whether a field is a number or text could be determined by the values of other fields in the object.
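One way to read the context-based approach: the consumer looks at the reading's own `readingType` to decide how to treat `value`, so numeric and text readings can share a topic. A minimal sketch, assuming a hypothetical lookup table `TEXT_READING_TYPES` and a hypothetical `interpret` function; nothing here is an existing Streams API.

```python
from typing import List, Union

# Hypothetical table: (system, code) pairs whose readings carry text.
TEXT_READING_TYPES = {("example-nlp", "smoking-status"), ("example-nlp", "diagnosis")}


def interpret(reading: dict) -> Union[float, str]:
    """Decide how to read "value" from the reading's own readingType."""
    rt = reading["readingType"]
    if (rt["system"], rt["code"]) in TEXT_READING_TYPES:
        return str(reading["value"])
    return float(reading["value"])


# Mixed readings arriving on the same topic.
stream: List[dict] = [
    {"readingType": {"system": "http://loinc.org", "code": "8867-4"}, "value": 72},
    {"readingType": {"system": "example-nlp", "code": "smoking-status"},
     "value": "former smoker"},
]
print([interpret(r) for r in stream])  # [72.0, 'former smoker']
```

The catch is that every consumer needs the same table, whereas an explicit field such as `valueType` carries that knowledge inside the tuple itself.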

YuanChiChang · 2017-11-03T15:43:26Z

If one treats JSON (or XML, if you belong to that camp) as nothing more than a hierarchy of name-value pairs, why not model it that way in Streams' internal representation? For example, sensor data that would normally be converted to int32 device_id, rstring notes, int32 heart_beat, int32 temperature could instead be represented as intValues[enum.device_id], intValues[enum.heart_beat], intValues[enum.temperature], stringValues[enum.notes], where intValues and stringValues are declared as map<enum, int32> and map<enum, rstring>.

This dumbed-down data model, much like JSON or XML, gives maximal flexibility at the cost of poor runtime efficiency.
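The same name-value idea rendered in Python, as a sketch rather than the SPL declaration described above: an enum names the attributes, and one map per value type holds the values. `Field`, `Record`, and `from_json` are hypothetical, and `int_values`/`string_values` stand in for `intValues`/`stringValues`.

```python
from enum import Enum
from typing import Dict


class Field(Enum):
    # Hypothetical attribute names taken from the sensor example.
    DEVICE_ID = "device_id"
    NOTES = "notes"
    HEART_BEAT = "heart_beat"
    TEMPERATURE = "temperature"


class Record:
    """Name-value payload: one map per value type, keyed by the enum."""

    def __init__(self) -> None:
        self.int_values: Dict[Field, int] = {}
        self.string_values: Dict[Field, str] = {}


def from_json(obj: dict) -> Record:
    rec = Record()
    for name, val in obj.items():
        key = Field(name)  # raises ValueError for names not in the enum
        if isinstance(val, int) and not isinstance(val, bool):
            rec.int_values[key] = val
        elif isinstance(val, str):
            rec.string_values[key] = val
        else:
            raise TypeError(f"no map for {name}={val!r}")
    return rec


rec = from_json({"device_id": 17, "notes": "patient restless",
                 "heart_beat": 88, "temperature": 37})
print(rec.int_values[Field.HEART_BEAT])  # 88
print(rec.string_values[Field.NOTES])    # patient restless
```

Every attribute access becomes a map lookup instead of a fixed field offset, which is where the runtime cost comes from.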

chanskw · 2017-11-17T21:50:34Z

To prototype NLP support, I added a valueStr attribute to Reading so that non-numeric observations can be represented.

YuanChiChang · 2017-11-27T14:58:32Z

Part of NLP support is also the flexibility to carry tags throughout the processing pipeline. Tags may further be organized as flat lists or lexical trees. The key, however, is a flexible data payload structure.
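To make the flat-list versus tree distinction concrete, here is a minimal sketch that keeps tags as a tree during processing and flattens them into slash-separated paths for the payload. `Tag`, `flatten`, the `"tags"` key, and the tag names are all hypothetical; only `valueStr` comes from the prototype mentioned above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Tag:
    name: str
    children: List["Tag"] = field(default_factory=list)


def flatten(tag: Tag, prefix: str = "") -> List[str]:
    """Turn a tag tree into a flat list of slash-separated paths."""
    path = f"{prefix}/{tag.name}" if prefix else tag.name
    paths = [path]
    for child in tag.children:
        paths.extend(flatten(child, path))
    return paths


tree = Tag("finding", [Tag("cardiac", [Tag("arrhythmia")]), Tag("respiratory")])
payload = {"valueStr": "irregular heartbeat noted", "tags": flatten(tree)}
print(payload["tags"])
# ['finding', 'finding/cardiac', 'finding/cardiac/arrhythmia', 'finding/respiratory']
```

A flat list of strings fits easily into a fixed schema (for example a list of rstring), while the tree form needs the kind of flexible payload discussed above.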
