- 0.2.0 - Major overhaul
- 0.2.1 - `Observation.referenceRange` temporarily removed to allow batch loads
- 0.2.2 - Added `conf_file` utility and removed hard-coded configuration files. Now available via `pip`
- 0.2.3 - Partial refactoring and test cleanup
- 0.3.0 - Adds concepts to observation fact table
This package creates an i2b2 ontology from the FHIR STU3/R4 resource model. It uses a combination of the FHIR W5 (who, what, why, where, when) ontology and the FHIR Resource Ontology to create an i2b2 equivalent.
i2FHIRb2 has been tested with the postgres version of i2b2 Software release 1.7.08. While it will theoretically work with earlier versions, you may encounter issues, including:
- The FHIR data includes UTF-8 characters. Earlier releases of i2b2 used `SQL_ASCII` encoding, which won't work.
- This package hasn't been tested with Oracle or Microsoft SQL Server. We use sqlalchemy, which should minimize the issues, but you may want to talk with the authors before attempting to run with a non-postgres back end.
Before you start, you will need to know:
- The IP address and port number of the i2b2 SQL server. The default for postgres is: localhost:5432
- A userid/password combination that has write access to the `i2b2demodata` (CRC) and `i2b2metadata` schemas. The default for postgresql is: postgres:[none], but, being a responsible dba, you will have changed these.
You should also have an i2b2 client (we use the web client) that can access and query the installed services.
You will need Python 3.6 or later. This software will not work with Python 2 or with versions of Python 3 earlier than 3.6.
> python --version
Python 3.6.1
>
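The same check can be made from inside Python, for example at the top of a wrapper script. This is a minimal sketch; the function name `python_is_supported` is ours and is not part of i2FHIRb2.

```python
import sys

MIN_VERSION = (3, 6)  # minimum version stated in this README


def python_is_supported(version_info=None):
    """Return True if the interpreter is at least Python 3.6."""
    if version_info is None:
        version_info = sys.version_info
    # Compare (major, minor) as a tuple so that 3.10 sorts after 3.6
    return tuple(version_info[:2]) >= MIN_VERSION


if not python_is_supported():
    print("i2FHIRb2 needs Python %d.%d or later" % MIN_VERSION)
```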
> pip install i2FHIRb2
> generate_i2b2 -v
Version: 0.2.2 <--- a newer version may print here
The following command creates a file, `my_conf`, with the default configuration parameters for the i2FHIRb2 package.
> config_file -f my_conf --user <db user> --password <db password>
This file has an editable set of parameters used by the FHIR loading tools. All of these parameters can be set or overridden on the command line. Of particular interest are:
- dburl - the URL of the target i2b2 SQL instance
- user - user id to use with instance
- password - password to use with instance
- sourcesystem - sourcesystem_cd used in i2b2 tables. Handy in that `removefacts` can remove all entries for a given source system
- uploadid - upload identifier used in i2b2 tables. Individual uploads can also be removed.
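This README does not spell out the exact form of `dburl`. Because the package uses sqlalchemy, a SQLAlchemy-style URL is a reasonable guess. The sketch below assembles one from the pieces listed above. The helper name `build_dburl`, the `psycopg2` driver suffix and the database name `i2b2` are all assumptions, so adjust them to your installation.

```python
from urllib.parse import quote


def build_dburl(user, password, host="localhost", port=5432, database="i2b2"):
    """Assemble a SQLAlchemy-style PostgreSQL URL.

    The driver ("psycopg2") and database name ("i2b2") are assumptions.
    """
    # Percent-encode the credentials so that characters such as '@' or ':'
    # cannot be mistaken for URL delimiters.
    auth = quote(user, safe="")
    if password:
        auth += ":" + quote(password, safe="")
    return f"postgresql+psycopg2://{auth}@{host}:{port}/{database}"


print(build_dburl("postgres", ""))
```

Encoding the password matters as soon as it contains punctuation, which it should once the default credentials have been changed.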
> generate_i2b2 --conf my_conf --test
Validating input files
URL: https://build.fhir.org/fhir.ttl is valid
URL: https://build.fhir.org/w5.ttl is valid
Validating sql connection
Connection validated
Validating target tables
Table concept_dimension exists
Table encounter_mapping exists
Table modifier_dimension exists
Table observation_fact exists
Table ontology_table exists
Table patient_dimension exists
Table patient_mapping exists
Table provider_dimension exists
Table table_access exists
Table visit_dimension exists
Testing write access
2 rows updated in table_access table
(TODO: Add a fail situation)
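Until a failure example is documented, one cautious way to use the `--test` output in a script is to look only for the success lines shown above and treat any table that is not confirmed as missing. The sketch below does that. The function name `missing_tables` is hypothetical, and the code assumes the success lines keep the `Table <name> exists` wording from the transcript.

```python
import re

# Tables listed in the sample "--test" output above
EXPECTED_TABLES = {
    "concept_dimension", "encounter_mapping", "modifier_dimension",
    "observation_fact", "ontology_table", "patient_dimension",
    "patient_mapping", "provider_dimension", "table_access",
    "visit_dimension",
}

TABLE_LINE = re.compile(r"^\s*Table (\w+) exists\s*$")


def missing_tables(output):
    """Return the expected tables that the output does not confirm."""
    found = set()
    for line in output.splitlines():
        match = TABLE_LINE.match(line)
        if match:
            found.add(match.group(1))
    return sorted(EXPECTED_TABLES - found)
```

The text passed in would be whatever `generate_i2b2 --conf my_conf --test` prints, captured however you normally capture command output.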
The sections below tell you how to:
- Load the FHIR Metadata Vocabulary as an i2b2 ontology
- Load FHIR data as observation facts
- Run i2b2 queries across FHIR data
There are two ways to load/update an existing set of i2b2 tables:
- Run `generate_i2b2` and load the tables directly
- Import the tab separated value (.tsv) tables that have been preloaded as part of this project
Further instructions for running the various loader functions can be found in the scripts directory.
> generate_i2b2 --conf my_conf -l
Loading fhir.ttl
(cached)
loading w5.ttl
done
1 i2b2metadata.table_access record deleted
1 i2b2metadata.table_access record inserted
2143 i2b2demodata.concept_dimension records deleted
2143 i2b2demodata.concept_dimension records inserted
742 i2b2demodata.modifier_dimension records deleted
742 i2b2demodata.modifier_dimension records inserted
15222 i2b2metadata.custom_meta records deleted
19 i2b2metadata.custom_meta records deleted
15241 i2b2metadata.custom_meta records inserted
>
It is also possible to load the i2b2 ontology tables from the set of tab separated value (.tsv) files that are included in the distribution.
Note 1: We do our best to reload these files with each release. Sometimes we don't remember. Also, the .tsv files in the distribution are derived from `../tests/data/fhir_metadata_vocabulary`, which may vary slightly from the FMV that can be found at https://build.fhir.org/. You can regenerate these tables by:
> generate_i2b2 --conf my_conf -od ../i2b2files
Loading fhir.ttl
(cached)
loading w5.ttl
done
writing i2b2files/table_access.tsv (1) records written
writing i2b2files/concept_dimension.tsv (2143) records written
writing i2b2files/modifier_dimension.tsv (742) records written
writing i2b2files/ontology_table.tsv (15241) records written
>
Note 2: Before you load the files below, you may need to adjust the length of the following columns:

| table | column | default size | new size |
|---|---|---|---|
| concept_dimension | concept_cd | 50 | 200 |
| modifier_dimension | modifier_cd | 50 | 200 |
| custom_meta | c_basecode | 50 | 200 |
| custom_meta | c_tooltip | 900 | 1600 |
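One way to apply these sizes is to generate the corresponding PostgreSQL `ALTER TABLE` statements and run them with your usual SQL client. The sketch below only prints the statements. The schema names are taken from the loader output shown earlier (`i2b2demodata`, `i2b2metadata`) and may differ in your installation; the helper name `alter_statements` is ours.

```python
# (schema, table, column, new size) from the table above. The schema names
# follow the loader output in this README and are an assumption for your site.
COLUMN_SIZES = [
    ("i2b2demodata", "concept_dimension", "concept_cd", 200),
    ("i2b2demodata", "modifier_dimension", "modifier_cd", 200),
    ("i2b2metadata", "custom_meta", "c_basecode", 200),
    ("i2b2metadata", "custom_meta", "c_tooltip", 1600),
]


def alter_statements(sizes=COLUMN_SIZES):
    """Return one PostgreSQL ALTER TABLE statement per column."""
    return [
        f"ALTER TABLE {schema}.{table} ALTER COLUMN {column} TYPE VARCHAR({size});"
        for schema, table, column, size in sizes
    ]


for statement in alter_statements():
    print(statement)
```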
The pre-loaded tsv files can be found in the `i2b2files` subdirectory of the `i2FHIRb2` install:
- `table_access.tsv` -- the `table_access` table describes the location and root paths of i2b2 metadata. This file has one row that states that FHIR resource definitions can be found in the `custom_meta` table with the root '\FHIR\'.
- `concept_dimension.tsv` -- the `concept_dimension` table defines the set of possible concept codes that can appear in the `observation_fact` table. For FHIR, this includes all "first level" resource entries -- Observation.identifier, Observation.basedOn, etc.
- `modifier_dimension.tsv` -- the `modifier_dimension` table contains possible modifier codes for the `observation_fact` table. For FHIR, this contains the substructure represented in the concept codes -- `CodeableConcept.text`, `CodeableConcept.coding`, etc.
- `ontology.tsv` -- the "ontology" or "metadata" tables provide a navigational hierarchy that serves to organize and group i2b2 dimensions (`concept`, `modifier`, `patient`, `provider` and `visit`). The `i2FHIRb2` project uses the FHIR W5 tables to provide high level organization and then lists the resources and their possible properties and modifiers for this table.
All of these tables have tab-separated values and the first row of each table has the column headers. They can be imported directly into the corresponding i2b2 tables.
NOTE: We have been unable to convince the PostgreSQL import tool to treat empty columns as `NULL` values. While, in general, the i2b2 software appears to treat NULLs and zero-length strings as equivalent, there is at least one place where this breaks -- the `m_exclusion_cd` column in the `custom_meta` (ontology) table. After loading the `ontology.tsv` table it is necessary to execute the following SQL:
UPDATE custom_meta SET m_exclusion_cd = NULL WHERE m_exclusion_cd = '';
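If you load the .tsv files from your own script rather than through the import tool, an alternative is to turn empty fields into `None` before inserting, so that they arrive in the database as NULL. The sketch below shows that step with the standard `csv` module. The function name `read_tsv_rows` is hypothetical and the sample row is made up for illustration; it is not taken from the distributed files.

```python
import csv
import io


def read_tsv_rows(handle):
    """Yield one dict per data row, mapping empty fields to None."""
    # QUOTE_NONE keeps backslashes and quote characters as literal text
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield {col: (val if val != "" else None) for col, val in row.items()}


# Illustrative data only: a header row plus one row with an empty last field
sample = io.StringIO(
    "c_fullname\tc_basecode\tm_exclusion_cd\n"
    "\\FHIR\\\tFHIR:\t\n"
)
rows = list(read_tsv_rows(sample))
print(rows[0]["m_exclusion_cd"])  # prints None
```

This converts every empty field, not only `m_exclusion_cd`, which matches the observation above that i2b2 generally treats NULLs and zero-length strings alike.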
Open the i2b2 browser and navigate to `FHIR Resources`. As you drill down it should look like:
The `loadfacts` program is used to load select FHIR Resource instances into the i2b2 CRC tables. It can load data from a list of files, a list of URLs or an input directory. Input can either be in JSON or Turtle format. Both an `upload_id` and `sourcesystem_cd` can be specified for tracking and management purposes.
> loadfacts -v
FHIR i2b2 CRC loader -- Version 0.2.2
(venv) > loadfacts --conf my_conf -u 117 -l -t json -rm -i https://build.fhir.org/observation-example-f001-glucose.json
upload_id: 117
Starting encounter number: 505749
Starting patient number: 1000000507
--> loading https://build.fhir.org/observation-example-f001-glucose.json
89 triples
0: (Practitioner) - https://hl7.org/fhir/Practitioner/f005
1: (Patient) - https://hl7.org/fhir/Patient/f001
2: (Observation) - https://hl7.org/fhir/Observation/f001
---> Graph map phase complete
Generated:
22 Observation facts
1 Patients
2 Patient mappings
=== SKIPS ===
0 Bundled resources (shouldn't happen?)
0 Visit resources
0 Infrastructure resources
0 Provider resources
1 Unmapped resources
Deleted 0 patient_dimension records
Deleted 0 patient_mapping records
Deleted 0 observation_fact records
Deleted 0 visit_dimension records
Deleted 0 encounter_mapping records
0 / 0 patient_dimension records added / modified
0 / 0 patient_mapping records added / modified
1 / 0 visit_dimension records added / modified
1 / 1 encounter_mapping records added / modified
22 / 0 observation_fact records added / modified
>
The above example uses the following parameters:
- `--conf my_conf` default configuration parameters
- `-u 117` upload identifier
- `-l` load the data tables
- `-t json` source format is JSON
- `-rm` remove existing entries for this upload id before loading (useful for testing)
- `-i https://build.fhir.org/observation-example-f001-glucose.json` input comes from this URL
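When `loadfacts` is driven from a script, it can help to build the argument list in one place. The sketch below uses only the flags from the example above. The helper name `loadfacts_args` is ours, and passing several inputs after a single `-i` is an assumption based on the statement that a list of files or URLs is accepted.

```python
def loadfacts_args(conf, upload_id, inputs, fmt="json", load=True, remove=True):
    """Build a loadfacts command line from the flags shown in this README."""
    args = ["loadfacts", "--conf", conf, "-u", str(upload_id)]
    if load:
        args.append("-l")       # load the data tables
    args += ["-t", fmt]         # source format
    if remove:
        args.append("-rm")      # remove existing entries for this upload id
    # Assumption: multiple inputs may follow a single -i
    args += ["-i", *inputs]
    return args


print(" ".join(loadfacts_args(
    "my_conf", 117,
    ["https://build.fhir.org/observation-example-f001-glucose.json"],
)))
```

The resulting list can be handed to `subprocess.run(args, check=True)` once `loadfacts` is on the path.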
The results of the above load can be (indirectly) viewed with a query such as the one below:
Note that `Selected groups occur in the same financial encounter` is selected in the "temporal constraint". We are currently using the notion of "encounter" to represent "resource" -- the selection says that the code, the system and the interpretation all have to occur on the same encounter.
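The effect of the same-encounter constraint can be pictured as a grouping step: collect the concept codes seen on each encounter and keep only the encounters that contain every required code. The sketch below illustrates that idea on in-memory data. It is not how the i2b2 CRC implements the query, and the codes and encounter numbers are placeholders.

```python
from collections import defaultdict


def encounters_matching_all(facts, required_codes):
    """Return encounter numbers whose facts cover every required code.

    `facts` is an iterable of (encounter_num, concept_cd) pairs.
    """
    codes_by_encounter = defaultdict(set)
    for encounter_num, concept_cd in facts:
        codes_by_encounter[encounter_num].add(concept_cd)
    required = set(required_codes)
    return sorted(enc for enc, codes in codes_by_encounter.items()
                  if required <= codes)


# Placeholder codes and encounter numbers, for illustration only
facts = [
    (505749, "code"), (505749, "system"), (505749, "interpretation"),
    (505750, "code"), (505750, "system"),
]
print(encounters_matching_all(facts, ["code", "system", "interpretation"]))
# prints [505749]
```

Without the constraint, the second encounter would also contribute its `code` and `system` facts, and criteria could be satisfied by facts belonging to different resources.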
The results for this query (we selected the "Patient Set" and "Number of Patients" options) are shown below:
We can add patient demographics by following the link in the observation, loading:
(venv) > loadfacts --conf my_conf -u 117 -l -t json -rm -i https://build.fhir.org/patient-example-f001-pieter.json
upload_id: 117
Starting encounter number: 505750
Starting patient number: 1000000506
--> loading https://build.fhir.org/patient-example-f001-pieter.json
136 triples
0: (Organization) - https://hl7.org/fhir/Organization/f001
1: (Patient) - https://hl7.org/fhir/Patient/f001
---> Graph map phase complete
Generated:
0 Observation facts
1 Patients
2 Patient mappings
=== SKIPS ===
0 Bundled resources (shouldn't happen?)
0 Visit resources
0 Infrastructure resources
1 Provider resources
0 Unmapped resources
Deleted 1 patient_dimension records
Deleted 2 patient_mapping records
Deleted 22 observation_fact records
Deleted 1 visit_dimension records
Deleted 2 encounter_mapping records
1 / 0 patient_dimension records added / modified
2 / 0 patient_mapping records added / modified
0 / 0 visit_dimension records added / modified
0 / 0 encounter_mapping records added / modified
0 / 0 observation_fact records added / modified
(venv) >
At the moment, the FHIR structural model is represented pretty much verbatim in the i2b2 ontology and the corresponding resource instances in the i2b2 observation_fact table. We have demonstrated that it is possible to create queries in the i2b2 web client to access this information, but it should also be obvious that these queries would be unapproachable to anyone who isn't both a FHIR and an i2b2 model expert.
The next steps include:
Mapping FHIR data types to i2b2 equivalents.
Currently, FHIR data types are represented quite literally. As an example, FHIR Quantity currently has a separate row for `Quantity.unit`, `Quantity.code`, `Quantity.comparator`, `Quantity.value` and `Quantity.system`. There is a close correspondence between these and the observation_fact value columns `units_cd`, `tval_char`, and `nval_num`. In addition, the `valueflag_cd` column has a close correspondence to the FHIR Observation interpretation field.
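As a rough illustration of where this step could go, the sketch below folds a FHIR Quantity into the value columns of a single observation_fact row. It is one possible mapping, not something the package does today. It assumes the usual i2b2 convention that numeric facts carry `valtype_cd = 'N'` with an operator code in `tval_char`. The choice to prefer `Quantity.code` over `Quantity.unit` for `units_cd` is ours, and `Quantity.system` is simply dropped.

```python
# FHIR Quantity.comparator -> i2b2 numeric operator stored in tval_char
COMPARATOR_TO_TVAL = {"<": "L", "<=": "LE", ">=": "GE", ">": "G"}


def quantity_to_value_columns(quantity):
    """Map a FHIR Quantity (as a dict) to observation_fact value columns."""
    return {
        "valtype_cd": "N",  # numeric fact (i2b2 convention, an assumption here)
        # No comparator means the value is exact, recorded as "E"
        "tval_char": COMPARATOR_TO_TVAL.get(quantity.get("comparator"), "E"),
        "nval_num": quantity.get("value"),
        # Prefer the coded unit, fall back to the display unit (assumption)
        "units_cd": quantity.get("code") or quantity.get("unit"),
    }


# Illustrative quantity, not copied from a specific resource
print(quantity_to_value_columns(
    {"value": 6.3, "unit": "mmol/l", "code": "mmol/L"}
))
```

Collapsing five rows into one would make numeric range queries in the i2b2 client behave as they do for native i2b2 data, at the cost of losing the unit system unless it is kept elsewhere.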
Currently, the FHIR code, FHIR Coding and FHIR CodeableConcept data element values are represented as textual values. We need to create i2b2 concept and/or modifier codes that represent this information. This step would provide a key entry point to the use of FHIR terminologies in the i2b2 space.
Currently, we have preempted the "visit/financial encounter" dimension to represent FHIR resources. We need to extend the i2b2 model to be able to group elements on the "Observation" or "Resource" level.
The project currently assumes that all information appears in the `observation_fact` table. This obviously isn't the case as:
- Patient / provider / visit information either maps to or extends the i2b2 dimension tables
- Resources such as 'Device', 'Medication', etc. currently have no place in i2b2 at all.