SNOMED distributions can, even in a snapshot, have more than one relationship relating to the same combination of source, target, type and modifier identifiers.
For example, this is from the September 2022 UK edition:
```
➜ Terminology git:(main) head -n 1 sct2_Relationship_UKCLSnapshot_GB1000000_20220928.txt
id effectiveTime active moduleId sourceId destinationId relationshipGroup typeId characteristicTypeId modifierId
```
```
➜ Terminology git:(main) cat sct2_Relationship_UKCLSnapshot_GB1000000_20220928.txt | grep 1089261000000101
832591000000123 20210512 0 999000011000000103 1089261000000101 609336008 0 116680003 900000000000011006 900000000000451002
2191421000000129 20210512 0 999000011000000103 1089261000000101 301857004 0 116680003 900000000000011006 900000000000451002
3219831000000124 20210512 1 999000011000000103 1089261000000101 773760007 2 42752001 900000000000011006 900000000000451002
3228451000000128 20210512 1 999000011000000103 1089261000000101 51576004 1 363698007 900000000000011006 900000000000451002
3229451000000120 20210512 1 999000011000000103 1089261000000101 12835000 1 116676008 900000000000011006 900000000000451002
3229461000000123 20210512 1 999000011000000103 1089261000000101 213345000 0 116680003 900000000000011006 900000000000451002
5687171000000128 20210512 0 999000011000000103 1089261000000101 213345000 0 116680003 900000000000011006 900000000000451002
5687191000000129 20210512 0 999000011000000103 1089261000000101 36818005 1 116676008 900000000000011006 900000000000451002
5687201000000127 20210512 0 999000011000000103 1089261000000101 52530000 1 363698007 900000000000011006 900000000000451002
```
Here, relationships 3229461000000123 and 5687171000000128 both relate the same source (1089261000000101), destination (213345000) and type (116680003 |Is a|), with the same relationship group (0) and modifier.
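To check a release for other cases like this, the snapshot can be scanned for tuples that more than one relationship id refers to. The following is a minimal Python sketch, assuming the tab-separated RF2 layout shown by `head -n 1` above. `find_shared_tuples` is a hypothetical helper, not part of any existing tool, and it keys on source, destination, type and modifier as described in this issue.

```python
import csv
import io
from collections import defaultdict

# Column order as printed by `head -n 1` on the relationship snapshot file.
COLUMNS = ["id", "effectiveTime", "active", "moduleId", "sourceId",
           "destinationId", "relationshipGroup", "typeId",
           "characteristicTypeId", "modifierId"]


def find_shared_tuples(lines):
    """Hypothetical helper: return
    {(sourceId, destinationId, typeId, modifierId): [(id, active), ...]}
    for every tuple referenced by more than one relationship id.
    `lines` is any iterable of tab-separated RF2 lines, header row first."""
    groups = defaultdict(list)
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        key = (row["sourceId"], row["destinationId"],
               row["typeId"], row["modifierId"])
        groups[key].append((row["id"], row["active"]))
    # Keep only tuples shared by two or more relationships.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Three rows from the grep output above, converted back to tab-separated form.
SAMPLE_ROWS = [
    " ".join(COLUMNS),
    "3229451000000120 20210512 1 999000011000000103 1089261000000101 12835000 1 116676008 900000000000011006 900000000000451002",
    "3229461000000123 20210512 1 999000011000000103 1089261000000101 213345000 0 116680003 900000000000011006 900000000000451002",
    "5687171000000128 20210512 0 999000011000000103 1089261000000101 213345000 0 116680003 900000000000011006 900000000000451002",
]
sample = io.StringIO("\n".join("\t".join(r.split()) for r in SAMPLE_ROWS))
shared = find_shared_tuples(sample)
print(shared)
```

For a real release, the same function can be given an open file, for example `open(path, newline="", encoding="utf-8")`.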
Relationship indices are currently built in a single pass during relationship import. That works only if no two relationships carry essentially the same data. In this case, the later row (5687171000000128) marks the relationship inactive, while the earlier row (3229461000000123) marks it active, even though both have the same effectiveTime (20210512).
The current import compares effective dates and, when the incoming row's effectiveTime is the same or later, deletes the relationship from the index because that row is inactive. This is incorrect when multiple relationships can reference the same source-target-type tuple: relationship 3229461000000123 is still active, so the link it asserts should be kept.
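The failure mode can be reproduced with a toy model. This is a minimal Python sketch of the single-pass behaviour as described in this issue, not the project's actual code: `single_pass_index` is hypothetical, and the index is assumed to be a map from a (source, type, destination) tuple to the last row applied. The two rows from the grep output are fed in file order.

```python
# Each row: (id, effectiveTime, active, sourceId, destinationId, typeId).
# effectiveTime is a YYYYMMDD string, so plain string comparison orders dates correctly.
ROWS = [
    ("3229461000000123", "20210512", True, "1089261000000101", "213345000", "116680003"),
    ("5687171000000128", "20210512", False, "1089261000000101", "213345000", "116680003"),
]


def single_pass_index(rows):
    """Hypothetical single-pass indexer keyed on (source, type, destination)."""
    index = {}  # key -> (effectiveTime, id) of the row that last set it
    for rel_id, effective, active, source, dest, type_id in rows:
        key = (source, type_id, dest)
        current = index.get(key)
        # A row with the same or a later effectiveTime overrides what is indexed.
        if current is None or effective >= current[0]:
            if active:
                index[key] = (effective, rel_id)
            else:
                # The inactive duplicate removes the entry, even though a
                # different relationship id is still active for this tuple.
                index.pop(key, None)
    return index


index = single_pass_index(ROWS)
print(("1089261000000101", "116680003", "213345000") in index)  # False: the active link is lost
```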
As a result, roughly 70 concepts do not have the correct relationships stored, which affects search and inference. The fix is a two-pass approach: import all relationships first, then rebuild the indices once the import has finished.
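A minimal sketch of the two-pass approach, in Python and hypothetical: `import_relationships` and `build_index` are illustrative names, and the store and index are plain dicts rather than the project's real storage. Pass 1 records every relationship under its own id; pass 2 derives the index only from relationships that are active, so an inactive duplicate can no longer remove a link that another id still asserts.

```python
from collections import defaultdict

# Each row: (id, effectiveTime, active, sourceId, destinationId, typeId).
ROWS = [
    ("3229461000000123", "20210512", True, "1089261000000101", "213345000", "116680003"),
    ("5687171000000128", "20210512", False, "1089261000000101", "213345000", "116680003"),
    ("3229451000000120", "20210512", True, "1089261000000101", "12835000", "116676008"),
]


def import_relationships(rows):
    """Pass 1: store every row keyed by relationship id, keeping the latest version."""
    store = {}
    for row in rows:
        rel_id, effective = row[0], row[1]
        if rel_id not in store or effective >= store[rel_id][1]:
            store[rel_id] = row
    return store


def build_index(store):
    """Pass 2: rebuild the index from active relationships only."""
    index = defaultdict(set)  # (sourceId, typeId) -> set of destinationIds
    for _rel_id, _effective, active, source, dest, type_id in store.values():
        if active:
            index[(source, type_id)].add(dest)
    return index


index = build_index(import_relationships(ROWS))
print(sorted(index[("1089261000000101", "116680003")]))  # ['213345000']
```

Because the index is derived from the final stored state, the result no longer depends on the order in which different relationship ids appear in the file.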