[GH-ISSUE #9] duplicate mappings in episode/episode_object_concept_id/kuur_usagi.csv #4

Closed
opened 2026-03-23 20:29:33 +00:00 by mirror · 5 comments
Owner

Originally created by @pjlammertyn on GitHub (May 30, 2023).
Original GitHub issue: https://github.com/RADar-AZDelta/AZDelta-OMOP-CDM/issues/9

Originally assigned to: @lbertelo01 on GitHub.

in kuur_usagi.csv

query:

select source_code, target_concept_id, count(*)
from (
    SELECT DISTINCT
        t.sourceCode as source_code
        ,0 as source_concept_id
        ,'' as source_vocabulary_id
        ,t.sourceName as source_code_description
        ,t.conceptId as target_concept_id
        ,c.vocabulary_id as target_vocabulary_id
        ,CURRENT_DATE() as valid_start_date
        ,DATE(2099, 12, 31) as valid_end_date
        ,cast(NULL as string) as invalid_reason
    FROM omop_work_zidder.episode__episode_object_concept_id_usagi_casted t
    INNER JOIN omop_zidder.concept c on c.concept_id = t.conceptId
    where t.mappingStatus = 'APPROVED'
) t
group by source_code, target_concept_id
having count(*) > 1

image

Originally created by @pjlammertyn on GitHub (May 30, 2023). Original GitHub issue: https://github.com/RADar-AZDelta/AZDelta-OMOP-CDM/issues/9 Originally assigned to: @lbertelo01 on GitHub. in [kuur_usagi.csv](episode/episode_object_concept_id/kuur_usagi.csv) query: ```sql select source_code, target_concept_id, count(*) from ( SELECT DISTINCT t.sourceCode as source_code ,0 as source_concept_id ,'' as source_vocabulary_id ,t.sourceName as source_code_description ,t.conceptId as target_concept_id ,c.vocabulary_id as target_vocabulary_id ,CURRENT_DATE() as valid_start_date ,DATE(2099, 12, 31) as valid_end_date ,cast(NULL as string) as invalid_reason FROM omop_work_zidder.episode__episode_object_concept_id_usagi_casted t INNER JOIN omop_zidder.concept c on c.concept_id = t.conceptId where t.mappingStatus = 'APPROVED' ) t group by source_code, target_concept_id having count(*) > 1 ``` ![image](https://github.com/RADar-AZDelta/AZDelta-OMOP-CDM/assets/1187178/42570ff5-f4b3-4e6d-acbe-36534cf10cfc)
Author
Owner

@lbertelo01 commented on GitHub (May 30, 2023):

Dit komt doordat sommige kuren soms een sourceTerm hebben die heel lichtjes afwijken van elkaar:

image

Hoe zou je de source_code aanpassen om ze uniek te maken?

<!-- gh-comment-id:1568269706 --> @lbertelo01 commented on GitHub (May 30, 2023): Dit komt doordat sommige kuren soms een sourceTerm hebben die heel lichtjes afwijken van elkaar: ![image](https://github.com/RADar-AZDelta/AZDelta-OMOP-CDM/assets/98480071/588f5bea-7bea-4338-9c7c-e3e462c2a49e) Hoe zou je de source_code aanpassen om ze uniek te maken?
Author
Owner

@pjlammertyn commented on GitHub (May 30, 2023):

heb alvast check toegevoegd in RIAB: Added check for duplicate rows in Usagi CSV's

<!-- gh-comment-id:1568283160 --> @pjlammertyn commented on GitHub (May 30, 2023): heb alvast check toegevoegd in RIAB: [Added check for duplicate rows in Usagi CSV's](https://github.com/RADar-AZDelta/Rabbit-in-a-Blender/commit/ccb3b0e4c877298684ccaf7cac4372d239330321)
Author
Owner

@lbertelo01 commented on GitHub (May 30, 2023):

Is dit issue dan opgelost?

<!-- gh-comment-id:1568302986 --> @lbertelo01 commented on GitHub (May 30, 2023): Is dit issue dan opgelost?
Author
Owner

@pjlammertyn commented on GitHub (May 30, 2023):

nope

<!-- gh-comment-id:1568368563 --> @pjlammertyn commented on GitHub (May 30, 2023): nope
Author
Owner

@pjlammertyn commented on GitHub (May 30, 2023):

Ofwel met comma aaneen plakken, ofwel enkel de eerste nemen, om je usage mee samen te stellen.

with cte as (
  select distinct
    CONCAT("KUUR_", k.VCMOCD) as sourceCode,
    k.OMSCHRIJV as sourceName,
    --NULLIF(REPLACE(REPLACE(k.OPMERK, '\r', ' '), '\n', ''),"") as additionalinfo,
  from `hix.MEDICAT_KUUR` k
  inner join `hix.MEDICAT_KUURDEEL` d on k.code = d.kuurcd
)
select 
  sourceCode,
  STRING_AGG(sourceName, ",") as sourceName,
  --ARRAY_AGG(sourceName)[offset(0)] as sourceName,
  count(sourceCode) as sourceFrequency
from cte
--where sourceCode = 'KUUR_0000001462'
group by sourceCode
<!-- gh-comment-id:1568369664 --> @pjlammertyn commented on GitHub (May 30, 2023): Ofwel met comma aaneen plakken, ofwel enkel de eerste nemen, om je usage mee samen te stellen. ```sql with cte as ( select distinct CONCAT("KUUR_", k.VCMOCD) as sourceCode, k.OMSCHRIJV as sourceName, --NULLIF(REPLACE(REPLACE(k.OPMERK, '\r', ' '), '\n', ''),"") as additionalinfo, from `hix.MEDICAT_KUUR` k inner join `hix.MEDICAT_KUURDEEL` d on k.code = d.kuurcd ) select sourceCode, STRING_AGG(sourceName, ",") as sourceName, --ARRAY_AGG(sourceName)[offset(0)] as sourceName, count(sourceCode) as sourceFrequency from cte --where sourceCode = 'KUUR_0000001462' group by sourceCode ```
Sign in to join this conversation.
No labels
pull-request
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
RADar-AZDelta/AZDelta-OMOP-CDM#4
No description provided.