Inferring joining keys through the parent dataset seems insufficient.
Consider a case where two child datasets are joined.
ADLB and ADRS have the same primary keys: "STUDYID", "USUBJID", "PARAMCD", "AVISIT".
> library(teal.data)
> library(dplyr)
>
> ADLB <- rADLB
> ADRS <- rADRS
>
> jk <- default_cdisc_join_keys["ADLB", "ADRS"]
>
> full_join(ADLB, ADRS) |> dim()
Joining with `by = join_by(STUDYID, USUBJID, SUBJID, SITEID, AGE, AGEU, SEX, RACE, ETHNIC, COUNTRY, DTHFL, INVID, INVNAM, ARM, ARMCD, ACTARM, ACTARMCD, TRT01P, TRT01A, TRT02P, TRT02A, REGION1, STRATA1,
STRATA2, BMRKR1, BMRKR2, ITTFL, SAFFL, BMEASIFL, BEP01FL, AEWITHFL, RANDDT, TRTSDTM, TRTEDTM, TRT01SDTM, TRT01EDTM, TRT02SDTM, TRT02EDTM, AP01SDTM, AP01EDTM, AP02SDTM, AP02EDTM, EOSSTT, EOTSTT, EOSDT,
EOSDY, DCSREAS, DTHDT, DTHCAUS, DTHCAT, LDDTHELD, LDDTHGR1, LSTALVDT, DTHADY, ADTHAUT, ASEQ, PARAM, PARAMCD, AVAL, ADTM, ADY, AVISIT, AVISITN)`
[1] 11600 104
> full_join(ADLB, ADRS, by = jk) |> dim()
[1] 67200 165
Warning message:
In full_join(ADLB, ADRS, by = jk) :
Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 1 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship = "many-to-many"` to silence this warning.
>
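Joining by default, i.e. using intersect(names(x), names(y)), correctly includes all primary keys among the joining keys and returns 11600 rows. Joining by the join_key_set extracted from default_cdisc_join_keys instead triggers the many-to-many warning and returns 67200 rows, i.e. a cartesian product of the matching rows.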
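
The difference can be reproduced without the CDISC data. The sketch below is only an illustration: `adlb_toy`, `adrs_toy`, `LBVAL`, and `RSVAL` are hypothetical stand-ins (not rADLB or rADRS), and it assumes that the keys inferred through the parent are just "STUDYID" and "USUBJID". It compares a join on those two keys with a join on the full primary key.

```r
library(dplyr)

# Hypothetical toy stand-ins for two child datasets that share a primary key.
adlb_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = rep(c("P1", "P2"), each = 2),
  PARAMCD = "X",
  AVISIT  = rep(c("V1", "V2"), times = 2),
  LBVAL   = 1:4
)
adrs_toy <- data.frame(
  STUDYID = "S1",
  USUBJID = rep(c("P1", "P2"), each = 2),
  PARAMCD = "X",
  AVISIT  = rep(c("V1", "V2"), times = 2),
  RSVAL   = 5:8
)

# Assumption: keys inferred via the parent cover only the subject level.
parent_keys  <- c("STUDYID", "USUBJID")
primary_keys <- c("STUDYID", "USUBJID", "PARAMCD", "AVISIT")

# Subject-level keys only: every visit row of a subject in one dataset
# is paired with every visit row of that subject in the other.
by_parent <- full_join(
  adlb_toy, adrs_toy,
  by = parent_keys,
  relationship = "many-to-many"  # silences the warning; needs dplyr >= 1.1.0
)

# Full primary key: rows are matched one to one.
by_primary <- full_join(adlb_toy, adrs_toy, by = primary_keys)

dim(by_parent)   # 8 rows, 8 columns (PARAMCD and AVISIT get .x/.y suffixes)
dim(by_primary)  # 4 rows, 6 columns
```

In the toy data each subject has two rows per dataset, so the subject-level join yields 2 x 2 rows per subject, while the primary-key join keeps one row per subject, parameter, and visit.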