Bug fix:
A dplyr update broke ukb_icd_diagnosis. Fixed in the dev version.
Corrected functionality:
Fixed an earlier change that made ukb_df incorrectly convert all column
types to character (caused by replacing stringr::str_interp with
stringr::str_c when passing the internal column type vector to
data.table::fread, without updating the argument).
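
As a rough illustration of why the column type vector matters, the sketch below reads the same tab-separated text twice: once with one type per column and once with every column forced to character. It uses base R's utils::read.delim rather than data.table::fread so that it runs without extra packages. The data and the names tab_text and col_types are hypothetical and are not ukbtools internals.

```r
# Hypothetical two-column, tab-separated text standing in for a UKB .tab file.
tab_text <- "eid\tf.31.0.0\n1000001\t0\n1000002\t1\n"

# One type per column, in file order (the role played by the internal type vector).
col_types <- c("integer", "character")

# Types applied column by column.
with_types <- utils::read.delim(text = tab_text, colClasses = col_types)

# A single type recycled over every column: everything becomes character.
all_character <- utils::read.delim(text = tab_text, colClasses = "character")

sapply(with_types, class)
sapply(all_character, class)
```

Comparing the two sapply() results shows whether the intended types reached the reader or were lost along the way.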
Bug fix:
Fixed a bug in ukb_icd_freq_by when computing disease frequency against a
quantitative trait.
Made the ukb_df regex find/replace that builds column names from descriptions
more general, so that it captures all variations of "uses data coding".
Updated functionality:
The ukb_icd_keyword internal regex search now defaults to ignore.case = TRUE,
and ignore.case has been added as an argument so that it can be changed.
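
The effect of a switchable ignore.case can be sketched with base R's grepl. This is a minimal illustration, not the ukbtools implementation: find_keyword and the descriptions are hypothetical, and joining several keywords with "|" is an assumption about how an "any of the keywords" search might be written.

```r
# Hypothetical ICD-style descriptions; not taken from the UKB coding tables.
descriptions <- c(
  "Type 2 diabetes mellitus",
  "DIABETES insipidus",
  "Essential hypertension"
)

# Hypothetical helper: return descriptions matching any keyword.
find_keyword <- function(descriptions, keywords, ignore.case = TRUE) {
  # Combine the keywords into one alternation pattern.
  pattern <- paste(keywords, collapse = "|")
  descriptions[grepl(pattern, descriptions, ignore.case = ignore.case)]
}

find_keyword(descriptions, "diabetes")                       # case-insensitive
find_keyword(descriptions, "diabetes", ignore.case = FALSE)  # case-sensitive
find_keyword(descriptions, c("diabetes", "hypertension"))    # any keyword
```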
Test data:
Added example UKB data ukbXXXX.tab, ukbXXXX.r, ukbXXXX.html to test the 'read'
and 'summarise' functionality of ukb_df, ukb_df_field, and ukb_context. See
the section "An example fileset" in the vignette for details.
Updated functionality:
ukb_icd_freq_by with freq.plot = TRUE plots a barplot for categorical
reference variables, and plots diagnosis frequencies at the midpoint of each
group for quantitative reference variables.
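
The idea of plotting frequencies at group midpoints can be sketched in base R as follows. The data are hypothetical, and the equal-width grouping is an assumption made for illustration; ukb_icd_freq_by may form its groups differently.

```r
# Hypothetical data: a quantitative reference variable and a 0/1 diagnosis flag.
bmi <- c(18, 21, 24, 27, 30, 33, 36, 39)
diagnosed <- c(0, 0, 0, 1, 0, 1, 1, 1)

# Split the reference variable into equal-width groups (an assumption).
n_groups <- 4
breaks <- seq(min(bmi), max(bmi), length.out = n_groups + 1)
group <- cut(bmi, breaks = breaks, include.lowest = TRUE)

# Diagnosis frequency within each group.
frequency <- as.numeric(tapply(diagnosed, group, mean))

# Midpoint of each group's interval, used as the x position of its point.
midpoint <- (head(breaks, -1) + tail(breaks, -1)) / 2

data.frame(group = levels(group), midpoint = midpoint, frequency = frequency)
```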
Webpage:
The ukbtools webpage has been rebuilt with pkgdown and includes the vignette under the Articles tab.
Updated functionality:
ukb_df: Replaced readr::read_tsv with data.table::fread for faster read. Also includes an n_threads argument passed to data.table::fread, which may make read faster. Column names now include field code to ensure names are unique (UK Biobank sometimes use the same description for more than one variable).
Defunct functionality:
- Added defunct message to ukb_gen_meta, ukb_gen_pcs, ukb_gen_excl, ukb_gen_rel, ukb_gen_het, ukb_gen_excl_to_na, and ukb_gen_write_plink_excl. ukb_defunct explains why these have become defunct and where to get UK Biobank genetic (meta)data.
New functionality:
- Since the UKB changed the way they serve up genetic metadata, the following work with files described in UKB Resource 531: ukb_gen_sqc_names supplies column names for the separately downloaded sample QC file; ukb_gen_rel_count does the same as before (a count of levels of relatedness or a plot) but with separately downloaded relatedness data; ukb_gen_related_with_data returns a subset of relatedness data in which both IDs have data on a phenotype of interest; ukb_gen_samples_to_remove returns a list of individuals to exclude in order to remove relatedness (one possible solution to a maximal subset problem).
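
To make the "maximal subset" idea concrete, here is a minimal sketch of one simple greedy heuristic: repeatedly drop the individual who appears in the most remaining related pairs. It is not necessarily the approach ukb_gen_samples_to_remove takes. The function samples_to_remove, the example pairs, and the column names ID1 and ID2 are assumptions for illustration.

```r
# Hypothetical relatedness data: one row per related pair of individuals.
pairs <- data.frame(
  ID1 = c("A", "A", "B", "D"),
  ID2 = c("B", "C", "C", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical greedy heuristic: remove the most-connected individual
# until no related pairs remain.
samples_to_remove <- function(pairs) {
  removed <- character(0)
  while (nrow(pairs) > 0) {
    # Count how many remaining pairs each individual appears in.
    counts <- table(c(pairs$ID1, pairs$ID2))
    worst <- names(counts)[which.max(counts)]
    removed <- c(removed, worst)
    # Drop every pair that involves the removed individual.
    pairs <- pairs[pairs$ID1 != worst & pairs$ID2 != worst, , drop = FALSE]
  }
  removed
}

to_remove <- samples_to_remove(pairs)
to_remove
```

A greedy pass like this is not guaranteed to keep the largest possible unrelated subset, which is why any such list is only one possible solution.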
Bug fix:
- ukb_icd_freq_by: corrected order by levels of reference.var in the optional plot. (Order in the default dataframe returned was correct.)
- ukb_df: corrected tab file path update in r source file. Specifically, made regular expression more specific (1 case reported of regular expression matching word elsewhere in the source file). Also, replaced utils::read.delim with readr::read_tsv for faster read, with progress bar.
New functionality:
- ukb_icd_freq_by returns frequency for one or more ICD diagnoses by levels of a reference variable and includes an optional plot.
- ukb_df_full_join (a thin wrapper around dplyr::full_join) recursively called on a list of UKB datasets.
- ukb_df_duplicated_names to identify duplicated names within a dataset. The variable prefix (constructed from its description), index, and array should make the column name unique. However, typos in UKB documentation that give two variables the same description but do not increment the index/array have been observed. You will want to identify these and update them appropriately. We expect the occurrence of such duplicates will be rare.
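
The two data frame helpers above can be approximated in base R. The sketch below folds a full (outer) join over a list with Reduce and merge, standing in for dplyr::full_join so that it runs without extra packages, and then reports repeated column names. The functions full_join_all and duplicated_names, the example data, and the column names are all hypothetical.

```r
# Hypothetical stand-ins for three UKB datasets sharing the id column "eid".
ukb1 <- data.frame(eid = c(1, 2), sex_f31_0_0 = c("F", "M"))
ukb2 <- data.frame(eid = c(2, 3), age_f21003_0_0 = c(54, 61))
ukb3 <- data.frame(eid = c(1, 3), bmi_f21001_0_0 = c(22.5, 27.1))

# Fold a full join over the list, two datasets at a time.
full_join_all <- function(datasets, by = "eid") {
  Reduce(function(x, y) merge(x, y, by = by, all = TRUE), datasets)
}
joined <- full_join_all(list(ukb1, ukb2, ukb3))

# Report column names that occur more than once.
duplicated_names <- function(data) {
  unique(names(data)[duplicated(names(data))])
}

# A dataset with a deliberately repeated column name.
dup <- data.frame(eid = 1, x_f1_0_0 = 2, x_f1_0_0 = 3, check.names = FALSE)

duplicated_names(joined)
duplicated_names(dup)
```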
Updated functionality:
- ukb_icd_diagnosis now takes one or more individual ids and returns a dataframe with a potential message noting ids with no diagnoses.
- ukb_icd_keyword accepts a character vector of one or more "keywords" and returns all ICD descriptions including any of the keywords.
- Beta release to CRAN. Feature complete but may contain unknown bugs.