Commit fd54c8f

Merge pull request #2 from StanfordHPDS/readme_update
add preprint link
2 parents 1e7bf85 + a7b4f73 commit fd54c8f

2 files changed

+30
-31
lines changed

README.Rmd

Lines changed: 26 additions & 29 deletions
@@ -14,21 +14,23 @@ knitr::opts_chunk$set(
1414
)
1515
```
1616

17-
# Package Overview
17+
# Package Overview
1818

1919
<!-- badges: start -->
20+
2021
[![R-CMD-check](https://github.com/StanfordHPDS/upcoding/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/StanfordHPDS/upcoding/actions/workflows/R-CMD-check.yaml)
22+
2123
<!-- badges: end -->
2224

23-
This R package enables users to:
25+
This R package enables users to:
2426

25-
- **Simulate baseline co-occurring health conditions** for a user-defined number of individuals
26-
- **Upcode specific health conditions** to a user-specified degree over a set of time points, including optional additional right censoring
27-
- **Undercode** the simulated individuals to a user-specified degree
27+
- **Simulate baseline co-occurring health conditions** for a user-defined number of individuals
28+
- **Upcode specific health conditions** to a user-specified degree over a set of time points, including optional additional right censoring
29+
- **Undercode** the simulated individuals to a user-specified degree
2830

29-
Overall, these functions help users to better and more reproducibly evaluate approaches for upcoding and/or undercoding analysis and monitoring relevant to Medicare. As one example, see [add link to paper here when available].
31+
Overall, these functions help users to better and more reproducibly evaluate approaches for upcoding and/or undercoding analysis and monitoring relevant to Medicare. As one example, see [arxiv:2602.04092](https://arxiv.org/abs/2602.04092), which uses this package for simulations.
3032

31-
For more details about the background relevant to each of these, see "Brief background" below.
33+
For more details about the background relevant to each of these, see "Brief background" below.
3234

3335
## Installation
3436

@@ -39,9 +41,9 @@ You can install the development version of upcoding from GitHub with:
3941
pak::pak("StanfordHPDS/upcoding")
4042
```
4143

42-
## Example tutorial
44+
## Example tutorial
4345

44-
### Setup
46+
### Setup
4547

4648
```{r}
4749
library(upcoding)
@@ -53,13 +55,13 @@ library(tidycmprsk)
5355
library(ggsurvfit)
5456
```
5557

56-
### Simulate baseline data
58+
### Simulate baseline data
5759

58-
As an illustrative example, we simulate baseline data using default settings for 100 people (e.g. rows). Each row of the simulated data represents a person (indexed by the `person_id` column) and each column represents a diagnosis. More specifically, each column corresponds to one of the 115 Hierarchical Condition Categories (HCCs) in version 28 (v28) of the CMS-HCC risk adjustment formula, which is used in Medicare Advantage. You can read more about the basics of risk adjustment [here](https://www.commonwealthfund.org/publications/explainer/2024/apr/basics-risk-adjustment) and [here](https://www.sciencedirect.com/science/chapter/edited-volume/pii/B9780128113257000038).
60+
As an illustrative example, we simulate baseline data using default settings for 100 people (e.g. rows). Each row of the simulated data represents a person (indexed by the `person_id` column) and each column represents a diagnosis. More specifically, each column corresponds to one of the 115 Hierarchical Condition Categories (HCCs) in version 28 (v28) of the CMS-HCC risk adjustment formula, which is used in Medicare Advantage. You can read more about the basics of risk adjustment [here](https://www.commonwealthfund.org/publications/explainer/2024/apr/basics-risk-adjustment) and [here](https://www.sciencedirect.com/science/chapter/edited-volume/pii/B9780128113257000038).
5961

6062
By default, `simulate_baseline_v28_hcc_dt()` creates a directory in your current working directory (e.g. `here::here()`). We temporarily create an output directory "temp" relative to the current working directory to save our output. We'll also generate two different baseline data sets-- one for upcoding and one for undercoding. Don't forget to change the seed to get different results!
6163

62-
Note. Users will likely want to simulate more people (e.g. rows) in practice.
64+
Note. Users will likely want to simulate more people (e.g. rows) in practice.
6365

6466
```{r}
6567
#| message: false
@@ -79,7 +81,7 @@ simulate_baseline_v28_hcc_dt(
7981
)
8082
```
8183

82-
Read in generated files:
84+
Read in generated files:
8385

8486
```{r}
8587
undercoding_baseline_data <- fread(here("temp/undercoding_baseline_data.csv"))
@@ -90,9 +92,9 @@ upcoding_baseline_data <- fread(here("temp/upcoding_baseline_data.csv"))
9092
dim(undercoding_baseline_data)
9193
```
9294

93-
### Undercoding
95+
### Undercoding
9496

95-
The baseline data generated is meant to simulate co-occurring health conditions free of coding incentives. However, it might be informative to instead generate data closer to that of Traditional Medicare (TM), which is known to have undercoding of diagnoses (one recent paper about this [here](https://www.healthaffairs.org/doi/full/10.1377/hlthaff.2024.00169)). If we want to simulate TM-like data, we might want to undercode (e.g. randomly remove diagnoses) from our baseline data. This function removes a user-specified proportion of diagnoses across the entire data set, and writes the undercoded data set to file (with default file prefix `undercoded_data_*`).
97+
The baseline data generated is meant to simulate co-occurring health conditions free of coding incentives. However, it might be informative to instead generate data closer to that of Traditional Medicare (TM), which is known to have undercoding of diagnoses (one recent paper about this [here](https://www.healthaffairs.org/doi/full/10.1377/hlthaff.2024.00169)). If we want to simulate TM-like data, we might want to undercode (e.g. randomly remove diagnoses) from our baseline data. This function removes a user-specified proportion of diagnoses across the entire data set, and writes the undercoded data set to file (with default file prefix `undercoded_data_*`).
9698

9799
```{r}
98100
undercode_dt(undercoding_baseline_data,
@@ -101,16 +103,15 @@ undercode_dt(undercoding_baseline_data,
101103
)
102104
```
103105

104-
105-
### Upcoding
106+
### Upcoding
106107

107108
#### Specify which diagnoses to upcode and to what degree
108109

109-
The main upcoding function, `upcode_all_hccs()` expects as input a tibble or data.frame specifying the following:
110+
The main upcoding function, `upcode_all_hccs()` expects as input a tibble or data.frame specifying the following:
110111

111-
- "hcc" (character vector): Which individual HCCs to upcode, identified as "hcc1", "hcc2", etc.
112-
- "approach" (character vector): How to select people (e.g. rows) to upcode. For each HCC, this should be either "any" or "lower severity". "any" means that any rows not previously coded for that HCC will be considered as available for upcoding, and "lower severity" will only upcode rows where a lower severity HCC was previously coded (if that's available).
113-
- "upcoding_prop" (numeric vector): The proportion of available rows to upcode (needs to be a value greater than 0 and less than 1)
112+
- "hcc" (character vector): Which individual HCCs to upcode, identified as "hcc1", "hcc2", etc.
113+
- "approach" (character vector): How to select people (e.g. rows) to upcode. For each HCC, this should be either "any" or "lower severity". "any" means that any rows not previously coded for that HCC will be considered as available for upcoding, and "lower severity" will only upcode rows where a lower severity HCC was previously coded (if that's available).
114+
- "upcoding_prop" (numeric vector): The proportion of available rows to upcode (needs to be a value greater than 0 and less than 1)
114115

115116
```{r}
116117
# Specification input
@@ -126,7 +127,7 @@ my_upcoding_spec_df <- tibble(
126127
)
127128
```
128129

129-
Now we're set to upcode the specified HCCs! Note. By default, this will upcode over 4 time points (row IDs to upcode are split randomly across time points) and will also additionally right censor 5% of rows across the same time points (representing loss to follow up). This loss to follow up is also split randomly across the number of time points you specify, and once someone is lost to follow up they can't be coded for any HCCs afterwards. You can adjust these with the `num_timepoints` and `censoring_prop` parameters respectively; see documentation for details.
130+
Now we're set to upcode the specified HCCs! Note. By default, this will upcode over 4 time points (row IDs to upcode are split randomly across time points) and will also additionally right censor 5% of rows across the same time points (representing loss to follow up). This loss to follow up is also split randomly across the number of time points you specify, and once someone is lost to follow up they can't be coded for any HCCs afterwards. You can adjust these with the `num_timepoints` and `censoring_prop` parameters respectively; see documentation for details.
130131

131132
```{r}
132133
upcode_all_hccs(
@@ -136,9 +137,9 @@ upcode_all_hccs(
136137
)
137138
```
138139

139-
We have the option to either read in the final upcoded dataset (default name: `all_hcc_upcoded_data.csv`) or read in events by HCC (default name: `[hcc_name]_upcoded_data_event_and_time_labels.csv`). `all_hcc_upcoded_data.csv` corresponds to the final upcoded and censored data at the end of all time points.
140+
We have the option to either read in the final upcoded dataset (default name: `all_hcc_upcoded_data.csv`) or read in events by HCC (default name: `[hcc_name]_upcoded_data_event_and_time_labels.csv`). `all_hcc_upcoded_data.csv` corresponds to the final upcoded and censored data at the end of all time points.
140141

141-
Let's look at the latter (e.g. `[hcc_name]_upcoded_data_event_and_time_labels.csv`), as this is the format compatible with standard survival packages with R (so we assume it'll be used more):
142+
Let's look at the latter (e.g. `[hcc_name]_upcoded_data_event_and_time_labels.csv`), as this is the format compatible with standard survival packages with R (so we assume it'll be used more):
142143

143144
```{r}
144145
# Read in events (upcoding or censoring) for HCC 2
@@ -147,7 +148,7 @@ hcc2_labels <- read_csv(here("temp/hcc2_upcoded_data_event_and_time_labels.csv")
147148
head(hcc2_labels)
148149
```
149150

150-
Let's plot it! We're interested in the incidence of new coding, so a cumulative incidence plot makes sense as a starting point.
151+
Let's plot it! We're interested in the incidence of new coding, so a cumulative incidence plot makes sense as a starting point.
151152

152153
```{r}
153154
#| results: FALSE # suppresses text that ggcuminc outputs
@@ -164,7 +165,3 @@ Lastly, you might want to delete the temporary directory we made for this tutori
164165
```{r}
165166
unlink(here("temp/"), recursive = TRUE)
166167
```
167-
168-
169-
170-

README.md

Lines changed: 4 additions & 2 deletions
@@ -6,6 +6,7 @@
66
<!-- badges: start -->
77

88
[![R-CMD-check](https://github.com/StanfordHPDS/upcoding/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/StanfordHPDS/upcoding/actions/workflows/R-CMD-check.yaml)
9+
910
<!-- badges: end -->
1011

1112
This R package enables users to:
@@ -18,8 +19,9 @@ This R package enables users to:
1819

1920
Overall, these functions help users to better and more reproducibly
2021
evaluate approaches for upcoding and/or undercoding analysis and
21-
monitoring relevant to Medicare. As one example, see \[add link to paper
22-
here when available\].
22+
monitoring relevant to Medicare. As one example, see
23+
[arxiv:2602.04092](https://arxiv.org/abs/2602.04092), which uses this
24+
package for simulations.
2325

2426
For more details about the background relevant to each of these, see
2527
“Brief background” below.
