---
title: "Compiled Pull Code"
author: "Alex, Meghan, Jackie, Kari"
date: "November 28, 2018"
output: pdf_document
---
## Background
In general, it can be difficult to find a mental health care provider who meets your needs. There are many different designations of mental health care providers, and it is not always clear which type of provider is the right one. The goal of this project was to gather all the health care providers in Rhode Island and determine the number of providers by ZIP code, with an emphasis on mental health providers.
## The Data
We used the National Plan and Provider Enumeration System (NPPES) National Provider Identifier (NPI) database to find medical providers. We wrote a function that searches the database for every Rhode Island ZIP code and combines the results into a single table of all providers. From there, the program filters on provider taxonomy to select the providers that we designated as mental health providers.
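
As a minimal sketch of the taxonomy filtering step, the example below applies the same `%in%` filter to a small toy table. The rows, the NPI values, and the shortened taxonomy list are hypothetical and stand in for the scraped registry data.

```r
library(dplyr)

# Hypothetical toy records; real rows would come from the NPI registry
providers <- data.frame(
  NPI = c("1000000001", "1000000002", "1000000003", "1000000004"),
  Primary_Taxonomy = c("Psychologist Clinical", "Internal Medicine",
                       "Social Worker Clinical", "Physical Therapist"),
  zipcode = c("02906", "02906", "02903", "02840"),
  stringsAsFactors = FALSE
)

# Shortened, illustrative list of taxonomies treated as mental health providers
mental_health <- c("Psychologist Clinical", "Social Worker Clinical")

# Keep only the rows whose taxonomy appears in the designated list
mental_health_providers <- filter(providers, Primary_Taxonomy %in% mental_health)
mental_health_providers
```

Because the filter matches taxonomy strings exactly, a provider whose taxonomy is worded slightly differently from the list would be dropped.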
## The Graphs
The first graph shows all ZIP codes and the number of providers in each one.
The second bar graph breaks down how many ZIP codes have each number of mental health care providers, from 1 to 11. From this, we learned that four ZIP codes have only one provider, six ZIP codes have two providers, and, at the other extreme, only one ZIP code has 11 providers. Note that ZIP codes with zero providers are not represented in any of these graphs.
The next two graphs show the number of providers of any type, not limited to mental health providers.
The scatter plot shows the number of providers in each RI ZIP code. We can see that most ZIP codes have few providers, with three outliers having more than 1,000 providers. We postulate that these ZIP codes are likely located in major cities.
The final graph shows the number of providers in the top six specialties. We can see that clinical social work is the most common specialty, followed by internal medicine and physical therapy.
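
The two-level count behind this kind of bar graph can be sketched as follows. The toy ZIP codes and the column names `providers` and `zip_codes` are illustrative assumptions, not the names used in the report's own chunks.

```r
library(dplyr)

# Hypothetical toy table with one row per provider
providers <- data.frame(
  zipcode = c("02906", "02906", "02903", "02840", "02840", "02860"),
  stringsAsFactors = FALSE
)

# First count: number of providers in each ZIP code
per_zip <- count(providers, zipcode, name = "providers")

# Second count: number of ZIP codes that have each provider count
zips_per_count <- count(per_zip, providers, name = "zip_codes")
zips_per_count
```

A ZIP code with no providers never appears in the first count, which is why it cannot appear in the second one either.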
```{r setup, include=FALSE}
options(scipen = 3, digits = 3)
knitr::opts_chunk$set(echo = FALSE)
knitr::opts_chunk$set(error = TRUE)
knitr::opts_chunk$set(warning=FALSE)
knitr::opts_chunk$set(message=FALSE)
knitr::opts_chunk$set(cache=T)
knitr::opts_chunk$set(collapse=T)
```
```{r}
library(rvest)
library(httr)
library(dplyr)
library(jsonlite)
library(XML)
library(stringr)
library(zipcode)
library(ggplot2)
```
```{r}
#loading the zip code data from the zipcode package
library(zipcode)
data(zipcode)
zip.ri <- filter(zipcode, state == "RI")$zip #restricting to RI zip codes
zip.ri <- as.vector(zip.ri)
```
```{r}
#looping over zip codes and pages when there are more than 100 providers per zip code
#older function, not necessary
NPIcode<-function(zipcode){
url1<- "https://npiregistry.cms.hhs.gov/registry/search-results-table?addressType=ANY&postal_code=" #setting the url to scrape from
provider.data <- data.frame() #initializing an empty data frame
skips <- seq(0,999999999,100) #create skips
for (i in 1:length(zipcode)) { #iterating over all RI zip codes
for (j in 1:length(skips)){ #also iterating over skips
zip <- zipcode[i]
skip <- skips[j]
url<-paste0(url1,zip,"&skip=",skip) #pasting the url, with the rhode island zip code and including the skips
#text scrape to pull our places by zip code
h <- read_html(url, timeout = 200)
reps <- h %>% #setting up the repeating structure
html_node("table") %>%
html_table()
if (any(is.na(reps[,1])) | all(is.na(reps[,1]))) { #if the page has no physicians, move to next zip code
break}
reps$zip <- zipcode[i]
provider.data <- rbind(provider.data,reps) #binding together the provider data and reps data
}
}
colnames(provider.data) <- c("NPI","Name","NPI_Type","Primary_Practice_Address","Phone","Primary_Taxonomy","zipcode")
return(provider.data)
}
provider.data <- NPIcode(zip.ri)
```
```{r}
#old function, do not use
NPIcode_taxonomy<-function(zipcode,taxonomy){
url1<- "https://npiregistry.cms.hhs.gov/registry/search-results-table?addressType=ANY&postal_code=" #setting the url to scrape from
provider.data <- data.frame() #initializing an empty data frame
skips <- seq(0,9999999,100) #create skips
for (i in 1:length(zipcode)) { #iterating over all RI zip codes
for (j in 1:length(skips)){ #also iterating over skips
zip <- zipcode[i]
skip <- skips[j]
tax <- str_replace_all(taxonomy," ","+")
url<-paste0(url1,zip,"&skip=",skip,"&taxonomy_description=",tax) #pasting the url, with the rhode island zip code and including the skips
#text scrape to pull our places by zip code
h <- read_html(url, timeout = 200)
reps <- h %>% #setting up the repeating structure
html_node("table") %>%
html_table()
if (any(is.na(reps[,1])) | all(is.na(reps[,1]))) { #if the page has no physicians, move to next zip code
break}
reps$zip <- zipcode[i]
provider.data <- rbind(provider.data,reps) #binding together the provider data and reps data
}
}
colnames(provider.data) <- c("NPI","Name","NPI_Type","Primary_Practice_Address","Phone","Primary_Taxonomy","zipcode")
return(provider.data)
}
provider.data <- NPIcode_taxonomy(zip.ri,"Primary Care")
```
```{r}
#creating a vector of the provider taxonomies that we designate as mental health providers
providernames <- c("Psychologist Addiction (Substance Use Disorder)", "Counselor Mental Health",
                   "Psychiatry & Neurology Psychiatry", "Social Worker Clinical", "Counselor",
                   "Psychologist Clinical", "Psychologist", "Marriage & Family", "Therapist",
                   "Social Worker", "Clinic/Center Adult Mental Health")
filteredtable <- filter(provider.data,Primary_Taxonomy %in% providernames) #creating a filtered dataset by provider taxonomies
```
```{r}
countbyzip <- provider.data %>% #creating a count of practices by zip code
group_by(zipcode) %>%
count() %>%
arrange(n)
#countbyzip
#create a bar graph of providers by frequency
plot<-ggplot(countbyzip, aes(x=zipcode, y=n))+
geom_bar(stat="identity")
plot
#number of zip codes with 1 provider, 2 providers, etc
number_practices<-countbyzip %>%
select(n) %>% #selecting and grouping by frequency
group_by(n) %>%
count() %>%
arrange(n) #arranging by frequency
number_practices.low <- head(number_practices) #getting the first six rows of number_practices and designating them as low numbers of practices
#add axis names
plot<-ggplot(number_practices.low, aes(x=factor(n), y=nn))+
geom_bar(stat="identity") + labs(x = "Number of providers", y = "Number of zip codes")
plot
```
```{r}
#Graphs of Provider Type
provider_grouping<-provider.data %>% #creating a count of practices by providers
group_by(Primary_Taxonomy) %>%
count() %>%
arrange(desc(n))
#provider_grouping
#general scatter of all providers types
provider_scatter<-ggplot(provider_grouping, aes(x=Primary_Taxonomy, y=n))+
geom_point()+
theme(axis.text.x=element_blank(),
axis.ticks.x=element_blank())+
labs(x = "Provider Types", y = "Number of providers")+
scale_color_brewer(palette="Set1")
provider_scatter
#create a bar graph of the top six provider types by frequency (head() returns six rows by default)
top5providers<-head(provider_grouping)
providers_bar<-ggplot(top5providers, aes(x=Primary_Taxonomy, y=n))+
geom_bar(stat="identity")+
theme(axis.text.x = element_text(angle=90))+
labs(x = "Provider Type", y = "Number of Providers")
providers_bar
```
```{r}
#problem: no way to get county from these codes, maybe from this data? https://www.unitedstateszipcodes.org/zip-code-database/
county_collect_NPIcode_taxonomy<-function(county,taxonomy){
url1<- "https://npiregistry.cms.hhs.gov/registry/search-results-table?addressType=ANY&postal_code=" #setting the url to scrape from
provider.data <- data.frame() #initializing an empty data frame
skips <- seq(0,9999999,100) #create skips
for (i in 1:length(county)) { #iterating over all counties
for (j in 1:length(skips)){ #also iterating over skips
county_index <- county[i]
skip <- skips[j]
tax <- str_replace_all(taxonomy," ","+")
url<-paste0(url1,county_index,"&skip=",skip,"&taxonomy_description=",tax) #pasting the url, with the county and including the skips
#text scrape to pull our places by county
h <- read_html(url, timeout = 200)
reps <- h %>% #setting up the repeating structure
html_node("table") %>%
html_table()
if (any(is.na(reps[,1])) | all(is.na(reps[,1]))) { #if the page has no physicians, move to next county
break}
reps$county <- county[i]
provider.data <- rbind(provider.data,reps) #binding together the provider data and reps data
}
}
colnames(provider.data) <- c("NPI","Name","NPI_Type","Primary_Practice_Address","Phone","Primary_Taxonomy","county")
return(provider.data)
}
provider.data<-NPIcode_taxonomy("02906", "Mental Health")
provider.data
```
```{r}
#This function takes the number of providers returned by the taxonomy-by-county search (county_collect_NPIcode_taxonomy, which does not fully work yet)
#and divides it by the population of the county, pulled from the census .csv data, divided by 1000
#It assumes the census data frame has already been read in as `census`
providers_per_1000_county <- function(taxonomy, county, state) {
  #the number of providers is the number of rows returned; length() would count columns
  number_providers <- nrow(county_collect_NPIcode_taxonomy(county, taxonomy))
  #population of the matching county, in thousands
  is_match <- census$STNAME == state & census$CTYNAME == county
  population <- census$CENSUS2010POP[is_match] / 1000
  ratio <- number_providers / population
  return(ratio)
}
#other stuff we want to do:
#1) make visualizations for each state
#2) find a way to combine multiple states for a query (i.e. look at #providers/every 1000 people for a few different states at once)
#3) do this at county level
```
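
The ratio itself is simple arithmetic, and a minimal sketch with made-up numbers may make it concrete. The helper name `providers_per_1000` and the inputs below are hypothetical and are not taken from the NPI or census data.

```r
# Hypothetical helper: providers per 1,000 residents
providers_per_1000 <- function(n_providers, population) {
  n_providers / (population / 1000)
}

# Made-up example: 50 providers serving a county of 125,000 residents
rate <- providers_per_1000(50, 125000)
rate
```

Here 125,000 residents are 125 thousands, so the rate is 50 / 125 = 0.4 providers per 1,000 residents.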
```{r}
plot<-ggplot(number_practices.low, aes(x=factor(n), y=nn))+
geom_bar(stat="identity") + labs(x = "Number of providers", y = "Number of zip codes")
print(plot + ggtitle("The number of providers by zipcode"))
```
```{r}
library(rvest)
library(httr)
library(dplyr)
library(jsonlite)
library(XML)
library(stringr)
library(zipcode)
library(ggplot2)
library(stringi)
library(roxygen2)
library(testthat)
#1. All zipcodes from the state
ZipsFromState<-function(state_name){
zip_holder<-zipcode%>%
filter(state==state_name)%>%
select(zip)
zip_state<-zip_holder[,1]
return(zip_state)
}
#2. Uses zipcodes to pull NPI providers
ProvidersInStateByCounty<-function(state,taxonomy){
zips_used <- ZipsFromState(state)
url1<- "https://npiregistry.cms.hhs.gov/registry/search-results-table?addressType=ANY&postal_code=" #setting the url to scrape from
provider.data <- data.frame() #initializing an empty data frame
skips <- seq(0,9999999,100) #create skips
for (i in 1:length(zips_used)) { #iterating over all zip codes in the state
for (j in 1:length(skips)){ #also iterating over skips
zip <- zips_used[i]
skip <- skips[j]
tax <- str_replace_all(taxonomy," ","+")
url<-paste0(url1,zip,"&skip=",skip,"&taxonomy_description=",tax) #pasting the url, with the zip code and including the skips
#text scrape to pull our places by zip code
h <- read_html(url, timeout = 200)
reps <- h %>% #setting up the repeating structure
html_node("table") %>%
html_table()
if (any(is.na(reps[,1])) | all(is.na(reps[,1]))) { #if the page has no physicians, move to next zip code
break}
reps$zip <- zips_used[i]
reps$state_abbrev <- state
provider.data <- rbind(provider.data,reps) #binding together the provider data and reps data
}
}
colnames(provider.data) <- c("NPI","Name","NPI_Type","Primary_Practice_Address","Phone","Primary_Taxonomy","zipcode","state_abbrev") #reps has two added columns, zip and state_abbrev
zip_link<-read.csv("zcta_county_rel_10.txt") %>%
select(ZCTA5, STATE, COUNTY, GEOID) %>%
rename(zipcode = ZCTA5) %>%
mutate(zipcode = as.character(zipcode))
zip_link$zipcode = stri_pad_left(zip_link$zipcode, 5, "0")
NPI_join<-inner_join(provider.data, zip_link, by="zipcode")
census<-read.csv("co-est2017-alldata.csv")
NPI_to_census<-inner_join(NPI_join, census, by=c("STATE", "COUNTY"))
#getting input into same format as STNAME
state_abbrev <- read.csv("state_abbrev.txt")
state_abbrev$STNAME <- state_abbrev$State
NPI_to_census_abbrev <- left_join(NPI_to_census,state_abbrev,by="STNAME")
#5. Return the summary measure
rows<-NPI_to_census_abbrev %>%
filter(Abbreviation == state) %>%
group_by(STNAME, CTYNAME, POPESTIMATE2010) %>%
count() %>%
arrange(n)
return(rows)
}
provider.data<-ProvidersInStateByCounty("RI", "Mental Health")
```
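
The ZIP-to-county link inside `ProvidersInStateByCounty` depends on restoring the leading zeros that are lost when ZIP codes are read as numbers. A minimal sketch of that step on toy data follows; the crosswalk rows, county codes, and NPI values are hypothetical placeholders and are not read from `zcta_county_rel_10.txt`.

```r
library(dplyr)
library(stringi)

# Hypothetical crosswalk in which ZIP codes were read as numbers,
# so the leading zero of each Rhode Island ZIP code is missing
zip_link <- data.frame(zipcode = c(2906, 2840), COUNTY = c(7, 5))

# Convert to character and left-pad to five characters with "0"
zip_link$zipcode <- stri_pad_left(as.character(zip_link$zipcode), 5, "0")

# Hypothetical provider rows keyed by five-character ZIP code
providers <- data.frame(
  NPI = c("1000000001", "1000000002", "1000000003"),
  zipcode = c("02906", "02906", "02840"),
  stringsAsFactors = FALSE
)

# Attach the county code to each provider, then count providers per county
joined <- inner_join(providers, zip_link, by = "zipcode")
county_counts <- count(joined, COUNTY, name = "providers")
county_counts
```

Without the padding step, "2906" would not match "02906" and the inner join would silently drop every Rhode Island provider.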