geotagging in R
The file is read into R as the object 'data'.
# this gives the levels of column 'day'
levels(data$day)
[1] "" "?" "di"
[4] "dinsdag" "donderdag" "donderdag "
[7] "donderdagavonden " "Elke vrijdag" "Elke donderdag"
[10] "elke vrijdag" "Elke vrijdag" "Elke zondag in de vijver"
[13] "maandag" "Vrij" "vrijdag"
[16] "Vrijdag" "vrijdag/zaterdag" "Vrijdagavonden "
[19] "woensdag" "woensdag/donderdag" "zatedag"
[22] "zaterdag" "zaterdag " "zondag"
Note the many extra spaces, typos and odd descriptions.
I made a vector new_levels with one cleaned name per original level (in the same order) and applied it:
new_levels <- c("", "", "dinsdag","dinsdag","donderdag","donderdag",
"donderdag","vrijdag","donderdag","vrijdag","vrijdag","zondag",
"maandag","vrijdag","vrijdag","vrijdag","vrijdag","vrijdag",
"woensdag","woensdag","zaterdag","zaterdag","zaterdag","zondag")
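# assigning levels relabels the factor; duplicated names merge into one level
# (factor(data$day, levels=new_levels) would fail on the duplicated levels)
data$nday <- data$day
levels(data$nday) <- new_levels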
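To show how recoding by level assignment behaves, here is a minimal sketch on a toy factor. The five values are taken from the levels listed above; the names `day`, `cleaned` and `nday` are only for illustration and are not part of the real data.

```r
# Toy factor standing in for data$day; the level order is fixed explicitly
# so the example does not depend on locale-specific sorting.
day <- factor(c("vrijdag", "Vrijdag", "zatedag", "zaterdag ", "?"),
              levels = c("?", "vrijdag", "Vrijdag", "zatedag", "zaterdag "))

# One cleaned name per original level, in the same order as levels(day).
cleaned <- c("", "vrijdag", "vrijdag", "zaterdag", "zaterdag")

# Assigning the cleaned names relabels the values; duplicates merge levels.
nday <- day
levels(nday) <- cleaned

levels(nday)        # the merged levels
as.character(nday)  # the recoded values, row by row
```

The original column is left untouched, so the raw and cleaned values can still be compared with table(day, nday).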
The next step is to redo the geotagging. I used geocode() from the ggmap package, which uses Google geocoding. NOTE: 2500 queries max a day!
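Because of the query limit, it may help to count the unique addresses before sending anything to the geocoder. The sketch below is only an illustration: the addresses are made up, and `query_limit`, `to_geocode` and `batches` are hypothetical names, not part of the workflow in this issue.

```r
# Query limit mentioned above, treated here as a plain number.
query_limit <- 2500

# Made-up addresses standing in for the cleaned location column.
addresses <- c("domplein 1,utrecht", "neude,utrecht", "domplein 1,utrecht")

# Only unique addresses need a query.
to_geocode <- unique(addresses)
length(to_geocode)

# Split the unique addresses into batches that each stay within the limit,
# so that a long list could be spread over several runs.
batch_id <- ceiling(seq_along(to_geocode) / query_limit)
batches <- split(to_geocode, batch_id)
length(batches)
```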
The workflow is:
- convert location to character
- scrub leading and trailing whitespace
- transform all to lowercase
- add "utrecht" to all locations
- create a unique list of addresses
- geotag these
- merge xy with the original file
# load libraries
library(ggmap)
library(tidyr)
library(dplyr)
step 1
data$location_ch <- as.character(data$location)
step 2 to 4 in one go
data$location_adress <- data$location_ch %>% trimws() %>% tolower() %>% paste("utrecht",sep=",")
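To see what steps 1 to 4 do to messy input, here is a small sketch in base R. The three location strings are invented for the example; only the function calls mirror the line above.

```r
# Invented locations with the kind of noise described above
# (trailing or leading spaces, mixed case), stored as a factor.
location <- factor(c("Domplein 1 ", "domplein 1", " Neude"))

# step 1: factor to character
location_ch <- as.character(location)

# steps 2 to 4: trim whitespace, lowercase, append the city
location_adress <- paste(tolower(trimws(location_ch)), "utrecht", sep = ",")

location_adress
# After cleaning, the first two spellings collapse into one address.
unique(location_adress)
```

Cleaning before calling unique() matters here: every spelling variant that collapses into one address saves a geocoding query.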
step 5, 6 and 7
# unique list of addresses: 957 unique locations
locations_unique <- data$location_adress %>% unique()
# geotag them; this step takes a while
xy_unique <- locations_unique %>% geocode()
# combine unique locations with xy from geotagging
adress_xy_unique.df <- data.frame(adress=locations_unique, xy_unique)
# and merge this back to data, keeping all rows of data (all.x=TRUE)
data_new <- merge(data, adress_xy_unique.df, by.x="location_adress", by.y="adress", all.x=TRUE)
# look at the first rows only; do not assign head() to data_new
head(data_new)
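The merge relies on two things: columns that exist in both tables get the suffixes .x and .y, and all.x=TRUE keeps rows that found no match. A minimal sketch with invented addresses and coordinates (the names `old` and `new_xy` are hypothetical):

```r
# Invented stand-in for the original data: some coordinates already missing.
old <- data.frame(location_adress = c("a,utrecht", "b,utrecht", "a,utrecht", "c,utrecht"),
                  lon = c(5.12, NA, 5.12, NA),
                  lat = c(52.09, NA, 52.09, NA),
                  stringsAsFactors = FALSE)

# Invented stand-in for the geocoded unique addresses ("c,utrecht" not found).
new_xy <- data.frame(adress = c("a,utrecht", "b,utrecht"),
                     lon = c(5.121, 5.118),
                     lat = c(52.091, 52.093),
                     stringsAsFactors = FALSE)

merged <- merge(old, new_xy, by.x = "location_adress", by.y = "adress", all.x = TRUE)

names(merged)              # shared columns are suffixed with .x and .y
nrow(merged) == nrow(old)  # every original row is kept
sum(is.na(merged$lon.y))   # rows whose address was not geocoded
```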
lon.x and lat.x are the coordinates from the first attempt (BAG); lon.y and lat.y are the 'new' coordinates from geocode(). Below is a comparison: we go from 1799 missing coordinates to 196. Good!
with(data_new,table(is.na(lon.x)))
#FALSE TRUE
# 2874 1799
# 1799 missing locations
with(data_new,table(is.na(lon.y)))
#FALSE TRUE
# 4477 196
# 196 missing locations
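The same comparison can be expressed as shares of all rows, using only the counts printed above. The cross-table at the end runs on invented vectors and is just a sketch of how one could see which rows gained coordinates.

```r
# Counts taken from the two tables above.
n_total   <- 2874 + 1799
pct_old   <- round(100 * 1799 / n_total, 1)  # share missing, first attempt
pct_new   <- round(100 * 196 / n_total, 1)   # share missing, after geotagging
c(before = pct_old, after = pct_new)

# Invented longitudes, only to illustrate the cross-table.
lon_old <- c(5.1, NA, NA, 5.2, NA)
lon_new <- c(5.1, 5.3, NA, 5.2, 5.0)

# Rows that had no coordinates before but have them now.
gained <- sum(is.na(lon_old) & !is.na(lon_new))
gained

table(old_missing = is.na(lon_old), new_missing = is.na(lon_new))
```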
There are still some glitches, but we're (slowly) getting there.
The results look good (screen dump of the map; I am not a webmapper :-) )

The reworked file is in /data: events_utrecht_2011_2016_location26012017.csv