Workflow in R for geotagging #2

@GreenStat

geotagging in R

The file is read into R as the object 'data'.

#this gives the levels of column 'day'
levels(data$day)

 [1] ""                         "?"                        "di"                      
 [4] "dinsdag"                  "donderdag"                "donderdag "              
 [7] "donderdagavonden "        "Elke  vrijdag"            "Elke donderdag"          
[10] "elke vrijdag"             "Elke vrijdag"             "Elke zondag in de vijver"
[13] "maandag"                  "Vrij"                     "vrijdag"                 
[16] "Vrijdag"                  "vrijdag/zaterdag"         "Vrijdagavonden "         
[19] "woensdag"                 "woensdag/donderdag"       "zatedag"                 
[22] "zaterdag"                 "zaterdag "                "zondag" 

Note the many extra spaces, typos and odd descriptions.

I made a vector new_levels with one clean name per original level (in the same order) and applied it:

new_levels <-  c("", "", "dinsdag","dinsdag","donderdag","donderdag",
"donderdag","vrijdag","donderdag","vrijdag","vrijdag","zondag",
"maandag","vrijdag","vrijdag","vrijdag","vrijdag","vrijdag",
"woensdag","woensdag","zaterdag","zaterdag","zaterdag","zondag")

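# factor(data$day, levels=new_levels) does not rename anything (and duplicated
# levels are not allowed there); assigning to levels() renames level by level
# and merges the duplicated new names
data$nday <- data$day
levels(data$nday) <- new_levels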
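
To see what assigning to levels() does, here is a small sketch on made-up data (the vector raw and the names day_toy and new_levels_toy are only for illustration, they are not in the real file). The levels are given explicitly so the order does not depend on the locale.

```r
# made-up messy day labels, not the real 'day' column
raw <- c("vrijdag", "Vrijdag", "zatedag", "zaterdag ", "Vrij", "vrijdag")

# fix the level order explicitly so the positions below are unambiguous
day_toy <- factor(raw, levels = c("Vrij", "vrijdag", "Vrijdag", "zatedag", "zaterdag "))

# one clean name per existing level, in the same order as levels(day_toy)
new_levels_toy <- c("vrijdag", "vrijdag", "vrijdag", "zaterdag", "zaterdag")

nday_toy <- day_toy
levels(nday_toy) <- new_levels_toy   # duplicated names are merged into one level

levels(nday_toy)
table(nday_toy)
```

The five messy levels collapse into two clean ones, and the counts per clean level add up over the merged originals.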

The next step is to redo the geotagging. I used geocode() from the ggmap library, which uses Google's geocoding service. NOTE: 2500 queries max!
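
Because of the query limit it helps to look up every distinct address only once and to check the count first. A minimal sketch with made-up addresses (max_queries, addresses and batches are hypothetical names; no real geocoding call is made here):

```r
max_queries <- 2500   # the limit mentioned above

# made-up address list with many repeats
addresses <- rep(sprintf("street %d,utrecht", 1:6000), times = 2)

# de-duplicate first: only distinct addresses need a query
to_query <- unique(addresses)
length(to_query) <= max_queries   # FALSE here, so one run is not enough

# split into batches that each stay within the limit
batch_id <- ceiling(seq_along(to_query) / max_queries)
batches  <- split(to_query, batch_id)
lengths(batches)
```

Each element of batches could then be geocoded in a separate run. With the 957 unique locations in this data set a single run stays under the limit.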

The workflow is:

  1. convert location to character
  2. scrub leading and trailing whitespace
  3. transform all to lowercase
  4. add "utrecht" to all locations
  5. create a unique list of addresses
  6. geotag these
  7. merge xy with original file

# load libraries
library(ggmap)
library(tidyr)
library(dplyr)

step 1

data$location_ch <- as.character(data$location)

steps 2 to 4 in one go

data$location_adress <- data$location_ch %>% trimws() %>% tolower() %>% paste("utrecht",sep=",") 
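
What steps 1 to 4 do to a value can be checked on a few made-up locations (the example strings are invented; plain nested calls are used here instead of the pipe so the sketch runs without extra packages):

```r
# made-up locations, stored as a factor like a freshly read csv column
location <- factor(c("Oudegracht 1 ", "  oudegracht 1", "Domplein"))

location_ch     <- as.character(location)                 # step 1
location_adress <- paste(tolower(trimws(location_ch)),    # steps 2 and 3
                         "utrecht", sep = ",")            # step 4

location_adress
unique(location_adress)
```

The two spellings of the first location end up as the same string, so they count as one address in the unique list.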

steps 5, 6 and 7

# unique addresses: 957 locations
locations_unique <- data$location_adress %>% unique()

# geocode the unique addresses; this step takes a while
xy_unique <- locations_unique %>% geocode()

# combine unique locations with xy from geotagging
adress_xy_unique.df <- data.frame(adress=locations_unique,xy_unique)

# and merge this file back to data
# (head() must not be piped onto the merge, or data_new keeps only 6 rows)
data_new <- merge(data,adress_xy_unique.df, by.x=c("location_adress"), by.y=c("adress"),all.x=TRUE)
head(data_new)
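
How the merge behaves, and where the .x and .y suffixes come from, can be seen in a small sketch with made-up tables (events and lookup and all values in them are invented for illustration):

```r
# made-up event table that already has coordinates from a first attempt
events <- data.frame(
  location_adress = c("a,utrecht", "b,utrecht", "a,utrecht", "c,utrecht"),
  lon = c(5.12, NA, 5.12, NA),
  lat = c(52.09, NA, 52.09, NA),
  stringsAsFactors = FALSE
)

# made-up lookup table: one row per unique address that was found
lookup <- data.frame(
  adress = c("a,utrecht", "b,utrecht"),
  lon = c(5.12, 5.10),
  lat = c(52.09, 52.08),
  stringsAsFactors = FALSE
)

# all.x = TRUE keeps every event, also those without a match in lookup
merged <- merge(events, lookup,
                by.x = "location_adress", by.y = "adress", all.x = TRUE)

names(merged)   # lon/lat exist in both tables, so merge() adds .x and .y
nrow(merged)
```

Columns that occur in both tables but are not used as key get the suffix .x (from the first table) and .y (from the second), and an event whose address is not in the lookup table keeps NA in the .y columns.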

lon.x and lat.x are the coordinates from the first attempt (BAG); lon.y and lat.y are the 'new' coordinates from geocode(). The comparison below shows that we go from 1799 missing coordinates to 196. Good!

with(data_new,table(is.na(lon.x)))
#FALSE  TRUE 
# 2874  1799 
# 1799 missing locations

with(data_new,table(is.na(lon.y)))
#FALSE  TRUE 
# 4477   196 
# 196 missing locations
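
The two tables can also be crossed to see which rows were gained. A sketch on a made-up data frame (toy and its values are invented; only the column names follow data_new):

```r
# made-up coordinates: lon.x from the first attempt, lon.y from the new one
toy <- data.frame(
  lon.x = c(5.12, NA,   NA, 5.11),
  lon.y = c(5.12, 5.10, NA, NA)
)

# rows: missing in first attempt, columns: missing in new attempt
table(first_missing = is.na(toy$lon.x), new_missing = is.na(toy$lon.y))

# locations that had no coordinate before and have one now
gained <- sum(is.na(toy$lon.x) & !is.na(toy$lon.y))
gained
```

The same two lines on data_new would show how many of the 1799 earlier gaps are now filled, and whether any location was found by the first attempt only.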

There are still some glitches, but we're (slowly) getting there.

The results look good (screen dump of the map; I am not a webmapper :-) ):
[image: plot]

The reworked file is in /data: events_utrecht_2011_2016_location26012017.csv
