escalation of computation times with more than 250k observations #102

@francisvolh

Description

Hello! I have been trying to use CoordinateCleaner's clean_coordinates to clean a dataset of about 4 million observations, but it looks like it will take a very long time. I read in the reference publication for the package that you worked with 200k observations at a time for computational speed.
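
As context, here is a minimal sketch of the chunked approach I am considering (big_df stands in for my actual 4M-row data frame; I have not verified whether per-species tests such as the outlier test behave the same when a species is split across chunks):

library(CoordinateCleaner)

# Split the data into ~200k-row chunks, clean each chunk separately,
# then stack the resulting flag tables back together.
chunk_size <- 200000
chunks <- split(big_df, ceiling(seq_len(nrow(big_df)) / chunk_size))
flags <- do.call(rbind, lapply(chunks, clean_coordinates))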

What would be the average time you estimate it takes to clean 200k observations? I ran a simple test on my computer with a simulated dataset of 250 rows, then increased it to 2,500 and then 25,000, and the computation time went from roughly 60 seconds for the two smaller sizes to 1.8 hours for the largest. So I wonder whether this is a memory issue (16 GB of RAM on my local machine) or a hard-drive limitation.


library(CoordinateCleaner)

system.time({
  # Simulate example data
  minages <- runif(250000, 0, 65)
  exmpl <- data.frame(species = sample(letters, size = 250000, replace = TRUE),
                      decimalLongitude = runif(250000, min = 42, max = 51),
                      decimalLatitude = runif(250000, min = -26, max = -11),
                      min_ma = minages,
                      max_ma = minages + runif(250000, 0.1, 65),
                      dataset = "clean")

  # Run record-level tests with default settings
  rl <- clean_coordinates(x = exmpl)
})
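
For completeness, here is a sketch of how the three smaller sizes mentioned above can be timed in one run (same simulation as above, only the number of rows n varies):

sizes <- c(250, 2500, 25000)
times <- sapply(sizes, function(n) {
  minages <- runif(n, 0, 65)
  exmpl <- data.frame(species = sample(letters, size = n, replace = TRUE),
                      decimalLongitude = runif(n, min = 42, max = 51),
                      decimalLatitude = runif(n, min = -26, max = -11),
                      min_ma = minages,
                      max_ma = minages + runif(n, 0.1, 65),
                      dataset = "clean")
  # elapsed wall-clock seconds for a default clean_coordinates() run
  system.time(clean_coordinates(x = exmpl))["elapsed"]
})
rbind(n = sizes, seconds = times)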

I wonder how the computation time of clean_coordinates scales with the number of observations, and whether specifying different parameters moves these times up or down.
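
For example, restricting the tests argument is what I had in mind (a sketch; I am assuming from ?clean_coordinates that the per-species outlier test is the expensive one at scale, since it compares records within each species, but I have not confirmed which test actually dominates):

# Same simulated exmpl as above, but skipping the outlier test
system.time(
  rl_subset <- clean_coordinates(x = exmpl,
                                 tests = c("capitals", "centroids", "equal",
                                           "gbif", "institutions", "zeros"))
)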

Thanks.
PS: I can provide additional information if required.
