Hello! I have been trying to use CoordinateCleaner's clean_coordinates to clean a dataset of about 4 million observations, but it looks like it may take a very long time. I read in the reference publication for the package that you worked with 200k observations at a time for computational speed.
What average time would you estimate for cleaning 200k observations? I ran a simple test on my computer with your simulated example dataset at 250 observations, then increased it to 2 500 and 25 000, and the computation time jumped from 60, to 60 seconds, to 1.8 hours, respectively. So I wonder whether it is a memory issue (16 GB on my local computer) or a hard-drive limitation.
    library(CoordinateCleaner)

    system.time({
      # Simulate example data
      minages <- runif(250000, 0, 65)
      exmpl <- data.frame(species = sample(letters, size = 250000, replace = TRUE),
                          decimalLongitude = runif(250000, min = 42, max = 51),
                          decimalLatitude = runif(250000, min = -26, max = -11),
                          min_ma = minages,
                          max_ma = minages + runif(250000, 0.1, 65),
                          dataset = "clean")

      # Run record-level tests
      rl <- clean_coordinates(x = exmpl)
    })
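For reference, this is roughly how I imagine processing the data 200k observations at a time. It is only a sketch: `clean_in_chunks` is a hypothetical helper of my own, not part of CoordinateCleaner, and the cleaning function is passed in as an argument so the example runs without the package. I also assume that any test comparing records against each other could give different results when the data are split into batches.

```r
# Hypothetical helper (not part of CoordinateCleaner): apply a cleaning
# function to a data frame in fixed-size batches and stack the results.
clean_in_chunks <- function(x, chunk_size = 200000, clean_fun) {
  # Nothing to split: hand the empty data frame straight to clean_fun
  if (nrow(x) == 0) return(clean_fun(x))
  # Assign each row to a batch: rows 1..chunk_size -> 1, the next block -> 2, ...
  batch_id <- ceiling(seq_len(nrow(x)) / chunk_size)
  # Clean each batch separately, then bind the pieces back together in order
  pieces <- lapply(split(x, batch_id), clean_fun)
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Stand-in run with a trivial "cleaning" function that keeps every row
toy <- data.frame(id = 1:10, value = (1:10) * 2)
res <- clean_in_chunks(toy, chunk_size = 3, clean_fun = function(d) d)

# With the package loaded, the real call might look like:
# rl <- clean_in_chunks(exmpl, 200000, function(d) clean_coordinates(x = d))
```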
I wonder how computation time scales with the number of observations in clean_coordinates, and whether specifying parameters changes these times (up or down).
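To measure the scaling more systematically, I could time the same call on increasing subsets. The sketch below assumes a hypothetical helper, `time_by_size`, and uses a stand-in sorting function so that it runs without the package; the commented line shows how I would swap in clean_coordinates.

```r
# Hypothetical helper: time `fun` on the first n rows of x, for each n in sizes
time_by_size <- function(x, sizes, fun) {
  elapsed <- vapply(sizes, function(n) {
    # Elapsed wall-clock seconds for one run on an n-row subset
    system.time(fun(x[seq_len(n), , drop = FALSE]))[["elapsed"]]
  }, numeric(1))
  data.frame(n = sizes, seconds = elapsed)
}

# Stand-in data and stand-in function so the sketch runs on its own
toy <- data.frame(a = runif(25000), b = runif(25000))
timings <- time_by_size(toy, c(250, 2500, 25000), function(d) d[order(d$a), ])
print(timings)

# With the package loaded, the real call might look like:
# time_by_size(exmpl, c(250, 2500, 25000), function(d) clean_coordinates(x = d))
```

Comparing the seconds column across sizes should show whether the time grows roughly linearly or much faster than the number of observations.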
Thanks.
P.S. I can provide additional information if required.