only abbreviate siteid if numeric and over a billion #3428
infotroph merged 3 commits into PecanProject:develop
Regarding
While working through #3423 I found that `digest::digest(geom)` is a reproducible and effective way of generating a unique (character) ID for each site. A solution for a non-sf data frame would be to pass `paste(lat, lon)` to `digest`, or to coerce it to an sf points object and digest that.
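The non-sf variant could look roughly like the sketch below, assuming the `digest` package is installed. `site_hash_id` is a hypothetical name, and hashing the raw string with `serialize = FALSE` is a choice made here (so the hash doesn't depend on R's object serialization format), not something the comment above specifies. Note that the ID depends on how the coordinates print, so differently rounded coordinates for the same site give different IDs.

```r
# Hypothetical helper: derive a reproducible character site ID from coordinates.
# Requires the 'digest' package at call time.
site_hash_id <- function(lat, lon) {
  # Build a "lat,lon" key; the separator keeps (1, 35) and (13, 5) distinct
  key <- paste(lat, lon, sep = ",")
  # serialize = FALSE hashes the string's bytes directly, so the result
  # does not depend on R's object serialization format
  digest::digest(key, algo = "md5", serialize = FALSE)
}
```

Calling `site_hash_id(40.1, -88.2)` twice returns the same 32-character MD5 hex string.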
Maybe create an issue and tag it 'good first issue'? It might also be worth replacing all instances with a helper function like
dlebauer left a comment:
Looks good, thanks for doing this. 👍🏻
Description
Working on support for arbitrary string identifiers by changing places that abbreviate siteIDs.
Previously these places assumed the ID was coercible to numeric, with the billions place encoding the server number, and then dropped the zeroes from the middle:
- `33` -> `"0-33"`
- `1000000005` -> `"1-5"`
- `"3000001875"` -> `"3-1875"`
- `"foo"` -> `"NA-NA"` from some functions, and an error from others

Now they check whether the coercion succeeds and gives a value greater than 1e+09, and treat the siteID as a string otherwise.
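The old and new rules can be sketched in R as follows. This is a minimal sketch, not the PR's actual code: `old_abbrev` and `abbreviate_siteid` are hypothetical names, and the old pattern is reconstructed only from the examples listed above. Using `sprintf("%.0f", ...)` and `format(..., scientific = FALSE)` is an extra precaution chosen here so round numbers don't come out in scientific notation.

```r
# Sketch of the old assumption: every ID is numeric and the billions
# place is the BETY server number
old_abbrev <- function(siteid) {
  num <- as.numeric(siteid)  # "foo" becomes NA (with a warning)
  paste0(num %/% 1e9, "-", num %% 1e9)
}

# Sketch of the new rule: abbreviate only when coercion succeeds and the
# value is greater than 1e9; otherwise treat the ID as a string
abbreviate_siteid <- function(siteid) {
  stopifnot(length(siteid) == 1)  # one ID at a time
  # IDs may arrive as numbers or as digit strings read from XML
  num <- suppressWarnings(as.numeric(siteid))
  if (!is.na(num) && num > 1e9) {
    # "%.0f" keeps e.g. a remainder of 100000 from printing as "1e+05"
    sprintf("%.0f-%.0f", num %/% 1e9, num %% 1e9)
  } else if (is.numeric(siteid)) {
    format(siteid, scientific = FALSE, trim = TRUE)
  } else {
    as.character(siteid)
  }
}
```

With these definitions, `old_abbrev(33)` gives `"0-33"` while `abbreviate_siteid(33)` gives `"33"`; both give `"1-5"` for `1000000005`, and `abbreviate_siteid("foo")` returns `"foo"` instead of `"NA-NA"`.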
Note that there are a few places with patterns like `siteid %/% 1e9` that I didn't change here:
- `shiny/` and 3 in `inst/` folders, which don't look like they're used often enough to be worth updating right now.
- `/modules/data.remote/R/remote_process.R`, which is heavily DB-dependent in other ways; for now it's probably reasonable to keep assuming that all the IDs it handles come from BETY.

site.info if not present #3324
Motivation and Context
As we move away from requiring BETY connections, siteIDs will stay useful as unique identifiers but need not be constrained to be numeric; numeric ones will probably be smaller than 1e9, and even when they're larger, the billions place won't have any special significance. For the initial CCMMF workflows I've been using site names as IDs and finding that they mostly Just Work. Of the changes here, I only needed the ones in `pool_ic_list2netcdf` today, but decided to tackle the others I saw that used the same assumption.
This does add a bit of complexity, because the ID might be passed as an actual numeric or as a character string containing digits (as read from XML).
One obvious alternative design would be to stop abbreviating altogether (or move it to a step further upstream) and have all these functions use the ID exactly as passed, coercing it to character if it isn't already. I considered this, but thought that for backward compatibility it was worth keeping the existing behavior when running with BETY IDs.
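For comparison, the "use the ID exactly as passed" alternative might look like the sketch below; `siteid_as_passed` is a hypothetical name. One detail worth handling explicitly is that `as.character()` can switch to scientific notation for round numbers such as `1e5`, so this sketch formats numeric IDs with `scientific = FALSE`.

```r
# Hypothetical helper for the alternative design: no abbreviation at all,
# just return the ID exactly as passed, as a character string.
siteid_as_passed <- function(siteid) {
  if (is.numeric(siteid)) {
    # format numbers explicitly so round values like 1e5 stay "100000"
    format(siteid, scientific = FALSE, trim = TRUE)
  } else {
    as.character(siteid)
  }
}
```

Under this design `1000000005` and `"1000000005"` both become `"1000000005"`, so the numeric-vs-XML-string complexity goes away, at the cost of changing the output that existing BETY-based runs produce.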
Note also that in #3324 we discussed what to do when given a lat-lon with no siteID, and one design we considered was "generate a siteID by pasting lat and lon together". If we proceed with that design, we may want to consider the potential confusion between "1-35" meaning siteID 1000000035 and "1-35" meaning a site at 1 degree north and 35 degrees west.
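The overlap is easy to reproduce with a minimal sketch. Both helpers are hypothetical, and the `"-"` separator for the pasted form is an assumption. With signed decimal degrees a western longitude is negative, so whether the strings collide exactly depends on the sign and separator conventions chosen.

```r
# Hypothetical: BETY-style abbreviation of a numeric siteID
bety_abbrev <- function(siteid) {
  num <- as.numeric(siteid)
  sprintf("%.0f-%.0f", num %/% 1e9, num %% 1e9)
}
# Hypothetical: "paste lat and lon together" using "-" as the separator
latlon_id <- function(lat, lon) paste(lat, lon, sep = "-")

bety_abbrev(1000000035)  # "1-35"
latlon_id(1, 35)         # "1-35": same string, different meaning
latlon_id(1, -35)        # "1--35": a signed west longitude differs
```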
Review Time Estimate
Types of changes
Checklist: