added script to generate data tables for the ameriflux data #6
base: master
dmullen17
left a comment
Overall really solid work!
If you read my comments, you'll find that you can generalize a lot of this code so you aren't repeating yourself. Take your time implementing these changes.
When you start making changes, follow these steps so that you're able to commit (basically save) your work: https://github.com/NCEAS/data-processing#making-changes-to-your-contribution
R/Ameriflux_data_tables.R
Outdated
definitions <- read.csv('/home/reevesman/Ameriflux/attribute_function/definitions.csv',
                        stringsAsFactors = F)

data1 <- read.csv('/home/reevesman/Ameriflux/AMF_US-Ivo/AMF_US-Ivo_BASE_HH_2-1.csv',
Rather than reading all of these in one at a time, you can have your function do so for you.
It would read something like:
attribute_definitions <- function(data_path, definitions) {
  data <- read.csv(data_path, skip = 2, stringsAsFactors = FALSE)
  # (the rest of your code)
}
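A minimal sketch of how that could scale to several files, assuming each file has two metadata lines above the column header (as the `skip = 2` in the original calls suggests). The helper names `read_ameriflux` and `make_demo_file`, and the demo file contents, are hypothetical stand-ins so the example runs without the real data:

```r
# Hypothetical helper: read one AmeriFlux-style CSV, skipping the two
# lines that precede the column header.
read_ameriflux <- function(data_path) {
  read.csv(data_path, skip = 2, stringsAsFactors = FALSE)
}

# Made-up stand-in files so the sketch runs anywhere; swap in real paths.
make_demo_file <- function(site) {
  path <- tempfile(pattern = site, fileext = ".csv")
  writeLines(c(paste("# Site:", site),
               "# Version: demo",
               "TIMESTAMP_START,CO2_1,TA_PI_F",
               "201501010000,400.1,-12.3"),
             path)
  path
}
data_paths <- vapply(c("US-Ivo", "US-ICt"), make_demo_file, character(1))

# One call reads every file; the result is a list of data frames
# named after the sites.
all_data <- lapply(data_paths, read_ameriflux)
```

Each element of `all_data` can then be passed to `attribute_definitions()` in a loop or another `lapply()` call instead of creating `data1`, `data2`, and so on by hand.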
R/Ameriflux_data_tables.R
Outdated
data2 <- read.csv('/home/reevesman/Ameriflux/AMF_US-ICt/AMF_US-ICt_BASE_HH_2-1.csv',
                  skip = 2,
                  stringsAsFactors = F)
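R is weird in that it lets you abbreviate TRUE and FALSE as T and F. It's generally considered best practice to spell these out to increase readability (T and F are also ordinary variables that can be overwritten, whereas TRUE and FALSE are reserved words).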
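A short illustration of why the abbreviations can be risky. The variable names here are made up for the example:

```r
# T and F are ordinary variables, so they can be reassigned by accident.
T <- 0
flag_short <- isTRUE(as.logical(T))   # FALSE: T no longer means TRUE
flag_full  <- TRUE                    # unaffected

# TRUE is a reserved word, so trying to assign to it raises an error.
reserved <- tryCatch({
  eval(parse(text = "TRUE <- 0"))
  FALSE
}, error = function(e) TRUE)

rm(T)  # remove the local override so T refers to the base value again
```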
When you're making changes, you can split up your commits. For instance, in one commit you can just change all the T/Fs to TRUE/FALSE, and give that commit a message like "updated T and F syntax".
R/Ameriflux_data_tables.R
Outdated
if (str_sub(att,-5,-1) == '_PI_F'){
  x <- str_sub(att,-5,-1)
  extra <- paste(definitions[which(definitions$uniqueAttributeLabel == '_PI'), 'uniqueAttributeDefinition'],
This is something I mentioned to Sharis as well. At the beginning of your function you could add a line that sets new column names for your attributes. The column names in the attributes CSV are pretty bad and make the R code harder to read.
colnames(attributes) <- c("category", "name", "definition", "units", "SI_units")
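As a rough sketch of the renaming idea, the example below matches on the old names instead of on position, so it still works if the column order changes. Only the two column names that appear in the script (`uniqueAttributeLabel`, `uniqueAttributeDefinition`) are used; the rows and definition text are made-up placeholders:

```r
# Placeholder table with the two column names seen in the script.
definitions <- data.frame(
  uniqueAttributeLabel = c("_PI", "_QC"),
  uniqueAttributeDefinition = c("placeholder definition A",
                                "placeholder definition B"),
  stringsAsFactors = FALSE
)

# Map old names to shorter ones (the new names are only a suggestion).
new_names <- c(uniqueAttributeLabel = "name",
               uniqueAttributeDefinition = "definition")
matched <- names(definitions) %in% names(new_names)
names(definitions)[matched] <- new_names[names(definitions)[matched]]

# Lookups now read much more easily.
pi_definition <- definitions[definitions$name == "_PI", "definition"]
```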
R/Ameriflux_data_tables.R
Outdated
##############################################################################

attribute_units <- function(data, definitions){
It looks like this function searches for qualifiers in variable names and deletes them. You could probably simplify this by using the gsub function and a regular expression. Take a look at what this code does and you should be able to simplify this quite a bit. | stands for "or" in R regular expressions.
names <- c("var_1", "var_F_2")
gsub("_1|_2|_F", "", names)
https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
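One possible refinement, sketched under the assumption that qualifiers only ever appear at the end of a variable name: anchoring the pattern with `$` strips trailing qualifiers without touching the same characters elsewhere in the name. The qualifier list is taken from the script; the variable names, including `X_FLUX`, are invented for illustration:

```r
attribute_names <- c("CO2_1", "TA_PI_F", "SWC_F_2", "X_FLUX")

# One or more qualifiers, but only at the very end of the name.
qualifier_pattern <- "(_PI|_QC|_IU|_SD|_F|_[1-9])+$"
base_names <- sub(qualifier_pattern, "", attribute_names)

# For comparison: without the anchor, "_F" is also removed from the
# middle of "X_FLUX".
unanchored <- gsub("_PI|_QC|_IU|_SD|_F|_[1-9]", "", attribute_names)
```

Here `base_names` keeps `X_FLUX` intact, while the unanchored version mangles it.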
R/Ameriflux_data_tables.R
Outdated
  QUALIFIERS_EXIST <- TRUE
}

else if (str_sub(att,-3,-1) %in% c('_PI','_QC','_IU','_SD')){
It looks like you're checking for every possible combination of qualifiers and treating each one as a unique case. While this is totally acceptable, it's a good idea to think about how you would scale this if there were too many to write out by hand.
You can reverse the %in% statement so that you check whether a qualifier is in an attribute. Then you can run the rest of the commands that you have after the else-if statement to get the extra definition.
See if you can use this example to simplify your code. You should only need to do this twice: once for the integer qualifiers and once for the character ones. The logic here is a bit tricky, so don't hesitate to ask me about it.
att <- "CO2_F_1"
int_qualifiers <- c('_1','_2','_3','_4','_5','_6','_7','_8','_9')
sapply(int_qualifiers, grepl, x = att) # this applies grepl to each value in int_qualifiers, with the additional argument x = att
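Building on that example, here is one way the two passes might look. This is only a sketch: the helper name `find_qualifiers` is hypothetical, and the character qualifier list is assembled from the ones that appear in the script:

```r
att <- "CO2_F_1"
int_qualifiers <- paste0("_", 1:9)
char_qualifiers <- c("_PI", "_QC", "_IU", "_SD", "_F")

# Return the qualifiers that occur anywhere in the attribute name.
# fixed = TRUE treats each qualifier as a literal string, not a regex.
find_qualifiers <- function(att, qualifiers) {
  found <- vapply(qualifiers, grepl, logical(1), x = att, fixed = TRUE)
  names(found)[found]
}

int_found <- find_qualifiers(att, int_qualifiers)
char_found <- find_qualifiers(att, char_qualifiers)
QUALIFIERS_EXIST <- length(c(int_found, char_found)) > 0
```

Each matched qualifier can then be looked up in the definitions table to build the extra definition text, so no qualifier combination needs its own `else if` branch.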
Thanks @dmullen17! I will definitely work on all of these. I really enjoyed this project and your feedback! Please let me know about any similar projects!
@areevesman glad to hear it! If you want to focus on making these changes, I'll think of a similar task for you to work on by the time you're done.
Hey @dmullen17, I made all of the edits that I was planning to based on your comments! Thanks for reviewing for me! Jesse said that you've got an intense job interview today, good luck!
@dmullen17 I just added the script for the data tables!