Conversation

@areevesman
@dmullen17 I just added the script for the data tables!

Member

@dmullen17 dmullen17 left a comment

Overall really solid work!

If you read my comments you'll find that you can generalize a lot of this code so you aren't repeating yourself. Take your time implementing these changes.

When you start making changes, follow these steps so that you're able to commit (basically save) your work: https://github.com/NCEAS/data-processing#making-changes-to-your-contribution

definitions <- read.csv('/home/reevesman/Ameriflux/attribute_function/definitions.csv',
stringsAsFactors = F)

data1 <- read.csv('/home/reevesman/Ameriflux/AMF_US-Ivo/AMF_US-Ivo_BASE_HH_2-1.csv',
Member

Rather than reading all of these in one at a time, you can have your function do it for you.

It would read something like:

attribute_definitions <- function(data_path, definitions) {
    data <- read.csv(data_path, skip = 2, stringsAsFactors = FALSE)
    # ... the rest of your code ...
}
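
Once a function takes a path, the same call can be mapped over every file instead of being written out once per site. A minimal sketch of that idea, assuming a hypothetical helper `read_flux_csv` and a made-up temporary file standing in for the real AmeriFlux CSVs:

```r
# Hypothetical helper: reads one AmeriFlux-style CSV, skipping the two
# metadata lines above the column header (as in the calls under review).
read_flux_csv <- function(data_path) {
  read.csv(data_path, skip = 2, stringsAsFactors = FALSE)
}

# Stand-in file so the sketch runs anywhere; its contents are made up.
demo_path <- tempfile(fileext = ".csv")
writeLines(
  c("# Site: demo", "# Version: demo",
    "TIMESTAMP_START,CO2_F_1",
    "201001010000,400.1"),
  demo_path
)

data_paths <- c(demo_path)   # in practice, one path per site file
all_data <- lapply(data_paths, read_flux_csv)
names(all_data) <- basename(data_paths)
```

Each element of `all_data` is then a data frame that can be passed on to `attribute_definitions` or `attribute_units`.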


data2 <- read.csv('/home/reevesman/Ameriflux/AMF_US-ICt/AMF_US-ICt_BASE_HH_2-1.csv',
skip = 2,
stringsAsFactors = F)
Member

R is weird in that it lets you abbreviate TRUE and FALSE as T/F. It's generally considered best practice to spell these out to increase readability.
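
One concrete reason, beyond readability: `T` and `F` are ordinary variables that happen to start out as `TRUE` and `FALSE`, so they can be reassigned, while `TRUE` and `FALSE` are reserved words. A small illustration (the variable names `shadowed` and `restored` are just for this example):

```r
T <- 0                  # legal: T is just a variable that starts out TRUE
shadowed <- isTRUE(T)   # FALSE, because T now holds 0
rm(T)                   # remove our copy so base R's T is visible again
restored <- isTRUE(T)   # TRUE
# TRUE <- 0             # uncommenting this line raises an error
```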

Member

When you're making changes you can split up your commits. For instance, in one commit you can just change all the T/Fs to TRUE/FALSE, and give that commit a message like "updated T and F syntax".


if (str_sub(att,-5,-1) == '_PI_F'){
x <- str_sub(att,-5,-1)
extra <- paste(definitions[which(definitions$uniqueAttributeLabel == '_PI'), 'uniqueAttributeDefinition'],
Member

This is something I mentioned to sharis as well. At the beginning of your function you could have a line that sets new column names for your attributes. The column names in the attributes csv are pretty bad and make the R code harder to read.

colnames(attributes) <- c("category", "name", "definition", "units", "SI_units")
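
To show the difference shorter names make, here is a sketch on a two-column stand-in table. The original column names come from the code under review; the definition text and the shortened names `name` and `definition` are placeholders chosen for this example:

```r
# Stand-in for the definitions table; the definition text is made up.
definitions <- data.frame(
  uniqueAttributeLabel = c("_PI", "CO2"),
  uniqueAttributeDefinition = c("placeholder text for _PI",
                                "placeholder text for CO2"),
  stringsAsFactors = FALSE
)

# Lookup with the original column names.
verbose <- definitions[which(definitions$uniqueAttributeLabel == "_PI"),
                       "uniqueAttributeDefinition"]

# Same lookup after renaming the columns once, up front.
colnames(definitions) <- c("name", "definition")
concise <- definitions[definitions$name == "_PI", "definition"]
```

Both lookups return the same value; only the second fits comfortably on one line.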


##############################################################################

attribute_units <- function(data, definitions){
Member

It looks like this function searches for qualifiers in variable names and deletes them. You could probably simplify this by using the gsub function and a regular expression. Take a look at what this code does and you should be able to simplify this quite a bit. | stands for "or" in R regular expressions.

names <- c("var_1", "var_F_2")
gsub("_1|_2|_F", "", names)

https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
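
A slightly stricter variant of the same idea, as a sketch: the pattern below only removes a qualifier when it is a whole underscore-separated piece of the name, so a variable such as `SW_IN_F` keeps its `_IN`. The function name `strip_qualifiers` and the example variable names are assumptions for illustration; the qualifier list is taken from the code under review.

```r
# Hypothetical helper: drop qualifier pieces (_PI, _QC, _IU, _SD, _F, or an
# integer index) from variable names. The lookahead (?=_|$) requires the
# qualifier to be followed by another underscore or the end of the name.
strip_qualifiers <- function(x) {
  gsub("_(PI|QC|IU|SD|F|[0-9]+)(?=_|$)", "", x, perl = TRUE)
}

stripped <- strip_qualifiers(c("CO2_F_1", "TA_PI_F", "SW_IN_F", "FC"))
```

`perl = TRUE` is needed because lookaheads are not supported by R's default regular-expression engine.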

QUALIFIERS_EXIST <- TRUE
}

else if (str_sub(att,-3,-1) %in% c('_PI','_QC','_IU','_SD')){
Member

It looks like you're looking for every possible iteration of qualifiers and treating each one as a unique case. While this is totally acceptable, it's a good idea to think about how you might scale this if there were too many to write out by hand.

You can reverse the %in% statement so that you're checking whether a qualifier appears in an attribute name, rather than whether the end of the name is in a list. Then you can run the rest of the commands that you have after an else-if statement to get the extra definition.

See if you can use this example to simplify your code. You should only need to do this twice, once for the integer qualifiers and once for the character ones. The logic here is a bit tricky so don't hesitate to ask me about it

att <- "CO2_F_1"
int_qualifiers <- c('_1','_2','_3','_4','_5','_6','_7','_8','_9')
sapply(int_qualifiers, grepl, x = att)   # this applies grepl to each value in int_qualifiers, with the additional argument x = att
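
Building on that example, one possible way to turn the logical vector into the qualifiers that were actually found, run once per qualifier set. The helper name `find_qualifiers` and the contents of `chr_qualifiers` are assumptions for this sketch:

```r
att <- "CO2_F_1"
int_qualifiers <- paste0("_", 1:9)                    # "_1" ... "_9"
chr_qualifiers <- c("_PI", "_QC", "_IU", "_SD", "_F")

# Hypothetical helper: return the qualifiers that occur in `att`.
# sapply() names its result after the qualifiers, so the names of the
# TRUE entries are the matches. fixed = TRUE treats each qualifier as a
# literal substring rather than a regular expression.
find_qualifiers <- function(att, qualifiers) {
  hits <- sapply(qualifiers, grepl, x = att, fixed = TRUE)
  names(hits)[hits]
}

int_found <- find_qualifiers(att, int_qualifiers)     # "_1"
chr_found <- find_qualifiers(att, chr_qualifiers)     # "_F"
```

Note that this is plain substring matching, so `_F` would also be found inside a longer piece such as `_FOO`; whether that matters depends on the variable names in the data.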

@areevesman
Author

Thanks @dmullen17! I will definitely work on all of these. I really enjoyed this project and your feedback! Please let me know about any similar projects!

@dmullen17
Member

@areevesman glad to hear it! If you want to focus on making these changes, I'll think of a similar task for you to work on by the time you're done.

@areevesman
Author

Hey @dmullen17, I made all of the edits that I was planning to based on your comments! Thanks for reviewing for me! Jesse said that you've got an intense job interview today, good luck!
