Skip to content

Conversation

@rahulg603
Copy link

Upstream code to generate the compressed flat file is R code that I still have locally, and need to clean up.

@rahulg603 rahulg603 requested a review from konradjk January 8, 2021 19:39
'must be present in the MatrixTable column fields.')


parser = argparse.ArgumentParser()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I always put this inside the if __name__ == '__main__' block. It probably doesn't matter, but its not really needed if you were to import from this script, so I think it makes sense there.



parser = argparse.ArgumentParser()
parser.add_argument('--input', type=str, help='Location of the table containing the data for merging. Expects this to contain data for a single year.')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type=str is the default. fine to leave it if you want, but you can also omit

entry_fields: list, field_names: list):
if not row_key in field_names:
raise ValueError('Row key must be in set of fields.')
if(len(row_other) == 0):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need the outer parens here

row_other_l = row_other.split(',')
if not all([row in field_names for row in row_other_l]):
raise ValueError('All row fields must be in the columns of the read table.')
return row_key, list(np.setdiff1d(list(set(row_other_l)),[row_key]))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't modify row_key - is there a compelling reason to return it?


ht : Hail Table
Should contain columns that will be transformed into the new struct.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a simple pictoral example of what this function does? Hard to envision from just the docstring

age_groups = list(set(age_groups))
age_groups_not_in_vals = np.setdiff1d(age_groups,age_vals)
if len(age_groups_not_in_vals) == 0:
mt_fc = mt_f.filter_cols(hl.literal(age_groups).contains(mt_f.age_name))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is mt_f?

args = parser.parse_args()

# Import
ht = hl.import_table(args.input, force=True, impute=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're making an assumption that this is gzipped, then you must have a particular file in mind. If so, I would suggest making an entry in resources/phenotypes.py like one of the other resources and read it from there rather than taking it as an input arg

mt = ht_m.to_matrix_table(row_key=[row_key], row_fields=row_other_fields,
col_key=[column_key], col_fields=column_other_fields,
n_partitions=100)
mt = mt.annotate_globals(year=args.year)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this for tracking? Is the date in the filename? If so, filename might be better than a user-specified year - if a user picks a new file they might not notice this argument and not update it

mt_f = mt.annotate_rows(**modifications)

# Output MatrixTable
mt_f.write(args.output, overwrite=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parameterize an args.overwrite (action='store_true')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants