I use python to solve the problem, and there is a source code called donation-analysis.py in the source file.
My program will read records in incont.txt line by line. When the program read a new record, it will do:
- check whether all fields that we are interested in are valid
- if all fields are valid, store the record in two hash tables called
transaction_recordanddonor_list - check the number of record in
donor_listwith key valueNAME+ZIP_CODE - if there are more than one record in
donor_list[NAME + ZIP_CODE], the donor is a repeat donor, go to step 5, otherwise, go to step 6 - using
CMTE_ID,ZIP_CODEandyearas a key, call the function calledprintrecord()to output the requested calculations - repeat step 1-5 until read the end of file
I use two hash tables called transaction_record and donor_list to store records.
- In
transaction_record, if contributions have same recipient, zip code and calender year, they will be listed together. The key oftransaction_recordisCMTE_ID+ZIP_CODE+year, and the corresponding value are lists of ( or maybe only one list )[NAME,TRANSACTION_MT, month_date] - In
donor_list, all contributions from same donor will map to same key. The key isNAME + ZIP_CODE, and the value are lists (or maybe one list) of[CMTE_ID,TRANSACTION_MT,year,month_date]
People may wonder why we store same record two times. Although creating two hash tables which store same data may waste some space, the time complexity can reduce a lot when we find data we need and output the requested calculations.
All functions are in a class called donation_analysis(), what they have to do is:
load_percentile(): It load the percentile from percentile.txt and store it in self.percentilefind_percentile_ind(): It find the ordinal rank through the nearest-rank methodprocess_transaction_data():load records fromincont.txtand store them in two hash tables. It will calledprintrecord()when the new record is from a repeat donor.printrecord(): GivenCMTE_ID,ZIP_CODEandyearIt uses priority queue to sort the contribution from repeat donors by the transaction month and list all output data.isValid(): check whether the new record has correct format
There is only one program called donation-analysis.py in the source file. It doesn't require additional libraries, environments, or dependencies, Users can run the program through run.sh.