
Conversation

@ifly6 (Contributor) commented Feb 16, 2019

Implemented some import/export functionality for Excel. There is now a page where you can download dynamically generated Excel files for the rooms, judges, and teams that were imported, along with Excel files containing team and speaker statistics, so tournaments that don’t follow the standard MIT-TAB ranking can get the data they want.

In the upload data form, I’ve added text which tells people what the headers are, so they don’t have to go digging in the manual to find out.

The import system, unlike the previous version, does not overwrite preexisting data (though that functionality has been kept behind a keyword argument).

I've tried to keep changes outside this to a minimum, but making it accessible through the web interface means digging through the template files. The main changes are the import_x files, which I rewrote; the export_xls files, which I wrote; and the template changes.

I also changed one thing in the tab logic to unify one of the uncached Round calls, but that was done semi-accidentally.

The template change is structured so that it mostly just points to a view unconnected with the other ones, which I think makes it easier for you to check. It could trivially be folded back into the main files.
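
For illustration, a minimal sketch of the kind of dynamically generated Excel download described above; the model path and field names are assumptions, pandas needs an Excel engine (openpyxl or xlsxwriter) installed, and the actual export code in this PR may differ.

import io

import pandas as pd
from django.http import HttpResponse

from mittab.apps.tab.models import Team  # assumed import path


def export_teams_xlsx(request):
    # Build a DataFrame from the imported teams; field names are illustrative.
    rows = [{"Name": t.name, "School": str(t.school)} for t in Team.objects.all()]
    df = pd.DataFrame(rows)

    # Write the workbook into memory and return it as a file download.
    buf = io.BytesIO()
    with pd.ExcelWriter(buf) as writer:
        df.to_excel(writer, index=False, sheet_name="Teams")

    response = HttpResponse(
        buf.getvalue(),
        content_type="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
    response["Content-Disposition"] = 'attachment; filename="teams.xlsx"'
    return response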

ifly6 and others added 24 commits January 23, 2018 23:19
Implemented some import-export stuff from Excel. There is now a page
where you can download dynamically generated excel files that accompany
the rooms, judges, and teams which were imported, along with excel
files containing team and speaker statistics so tournaments which don’t
follow the standard MIT-TAB ranking can get the data they want.

In the upload data form, I’ve added text which tells people what the
headers are, so they don’t have to go digging in the manual to find out.

The import system has also been reworked so that it overwrites
preexisting data rather than simply skipping the entry. This means that
changes applied in some excel document can be applied en-masse by
importing the changed document.
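
(A minimal sketch of this overwrite-on-import behaviour using Django's update_or_create; the model and column names are illustrative, not the PR's actual import code.)

from mittab.apps.tab.models import Room  # assumed import path


def import_room(row):
    # Overwrite an existing room with the same name instead of skipping it,
    # so edits made in the spreadsheet are applied en masse on re-import.
    room, created = Room.objects.update_or_create(
        name=row["Name"],
        defaults={"rank": row["Rank"]},
    )
    return room, created
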
wrote the df export commands, included pandas to requirements,
updated header names to be consistent with csv general practices,
swapped everything over to enumerate instead of java-esque arraylist
style
over to next(i for i in haystack if i == needle) paradigm instead of
explicit for-loops
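
(The pattern this refers to, sketched with placeholder names:)

# Pull the first matching element, or None, straight from a generator
# expression instead of an explicit loop with a break and a "found" flag.
match = next((row for row in rows if row["Name"] == target_name), None)
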
unclear on whether the query set thing actually works, this is probably
a safer way to get what i want, presumably you get a list or other
query set from that, and then you get first
it now cleans the scratch type, attempting on string match then on a
direct integer match. it also now includes a lot more error checking on
whether teams and judges exist.

also now based on the column names rather than column order. note that
read_excel depends on pandas being present. also, included a fallback
if someone uploads a CSV instead of an excel file

also --
- started normalisation of my sections of the code for future possible
  transition to python 3. shouldn't be bad
- corrected import error from pandas, requiring xlrd >= 1.0.0, reflected
  this change in the requirements.txt
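
(A minimal sketch of the column-name-based reading with a CSV fallback described in this commit; the required headers shown are placeholders.)

import pandas as pd


def read_upload(uploaded_file):
    # Read by column name rather than column order; fall back to CSV if the
    # upload is not an Excel workbook. read_excel needs pandas plus an Excel
    # reader such as xlrd.
    try:
        frame = pd.read_excel(uploaded_file)
    except Exception:
        uploaded_file.seek(0)
        frame = pd.read_csv(uploaded_file)
    missing = {"Name", "Rank"} - set(frame.columns)
    if missing:
        raise ValueError("Missing columns: %s" % ", ".join(sorted(missing)))
    return frame
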
* phone field removed from debaters
* magic number set up wrong
tested old system on testing db, got 141454 ms of runtime. new method
uses 22032 ms runtime. introduced struct to hold data and perform one
conversion

now simplifies processing by holding results directly in memory and
not in the strange caching system
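
(A sketch of the kind of struct this refers to, pairing a debater with its pre-computed scores so the conversion happens once; the attribute names are assumptions.)

class DebaterScores(object):
    """Plain container holding a debater together with its score tuple."""

    def __init__(self, debater, scores):
        self.debater = debater
        self.scores = scores
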
- created private adjustment functions in tab_logic
- rewrote the debater score function to stop re-calling data that it
  already has, and instead perform functions on objects in memory
- minor optimisation in `deb_team` to select first rather than the
  first element of all objects

new optimisations reduce debater scoring runtime to 13405 ms
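
(An illustration of what such a private adjustment helper can look like: single-adjusted figures conventionally drop the highest and lowest value, double-adjusted the two highest and two lowest, computed from a list already held in memory rather than re-queried. Not the PR's exact code.)

def _adjusted(values, to_drop):
    # Drop the `to_drop` highest and lowest entries from an in-memory list.
    trimmed = sorted(values)[to_drop:len(values) - to_drop]
    return sum(trimmed)


def _single_adjusted(values):
    return _adjusted(values, 1)


def _double_adjusted(values):
    return _adjusted(values, 2)
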
@ifly6 (Contributor, author) commented Feb 17, 2019

There is a reporting error in the message of commit e7544f5. For unclear reasons (unfortunately I know nothing about Ajax), the debater rank method is sometimes called twice, which slows it down; that yielded an inflated runtime. The original commit message read:

tested old system on testing db, got 141454 ms of runtime. new method
uses 22032 ms runtime. introduced struct to hold data and perform one
conversion

It should have read, taking the retest value at face value:

tested old system on testing db, got 49647 ms of runtime. new method
uses 22032 ms runtime. introduced struct to hold data and perform one
conversion

That value comes from below:

[screenshot: old method runtime]

This error did not occur in the reporting of the other figures, although some variance in figures is to be expected. On retest, the method introduced in e7544f5 reported runtimes as follows:

[screenshot: new method runtime, test 1]

And with reports from 091a2fd, two tests were run, with times reported as follows:

[screenshot: new method runtime, test 2]

[screenshot: new method runtime, test 2a]

I will also note that these numbers may be smaller than what other people get on their machines, as my computer has a fast SSD, which greatly reduces I/O time.

kevinw26 and others added 5 commits February 17, 2019 05:18
- introduced prefetch for round stats and for the rounds themselves to
  minimise on the number of database calls
- optimised Bye and NoShow calls to use exists rather than Python duck-
  typing against empty lists
- updated some of the debater speaks functions to use the new private
  internal adjustment functions
- changed some uses of `range` to `xrange` since this is better in py2

new optimisations reduce debater scoring runtime to 10730 ms
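
(A sketch of the exists() change described here, with model and field names assumed from the prefetch list quoted later in the review.)

from mittab.apps.tab.models import Bye  # assumed import path


def team_had_bye(team, round_number):
    # A cheap EXISTS query instead of materialising a list and
    # duck-typing its truthiness.
    return Bye.objects.filter(bye_team=team,
                              round_number=round_number).exists()
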
unifying some of these speaker/rank calculations because they duplicate
each other a lot
round calls are no longer made.

shaved debater scoring runtime to 10128 ms
the number of repeated calls based on that struct

introduced some basic prefetching to the `get_team_scores` call to help
speed load times.

changes reduced team rank load time from 40829 ms to 15539 ms. also fix
for some bugs that were accidentally introduced to the team portion of
the tab logic code when changing the debater side. my apologies.
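
(A sketch of the kind of prefetching described for `get_team_scores`, mirroring the get_debater_scores snippet quoted further down in this review; the relation and helper names are assumptions.)

def get_team_scores():
    teams = Team.objects.prefetch_related(
        'debaters', 'debaters__roundstats_set', 'bye_set', 'noshow_set')
    # TeamScores / team_score are hypothetical counterparts to
    # DebaterScores / debater_score below.
    return [TeamScores(t, team_score(t)) for t in teams]
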
ifly6 and others added 5 commits February 17, 2019 07:27
reduce the number of calls by saving the speaks and ranks and then
adjusting them after the fact

speaks and rank calculations for both are centred around two private
methods which create, flatten, and sort lists of speaks or ranks

reduced team scoring time to 11647 ms
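
(A sketch of the centred private method this commit describes: collect each debater's speaks or ranks, flatten them into one list, and sort it so the adjustment step can simply slice it. Relation and field names are assumptions.)

def _team_speaks(team):
    # One flat, sorted list of every debater's speaks for this team.
    speaks = [float(rs.speaks) for d in team.debaters.all()
              for rs in d.roundstats_set.all()]
    return sorted(speaks)
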
the teams' speaks and ranks calculations.

since the list itself is now cached and sums are cheap, this isn't a
problem. corrected oversight in that sums were not provided in some
sections
also corrected a summing oversight in the debaters code which i no
longer use
that they work

one failed, due to an oversight in actually calling the function
more than a second off the load time, at least on my computer: load
time falls from 6454 ms to 5093 ms

in doing so, also rewrote the num_opps and num_govs in tab logic to use
gov team related keys, which can be prefetched, rather than filtering
from all round objects.
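
(A sketch of the num_govs/num_opps change described above, assuming the Round model's gov_team and opp_team foreign keys expose reverse accessors of the same names that prefetch_related can fill; the real related names may differ.)

def num_govs(team):
    # Count via the prefetched reverse relation instead of filtering
    # through every Round object; len() on .all() reads the prefetch cache.
    return len(team.gov_team.all())


def num_opps(team):
    return len(team.opp_team.all())
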
@BenMusch (Member) commented:

Super late here because I just noticed the changes to this PR -- just as a heads up, you may have to resolve some conflicts due to the Python 3 upgrade being merged.

Also, this PR size is pretty large, making it difficult for me to review thoroughly and be confident in approving it. Would you be willing to break it up into smaller tasks? I think a potentially good structure is:

  • Refactor current code to put all data import stuff in something like data_views.py (the export stuff can go here later)
  • Refactor current code to overwrite existing data when duplicates exist
  • Add export data to libs + views
  • Ranking performance improvements

# single_adjusted_ranks_deb(debater),
# -double_adjusted_speaks_deb(debater),
# double_adjusted_ranks_deb(debater))
self.speaker = s
@BenMusch (Member), commenting on the code above:

Really love this approach, I'm thinking of doing a larger tab logic refactor FYI so I may include this there (and try to do some git-fu to make sure you're credited for the contribution)

@ifly6 (Contributor, author) replied Mar 26, 2019:

Thanks!

# tab_logic.tot_ranks_deb(s),
# tab_logic.deb_team(s)) for s in nov_speakers]

debater_scores = tab_logic.get_debater_scores()
@BenMusch (Member), commenting on the code above:

Where is this function defined?

@ifly6 (Contributor, author) replied:

In tab_logic.py:

def get_debater_scores():
    # prefetch roundstats to save on ORM time
    return [DebaterScores(d, debater_score(d)) for d in
            Debater.objects.prefetch_related('roundstats_set', 'roundstats_set__round', 'team_set',
                                             'team_set__bye_set', 'team_set__noshow_set').all()]

try:
    return d.team_set.all()[0]
except:
    return d.team_set.first()
@BenMusch (Member) commented Mar 26, 2019:

Interestingly, I'm finding that changing from all()[0] to first() actually makes things noticeably slower (adds about 1 second). Maybe it's because nobody is ever really on more than 1 team, and first() adds some more SQL logic that makes the query slower than getting all of them?

@ifly6 (Contributor, author) replied:

Analytically, I would have imagined that wouldn't be the case, since the former should produce an SQL query with more runtime than the latter. But empirics don't lie. Thinking about how that could happen, perhaps it has to do with caching or the prefetch changes?
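
One hedged possibility, if the surrounding code prefetches team_set as get_debater_scores above does: indexing into .all() can be served from the prefetch cache, while .first() builds a new sliced queryset and so may hit the database once per debater. A sketch of the two shapes under that assumption:

# with Debater.objects.prefetch_related('team_set'):
team = d.team_set.all()[0]   # can be served from the prefetched cache
team = d.team_set.first()    # may issue a fresh LIMIT 1 query per call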


# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = os.environ.get('DEBUG')
DEBUG = True
@ifly6 (Contributor, author), commenting on the code above:

Discard this change

@JoeyRubas force-pushed the master branch 2 times, most recently from b9653d1 to 29c52e5 on October 26, 2025