Leaving this one open because it's going to take a while before it's worth resolving...
At the moment, reading a medium-sized Excel sheet (~100 rows, 40 columns) takes about 1s. That's obviously a bit slow, and it's going to be nasty to do that over the whole corpus.
Once we work out exactly what we're after, it might be worth moving the parsing parts into C/C++, using either libxml2 directly or RapidXML (as readxl does).
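Before porting anything, it may be worth checking how much of that ~1s is really XML parsing. Here's a rough sketch in Python (not this package's code) that builds a synthetic worksheet of the same shape and times a streaming parse with the standard library. The helper names (`col_name`, `make_sheet_xml`, `read_cells`) are made up for illustration, and it assumes a simplified sheet: numeric `<v>` values only, no shared strings, styles, or zip extraction.

```python
import io
import time
import xml.etree.ElementTree as ET

# Main SpreadsheetML namespace used by worksheet XML parts.
NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def col_name(idx):
    """Convert a 0-based column index to a column label (0 -> A, 26 -> AA)."""
    name = ""
    idx += 1
    while idx > 0:
        idx, rem = divmod(idx - 1, 26)
        name = chr(ord("A") + rem) + name
    return name


def make_sheet_xml(n_rows=100, n_cols=40):
    """Build a synthetic, simplified worksheet (assumption: numeric cells only)."""
    rows = []
    for r in range(1, n_rows + 1):
        cells = "".join(
            f'<c r="{col_name(c)}{r}"><v>{r * c}</v></c>' for c in range(n_cols)
        )
        rows.append(f'<row r="{r}">{cells}</row>')
    body = "".join(rows)
    xml = f'<worksheet xmlns="{NS}"><sheetData>{body}</sheetData></worksheet>'
    return xml.encode("utf-8")


def read_cells(xml_bytes):
    """Stream over the XML and collect {cell reference: raw value text}."""
    cells = {}
    c_tag = f"{{{NS}}}c"
    v_tag = f"{{{NS}}}v"
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == c_tag:
            v = elem.find(v_tag)
            cells[elem.get("r")] = v.text if v is not None else None
            elem.clear()  # drop the cell's children once it has been read
    return cells


xml_bytes = make_sheet_xml()
start = time.perf_counter()
cells = read_cells(xml_bytes)
elapsed = time.perf_counter() - start
print(f"parsed {len(cells)} cells in {elapsed * 1000:.1f} ms")
```

If a plain streaming parse of a sheet this size turns out to be much faster than 1s, the time is probably going somewhere other than the XML parser itself (per-cell lookups, type conversion, building the output), and that would be the part to target first.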