RFC: Clean up operators #351
Is this generally good to merge? I'd like to get something (even if it's a draft) merged soon, so that I can finish the work I've started splitting DataArrays out of DataFrames.
May need additional tests
It passes the current tests, but it might be worth adding some tests to [...]
That would be great. I'll give it a review now. Sorry for taking so long to look at this. Please ping me in the future if you think I'm neglecting something. Or just merge it if you're happy with it.
I can move the code in PR #354 to the new repo when it's made.
That would be great. And the same comments I made are relevant there: if I'm behind with a PR, please ping me. I now have more to do than I can easily manage, so any reminders are really helpful.
Can we revert this change? I've edited this on my own and now feel like similar should not initialize the na bit mask.
I can change this not to initialize the na mask, but I still think we only need a single similar function with all the arguments. The one- and two-argument versions of similar in Base will just call this one.
That's true. I'll make that change on my end.
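For readers following the similar discussion, here is a minimal sketch of the single-method approach in current Julia. It is not the code from this PR: MaskedVector is a hypothetical stand-in for a DataArray-like type, and missing stands in for NA. It assumes only the documented behavior that Base's one- and two-argument similar methods forward to the three-argument form, and it leaves the na mask uninitialized, as suggested above.

```julia
# Hypothetical stand-in for a DataArray-like vector: values plus an NA bit mask.
struct MaskedVector{T} <: AbstractVector{Union{T,Missing}}
    data::Vector{T}
    na::BitVector
end

Base.size(v::MaskedVector) = size(v.data)
Base.getindex(v::MaskedVector, i::Int) = v.na[i] ? missing : v.data[i]

# The only `similar` method defined for this type. Base's similar(v) and
# similar(v, S) fall through to this three-argument form.
function Base.similar(v::MaskedVector, ::Type{S}, dims::Dims{1}) where {S}
    T = nonmissingtype(S)  # strip Missing from the requested element type
    n = dims[1]
    # Neither the values nor the na mask are initialized here.
    return MaskedVector{T}(Vector{T}(undef, n), BitVector(undef, n))
end

v = MaskedVector([1, 2, 3], falses(3))
a = similar(v)              # same element type and length
b = similar(v, Float64)     # new element type, same length
c = similar(v, Float64, 5)  # new element type and length
```

Because the result is uninitialized, callers are expected to fill both the values and the mask before reading from it.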
Okay, I've added some additional tests and fixed two bugs they picked up, one preexisting and one new. I think this should be good to merge.
Ok. Let's merge this. Then I'll start the split into DataFrames and DataArrays. After that, we can review this stuff again.
RFC: Clean up operators
There are two main goals here: to improve performance by allowing type inference to happen for most of these operations, and to reduce the amount of repetitive code. See #327 for more background.
Some notes:
- [...] dataframe_blocks.jl. My main grievance is that they make it hard to tell what's being defined where without jumping around in the file. A secondary issue is that the operator categories that make sense in operators.jl don't necessarily make sense elsewhere. For example, ./ needs to be defined separately in operators.jl for type reasons, so it's not in array_arithmetic_operators, and at the moment this also means it's not handled in dataframe_blocks.jl.
- [...] isna and data methods that take indices? The isna method would return a Bool, and the data method would return dv[i] if dv[i] != NA and could return anything otherwise. This would permit efficient type inference without accessing fields directly, which would let me remove the special cases for DataArrays and speed up other AbstractDataArrays.
- [...] col* and row*, but these could use some more work. The API should probably change to be more Julian (Make API more Julian #159) and there are other considerations as well (see Clean up basic functions like mean and std #325 and Implement na_rm for math functions? #259).
- [...] all. Whether we return NA or false depends on the order of the vector, i.e., all([false, NA]) == false whereas all([NA, false]) == NA. I haven't changed this, since I wanted the existing tests to pass, but is this really what we want?
- [...] rle methods, since I'm not sure they're sufficiently commonly used to be worth optimizing.
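To make the question about all concrete, here is a small sketch of the two possible semantics in current Julia. Both functions are hypothetical illustrations, not the code in this PR, and missing stands in for NA. The first stops at the first element that is either false or NA, which reproduces the order dependence described above. The second follows three-valued logic, where any false decides the result regardless of position.

```julia
# Hypothetical reconstruction of the order-dependent behavior:
# return as soon as an NA or a false is encountered.
function all_first_stop(xs)
    for x in xs
        ismissing(x) && return missing
        x || return false
    end
    return true
end

# Order-independent alternative (three-valued logic): any false wins,
# otherwise the result is NA if one was seen, otherwise true.
function all_three_valued(xs)
    seen_na = false
    for x in xs
        if ismissing(x)
            seen_na = true
        elseif !x
            return false
        end
    end
    return seen_na ? missing : true
end

r1 = all_first_stop([false, missing])    # false
r2 = all_first_stop([missing, false])    # missing: depends on element order
r3 = all_three_valued([false, missing])  # false
r4 = all_three_valued([missing, false])  # false: order no longer matters
r5 = all_three_valued([true, missing])   # missing: the NA could still be false
```

The trade-off is that the three-valued version cannot return early on an NA and has to keep scanning for a false.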