RFC: Clean up operators by simonster · Pull Request #351 · JuliaData/DataFrames.jl

simonster · 2013-08-19T02:24:35Z

There are two main goals here: to improve performance by allowing type inference to happen for most of these operations, and to reduce the amount of repetitive code. See #327 for more background.

Some notes:

I'm not a huge fan of the giant lists of operators at the top of operators.jl, but I left them in since they're used in dataframe_blocks.jl. My main grievance is that they make it hard to tell what's being defined where without jumping around in the file. A secondary issue is that the operator categories that make sense in operators.jl don't necessarily make sense elsewhere. For example, ./ has to be defined separately in operators.jl for type reasons, so it's not in array_arithmetic_operators, which at the moment also means it isn't handled in dataframe_blocks.jl.
Would it be reasonable to give all AbstractDataArrays isna and data methods that take indices? The isna method would return a Bool, and the data method would return dv[i] if dv[i] != NA and could return anything otherwise. This would permit efficient type inference without accessing fields directly, which would let me remove the special cases for DataArrays and speed up other AbstractDataArrays.
The pairwise and cumulative vector operators will give errors instead of returning NA when there are undefined values in the array underlying the DataArray, which can only happen for non-bits types. This is left over from the old implementation, and I'm not sure whether it's worth fixing, or how much effort it's worth putting into performance if I do.
I only made cosmetic changes to col* and row*, but these could use more work. The API should probably become more Julian (#159, "Make API more Julian"), and there are other considerations as well (see #325, "Clean up basic functions like mean and std", and #259, "Implement na_rm for math functions?").
I'm not sure about the behavior of all(). Whether it returns NA or false depends on the order of the elements, i.e., all([false, NA]) == false whereas all([NA, false]) == NA. I haven't changed this, since I wanted the existing tests to pass, but is this really what we want?
I didn't touch the rle methods, since I'm not sure they're sufficiently commonly used to be worth optimizing.
I still need to go through the tests and make sure that the coverage is still somewhere close to full.
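The order dependence described in the note on all() is what a loop produces when it returns as soon as it sees either NA or false. The sketch below is a hypothetical reconstruction, not the DataFrames.jl code: it uses modern Julia's `missing` in place of the 2013-era `NA`, and the names `all_shortcircuit` and `all_kleene` are made up for illustration. The second function shows the order-independent alternative from three-valued (Kleene) logic, where any false wins over NA.

```julia
# Hypothetical reconstruction (not DataFrames.jl code); `missing` stands in for NA.

# Returns at the first NA or the first false, so the result depends on order.
function all_shortcircuit(v)
    for x in v
        x === missing && return missing
        x || return false
    end
    return true
end

# Three-valued (Kleene) logic: any false gives false, otherwise any NA gives NA.
function all_kleene(v)
    seen_missing = false
    for x in v
        if x === missing
            seen_missing = true
        elseif !x
            return false
        end
    end
    return seen_missing ? missing : true
end

all_shortcircuit([false, missing])  # false
all_shortcircuit([missing, false])  # missing
all_kleene([missing, false])        # false
all_kleene([true, missing])         # missing
```

For comparison, `Base.all` in current Julia follows the three-valued rule, so `all([missing, false])` returns `false` regardless of element order.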

johnmyleswhite · 2013-09-19T21:15:29Z

Is this generally good to merge? I'd like to get something (even if it's a draft) merged soon, so that I can finish the work I've started splitting DataArrays out of DataFrames.


simonster · 2013-09-19T21:25:35Z

It passes the current tests, but it might be worth adding some tests to operators.jl for PooledDataArray, since most operators now have separate code paths for DataArray and AbstractDataArray. I'll try to get to that today.
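The separate DataArray and AbstractDataArray code paths are what the indexed isna/data methods proposed in the PR description would make unnecessary. A minimal sketch of that idea, assuming two hypothetical stand-in types rather than the real DataFrames.jl types: `MaskedVec` mimics a DataArray-style value-plus-mask layout and `PooledVec` a PooledDataArray-style pool layout. Generic code such as the hypothetical `sum_skipna` touches elements only through `isna(v, i)` and `data(v, i)`, so one method covers both layouts without reading their fields.

```julia
# Hypothetical stand-ins, not the DataFrames.jl types.
abstract type AbstractMaskedVector{T} end

# DataArray-like layout: values plus a Boolean NA mask.
struct MaskedVec{T} <: AbstractMaskedVector{T}
    values::Vector{T}
    na::BitVector          # true marks an NA element
end

# PooledDataArray-like layout: references into a pool, with 0 meaning NA.
struct PooledVec{T} <: AbstractMaskedVector{T}
    refs::Vector{UInt32}
    pool::Vector{T}
end

Base.length(v::MaskedVec) = length(v.values)
Base.length(v::PooledVec) = length(v.refs)

# The proposed indexed interface. `data(v, i)` is only meaningful when
# `isna(v, i)` is false, so callers must check `isna` first.
isna(v::MaskedVec, i::Int) = v.na[i]
data(v::MaskedVec, i::Int) = v.values[i]
isna(v::PooledVec, i::Int) = v.refs[i] == 0
data(v::PooledVec, i::Int) = v.pool[v.refs[i]]

# One generic method for every layout; the accumulator starts as zero(T),
# so the compiler can infer its type from the element type parameter.
function sum_skipna(v::AbstractMaskedVector{T}) where {T}
    s = zero(T)
    for i in 1:length(v)
        isna(v, i) || (s += data(v, i))
    end
    return s
end

sum_skipna(MaskedVec([1, 2, 3], BitVector([false, true, false])))  # 4
sum_skipna(PooledVec(UInt32[1, 0, 2, 2], [10, 20]))              # 50
```

Adding a new layout in this sketch only requires `length`, `isna` and `data` methods; `sum_skipna` itself needs no special case.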

johnmyleswhite · 2013-09-19T21:29:34Z

That would be great. I'll give it a review now. Sorry for taking so long to look at this. Please ping me in the future if you think I'm neglecting something. Or just merge it if you're happy with it.

nfoti · 2013-09-19T21:30:44Z

I can move the code in PR #354 to the new repo when it's made.

johnmyleswhite · 2013-09-19T21:33:21Z

That would be great. The same comments I made apply there: if I'm behind on a PR, please ping me. I now have more to do than I can easily manage, so any reminders are really helpful.

johnmyleswhite · 2013-09-19T21:34:23Z

src/dataarray.jl

johnmyleswhite: Can we revert this change? I've edited this separately and now think similar should not initialize the NA bit mask.

simonster: I can change this so it doesn't initialize the NA mask, but I still think we only need a single similar method that takes all the arguments. The one- and two-argument versions of similar in Base will just call it.

johnmyleswhite: That's true. I'll make that change on my end.
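The single-method approach works because, at least in current Julia, the Base fallbacks for `similar` route the one- and two-argument forms (`similar(a)`, `similar(a, T)`, `similar(a, dims)`) into the three-argument `similar(a, T, dims)`. A minimal sketch, assuming a hypothetical `MaskedArray` type standing in for DataArray, that defines only the three-argument method and leaves both the values and the NA mask uninitialized:

```julia
# Hypothetical DataArray-like type: values plus an NA bit mask.
struct MaskedArray{T,N} <: AbstractArray{T,N}
    values::Array{T,N}
    na::BitArray{N}        # true marks an NA element
end

Base.size(a::MaskedArray) = size(a.values)
Base.IndexStyle(::Type{<:MaskedArray}) = IndexLinear()
Base.getindex(a::MaskedArray, i::Int) = a.values[i]   # ignores the mask for brevity

# The only `similar` method defined. Neither buffer is initialized, so the
# caller must fill both the values and the NA mask before use.
Base.similar(a::MaskedArray, ::Type{S}, dims::Dims) where {S} =
    MaskedArray(Array{S}(undef, dims), BitArray(undef, dims))

a = MaskedArray([1 2; 3 4], falses(2, 2))
similar(a)            # a MaskedArray{Int,2} of size (2, 2)
similar(a, Float64)   # a MaskedArray{Float64,2} of size (2, 2)
similar(a, (3,))      # a MaskedArray{Int,1} of size (3,)
```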

simonster · 2013-09-20T01:10:49Z

Okay, I've added some additional tests and fixed two bugs they picked up, one preexisting and one new. I think this should be good to merge.

johnmyleswhite · 2013-09-20T14:48:09Z

Ok. Let's merge this. Then I'll start the split into DataFrames and DataArrays. After that, we can review this stuff again.


simonster added 10 commits September 19, 2013 17:20

Clean up unary operators (b7906c6)
Clean up matrix multiplication (4234ff8)
May need additional tests
Only define the necessary similar operator, and make it accept types (0c769ec)
Clean up comparison operators (35f7e12)
Clean up arithmetic operators (f432b01)
Clean up most remaining operators, and restore operator lists for now (4287817)
Remove cov_spearman and colffts from tests (0e182b1)
Move all macros to top (ddf1124)
Make operator arrays const (0849900)
Fix ambiguity warnings (9af8b19)

johnmyleswhite reviewed Sep 19, 2013

simonster added 3 commits September 19, 2013 21:03

Fix f(::AbstractArray, ::AbstractDataArray) for binary operators (c7cd915)
Fix pairwise operators when first element of a DataVector is NA (d0488e1)
Add tests for PooledDataArray and pairwise operators (3a6a042)

johnmyleswhite added a commit that referenced this pull request Sep 20, 2013

Merge pull request #351 from simonster/operators (af27a63)
RFC: Clean up operators

johnmyleswhite merged commit af27a63 into JuliaData:master Sep 20, 2013

simonster mentioned this pull request Sep 20, 2013 in Clean up operators.jl? #327 (Closed)

johnmyleswhite merged 13 commits into JuliaData:master from simonster:operators

3 participants
