Missed optimization for SFrame column indexing #89

@hoytak

Each call to sf['column'] constructs a new SArray wrapping that column. As a result, any cache built up on a previously returned SArray is not preserved across calls. This causes unexpectedly large performance differences, as reported by a user on the forum:

In [7]: arr1 = array.array('d',[random.random() for item in range(4096)])
...
In [13]: sf = gl.SFrame({'data':[arr1 for item in range(10000)]})

In [14]: sa = sf['data']

In [15]: %timeit sa[1]
The slowest run took 6524.06 times longer than the fastest. This could mean that an intermediate result is being cached 
1 loops, best of 3: 154 µs per loop

In [16]: %timeit sf['data'][1]
1 loops, best of 3: 902 ms per loop

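Note the stark difference in timing: about 154 µs per access through the saved SArray sa, versus about 902 ms when indexing through sf['data'] every time. The solution is for the SFrame to keep references to the SArrays it creates when a column is retrieved, so that repeated sf['column'] calls reuse the same SArray and its caches.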
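A minimal Python sketch of the proposed fix, under a toy model in which each column wrapper lazily materializes its data on first element access and caches it. Frame, ColumnView, memoize, and total_loads are hypothetical names used only for illustration; this is not GraphLab Create's actual SFrame/SArray implementation. With memoize=False every frame['data'] lookup builds a fresh wrapper and pays the load cost again, mirroring the reported sf['data'][1] behavior. With memoize=True the frame keeps a reference to the wrapper it created, so the cache survives across lookups.

```python
from typing import Callable, Dict, List, Optional


class ColumnView:
    """Hypothetical stand-in for an SArray: lazily loads one column on
    first element access, then serves later accesses from its cache."""

    def __init__(self, load: Callable[[], List[float]]) -> None:
        self._load = load                         # expensive materialization step
        self._cache: Optional[List[float]] = None

    def __getitem__(self, i: int) -> float:
        if self._cache is None:                   # slow path, once per view object
            self._cache = self._load()
        return self._cache[i]


class Frame:
    """Hypothetical stand-in for an SFrame that can remember column views."""

    def __init__(self, columns: Dict[str, List[float]], memoize: bool = True) -> None:
        self._columns = columns
        self._memoize = memoize
        self._views: Dict[str, ColumnView] = {}   # column name -> reused view
        self.total_loads = 0                      # counts slow-path materializations

    def _loader(self, name: str) -> Callable[[], List[float]]:
        def load() -> List[float]:
            self.total_loads += 1
            return list(self._columns[name])      # simulate an expensive copy
        return load

    def __getitem__(self, name: str) -> ColumnView:
        if not self._memoize:
            # Reported behavior: a brand-new view (with an empty cache) every time.
            return ColumnView(self._loader(name))
        view = self._views.get(name)
        if view is None:
            view = ColumnView(self._loader(name))
            self._views[name] = view              # keep a reference for reuse
        return view

    def __setitem__(self, name: str, values: List[float]) -> None:
        self._columns[name] = list(values)
        self._views.pop(name, None)               # drop the now-stale view


uncached = Frame({'data': [0.5, 1.5, 2.5]}, memoize=False)
for _ in range(3):
    uncached['data'][1]
print(uncached.total_loads)  # 3: every lookup reloads the column

cached = Frame({'data': [0.5, 1.5, 2.5]})
for _ in range(3):
    cached['data'][1]
print(cached.total_loads)  # 1: the same view and its cache are reused
```

One consequence of keeping references is that a remembered view must be discarded whenever its column is reassigned, which is what the pop in __setitem__ does; otherwise later lookups could return stale cached data.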
