Open
Description
When sf['column'] is called, it creates a new SArray object for the column each time. As a result, caches held by a previously returned SArray are not preserved across calls. This causes unexpected behavior, as reported by a user on the forum:
In [7]: arr1 = array.array('d',[random.random() for item in range(4096)])
...
In [13]: sf = gl.SFrame({'data':[arr1 for item in range(10000)]})
In [14]: sa = sf['data']
In [15]: %timeit sa[1]
The slowest run took 6524.06 times longer than the fastest. This could mean that an intermediate result is being cached
1 loops, best of 3: 154 µs per loop
In [16]: %timeit sf['data'][1]
1 loops, best of 3: 902 ms per loop
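Note the stark difference in timing: 154 µs per loop when indexing the retained SArray `sa`, versus 902 ms per loop when the column is retrieved again with sf['data'] on every call. The solution is to keep references to the SArray objects created when retrieving a column from an SFrame.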
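To illustrate the proposed fix, here is a minimal sketch of the idea in plain Python. It does not use the real SFrame or SArray code: `FakeSFrame`, `FakeSArray`, `_column_refs`, and `materialize_count` are hypothetical names invented for this example, and the "expensive" step is only simulated with a counter. The assumption is that the slow path in the timing above comes from rebuilding per-SArray state on every sf['data'] call.

```python
class FakeSArray:
    """Hypothetical stand-in for an SArray with a lazily built cache."""

    def __init__(self, values):
        self._values = values
        self._cache = None
        self.materialize_count = 0  # counts how often the slow path runs

    def __getitem__(self, index):
        if self._cache is None:
            # Simulated expensive step: build the cache on first access.
            self.materialize_count += 1
            self._cache = list(self._values)
        return self._cache[index]


class FakeSFrame:
    """Hypothetical stand-in for an SFrame that keeps column references."""

    def __init__(self, columns):
        self._columns = dict(columns)
        self._column_refs = {}  # column name -> previously created FakeSArray

    def __getitem__(self, name):
        # Reuse the column object created earlier so its cache survives.
        if name not in self._column_refs:
            self._column_refs[name] = FakeSArray(self._columns[name])
        return self._column_refs[name]

    def __setitem__(self, name, values):
        # Replacing a column must drop the stale reference.
        self._columns[name] = values
        self._column_refs.pop(name, None)


sf = FakeSFrame({'data': [10, 20, 30]})
first = sf['data'][1]
second = sf['data'][1]  # same object as before, so the cache is reused
```

With this design, repeated sf['data'] lookups return the same object, so the second index access skips the simulated slow path. The reference has to be invalidated whenever the column is replaced or mutated, which `__setitem__` handles here in the simplest possible way.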