For this assignment, I implemented a few core data structures and compared their theoretical time complexities with the numbers produced by running microbenchmarks. The structures I built were a dynamic array, a stack, a ring buffer queue, and a hash set, following PSADS Ch. 2 (Algorithm Analysis) and Ch. 3 (Basic Data Structures). After writing each one, I measured how long the main operations took for different input sizes and compared the results to the behavior we expect from class.
Dynamic arrays should give constant time for append, except when a resize happens. A resize is more expensive because it copies every element into a new, larger backing list, but these events are rare enough that the amortized cost stays constant. Popping from the end should also stay constant time.
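To make the resize behavior concrete, here is a minimal sketch of the kind of append I mean. The class and method names (`DynamicArray`, `_resize`) are illustrative, not the exact code from my submission.

```python
class DynamicArray:
    """Growable array backed by a fixed-size list, doubling on overflow."""

    def __init__(self):
        self._capacity = 4                      # small initial backing size
        self._size = 0                          # number of stored elements
        self._data = [None] * self._capacity

    def _resize(self, new_capacity):
        # O(n): copy every element into a larger backing list
        new_data = [None] * new_capacity
        for i in range(self._size):
            new_data[i] = self._data[i]
        self._data = new_data
        self._capacity = new_capacity

    def append(self, value):
        # Usually O(1); only the appends that trigger _resize cost O(n)
        if self._size == self._capacity:
            self._resize(2 * self._capacity)
        self._data[self._size] = value
        self._size += 1

    def pop(self):
        # O(1): removing from the end never shifts other elements
        if self._size == 0:
            raise IndexError("pop from empty array")
        self._size -= 1
        value = self._data[self._size]
        self._data[self._size] = None
        return value
```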
My stack is basically built on top of a Python list, so pushing and popping should behave the same as append and pop on the dynamic array: both operations should be constant time.
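A minimal sketch of that delegation (again, the names are just illustrative) looks roughly like this:

```python
class Stack:
    """LIFO stack that delegates to the end of a Python list."""

    def __init__(self):
        self._items = []

    def push(self, value):
        # Same cost as list.append: O(1) amortized
        self._items.append(value)

    def pop(self):
        # Same cost as list.pop() from the end: O(1)
        if not self._items:
            raise IndexError("pop from empty stack")
        return self._items.pop()

    def peek(self):
        # O(1): read the top element without removing it
        if not self._items:
            raise IndexError("peek at empty stack")
        return self._items[-1]
```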
For the queue, I used a ring buffer approach that was recommended on reddit. In the normal case, enqueue and dequeue run in constant time because the head and tail pointers just move through the array. If the buffer fills up, the enqueue has to allocate a new array of roughly double the previous size, so that particular operation is linear. Just like the dynamic array, the amortized cost still stays constant.
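Here is a sketch of the ring buffer idea with a doubling resize; the names (`RingBufferQueue`, `_grow`) are assumptions for illustration rather than my exact implementation.

```python
class RingBufferQueue:
    """FIFO queue on a circular buffer; head and tail wrap around with modulo."""

    def __init__(self, capacity=8):
        self._data = [None] * capacity
        self._head = 0              # index of the oldest element
        self._size = 0

    def _grow(self):
        # O(n): unwrap the elements into a new buffer of double the size
        old, old_cap = self._data, len(self._data)
        self._data = [None] * (2 * old_cap)
        for i in range(self._size):
            self._data[i] = old[(self._head + i) % old_cap]
        self._head = 0

    def enqueue(self, value):
        # O(1) amortized: only the enqueue that finds the buffer full pays for _grow
        if self._size == len(self._data):
            self._grow()
        tail = (self._head + self._size) % len(self._data)
        self._data[tail] = value
        self._size += 1

    def dequeue(self):
        # O(1): just advance the head pointer
        if self._size == 0:
            raise IndexError("dequeue from empty queue")
        value = self._data[self._head]
        self._data[self._head] = None
        self._head = (self._head + 1) % len(self._data)
        self._size -= 1
        return value
```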
Hash sets in Python are implemented with open addressing. Because of that, add, remove, and contains should all be constant time on average. The worst case is linear when many keys collide, but Python's implementation works hard to avoid that situation.
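To show the open addressing idea, here is a toy linear-probing sketch. This is not how CPython's set actually looks internally (that one uses a more elaborate probing scheme and is written in C), and it skips deletion (which would need tombstones) to stay short.

```python
class HashSet:
    """Toy open-addressing set with linear probing. Keys must be hashable and not None."""

    def __init__(self, capacity=8):
        self._slots = [None] * capacity   # None marks an empty slot
        self._size = 0

    def _probe(self, key):
        # Walk the slots starting at hash(key) until we hit the key or an empty slot
        i = hash(key) % len(self._slots)
        while self._slots[i] is not None and self._slots[i] != key:
            i = (i + 1) % len(self._slots)
        return i

    def _rehash(self):
        # O(n): reinsert every key into a table of double the size
        old = [k for k in self._slots if k is not None]
        self._slots = [None] * (2 * len(self._slots))
        self._size = 0
        for k in old:
            self.add(k)

    def add(self, key):
        # O(1) on average while the table stays sparse
        if self._size * 3 >= len(self._slots) * 2:   # keep load factor under ~2/3
            self._rehash()
        i = self._probe(key)
        if self._slots[i] is None:
            self._slots[i] = key
            self._size += 1

    def __contains__(self, key):
        # O(1) average: a short probe sequence ends at the key or an empty slot
        return self._slots[self._probe(key)] == key
```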
The benchmark results showed a consistent pattern across all the structures, which I am proud of. When the input size went from 1000 to 10000 to 100000, the total times scaled roughly linearly: the numbers for the dynamic array were about ten times bigger at each step, and the stack matched that pattern almost exactly. This is exactly what we expect when each operation takes constant time but is repeated n times inside a loop.
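The timing itself was just the operation repeated n times inside a timer. A sketch of the kind of microbenchmark I mean (the helper name and the lambda are illustrative, and the exact sizes and repetitions are whatever I used above, not fixed by this snippet):

```python
import time

def benchmark(make_structure, operation, sizes=(1_000, 10_000, 100_000)):
    """Time `operation(structure, i)` repeated n times for each n in sizes."""
    for n in sizes:
        structure = make_structure()
        start = time.perf_counter()
        for i in range(n):
            operation(structure, i)
        elapsed = time.perf_counter() - start
        print(f"n={n:>7}: {elapsed:.4f}s total, {elapsed / n * 1e6:.3f} µs per op")

# Example: time appends on a plain list as a stand-in for the dynamic array
benchmark(list, lambda lst, i: lst.append(i))
```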
The queue, as we see, was slightly slower, which makes sense because the ring buffer logic has a little more bookkeeping (the index arithmetic a data structure needs to keep track of its internal state) than push and pop on a list. There is also the extra cost of resizing whenever the queue fills up; even though that resize does not happen often, it is still visible in the timing.
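A quick back-of-the-envelope check of why the resizes stay cheap on average: with doubling, growing from a small buffer up to n elements triggers resizes at capacities of roughly 1, 2, 4, ..., n, so the total number of copied elements is at most 1 + 2 + 4 + ... + n < 2n. Spread over the n enqueues, that is at most about two extra copies per operation, which is why the average cost stays constant even though individual resizes are visible as bumps.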
The hash set was the fastest of all four, which was expected. This makes sense because Python's hashing machinery is written in C and lookups are heavily optimized (while reading about this I also came across tools like PyPy, Cython, and Numba for speeding up numerical code). Its times grew at the same rate as the other structures, which matches the idea that each single operation is constant time.
There were no major surprises in the results. Almost all of the growth patterns lined up with what the theory predicts. The only extra variation looked like small bumps caused by the Python interpreter itself and system noise.
| Structure | Operation | Expected Complexity | Observed Trend (total time vs. n) |
|---|---|---|---|
| DynamicArray | append, pop | O(1) amortized | Linear growth across larger test sizes |
| Stack | push, pop | O(1) | Same trend as DynamicArray |
| RingBufferQueue | enqueue, dequeue | O(1) amortized | Slightly slower but same overall pattern |
| HashSet | add, contains | O(1) average | Fastest results, steady linear increase |