Unfair timing comparison in one of the examples

Dear Ben,

Nice examples, thanks for writing this!

The `030_timing.py` example compares the speed of a slow, interpreted Python loop with a compiled OpenCL C kernel, which is not a fair comparison: in this setup the compiled kernel will always be faster. A fairer comparison would time compiled C code that performs the same sum on the CPU against the OpenCL version executed on the GPU. Does this make sense?
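To illustrate the point, here is a minimal sketch of a fairer CPU baseline. It assumes the example adds two float32 vectors elementwise, and it uses NumPy's vectorized `a + b` as a stand-in for hand-written C, since NumPy runs that loop in compiled code. The function names (`python_loop_sum`, `compiled_sum`, `best_time`) and the problem size are hypothetical and not taken from the repository.

```python
import time

import numpy as np


def python_loop_sum(a, b):
    """Elementwise sum with an interpreted Python loop (the slow baseline)."""
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out


def compiled_sum(a, b):
    """Elementwise sum run by NumPy's compiled C loop (assumed fairer CPU baseline)."""
    return a + b


def best_time(fn, *args, repeats=5):
    """Return the best wall-clock time over `repeats` runs, plus the last result."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn(*args)
        best = min(best, time.perf_counter() - start)
    return best, result


if __name__ == "__main__":
    n = 1_000_000  # hypothetical problem size
    rng = np.random.default_rng(0)
    a = rng.random(n, dtype=np.float32)
    b = rng.random(n, dtype=np.float32)

    # The Python loop is slow, so run it only once.
    t_loop, r_loop = best_time(python_loop_sum, a, b, repeats=1)
    t_c, r_c = best_time(compiled_sum, a, b)

    # Both versions must compute the same result before timings are compared.
    assert np.allclose(r_loop, r_c)
    print(f"Python loop:      {t_loop:.4f} s")
    print(f"Compiled (NumPy): {t_c:.6f} s")
```

The second number, rather than the first, is the one the OpenCL timing would then be measured against.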