Whilst the documentation for parBuffer suggests it is more suitable for lazy lists, the documentation for parList or other list strategies does not explicitly state that evaluating infinite lists directly with e.g. parList may change the termination property of a program.
Calling print f terminates for:
f :: Int
f = sum (takeWhile (< 100) (concat infinite_list_chunked))
where
infinite_list = [n + n | n <- [0, 1 ..]]
infinite_list_chunked = chunks 20 infinite_list
Because sum consumes 50 values from takeWhile, which consumes 50 values from the infinite list produced by concat.
However if:
concat infinite_list_chunked
is replaced with:
concat (infinite_list_chunked `using` parList rseq)
then the function does not terminate. I.e. print g does not terminate:
g :: Int
g = sum (takeWhile (< 100) (concat (infinite_list_chunked `using` parList rseq)))
where
infinite_list = [n + n | n <- [0, 1 ..]]
infinite_list_chunked = chunks 20 infinite_list
It might be surprising to a user that introducing a parList strategy to their code changes the termination property of their program. Presumably parList triggers an attempted full evaluation of the spine of the infinite list to preemptively create all sparks, which is where non-termination occurs because the spine is infinite. But introducing parList is breaking the compositionality of composing lazy functions over infinite lists.
For completeness, introducing parBuffer rather than parList does preserve the termination of the original sequential version. I.e.g print k terminates because a buffer is size 10 is repeatedly populated with list elements for consumption from takeWhile until n+n is >=100:
k :: Int
k = sum (takeWhile (< 100) (concat (infinite_list_chunked `using` parBuffer 10 rseq)))
where
infinite_list = [n + n | n <- [0, 1 ..]]
infinite_list_chunked = chunks 20 infinite_list
@simonmar 's book says:
In contrast to parList which eagerly creates a spark for every list element, parBuffer N creates sparks for only the first N elements of the list, and then creates more sparks as the result list is consumed.
However a user may not realise that "eagerly creates a spark for every list element" means that control flow will not be relinquished from parList until it has created all sparks (or not, for infinite lists).
If the non-terminating example above is changed so that the parList strategy is applied to the finite list produced by takeWhile rather than the infinite list consumed by concat, then it does terminate as the sequential one does. print m terminates:
m :: Int
m = sum (takeWhile (< 100) (concat infinite_list_chunked) `using` parList rseq)
where
infinite_list = [n + n | n <- [0, 1 ..]]
infinite_list_chunked = chunks 20 infinite_list
Two points about the documentation for the list strategies documented here:
https://hackage.haskell.org/package/parallel-3.2.2.0/docs/Control-Parallel-Strategies.html#g:7
-
Should the documentation explicitly state that the list strategies (except evalBuffer/parBuffer) should not be used to directly evaluate infinite lists?
-
Should the documentation explicitly state that if they are used with infinite lists, a function may not terminate even if the infinite list is demanded to just WHNF?
The chunks function:
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks n xs =
let (ys, zs) = splitAt n xs
in ys : chunks n zs
Whilst the documentation for
parBuffersuggests it is more suitable for lazy lists, the documentation forparListor other list strategies does not explicitly state that evaluating infinite lists directly with e.g.parListmay change the termination property of a program.Calling
print fterminates for:Because
sumconsumes 50 values fromtakeWhile, which consumes 50 values from the infinite list produced byconcat.However if:
concat infinite_list_chunkedis replaced with:
then the function does not terminate. I.e.
print gdoes not terminate:It might be surprising to a user that introducing a
parListstrategy to their code changes the termination property of their program. PresumablyparListtriggers an attempted full evaluation of the spine of the infinite list to preemptively create all sparks, which is where non-termination occurs because the spine is infinite. But introducingparListis breaking the compositionality of composing lazy functions over infinite lists.For completeness, introducing
parBufferrather thanparListdoes preserve the termination of the original sequential version. I.e.gprint kterminates because a buffer is size 10 is repeatedly populated with list elements for consumption fromtakeWhileuntiln+nis>=100:@simonmar 's book says:
However a user may not realise that "eagerly creates a spark for every list element" means that control flow will not be relinquished from
parListuntil it has created all sparks (or not, for infinite lists).If the non-terminating example above is changed so that the
parListstrategy is applied to the finite list produced bytakeWhilerather than the infinite list consumed byconcat, then it does terminate as the sequential one does.print mterminates:Two points about the documentation for the list strategies documented here:
https://hackage.haskell.org/package/parallel-3.2.2.0/docs/Control-Parallel-Strategies.html#g:7
Should the documentation explicitly state that the list strategies (except
evalBuffer/parBuffer) should not be used to directly evaluate infinite lists?Should the documentation explicitly state that if they are used with infinite lists, a function may not terminate even if the infinite list is demanded to just WHNF?
The
chunksfunction: