@@ -143,12 +143,32 @@ and run on shared-memory parallel machines like general-purpose multicores.
143143` libpluto.{so,a} ` is also built and can be found in ` src/.libs/ ` . ` make install `
144144will install it.
145145
146+ ## Using Pluto
147+
148+ - Use ` #pragma scop ` and ` #pragma endscop ` around the section of code
149+ you want to parallelize/optimize.
150+
151+ - Then, run:
152+
153+ ./polycc <C source file> [--pet]
154+
155+ The output file will be named <original prefix>.pluto.c unless `-o
156+ <filename>` is supplied. When ` --debug ` is used, the .cloog file used to
157+ generate code is not deleted and is named similarly. The pet frontend
158+ (` --pet `) is needed to process many of the test cases/examples.
159+
160+ Please refer to the documentation of Clan or PET for information on the
161+ kind of code around which one can put ` #pragma scop ` and ` #pragma
162+ endscop ` . Even if your program does not satisfy these constraints as
163+ is, it is often possible to work around them.
164+
165+
146166## Trying a new example
147167
148168- Use ` #pragma scop ` and ` #pragma endscop ` around the section of code
149169 you want to parallelize/optimize.
150170
151- - Then, just run ` ./polycc <C source file> ` .
171+ - Then, just run ` ./polycc <C source file> --pet ` .
152172
153173 The transformation is also printed out, and ` test.par.c ` will have the
154174 parallelized code. If you want to see intermediate files, like the
@@ -177,25 +197,6 @@ where target can be orig, orig_par, opt, tiled, par, pipepar, etc. (see
177197- ` make check-pluto ` to test for correctness, ` make perf ` to compare
178198performance.
179199
180-
181- ## Using Pluto
182-
183- - Use ` #pragma scop ` and '#pragma endscop' around the section of code
184- you want to parallelize/optimize.
185-
186- - Then, run
187-
188- ./polycc <C source file > --parallel --tile
189-
190- The output file will be named <original prefix >.pluto.c unless '-o
191- <filename >" is supplied. When --debug is used, the .cloog used to
192- generate code is not deleted and is named similarly.
193-
194- Please refer to the documentation of Clan or PET for information on the
195- kind of code around which one can put ` #pragma scop ` and `#pragma
196- endscop`. Most of the time, although your program may not satisfy the
197- constraints, it may be possible to work around them.
198-
199200## Command-line options
200201
201202``` shell
@@ -232,19 +233,50 @@ loops in that order. For eg., for heat-3d, you'll see this output when
232233you run Pluto
233234
234235``` shell
235- ../../polycc 3d7pt.c
236-
237- [...]
238-
236+ # With default tile sizes.
237+ ../../polycc test/3d7pt.c --pet
238+
239+ [pluto] compute_deps (isl)
240+ [pluto] Number of statements: 1
241+ [pluto] Total number of loops: 4
242+ [pluto] Number of deps: 15
243+ [pluto] Maximum domain dimensionality: 4
244+ [pluto] Number of parameters: 0
245+ [pluto] Concurrent start hyperplanes found
239246[pluto] Affine transformations [< iter coeff' s> <param> <const>]
240247
241- T(S1): (t, t+i, t+j, t+k)
248+ T(S1): (t-i, t+i, t+j, t+k)
242249loop types (loop, loop, loop, loop)
243250
244- [...]
251+ [Pluto] After tiling:
252+ T(S1): ((t-i)/32, (t+i)/32, (t+j)/32, (t+k)/32, t-i, t+i, t+j, t+k)
253+ loop types (loop, loop, loop, loop, loop, loop, loop, loop)
254+
255+ [Pluto] After intra_tile reschedule
256+ T(S1): ((t-i)/32, (t+i)/32, (t+j)/32, (t+k)/32, t, t+i, t+j, t+k)
257+ loop types (loop, loop, loop, loop, loop, loop, loop, loop)
258+
259+ [Pluto] After tile scheduling:
260+ T(S1): ((t-i)/32+(t+i)/32, (t+i)/32, (t+j)/32, (t+k)/32, t, t+i, t+j, t+k)
261+ loop types (loop, loop, loop, loop, loop, loop, loop, loop)
262+
263+ [pluto] using statement-wise -fs/-ls options: S1(5,8),
264+ [pluto-unroll-jam] No unroll jam loop candidates found
265+ [Pluto] Output written to 3d7pt.pluto.c
266+
267+ [pluto] Timing statistics
268+ [pluto] SCoP extraction + dependence analysis time: 0.087957s
269+ [pluto] Auto-transformation time: 0.011928s
270+ [pluto] Tile size selection time: 0.000000s
271+ [pluto] Total constraint solving time (LP/MIP/ILP) time: 0.002028s
272+ [pluto] Code generation time: 0.049415s
273+ [pluto] Other/Misc time: 0.310162s
274+ [pluto] Total time: 0.459462s
275+ [pluto] All times: 0.087957 0.011928 0.049415 0.310162
245276```
246277
247- Hence, the tile sizes specified correspond to t, t+i, t+j, and t+k.
278+ The tile sizes specified correspond to t, t+i, t+j, and t+k. Note how the
279+ multi-dimensional affine transformation changes before and after tiling.
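As a sketch, and assuming Pluto's convention of reading tile sizes from a ` tile.sizes ` file in the current directory (one size per line, one line per tiled dimension), sizes for the four dimensions above would be listed as:

```
8
32
32
32
```

Here the first entry would apply to t and the remaining three to t+i, t+j, and t+k; the values shown are illustrative, not recommendations.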
248280
249281
250282### Setting good tile sizes