Content pipeline #132

bhagany · 2017-01-08T03:37:20Z

Credit to @MartyGentillon for the ideas that led to this PR (see the conversation on #109). This is the big change most of my past PR's have been leading up to - it meets the following goals:

support existing usage patterns
allow more possible compositions of perun tasks
allow 3rd party tasks to be interposed with perun tasks more easily
resolve weirdness about the :content key
abstract content parsing similarly to the rendering abstraction (aka render-pre-wrap)

The central conceptual change here is completing the transition from a more metadata-based approach to a more file-based approach. The first major step here was to store file metadata on boot's TmpFiles, rather than in fileset metadata. This allows us to move or delete files without having to maintain their metadata separately. The second (and last) step, is to do away with the :content metadata key in favor of writing and manipulating TmpFiles, and using the content of the files, rather than maintaining that information in metadata. This allows any task to manipulate perun's final output, without that task having to be actually aware of perun and its metadata conventions.

For example, I know some here are fans of Tachyons (http://tachyons.io/). With this change, it would be possible to write a general tachyons task that manipulates file content and can be used with or without perun. To explain with code,

(deftask build
  (comp (p/markdown)
        (tachyons) ;; <-- unaware of perun
        (p/render :renderer 'your.ns/render)))

Before this change, the tachyons task would need to know about and manipulate perun's :content key in order to get what you wanted in perun/render, but now it can just change the file itself and ignore perun.

In addition to being more composable with 3rd party tasks, the change from metadata content to file content also allows perun's task to be composed with each other in ways that weren't possible before. Here's a real-life example from the build.boot for my blog*:

(deftask urls
  "Helper task for encapsulating all url generation stuff"
  [_ filterer FILTERER code "predicate to use for selecting entries (default: `:has-content`)"]
  (let [filterer (or filterer :has-content)]
    (comp (p/slug :slug-fn slugify-filename :filterer filterer)
          (p/permalink :permalink-fn permalinkify :filterer filterer)
          (p/canonical-url :filterer filterer))))

(deftask build
  "Build nicerthantriton.com"
  [t tier TIER kw "The tier we're building for"]
  (comp (p/global-metadata)
        (set-tier :val tier)
        (p/markdown :options {:extensions {:smarts true}})  ;; writes an .html file per .md file
        (urls)
        (p/collection :renderer 'nicerthantriton.core/index ;; produces an index page based on the output from `markdown`
                      :filterer post?
                      :meta {:derived true})
        (p/assortment :renderer 'nicerthantriton.core/topic ;; not in master yet, you can ignore it
                      :filterer post?
                      :grouper tagify
                      :meta {:derived true})
        (urls :filterer :derived)
        (recent-posts)                                      ;; These two set some global metadata based on .html files
        (topics)                                            ;; that exist at this point in the pipeline - you can ignore
        (p/render :renderer 'nicerthantriton.core/post      ;; Wraps post content with things like tags and publish date
                  :filterer post?)
        (p/render :renderer 'nicerthantriton.core/page)     ;; Wraps _all_ .html files with the things that are common to every page
        (p/atom-feed :filterer post?)))

Some interesting things about this:

collection can be called before render, and the result can be treated the same as files generated by markdown
multiple render tasks can be used to progressively build up pages
the original usage pattern is still supported - that is, you can still use a single render step, and collection could render a finished page in one pass, if that works best for you.

I'm very interested in any feedback you have, and in particular, if there are any use cases that are currently supported in master, but unsupported here.

Lastly, I would like you all to know that every time I think about this branch, my brain sings "It's the content pipeline" to the tune of "It's the final countdown". You're welcome.

* There's no content on the blog yet. Working on it.

Making Perun more file-based than metadata-based makes composing arbitrary rendering pipelines easier, by leveraging Boot's primary abstraction, the fileset. This opens up more possibilities of using third party tasks that don't know anything about Perun, and makes Perun's own tasks more useful. Summary of changes: - Tasks no longer store content in the :content metadata key, but instead use the actual file's content - `markdown` now sets `:parsed` instead of `:content` - In addition to setting `:parsed`, `markdown` also writes an html file that contains the parsed content, for processing downstream. - Rather than setting `:content` on the file that `markdown` writes, a new flag, `:has-content` is set to indicate to downstream tasks that this file is available for further processing - Default filterers are set to `:has-content` - Most of `markdown`'s functionality is wrapped by `content-pre-wrap`, which can be reused by future input parsing tasks - `collection` can now be used as a kind of input-parsing task that produces a page fragment in the same manner that `markdown` does. This fragment can then be picked up by subsequent `render` tasks and used in the same way as the output from `markdown`. The original usage of `collection` to produce a complete page is also still supported, with no changes. - `collection` no longer sets `:date-build` or `:canonical-url`, because you can now use the `build-date` or `canonical-url` tasks to set these keys, just as you would with any other content. - You can now chain multiple `render`s together in the same task pipeline. This allows you to split your rendering needs into as many logical parts as necessary.

… metadata

…s`, and make it optional

Rather than only using the original parsed input, if that isn't available, fall back to input content, parsed current file, or current file content, in that order.

…metadata

Adding this allows users to more precisely control what ends up in their feeds, and also avoids some tedious work in `content-pre-wrap`

bhagany · 2017-01-08T03:43:47Z

I should also add:

this has been in progress awhile, I can squash if you prefer :)
it fixes Collection task should work even if no :content files are found #77 (Collection task should work even if no :content files are found)
makes Feature: pass pre-parse function to markdown task #105 (pre-parse function to markdown task) easy to implement yourself as a standalone task

This change prevents perun's `:path` from getting out of sync with where the file actually is. For instance, if a 3rd party task moves a file without perun's knowledge, this change allows perun to still get the correct path.

podviaznikov · 2017-01-10T04:35:53Z

Code looks good to me. Will need to test it locally first. Thanks for the detailed description

bhagany · 2017-01-12T02:07:11Z

Correction: #105 still isn't really possible with this PR. However, I did just complete a branch for a future PR that does fulfill that goal. All in good time :). See https://github.com/bhagany/perun/wiki for details if you're curious.

MartyGentillon · 2017-01-14T00:05:26Z

src/io/perun.clj

+            input-meta (meta-by-ext input-fs file-exts)
+            output-meta (doall
+                         (for [{:keys [path parsed filename] :as entry*} input-meta]
+                           (let [page-filepath (string/replace path #"(?i).[a-z]+$" ".html")


Extensions can contain nearly any character including '-' and '_'. Perhaps this should be constructed by the file-exts parameter?

MartyGentillon · 2017-01-14T00:05:52Z

src/io/perun.clj

+                                                  :original-path path
+                                                  :path page-filepath
+                                                  :filename (string/replace filename
+                                                                            #"(?i).[a-z]+$" ".html"))


MartyGentillon · 2017-01-14T00:08:34Z

src/io/perun.clj

+                                                  :filename (string/replace filename
+                                                                            #"(?i).[a-z]+$" ".html"))
+                                           (dissoc :parsed :extension :file-type :full-path
+                                                   :mime-type :original))]


It might be better to update the :mime-type, :extension, and :file-type keys rather than removing them. They seems like something that it would be useful to filter on.

MartyGentillon · 2017-01-14T00:30:57Z

src/io/perun.clj

   This task will look for files ending with `md` or `markdown`
-   and add a `:content` key to their metadata containing the
+   and add a `:parsed` key to their metadata containing the
   HTML resulting from processing markdown file's content"


It seems that this also writes out HTML files to the location related, is that correct?

MartyGentillon · 2017-01-14T00:48:02Z

src/io/perun.clj

+            new-metadata (render-to-paths render-paths (:renderer options) tmp tracer)
+            rm-files (keep #(boot/tmp-get fileset (-> % :entry :path)) (vals render-paths))]
        (-> fileset
+            (boot/rm rm-files)


It might makes sense to log the names of removed files in debug mode.

MartyGentillon · 2017-01-14T00:53:44Z

src/io/perun.clj

+                                   new-path (make-path (:out-dir options) permalink path)
+                                   meta-entry (merge meta entry)
+                                   content-entry (assoc meta-entry :content content)]
                               (perun/report-debug "render" "rendered page for path" path)


This logging message is potentially confusing, as the rendering isn't actually occurring here. When someone's render task is crashing they might think that it had worked given that this printed out for all files. Perhaps it should be moved to render-to-paths.

bhagany · 2017-01-14T02:23:27Z

Excellent feedback, @MartyGentillon. I believe I've addressed all of your concerns.

MartyGentillon

Looks good to me.

bhagany · 2017-01-14T04:25:43Z

I merged #134 here, because I assumed it would get merged before this one. This way, there'll be no conflicts here when that happens. If there's a problem with #134 that I can't resolve quickly, I'll undo this merge.

bhagany · 2017-01-14T04:26:00Z

(also, yes, the tests pass here)

podviaznikov

Looks great to me

podviaznikov · 2017-01-16T00:28:54Z

@MartyGentillon thank you for your help with discussion and reviews

bhagany added 30 commits December 3, 2016 21:12

Merge branch 'refactor-yaml' into content-pipeline

e6ca001

Merge branch 'render-pre-wrap' into content-pipeline

e04d824

Merge branch 'render-pre-wrap' into content-pipeline

fb4a69b

Merge branch 'render-pre-wrap' into content-pipeline

27394a0

Merge branch 'merge-filedata' into content-pipeline

8b05456

Simplify finding unchanged content files

c431711

Include :parent-path in parsed file metadata

03eec4e

Exclude :include-rss and :include-atom from original content file…

88e5ff5

… metadata

Get original, not fully-rendered, content into atom xml

7f40dc8

Update spec document with :has-content, :original-content, :parsed

b36bac8

Small content-pre-wrap refactor; pass only pod instead of `option…

14a7e1b

…s`, and make it optional

Add docstrings to content-pre-wrap and make-path

f17e211

Merge branch 'merge-filedata' into content-pipeline

e7bcf75

Update new default filterers, :content -> :has-content

4563918

Don't store :original-content in meta; get it on demand

d0acca3

Update spec document, :original-content -> :original-path

ba71a04

Make atom content more flexible

e2ba0e3

Rather than only using the original parsed input, if that isn't available, fall back to input content, parsed current file, or current file content, in that order.

Correctly handle :include-rss and :include-atom on markdown file …

cf07276

…metadata

merge merge-filedata -> content-pipeline

a9566b1

Use correct namespace for get-global-meta

0ea04a4

Add extensions parameter to atom-feed and rss

01516fb

Adding this allows users to more precisely control what ends up in their feeds, and also avoids some tedious work in `content-pre-wrap`

Set :content\ key for \ttr\ and \word-count\ tasks

c5f6ff7

Use output-files for rss

a401423

Use full extensions, with leading dot, for extension lookups

0432a3c

Add :content key to collection :entries

72b4bfa

Tighten up the data that render-paths-fns return

ea7939f

Avoid setting :content key on file metadata; only use for rendering

61c14f1

merge master -> content-pipeline

a1c4bfe

Count words in original markdown, rather than parsed or rendered content

40fd58d

MartyGentillon reviewed Jan 14, 2017

View reviewed changes

bhagany added 5 commits January 13, 2017 19:49

Build regex from file-exts parameter to content-pre-wrap

c8b1023

Set new filedata on files created by content-pre-wrap

51e7fef

Amend markdown docstring to note it writes HTML files

51fe571

Add debug statement before file removal during rendering

5c64296

Move rendering debug and info statements to more logical places

919f16f

MartyGentillon approved these changes Jan 14, 2017

View reviewed changes

bhagany added 2 commits January 13, 2017 21:24

merge master -> content-pipeline

c8248f5

merge task-tests -> content-pipeline

1e3a3f0

bhagany added 2 commits January 14, 2017 08:22

Add test for hashobject#77 - collection renders without input files

5a48a1f

Add content tests

f677bd8

bhagany mentioned this pull request Jan 15, 2017

feat(asciidoctor): basic asciidoc plugin #100

Closed

8 tasks

bhagany added 2 commits January 15, 2017 15:29

merge master -> content-pipeline

61d8860

Add testing util task

d7cb61b

podviaznikov approved these changes Jan 16, 2017

View reviewed changes

podviaznikov merged commit 341ed5c into hashobject:master Jan 16, 2017

bhagany deleted the content-pipeline branch January 16, 2017 01:29

Content pipeline #132

Content pipeline #132

Uh oh!

Conversation

bhagany commented Jan 8, 2017

Uh oh!

bhagany commented Jan 8, 2017

Uh oh!

podviaznikov commented Jan 10, 2017

Uh oh!

bhagany commented Jan 12, 2017

Uh oh!

MartyGentillon Jan 14, 2017

Choose a reason for hiding this comment

Uh oh!

MartyGentillon Jan 14, 2017

Choose a reason for hiding this comment

Uh oh!

MartyGentillon Jan 14, 2017

Choose a reason for hiding this comment

Uh oh!

MartyGentillon Jan 14, 2017

Choose a reason for hiding this comment

Uh oh!

MartyGentillon Jan 14, 2017

Choose a reason for hiding this comment

Uh oh!

MartyGentillon Jan 14, 2017

Choose a reason for hiding this comment

Uh oh!

bhagany commented Jan 14, 2017

Uh oh!

MartyGentillon left a comment

Choose a reason for hiding this comment

Uh oh!

bhagany commented Jan 14, 2017

Uh oh!

bhagany commented Jan 14, 2017

Uh oh!

podviaznikov left a comment

Choose a reason for hiding this comment

Uh oh!

podviaznikov commented Jan 16, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants