Skip to content

Conversation

@bhagany
Copy link
Contributor

@bhagany bhagany commented Jan 8, 2017

Credit to @MartyGentillon for the ideas that led to this PR (see the conversation on #109). This is the big change most of my past PR's have been leading up to - it meets the following goals:

  • support existing usage patterns
  • allow more possible compositions of perun tasks
  • allow 3rd party tasks to be interposed with perun tasks more easily
  • resolve weirdness about the :content key
  • abstract content parsing similarly to the rendering abstraction (aka render-pre-wrap)

The central conceptual change here is completing the transition from a more metadata-based approach to a more file-based approach. The first major step here was to store file metadata on boot's TmpFiles, rather than in fileset metadata. This allows us to move or delete files without having to maintain their metadata separately. The second (and last) step, is to do away with the :content metadata key in favor of writing and manipulating TmpFiles, and using the content of the files, rather than maintaining that information in metadata. This allows any task to manipulate perun's final output, without that task having to be actually aware of perun and its metadata conventions.

For example, I know some here are fans of Tachyons (http://tachyons.io/). With this change, it would be possible to write a general tachyons task that manipulates file content and can be used with or without perun. To explain with code,

(deftask build
  (comp (p/markdown)
        (tachyons) ;; <-- unaware of perun
        (p/render :renderer 'your.ns/render)))

Before this change, the tachyons task would need to know about and manipulate perun's :content key in order to get what you wanted in perun/render, but now it can just change the file itself and ignore perun.

In addition to being more composable with 3rd party tasks, the change from metadata content to file content also allows perun's task to be composed with each other in ways that weren't possible before. Here's a real-life example from the build.boot for my blog*:

(deftask urls
  "Helper task for encapsulating all url generation stuff"
  [_ filterer FILTERER code "predicate to use for selecting entries (default: `:has-content`)"]
  (let [filterer (or filterer :has-content)]
    (comp (p/slug :slug-fn slugify-filename :filterer filterer)
          (p/permalink :permalink-fn permalinkify :filterer filterer)
          (p/canonical-url :filterer filterer))))

(deftask build
  "Build nicerthantriton.com"
  [t tier TIER kw "The tier we're building for"]
  (comp (p/global-metadata)
        (set-tier :val tier)
        (p/markdown :options {:extensions {:smarts true}})  ;; writes an .html file per .md file
        (urls)
        (p/collection :renderer 'nicerthantriton.core/index ;; produces an index page based on the output from `markdown`
                      :filterer post?
                      :meta {:derived true})
        (p/assortment :renderer 'nicerthantriton.core/topic ;; not in master yet, you can ignore it
                      :filterer post?
                      :grouper tagify
                      :meta {:derived true})
        (urls :filterer :derived)
        (recent-posts)                                      ;; These two set some global metadata based on .html files
        (topics)                                            ;; that exist at this point in the pipeline - you can ignore
        (p/render :renderer 'nicerthantriton.core/post      ;; Wraps post content with things like tags and publish date
                  :filterer post?)
        (p/render :renderer 'nicerthantriton.core/page)     ;; Wraps _all_ .html files with the things that are common to every page
        (p/atom-feed :filterer post?)))

Some interesting things about this:

  • collection can be called before render, and the result can be treated the same as files generated by markdown
  • multiple render tasks can be used to progressively build up pages
  • the original usage pattern is still supported - that is, you can still use a single render step, and collection could render a finished page in one pass, if that works best for you.

I'm very interested in any feedback you have, and in particular, if there are any use cases that are currently supported in master, but unsupported here.

Lastly, I would like you all to know that every time I think about this branch, my brain sings "It's the content pipeline" to the tune of "It's the final countdown". You're welcome.

* There's no content on the blog yet. Working on it.

Making Perun more file-based than metadata-based makes composing
arbitrary rendering pipelines easier, by leveraging Boot's primary
abstraction, the fileset. This opens up more possibilities of using
third party tasks that don't know anything about Perun, and makes
Perun's own tasks more useful.

Summary of changes:
 - Tasks no longer store content in the :content metadata key, but
   instead use the actual file's content
 - `markdown` now sets `:parsed` instead of `:content`
 - In addition to setting `:parsed`, `markdown` also writes an html
   file that contains the parsed content, for processing downstream.
 - Rather than setting `:content` on the file that `markdown`
   writes, a new flag, `:has-content` is set to indicate to
   downstream tasks that this file is available for further
   processing
 - Default filterers are set to `:has-content`
 - Most of `markdown`'s functionality is wrapped by
   `content-pre-wrap`, which can be reused by future input parsing
   tasks
 - `collection` can now be used as a kind of input-parsing task
   that produces a page fragment in the same manner that `markdown`
   does. This fragment can then be picked up by subsequent `render`
   tasks and used in the same way as the output from `markdown`.
   The original usage of `collection` to produce a complete page
   is also still supported, with no changes.
 - `collection` no longer sets `:date-build` or `:canonical-url`,
   because you can now use the `build-date` or `canonical-url` tasks
   to set these keys, just as you would with any other content.
 - You can now chain multiple `render`s together in the same task
   pipeline. This allows you to split your rendering needs into as
   many logical parts as necessary.
Rather than only using the original parsed input, if that isn't
available, fall back to input content, parsed current file, or
current file content, in that order.
Adding this allows users to more precisely control what ends up
in their feeds, and also avoids some tedious work in `content-pre-wrap`
@bhagany
Copy link
Contributor Author

bhagany commented Jan 8, 2017

I should also add:

This change prevents perun's `:path` from getting out of sync with
where the file actually is. For instance, if a 3rd party task moves
a file without perun's knowledge, this change allows perun to still
get the correct path.
@podviaznikov
Copy link
Member

Code looks good to me. Will need to test it locally first. Thanks for the detailed description

@bhagany
Copy link
Contributor Author

bhagany commented Jan 12, 2017

Correction: #105 still isn't really possible with this PR. However, I did just complete a branch for a future PR that does fulfill that goal. All in good time :). See https://github.com/bhagany/perun/wiki for details if you're curious.

src/io/perun.clj Outdated
input-meta (meta-by-ext input-fs file-exts)
output-meta (doall
(for [{:keys [path parsed filename] :as entry*} input-meta]
(let [page-filepath (string/replace path #"(?i).[a-z]+$" ".html")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extensions can contain nearly any character including '-' and '_'. Perhaps this should be constructed by the file-exts parameter?

src/io/perun.clj Outdated
:original-path path
:path page-filepath
:filename (string/replace filename
#"(?i).[a-z]+$" ".html"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

src/io/perun.clj Outdated
:filename (string/replace filename
#"(?i).[a-z]+$" ".html"))
(dissoc :parsed :extension :file-type :full-path
:mime-type :original))]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to update the :mime-type, :extension, and :file-type keys rather than removing them. They seems like something that it would be useful to filter on.

src/io/perun.clj Outdated
This task will look for files ending with `md` or `markdown`
and add a `:content` key to their metadata containing the
and add a `:parsed` key to their metadata containing the
HTML resulting from processing markdown file's content"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that this also writes out HTML files to the location related, is that correct?

new-metadata (render-to-paths render-paths (:renderer options) tmp tracer)
rm-files (keep #(boot/tmp-get fileset (-> % :entry :path)) (vals render-paths))]
(-> fileset
(boot/rm rm-files)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might makes sense to log the names of removed files in debug mode.

src/io/perun.clj Outdated
new-path (make-path (:out-dir options) permalink path)
meta-entry (merge meta entry)
content-entry (assoc meta-entry :content content)]
(perun/report-debug "render" "rendered page for path" path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logging message is potentially confusing, as the rendering isn't actually occurring here. When someone's render task is crashing they might think that it had worked given that this printed out for all files. Perhaps it should be moved to render-to-paths.

@bhagany
Copy link
Contributor Author

bhagany commented Jan 14, 2017

Excellent feedback, @MartyGentillon. I believe I've addressed all of your concerns.

Copy link
Contributor

@MartyGentillon MartyGentillon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@bhagany
Copy link
Contributor Author

bhagany commented Jan 14, 2017

I merged #134 here, because I assumed it would get merged before this one. This way, there'll be no conflicts here when that happens. If there's a problem with #134 that I can't resolve quickly, I'll undo this merge.

@bhagany
Copy link
Contributor Author

bhagany commented Jan 14, 2017

(also, yes, the tests pass here)

Copy link
Member

@podviaznikov podviaznikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me

@podviaznikov podviaznikov merged commit 341ed5c into hashobject:master Jan 16, 2017
@podviaznikov
Copy link
Member

@MartyGentillon thank you for your help with discussion and reviews

@bhagany bhagany deleted the content-pipeline branch January 16, 2017 01:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants