proposal: Design a format for untrusted lexers #2196

Currently, Rouge lexers are written in Ruby, which makes them extremely flexible, even for complex languages like Ruby. However, there is a use case for accepting user-provided lexers in contexts where executing arbitrary Ruby code would be inappropriate.

Supporting this use case would involve a Rouge extension that provides an alternate `Lexer` subclass. That subclass would either consume a custom format or implement an existing one, such as tmLanguage, and would attempt to map the resulting tokens back to the Rouge/Pygments token set.
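As a rough illustration of the custom-format option, the sketch below treats a lexer as plain data: named states holding ordered regex rules, each tagged with a token name and an optional push or pop, loosely mirroring the stack-based model Rouge inherits from Pygments. Everything here is hypothetical (`DataLexer`, the spec keys, and the `ALLOWED_TOKENS` allowlist are not part of Rouge), and it uses only the Ruby standard library rather than Rouge's actual `RegexLexer` API. Nothing in the spec is executed; it is validated when loaded, so unknown token names and pushes to undefined states are rejected, and malformed patterns raise `RegexpError`.

```ruby
require "strscan"

# Hypothetical allowlist: a small subset of Rouge/Pygments token names in
# dotted Pygments style. Specs may only emit these ("Error" is emitted by
# the lexer itself for unmatched input).
ALLOWED_TOKENS = %w[
  Text Comment.Single Keyword Name Operator Punctuation
  Literal.String Literal.Number
].freeze

# Hypothetical data-driven lexer: each state maps to an ordered list of rules.
class DataLexer
  def initialize(spec)
    raise ArgumentError, "spec must define a 'root' state" unless spec.key?("root")

    @states = spec.transform_values { |rules| rules.map { |rule| compile_rule(rule, spec) } }
  end

  # Returns [token_name, text] pairs that together cover the whole input.
  def lex(source)
    scanner = StringScanner.new(source)
    stack = ["root"]
    tokens = []
    until scanner.eos?
      rule, text = next_match(scanner, @states[stack.last])
      if rule.nil?
        # No rule matched: emit one character as an error and keep going.
        tokens << ["Error", scanner.getch]
        next
      end
      tokens << [rule[:token], text]
      stack.push(rule[:push]) if rule[:push]
      stack.pop if rule[:pop] && stack.length > 1 # never pop "root"
    end
    tokens
  end

  private

  def compile_rule(rule, spec)
    token = rule.fetch("token")
    raise ArgumentError, "unknown token name: #{token}" unless ALLOWED_TOKENS.include?(token)

    target = rule["push"]
    raise ArgumentError, "push to undefined state: #{target}" if target && !spec.key?(target)

    # Regexp.new raises RegexpError for malformed patterns.
    { regex: Regexp.new(rule.fetch("match")), token: token, push: target, pop: rule["pop"] == true }
  end

  # Tries rules in order; StringScanner#scan only matches at the current position.
  def next_match(scanner, rules)
    rules.each do |rule|
      text = scanner.scan(rule[:regex])
      # Ignore empty matches so the loop always makes progress.
      return [rule, text] if text && !text.empty?
    end
    nil
  end
end

# Example spec: a toy language with keywords, comments and escaped strings.
spec = {
  "root" => [
    { "match" => '\s+', "token" => "Text" },
    { "match" => '#.*', "token" => "Comment.Single" },
    { "match" => '\b(?:if|else|end)\b', "token" => "Keyword" },
    { "match" => '"', "token" => "Literal.String", "push" => "string" },
    { "match" => '\d+', "token" => "Literal.Number" },
    { "match" => '[A-Za-z_]\w*', "token" => "Name" }
  ],
  "string" => [
    { "match" => '[^"\\\\]+', "token" => "Literal.String" },
    { "match" => '\\\\.', "token" => "Literal.String" },
    { "match" => '"', "token" => "Literal.String", "pop" => true }
  ]
}

tokens = DataLexer.new(spec).lex('if n "a\\"b" # c')
```

One point the sketch leaves open: untrusted patterns can still backtrack catastrophically, so a real implementation would need to bound matching time (Ruby 3.2 and later support regexp timeouts) or restrict which pattern features a spec may use.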

The main concerns are:
* tmLanguage is not nearly as flexible as Rouge's stack-based approach (inherited from Pygments), which allows arbitrary lexer state, among other things. Very complex lexers probably could not be expressed accurately in this format.
* A custom format could place a "15th competing standard" burden on language developers and communities, who already have to implement Ruby lexers. It could also encourage people to settle for less accurate or slower lexing rather than contribute a Ruby lexer.
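For the tmLanguage option, mapping tokens back to the Rouge/Pygments token set means translating TextMate scope names. One plausible approach, sketched here under assumptions, is to check the innermost scope first and match on the longest dot-separated prefix. The `SCOPE_MAP` table and the helper names are hypothetical, and token names are written as dotted Pygments-style strings. Anything unmapped falls back to `Text`, which is one concrete way the accuracy loss described in the first concern would show up.

```ruby
# Hypothetical table from TextMate scope prefixes to Pygments-style token names.
SCOPE_MAP = {
  "comment.line" => "Comment.Single",
  "comment.block" => "Comment.Multiline",
  "comment" => "Comment",
  "keyword.operator" => "Operator",
  "keyword" => "Keyword",
  "string" => "Literal.String",
  "constant.numeric" => "Literal.Number",
  "entity.name.function" => "Name.Function",
  "variable" => "Name.Variable"
}.freeze

# Longest-prefix lookup on dot-separated segments, e.g.
# "comment.line.double-slash.js" tries 4, 3, then 2 segments.
def lookup_scope(scope)
  parts = scope.split(".")
  parts.length.downto(1) do |n|
    token = SCOPE_MAP[parts.first(n).join(".")]
    return token if token
  end
  nil
end

# A TextMate token carries a scope stack (outermost first); prefer the innermost match.
def token_for_scopes(scopes)
  scopes.reverse_each do |scope|
    token = lookup_scope(scope)
    return token if token
  end
  "Text"
end

token_for_scopes(["source.js", "comment.line.double-slash.js"]) # => "Comment.Single"
token_for_scopes(["source.js", "meta.function.js"])             # => "Text"
```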
