add a quick way of tokenizing by character #21

@philipp

Description

Because the "tokenize" parameter is tested for existence (truthiness) rather than for being defined, it's challenging to tokenize on "nothing", which would split everything into individual characters.

Notably, there is also a difference in behavior between the Python and Perl implementations: distribution.py will successfully split on "0", while with "-t=0" the Perl version acts as though I hadn't passed a tokenize parameter at all.

The Perl-with-zero behavior should be easy to fix, but I'd suggest adding another special "tokenize" value (along with the existing "white" and "word") of "char" or something similar.

I'm not very experienced with Python, and while in Perl you can simply add a line like
elsif ($tokenize eq 'char') { $tokenize = ''; }
as far as I can tell, Python will not behave the same way when splitting on an empty regex. It's also beyond me how to properly test whether the parameter was defined on the command line at all, i.e. distinguishing "None" from a provided (but falsy) value like "0".

Anyway, there's always a workaround for now: split the input into characters before it even reaches the script, e.g.
cat theFile | perl -ne 'print join "\n", split //' | distribution
But it feels like something that should be available more easily.
