Commit 7ad0557

Add man pages for strip-markup and sanitize-string
https://claude.ai/code/session_01Wjn2KfitiA5iTADcLLfdbR
1 parent 16d8cac commit 7ad0557

2 files changed

Lines changed: 158 additions & 0 deletions

man/sanitize-string.1.ronn

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
1+
sanitize-string(1) -- Strip markup and control characters from a string
2+
========================================================================
3+
4+
<!--
5+
# Copyright (C) 2025 ENCRYPTED SUPPORT LLC <adrelanos@whonix.org>
6+
# See the file COPYING for copying conditions.
7+
-->
8+
9+
## SYNOPSIS
10+
11+
`sanitize-string [--help] max_length [string]`
12+
13+
## DESCRIPTION
14+
15+
`sanitize-string` combines the functionality of `strip-markup`(1) and
16+
`stdisplay`(1) to fully sanitize an untrusted string by removing both
17+
HTML markup tags and dangerous terminal control characters (such as ANSI
18+
escape sequences). The result can be safely displayed in a terminal or
19+
used in non-HTML text contexts.
20+
21+
If a string is provided as the second positional argument, it is used
22+
as the input. Otherwise, the string is read from standard input.
23+
24+
The `max_length` argument specifies the maximum number of characters to
25+
output. Set it to `nolimit` to allow arbitrarily long strings. When a
26+
limit is set, the output is truncated to that many characters.
27+
28+
### Sanitization order
29+
30+
Sanitization is performed in three steps:
31+
32+
1. Strip ANSI escape sequences and control characters (via `stdisplay`).
33+
2. Strip HTML markup tags (via `strip-markup`).
34+
3. Strip ANSI escape sequences and control characters again, in case
35+
the markup stripping step decoded HTML entities into escape
36+
characters.
37+
38+
This ordering ensures that neither markup nor escape sequences can be
39+
used to smuggle the other past the sanitizer.
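
The three-step order above can be illustrated with a minimal Python sketch. It does not reproduce the real tools: `strip_control` and `strip_tags` are hypothetical stand-ins for `stdisplay` and `strip-markup`, the regular expressions are deliberately naive, and only decimal character references are decoded. It only shows why the third pass matters when an entity decodes into an escape character.

```python
import re

# Assumption: a simplified pattern for ANSI CSI sequences such as ESC [ 31 m.
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
# Assumption: remaining control characters, keeping tab and newline.
CONTROL = re.compile(r"[\x00-\x08\x0b-\x1f\x7f]")


def strip_control(text):
    """Hypothetical stand-in for stdisplay."""
    return CONTROL.sub("", ANSI_CSI.sub("", text))


def strip_tags(text):
    """Hypothetical stand-in for strip-markup: drop tags, decode &#NN; refs."""
    without_tags = re.sub(r"<[^>]*>", "", text)
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), without_tags)


def sanitize(text):
    text = strip_control(text)  # step 1
    text = strip_tags(text)     # step 2, may decode entities into ESC
    return strip_control(text)  # step 3, removes what step 2 revealed


untrusted = "<b>&#27;[31mred</b>"
two_steps = strip_tags(strip_control(untrusted))
print(repr(two_steps))            # '\x1b[31mred' (escape sequence survives)
print(repr(sanitize(untrusted)))  # 'red'
```

With only the first two steps, the decoded escape sequence would reach the terminal; the final pass removes it.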
40+
41+
## RETURN VALUES
42+
43+
* `0` Successfully sanitized and printed the result.
44+
* `1` Usage error (missing or invalid arguments).
45+
46+
## EXAMPLES
47+
48+
Sanitize a string with no length limit:
49+
50+
<code>
51+
sanitize-string nolimit '&lt;b&gt;Hello&lt;/b&gt;'
52+
</code>
53+
54+
Output: `Hello`
55+
56+
Sanitize and truncate to 10 characters:
57+
58+
<code>
59+
sanitize-string 10 'This is a long untrusted string.'
60+
</code>
61+
62+
Output: `This is a `
63+
64+
Sanitize from standard input:
65+
66+
<code>
67+
echo '&lt;script&gt;alert(1)&lt;/script&gt;' | sanitize-string nolimit
68+
</code>
69+
70+
Use `--` to separate options from positional arguments:
71+
72+
<code>
73+
sanitize-string -- nolimit '--help'
74+
</code>
75+
76+
## SEE ALSO
77+
78+
strip-markup(1), stdisplay(1)
79+
80+
## AUTHOR
81+
82+
This man page was written by Patrick Schleizer (adrelanos@whonix.org).

man/strip-markup.1.ronn

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
1+
strip-markup(1) -- Strip HTML markup tags from a string
2+
========================================================
3+
4+
<!--
5+
# Copyright (C) 2025 ENCRYPTED SUPPORT LLC <adrelanos@whonix.org>
6+
# See the file COPYING for copying conditions.
7+
-->
8+
9+
## SYNOPSIS
10+
11+
`strip-markup [--help] [string]`
12+
13+
## DESCRIPTION
14+
15+
`strip-markup` strips HTML markup tags from an untrusted string,
16+
returning only the text content. It is intended to ensure that a string
17+
will not be interpreted as HTML markup in isolation.
18+
19+
If a string is provided as an argument, it is used as the input.
20+
Otherwise, the string is read from standard input.
21+
22+
HTML character references (such as `&amp;`, `&lt;`, `&#60;`) are
23+
decoded to their corresponding characters.
24+
25+
### Double-strip protection
26+
27+
`strip-markup` performs two consecutive strip passes over the input. If
28+
the second pass further transforms the text, this indicates that the
29+
first pass revealed new markup that was hidden inside nested tags (for
30+
example, `<<b>b>Bold<</b>/b>`). In this case, the tool treats the
31+
input as malicious and replaces all `<`, `>`, and `&` characters in the
32+
first-pass result with underscores (`_`), so that the neutered text is
33+
visible to the user as a warning.
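
The double-strip check can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's actual implementation: `strip_once` is a hypothetical single-pass stripper built on a naive regular expression, and entity decoding is left out to keep the example short.

```python
import re

# Assumption: a tag is "<", then any run of characters other than "<" or ">",
# then ">". Real HTML parsing is more involved.
TAG = re.compile(r"<[^<>]*>")


def strip_once(text):
    """Hypothetical single strip pass."""
    return TAG.sub("", text)


def strip_markup(text):
    first = strip_once(text)
    second = strip_once(first)
    if second != first:
        # The first pass revealed new markup: neuter the first-pass result
        # instead of returning either pass unchanged.
        return re.sub(r"[<>&]", "_", first)
    return first


print(strip_markup("<p>Hello <b>world</b>.</p>"))  # Hello world.
print(strip_markup("<<b>b>Bold<</b>/b>"))          # _b_Bold_/b_
```

In the second call, the first pass yields `<b>Bold</b>`, which a second pass would reduce further, so the sketch replaces the special characters with underscores.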
34+
35+
### Scope
36+
37+
`strip-markup` ensures that its output does not contain HTML tags. It
38+
does **not** escape the output for safe embedding in HTML attributes or
39+
other HTML contexts. If the output will be inserted into HTML, the
40+
caller is responsible for applying appropriate context-specific
41+
escaping.
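
As an illustration of the caller's responsibility, the following Python sketch escapes tag-free text before placing it in an HTML attribute. The helper `render_title_attribute` and the sample string are hypothetical; `html.escape` is the Python standard library function. Other contexts (URLs, JavaScript, CSS) need their own escaping rules, which this sketch does not cover.

```python
import html


def render_title_attribute(stripped_text):
    """Hypothetical helper: embed already-stripped text in an HTML attribute."""
    # quote=True also escapes double and single quotes, which matters
    # inside attribute values.
    escaped = html.escape(stripped_text, quote=True)
    return '<span title="{}">item</span>'.format(escaped)


# Contains no HTML tags, yet would break out of the attribute if unescaped.
tag_free = 'x" onmouseover="alert(1)'
print(render_title_attribute(tag_free))
# <span title="x&quot; onmouseover=&quot;alert(1)">item</span>
```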
42+
43+
## RETURN VALUES
44+
45+
* `0` Successfully stripped markup and printed the result.
46+
* `1` Usage error.
47+
48+
## EXAMPLES
49+
50+
Strip tags from a string argument:
51+
52+
<code>
53+
strip-markup '&lt;p&gt;Hello &lt;b&gt;world&lt;/b&gt;.&lt;/p&gt;'
54+
</code>
55+
56+
Output: `Hello world.`
57+
58+
Strip tags from standard input:
59+
60+
<code>
61+
echo '&lt;p&gt;Hello&lt;/p&gt;' | strip-markup
62+
</code>
63+
64+
Use `--` to pass strings that start with `-`:
65+
66+
<code>
67+
strip-markup -- '--help'
68+
</code>
69+
70+
## SEE ALSO
71+
72+
sanitize-string(1), stdisplay(1)
73+
74+
## AUTHOR
75+
76+
This man page was written by Patrick Schleizer (adrelanos@whonix.org).
