Conversation
LLM responses contain standard markdown that Telegram renders as raw text. Add Bodhi.Telegram.Formatter that parses markdown via MDEx AST and renders a Telegram-compatible HTML subset (<b>, <i>, <code>, <pre>, <a>, <blockquote>, <s>). Messages are sent with parse_mode: "HTML" and split at block boundaries when exceeding Telegram's 4096 char limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
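The block-boundary splitting described above can be sketched with stdlib-only Elixir. The module and function names here are illustrative, not the actual Bodhi.Telegram.Formatter API:

```elixir
defmodule ChunkSketch do
  @max_length 4096

  # Pack rendered blocks (separated by blank lines) into chunks that
  # each stay within Telegram's message limit. byte_size/1 is used as
  # a conservative, guard-safe proxy for character count.
  def split(html) do
    html
    |> String.split("\n\n")
    |> Enum.reduce([], fn block, acc ->
      case acc do
        [current | rest]
        when byte_size(current) + byte_size(block) + 2 <= @max_length ->
          # Block still fits: append it to the current chunk.
          [current <> "\n\n" <> block | rest]

        _ ->
          # Start a new chunk (also covers the very first block).
          [block | acc]
      end
    end)
    |> Enum.reverse()
  end
end
```

Chunks come out in document order because the accumulator is prepended to and then reversed exactly once at the end.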
|
test |
PR Review: feat/telegram-markdown

Overall a well-structured addition. Clean module design, correct HTML escaping, and solid test coverage. A few issues worth addressing before merge.

Bug: LLM response record created before send succeeds
File: lib/bodhi/tg_webhook_handler.ex
Previously maybe_create_llm_response ran inside the with block, only on a successful send. Now it runs unconditionally before Enum.reduce_while. If all Telegram API calls fail, an orphaned LLM response record is created with no corresponding message. This is a regression from the previous behaviour. Fix: move maybe_create_llm_response(metadata) inside the {:ok, message} branch — only the first chunk needs it.

Hard split does not handle single lines over 4096 chars
File: lib/bodhi/telegram/formatter.ex
hard_split/1 splits at newlines, then packs lines into chunks. But if a single line exceeds 4096 chars (e.g. a long base64 blob or a code block with no internal newlines), chunk_lines includes it as-is with no length check, violating Telegram's limit. A fallback that byte-splits lines exceeding @max_length is needed.

Unnecessary default arg on @impl functions
In telegex_adapter.ex and telegram.ex the behaviour callback is send_message/3, but the default opts param silently creates a send_message/2 outside the behaviour contract. Since every call site passes 3 args, remove the default to keep implementations strictly matching the callback arity.

periodic_messages_test assertion may be fragile
The test asserts text == prompt_text, but text is now the HTML-formatted version. It passes today because the test prompt contains no markdown. Any prompt with bold, inline code, or other markdown will cause a mismatch. Consider asserting against elem(Formatter.format(prompt_text), 0) instead.

Minor: escape_attr/1 is a transparent alias for escape/1
Fine as a semantic placeholder for potential future URL-specific encoding. A short comment explaining the intent would help future maintainers.

Positives
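A grapheme-based version of the suggested long-line fallback could look like this. Sketch only: split_long_line is the name suggested in the review, not existing code.

```elixir
defmodule HardSplitSketch do
  @max_length 4096

  # Fallback for a single line longer than the Telegram limit:
  # cut it into chunks of at most @max_length graphemes, so
  # multi-byte characters are never split mid-sequence.
  def split_long_line(line) do
    line
    |> String.graphemes()
    |> Enum.chunk_every(@max_length)
    |> Enum.map(&Enum.join/1)
  end
end
```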
|
…lines

- Move maybe_create_llm_response inside send success branch to avoid orphaned records when all Telegram API calls fail
- Add byte-level splitting for single lines exceeding 4096 chars
- Remove default opts on @impl send_message to match behaviour arity
- Fix periodic_messages_test to assert against formatted HTML
- Add semantic comment to escape_attr/1
- Update conn_case stub to 3-arg send_message

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
This is a web application written using the Phoenix web framework.

Project Structure

Key Features

Daily Dialog Summarization System

The application includes an automatic dialog summarization system to optimize AI context and reduce API costs:

Database Schema:
Core Modules:
Context Assembly:
Configuration:

# config/config.exs
config :bodhi, :summarization, recent_days: 7
config :bodhi, Oban,
plugins: [
{Oban.Plugins.Cron,
crontab: [{"0 2 * * *", Bodhi.Workers.DailyChatSummarizer}]}
  ]

Documentation:
Key Guidelines:
Project guidelines
Phoenix v1.8 guidelines
JS and CSS guidelines
UI/UX & design guidelines
phoenix:ecto usage: Ecto guidelines
phoenix:elixir usage: Elixir guidelines
Mix guidelines
Test guidelines
phoenix:html usage: Phoenix HTML guidelines
phoenix:liveview usage: Phoenix LiveView guidelines
LiveView streams
LiveView JavaScript interop
LiveView hooks come in two flavors: 1) colocated JS hooks for "inline" scripts defined inside HEEx.

Inline colocated js hooks

Never write raw embedded
External phx-hook

External JS hooks (

Pushing events between client and server

Use LiveView's Pushed events can then be picked up in a JS hook with Clients can also push an event to the server and receive a reply with Where the server handled it via:

LiveView tests
Form handling

Creating a form from params

If you want to create a form based on When you pass a map to You can also specify a name to nest the params:

Creating a form from changesets

When using changesets, the underlying data, form params, and errors are retrieved from it. The And then you create a changeset that you pass to Once the form is submitted, the params will be available under In the template, the form form assign can be passed to the Always give the form an explicit, unique DOM ID, like

Avoiding form errors

Always use a form assigned via And never do this:
phoenix:phoenix usage: Phoenix guidelines
igniter usage

A code generation and project patching framework.

Rules for working with Igniter

Understanding Igniter

Igniter is a code generation and project patching framework that enables semantic manipulation of Elixir codebases. It provides tools for creating intelligent generators that can both create new files and modify existing ones safely. Igniter works with AST (Abstract Syntax Trees) through Sourceror.Zipper to make precise, context-aware changes to your code.

Available Modules

Project-Level Modules (
|
|
defmodule Bodhi.MixProject do
  def project do

Configuration for the OTP application.

Type
|
PR Review: feat/telegram-markdown

Overall this is a solid, well-structured feature with good test coverage. The three-level splitting strategy and clean module separation are commendable. A few issues worth addressing before merge:

BUG: Chunk ordering when first block requires hard-splitting

File: In

    defp chunk_blocks([block | rest], []) do
      ...
      hard_split(block) ++ chunk_blocks(rest, [])
      ...
    end

The problem: Example: first block hard-splits into

The other call site in Fix: Reverse

    defp chunk_blocks([block | rest], []) do
      if String.length(block) <= @max_length do
        chunk_blocks(rest, [block])
      else
        Enum.reverse(hard_split(block)) ++ chunk_blocks(rest, [])
      end
    end

This scenario occurs when an LLM response begins with a large code block (> 4096 chars) followed by more content, which is realistic in practice. There is no test covering "first block needs hard-split AND there are subsequent blocks"; adding one would catch regressions.

MINOR: CodeBlock info string should use only the first word as language

File: CommonMark allows arbitrary text in the fenced code block info string (e.g., ``` Fix - extract only the language token:

    defp render_node(%MDEx.CodeBlock{info: info, literal: text}) do
      lang = info |> String.split(" ", parts: 2) |> List.first()

      if lang != "" do
        "<pre><code class=\"language-#{escape(lang)}\">" <>
          escape(text) <> "</code></pre>"
      else
        "<pre><code>" <> escape(text) <> "</code></pre>"
      end
    end

MINOR: ^text pin in webhook handler test may be fragile

File: The expectation

    {expected_html, _} = Bodhi.Telegram.Formatter.format(text)

    expect(Bodhi.TelegramMock, :send_message, fn ^chat_id, ^expected_html, _opts ->

What's done well
|
…tring

- Fix chunk_blocks empty-acc clause to pass hard_split result as initial accumulator instead of concatenating reversed lists
- Extract only first word from code block info string as language (handles "python title=..." style info strings)
- Pin formatted HTML in webhook handler test instead of raw text
- Add tests for both fixes

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
PR Review:
|
| Area | Status |
|---|---|
| Core formatting logic | ✅ Good |
| Splitting logic | hard_split |
| Test coverage | ✅ Good (28 tests), with a few gaps noted |
| Code style | ✅ Follows project conventions |
| Security (HTML escaping) | ✅ Correct |
| Behaviour/adapter contract | ✅ Consistent |
The chunk-ordering issue in hard_split is the only potentially impactful bug—worth verifying with a test that asserts the actual text content order of hard-split chunks, not just their lengths.
- Replace elem(0) with then/1 pattern in webhook handler for clarity
- Fix "soft break becomes space" test description (it's a newline)
- Add tests for unknown AST node fallback and raw HTML escaping
- Strengthen hard-split ordering test to verify first chunk content
- Add comment explaining intentional HTML escaping on HtmlInline/HtmlBlock

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Code Review

Overall this is a solid implementation. The MDEx AST-based approach is much better than regex-based markdown parsing. Good test coverage with 28 unit tests. Below are my findings, roughly ordered by severity.

BUGS AND CORRECTNESS ISSUES
format(empty string) returns an empty HTML string, then split returns a list with one empty element, so TgWebhookHandler.send_message will call Telegram.send_message with an empty body - Telegram rejects this with a 400 error. This is reachable whenever an LLM returns an empty response. Suggested fix: guard in TgWebhookHandler.send_message/3 - if html is empty, return :ok early. Or have split/1 return an empty list for empty input.
chunk_lines/2 prepends to acc, so hard_split/1 returns chunks in reversed order. The final Enum.reverse in split/1 restores order, but this relies on a subtle double-reversal that is easy to break.

API AND INTERFACE CONCERNS
Any caller expecting {:ok, %Telegex.Type.Message{}} will silently break.
Sending with empty opts (no parse_mode) is correct for a plain-text URL, but inconsistent with other send sites. A brief comment would clarify this.

CODE QUALITY
URL percent-encoding and HTML attribute escaping differ significantly. Until divergence happens, inline escape/1 at the call sites.
Running Dialyzer in :test env analyses test code and deps unnecessarily. Use :dev instead.
Add a test for ordered lists starting at 2 to guard against off-by-one errors.
Telegram uses UTF-16 code units for its 4096-char limit. Worth a note in the module docs.

TESTING
The {:error, _} branch in format/1 is never hit by existing tests. Either inject an MDEx error or add a comment acknowledging this.
No test verifying: (a) all chunks are dispatched, (b) only one LlmResponse record is created, (c) failures on later chunks halt correctly.

MINOR NITS
SUMMARY: Bugs=2, API/interface=2, Code quality=4, Tests=2, Nits=3. Items 1 (empty-string edge case) and 10 (multi-chunk test coverage) are the most important to address before merging. |
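On the UTF-16 point above: Telegram's limit counts UTF-16 code units, so astral code points (emoji, many supplementary characters) count double. A stdlib-only way to compute that length (illustrative helper, not part of the codebase):

```elixir
defmodule Utf16Length do
  # Telegram's length limits are measured in UTF-16 code units:
  # code points above U+FFFF encode as a surrogate pair (2 units),
  # everything else as a single unit.
  def utf16_units(text) do
    text
    |> String.to_charlist()
    |> Enum.reduce(0, fn codepoint, acc ->
      if codepoint > 0xFFFF, do: acc + 2, else: acc + 1
    end)
  end
end
```

So a message full of emoji hits the 4096-unit limit at roughly half the grapheme count a naive String.length check would suggest.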
|
test |
|
test2 |
Bodhi

To start your Phoenix server:
Now you can visit

Ready to run in production? Please check our deployment guides.

AI Provider Configuration

Bodhi supports multiple AI providers that can be switched via configuration.

Available Providers

OpenRouter (Default)
Google Gemini
Switching Providers

To switch AI providers, update

    # Use OpenRouter (default)
    config :bodhi, :ai_client, Bodhi.OpenRouter

    # Use Google Gemini
    config :bodhi, :ai_client, Bodhi.Gemini

Setting Up API Keys
Changing OpenRouter Model

Edit

    @default_model "deepseek/deepseek-r1-0528:free"  # Current default

    # Other popular models:
    # @default_model "anthropic/claude-3.5-sonnet"
    # @default_model "openai/gpt-4-turbo"
    # @default_model "meta-llama/llama-3.1-70b-instruct"
    # @default_model "google/gemini-pro-1.5"

See all available models at: https://openrouter.ai/models

Features

Daily Dialog Summarization

Bodhi automatically summarizes chat conversations daily to optimize AI context and reduce API costs:
Example Results:
Documentation
Quick Start

After deployment, backfill historical summaries:

    # Preview what will be processed (no API calls)
    bin/bodhi eval "Bodhi.Release.backfill_summaries(dry_run: true)"

    # Run the backfill
    bin/bodhi eval "Bodhi.Release.backfill_summaries()"

See docs/DEPLOYMENT.md for detailed instructions.

Learn more
|
|
Full review text here |
Feature Implementation Flow
|
|
placeholder |
The /start reply assertions compared raw prompt text against HTML-formatted output. This would break if prompts contained HTML-special characters. Compare against Formatter.format/1 output instead.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Supplementing the previous review with corrections and additional findings. |
|
Supplemental review: corrections and new findings

Corrections to previous review

Hard split / single lines over 4096 chars: already handled. chunk_lines/2 has an explicit guard calling split_long_line/1 for lines exceeding @max_length, and a passing test covers this case. No action needed.

LLM response record created before send succeeds: also incorrect. llm_id || maybe_create_llm_response(metadata) is inside the {:ok, message} branch, conditional on a successful send. No action needed.

New findings

Bug: empty text still calls Telegram API
An empty input produces a single empty-string chunk that is forwarded to Telegram.send_message. The Telegram Bot API rejects empty text with a 400 error. The test stub accepts anything, so the assertion passes despite the missing guard. Suggested fix: filter blank chunks before the reduce. Use Enum.reject to drop empty-string chunks; if the result is an empty list, return {:ok, nil}, otherwise proceed with Enum.reduce_while.

Security: javascript: links pass through unmodified
escape_attr/1 only HTML-escapes the standard four characters. A javascript: href from LLM output passes through unchanged. A scheme allowlist (allow only https?://) at the formatting layer is safer.

Minor: .claude/rules/feature-workflow.md should not be in this PR
The pr-workflow.md rule states never to modify files in the .claude directory. The two-line addition also contains typos (everyting, chacks) and should be reverted or submitted separately.

Minor: credo: :test in preferred_cli_env
Credo is conventionally run in :dev. Running it in :test pulls test-support modules into analysis scope.

Minor: escape_attr/1 comment is slightly misleading
"Future URL-specific encoding" implies percent-encoding, which is a different operation. Clearer wording: "Kept separate so URL validation (e.g. scheme allowlist) can be added here without touching the general escape path."
Overall: the core implementation (MDEx AST traversal, HTML escaping, block-boundary splitting, and test coverage) is solid. The empty-chunk Telegram API call and the javascript: scheme gap are the only issues with real runtime consequences. |
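The suggested empty-chunk guard could be shaped like this. Sketch with an injected send function; the module and function names are illustrative, not the handler's actual API:

```elixir
defmodule SendChunksSketch do
  # Drop blank chunks before dispatching: the Telegram Bot API
  # rejects an empty message body with a 400 error.
  def send_chunks(chunks, send_fun) do
    case Enum.reject(chunks, &(&1 == "")) do
      [] ->
        # Nothing left to send at all.
        {:ok, nil}

      non_empty ->
        # Stop at the first failed send, keep the last success otherwise.
        Enum.reduce_while(non_empty, {:ok, nil}, fn chunk, _acc ->
          case send_fun.(chunk) do
            {:ok, message} -> {:cont, {:ok, message}}
            {:error, reason} -> {:halt, {:error, reason}}
          end
        end)
    end
  end
end
```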
… dialyzer env
- Extract send_chunks/4 from send_message/3 for clarity
- Guard against empty formatted text (return {:ok, nil} early)
- Add login URL comment explaining why parse_mode is omitted
- Remove escape_attr/1 alias, inline escape/1 at call sites
- Change dialyzer preferred_cli_env from :test to :dev
- Add ordered list offset test (starting at 3)
- Mark tg-spoiler as not yet implemented in moduledoc
- Add UTF-16 code unit note to moduledoc
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
TITLE: PR Review: feat/telegram-markdown

Good overall implementation with solid test coverage. Previous review feedback (LLM response creation, hard-split for long lines, periodic messages test) has been addressed. A few issues remain.

BUG: split/1 breaks HTML inside pre code blocks
String.split on double-newline is applied to the fully-rendered HTML string. A code block containing a blank line will contain a double-newline in the rendered output, causing the split to cut inside the tag. Example: a Python code block with a blank line between two functions renders to HTML with a blank line inside the pre/code tags. After splitting on that blank line, Telegram receives two messages with mismatched HTML: an unclosed pre tag in the first chunk, and orphaned closing tags in the second. Telegram rejects or mangles both. Fix: split at the AST level before rendering (preferred), or track whether the scan cursor is inside a pre block when choosing split points.

SECURITY: javascript: links pass through unmodified
render_node/1 for MDEx.Link and MDEx.Image calls escape(url) on the href value. escape/1 only handles ampersand, less-than, greater-than, and double-quote. A javascript:alert(1) URL from LLM output passes through unchanged. Add a scheme allowlist (http, https, tg) and replace unsafe URLs with a safe fallback.

ISSUE: Empty chunks may reach the Telegram API
If format/1 produces HTML with consecutive double-newline separators (e.g. a thematic break renders as an empty string surrounded by blank lines), splitting on double-newline yields empty-string elements inside the chunk list. These flow into send_chunks and reach Telegram.send_message with an empty string body, which Telegram rejects. The empty-list guard in send_message/3 only catches the case where the entire formatted message is empty. Fix: filter empty chunks before iterating in send_chunks.

ISSUE: Partial send leaves an orphaned LLM response record
If chunk 1 succeeds (LLM response record created and message saved) but chunk 2 fails, the function returns an error while leaving an LLM response record pointing to only one of the expected messages. The trade-off is acceptable, but a comment in send_chunks/4 documenting this behaviour would help future maintainers.

MINOR: .claude/rules/feature-workflow.md should not be in this PR
pr-workflow.md explicitly states never to fix files in the .claude directory. The two-line addition also contains typos (everyting, chacks). Revert or submit separately.

MINOR: credo: :test in preferred_cli_env is unconventional
Credo is conventionally run in :dev. Running it in :test pulls test-support modules into static analysis scope. Change to :dev or omit the entry.

POSITIVES:
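A scheme allowlist along the lines suggested in the security finding, using only URI.parse from the standard library. The sanitize/1 name and the "#" fallback are illustrative assumptions:

```elixir
defmodule SafeUrl do
  @allowed_schemes ~w(http https tg)

  # Allow only http/https/tg URLs; anything else (javascript:,
  # data:, etc.) is replaced with an inert fallback.
  def sanitize(url) do
    case URI.parse(url) do
      %URI{scheme: nil} ->
        # Relative URL: no scheme to check, pass through.
        url

      %URI{scheme: scheme} ->
        if String.downcase(scheme) in @allowed_schemes, do: url, else: "#"
    end
  end
end
```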
|
- Add format_chunks/1 that splits at AST block boundaries to avoid breaking HTML inside <pre><code> blocks - Add URL scheme allowlist (http, https, tg) to prevent javascript: links - Filter empty chunks from thematic breaks before sending to Telegram - Add comment documenting partial-send orphaned LLM response trade-off - Revert .claude/rules/feature-workflow.md changes (per PR workflow rules) - Fix preferred_cli_env: credo from :test to :dev Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review

Overall this is a well-structured PR with good test coverage and clear documentation. The HTML-escaping approach to prevent XSS, URL scheme validation, and AST-based splitting at block boundaries are all solid design decisions. A few issues worth addressing before merge.

Bug: Hard-split breaks HTML tags in long code blocks
If a single rendered

Test inconsistency:
The webhook test computes expected values via

Minor: empty input returns different shapes from

Minor: typespec style

Minor:

Nit: no default for

Adding

Positive callouts
Generated with Claude Code |
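For reference, the four-character HTML escape that Telegram's HTML parse mode requires can be written as follows. This is a sketch; the real module's escape/1 may differ in detail:

```elixir
defmodule EscapeSketch do
  # Escape the characters Telegram's HTML parse mode treats as
  # markup. & must be replaced first, so the entities produced by
  # the later replacements are not double-escaped.
  def escape(text) do
    text
    |> String.replace("&", "&amp;")
    |> String.replace("<", "&lt;")
    |> String.replace(">", "&gt;")
    |> String.replace("\"", "&quot;")
  end
end
```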
Summary
- Bodhi.Telegram.Formatter module that parses LLM markdown via MDEx AST and renders Telegram-compatible HTML (<b>, <i>, <code>, <pre>, <a>, <blockquote>, <s>)
- Messages are sent with parse_mode: "HTML" and automatically split at block boundaries when exceeding Telegram's 4096 char limit
- send_message behaviour/adapter updated to accept an optional opts keyword list for passing parse_mode and other Telegram API options

Test plan
- Bodhi.Telegram.Formatter (format + split)
- send_message
- mix compile: no new warnings
- mix dialyzer: 0 errors
- mix format: clean

🤖 Generated with Claude Code