Releases: thiswillbeyourgithub/wdoc

Release 5.1.0

15 May 14:05

What's new

This release focuses on modular installation extras, CLI robustness improvements, and a sweep of bug fixes across summarization, logging, and setup.

✨ Features

  • CLI: Accept kebab-case flags automatically (--foo-bar is treated as --foo_bar) ([e9bfb80])
  • CLI: Warn on every sys.argv mutation via ArgvState ([4fe38f2]); accept --yt_* as shorthand for --youtube_doc_* ([f800805])
  • YouTube: Auto-detect original-language subtitle track (-orig), falling back to en/en-US ([0753b00])
  • Prompts: Skip per-bullet citations when only one source; mention it once at the top instead ([c0097b7])
    • Exception: for YouTube/timecoded sources, use per-bullet timecodes (e.g. [02:17:33]) ([c0779a1])
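
The kebab-case handling above can be pictured with a small sketch. This is not wdoc's ArgvState code; the function name normalize_kebab_flags is hypothetical, and the sketch assumes only the flag name (the part before any "=") should be rewritten, never the values.

```python
def normalize_kebab_flags(argv: list[str]) -> list[str]:
    """Rewrite --foo-bar (or --foo-bar=value) as --foo_bar, leaving values alone."""
    out = []
    for token in argv:
        # A bare "--" separator and non-flag tokens are passed through untouched.
        if token.startswith("--") and len(token) > 2:
            # Only the flag name, before any '=', has its dashes replaced.
            name, sep, value = token[2:].partition("=")
            token = "--" + name.replace("-", "_") + sep + value
        out.append(token)
    return out


print(normalize_kebab_flags(["--top-k=3", "--file-type", "my-file.pdf"]))
```

Values such as my-file.pdf keep their dashes because they never start with "--".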

🐛 Fixes

  • Summarize: Strip *DEEP BREATH*-style LLM intro artifacts from all top-level bullets, not just the first ([dd09942], [c837143])
  • Summarize: Fix model name in output summary ([e253d45])
  • Logger: Actually remove the default DEBUG stderr handler instead of stacking a second sink on top of it ([2f3b295])
  • Env: Match --debug/--verbose by exact argv token, not substring, preventing false positives from argument values ([e183729])
  • Loaders: Better check for empty documents ([5ebbbd2], [586bc5c])
  • YouTube: Add troubleshooting instructions on failed extraction ([397d133]); fix default language handling ([29207e8])
  • Audio: Fix WDOC_WHISPER_API_KEY handling when OPENAI_API_KEY is unset ([23b6b1f])
  • Setup: Guard openparse-download behind an import openparse probe ([f4445b4]); scope yt-dlp pre-release upgrade to [youtube] users only ([9242566])
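
To illustrate the --debug/--verbose fix above, here is a minimal sketch contrasting substring matching with exact-token matching. Both function names are hypothetical and the example command line is made up; wdoc's actual check may differ.

```python
def has_flag_substring(argv: list[str], flag: str) -> bool:
    """Buggy variant: looks for the flag anywhere in the joined command line."""
    return flag in " ".join(argv)


def has_flag_exact(argv: list[str], flag: str) -> bool:
    """Fixed variant: the flag must be an argv token of its own."""
    return flag in argv  # list membership compares whole tokens


# The query text mentions --debug, but the user did not pass the flag.
argv = ["wdoc", "--query", "why does --debug slow things down?"]
print(has_flag_substring(argv, "--debug"))  # True (false positive)
print(has_flag_exact(argv, "--debug"))      # False
```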

♻️ Refactors

  • Setup: Split install_requires into modular extras [youtube], [audio], [anki], [office], [logseq], [full] ([7bb4744])
    • Move audioop-lts into [audio] extra with python_version>='3.13' marker ([eb9eba4])
    • Move py_ankiconnect into [anki] extra with requests fallback ([fe2d9c0])
    • Drop python-magic git install from post-install hook ([bafb379])
  • CLI: Centralize all sys.argv mutations in ArgvState helper class ([1098157], [a22f56d])
  • Logger: Move handler setup out of import side-effects into setup_cli_logging(), called only from __main__.py ([203ab6f])
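
As a rough picture of the extras split, the sketch below builds an extras_require mapping in the style setup.py would pass to setuptools. Only the packages named in these notes are filled in; the [office] and [logseq] contents are not listed here and are left empty, and computing [full] as the union of the other extras is an assumption.

```python
# Sketch of a modular extras_require mapping (contents partly assumed).
extras_require = {
    "youtube": ["yt-dlp"],
    # The PEP 508 marker makes audioop-lts install only on Python 3.13+.
    "audio": ["pydub", "audioop-lts; python_version >= '3.13'"],
    "anki": ["py_ankiconnect"],
    "office": [],  # packages not named in the release notes
    "logseq": [],  # packages not named in the release notes
}

# Assumption: [full] simply pulls in every other extra.
extras_require["full"] = sorted(
    {dep for deps in extras_require.values() for dep in deps}
)

print(extras_require["full"])
```

Declaring the marker in the extra, rather than in a post-install hook, is what lets pip, uv and pipx see the dependency.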

🧪 Tests

  • Cover ArgvState helpers with unit tests ([6bc897a], [a22f56d])
  • Move API-key precheck from test_wdoc.py to run_all_tests.sh for faster fail ([dbf4410])
  • Skip test_parse_docx on HTTP 429 instead of failing ([a02d684])
  • Improve venv management in run_all_tests.sh ([b6a0dd8])

📚 Docs

  • Clarify uvx wdoc[full] usage throughout README and examples.md ([0f72eaf], [7653e9a])
  • Add/fix [anki] extra in Anki parse example ([0f72eaf])
  • Improve installation instructions recommending uvx ([d88c461])
  • Clarify how to use a cloned repository ([b8b4b5e])

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

  • [dd09942] by @thiswillbeyourgithub, 25 minutes ago:
    fix(summarize): clean LLM intro artifacts on all top-level bullets
    Extract the 'deep breath' / "i'll summarize" cleanup into
    _strip_llm_intro_artifacts and run it on every top-level line, not just
    the first one. Previously a source reference on line 1 would leave a
    later deep-breath bullet untouched.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

wdoc/utils/tasks/summarize.py

README.md

tests/run_all_tests.sh

  • [a02d684] by @thiswillbeyourgithub, 2 hours ago:
    test(parsing): skip test_parse_docx on HTTP 429 instead of failing
    The test downloads a sample DOCX from freetestdata.com, which sometimes
    returns 429 (rate limited). That is not a wdoc bug, so skip rather than
    fail in that case.

tests/test_parsing.py

  • [c0779a1] by @thiswillbeyourgithub, 2 hours ago:
    feat(prompts): use timecodes as per-bullet source for YouTube single-source
    Extends the single-source citation exception: when the unique source is a
    YouTube video (or other timecoded media), don't drop citations entirely.
    Mention the video source once at the top, then use each bullet's timecode
    (e.g. [02:17:33]) as its precise per-bullet pointer. Applied to both the
    summary prompt (Sam) and the combine prompt (Carl).

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

wdoc/utils/prompts.py

  • [c0097b7] by @thiswillbeyourgithub, 2 hours ago:
    feat(prompts): skip per-bullet citations when only one source
    Avoids wasting tokens by repeating the same page/WDOC_ID citation on every
    bullet point when all information shares a single unique source. In that
    case the citation is mentioned once at the top instead. Applies to both
    the summary prompt (Sam) and the combine prompt (Carl).

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

wdoc/utils/prompts.py

  • [c837143] by @thiswillbeyourgithub, 2 hours ago:
    fix(summarize): strip "DEEP BREATH -" style prefixes from first line
    Permissive on asterisks, "breath"/"breaths", and the separator character
    so variants like "- DEEP BREATH - ", "DEEP BREATHS: ", "DEEP BREATH, "
    are handled while preserving the bullet marker.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

wdoc/utils/tasks/summarize.py

wdoc/utils/tasks/summarize.py
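
The two summarize fixes above (permissive prefix stripping, applied to every top-level line) could look roughly like the sketch below. It is an independent illustration, not the code of _strip_llm_intro_artifacts; the regex and the name strip_deep_breath are assumptions. To avoid eating legitimate bullets, it requires closing asterisks, a separator, or end of line after the artifact.

```python
import re

_DEEP_BREATH = re.compile(
    r"^(?P<bullet>[-*•]\s+)?"                   # optional bullet marker (preserved)
    r"\*{0,2}\s*deep\s+breaths?\b"              # the artifact, with optional leading asterisks
    r"(?:\s*\*{1,2}\s*[-:,.]?|\s*[-:,.]|\s*$)"  # closing asterisks, a separator, or end of line
    r"\s*",
    re.IGNORECASE,
)


def strip_deep_breath(text: str) -> str:
    """Remove 'DEEP BREATH'-style prefixes from every top-level line."""
    cleaned = []
    for line in text.splitlines():
        # Indented lines are sub-bullets and are left untouched.
        if line and not line[0].isspace():
            line = _DEEP_BREATH.sub(lambda m: m.group("bullet") or "", line, count=1)
        cleaned.append(line)
    return "\n".join(cleaned)


summary = "- [source.pdf]\n- *DEEP BREATH* - First point\n  - deep breath detail\nDEEP BREATHS: Second"
print(strip_deep_breath(summary))
```

Because every top-level line is checked, a source reference on line 1 no longer shields a later artifact.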

  • [c67f564] by @thiswillbeyourgithub, 2 hours ago:
    docs(setup): note nltk punkt_tab download is likely redundant
    unstructured already lazily downloads punkt_tab on first tokenize call,
    so the eager post-install download is probably duplicate work. Keep it
    as a safety net (and to front-load the network hit at install time
    instead of on the first office-document parse), but document it.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

setup.py

  • [f4445b4] by @thiswillbeyourgithub, 2 hours ago:
    fix(setup): only run openparse-download when openparse is installed
    Guard the post-install weight download with an import openparse probe
    so a stripped-down install (no openparse[ml] in install_requires) does
    not call a missing console-script and emit a confusing error.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

setup.py

  • [9242566] by @thiswillbeyourgithub, 2 hours ago:
    fix(setup): scope yt-dlp pre-release upgrade to [youtube] users
    yt-dlp lives in the optional [youtube] extra, but the post-install hook
    was force-installing it for everyone (with --user, which is wrong
    inside a venv and quietly drops the install outside the env). Probe for
    yt_dlp first and only run the pip install -U --pre yt-dlp if it's
    already there. This keeps yt-dlp truly optional while still letting
    [youtube] users track YouTube extractor fixes that land in pre-releases.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

setup.py
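
Both setup.py fixes above follow the same pattern: probe for the optional package first, and only then run its post-install step. A minimal sketch follows, using importlib.util.find_spec as the probe (the commits describe an import probe; find_spec is swapped in here to avoid actually importing the package). The helper names are hypothetical.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Probe for a top-level module without importing it."""
    return importlib.util.find_spec(module_name) is not None


def post_install_steps() -> list[list[str]]:
    """Return only the commands whose prerequisite package is present."""
    steps = []
    if is_installed("openparse"):
        steps.append(["openparse-download"])
    if is_installed("yt_dlp"):
        # Runs pip through the current interpreter so the active env is targeted.
        steps.append([sys.executable, "-m", "pip", "install", "-U", "--pre", "yt-dlp"])
    return steps


print(post_install_steps())
```

A caller would then pass each returned command to subprocess.run; on a stripped-down install the list is simply empty.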

  • [eb9eba4] by @thiswillbeyourgithub, 2 hours ago:
    refactor(setup): declare audioop-lts via the [audio] extra
    Move the audioop-lts 3.13+ install out of the imperative post-install
    hook and into the [audio] extra with a python_version>='3.13'
    environment marker. audioop-lts is only needed because pydub needs it,
    and pydub already lives in [audio], so the conditional belongs there.
    This also makes the dependency visible to installers other than
    "python setup.py install" (pip install wdoc[audio], uv, pipx, etc.), which
    never ran the post-install hook in the first place.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

setup.py

  • [bafb379] by @thiswillbeyourgithub, 2 hours ago:
    chore(setup): drop python-magic git install from post-install
    The git install existed to get the FIFO/pipe fix from upstream PR for
    issue #261, used via magic.from_buffer on stdin bytes. That code path
    is commented out in batch_file_loader.py, and the two remaining call
    sites (magic.from_file in batch_file_loader.py and pdf.py) work fine
    with the released 0.4.27 wheel on PyPI. Both call sites are already
    wrapped in try/except, so python-magic stays optional at runtime.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

setup.py

  • [203ab6f] by @thiswillbeyourgithub, 2 hours ago:
    refactor(logger): move handler setup out of import side effects
    When wdoc was imported as a library (e.g. as an open-webui tool),
    wdoc/utils/logger.py mutated the global loguru logger at import time:
    removing the default stderr sink and adding its own stdout/stderr/file
    sinks. That clobbered the host application's loguru configuration.

    Wrap the handler installation in a setup_cli_logging() function that
    is called explicitly from wdoc/__main__.py. Library users get whatever
    loguru handlers the host already configured (since loguru is a
    singleton, wdoc's records will flow through them automatically); CLI
    users get the customized colorized stdout/stderr plus the rotated
    file log.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

wdoc/__init__.py
wdoc/__main__.py
wdoc/utils/logger.py
wdoc/wdoc.py
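
The idea behind this refactor, no handler changes at import time and an explicit setup call from the CLI entry point, can be sketched with the standard library logging module. This is an analogue only: wdoc uses loguru, and the logger name "mylib" and the is_cli_handler marker are hypothetical.

```python
import logging
import sys

# Importing this module only creates a logger; it installs no handlers,
# so a host application's logging configuration is left alone.
log = logging.getLogger("mylib")


def setup_cli_logging(level: int = logging.INFO) -> None:
    """Install CLI handlers; meant to be called from the CLI entry point only."""
    if any(getattr(h, "is_cli_handler", False) for h in log.handlers):
        return  # already configured, keep the call idempotent
    handler = logging.StreamHandler(sys.stderr)
    handler.is_cli_handler = True  # marker used by the idempotency check
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    log.addHandler(handler)
    log.setLevel(level)


if __name__ == "__main__":
    setup_cli_logging()
    log.info("CLI logging configured")
```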

README.md

  • [dbf4410] by @thiswillbeyourgithub, 2 hours ago:
    test(env): move API-key precheck from test_wdoc.py to run_all_tests.sh
    Fails fast at the shell level before spinning up the venv and pytest,
    rather than only when test_wdoc.py is imported.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

tests/run_all_tests.sh
tests/test_wd...

Release 5.0.1

13 May 17:37

What's new

feat

  • Summary citations (dcdecc86, 6cbf025c, e5d7840b, 9d7802aa)
    • Per-chunk metadata (page, source) injected as XML into summarization input
    • LLM prompted to add [p.N] citations; Python fallback adds them to uncited top-level bullets
    • Multi-file summaries use [p.N, filename.pdf] format with shortest disambiguating path
    • New citation_url_template parameter turns page citations into clickable markdown links (e.g. {source}#page={page})
  • Query output anchor links (8f72df69): WDOC_ID citations now render as [N](#document-N) with HTML anchors for in-page navigation
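
A minimal sketch of how a citation_url_template such as {source}#page={page} could turn [p.N] citations into markdown links. The function name linkify_citations is hypothetical, the multi-file [p.N, filename.pdf] form is not handled, and this is not the code in summarize.py.

```python
import re

# [p.N] citations that are not already followed by a link target.
_PAGE_CITATION = re.compile(r"\[p\.(\d+)\](?!\()")


def linkify_citations(summary: str, source: str, template: str) -> str:
    """Replace each [p.N] with [p.N](url), where url comes from the template."""
    def repl(match: re.Match) -> str:
        page = match.group(1)
        url = template.format(source=source, page=page)
        return f"[p.{page}]({url})"

    return _PAGE_CITATION.sub(repl, summary)


print(linkify_citations("- Key claim [p.42]", "paper.pdf", "{source}#page={page}"))
```

The negative lookahead keeps the function idempotent, so running it twice does not nest brackets.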

change

  • Default models switched to OpenRouter/DeepSeek (14ac41e4, be3dced3)
    • WDOC_DEFAULT_MODEL and WDOC_DEFAULT_QUERY_EVAL_MODEL now default to openrouter/deepseek/deepseek-v4-pro and openrouter/deepseek/deepseek-v4-flash
    • Routes through OpenRouter instead of calling the DeepSeek provider directly

fix

  • b555a904: Coerce int to float for CLI kwargs type checking (fixes Gradio UI sending integer values for float parameters)
  • 4015a3a9: Better removal of "deep breath" mentions in summarization prompts
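
The int-to-float coercion fix can be sketched as follows. The helper name is hypothetical, and excluding bool (a subclass of int in Python) is an assumption made for this sketch, not something the commit states.

```python
def coerce_for_type(value, expected: type):
    """Coerce an int to float when a float is expected; pass everything else through."""
    # bool is a subclass of int, so it is excluded explicitly (sketch assumption).
    if expected is float and isinstance(value, int) and not isinstance(value, bool):
        return float(value)
    return value


# A UI slider may send 1 where the parameter is typed as float.
value = coerce_for_type(1, float)
print(type(value).__name__, value)
```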

test

  • 61c8238c, 0ea3d13d, b34e2e3f: Crash early at import time when required API keys (OPENROUTER_API_KEY, OPENAI_API_KEY, MISTRAL_API_KEY, WDOC_WHISPER_API_KEY) are missing
  • 1d07163c: Warn and skip instead of crash when whisper test hits a 502 error
  • 8380d8ad: Skip ollama embedding test when the ollama port is unreachable
  • e672e092: Better test cleanup in run_all_tests.sh

add

  • 72819fab: bump_default_models.sh helper script — dry-run by default, --apply to write; syncs model names across docs, README, SKILL, ARCHITECTURE, and docker/env.example

doc

  • 9d7802aa, ec09cf1e, 71560800, c400738f, 96f9ac23: CLAUDE.md and ARCHITECTURE.md updated with new settings documentation requirements, sphinx-apidoc command, bump_default_models.sh usage, and citation feature docs
  • 057bb75f: Added DeepWiki badge to README
  • 3f59670e: SVG updated to remove outdated default model reference

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

  • [1d07163] by @thiswillbeyourgithub, 31 minutes ago:
    test: warn instead of crash when whisper test hits 502 error
    A 502 from the whisper endpoint means the upstream is unavailable, not
    that the code under test failed. Skip the test in that case so the run
    reports not-tested rather than a false negative.

tests/test_wdoc.py

tests/test_wdoc.py

tests/test_wdoc.py

  • [61c8238] by @thiswillbeyourgithub, 7 hours ago:
    test: crash early when required API key for a test model is missing
    Check that OPENROUTER_API_KEY / OPENAI_API_KEY are defined when any of the
    test models (or their default-model fallbacks) starts with 'openrouter/' or
    'openai/'. Fails fast at import time with a clear message instead of
    producing opaque auth errors deep in a test run.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

tests/test_wdoc.py

  • [8380d8a] by @thiswillbeyourgithub, 7 hours ago:
    test: skip ollama embedding test when ollama port is unreachable
    Probe OLLAMA_HOST (default 127.0.0.1:11434) before test_ollama_embeddings
    and skip with a clear message instead of failing when ollama is not running.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

tests/test_wdoc.py
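
A port probe like the one described for the ollama test might look like this sketch. The helper names are hypothetical, IPv6 addresses are not handled, and the default 127.0.0.1:11434 is taken from the commit message above.

```python
import os
import socket


def parse_host(value: str, default_port: int = 11434) -> tuple[str, int]:
    """Split 'host:port' (optionally prefixed by a scheme) into its parts."""
    value = value.split("://", 1)[-1].rstrip("/")  # drop an optional scheme
    host, _, port = value.rpartition(":")
    if not host:  # no colon at all: the value is just a hostname
        return value, default_port
    return host, int(port)


def port_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


host, port = parse_host(os.environ.get("OLLAMA_HOST", "127.0.0.1:11434"))
print(host, port, port_reachable(host, port))
```

In a pytest test, a False result would typically lead to a pytest.skip call with an explanatory message rather than a failure.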

  • [ec09cf1] by @thiswillbeyourgithub, 10 hours ago:
    doc: mention bump_default_models.sh in CLAUDE.md and ARCHITECTURE.md
    CLAUDE.md gets a new section explaining when and how to run the helper.
    ARCHITECTURE.md gets a short pointer next to the default-models table.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

ARCHITECTURE.md
CLAUDE.md

  • [be3dced] by @thiswillbeyourgithub, 10 hours ago:
    change: prefix default models with 'openrouter/'
    deepseek/deepseek-v4-pro -> openrouter/deepseek/deepseek-v4-pro
    deepseek/deepseek-v4-flash -> openrouter/deepseek/deepseek-v4-flash

    Routes the defaults through OpenRouter rather than calling the deepseek
    provider directly. Applied via bump_default_models.sh --apply.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

README.md
SKILL.md
docker/env.example
wdoc/docs/help.md
wdoc/utils/env.py

  • [72819fa] by @thiswillbeyourgithub, 10 hours ago:
    add: bump_default_models.sh helper
    Bumpver-style script for changing WDOC_DEFAULT_MODEL and
    WDOC_DEFAULT_QUERY_EVAL_MODEL: reads the current values from
    wdoc/utils/env.py, replaces both the full id and its basename across
    docs/README/SKILL/ARCHITECTURE, and re-syncs key=value lines in
    docker/env.example. Dry-run by default; --apply to write; never commits.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

bump_default_models.sh

wdoc/docs/svg/summary.svg

  • [14ac41e] by @thiswillbeyourgithub, 10 hours ago:
    change: switch default models to deepseek-v4-pro / deepseek-v4-flash
    Update WDOC_DEFAULT_MODEL and WDOC_DEFAULT_QUERY_EVAL_MODEL defaults, and
    align README, SKILL, ARCHITECTURE, and help docs to match.

Co-Authored-By: Claude Opus 4.7 noreply@anthropic.com

ARCHITECTURE.md
README.md
SKILL.md
docker/env.example
wdoc/docs/help.md
wdoc/utils/env.py

tests/run_all_tests.sh

README.md

CLAUDE.md

  • [7156080] by @thiswillbeyourgithub, 4 weeks ago:
    doc: add new settings documentation requirements to CLAUDE.md
    Detail that new settings must be documented in help.md and examples.md,
    explain env var re-read behavior, list misc.py variables to keep updated,
    and add a guide for adding new filetype support.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

CLAUDE.md

  • [b555a90] by @thiswillbeyourgithub, 4 weeks ago:
    fix: coerce int to float for cli_kwargs type checking
    The Gradio UI can send integer values (e.g. 0, 1) for float parameters
    like doccheck_min_lang_prob, causing a type check failure. Now int
    values are automatically coerced to float when float is the expected type.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

wdoc/wdoc.py

CLAUDE.md

wdoc/utils/tasks/summarize.py

  • [9d7802a] by @thiswillbeyourgithub, 4 weeks ago:
    doc: document citation features in help, examples, README FAQ, and add tests
    • Add citation_url_template docs to help.md
    • Add PDF citation example to examples.md
    • Add FAQ entry about source citations in README.md
    • Add unit tests for source_replace anchor links and citation URL template
    • Fix double-bracket bug in citation URL link generation

Developed with Claude Code.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

README.md
tests/test_wdoc.py
wdoc/docs/examples.md
wdoc/docs/help.md
wdoc/utils/tasks/summarize.py

  • [e5d7840] by @thiswillbeyourgithub, 4 weeks ago:
    feat: add citation_url_template parameter for clickable citation links
    When set (e.g. "{source}#page={page}"), page citations like [p.42]
    become markdown links of the form [p.42](URL) in summary output.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

wdoc/utils/tasks/summarize.py
wdoc/wdoc.py

  • [dcdecc8] by @thiswillbeyourgithub, 4 weeks ago:
    feat: add page citations to summaries with hybrid LLM + Python fallback
    • Prompt instructs LLM to add [p.N] citations from chunk_metadata
    • Python post-processing adds fallback citations to uncited top-level bullets
    • Multi-file summaries use [p.N, filename.pdf] format
    • Ambiguous filenames resolved with shortest distinguishing path

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

wdoc/utils/prompts.py
wdoc/utils/tasks/summarize.py
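
The "shortest distinguishing path" rule for multi-file citations can be sketched as below: each file is labelled with the shortest trailing part of its path that no other file shares. The function name and the example paths are hypothetical, and wdoc's own implementation may resolve ties differently.

```python
from pathlib import PurePosixPath


def shortest_unique_suffixes(paths: list[str]) -> dict[str, str]:
    """Map each path to the shortest trailing path that distinguishes it."""
    parts = {p: PurePosixPath(p).parts for p in paths}
    labels = {}
    for p, segs in parts.items():
        n = 1
        # Grow the suffix until no other path ends with the same segments.
        while n < len(segs) and any(
            other[-n:] == segs[-n:] for q, other in parts.items() if q != p
        ):
            n += 1
        labels[p] = str(PurePosixPath(*segs[-n:]))
    return labels


labels = shortest_unique_suffixes(
    ["/data/2023/report.pdf", "/data/2024/report.pdf", "/data/notes.pdf"]
)
print(labels)
```

With these inputs, a citation would read [p.3, 2023/report.pdf] for the first file but plain [p.3, notes.pdf] for the last.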

  • [6cbf025] by @thiswillbeyourgithub, 4 weeks ago:
    feat: inject per-chunk metadata (page, source) as XML into summarization input
    Each chunk now includes its page number and source path as XML metadata
    before the text content, giving the LLM context for citations.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

wdoc/utils/tasks/summarize.py

  • [8f72df6] by @thiswillbeyourgithub, 4 weeks ago:
    feat: replace WDOC_ID plain numbers with markdown anchor links in query output
    Citations now render as [N](#document-N) instead of plain numbers,
    and document sections include HTML anchors for in-page navigation.

Co-Authored-By: Claude Opus 4.6 noreply@anthropic.com

wdoc/utils/tasks/query.py
wdoc/wdoc.py

Release 5.0.0

04 Jan 15:08

What's new

Major Release: Docker Web UI, Python 3.13 Support, and Architecture Improvements

This major release introduces an experimental Docker-based web interface, upgrades Python version requirements, migrates to modern LangChain modules, and includes breaking changes with license updates.

✨ Features

🔧 Refactoring & Breaking Changes

  • Python Version Upgrade [126026f, 2d1f8ab, b476dba]

    • Require Python 3.13+ (breaking change)
    • Update to Python 3.13.5
    • Add audioop-lts post-install script for Python 3.13+
  • LangChain Migration [830edd4, 4eb303c, 4763b55, 908d536, e190d60, 5846ae0]

    • Migrate to langchain_core and langchain_text_splitters modules
    • Update imports from outdated langchain modules
    • Require langchain >= 1.2.0
    • Update CacheBackedEmbeddings import paths
  • License Change [f30fcda, b9e8eb2]

    • Switch from GPLv3 to AGPLv3 (breaking change)
  • Async Operations [c412233]

    • Use asyncio tqdm instead of regular tqdm for better async support

📚 Documentation

🐛 Fixes

📦 Dependencies

  • [32ec5bd] Bump langchain-litellm dependency
  • [23bef7d] Add docker documentation to MANIFEST.in

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

images/diagram_query.png
images/diagram_search.png
images/diagram_summary.png

images/diagram_query.mmd
images/diagram_search.mmd
images/diagram_summary.mmd

README.md

images/all.mmd
images/all.png

docs/source/index.rst

README.md

README.md

README.md

README.md

README.md
docker/README.md

README.md

wdoc/utils/batch_file_loader.py
wdoc/utils/embeddings.py
wdoc/utils/filters.py
wdoc/utils/loaders/anki.py
wdoc/utils/loaders/pdf.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/summarize.py
wdoc/wdoc.py

wdoc/wdoc.py

MANIFEST.in

docker/README.md
docker/gui.py

docker/README.md
images/gradio_interface.png

setup.py

wdoc/utils/embeddings.py
wdoc/utils/misc.py
wdoc/wdoc.py

wdoc/utils/misc.py

wdoc/utils/embeddings.py

wdoc/utils/embeddings.py

setup.py
wdoc/utils/retrievers.py

setup.py

wdoc/utils/embeddings.py

wdoc/utils/customs/compressed_embeddings_cacher.py
wdoc/utils/embeddings.py

Release 4.1.2

28 Oct 09:12

What's new

This patch release fixes an optional dependency installation issue and improves hashing performance.

🐛 Fixes

  • Fixed chonkie optional install dependency name (semantic not semantics) [18a2f9d5]

⚡ Performance

  • Switched from SHA256 to BLAKE3 for faster hashing [835e43dc]
    • Updated in setup.py, tests/test_parsing.py, and wdoc/utils/misc.py

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

setup.py

setup.py
tests/test_parsing.py
wdoc/utils/misc.py

Release 4.1.1

27 Oct 18:11

What's new

This release focuses on integrating Chonkie for semantic chunking, improving test reliability, and code quality enhancements through comprehensive linting.

Features

  • Chonkie Semantic Chunking Integration
    • Implemented ChonkieSemanticSplitter using semantic chunking with memoization ([081e81a])
    • Added transform_documents method to ChonkieSemanticSplitter ([534cc90])
    • Replaced RecursiveCharacterTextSplitter with ChonkieSemanticSplitter in summarize.py ([77f1652])
    • Added chonkie to requirements ([7234f86])
    • Merged chonkie branch into dev ([f89390a])

Fixes

  • Logging & Display

    • Fixed colors not appearing in loguru ([99502e7])
    • Fixed wrong logic for stdout color ([83e7fb9])
  • Parsing & Type Hints

    • Allow LLM to mention "thinking" inside its thinking ([d2bca84])
    • Fixed error message when parsing thinking ([a50ec42])
    • Fixed typehint error for topk autoincrease ([615828a])

Refactor

  • Split batch file loader into two files ([a0420fd])
  • Comprehensive ruff linter run across codebase ([d9f7eac])
  • Switched from black to ruff ([2d8a51b])
  • Made ruff configuration less strict ([e04fc8d])

Tests

  • DDG Test Improvements
    • Finally fixed DDG error not capturing output ([e1b2a87])
    • Capture DDG output properly ([d9c5ae9])
    • Set max DDG results to 10 to reduce failures ([ccdffd1])
    • Print output before error message ([66bf47c])
    • Better way to print output ([96c5186])
    • Don't use alias of grep ([a02ffbc])

Chore

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

tests/test_cli.sh

tests/test_cli.sh

tests/test_cli.sh

README.md

tests/test_cli.sh

tests/test_cli.sh

tests/test_cli.sh
tests/test_wdoc.py

wdoc/utils/batch_file_loader.py
wdoc/utils/load_recursive.py

setup.py

wdoc/utils/misc.py

  • [77f1652] by @thiswillbeyourgithub, 29 hours ago:
    refactor: replace RecursiveCharacterTextSplitter with ChonkieSemanticSplitter in summarize.py
    Co-authored-by: aider (openrouter/anthropic/claude-sonnet-4.5) aider@aider.chat

wdoc/utils/tasks/summarize.py

wdoc/utils/misc.py

wdoc/utils/tasks/query.py

wdoc/wdoc.py

wdoc/utils/logger.py

wdoc/utils/logger.py

wdoc/utils/logger.py

wdoc/utils/misc.py

wdoc/utils/misc.py

README.md

README.md

README.md

README.md

README.md

scripts/AnkiFiltered/AnkiFilteredDeckCreator.py
scripts/NtfySummarizer/NtfySummarizer.py
scripts/TheFiche/TheFiche.py
tests/test_parsing.py
tests/test_vectorstores.py
tests/test_wdoc.py
wdoc/__main__.py
wdoc/utils/batch_file_loader.py
wdoc/utils/customs/binary_faiss_vectorstore.py
wdoc/utils/customs/litellm_embeddings.py
wdoc/utils/embeddings.py
wdoc/utils/env.py
wdoc/utils/filters.py
wdoc/utils/interact.py
wdoc/utils/llm.py
wdoc/utils/loaders/__init__.py
wdoc/utils/loaders/anki.py
wdoc/utils/loaders/local_audio.py
wdoc/utils/loaders/local_html.py
wdoc/utils/loaders/local_video.py
wdoc/utils/loaders/logseq_markdown.py
wdoc/utils/loaders/online_media.py
wdoc/utils/loaders/pdf.py
wdoc/utils/loaders/shared_audio.py
wdoc/utils/loaders/youtube.py
wdoc/utils/misc.py
wdoc/utils/prompts.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/parse.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/shared_query_search.py
wdoc/utils/tasks/types.py
wdoc/wdoc.py

.pre-commit-config.yaml

.pre-commit-config.yaml
setup.py

Release 4.1.0

21 Oct 18:25

What's new

This release focuses on robustness improvements, particularly around language detection, file loading, and error handling.

Features

  • Task type system: Introduced dataclass-based task type storage for better type safety [7c95e3c]
  • Source tag logging: Added failure count and success rate tracking to source tag logging [69dca45]

Fixes

  • PowerPoint loader: Fixed TypeError when loading PowerPoint files [ebfc66c]
  • Anki loader: Resolved forward reference error [73924e1]
  • Language detection: Fixed potential edge case issue [2d928ab]
  • Infinite loop detection:
    • Replaced simple loop counter with hash-based detection mechanism [bb147b3]
    • Adjusted loop counter threshold [fcf9ca5]
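
Hash-based loop detection, as opposed to a plain iteration counter, can be sketched as follows. The release notes do not say what wdoc hashes or which threshold it uses, so the class name, the choice of SHA-256 and the max_repeats default are all assumptions.

```python
import hashlib


class LoopDetector:
    """Flag a loop when the same state has been seen more than max_repeats times."""

    def __init__(self, max_repeats: int = 1) -> None:
        self.max_repeats = max_repeats
        self._counts: dict[str, int] = {}

    def seen_too_often(self, state: str) -> bool:
        # Hashing keeps memory bounded even when states are large strings.
        digest = hashlib.sha256(state.encode("utf-8")).hexdigest()
        self._counts[digest] = self._counts.get(digest, 0) + 1
        return self._counts[digest] > self.max_repeats


detector = LoopDetector()
for step in ["load a", "load b", "load a"]:
    if detector.seen_too_often(step):
        print("loop detected at:", step)
```

Unlike a counter, this only fires when work is actually repeated, so long but legitimate runs are not cut short.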

Enhancements

  • Language detection improvements:
    • Better exception handling [0b9c6da]
    • Reduced debug log verbosity [d7589cc]
    • General improvements [c0e2ce7]
  • Batch file loader: Reduced verbosity of progress logging [d207d98]
  • Testing: Improved model detection logic [5257c5a]
  • Post-install: Use logger.error instead of print during installation [c0795e9]

Refactoring

  • wdoc class: Added dynamic interaction_settings property [f806b98]
  • Type hints: Improved type annotations across multiple modules [a94a889, 920e5d3]

Documentation

  • Help text: Fixed powerpoint filetype documentation incorrectly mentioning .doc/.docx instead of .ppt/.pptx [e9b29eb]

Dependencies

  • Bumped litellm to enable latest OpenRouter pricing [577e6f6]

Maintenance

  • Removed debug print statement [80f7f32]
  • Better warning messages [faa5d3b]
  • Fixed setup.py logger usage [4a672c1]

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

wdoc/utils/misc.py

wdoc/utils/loaders/__init__.py

wdoc/utils/misc.py

wdoc/utils/misc.py

wdoc/utils/misc.py

wdoc/utils/batch_file_loader.py

wdoc/wdoc.py

wdoc/wdoc.py

wdoc/__main__.py
wdoc/utils/batch_file_loader.py
wdoc/utils/loaders/__init__.py
wdoc/utils/misc.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/parse.py
wdoc/utils/tasks/summarize.py
wdoc/utils/tasks/types.py
wdoc/wdoc.py

wdoc/utils/llm.py
wdoc/utils/misc.py
wdoc/wdoc.py

wdoc/utils/loaders/anki.py

wdoc/utils/loaders/powerpoint.py

wdoc/utils/batch_file_loader.py

wdoc/utils/loaders/pdf.py

wdoc/utils/batch_file_loader.py

setup.py

wdoc/utils/batch_file_loader.py

wdoc/utils/batch_file_loader.py

wdoc/docs/help.md

setup.py

setup.py

Release 4.0.2

15 Oct 08:51

What's new

This release focuses on bug fixes, performance improvements, and code cleanup related to docstore filtering and retriever functionality.

🐛 Fixes

  • Docstore filtering improvements

    • Fixed missing arguments when calling filter_docstore ([1a2442d])
    • Fixed type hints for filter_docstore ([d96c2f3]) and create_filter_metadata ([ee9cc6f])
    • Corrected docstore serialization behavior ([9c1d967])
  • Retriever fixes

    • Fixed parent retriever when loading from embeddings ([cf9171d])
    • Fixed type hint for retrievers in edge cases ([39951a8])

⚡ Performance

  • Do not store nor serialize the unfiltered docstore ([d29d3a5], [a9a0a35])
    • Renamed filter_docstore to filter_vectorstore for clarity

✨ Features

  • Added timing measurements for docstore serialization and deletion ([375c1a1])

🧹 Chores

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

wdoc/utils/filters.py
wdoc/wdoc.py

wdoc/utils/filters.py
wdoc/wdoc.py

wdoc/utils/retrievers.py

wdoc/utils/retrievers.py

wdoc/utils/retrievers.py

wdoc/utils/filters.py

wdoc/utils/filters.py

wdoc/utils/filters.py
wdoc/wdoc.py

wdoc/utils/filters.py

wdoc/utils/filters.py

wdoc/wdoc.py

Release 4.0.1

07 Oct 23:02

What's new

This release focuses on langfuse v3 compatibility and improved error handling.

🐛 Fixes

  • Langfuse v3 compatibility

    • [89f5132] Update callback import for langfuse v3
    • [07257e0] Use langfuse opentelemetry for v3
  • Document loading robustness

    • [3039bcf] Prevent crash when no documents remain after transform_documents
    • [101c7f7] Add assertion to verify documents were found

📝 Documentation

  • [56866d1] Add warning for using youtube audio backend instead of whisper or deepgram

🔧 Maintenance

  • [fb49e60] Bump version 4.0.0 → 4.0.1

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

wdoc/utils/misc.py

wdoc/utils/loaders/youtube.py

wdoc/utils/misc.py

wdoc/utils/loaders/__init__.py

wdoc/utils/loaders/__init__.py

Release 4.0.0

05 Oct 16:36

What's new

This release focuses on major performance improvements through lazy loading and deferred imports, extensive code refactoring for better maintainability, and improved testing infrastructure.

⚡ Performance

🔧 Fixes

♻️ Refactoring

  • Modularized loaders: Split monolithic loader file into separate modules [df1a0ad, d3ed873, f0a3fce, b249068, 984a8d3, def441f, fb421cc]
    • Created dedicated files for PDF, Anki, URL, audio, HTML, and other loaders
    • Enabled lazy loading of loader modules [7fc5fad]
  • Extracted task-specific functions to separate modules:
    • Moved parse_doc to utils/tasks/parse.py [1c7c6e4]
    • Moved query/search retrieval logic to task modules [7982051, c2e6142]
    • Moved evaluate_doc_chain to shared_query_search.py [8965c48]
    • Extracted query splitting logic to shared utility [4bb54a5]
    • Moved source_replace to query.py [0ce5f4f]
    • Moved autoincrease_top_k to query.py [38e82b4]
  • Split search and query task methods with better type hints [1d94644, 824f395, 319b8eb]
  • Moved debug_exceptions to logger module [99cc99f]
  • Moved VectorStore filtering code to filters.py [de4ce57]
  • Added wdocSummary dataclass for type hinting [9fc51c0, 92f5c47]
  • Added lazy caching for all_texts property [79b1661, 7b45948]
  • Removed obsolete import_tricks.py [5116616]

🧪 Testing

  • Improved test cleanup and temp folder removal [a768642, 35ef63e, c149f5d, 913378a]
  • Better verbose output in cost tests [342ad3f]
  • Use Mistral for OpenRouter API tests (zero data retention) [8f511dd]
  • Added shell-based CLI test script for more reliable testing [cc74a84, 4170567]
  • Added check for wdoc[full] installation [7cb9a3c]
  • Updated Ollama embedding test to use embeddingsgemma [4d47631]
  • Improved test assertions with more info [3d0f947]

📦 Dependencies

  • Bumped langchain version [98fd2cb]
  • Bumped litellm version [7aa2ce1]
  • Bumped langfuse version (litellm bug fix) [fc16e5e]
  • Updated general dependencies [616457c]
  • Added unstructured to required dependencies [c98d0e9]
  • Added bumpver to dev packages [54be0e2]

✨ Features

  • Added wdoc[full] installation option for all optional dependencies [6321942]
  • Added beartype runtime type validation for numpy arrays [691dbff]
  • Prioritize throughput and Groq when using OpenRouter [f049846]
  • Enable lazy loading of imports by default [7c2e397]

📝 Documentation

  • Updated default models to latest Gemini in README and help [761ddd1, 0868086, 78e562f]
  • Clarified that binary embeddings are not always better [fb611c4]
  • Added link explaining fixed cache of LLM issue [fdc3c64]
  • Improved docstrings for summarization functions [a06f570]
  • Added docstring for VectorStore filtering [2bd8dcb]

🎨 Code Quality

🔖 Version

  • Bumped version 3.3.1 → 4.0.0 [e1548c4]

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

setup.py

README.md

tests/run_all_tests.sh

tests/run_all_tests.sh

tests/run_all_tests.sh

tests/test_wdoc.py

tests/test_wdoc.py

tests/test_wdoc.py

tests/test_cli.sh

wdoc/wdoc.py

tests/run_all_tests.sh

wdoc/utils/tasks/query.py
wdoc/wdoc.py

wdoc/utils/llm.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/summarize.py

wdoc/__main__.py
wdoc/wdoc.py

setup.py

setup.py

setup.py

wdoc/utils/misc.py

wdoc/utils/customs/litellm_embeddings.py
wdoc/utils/embeddings.py
wdoc/utils/llm.py
wdoc/utils/loaders/shared_audio.py
wdoc/utils/misc.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/summarize.py
wdoc/wdoc.py

wdoc/__init__.py

wdoc/utils/loaders/shared_audio.py
wdoc/utils/misc.py

wdoc/utils/retrievers.py

wdoc/__main__.py
wdoc/utils/customs/binary_faiss_vectorstore.py
wdoc/utils/embeddings.py
wdoc/utils/env.py
wdoc/utils/interact.py
wdoc/utils/llm.py
wdoc/utils/misc.py
wdoc/utils/prompts.py
wdoc/utils/tasks/parse.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/search.py

Release 3.3.1

26 Jul 08:24

What's new

This release focuses on improving code quality through comprehensive type hint fixes and enhanced testing infrastructure.

🔧 Fixes

  • Type Hints: Comprehensive type hint improvements across the codebase

  • Model Compatibility: Fixed issue where some models consider <answer> as implying </think> ([09684bb])

  • Langchain Integration: Fixed callable_chain compatibility by creating runnables without decorators ([0c89cac])

✨ Enhancements

  • Type Checking: Replaced manual type checking with import hook system ([56b353a], [6b3ddab])
  • Logging: Reduced verbosity of litellm logging ([9a4a69c])
  • Search: Added duplicate check for DuckDuckGo search results ([f68c8a4])
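
A duplicate check on search results could be as simple as the sketch below. It assumes each result is a dict with an "href" key and that two results are duplicates when their URLs match after trimming a trailing slash; both points are assumptions, not details taken from the commit.

```python
def dedupe_results(results: list[dict]) -> list[dict]:
    """Keep the first result for each URL, preserving the original order."""
    seen = set()
    unique = []
    for result in results:
        key = result.get("href", "").rstrip("/")  # treat a trailing slash as insignificant
        if key in seen:
            continue
        seen.add(key)
        unique.append(result)
    return unique


hits = [
    {"title": "A", "href": "https://example.com/a"},
    {"title": "A again", "href": "https://example.com/a/"},
    {"title": "B", "href": "https://example.com/b"},
]
print([hit["title"] for hit in dedupe_results(hits)])
```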

🧪 Tests

  • Added comprehensive test for DuckDuckGo search functionality ([7dbd3c2])
  • Fixed existing CLI tests ([781f6d6])

📦 Version

  • Bumped version from 3.3.0 to 3.3.1 ([0690df9])

Commits details since the last release

bumpver.toml
docs/source/conf.py
setup.py
wdoc/wdoc.py

tests/test_wdoc.py

tests/test_cli.py

wdoc/utils/customs/binary_faiss_vectorstore.py

wdoc/utils/customs/binary_faiss_vectorstore.py

wdoc/utils/batch_file_loader.py

wdoc/utils/customs/binary_faiss_vectorstore.py

wdoc/wdoc.py

wdoc/utils/loaders.py

  • [e65abad] by @thiswillbeyourgithub, 20 hours ago:
    Revert "fix: typehint of load_one_doc"
    This reverts commit f0037b54ac5ce317442e672f12e1da266b58c5c1.

wdoc/utils/loaders.py

wdoc/utils/loaders.py

wdoc/utils/customs/binary_faiss_vectorstore.py

wdoc/utils/tasks/query.py

wdoc/wdoc.py

wdoc/utils/customs/binary_faiss_vectorstore.py

wdoc/utils/tasks/query.py

wdoc/utils/misc.py

wdoc/utils/customs/callable_runnable.py
wdoc/utils/misc.py
wdoc/utils/tasks/query.py
wdoc/wdoc.py

wdoc/__main__.py
wdoc/utils/batch_file_loader.py
wdoc/utils/embeddings.py
wdoc/utils/interact.py
wdoc/utils/llm.py
wdoc/utils/loaders.py
wdoc/utils/logger.py
wdoc/utils/misc.py
wdoc/utils/prompts.py
wdoc/utils/retrievers.py
wdoc/utils/tasks/query.py
wdoc/utils/tasks/summarize.py
wdoc/utils/typechecker.py
wdoc/wdoc.py

wdoc/__init__.py
wdoc/utils/customs/callable_runnable.py
wdoc/utils/misc.py
wdoc/utils/tasks/query.py
wdoc/utils/typechecker.py
wdoc/wdoc.py

wdoc/utils/customs/binary_faiss_vectorstore.py

wdoc/utils/prompts.py

wdoc/utils/prompts.py