fix: Add fallback matching for WBS base_name mismatches #16

Open
sohei-t wants to merge 18 commits into main from fix/scanner-fallback-matching

sohei-t (Owner) commented Feb 19, 2026

Summary

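  • When the base_name in WBS.json differs from the actual filename, fall back to automatic matching by episode number
  • Add level-aware sorting (intro → basic → intermediate → advanced)
  • Add content naming convention documentation (CONTENT_NAMING_CONVENTION.md)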

Background

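When AI agents generate content, the romanization used in the WBS.json base_name can differ from the one used in the actual filename. In the project "kaggle とデータ分析入門" (Introduction to Kaggle and Data Analysis), 50 of 103 topics went undetected, so progress was displayed as 53/103.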

Changes

  • scanner.py: add three methods, _extract_episode_info(), _build_file_index(), and _resolve_base_names()
  • Matching priority: exact match → unique episode number → narrowing by prefix → narrowing by subfolder
  • Make the sort in _detect_topics_from_files() level-aware
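
The matching priority above can be sketched roughly as follows. This is an illustrative sketch, not the code in scanner.py: the function names only mirror the methods listed above, and the regular expression, the `(subfolder, stem)` tuple layout, and the choice to try the prefix and subfolder steps independently are all assumptions.

```python
import re
from collections import defaultdict

# Assumed filename shape: optional level prefix, then chapter and episode
# numbers joined by "-" or "_", e.g. "intro_1-1_topic" or "01-02_topic".
EPISODE_RE = re.compile(
    r"(?:(intro|basic|intermediate|advanced)[-_])?(\d+)[-_](\d+)"
)


def extract_episode_info(name):
    """Return (level or None, chapter, episode), or None if nothing matches."""
    m = EPISODE_RE.search(name)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))


def build_file_index(files):
    """Map (chapter, episode) to the list of (subfolder, stem) candidates."""
    index = defaultdict(list)
    for subfolder, stem in files:
        info = extract_episode_info(stem)
        if info is not None:
            index[info[1:]].append((subfolder, stem))
    return index


def resolve_base_name(base_name, subfolder, files, index):
    """Return the file stem matching a WBS base_name, or None if ambiguous."""
    # 1. Exact match (the pre-existing behaviour).
    if any(stem == base_name for _, stem in files):
        return base_name
    info = extract_episode_info(base_name)
    if info is None:
        return None
    level, key = info[0], info[1:]
    candidates = index.get(key, [])
    # 2. The episode number identifies exactly one file.
    if len(candidates) == 1:
        return candidates[0][1]
    # 3. Narrow by level prefix.
    if level is not None:
        same_level = [c for c in candidates
                      if extract_episode_info(c[1])[0] == level]
        if len(same_level) == 1:
            return same_level[0][1]
    # 4. Narrow by subfolder.
    same_folder = [c for c in candidates if c[0] == subfolder]
    if len(same_folder) == 1:
        return same_folder[0][1]
    return None
```

Returning None when a step still leaves several candidates keeps an ambiguous topic reported as missing rather than attaching it to the wrong file.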

Test plan

  • "kaggle とデータ分析入門" is displayed as 103/103
  • Scan results for the 22 existing projects are unaffected
  • Filenames with a level prefix (e.g. intro-1-1) are parsed correctly
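
For the level-prefix check, a level-aware sort key could look something like the sketch below. The names `LEVEL_ORDER` and `level_sort_key` are hypothetical, and placing unprefixed names after the four known levels is an assumption rather than the behaviour of `_detect_topics_from_files()`.

```python
import re

# Assumed ordering of level prefixes.
LEVEL_ORDER = {"intro": 0, "basic": 1, "intermediate": 2, "advanced": 3}
NAME_RE = re.compile(
    r"^(?:(intro|basic|intermediate|advanced)[-_])?(\d+)[-_](\d+)"
)


def level_sort_key(name):
    """Sort by level rank, then chapter and episode compared as integers."""
    m = NAME_RE.match(name)
    if m is None:
        # Unparseable names go last, ordered alphabetically.
        return (len(LEVEL_ORDER) + 1, 0, 0, name)
    level, chapter, episode = m.groups()
    # Names without a level prefix sort after the known levels (assumption).
    rank = LEVEL_ORDER.get(level, len(LEVEL_ORDER))
    return (rank, int(chapter), int(episode), name)


names = ["advanced_1-1", "basic_2-3", "intro-1-10", "intro-1-2", "basic_2-10"]
print(sorted(names, key=level_sort_key))
```

Comparing the numbers as integers avoids the lexicographic trap where "1-10" would sort before "1-2".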

Closes #14

🤖 Generated with Claude Code

sohei-t and others added 18 commits February 3, 2026 17:44
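fix: Fix undefined error in Vue.js computed property
- Change init_db to get_database() (fixes a call to a nonexistent function)
- Change progress.db to progress_tracker.db (filename consistency)

Cause: the actual initialization function name and filename in database.py
did not match those used in launch_app.command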

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add subfolder field to ParsedTopic dataclass (wbs_parser.py)
- Implement recursive folder scanning in _detect_topics_from_files (scanner.py)
- Update _scan_topic_files to handle subfolder paths
- Add database migration for subfolder column with updated UNIQUE constraint
- Add groupedTopics computed property for hierarchical display (app.js)
- Implement collapsible folder sections in topic list (index.html)
- Add folder expand/collapse animations (styles.css)

Changes:
- Scanner now recursively scans subfolders like 入門 (intro), 初級 (basic), 中級 (intermediate), 上級 (advanced)
- Topics are grouped by subfolder in the UI with progress indicators
- Each folder section shows completion stats and can be expanded/collapsed
- Database migration handles transition from old to new schema

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
feat: Implement recursive folder scanning and hierarchical display
- Change file pattern from "starts with digit" to "contains digit-digit"
  (e.g., 1-1, 01-02, 2-3)
- Add project detection for folders with content/ directory (not just WBS.json)
- Add 'old' folder to exclusion list for both project and subfolder scanning
- Update _detect_topics_from_files and _scan_topic_files methods

This allows detection of files like:
- advanced_1-1.html
- basic_2-3.html
- intro_01-01_topic.html

Closes #5

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
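feat: Improve file detection pattern to digit-digit format
- database.py: add has_ssml and ssml_hash columns to the topics table
- scanner.py: add SSML file detection logic (xxx_ssml.txt format)
- scanner.py: extend the file pattern to \d+[-_]\d+ (supports both 1-1 and 1_1)
- scanner.py: skip files ending in _ssml so that SSML and TXT files are handled separately
- api.py: add boolean conversion for has_ssml
- index.html: add an SSML indicator (purple) to the topic details

SSML is an optional file for Google Cloud TTS, so it does not
affect progress calculation (100% progress is possible even with SSML=0)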

Closes #7

Co-authored-by: Claude Opus 4.5 <noreply@anthropic.com>
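
A minimal sketch of the detection rule described in this commit, assuming the pattern is applied to the filename stem. The `classify` helper and its return values are hypothetical and are not taken from scanner.py.

```python
import re
from pathlib import PurePath

# Pattern quoted in the commit message: matches both "1-1" and "1_1".
TOPIC_RE = re.compile(r"\d+[-_]\d+")


def classify(filename):
    """Return "ssml", "topic", or None for files that are not topic files."""
    stem = PurePath(filename).stem
    if TOPIC_RE.search(stem) is None:
        return None
    # Stems ending in "_ssml" are tracked separately from the plain TXT file.
    if stem.endswith("_ssml"):
        return "ssml"
    return "topic"


for name in ["intro_01-01_topic.txt", "intro_01-01_topic_ssml.txt",
             "basic_2_3.html", "README.md"]:
    print(name, classify(name))
```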
- Add publication_status column to projects table with CHECK constraint
- Add migration for existing tables to add publication_status column
- Update update_project_settings function to handle publication_status
- Add API validation for publication_status values
- Add publication status dropdown in both card and list views
- Add getPublicationStatusLabel helper function for display labels
- Add onPublicationStatusChange handler for auto-save

Closes #10

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
feat: add publication status (free/paid/private) to projects
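
As a rough illustration of the column migration this commit describes, the sketch below adds a TEXT column with a CHECK constraint only when it is missing. The allowed values are taken from the commit title; the `migrate` helper and the minimal `projects` schema are assumptions, not the code in database.py.

```python
import sqlite3


def migrate(conn):
    """Add publication_status to projects if the column is not there yet."""
    columns = {row[1] for row in conn.execute("PRAGMA table_info(projects)")}
    if "publication_status" not in columns:
        # NULL passes the CHECK, so existing rows stay valid and "unset"
        # remains representable.
        conn.execute(
            "ALTER TABLE projects ADD COLUMN publication_status TEXT "
            "CHECK (publication_status IN ('free', 'paid', 'private'))"
        )


conn = sqlite3.connect(":memory:")
# Hypothetical pre-migration schema, for illustration only.
conn.execute("CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO projects (name) VALUES ('demo')")
migrate(conn)
migrate(conn)  # Running it again is a no-op.
conn.execute("UPDATE projects SET publication_status = 'paid' WHERE id = 1")
```

Checking `PRAGMA table_info` first makes the migration safe to run on every startup.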
- Add publication_statuses table to database
- Migrate from TEXT column to publication_status_id INTEGER
- Add CRUD API endpoints for publication statuses
- Add publication status tab in settings page
- Update project dropdowns to use dynamic publication status list
- Add initial data: 非公開 (private), 無料公開 (free), 有料公開 (paid)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
feat: add publication status master management
Remove workflow, agent, and development files that should not be public:
- All CLAUDE.md, WORKFLOW_*.md, PHASE_*.md documentation
- Agent configuration files (agent_config.yaml, etc.)
- Development scripts (setup_*.sh, launch_*.sh, etc.)
- Test files and artifacts
- __pycache__ directories
- Internal documentation (SPEC.md, WBS.json, etc.)
- worktrees directory
- images directory
- src/ development tools

Update .gitignore to prevent these files from being added again.

The repository now only contains:
- project/ folder with the actual application
- README.md
- LICENSE
- .gitignore

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
chore: remove unnecessary files from repository
Replace the old AI Multi-Agent System Template README with proper
documentation for the Training Content Progress Tracker application.

- Add application overview and features
- Document tech stack (FastAPI, Vue.js 3, SQLite)
- Add setup and installation instructions
- Document API endpoints
- Add directory structure

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Show file counts as 'current/total' format
- Add warning icons (⚠️) for incomplete files
- Display missing files summary on project cards
- Add 'View incomplete' shortcut button in detail view
- Highlight missing files with red border and ✗ mark
- Show specific missing file types for in-progress topics
- Update filter labels to be more descriptive
- Add publication status dropdown filter
- Include 'unset' option to filter projects without status
- Filter logic in filteredProjects computed property
- Add delete_project method to database
- Scanner now checks if DB projects still exist on filesystem
- Automatically removes projects whose folders were deleted
- Also removes projects without WBS.json or content folder
…atches

When AI agents generate content files, the actual filenames often differ
from the base_name specified in WBS.json (e.g. romanization differences).
This caused the scanner to report missing files even when they existed.

Add episode number-based fallback matching with priority:
1. Exact base_name match (unchanged)
2. Unique episode number match
3. Episode number + level prefix narrowing
4. Episode number + subfolder narrowing

Also adds level-aware sorting and content naming convention docs.

Closes #14

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Content is not detected when the WBS base_name and the actual filename differ