airmang
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 13 additions & 0 deletions
diff --git a/DevDoc/hwpxskill_gap_audit.md b/DevDoc/hwpxskill_gap_audit.md
Lines changed: 193 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 26 additions & 2 deletions
diff --git a/docs/usage.md b/docs/usage.md
Lines changed: 24 additions & 0 deletions
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 4 additions & 1 deletion
@@ -2,6 +2,19 @@
 
 All notable changes are recorded in this document. The format follows [Keep a Changelog](https://keepachangelog.com/ko/1.1.0/) and [Semantic Versioning](https://semver.org/lang/ko/).
 
+## [2.7] - 2026-03-08
+### Added
+- Added the `hwpx-unpack`, `hwpx-pack`, and `hwpx-analyze-template` CLIs.
+- Added `src/hwpx/tools/archive_cli.py`, promoting the unpack/pack workflow to package-level tooling.
+- Unpack now writes `.hwpx-pack-metadata.json`, and pack uses it to preserve the original ZIP entry order and compression method as far as possible.
+- Added `src/hwpx/tools/template_analyzer.py` and `DevDoc/hwpxskill_gap_audit.md`.
+
+### Changed
+- Reduced `scripts/office/unpack.py`, `scripts/office/pack.py`, and `scripts/analyze_template.py` to wrappers around the package tools.
+- Added shape/control counts and histogram comparison to `page_guard`, and stated in the docs and CLI description that it is a layout-drift proxy, not a rendered page count.
+- Added usage examples for the new CLIs to the README and `docs/usage.md`.
+- Strengthened CLI/extraction/overwrite/page-guard regression tests for the new tooling.
+
 ## [2.6] - 2026-03-08
 ### Added
 - Added the `hwpx-validate-package` CLI and `hwpx.tools.package_validator` to check ZIP/OPC/HWPX package structure, `mimetype`, `container.xml`, manifest/spine references, and XML well-formedness.
 
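The 2.7 entry above says unpack records `.hwpx-pack-metadata.json` and pack uses it to keep the original ZIP entry order and compression method where it can. A minimal sketch of that idea, using only the standard library, might look like the following. The function names `unpack` and `pack` and the JSON layout are assumptions made for illustration; they are not taken from `archive_cli.py`, and the sketch does not guard against unsafe entry paths.

```python
import json
import zipfile
from pathlib import Path

# File name comes from the changelog; the JSON layout below is an assumption.
METADATA_NAME = ".hwpx-pack-metadata.json"


def unpack(archive: Path, out_dir: Path) -> None:
    """Extract every entry and record its position and compression method."""
    out_dir.mkdir(parents=True, exist_ok=True)
    entries = []
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():  # infolist() keeps the archive's entry order
            if info.is_dir():
                continue
            target = out_dir / info.filename
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(zf.read(info))
            entries.append({"name": info.filename, "compress_type": info.compress_type})
    (out_dir / METADATA_NAME).write_text(
        json.dumps({"entries": entries}, indent=2), encoding="utf-8"
    )


def pack(src_dir: Path, archive: Path) -> None:
    """Rebuild the archive, replaying recorded order and compression if present."""
    meta_path = src_dir / METADATA_NAME
    recorded = []
    if meta_path.exists():
        recorded = json.loads(meta_path.read_text(encoding="utf-8"))["entries"]
    on_disk = sorted(
        p.relative_to(src_dir).as_posix()
        for p in src_dir.rglob("*")
        if p.is_file() and p.name != METADATA_NAME
    )
    present = set(on_disk)
    # Recorded entries first, in their original order; skip files that were deleted.
    ordered = [e for e in recorded if e["name"] in present]
    known = {e["name"] for e in ordered}
    # Files added after unpacking go last, deflated by default.
    ordered += [
        {"name": n, "compress_type": zipfile.ZIP_DEFLATED} for n in on_disk if n not in known
    ]
    with zipfile.ZipFile(archive, "w") as zf:
        for entry in ordered:
            zf.write(
                src_dir / entry["name"],
                arcname=entry["name"],
                compress_type=entry["compress_type"],
            )
```

Without the metadata file, a sorted directory walk would place `mimetype` after the `Contents/` entries, which is the kind of layout change the recorded order is meant to avoid.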
@@ -0,0 +1,193 @@
+# hwpxskill Gap Audit
+
+## Scope
+
+This audit compares the practical workflow surface of `python-hwpx` against
+`Canine89/hwpxskill` without assuming the competitor's README claims are
+correct. The goal is to identify real workflow gaps, reuse existing engine
+abstractions where possible, and separate reproduced bugs from unverified
+assertions.
+
+## Current Repository Summary
+
+- Core engine: `src/hwpx/document.py`, `src/hwpx/opc/package.py`,
+  `src/hwpx/oxml/*`
+- Existing high-level tooling before this patch set:
+  - schema validator: `src/hwpx/tools/validator.py`
+  - text extraction engine: `src/hwpx/tools/text_extractor.py`
+  - object finder / exporters: `src/hwpx/tools/object_finder.py`,
+    `src/hwpx/tools/exporter.py`
+- Workflow-gap code already present at audit start:
+  - package validator: `src/hwpx/tools/package_validator.py`
+  - page guard: `src/hwpx/tools/page_guard.py`
+  - text extraction CLI: `src/hwpx/tools/text_extract_cli.py`
+  - script-only unpack/pack/analyze tools under `scripts/`
+
+## Confirmed Gaps vs hwpxskill
+
+These were confirmed by inspecting both repos and the current local checkout.
+
+1. Public unpack/pack workflow was incomplete.
+   - `python-hwpx` had script files for unpack/pack, but no package-level CLI
+     entry points such as `hwpx-unpack` / `hwpx-pack`.
+   - The existing pack/unpack scripts did not record archive entry order or
+     compression metadata, so they could not preserve original ZIP layout
+     details when repacking.
+   - Overwrite behavior was not explicit or safe.
+
+2. Template analysis workflow was incomplete.
+   - `python-hwpx` had a script for analyzing reference documents, but it was
+     not promoted to a package-level CLI, did not emit a structured JSON
+     summary, and had only a smoke test instead of extraction-focused tests.
+
+3. Page-guard coverage was narrower than requested.
+   - The existing page guard already acted as a structural drift detector, but
+     it did not count shape/control deltas yet.
+   - It also needed clearer documentation that it is a proxy/risk heuristic,
+     not a rendered page counter.
+
+4. Public docs lagged behind the implemented tooling.
+   - `README.md` still documented only `hwpx-validate` in the CLI section.
+   - The main usage guide did not document unpack/pack/analyze/page-guard/text
+     extraction workflows.
+
+5. Audit documentation itself was missing.
+   - There was no repository-local audit note separating verified findings from
+     competitor marketing claims.
+
+## Reusable Internals Confirmed
+
+These existing internals made it unnecessary to cargo-cult `hwpxskill`'s raw
+XML-first approach.
+
+- `src/hwpx/opc/package.py`
+  - `HwpxPackage.open()`
+  - `HwpxPackage.part_names()`
+  - `HwpxPackage.get_part()`
+  - `HwpxPackage.get_xml()`
+  - `HwpxPackage.header_paths()`
+  - `HwpxPackage.section_paths()`
+  - `HwpxPackage.main_content`
+- `src/hwpx/tools/validator.py`
+  - existing schema validation path
+- `src/hwpx/tools/text_extractor.py`
+  - existing traversal and text extraction engine
+- `src/hwpx/tools/page_guard.py`
+  - existing metrics collection shape that could be extended instead of replaced
+
+Conclusion: `python-hwpx` already had enough engine-level primitives to add the
+missing workflows without switching to competitor-style "raw XML everywhere".
+
+## Real Reproduced Bugs
+
+### 1. Validation dirty-state mutation (historical, now fixed)
+
+The concrete bug candidate worth treating seriously was whether validation
+mutated document state. That bug was real in the earlier implementation:
+
+- `HwpxDocument.validate()` serialized via `_to_bytes_raw()`
+- `_to_bytes_raw()` called `self._root.reset_dirty()`
+- Result: validating a modified document could clear the dirty state even when
+  the user had not saved yet
+
+That behavior is now covered by a regression test:
+
+- `tests/test_gap_closure_tools.py::test_validate_preserves_dirty_state`
+
+At the time of this audit, `main` already contains the fix, so the bug no longer
+reproduces on HEAD.
+
+## Bugs I Could Not Reproduce
+
+These claims appeared in or were implied by `hwpxskill`, but I could not
+substantiate them from evidence in the current `python-hwpx` checkout.
+
+1. "python-hwpx API has many bugs"
+   - Too vague to verify.
+   - Current tests and integration flows do not support that broad claim.
+
+2. "High-level API editing necessarily destroys styles/structure"
+   - Not reproduced for ordinary paragraph/table editing in the current test
+     suite.
+   - Existing tests already cover roundtrip and style-preserving behavior.
+
+3. "page_guard detects actual page count changes"
+   - Not supported by the competitor implementation itself.
+   - Their script measures structural/text drift in `section0.xml`; it does not
+     compute rendered page count.
+
+4. Header/footer instability or TypeError complaints
+   - No current reproduction from repository tests.
+   - Existing `tests/test_section_headers.py` covers the public API surface.
+
+## Competitor Claims That Remain Unverified
+
+1. "XML-direct workflow preserves formatting almost exactly"
+   - Plausible for some templates, but not benchmarked here.
+   - No controlled comparison was performed in this patch.
+
+2. "Their workflow is more reliable for all existing documents"
+   - Not established.
+   - The competitor repo does not provide a broad evidence matrix for this.
+
+3. "Template replacement quality is universally better than the object API"
+   - Not established.
+   - Likely document-dependent.
+
+## Exact Files / Functions Inspected
+
+### Local repository
+
+- `pyproject.toml`
+- `README.md`
+- `docs/usage.md`
+- `src/hwpx/document.py`
+  - `HwpxDocument.validate`
+  - `HwpxDocument._to_bytes_raw`
+- `src/hwpx/opc/package.py`
+  - `HwpxPackage.open`
+  - `HwpxPackage.part_names`
+  - `HwpxPackage.get_part`
+  - `HwpxPackage.get_xml`
+  - `HwpxPackage.main_content`
+  - `HwpxPackage.header_paths`
+  - `HwpxPackage.section_paths`
+  - `HwpxPackage.save`
+- `src/hwpx/tools/validator.py`
+  - `validate_document`
+- `src/hwpx/tools/package_validator.py`
+  - `validate_package`
+- `src/hwpx/tools/page_guard.py`
+  - `collect_metrics`
+  - `compare_metrics`
+- `src/hwpx/tools/text_extractor.py`
+  - `TextExtractor.iter_sections`
+  - `TextExtractor.iter_paragraphs`
+  - `TextExtractor.extract_text`
+- `src/hwpx/tools/text_extract_cli.py`
+- `scripts/office/unpack.py`
+- `scripts/office/pack.py`
+- `scripts/analyze_template.py`
+- `tests/test_gap_closure_tools.py`
+- `tests/test_section_headers.py`
+- `.github/workflows/release.yml`
+- `.github/workflows/tests.yml`
+
+### Competitor repository (`Canine89/hwpxskill`)
+
+- `README.md`
+- `scripts/validate.py`
+- `scripts/page_guard.py`
+- `scripts/text_extract.py`
+- `scripts/analyze_template.py`
+
+## Patch Direction Chosen
+
+This first PR-equivalent patch should:
+
+1. promote unpack/pack/analyze into package-level tooling with CLI entry points
+2. keep using `python-hwpx` engine abstractions for package inspection and text
+   extraction
+3. extend page guard as a proxy detector, not as a fake page counter
+4. keep backward compatibility with existing `HwpxDocument` APIs
+5. strengthen tests and docs around the new tooling
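The dirty-state bug described under "Real Reproduced Bugs" can be illustrated in isolation. The classes below are hypothetical stand-ins, not the real `HwpxDocument`; only the names `_to_bytes_raw` and `reset_dirty` come from the audit. The fix shown (validating against a side-effect-free serialization) is an assumption about one reasonable approach, not a description of the actual patch.

```python
class Root:
    """Hypothetical stand-in for the document root that tracks unsaved edits."""

    def __init__(self) -> None:
        self.text = ""
        self.dirty = False

    def set_text(self, text: str) -> None:
        self.text = text
        self.dirty = True

    def reset_dirty(self) -> None:
        self.dirty = False


class Document:
    """Hypothetical stand-in for a document with save and validate paths."""

    def __init__(self) -> None:
        self._root = Root()

    def _serialize(self) -> bytes:
        # Pure serialization: reads state, never changes it.
        return self._root.text.encode("utf-8")

    def _to_bytes_raw(self) -> bytes:
        # Save path: serializing here also marks the document as clean.
        data = self._serialize()
        self._root.reset_dirty()
        return data

    def validate_buggy(self) -> bool:
        # Pattern from the audit: validation reuses the save path,
        # so the dirty flag is cleared as a side effect.
        return isinstance(self._to_bytes_raw(), bytes)

    def validate(self) -> bool:
        # Assumed fix: validate without touching the dirty flag.
        return isinstance(self._serialize(), bytes)
```

A regression test in the spirit of `test_validate_preserves_dirty_state` would modify the document, call `validate()`, and assert that the dirty flag is still set.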
@@ -98,7 +98,8 @@ doc.save_to_path("결과물.hwpx")
 | 🔎 **Object search** | Tag/attribute/XPath | Find specific elements, annotation iterators |
 | 🎨 **Style replacement** | Format-based filters | Find and replace Runs by color/underline/charPrIDRef |
 | 📤 **Export** | Text/HTML/Markdown | Document conversion output |
-| ✅ **Validation** | XSD schema | CLI (`hwpx-validate`) and API |
+| ✅ **Validation** | XSD + package structure | CLI (`hwpx-validate`, `hwpx-validate-package`) and API |
+| 🧰 **Workflow tools** | unpack/pack/template analyze/page guard | Helpers for template-preserving, XML-first work |
 | 🏗️ **Low-level XML** | Dataclass mapping | Direct manipulation of OWPML schema ↔ Python objects |
 | 🔄 **Namespace compatibility** | Automatic normalization | Automatic HWPML 2016 → 2011 conversion |
 
@@ -195,10 +196,15 @@ python-hwpx
 │   ├── body.py          #   typed body model
 │   └── common.py        #   generic XML ↔ dataclass
 ├── hwpx.tools
+│   ├── archive_cli      #   unpack/pack CLI and repacking metadata
 │   ├── text_extractor   #   text extraction pipeline
+│   ├── text_extract_cli #   text extraction CLI
 │   ├── object_finder    #   object search utilities
 │   ├── exporter         #   text/HTML/Markdown export
-│   └── validator        #   schema validation (hwpx-validate CLI)
+│   ├── validator        #   schema validation (hwpx-validate CLI)
+│   ├── package_validator#   ZIP/OPC/HWPX structure checks
+│   ├── page_guard       #   layout-drift proxy
+│   └── template_analyzer#   reference document analysis/extraction
 └── hwpx.templates       # built-in blank document template
 ```
 
@@ -207,8 +213,26 @@ python-hwpx
 ```bash
 # Validate an HWPX document against the schema
 hwpx-validate 문서.hwpx
+
+# Check ZIP/OPC/HWPX package structure
+hwpx-validate-package 문서.hwpx
+
+# Unpack / repack an HWPX file
+hwpx-unpack 문서.hwpx ./unpacked
+hwpx-pack ./unpacked ./repacked.hwpx
+
+# Analyze a reference template and extract its parts
+hwpx-analyze-template 문서.hwpx --extract-dir ./template-parts --json
+
+# Extract text as plain text or markdown
+hwpx-text-extract 문서.hwpx --format markdown --output 문서.md
+
+# Compare layout-drift proxy metrics
+hwpx-page-guard --reference 원본.hwpx --output 결과.hwpx
 ```
 
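+`hwpx-page-guard` does not compute the actual rendered page count. Instead, it is a proxy tool that compares paragraph count, table count, shape/control count, explicit page/column breaks, and text-length statistics to detect layout-drift risk.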
+
 ## Documentation
 
 | | |
 
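The proxy idea behind `hwpx-page-guard` (compare structural counts rather than rendered pages) can be sketched with the standard library. This is an illustration only: `count_structure` and `drift_report` are hypothetical names, not the `collect_metrics` / `compare_metrics` functions in `page_guard.py`, and the element and attribute names (`p`, `tbl`, `t`, `pageBreak`) are assumptions about the section XML that should be checked against real documents.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def count_structure(section_xml: str) -> Counter:
    """Count paragraphs, tables, explicit page breaks, and text length."""
    metrics: Counter = Counter()
    for el in ET.fromstring(section_xml).iter():
        name = local_name(el.tag)
        if name == "p":  # assumed paragraph element
            metrics["paragraphs"] += 1
            if el.get("pageBreak") == "1":  # assumed explicit page-break flag
                metrics["page_breaks"] += 1
        elif name == "tbl":  # assumed table element
            metrics["tables"] += 1
        elif name == "t":  # assumed text element
            metrics["text_chars"] += len(el.text or "")
    return metrics


def drift_report(reference: Counter, output: Counter) -> dict[str, int]:
    """Return only the metrics that changed, as output minus reference."""
    keys = set(reference) | set(output)
    return {k: output[k] - reference[k] for k in sorted(keys) if output[k] != reference[k]}
```

An empty report means the counted structure is unchanged; it says nothing about how a renderer would actually paginate the document.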
@@ -2,6 +2,30 @@
 
 python-hwpx provides several layers of API for validating and editing HWPX containers. This document introduces the core usage patterns, from opening a document at the package level to the high-level tools for working with paragraphs and annotations.
 
+## CLI workflow
+
+In addition to the library API, python-hwpx provides CLIs for template-preserving workflows.
+
+```bash
+# Check package structure
+hwpx-validate-package sample.hwpx
+
+# Unpack / pack for XML-first editing
+hwpx-unpack sample.hwpx ./sample-unpacked
+hwpx-pack ./sample-unpacked ./sample-repacked.hwpx
+
+# Analyze a template and extract its parts
+hwpx-analyze-template sample.hwpx --extract-dir ./template-parts --json
+
+# Extract text
+hwpx-text-extract sample.hwpx --format markdown --output sample.md
+
+# Layout-drift proxy
+hwpx-page-guard --reference sample.hwpx --output edited.hwpx
+```
+
+`hwpx-page-guard` does not compute the page count of an actual renderer; it is a proxy checker that compares structural and text statistics to detect the risk of layout changes.
+
 ## Quick examples
 
 ### Example 1: Counting paragraphs
 
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "python-hwpx"
-version = "2.6"
+version = "2.7"
 description = "Hancom HWPX 패키지를 로드하고 편집하기 위한 Python 유틸리티 모음"
 readme = { file = "README.md", content-type = "text/markdown" }
 license = { file = "LICENSE" }
@@ -49,9 +49,12 @@ Documentation = "https://github.com/airmang/python-hwpx/tree/main/docs"
 Issues = "https://github.com/airmang/python-hwpx/issues"
 
 [project.scripts]
+hwpx-unpack = "hwpx.tools.archive_cli:unpack_main"
+hwpx-pack = "hwpx.tools.archive_cli:pack_main"
 hwpx-validate = "hwpx.tools.validator:main"
 hwpx-validate-package = "hwpx.tools.package_validator:main"
 hwpx-page-guard = "hwpx.tools.page_guard:main"
+hwpx-analyze-template = "hwpx.tools.template_analyzer:main"
 hwpx-text-extract = "hwpx.tools.text_extract_cli:main"
 
 [tool.setuptools]
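
Each `[project.scripts]` entry above maps a command name to a `module:attribute` reference, which the installer turns into a small launcher that imports the module and calls the attribute. A rough sketch of that resolution step is shown below; `resolve_entry_point` is a hypothetical helper written for illustration and is not part of python-hwpx or of any packaging tool.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attr.path' reference to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj: Any = importlib.import_module(module_name)
    if attr_path:
        # Walk dotted attributes, e.g. 'SomeClass.method'.
        for attr in attr_path.split("."):
            obj = getattr(obj, attr)
    return obj
```

With the package installed, `resolve_entry_point("hwpx.tools.archive_cli:unpack_main")` would be expected to return the function that backs `hwpx-unpack`.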