Commit 4254265

Fixing some changes
1 parent 73faa79 commit 4254265

11 files changed (+133 / -57 lines)

.github/workflows/publish.yml

Lines changed: 3 additions & 1 deletion
@@ -25,6 +25,7 @@ jobs:
           python -m pip install build twine
 
       - name: Build package
+        working-directory: packages/markitdown/src # Change to the correct directory
         run: |
           rm -rf dist # Ensure a clean build
           python -m build
@@ -33,8 +34,9 @@ jobs:
         run: echo "::add-mask::$TEST_PYPI_API_TOKEN"
 
       - name: Publish to TestPyPI
+        working-directory: packages/markitdown/src # Change to the correct directory
         env:
           TEST_PYPI_API_TOKEN: ${{ secrets.TEST_PYPI_API_TOKEN }}
         run: |
           python -m twine upload --repository testpypi dist/* \
-            --username __token__ --password $TEST_PYPI_API_TOKEN --verbose
+            --username __token__ --password $TEST_PYPI_API_TOKEN
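The publish step passes the token to `twine` on the command line and now runs from `packages/markitdown/src`. As a sketch only (not part of this commit), the same upload can be assembled from Python with the token supplied through twine's `TWINE_USERNAME` / `TWINE_PASSWORD` environment variables instead of `--password`, so it never appears in the argument list. The helper name `build_upload_invocation` is hypothetical, and the function only builds the command; it does not execute anything.

```python
import os
import sys
from pathlib import Path
from typing import Dict, List, Tuple

# Same directory as the workflow's new `working-directory` key.
WORKDIR = Path("packages/markitdown/src")


def build_upload_invocation(
    token: str, workdir: Path = WORKDIR
) -> Tuple[List[str], Dict[str, str], Path]:
    """Assemble (argv, env, cwd) for `python -m twine upload` to TestPyPI.

    Hypothetical helper: it only builds the command and runs nothing.
    """
    # Expand dist/* relative to the working directory, like the shell glob in the step.
    dists = sorted(str(p.relative_to(workdir)) for p in workdir.glob("dist/*"))
    argv = [sys.executable, "-m", "twine", "upload", "--repository", "testpypi", *dists]
    # twine reads TWINE_USERNAME / TWINE_PASSWORD, so the token stays out of argv.
    env = {**os.environ, "TWINE_USERNAME": "__token__", "TWINE_PASSWORD": token}
    return argv, env, workdir
```

To perform the upload, the three returned values would be passed to `subprocess.run(argv, env=env, cwd=cwd, check=True)`.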

README.md

Lines changed: 61 additions & 29 deletions
@@ -1,58 +1,92 @@
 # Openize.MarkItDown
 
+![Python Version](https://img.shields.io/badge/python-3.7%2B-blue)
+![License](https://img.shields.io/badge/license-MIT-green)
+![Status](https://img.shields.io/badge/status-alpha-orange)
+
 Openize.MarkItDown is a Python package that converts documents into Markdown format. It supports multiple file formats, provides flexible output handling, and integrates with LLMs for extended processing.
 
 ## Features
 
 - Convert `.docx`, `.pdf`, `.xlsx`, and `.pptx` to Markdown.
-- Save Markdown files locally or insert them into an LLM.
+- Save Markdown files locally or send them to an LLM for processing.
 - Structured with the **Factory & Strategy Pattern** for scalability.
-- Works on Windows and Linux-compatible paths.
+- Works with Windows and Linux-compatible paths.
+- Command-line interface for easy use.
+
+## Requirements
+
+This package depends on the Aspose libraries, which are commercial products:
+
+- [Aspose.Words](https://purchase.aspose.com/buy/words/python)
+- [Aspose.Cells](https://purchase.aspose.com/buy/cells/python)
+- [Aspose.Slides](https://purchase.aspose.com/buy/slides/python)
+
+You'll need to obtain valid licenses for these libraries separately. The package will install these dependencies, but you're responsible for complying with Aspose's licensing terms.
 
 ## Installation
 
-1. Clone the repository:
+### From TestPyPI
 
-```sh
-git clone https://github.com/openize-com/Openize.MarkItDown.git
-cd Openize.MarkItDown
-```
+```sh
+pip install -i https://test.pypi.org/simple/ openize-markitdown
+```
 
-3. (Optional) Install the package:
+### From Source
 
-```sh
-pip install -i https://test.pypi.org/simple/ Openize.MarkItDown
-```
+```sh
+git clone https://github.com/openize-com/Openize.MarkItDown.git
+cd Openize.MarkItDown
+pip install -e .
+```
 
 ## Usage
 
-### Convert Documents to Markdown
+### Command Line Interface
 
-```python
-from packages.markitdown.src.openize.markitdown.processor import DocumentProcessor
+```sh
+# Convert a file and save locally
+markitdown document.docx
 
-processor = DocumentProcessor()
+# Specify output directory
+markitdown document.docx -o output_folder
 
-# Convert files and save locally
-processor.process_document("document.docx", insert_into_llm=False)
-processor.process_document("presentation.pptx", insert_into_llm=False)
-processor.process_document("spreadsheet.xlsx", insert_into_llm=False)
-processor.process_document("sample.pdf", insert_into_llm=False)
+# Process with an LLM (requires OPENAI_API_KEY environment variable)
+markitdown document.docx --llm
 ```
 
-### Insert Markdown into LLM
+### Python API
 
 ```python
-processor.process_document("filename.docx", insert_into_llm=True)
+from openize.markitdown import DocumentProcessor
+
+# Initialize with custom output directory
+processor = DocumentProcessor(output_dir="my_markdown_files")
+
+# Convert files and save locally
+processor.process_document("document.docx")
+processor.process_document("presentation.pptx")
+processor.process_document("spreadsheet.xlsx")
+processor.process_document("sample.pdf")
+
+# Send to LLM for processing (requires OPENAI_API_KEY environment variable)
+processor.process_document("document.docx", insert_into_llm=True)
 ```
 
-## Running Tests
+## Environment Variables
 
-Run `pytest` to validate all use cases:
+- `OPENAI_API_KEY`: Required when using the `insert_into_llm=True` option or the `--llm` flag.
+
+## Running Tests
 
 ```sh
+# Install test dependencies
+pip install pytest pytest-mock
+
+# Run the tests
 pytest
 ```
+
 ## Contributing
 
 We appreciate your interest in contributing to this project! To ensure a smooth collaboration, please follow these steps when submitting a pull request:
@@ -63,12 +97,10 @@ We appreciate your interest in contributing to this project! To ensure a smooth
 4. **Submit a Pull Request (PR)** – Once your changes are ready, open a PR with a clear description.
 5. **Review & Feedback** – Our maintainers will review your PR and provide feedback if needed.
 
-By contributing, you agree to the terms of the CLA and confirm that your changes comply with the project’s licensing policies.
-
-We appreciate your contributions and look forward to working with you!
+By contributing, you agree to the terms of the CLA and confirm that your changes comply with the project's licensing policies.
 
 ## License
 
-This wrapper is licensed under the MIT License. However, it depends on [CommercialLibrary](https://purchase.aspose.com/pricing/), which is a proprietary, closed-source library.
+This package is licensed under the MIT License. However, it depends on Aspose libraries, which are proprietary, closed-source libraries.
 
-⚠️ Users must obtain a valid license for [CommercialLibrary](https://purchase.aspose.com/pricing/) separately. This repository does not include or distribute any proprietary components.
+⚠️ Users must obtain a valid license for Aspose libraries separately. This repository does not include or distribute any proprietary components.
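The README now documents that `OPENAI_API_KEY` must be set for `insert_into_llm=True` or `--llm`. A caller could check for the key up front rather than discovering it is missing partway through a conversion. The sketch below is one way to do that; `require_openai_key` is a hypothetical helper, not part of the package, and the README does not say how the package itself reports a missing key.

```python
import os
from typing import Mapping, Optional


def require_openai_key(
    insert_into_llm: bool, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the API key when LLM processing is requested; raise if it is missing.

    Hypothetical pre-flight check; the package's own behaviour is not shown here.
    """
    env = os.environ if env is None else env
    if not insert_into_llm:
        return None  # Saving locally does not need a key.
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY must be set when insert_into_llm=True or --llm is used"
        )
    return key
```

Calling `require_openai_key(True)` before `processor.process_document("document.docx", insert_into_llm=True)` would surface a missing key immediately.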

__init__.py

Whitespace-only changes.
File renamed without changes.

main.py renamed to packages/markitdown/src/main.py

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 import argparse
 from pathlib import Path
-from packages.markitdown.src.openize.markitdown.processor import DocumentProcessor
+from openize.markitdown.processor import DocumentProcessor
 
 def run_conversion():
     processor = DocumentProcessor()
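Only the import line of `main.py` changes here, but the README now documents a `markitdown` command with `-o` and `--llm`, and `setup.cfg` points the console script at a function named `main`. A minimal sketch of such an entry point, assuming the `DocumentProcessor(output_dir=...)` and `process_document(path, insert_into_llm=...)` calls shown in the README, could look like this. `build_parser`, the positional name `input_file`, and the `None` default for `-o` are assumptions; the actual body of `run_conversion` is not shown in the diff.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented in the README."""
    parser = argparse.ArgumentParser(prog="markitdown", description="Convert documents to Markdown.")
    parser.add_argument("input_file", help="Path to a .docx, .pdf, .xlsx, or .pptx file")
    # Assumption: -o maps onto DocumentProcessor(output_dir=...); None keeps the package default.
    parser.add_argument("-o", dest="output_dir", default=None, help="Directory for the Markdown output")
    parser.add_argument("--llm", action="store_true", help="Send the Markdown to an LLM (needs OPENAI_API_KEY)")
    return parser


def main(argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    # Imported lazily so `--help` works without loading the converter dependencies.
    from openize.markitdown.processor import DocumentProcessor

    if args.output_dir is None:
        processor = DocumentProcessor()
    else:
        processor = DocumentProcessor(output_dir=args.output_dir)
    processor.process_document(args.input_file, insert_into_llm=args.llm)
```

Keeping argument parsing in `build_parser` lets the CLI surface be tested without importing the converters.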

packages/markitdown/src/openize/markitdown/__init__.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# packages/markitdown/src/openize/markitdown/__init__.py
+"""
+Openize.MarkItDown - Convert documents to Markdown format
+
+This package provides utilities to convert various document formats
+(.docx, .pdf, .xlsx, .pptx) to Markdown format.
+"""
+
+__version__ = "0.1.0"
+
+from .processor import DocumentProcessor
+from .converters import WordConverter, PDFConverter, ExcelConverter, PowerPointConverter
+from .factory import ConverterFactory
+from .llm_strategy import SaveLocally, InsertIntoLLM
+
+__all__ = [
+    'DocumentProcessor',
+    'WordConverter',
+    'PDFConverter',
+    'ExcelConverter',
+    'PowerPointConverter',
+    'ConverterFactory',
+    'SaveLocally',
+    'InsertIntoLLM',
+]
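`__all__` lists the names this package re-exports, which is what lets the README's `from openize.markitdown import DocumentProcessor` resolve through this file. A quick sanity check is to confirm that every name in `__all__` is actually bound on the module. The sketch below uses a hypothetical helper, `missing_exports`, and demonstrates it on a stand-in module built with `types.ModuleType`, since the submodules imported here (`converters`, `factory`, `llm_strategy`) are not shown in this diff.

```python
import types
from typing import List


def missing_exports(module: types.ModuleType) -> List[str]:
    """Return names listed in module.__all__ that the module does not define."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]


# Stand-in module; real code would pass the imported `openize.markitdown` package.
demo = types.ModuleType("demo")
demo.DocumentProcessor = object
demo.__all__ = ["DocumentProcessor", "ConverterFactory"]

print(missing_exports(demo))  # ['ConverterFactory']
```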

packages/markitdown/src/setup.cfg

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+[metadata]
+name = openize-markitdown
+version =25.3.1
+author = Openize
+author_email = packages@openize.com
+description = A document converter for Word, PDF, Excel, and PowerPoint to Markdown.
+long_description = file: README.md
+long_description_content_type = text/markdown
+license = MIT
+license_files = LICENSE
+url = https://github.com/openize-com/openize-markitdown
+project_urls =
+    Bug Tracker = https://github.com/openize-com/openize-markitdown/issues
+classifiers =
+    Programming Language :: Python :: 3
+    Programming Language :: Python :: 3.7
+    Programming Language :: Python :: 3.8
+    Programming Language :: Python :: 3.9
+    Programming Language :: Python :: 3.10
+    License :: OSI Approved :: MIT License
+    Operating System :: OS Independent
+    Topic :: Text Processing :: Markup :: Markdown
+
+[options]
+package_dir =
+    = packages
+packages = find_namespace:
+python_requires = >=3.7
+install_requires =
+    aspose-words>=23.0.0
+    aspose-cells>=23.0.0
+    aspose-slides>=23.0.0
+    openai>=1.0.0
+
+[options.packages.find]
+where = packages
+
+[options.entry_points]
+console_scripts =
+    markitdown = packages.markitdown.src.openize.markitdown.main:main
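The `console_scripts` value is a multi-line list of `name = module:function` strings; the generated `markitdown` command imports the module and calls that function. The sketch below parses a copy of the relevant sections with the standard-library `configparser` to show how the string splits, and compares the declared `version` with the `__version__` string in the new `__init__.py`. The helper `console_scripts` and the constant `INIT_VERSION` are hypothetical names used only for illustration.

```python
import configparser
from typing import Dict, Tuple

# Excerpt copied from the setup.cfg added in this commit.
SETUP_CFG = """
[metadata]
name = openize-markitdown
version =25.3.1

[options.entry_points]
console_scripts =
    markitdown = packages.markitdown.src.openize.markitdown.main:main
"""

INIT_VERSION = "0.1.0"  # __version__ from the new openize/markitdown/__init__.py


def console_scripts(cfg: configparser.ConfigParser) -> Dict[str, Tuple[str, str]]:
    """Map each console-script name to its (module, function) pair."""
    raw = cfg.get("options.entry_points", "console_scripts", fallback="")
    scripts = {}
    for line in raw.strip().splitlines():
        name, _, target = line.partition("=")
        module, _, func = target.strip().partition(":")
        scripts[name.strip()] = (module, func)
    return scripts


cfg = configparser.ConfigParser()
cfg.read_string(SETUP_CFG)
print(console_scripts(cfg))
# {'markitdown': ('packages.markitdown.src.openize.markitdown.main', 'main')}

declared = cfg.get("metadata", "version")
print(declared, INIT_VERSION, declared == INIT_VERSION)  # 25.3.1 0.1.0 False
```

The parsed module path keeps the `packages.markitdown.src` prefix that this commit removes from `main.py`'s own import, and `main.py` itself now sits at `packages/markitdown/src/main.py`, so whether the command resolves depends on the installed package layout.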

setup.py renamed to packages/markitdown/src/setup.py

Lines changed: 3 additions & 3 deletions
@@ -2,9 +2,9 @@
 
 setup(
     name="openize-markitdown",
-    version="25.3.0",
-    author="Umar",
-    author_email="umar320@gmail.com",
+    version =25.3.1
+    author = Openize
+    author_email = packages@openize.com
     description="A document converter for Word, PDF, Excel, and PowerPoint to Markdown.",
     long_description=open("README.md", "r", encoding="utf-8").read(),
     long_description_content_type="text/markdown",
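The three added lines in `setup.py` use the `key = value` form from `setup.cfg`. Inside the `setup(...)` call Python expects quoted keyword arguments separated by commas, so as committed these lines are not valid Python. The sketch below checks both forms with `ast.parse` and shows the same `+` values written as keyword arguments; it only parses strings and never calls `setuptools.setup`.

```python
import ast

# The changed lines as committed, with the setup() call closed so it can be parsed.
COMMITTED = '''
setup(
    name="openize-markitdown",
    version =25.3.1
    author = Openize
    author_email = packages@openize.com
)
'''

# Same values as keyword arguments: quoted strings, comma-separated.
FIXED = '''
setup(
    name="openize-markitdown",
    version="25.3.1",
    author="Openize",
    author_email="packages@openize.com",
)
'''


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


print(parses(COMMITTED), parses(FIXED))  # False True
```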
