The FileX Web API provides a RESTful interface for indexing and searching files using semantic embeddings. The server loads embedding models once at startup and keeps them in memory for fast responses.
python filex-web.py [--host HOST] [--port PORT]Default: http://127.0.0.1:8000 (localhost only, for security)
Options:
--host: Host to bind to (default: 127.0.0.1)--port: Port to bind to (default: 8000)--reload: Enable auto-reload for development
Example:
python filex-web.py --host 127.0.0.1 --port 8000The server will be available at:
- API:
http://127.0.0.1:8000 - Interactive docs:
http://127.0.0.1:8000/docs - Alternative docs:
http://127.0.0.1:8000/redoc
Press Ctrl+C in the terminal where the server is running. The server will clean up resources and shut down gracefully.
GET /api/repositories?scan=true&start_path=...
List all available .filex repositories from the registry. The registry automatically saves discovered repositories and persists them across server restarts.
Query Parameters:
scan(optional): Whether to scan for new repositories (default: false). When true, scans the filesystem and adds any newly discovered repositories to the registry.start_path(optional): Starting path for search when scan=true (default: current directory)
Response:
{
"repositories": [
{
"repo_path": "/path/to/.filex",
"work_tree": "/path/to",
"name": "project_name"
}
],
"count": 1
}Note: Repositories are automatically registered when:
- They are discovered during a scan
- They are accessed via the index, search, or stats endpoints
- A new repository is created during indexing
The registry is stored at ~/.filex/registry.json and persists across server restarts.
POST /api/index
Index files in a repository. Indexing runs in the background, allowing the API to respond immediately.
Request Body:
{
"repo_path": "/path/to/repository", // Optional: if not provided, searches from current dir
"path": "/path/to/file_or_dir", // Optional: if not provided, indexes all files
"force": false, // Force reindexing even if unchanged
"recursive": true, // Recursively index subdirectories
"extensions": [".txt", ".docx"] // Optional: only index these extensions
}Response:
{
"task_id": "/path/to/.filex",
"status": "started",
"message": "Indexing started in background"
}Note: Use the task_id to check progress via /api/progress/{task_id}.
POST /api/search
Search indexed files using natural language queries.
Request Body:
{
"repo_path": "/path/to/repository", // Optional
"query": "types of fruits", // Search query text
"top_k": 10, // Number of results (1-100)
"include_images": false, // Include image data in results
"max_image_size_mb": 1.0 // Max image size to include (0.1-10.0 MB)
}Response:
{
"query": "types of fruits",
"results": [
{
"file_path": "/path/to/file.txt",
"file_name": "file.txt",
"chunk_index": 0,
"chunk_text": "Apples, bananas, oranges...",
"similarity_score": 0.8542,
"image_data": "data:image/png;base64,...", // Only if include_images=true
"image_size_mb": 0.5 // Only if include_images=true
}
],
"count": 10
}POST /api/stats
Get detailed statistics about a repository.
Request Body:
{
"repo_path": "/path/to/repository" // Optional
}Response:
{
"repository_path": "/path/to/.filex",
"work_tree": "/path/to",
"index_statistics": {
"total_indexed_files": 100,
"text_files": 80,
"image_files": 20,
"non_text_files": 20,
"total_chunks": 500
},
"search_statistics": {
"total_chunks": 500,
"unique_files": 100,
"embedding_dimension": 768
},
"storage_statistics": {
"embeddings_bytes": 1536000,
"embeddings_mb": 1.46,
"metadata_bytes": 51200,
"metadata_kb": 50.0,
"total_bytes": 1587200,
"total_mb": 1.51
},
"file_types": {
".txt": {
"count": 50,
"total_size": 1024000,
"total_chunks": 250
},
".png": {
"count": 10,
"total_size": 5120000,
"total_chunks": 10
}
},
"eligible_files_count": 150
}GET /api/progress/{repo_id}
Get indexing progress for a repository. The repo_id is the repository path (same as task_id from index response).
Response:
{
"status": "indexing", // "starting", "indexing", "completed", "error"
"indexed": 50,
"total": 100,
"errors": 0
}DELETE /api/progress/{repo_id}
Clear completed indexing progress for a repository.
Response:
{
"message": "Progress cleared"
}FileX supports multiple .filex repositories. Each repository is independent:
- Use
/api/repositoriesto discover all repositories - Specify
repo_pathin API requests to target a specific repository - Each repository maintains its own index, embeddings, and search data
# List repositories
curl http://127.0.0.1:8000/api/repositories
# Index all files in a repository
curl -X POST http://127.0.0.1:8000/api/index \
-H "Content-Type: application/json" \
-d '{
"repo_path": "/path/to/repository",
"force": false,
"recursive": true
}'
# Check indexing progress
curl http://127.0.0.1:8000/api/progress/%2Fpath%2Fto%2F.filex
# Search files
curl -X POST http://127.0.0.1:8000/api/search \
-H "Content-Type: application/json" \
-d '{
"repo_path": "/path/to/repository",
"query": "types of fruits",
"top_k": 5
}'
# Get statistics
curl -X POST http://127.0.0.1:8000/api/stats \
-H "Content-Type: application/json" \
-d '{
"repo_path": "/path/to/repository"
}'import requests
BASE_URL = "http://127.0.0.1:8000"
# List repositories
repos = requests.get(f"{BASE_URL}/api/repositories").json()
print(f"Found {repos['count']} repositories")
# Index files
response = requests.post(
f"{BASE_URL}/api/index",
json={
"repo_path": "/path/to/repository",
"force": False,
"recursive": True
}
)
task_id = response.json()["task_id"]
# Check progress
progress = requests.get(f"{BASE_URL}/api/progress/{task_id}").json()
print(f"Indexed: {progress['indexed']}/{progress['total']}")
# Search
results = requests.post(
f"{BASE_URL}/api/search",
json={
"repo_path": "/path/to/repository",
"query": "types of fruits",
"top_k": 10
}
).json()
print(f"Found {results['count']} results")
# Get statistics
stats = requests.post(
f"{BASE_URL}/api/stats",
json={"repo_path": "/path/to/repository"}
).json()
print(f"Total files: {stats['index_statistics']['total_indexed_files']}")- Models are loaded once at server startup (or on first request)
- Models remain in memory for the lifetime of the server
- This enables fast API responses without model loading overhead
- Indexing runs in background threads (not blocking API requests)
- Progress is tracked and can be queried via
/api/progress/{repo_id} - Multiple indexing tasks can run concurrently for different repositories
- Image data in search results is optional and memory-constrained
max_image_size_mbparameter limits image data size- Large images are excluded from results to prevent memory issues
- The server binds to
127.0.0.1by default (localhost only) - This prevents external access to your file system
- For production use, consider adding authentication and HTTPS
- Model Loading: 5-10 seconds on first startup (models cached after first download)
- Indexing: ~1-5 seconds per file (depends on file size and model)
- Search: <1 second for repositories with thousands of chunks
- API Response: <100ms for most endpoints (after models are loaded)