-
Notifications
You must be signed in to change notification settings - Fork 10
feat: Add deployment scripts for source code startup and multi-instance support #37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
… ragflow_server in dcoker and deploy.sh # Conflicts: # .gitignore # scripts/README.md # scripts/deploy.sh
|
keyang.lk seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account. You have signed the CLA already but the status is still pending? Let us recheck it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR introduces comprehensive deployment and multi-instance support for RAGFlow by adding a new deployment script (deploy.sh) and enhancing Docker configurations to support running multiple ragflow_server instances for better scalability and load balancing.
Key Changes:
- New 1490-line deployment script with multi-instance support, component management, and port conflict detection
- Enhanced secret key management to ensure consistent JWT signing across multiple server instances
- Updated Docker entrypoint and nginx configuration for load balancing across multiple server instances
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/deploy.sh | Comprehensive deployment script (1490 lines) with multi-instance ragflow_server support, component toggles, port management, and service orchestration |
| scripts/README.md | Documentation for deployment script usage, configuration options, and multi-instance setup examples |
| docker/entrypoint.sh | Enhanced with multi-instance support, secret key management, and dynamic config generation |
| docker/nginx/ragflow.conf | Updated to use upstream load balancing with dynamically generated server lists |
| docker/service_conf.yaml.template | Added template variable support for ports and secret key |
| docker/docker-compose.yml | Added PowerRAG port mapping and notes for extra instance ports |
| docker/docker-compose-self-hosted-ob.yml | Made Gotenberg service optional via profiles |
| docker/.env.example | Added multi-instance configuration variables |
| mcp/server/server.py | Migrated from print statements to proper logging with file output |
| common/log_utils.py | Modified to support multiple log file basenames for multi-instance scenarios |
| common/constants.py | Made SERVICE_CONF configurable via RAGFLOW_SERVICE_CONF environment variable |
| api/ragflow_server.py | Added support for per-instance log basename configuration |
| api/utils/api_utils.py | Improved logging exception messages |
| .gitignore | Added generated files from deployment script |
| import os | ||
| import logging | ||
|
|
||
| from common.log_utils import init_root_logger | ||
| from plugin import GlobalPluginManager | ||
|
|
||
| import logging | ||
| import os | ||
| # Initialize logging as early as possible so show_configs() (INFO logs) won't be dropped. | ||
| # deploy.sh sets RAGFLOW_LOG_BASENAME=ragflow_server_<port> for multi-instance runs. | ||
| init_root_logger(os.environ.get("RAGFLOW_LOG_BASENAME", "ragflow_server")) | ||
| import signal | ||
| import sys |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import statement moved after init_root_logger() call but before the imports it depends on. Lines 30-31 import signal and sys, which come after the init_root_logger() call on line 29. While this works, it creates an unusual import order. If init_root_logger() internally depends on these modules or if there are circular dependencies, this could cause issues. Consider keeping all imports together at the top of the file, or add a comment explaining why logging must be initialized before other imports.
| if [[ "${ENABLE_WEBSERVER}" -eq 1 ]]; then | ||
| echo "Starting ragflow_server..." | ||
| start_ragflow_servers | ||
|
|
||
| # nginx upstream include files are generated by start_ragflow_servers; | ||
| # start nginx after generation so it picks them up (no reload needed). | ||
| echo "Starting nginx..." | ||
| /usr/sbin/nginx | ||
|
|
||
| echo "Starting ragflow_server..." | ||
| while true; do | ||
| "$PY" api/ragflow_server.py ${INIT_SUPERUSER_ARGS} & | ||
| wait; | ||
| sleep 1; | ||
| done & | ||
| fi |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing start logic for PowerRAG server. ENABLE_POWERRAG_SERVER is defined and set to 1 by default on line 37, and the start_powerrag_server() function is defined on lines 264-270, but the visible component startup section (lines 342-350) doesn't include a check to start the PowerRAG server. Based on the pattern for MCP server and admin server, there should be a corresponding if block checking ENABLE_POWERRAG_SERVER and calling start_powerrag_server.
|
|
||
| ```bash | ||
| ./scripts/deploy.sh start \ | ||
| --enable-webserver |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing line continuation backslash. Line 106 ends with "--enable-webserver" but there's no backslash before the newline, causing the command to be incomplete. This will result in a syntax error when users try to copy-paste this command.
| --enable-webserver | |
| --enable-webserver \ |
| # - set per-instance logfile basename so logs are split by port | ||
| RAGFLOW_SERVICE_CONF="${conf_name}" \ | ||
| RAGFLOW_LOG_BASENAME="ragflow_server_${port}" \ | ||
| "$PY" api/ragflow_server.py ${INIT_SUPERUSER_ARGS} & |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable INIT_SUPERUSER_ARGS is used on line 303 but is not defined anywhere in the visible changes. If this variable was previously defined and is still needed, it should be initialized with a default value (e.g., INIT_SUPERUSER_ARGS="${INIT_SUPERUSER_ARGS:-}") to prevent potential undefined variable errors when running with "set -u".
|
|
||
| def get_data_error_result(code=RetCode.DATA_ERROR, message="Sorry! Data missing!"): | ||
| logging.exception(Exception(message)) | ||
| logging.exception(f"Data error: {message}") |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logging.exception() call here will log an exception that wasn't caught - it's not inside an exception handler. Use logging.error() instead of logging.exception() when you're not in an exception context. logging.exception() should only be used within an except block to include the traceback.
| logging.exception(f"Data error: {message}") | |
| logging.error(f"Data error: {message}") |
| #### 启动 task executors(range 模式) | ||
|
|
||
| ```bash | ||
| ./scripts/deploy.sh start --enable-taskexecutor\ |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing line continuation backslash. Line 121 ends with "--enable-taskexecutor" but there's no backslash before the newline, causing the command to be incomplete. This will result in a syntax error when users try to copy-paste this command.
| ./scripts/deploy.sh start --enable-taskexecutor\ | |
| ./scripts/deploy.sh start --enable-taskexecutor \ |
| ENABLE_DATASYNC=${ENABLE_DATASYNC:-1} | ||
| ENABLE_MCP_SERVER=${ENABLE_MCP_SERVER:-0} | ||
| ENABLE_ADMIN_SERVER=${ENABLE_ADMIN_SERVER:-0} # Default close admin server | ||
| ENABLE_POWERRAG_SERVER=${ENABLE_POWERRAG_SERVER:-1} # Default close PowerRAG server |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default value for ENABLE_POWERRAG_SERVER is inconsistent. Line 37 sets it to 1 (enabled by default), but this contradicts the comment on line 36 which says "Default close PowerRAG server" and the pattern established by other optional components (MCP and ADMIN_SERVER) which default to 0. The default should likely be 0 to match the comment and the "optional component" pattern.
| ENABLE_POWERRAG_SERVER=${ENABLE_POWERRAG_SERVER:-1} # Default close PowerRAG server | |
| ENABLE_POWERRAG_SERVER=${ENABLE_POWERRAG_SERVER:-0} # Default close PowerRAG server |
|
|
||
| if [[ "${idx}" -eq 0 ]]; then | ||
| port="${SVR_HTTP_PORT}" | ||
| conf_name="local.service_conf.yaml" |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inconsistent configuration file naming between deploy.sh and entrypoint.sh. In deploy.sh (line 647), the main instance uses GLOBAL_SERVICE_CONF (defaults to "service_conf.yaml"), while in entrypoint.sh (lines 282, 319), it uses "local.service_conf.yaml". This inconsistency could cause configuration issues when switching between deployment modes. Consider using a consistent naming scheme or documenting this intentional difference.
| conf_name="local.service_conf.yaml" | |
| conf_name="service_conf.yaml" |
| ) | ||
|
|
||
| from peewee import OperationalError | ||
| from werkzeug.exceptions import NotFound |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Import of 'NotFound' is not used.
| from werkzeug.exceptions import NotFound |
| rm -f "${out_file}" | ||
| while IFS= read -r line || [[ -n "$line" ]]; do | ||
| # shellcheck disable=SC2034 | ||
| SVR_HTTP_PORT="${ragflow_port}" ADMIN_SVR_HTTP_PORT="${admin_port}" \ | ||
| eval "echo \"$line\"" >> "${out_file}" | ||
| done < "${TEMPLATE_FILE}" |
Copilot
AI
Jan 9, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The render_service_conf function uses eval "echo \"$line\"" to expand environment variables from service_conf.yaml.template, which introduces a shell code injection risk. If an attacker can influence any of the environment variables used in the template (e.g., via orchestrator/UI that lets them set REDIS_PASSWORD, RAGFLOW_HOST, etc.), they can embed command substitutions like $(...) so that when eval re-parses the expanded line, arbitrary commands execute inside the container with the entrypoint's privileges. Replace the eval-based templating with a non-evaluating mechanism (e.g., envsubst or a simple variable substitution script that does not invoke the shell parser on untrusted data) so that environment values are treated as data, not executable shell code.
| rm -f "${out_file}" | |
| while IFS= read -r line || [[ -n "$line" ]]; do | |
| # shellcheck disable=SC2034 | |
| SVR_HTTP_PORT="${ragflow_port}" ADMIN_SVR_HTTP_PORT="${admin_port}" \ | |
| eval "echo \"$line\"" >> "${out_file}" | |
| done < "${TEMPLATE_FILE}" | |
| if ! command -v envsubst >/dev/null 2>&1; then | |
| echo "ERROR: envsubst is required to render service_conf from template." >&2 | |
| return 1 | |
| fi | |
| rm -f "${out_file}" | |
| SVR_HTTP_PORT="${ragflow_port}" ADMIN_SVR_HTTP_PORT="${admin_port}" \ | |
| envsubst < "${TEMPLATE_FILE}" > "${out_file}" |
Summary
Key Features
deploy.shscript for easy setup and deployment from source coderagflow_serverinstances in Docker and deployment scenariosChanges
scripts/deploy.sh- Comprehensive deployment script (1490+ lines)docker/entrypoint.shwith multi-instance supportdocker/docker-compose.ymlanddocker-compose-self-hosted-ob.ymldocker/nginx/ragflow.conffor load balancingdocker/.env.exampleSolution Description