Schema Transformer: Migrating MongoDB to Azure DocumentDB

Schema Transformer is a Python script that analyzes MongoDB collection schemas and transforms them into a structure optimized for Azure DocumentDB, preserving compatibility and improving query performance on the target.

With this tool, you can generate index and sharding recommendations tailored specifically to your workload, making your migration smoother and more efficient.

Supported Versions

The tool supports the following versions:

  • Source: MongoDB (version 4.0 and above)
  • Target: Azure DocumentDB (all versions)

How to Run the Script

Prerequisites

Before running the assessment, ensure that the client machine meets the following requirements:

  • Access to both the source MongoDB endpoint and the target Azure DocumentDB endpoint, over a private or public network, via the specified IP address or hostname.
  • Python (version 3.10 or above) must be installed.
  • PyMongo library must be installed (pip install pymongo).
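If you want to sanity-check these prerequisites before running the tool, a small helper like the following can be used. This is an illustrative sketch, not part of the script; the function name `check_prerequisites` is hypothetical.

```python
import importlib.util
import sys

def check_prerequisites(min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required, "
                        f"found {sys.version_info[0]}.{sys.version_info[1]}")
    # find_spec reports whether a package can be imported without importing it
    if importlib.util.find_spec("pymongo") is None:
        problems.append("PyMongo not installed (run: pip install pymongo)")
    return problems

if __name__ == "__main__":
    issues = check_prerequisites()
    print("OK" if not issues else "\n".join(issues))
```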

Steps to Run the Assessment

  1. Locate the SchemaMigration directory in the repository.

  2. Open a command prompt and navigate to that directory.

  3. Create a JSON file to define the collections to be migrated. Each section in the configuration defines the schema migration options for a set of collections (specify * to refer to all collections in the cluster, or db.* to refer to all collections within a database). Refer to the Configuration Options section below for more details. Here are some examples:

    1. To specify all collections present in the cluster

      {
          "sections": [
              {
                  "include": [
                      "*"
                  ],
                  "exclude": [],
                  "migrate_shard_key": "false",
                  "drop_if_exists": "true",
                  "optimize_compound_indexes": "true"
              }
          ]
      }
    2. To specify all collections except a particular database

      {
          "sections": [
              {
                  "include": [
                      "*"
                  ],
                  "exclude": [
                      "db1.*"
                  ],
                  "migrate_shard_key": "false",
                  "drop_if_exists": "true",
                  "optimize_compound_indexes": "true"
              }
          ]
      }
    3. To specify all collections except a few

      {
          "sections": [
              {
                  "include": [
                      "*"
                  ],
                  "exclude": [
                      "db1.coll1",
                      "db2.coll2"
                  ],
                  "migrate_shard_key": "false",
                  "drop_if_exists": "true",
                  "optimize_compound_indexes": "true"
              }
          ]
      }
    4. To migrate specific collections

      {
          "sections": [
              {
                  "include": [
                      "db1.coll1",
                      "db2.coll2"
                  ],
                  "migrate_shard_key": "false",
                  "drop_if_exists": "true",
                  "optimize_compound_indexes": "true"
              }
          ]
      }
    5. To migrate different sets of collections with different configuration options

      {
          "sections": [
              {
                  "include": [
                      "*"
                  ],
                  "exclude": [
                      "db1.coll1",
                      "db2.coll2"
                  ],
                  "migrate_shard_key": "false",
                  "drop_if_exists": "true",
                  "optimize_compound_indexes": "true"
              },
              {
                  "include": [
                      "db1.coll1",
                      "db2.coll2"
                  ],
                  "migrate_shard_key": "true",
                  "drop_if_exists": "true",
                  "optimize_compound_indexes": "true"
              }
          ]
      }
    6. To colocate collections with a reference collection

      {
          "sections": [
              {
                  "include": [
                      "db1.coll2",
                      "db1.coll3"
                  ],
                  "migrate_shard_key": "false",
                  "drop_if_exists": "true",
                  "optimize_compound_indexes": "true",
                  "co_locate_with": "coll1"
              }
          ]
      }

      Note: The collection specified in co_locate_with must already exist in the same database as the collection being processed. If the reference collection is not found, the script will fail with an error.

  4. Run the following command, providing the full path of the JSON file created in the previous step:

    python main.py --config-file <path_to_your_json_file> --source-uri <source_mongo_connection_string> --dest-uri <destination_documentdb_connection_string>

    Optional: Enable verbose mode for detailed logging of the migration process:

    python main.py --config-file <path_to_your_json_file> --source-uri <source_mongo_connection_string> --dest-uri <destination_documentdb_connection_string> --verbose

    The --verbose flag provides detailed output showing:

    • Connection status to source and destination databases
    • Configuration parsing details (include/exclude patterns, collections found)
    • Step-by-step migration progress (drop, create, colocation, shard keys, indexes)
    • Detailed decision-making logic (e.g., why certain indexes are optimized or skipped)
    • Success/failure status for each operation

    Optional: Run collection migrations in parallel (recommended for large collection counts):

    python main.py --config-file <path_to_your_json_file> --source-uri <source_mongo_connection_string> --dest-uri <destination_documentdb_connection_string> --workers 16

    The --workers flag controls how many collections are processed concurrently. Use 1 for sequential behavior. Default is 5.
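As a rough sketch of how such worker-pool parallelism can work (illustrative only; `migrate_all` and `migrate_collection` are hypothetical names, not the script's actual functions):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def migrate_collection(namespace):
    # Placeholder for the per-collection work (drop/create, colocation,
    # shard key, index migration).
    return namespace, "ok"

def migrate_all(namespaces, workers=5):
    """Process collections concurrently; workers=1 degenerates to sequential."""
    results = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(migrate_collection, ns): ns for ns in namespaces}
        for fut in as_completed(futures):
            ns, status = fut.result()
            results[ns] = status
    return results

print(migrate_all(["db1.coll1", "db2.coll2"], workers=2))
```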

This process will generate an Azure DocumentDB-optimized schema with index and sharding recommendations based on your workload.
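The wildcard semantics used in the configuration file (*, db.*, and db.coll, as described in step 3 above) can be sketched as follows; `matches` and `resolve_section` are hypothetical helpers, not the script's actual code:

```python
def matches(pattern, namespace):
    """True if a pattern ('*', 'db.*', or 'db.coll') covers a 'db.coll' namespace."""
    if pattern == "*":
        return True
    db, _, _coll = namespace.partition(".")
    pdb, _, pcoll = pattern.partition(".")
    if pcoll == "*":
        return db == pdb          # 'db1.*' covers every collection in db1
    return pattern == namespace   # exact 'db.coll' match

def resolve_section(section, all_namespaces):
    """Namespaces a section applies to: matched by include and not by exclude."""
    return [ns for ns in all_namespaces
            if any(matches(p, ns) for p in section.get("include", []))
            and not any(matches(p, ns) for p in section.get("exclude", []))]

section = {"include": ["*"], "exclude": ["db1.*"]}
print(resolve_section(section, ["db1.coll1", "db2.coll2"]))  # ['db2.coll2']
```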

Configuration Options

| Option | Description |
| --- | --- |
| `migrate_shard_key` | Determines whether the existing shard key definition should be migrated. If set to `true`, the shard key is retained; if `false`, the target collection remains unsharded. Collections that are unsharded in the source remain unsharded in the target, regardless of this setting. Default: `false`. |
| `drop_if_exists` | Specifies whether collections with the same name in the target should be dropped and recreated. If `true`, existing collections are removed before migration; if `false`, they remain unchanged. Default: `false`. |
| `optimize_compound_indexes` | Controls whether compound indexes should be optimized. If `true`, the script identifies redundant indexes and excludes them from migration; if `false`, all indexes are migrated as-is. Default: `false`. |
| `co_locate_with` | Name of a reference collection from the same database to colocate with. When specified, the target collection is colocated with the reference collection for improved query performance. The reference collection must exist in the same database before colocation is applied, or an error is thrown. Useful for optimizing queries that join or access related collections together. Default: none. |
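One common way to detect redundant compound indexes is prefix analysis: an index whose key list is a strict prefix of another index's key list can be served by the longer index. The sketch below illustrates that idea; it is an assumption about what the optimization could look like, not the script's actual logic:

```python
def is_prefix(shorter, longer):
    """True if index key list `shorter` is a strict prefix of `longer`."""
    return len(shorter) < len(longer) and longer[:len(shorter)] == shorter

def drop_redundant(indexes):
    """Keep only indexes whose key list is not a prefix of another index's keys."""
    return [ix for ix in indexes
            if not any(is_prefix(ix, other) for other in indexes)]

# {a} and {a, b} are covered by the compound index on {a, b, c}
indexes = [["a"], ["a", "b"], ["a", "b", "c"], ["x"]]
print(drop_redundant(indexes))  # [['a', 'b', 'c'], ['x']]
```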

Command Line Options

| Option | Required | Description |
| --- | --- | --- |
| `--config-file` | Yes | Path to the JSON configuration file that defines collections to migrate and their migration settings. |
| `--source-uri` | Yes | MongoDB connection string for the source database (e.g., `mongodb://localhost:27017`). |
| `--dest-uri` | Yes | MongoDB/DocumentDB connection string for the destination database. |
| `--workers` | No | Number of worker threads used to process collections in parallel. Default: 5. Use 1 for sequential behavior; increase this value for faster migrations with many collections. |
| `--verbose` | No | Enable verbose output mode. When set, displays detailed logging of all operations, including connection status, configuration parsing, collection enumeration, and step-by-step migration progress. Useful for debugging and monitoring long-running migrations. |
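A minimal argument parser mirroring the documented flags might look like this. It is an illustrative sketch using Python's argparse; main.py's actual implementation may differ:

```python
import argparse

def build_parser():
    """Parser for the flags documented above."""
    p = argparse.ArgumentParser(description="Schema Transformer")
    p.add_argument("--config-file", required=True,
                   help="Path to the JSON configuration file")
    p.add_argument("--source-uri", required=True,
                   help="Source MongoDB connection string")
    p.add_argument("--dest-uri", required=True,
                   help="Destination DocumentDB connection string")
    p.add_argument("--workers", type=int, default=5,
                   help="Worker threads; 1 means sequential")
    p.add_argument("--verbose", action="store_true",
                   help="Enable detailed logging")
    return p

args = build_parser().parse_args(
    ["--config-file", "cfg.json",
     "--source-uri", "mongodb://localhost:27017",
     "--dest-uri", "mongodb://target:27017",
     "--workers", "16"])
print(args.workers, args.verbose)  # 16 False
```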