@xaviedoanhduy xaviedoanhduy commented Apr 17, 2025

Purpose

This migration script allows moving data from Enterprise documents* (EE) to the OCA dms* (CE) modules.

The goal is to preserve:

  • Folder hierarchy
  • Access rights (read/write groups)
  • Tags and tag categories
  • Files and their attachments

Approach

The migration is implemented as a post-init hook and works directly at the SQL level to avoid dependencies on EE models. This ensures:

  • Compatibility with CE environments (no EE models required).
  • Preservation of hierarchy, security, and data.

The migration flow has three steps:

1. Tags migration

  • documents.facet → dms.category
  • documents.tag → dms.tag
  • Prevents duplicates by checking existing categories and tags.
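
The duplicate check can be illustrated with a small, self-contained sketch. It is not the module's code: the function name `build_tag_mapping`, the dictionary shapes, and the case-insensitive name comparison are assumptions made for illustration only.

```python
def build_tag_mapping(source_tags, existing_tags):
    """Map each source tag id to a target tag id, reusing existing tags.

    Both arguments are lists of dicts with "id", "category" and "name" keys.
    "category" is assumed to already hold the target category id.
    Returns (mapping, created): mapping is {source_id: target_id} and
    created lists the tags that had no match and would need to be inserted.
    """
    # Index existing tags by (category, normalized name) for O(1) lookups.
    by_key = {
        (tag["category"], tag["name"].strip().lower()): tag["id"]
        for tag in existing_tags
    }
    # Hypothetical id allocation; a real database would assign ids itself.
    next_id = max((tag["id"] for tag in existing_tags), default=0) + 1

    mapping, created = {}, []
    for tag in source_tags:
        key = (tag["category"], tag["name"].strip().lower())
        if key not in by_key:
            by_key[key] = next_id
            created.append(
                {"id": next_id, "category": tag["category"], "name": tag["name"]}
            )
            next_id += 1
        mapping[tag["id"]] = by_key[key]
    return mapping, created
```

The returned mapping is what later steps would consult when copying folder-level and file-level tags.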

2. Folders migration

  • documents.folder → dms.directory
  • Maintains parent/child relationships.
  • Root folders receive a default dms.storage.
  • Access rights migrated via:
    • documents_folder_res_groups_rel → dms.access.group (Write groups)
    • documents_folder_read_groups_rel → dms.access.group (Read groups)
  • Folder-level tags are also migrated via facet → tag mapping.
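
Maintaining parent/child relationships usually means creating parents before their children. One possible way to get such an order is a breadth-first walk from the root folders, sketched below. The function name `order_parents_first` and the `(id, parent_id)` tuple shape are hypothetical, and the PR may order folders differently.

```python
from collections import deque


def order_parents_first(folders):
    """Return folder ids ordered so that every parent precedes its children.

    folders is a list of (folder_id, parent_id) tuples; parent_id is None
    for root folders. A folder whose parent is missing from the input is
    treated as a root (an assumption of this sketch).
    """
    known_ids = {folder_id for folder_id, _ in folders}
    children = {}
    roots = []
    for folder_id, parent_id in folders:
        if parent_id is None or parent_id not in known_ids:
            roots.append(folder_id)
        else:
            children.setdefault(parent_id, []).append(folder_id)

    ordered = []
    queue = deque(roots)
    while queue:
        folder_id = queue.popleft()
        ordered.append(folder_id)
        # Children are queued only after their parent has been emitted.
        queue.extend(children.get(folder_id, []))
    return ordered
```

Folders caught in a parent cycle are never reached from a root and are therefore left out of the result, which a caller may want to detect by comparing lengths.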

3. Files migration

  • documents.document (binary only) → dms.file
  • Preserves folder assignment and file metadata.
  • Keeps linked attachments by updating ir.attachment.res_model/res_id to the new dms.file.
  • Migrates file tags using the tag mapping.
  • Uses batch processing (1000 docs per batch) for scalability.
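
As a rough illustration of batched relinking, the sketch below updates `ir_attachment.res_model`/`res_id` in chunks. It uses an in-memory-friendly `sqlite3` connection as a stand-in for the PostgreSQL cursor a real hook would use; the helper names `iter_batches` and `relink_attachments` and the `id_map` argument are assumptions, not the module's API.

```python
import sqlite3


def iter_batches(items, batch_size=1000):
    """Yield consecutive slices of items, each at most batch_size long."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def relink_attachments(conn, id_map, batch_size=1000):
    """Point attachments of migrated documents at their new dms.file rows.

    id_map maps old documents.document ids to new dms.file ids.
    Returns the number of attachment rows that were updated.
    """
    pairs = sorted(id_map.items())
    updated = 0
    for batch in iter_batches(pairs, batch_size):
        cursor = conn.executemany(
            "UPDATE ir_attachment SET res_model = 'dms.file', res_id = ? "
            "WHERE res_model = 'documents.document' AND res_id = ?",
            [(new_id, old_id) for old_id, new_id in batch],
        )
        updated += cursor.rowcount
        # Committing per batch keeps each transaction small.
        conn.commit()
    return updated
```

Because the `WHERE` clause filters on the old `res_model`, a row that has already been relinked cannot be matched again by a later pair, even if old and new ids overlap.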

Data Mapping Details

| EE Model / Table | DMS Model / Table | Notes |
|---|---|---|
| documents.facet | dms.category | Folder tag categories → Categories |
| documents.tag | dms.tag | Tag names + facet → Tags, with deduplication |
| documents.folder | dms.directory | Folder hierarchy → Directory hierarchy, sequence → color (fallback) |
| documents_folder_res_groups_rel | dms.access.group | Write group permissions (create/write/unlink) |
| documents_folder_read_groups_rel | dms.access.group | Read-only group permissions |
| documents.document (binary) | dms.file | Binary documents only; inactive or non-binary documents are skipped |
| documents_document_tag_rel | dms_file_tag_rel | Many2many doc ↔ tags mapping |
| ir.attachment (linked to document) | ir.attachment (relinked) | Updated to point to new dms.file |
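
The "sequence → color (fallback)" note in the folder mapping could work roughly as follows. This is only a sketch: the function name `pick_color` is hypothetical, and treating 1 to 11 as the valid color range is an assumption inferred from the `randint(1, 11)` fallback described in this PR.

```python
from random import randint


def pick_color(sequence):
    """Reuse the folder sequence as a color index when it is in range.

    Any other value (None, out of range, not an integer) falls back to a
    random color index between 1 and 11 inclusive.
    """
    is_int = isinstance(sequence, int) and not isinstance(sequence, bool)
    if is_int and 1 <= sequence <= 11:
        return sequence
    return randint(1, 11)
```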

Extra Notes

  • Colors default to randint(1, 11) when no valid sequence color is available.

  • A default database storage (dms.storage) is created if none exists.

  • Logging is verbose:

    • Created directories
    • Assigned groups
    • Migrated tags and files
  • Errors are logged per record without blocking the migration.

  • When documents.document content is stored via attachments (ir.attachment), the choice of storage type matters:

    • With dms.storage using save_type="database" (the default), the content is recomputed and stored in dms.file. This can put additional load on the database.

    • With save_type="attachment", the storage requires directories that act as references, including the res_model and res_id of the record. Files within these directories are linked to these values. For example, if the Contact directory references the res.partner model, it creates a directory named Admin that stores res_model="res.partner" and res_id="3". Files attached to this partner are then stored in the corresponding Admin directory. Referencing existing attachments would therefore require generating a separate directory for each attachment.

    • Given these options, I chose save_type="filestore", even though documentation for this type is limited. It seems to be the most appropriate choice in the current context.
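
The "errors are logged per record without blocking the migration" behavior noted in this list can be sketched in plain Python as below. The names `migrate_records` and `migrate_one` are hypothetical. Against PostgreSQL, a failed statement aborts the surrounding transaction, so a real hook would most likely also need a savepoint around each record, which this sketch leaves out.

```python
import logging

_logger = logging.getLogger(__name__)


def migrate_records(records, migrate_one):
    """Apply migrate_one to every record, logging failures and continuing.

    records is an iterable of dicts with an "id" key; migrate_one is any
    callable that raises on failure. Returns (migrated_count, failed_ids).
    """
    migrated = 0
    failed_ids = []
    for record in records:
        try:
            migrate_one(record)
            migrated += 1
        except Exception:
            # Log the full traceback, then move on to the next record.
            _logger.exception("Failed to migrate record %s", record.get("id"))
            failed_ids.append(record.get("id"))
    _logger.info("Migrated %d records, %d failed.", migrated, len(failed_ids))
    return migrated, failed_ids
```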


Example Workflow

  1. Run module installation with the migration hook.
  2. Check that:
    • Categories and tags exist in dms.category and dms.tag.
    • Directories are created with correct hierarchy.
    • Access groups are properly assigned.
    • Files are migrated and linked to attachments.
    • Tags are assigned to both directories and files.

Validation

  • Before migration: Ensure EE tables (documents_*) exist and contain data.
  • After migration: Validate:
    • Number of categories, tags, and directories.
    • Randomly sample files to confirm attachments and tags.
    • Check group permissions are correctly migrated.
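
A quick way to gather the counts listed above is to compare row totals between source and target tables. The sketch below uses `sqlite3` as a stand-in for the real PostgreSQL connection; the table names are derived from the model names in the mapping table (dots replaced by underscores), and `count_rows`/`compare_counts` are hypothetical helpers. Totals are not expected to match exactly, since tags are deduplicated and non-binary documents are skipped.

```python
import sqlite3

# (source EE table, target dms table) pairs to compare.
TABLE_PAIRS = [
    ("documents_facet", "dms_category"),
    ("documents_tag", "dms_tag"),
    ("documents_folder", "dms_directory"),
    ("documents_document", "dms_file"),
]
_ALLOWED_TABLES = {name for pair in TABLE_PAIRS for name in pair}


def count_rows(conn, table):
    """Return the number of rows in a whitelisted table."""
    # Table names cannot be bound as query parameters, so only names from
    # the whitelist above are ever interpolated into the statement.
    if table not in _ALLOWED_TABLES:
        raise ValueError(f"Unexpected table name: {table}")
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]


def compare_counts(conn):
    """Return {(source, target): (source_count, target_count)}."""
    return {
        (source, target): (count_rows(conn, source), count_rows(conn, target))
        for source, target in TABLE_PAIRS
    }
```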

Migration should complete successfully and log a summary:

Successfully migrated X tags.
Successfully migrated Y folders (Z new root directories).
Successfully migrated N files.
Migration completed successfully.

@kobros-tech
Contributor

@xaviedoanhduy

Could you write a description of how to use it, or how to configure it if necessary, so that we can review it easily?

@nilshamerlinck

hi @kobros-tech you will receive an invitation to join the slack workspace where you'll find more information, see you there ;-)

@kobros-tech
Contributor

kobros-tech commented May 29, 2025

@nilshamerlinck
Thanks, I now have a Slack account!

Contributor

@kobros-tech kobros-tech left a comment

Implementation review only. I would recommend adding test cases.

@kobros-tech
Contributor

@xaviedoanhduy

What happens to a file that belongs to a specific partner? After migration, will the same file be accessible only by that partner?

@kobros-tech
Contributor

kobros-tech commented Jun 1, 2025

All right, I will ping other people to describe a real-life scenario, and then we can apply it.

@wlin-kencove

@xaviedoanhduy
Author

xaviedoanhduy commented Jul 30, 2025

@kobros-tech,

As far as I recall, dms.file/dms.directory are shared at the user level (res.users), while documents.document/documents.folder are assigned to contacts (res.partner).

I’d appreciate your thoughts if you have any ideas on how to map these fields between the two models.

Sorry for my omission, I'd like to clarify some points:

  • To your question, the answer seems to be no. Data from EE documents is not shared with individual users (a document can be owned by only one partner); access is granted only through share links or permission groups. For example, suppose the Internal folder has Read Groups (or Write Groups) containing the Documents/Administrator permission group, and the user named Demo is not in this group. Demo cannot see that folder in the backend view, but he has full read and write permissions if he holds a link generated from Share links.
  • In EE documents, documents are mostly not flexibly accessible to portal users. Even if a portal user is the owner of a document, it is only visible through the backend in the Contacts app. Portal users (and even users who are not logged in) only need to know a single link created from Share links (the documents.share model), and that is the only way they can access these documents.
  • In CE dms, access to these sensitive records is controlled by a separate group model (dms.access.group), which can let public users see these documents if their group (res.groups, for example the portal group) or the users themselves (res.users) are included.
  • In the current module, I create dms.access.group records based on the name of the document and its Read Groups (or Write Groups). As a result, only users in those permission groups can access it, and those permission groups certainly do not contain portal users.
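
To make the last point concrete, here is a hedged sketch of how read and write groups might be turned into values for dms.access.group records. The field names `perm_create`, `perm_write`, `perm_unlink` and `group_ids`, the naming scheme, and the function `build_access_group_vals` are assumptions for illustration and may not match the module.

```python
def build_access_group_vals(name, read_group_ids, write_group_ids):
    """Build one values dict per permission level that has groups.

    Write groups get create/write/unlink rights; read groups get none of
    them, so they stay read-only. Group ids are deduplicated and sorted.
    """
    vals_list = []
    if write_group_ids:
        vals_list.append({
            "name": f"{name} (write)",  # hypothetical naming scheme
            "perm_create": True,
            "perm_write": True,
            "perm_unlink": True,
            "group_ids": sorted(set(write_group_ids)),
        })
    if read_group_ids:
        vals_list.append({
            "name": f"{name} (read)",  # hypothetical naming scheme
            "perm_create": False,
            "perm_write": False,
            "perm_unlink": False,
            "group_ids": sorted(set(read_group_ids)),
        })
    return vals_list
```

Since only res.groups ids are copied, a portal user gains access only if one of the source groups already included portal users.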

@kobros-tech
Contributor

yes, it is much better.

once you are done you can ask me to review, good luck!

@xaviedoanhduy xaviedoanhduy changed the title from "[16.0][ADD] dms_import" to "[16.0][ADD] dms_import: Migration data from documents EE to dms CE" Sep 25, 2025
@xaviedoanhduy xaviedoanhduy force-pushed the 16.0-add-dms_import branch 3 times, most recently from de8ccd8 to d98f346 Compare September 29, 2025 10:19
@xaviedoanhduy xaviedoanhduy force-pushed the 16.0-add-dms_import branch 2 times, most recently from 9844447 to e06570c Compare October 16, 2025 09:36
@andreampiovesana

nice to have

@xaviedoanhduy xaviedoanhduy force-pushed the 16.0-add-dms_import branch 7 times, most recently from 564fae4 to 0c05e18 Compare October 23, 2025 09:53
@xaviedoanhduy xaviedoanhduy force-pushed the 16.0-add-dms_import branch 4 times, most recently from 578bb77 to 2cdf519 Compare October 29, 2025 05:24
[IMP] dms_import: move pre_init_hook to post_init_hook

[REF][FIX] dms_import: Use new syntax, avoid sql injection and fix group creation bug

dms_import: also migrate archived data

[REF] dms_import: avoid duplication by forcing tags, categories, and default group permissions

[IMP] dms_import: improve performance, reduce batch size, and bypass heavy compute fields

[FIX] dms_import: handle duplicate name

[FIX] dms_import: standardize file names

[IMP] dms_import: improve unique_name_new to avoid long name

[FIX] dms_import: avoid check size when uploading files