perf: Optimize user creation and deletion#22936
Conversation
…-core into fix-usergroup-member-loading
618358a to
bec4474
Compare
…-core into fix-usergroup-member-loading
jbee
left a comment
There was a problem hiding this comment.
Looks like a step in a good direction to me with having service methods that represent a user level action which take UID inputs and then mainly use SQL to perform the operation. This will create a mix of using SQL and manipulating the object model which we know has its risk. I hope we can eliminate more of the object model manipulation code in future improvements.
There was a problem hiding this comment.
Overall, the outcome for this scenario seems to be much improved, nice job 👍
Some comments:
- having a performance test included in the new performance test suite would be great to have. It would be nice to build this suite up, especially when fixing/improving specific performance issues.
- Introducing the use of
session().evict(outside of theCacheInvalidationListener) is something new in the system. This feels like it has the potential to introduce very subtle issues that might be hard to pin down in future. By doing this, we're trying to take control away from Hibernate, even though almost everything relies on it. Only time will tell.
Its a fair point. This is the necessary price of using raw SQL in a Hibernate managed context. The issue of the Redis cache is noted with a comment to deal with with an additional follow up. |
|
This reverts commit 79f3930.
* Optimize user creation and deletion



Summary
Fixes N+1 query performance problems when creating/updating users in systems with large numbers of users and user groups originally reported in https://dhis2.atlassian.net/browse/DHIS2-20614
Problem
Glowroot profiling on a production-scale database identified two major performance bottlenecks when creating a single user via the User app:
SchemaToDataFetcher loading ALL users.: During metadata import, uniqueness checking loaded all 221,647 users from the database (SELECT code, ldapid, username FROM userinfo - 75.6ms, 221K rows), even when importing just one user.
UserGroup member collection loading : Adding/removing users from groups called userGroup.addUser(user) / userGroup.removeUser(user), which triggered lazy initialization of the entire members collection. Similarly, updateUserGroups called userGroup.getMembers().size() to detect changes, also forcing full collection loading. This resulted in SELECT members0_.usergroupid ... FROM usergroupmembers INNER JOIN userinfo executing twice, loading ~12,500 member rows each time (161.4ms total).
Combined, creating a single user took 4,083ms CPU time and allocated 2.7 GB of memory. During production loading, these figures were observed to be much higher.
Both deletion handlers now use direct SQL to manage the join tables via removeAllMemberships(), with updateLastUpdated calls to maintain timestamps and evict L1/L2 caches.
Additionally, the Hibernate mapping ownership for User↔UserRole was inverted to align with User↔UserGroup semantics: User.userRoles is now inverse="true" (non-owning) and UserRole.members is the owning side. Previously, User.userRoles was the owning side with cascade="all", which caused Hibernate to manage the join table through the User entity - triggering cascade-driven chain loading during user deletion.
The @Property(owner = TRUE) annotation override on getUserRoles() ensures the DHIS2 schema/import framework still treats it as an owner property, so that resetNonOwnerProperties() doesn't clear userRoles during PATCH/PUT and collectObjectReferences() still collects role references. The join table is now synced via SQL in UserObjectBundleHook.postCommit().
Solution
SchemaToDataFetcherAdded a filtered
fetch(Schema, Collection)method that only queries for records matching unique property values being imported, using WHERE field IN (:values) clauses instead of loading all records. The old fetch(Schema) method was removed.UserGroupServiceReplaced Hibernate collection operations (userGroup.addUser()/removeUser()/getMembers().size()) with direct SQL via new UserGroupStore methods:
This avoids initializing the lazy members collection entirely.
The updateUserGroups method was rewritten to track membership changes via a changedGroups set rather than comparing getMembers().size() before and after.
canAddOrRemoveMemberAdded an overload accepting UserGroup directly to avoid redundant database lookups when the caller already has the entity loaded. The original code called canAddOrRemoveMember(uid) (which fetched the group by UID), then immediately
called getUserGroup(uid) again.
Additional fixes
Results
Glowroot traces for creating a single user with group assignment on a copy of the database where this problem was originally observed.
Gatling simulation test was performed and the feature branch was consistently faster across the board. 2000 users were imported into the Sierra Leone database for this test.
Mean Response Time (ms)
95th Percentile Response Time (ms)
⬇️ = faster (improvement), ⬆️ = slower (regression)
Known limitation
The direct SQL operations bypass Hibernate's event lifecycle, which means
PostCacheEventPublisherdoes not fire for these changes. In multi-instance deployments with Redis cache invalidation enabled (redis.cache.invalidation.enabled=ON), other nodes will not receive cache invalidation messages for usergroupmembers changes. This will be addressed in a follow-up ticket. Single-instance deployments are unaffected — local L1/L2 caches are explicitly evicted.
Testing
All existing tests pass. Tested with Glowroot on a copy of the production database where the problem was originally observed.
AI Disclaimer:
Portions of the PR were developed with the assistance of AI.