How it works
When indexing a resource (user, deposit, document, etc.), SONAR dumps all the fields of each linked resource into the index. This is not optimal: the indexes contain unnecessary data, and some of it is stored twice.
See this user, for example: https://sonar.rero.ch/api/users/?q=309638
ALL fields of the organisation resource have been dumped twice, once under organisation and once under subdivision.organisation. Most of this data is useless in this index.
The same problem happens with deposits.
Improvement suggestion
To be specified
Go through the Elasticsearch dumpers of all resources and dump only the linked data that is actually needed (only what is useful for search).
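As a starting point for discussion, here is a minimal sketch of what a trimmed dump could look like. It is plain Python and does not use SONAR's or Invenio's actual dumper classes. The function names (`slim_linked`, `dump_user`), the whitelist, and the field names (`pid`, `name`, `description`, `isShared`) are assumptions made for illustration, not the real schema.

```python
from copy import deepcopy

# Hypothetical whitelist: the organisation fields assumed to be useful for search.
ORGANISATION_SEARCH_FIELDS = ("pid", "name")


def slim_linked(resource, keep=ORGANISATION_SEARCH_FIELDS):
    """Return a new dict holding only the whitelisted fields of a linked resource."""
    return {key: resource[key] for key in keep if key in resource}


def dump_user(user):
    """Build the index document for a user, trimming linked organisation data."""
    data = deepcopy(user)  # never mutate the record being indexed
    if "organisation" in data:
        data["organisation"] = slim_linked(data["organisation"])
    subdivision = data.get("subdivision")
    if subdivision and "organisation" in subdivision:
        # The organisation is already present at the top level,
        # so only a reference is kept here.
        subdivision["organisation"] = slim_linked(
            subdivision["organisation"], keep=("pid",)
        )
    return data


# Usage with made-up data
organisation = {"pid": "org1", "name": "Org", "description": "...", "isShared": True}
user = {
    "pid": "309638",
    "organisation": organisation,
    "subdivision": {"pid": "sub1", "organisation": organisation},
}
indexed = dump_user(user)
```

With a whitelist per linked resource type, each dumper states explicitly what it puts in the index, which makes it easier to review what is really needed for search.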