This repository was archived by the owner on Dec 14, 2023. It is now read-only.
We try to put at least a liberal cap on every service's resources so that if one goes rogue with CPU / RAM usage, it doesn't affect the host machine and doesn't make other services crash. It now turns out that we can run out of disk space too. Systems monitoring would be a good yet reactive way to deal with that, so we need to be proactive about it too and not let containers burn through the host machine's root partition (or, if they do, running out of disk space should stay isolated to the container).
extract-and-vector workers tend to fill up /var/tmp with gigabytes of pretty much identical files which are of the size of either 0 or 3332489.

It took me a while to notice that a temporary file with a random name and a temporary file with a not-so-random name have identical file sizes.
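One way to spot this kind of pile-up is to group the files in a directory by size and see which sizes account for the most space. The following is only a minimal sketch using the standard library; the function names `group_files_by_size` and `largest_size_groups` are made up for illustration and are not part of the codebase.

```python
import os
from collections import defaultdict


def group_files_by_size(directory):
    """Map each file size (in bytes) to the names of the regular files that have it."""
    by_size = defaultdict(list)
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only plain files are of interest; skip subdirectories and symlinks.
            if entry.is_file(follow_symlinks=False):
                size = entry.stat(follow_symlinks=False).st_size
                by_size[size].append(entry.name)
    return dict(by_size)


def largest_size_groups(directory, top=5):
    """Return (size, file_count, total_bytes) tuples, biggest total first."""
    groups = [
        (size, len(names), size * len(names))
        for size, names in group_files_by_size(directory).items()
    ]
    groups.sort(key=lambda group: group[2], reverse=True)
    return groups[:top]
```

Calling `largest_size_groups("/var/tmp")` inside a worker container should then list the sizes that take up the most space. Many files sharing one exact size would hint that the same content is being written over and over.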
Jieba is a Python library which does Chinese language tokenization for us. Given that it uses a dictionary to do that, it has to pre-load some stuff, which it then keeps in a cache file. That cache is supposed to get created at build time (backend/apps/common/Dockerfile, lines 139 to 144 in 04bc9c6).
But it seems that the resulting /var/tmp/jieba.cache does not become accessible to its users, as that file gets created with root:root owner and 600 permissions while its users run as mediacloud:mediacloud, so Jieba resorts to rebuilding that cache file on every call.

@jtotoole, could you:

- Fix jieba.cache's file permissions at build time so that the Jieba library can access it; probably you just need to run that cache creation script as a different user in Dockerfile
- Limit the storage that gets used by all service containers in production's docker-compose.yml where appropriate - you'll probably need storage_opt for that
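To confirm the permission problem before and after a fix, something like the sketch below could be run inside the container. It applies the classic Unix owner / group / other read check to a file's `stat` result and deliberately ignores root's override and ACLs. The helper names `can_read` and `describe_cache` are hypothetical and exist only for this illustration.

```python
import os
import stat


def can_read(st, uid, gids):
    """Apply the owner / group / other read check to a stat result.

    Ignores root's override, ACLs and directory traversal permissions.
    """
    if st.st_uid == uid:
        return bool(st.st_mode & stat.S_IRUSR)
    if st.st_gid in gids:
        return bool(st.st_mode & stat.S_IRGRP)
    return bool(st.st_mode & stat.S_IROTH)


def describe_cache(path, uid, gids):
    """Summarize who owns `path` and whether `uid` is allowed to read it."""
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    verdict = "readable" if can_read(st, uid, gids) else "NOT readable"
    return f"{path}: owner {st.st_uid}:{st.st_gid}, mode {mode:o}, {verdict} by uid {uid}"
```

Run as the mediacloud user, `describe_cache("/var/tmp/jieba.cache", os.getuid(), set(os.getgroups()) | {os.getgid()})` would be expected to report "NOT readable" while the file is still root:root with mode 600, and "readable" once the build creates it under the right user.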