Unknown langCode: '' after downloading wikipedia articles #271

@saisubramaniam

Hi,

While installing WikiBrain (SR only) for the full English Wikipedia, I encountered the error below.
The arguments used during installation:
java -Xmx80g -cp wikibrain-withdeps-0.8.0.jar org.wikibrain.Loader org.wikibrain.Loader -l en -s fetchlinks -s download -s dumploader -s redirects -s wikitext -s lucene -s phrases -s sr

The error:

06:52:45.937 [main] INFO org.wikibrain.download.DumpFileDownloader - 28 files downloaded out of 28 files.
06:52:46.034 [main] INFO org.wikibrain.loader.pipeline.PipelineLoader - Successfully completed stage download
06:52:46.035 [main] INFO org.wikibrain.loader.pipeline.PipelineLoader - Beginning stage dumploader
ERROR StatusLogger No log4j2 configuration file found. Using default configuration: logging only errors to the console.
06:52:46.659 [main] INFO org.wikibrain.core.cmd.Env - Configured default logging at the Info Level
06:52:46.660 [main] INFO org.wikibrain.core.cmd.Env - To customize log4j2 set the 'log4j.configurationFile' system property or set EnvBuilder.setReconfigureLogging to$
06:52:49.124 [main] INFO org.wikibrain.conf.Configurator - configurator installed 75 providers for 38 classes
06:52:49.125 [main] INFO org.wikibrain.core.cmd.Env - using baseDir /mnt3/wikibrain/.
06:52:49.125 [main] INFO org.wikibrain.core.cmd.Env - using max vm heapsize of 74581MB
06:52:49.127 [main] INFO org.wikibrain.core.cmd.Env - using languages (EN)
06:52:49.127 [main] INFO org.wikibrain.core.cmd.Env - using maxThreads 16
06:52:49.127 [main] INFO org.wikibrain.core.cmd.Env - using tmpDir ./.tmp
06:52:49.347 [main] WARN org.wikibrain.core.dao.sql.WpDataSource - Raised connections per partition to 3
06:52:49.643 [main] INFO org.wikibrain.loader.DumpLoader - processing file: org.wikibrain.Loader
Exception in thread "main" java.lang.IllegalArgumentException: unknown langCode: ''
at org.wikibrain.core.lang.Language.getByLangCode(Language.java:102)
at org.wikibrain.core.cmd.FileMatcher.getLanguage(FileMatcher.java:210)
at org.wikibrain.loader.DumpLoader.load(DumpLoader.java:82)
at org.wikibrain.loader.DumpLoader.main(DumpLoader.java:257)
06:56:43.471 [main] ERROR org.parse4j.ParseObject - Request failed.
06:56:43.472 [main] WARN org.wikibrain.loader.pipeline.DiagnosticDao - Save of diagnostics failed:
org.json.JSONException: A JSONObject text must begin with '{' at 1 [character 2 line 1]
at org.json.JSONTokener.syntaxError(JSONTokener.java:433) ~[wikibrain-withdeps-0.8.0.jar:?]
at org.json.JSONObject.<init>(JSONObject.java:194) ~[wikibrain-withdeps-0.8.0.jar:?]
at org.json.JSONObject.<init>(JSONObject.java:321) ~[wikibrain-withdeps-0.8.0.jar:?]
at org.parse4j.command.ParseResponse.getJsonObject(ParseResponse.java:83) ~[wikibrain-withdeps-0.8.0.jar:?]
at org.parse4j.command.ParseResponse.getException(ParseResponse.java:71) ~[wikibrain-withdeps-0.8.0.jar:?]
at org.parse4j.ParseObject.save(ParseObject.java:483) ~[wikibrain-withdeps-0.8.0.jar:?]
at org.wikibrain.loader.pipeline.DiagnosticDao.save(DiagnosticDao.java:67) ~[wikibrain-withdeps-0.8.0.jar:?]
at org.wikibrain.loader.pipeline.DiagnosticDao.saveQuietly(DiagnosticDao.java:72) [wikibrain-withdeps-0.8.0.jar:?]
at org.wikibrain.loader.pipeline.PipelineLoader.quietlySaveDiagnostics(PipelineLoader.java:132) [wikibrain-withdeps-0.8.0.jar:?]
at org.wikibrain.loader.pipeline.PipelineLoader.run(PipelineLoader.java:113) [wikibrain-withdeps-0.8.0.jar:?]
at org.wikibrain.Loader.run(Loader.java:98) [wikibrain-withdeps-0.8.0.jar:?]
at org.wikibrain.Loader.main(Loader.java:136) [wikibrain-withdeps-0.8.0.jar:?]
06:56:43.530 [main] ERROR org.parse4j.ParseObject - Request failed.
