Description
The Unstructured image used in docker-compose.yml does not include the web server, so the container exits immediately.
The correct image is downloads.unstructured.io/unstructured-io/unstructured-api:latest (instead of downloads.unstructured.io/unstructured-io/unstructured:latest).
Once this is fixed, the .env variable should be set as follows (http rather than https, and port 8000):
UNSTRUCTURED_API_URL=http://unstructured:8000/general/v0/general
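A quick way to catch the two mistakes mentioned above (https instead of http, missing port 8000) is to parse the value before starting the stack. This is only a sketch: `check_local_unstructured_url` is a hypothetical helper, not part of Verba, and the expected scheme, port and path are simply the ones from the URL above.

```python
from typing import List
from urllib.parse import urlsplit


def check_local_unstructured_url(url: str) -> List[str]:
    """Return a list of problems found in a local Unstructured API URL.

    Hypothetical helper; the expected values mirror the .env line above.
    """
    parts = urlsplit(url)
    problems = []
    if parts.scheme != "http":
        problems.append(f"expected scheme 'http', got {parts.scheme!r}")
    if parts.port != 8000:
        problems.append(f"expected port 8000, got {parts.port!r}")
    if parts.path != "/general/v0/general":
        problems.append(f"expected path '/general/v0/general', got {parts.path!r}")
    return problems
```

An empty list means the value matches the local setup described here; any other result lists what differs.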
Finally, leaving UNSTRUCTURED_API_KEY empty in order to use Unstructured only locally raises an error, because goldenverba/components/reader/UnstructuredAPI.py makes the key mandatory.
I suggest making it optional to enable local-only processing,
e.g. by changing goldenverba/components/util.py:
diff --git a/goldenverba/components/util.py b/goldenverba/components/util.py
index f376e25..5b98b6c 100644
--- a/goldenverba/components/util.py
+++ b/goldenverba/components/util.py
@@ -46,16 +46,17 @@ def pca(X, k):
     return X_pca


-def get_environment(config, value: str, env: str, error_msg: str) -> str:
+def get_environment(config, value: str, env: str, error_msg: str, optional : bool = False) -> str:
     if value in config:
         token = config[value].value
     else:
         token = os.environ.get(env)
     if not token or token == "":
+        if optional: return ""
         raise Exception(error_msg)
     return token


 def get_token(env: str, default: str = None) -> str:
     # return token, but treat empty string als None
     token = tok if bool(tok := os.getenv(env, None)) else default
and by changing goldenverba/components/reader/UnstructuredAPI.py:
diff --git a/goldenverba/components/reader/UnstructuredAPI.py b/goldenverba/components/reader/UnstructuredAPI.py
index 57c8648..13cde49 100644
--- a/goldenverba/components/reader/UnstructuredAPI.py
+++ b/goldenverba/components/reader/UnstructuredAPI.py
@@ -40,6 +40,7 @@ class UnstructuredReader(Reader):
                 value="",
                 description="Set your Unstructured API Key here or set it as an environment variable `UNSTRUCTURED_API_KEY`",
                 values=[],
+                optional=True,
             )

         if os.getenv("UNSTRUCTURED_API_URL") is None:
@@ -62,6 +63,7 @@ class UnstructuredReader(Reader):
             "API Key",
             "UNSTRUCTURED_API_KEY",
             "No Unstructured API Key detected",
+            optional=True,
         )
         api_url = get_environment(
             config, "API URL", "UNSTRUCTURED_API_URL", "No Unstructured URL detected"
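To illustrate the intended behaviour outside of Verba, here is a minimal standalone sketch of the patched `get_environment`. `ConfigEntry` is a hypothetical stand-in for Verba's config objects (only `.value` is used), and `build_headers` is a hypothetical helper that is not in the codebase; it assumes the key is sent in an `unstructured-api-key` header and that a local server needs no key at all.

```python
import os
from dataclasses import dataclass


@dataclass
class ConfigEntry:
    """Hypothetical stand-in for a Verba config entry; only `.value` is read."""
    value: str


def get_environment(config, value: str, env: str, error_msg: str,
                    optional: bool = False) -> str:
    # Prefer the value from the UI config, fall back to the environment.
    if value in config:
        token = config[value].value
    else:
        token = os.environ.get(env)
    if not token:
        # Missing or empty: tolerated only when the setting is optional.
        if optional:
            return ""
        raise Exception(error_msg)
    return token


def build_headers(api_key: str) -> dict:
    """Hypothetical helper: send the key header only when a key is set."""
    headers = {"accept": "application/json"}
    if api_key:
        # Assumption: header name used by the hosted Unstructured API.
        headers["unstructured-api-key"] = api_key
    return headers
```

With `optional=True` and no key configured, `get_environment` returns an empty string instead of raising, so a request to the local container can be built without the key header. Callers that omit `optional` keep the current strict behaviour.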