Wrong unstructured image for docker compose #387

@asterbini

Description

The unstructured image referenced in docker-compose.yml does not include the web server, so the container exits immediately.
The correct image is downloads.unstructured.io/unstructured-io/unstructured-api:latest (instead of downloads.unstructured.io/unstructured-io/unstructured:latest).

Once this is fixed, the variable in .env should be set as follows (http rather than https, and port 8000):
UNSTRUCTURED_API_URL=http://unstructured:8000/general/v0/general
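
As a quick sanity check, the value can be parsed to confirm the scheme and port before starting the stack. This is only a sketch using the Python standard library; the host name `unstructured` is taken from the URL above and is presumably the compose service name.

```python
from urllib.parse import urlsplit

# Value suggested above for UNSTRUCTURED_API_URL
api_url = "http://unstructured:8000/general/v0/general"

parts = urlsplit(api_url)

# Plain http (no TLS) and port 8000, as described in the issue
scheme_ok = parts.scheme == "http"
port_ok = parts.port == 8000

print(parts.hostname, parts.port, parts.path)
```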

Finally, leaving UNSTRUCTURED_API_KEY empty in order to use unstructured only locally raises an error, because goldenverba/components/reader/UnstructuredAPI.py makes it mandatory.

I suggest making it optional to enable local-only processing,
e.g. by changing goldenverba/components/util.py:

diff --git a/goldenverba/components/util.py b/goldenverba/components/util.py
index f376e25..5b98b6c 100644
--- a/goldenverba/components/util.py
+++ b/goldenverba/components/util.py
@@ -46,16 +46,17 @@ def pca(X, k):
     return X_pca


-def get_environment(config, value: str, env: str, error_msg: str) -> str:
+def get_environment(config, value: str, env: str, error_msg: str, optional : bool = False) -> str:
     if value in config:
         token = config[value].value
     else:
         token = os.environ.get(env)
     if not token or token == "":
+        if optional: return ""
         raise Exception(error_msg)
     return token

 def get_token(env: str, default: str = None) -> str:
     # return token, but treat empty string als None
     token = tok if bool(tok := os.getenv(env, None)) else default

and by changing goldenverba/components/reader/UnstructuredAPI.py:

diff --git a/goldenverba/components/reader/UnstructuredAPI.py b/goldenverba/components/reader/UnstructuredAPI.py
index 57c8648..13cde49 100644
--- a/goldenverba/components/reader/UnstructuredAPI.py
+++ b/goldenverba/components/reader/UnstructuredAPI.py
@@ -40,6 +40,7 @@ class UnstructuredReader(Reader):
                 value="",
                 description="Set your Unstructured API Key here or set it as an environment variable `UNSTRUCTURED_API_KEY`",
                 values=[],
+                optional=True,
             )

         if os.getenv("UNSTRUCTURED_API_URL") is None:
@@ -62,6 +63,7 @@ class UnstructuredReader(Reader):
             "API Key",
             "UNSTRUCTURED_API_KEY",
             "No Unstructured API Key detected",
+            optional=True,
         )
         api_url = get_environment(
             config, "API URL", "UNSTRUCTURED_API_URL", "No Unstructured URL detected"
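
The effect of the proposed `optional` flag can be checked in isolation. The sketch below copies the patched `get_environment` from the util.py diff into a standalone script; the `config` dict built from `SimpleNamespace` is a hypothetical stand-in for Verba's real config entries and assumes only that each entry exposes a `.value` attribute.

```python
import os
from types import SimpleNamespace


def get_environment(config, value: str, env: str, error_msg: str, optional: bool = False) -> str:
    # Standalone copy of the patched helper from the util.py diff
    if value in config:
        token = config[value].value
    else:
        token = os.environ.get(env)
    if not token:
        if optional:
            return ""  # an empty key is allowed for local-only use
        raise Exception(error_msg)
    return token


# Hypothetical stand-in for the reader config: an entry with an empty API key
config = {"API Key": SimpleNamespace(value="")}

# With optional=True the empty key is accepted and returned as ""
api_key = get_environment(
    config, "API Key", "UNSTRUCTURED_API_KEY", "No Unstructured API Key detected", optional=True
)

# Without the flag the existing behaviour is kept: an empty key raises
try:
    get_environment(config, "API Key", "UNSTRUCTURED_API_KEY", "No Unstructured API Key detected")
    raised = False
except Exception:
    raised = True
```

Because `optional` defaults to False, the other callers of `get_environment` keep raising on a missing value; only the API key lookup in the reader opts in.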
