236 changes: 234 additions & 2 deletions docs/04-user-guide/03-integrations/02-hue.md
@@ -1,3 +1,235 @@
# Hue
---
sidebar_label: Hue
---

**TODO:** File a subtask under [HDDS-9858](https://issues.apache.org/jira/browse/HDDS-9858) and complete this page or section.
# Integrating Apache Hue with Ozone

Curious: Maybe we should change Apache Hue to Cloudera Hue everywhere?


Apache Hue provides a user-friendly web interface for interacting with various Hadoop ecosystem components, including file browsing. Hue can be configured to browse and manage data stored in Apache Ozone, leveraging Ozone's **HttpFS** interface, which offers WebHDFS-compatible REST endpoints.

Nit: Let's remove Hadoop as Hue can interact with other services too


## How Hue Interacts with Storage

Hue's File Browser and other components rely on Hadoop-compatible filesystem interfaces to:

Same here. Hue can interact with S3, ABFS, GS using their APIs too


- Browse directory structures.
- List files and directories with their metadata.
- Upload and download files.
- Perform basic file management operations (create directory, rename, move, copy, delete).
- Provide paths for data access to integrated query engines (like Hive, Impala).

## Ozone's HttpFS Interface for Hue

Ozone enables Hue integration through its built-in **HttpFS service**, which typically runs as part of the Ozone Manager (OM).

- **WebHDFS Compatibility:** The HttpFS service exposes a REST API at `/webhdfs/v1` that mimics the HDFS WebHDFS API. Hue uses this API to perform filesystem operations.
- **Translation:** HttpFS receives HTTP requests from Hue and translates them into Ozone RPC calls to the Ozone Manager.
- **Authentication:** Supports Kerberos (SPNEGO) for secure clusters, allowing Hue to authenticate securely.
- **Impersonation:** Supports Hadoop's proxy user mechanism, allowing the Hue service user to perform operations on behalf of the logged-in Hue user.
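
To make the request shape concrete, here is a minimal sketch of how a WebHDFS-style URL with impersonation could be assembled. The `op`, `user.name`, and `doas` query parameters come from the WebHDFS REST API; the helper name `build_webhdfs_url`, the host, the port, and the sample path are assumptions for illustration only, and `user.name` applies to simple authentication (a Kerberos cluster would rely on SPNEGO instead).

```python
from urllib.parse import quote, urlencode


def build_webhdfs_url(base_url, path, op, user=None, doas=None):
    """Build a WebHDFS-style request URL for an Ozone path.

    base_url: the HttpFS endpoint ending in /webhdfs/v1 (placeholder below).
    path:     an Ozone path such as /volume/bucket/key.
    op:       a WebHDFS operation name, e.g. LISTSTATUS or GETFILESTATUS.
    user:     the calling service user (simple auth only).
    doas:     the end user to impersonate via the proxy user mechanism.
    """
    params = {"op": op}
    if user:
        params["user.name"] = user
    if doas:
        params["doas"] = doas
    return f"{base_url.rstrip('/')}/{quote(path.lstrip('/'))}?{urlencode(params)}"


# Hypothetical host, port, and path; substitute your own HttpFS endpoint.
url = build_webhdfs_url(
    "http://httpfs-host.example.com:14000/webhdfs/v1",
    "/vol1/bucket1/data",
    "LISTSTATUS",
    user="hue",
    doas="alice",
)
print(url)
```

Fetching such a URL with any HTTP client is a quick way to confirm that the endpoint and proxy user settings behave as expected before involving Hue.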

:::info Note
While Hue might be configured with `ofs://` as its default filesystem (`fs_defaultfs`) for linking with query engines, the **File Browser** functionality primarily uses the **HttpFS/WebHDFS** endpoint (`webhdfs_url`) to interact with Ozone's namespace.
:::

## Configuration Requirements

### 1. Ozone HttpFS Configuration

Ensure the Ozone Manager's HTTP/HTTPS interface is enabled and configured correctly in `ozone-site.xml`. HttpFS runs as part of the OM.

```xml
<configuration>

  <!-- Ensure OM HTTP(S) address is configured -->
  <property>
    <name>ozone.om.http.address</name>
    <value>om-host.example.com:9874</value>
    <description>Ozone Manager HTTP address.</description>
  </property>
  <property>
    <name>ozone.om.https.address</name>
    <value>om-host.example.com:9875</value>
    <description>Ozone Manager HTTPS address.</description>
  </property>
  <property>
    <name>ozone.om.http.enabled</name>
    <value>true</value> <!-- Or false if only using HTTPS -->
    <description>Enable OM HTTP endpoint.</description>
  </property>
  <property>
    <name>hdds.http.policy</name>
    <value>HTTP_ONLY</value> <!-- Or HTTPS_ONLY, HTTP_AND_HTTPS -->
    <description>Policy for HTTP/HTTPS endpoints.</description>
  </property>

  <!-- Kerberos Authentication for HttpFS (if cluster is secure) -->
  <property>
    <name>ozone.om.http.auth.type</name>
    <value>kerberos</value>
    <description>Authentication type for OM HTTP endpoint.</description>
  </property>
  <property>
    <name>ozone.om.http.kerberos.principal</name>
    <value>HTTP/om-host.example.com@YOUR-REALM.COM</value>
    <description>OM HTTP Kerberos principal (SPNEGO).</description>
  </property>
  <property>
    <name>ozone.om.http.kerberos.keytab.file</name>
    <value>/etc/security/keytabs/om-http.keytab</value> <!-- Path to OM HTTP keytab -->
    <description>OM HTTP Kerberos keytab file.</description>
  </property>

</configuration>
```

- Adjust hostnames, ports, security settings, and keytab paths according to your cluster setup.
- Restart Ozone Manager after making changes.

### 2. Hadoop Proxy User Configuration for Hue

To allow the Hue service user (e.g., `hue`) to impersonate end-users when accessing Ozone via HttpFS, configure Hadoop's proxy user settings in the `core-site.xml` used by the Ozone Manager.

```xml
<configuration>

  <property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <!-- List of hosts where Hue service runs, or '*' for any host -->
    <value>hue-host.example.com</value>
    <description>Allow the 'hue' user to proxy requests from these hosts.</description>
  </property>

  <property>
    <name>hadoop.proxyuser.hue.groups</name>
    <!-- List of groups whose members the 'hue' user can impersonate, or '*' for any group -->
    <value>*</value>
    <description>Allow the 'hue' user to impersonate users belonging to these groups.</description>
  </property>

  <!-- Repeat for other proxy users if necessary -->

</configuration>
```

- Replace `hue` with the actual OS user running the Hue service.
- Replace `hue-host.example.com` with the actual hostname(s) where Hue runs. Using `*` is less secure but often simpler for initial setup.
- Restart Ozone Manager after modifying `core-site.xml`.
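
As a sanity check on the proxy user settings, the following sketch reads a `core-site.xml` document and reports missing or wildcard entries for a given service user. The function names `proxyuser_settings` and `review` and the sample XML are hypothetical; only the `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` property names come from the configuration above.

```python
import xml.etree.ElementTree as ET


def proxyuser_settings(core_site_xml, user):
    """Return the hosts and groups configured for hadoop.proxyuser.<user>.*"""
    root = ET.fromstring(core_site_xml)
    props = {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }
    settings = {}
    for kind in ("hosts", "groups"):
        raw = props.get(f"hadoop.proxyuser.{user}.{kind}")
        # None means the property is missing entirely.
        settings[kind] = (
            None if raw is None else [v.strip() for v in raw.split(",") if v.strip()]
        )
    return settings


def review(settings):
    """List human-readable findings for missing or wildcard entries."""
    findings = []
    for kind, values in settings.items():
        if values is None:
            findings.append(f"{kind}: not configured")
        elif "*" in values:
            findings.append(f"{kind}: wildcard '*' allows any value")
    return findings


# Hypothetical sample input for illustration.
SAMPLE = """
<configuration>
  <property>
    <name>hadoop.proxyuser.hue.hosts</name>
    <value>hue-host.example.com</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hue.groups</name>
    <value>*</value>
  </property>
</configuration>
"""

for finding in review(proxyuser_settings(SAMPLE, "hue")):
    print(finding)
```

A wildcard is not necessarily wrong, but flagging it makes the trade-off between convenience and a tighter impersonation scope explicit.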

We could link apache/ozone#9596 here once it's merged, as that config could also be needed for Hue.

### 3. Hue Configuration (`hue.ini`)

Configure Hue to use Ozone's HttpFS endpoint and optionally set the default filesystem path. Edit the `[desktop]` and `[[ozone]]` sections in `hue.ini`:

```ini
[desktop]
# Define the default filesystem for Hue applications (e.g., Hive, Impala jobs)
# Use ofs:// with your OM Service ID for HA or OM address for non-HA
fs_defaultfs=ofs://ozonecluster/
Comment on lines +122 to +124

I thought this should be added under the [[ozone]] section. Could you please verify?

Yes +1


# Secret key for session signing (ensure this is set securely)
secret_key=YourSecretKeyForHueSessionSigning
Comment on lines +126 to +127

Not sure about this config, do you know why is it needed?

This looks unrelated and we can remove it.


[[ozone]]
# This section configures the Ozone filesystem interface in Hue

# URL for the Ozone Manager's HttpFS (WebHDFS compatible) endpoint
# Use https:// if TLS is enabled for OM HTTP endpoint
webhdfs_url=http://om-host.example.com:9874/webhdfs/v1

This looks incorrect, the webhdfs_url should be the HttpFS gateway endpoint.

Suggested change
webhdfs_url=http://om-host.example.com:9874/webhdfs/v1
webhdfs_url=http(s)://[***OZONE-HTTPFS-HOST***]:[***OZONE-HTTPFS-PORT***]/webhdfs/v1

Yes +1


# For secure clusters using Kerberos/SPNEGO for HttpFS:
# security_enabled=true

# For secure clusters using TLS/SSL:
# Set to the path of the CA certificate bundle if using custom CAs,
# or set to false to disable server certificate verification (INSECURE!).
# ssl_cert_ca_verify=true
# [[ssl]]
# cacerts=/path/to/ca_bundle.pem

# Set the default cluster name (optional, cosmetic)
# nice_name="My Ozone Cluster"
Comment on lines +143 to +147

I think we can safely remove this section


```

- Replace `ofs://ozonecluster/`with your correct`ofs` path prefix (using your OM service ID).

nit

Suggested change
- Replace `ofs://ozonecluster/`with your correct`ofs` path prefix (using your OM service ID).
- Replace `ofs://ozonecluster/` with your correct `ofs` path prefix (using your OM service ID).

- Replace `http://om-host.example.com:9874` with the actual HTTP(S) address of your Ozone Manager.
- Uncomment and configure `security_enabled`and`ssl_cert_ca_verify` as needed for secure clusters.

nit

Suggested change
- Uncomment and configure `security_enabled`and`ssl_cert_ca_verify` as needed for secure clusters.
- Uncomment and configure `security_enabled` and `ssl_cert_ca_verify` as needed for secure clusters.

- Restart the Hue service after modifying `hue.ini`.
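
Before restarting Hue, it can help to check that the `webhdfs_url` value has the expected shape. The sketch below is a heuristic only: the function name `check_webhdfs_url` is hypothetical, and the rules it applies (an `http` or `https` scheme, an explicit port, and a `/webhdfs/v1` path) are assumptions drawn from the example value above rather than from Hue itself.

```python
from urllib.parse import urlsplit


def check_webhdfs_url(url):
    """Return a list of problems found in a webhdfs_url value (empty if none)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("scheme must be http or https")
    if not parts.hostname:
        problems.append("missing host")
    try:
        port = parts.port
    except ValueError:
        # urlsplit raises ValueError for a non-numeric or out-of-range port.
        port = None
    if port is None:
        problems.append("missing or invalid port")
    if parts.path.rstrip("/") != "/webhdfs/v1":
        problems.append("path should be /webhdfs/v1")
    return problems


# The first value mirrors the example above; the second is a common mix-up
# where the ofs:// filesystem URI is pasted into webhdfs_url.
for candidate in (
    "http://om-host.example.com:9874/webhdfs/v1",
    "ofs://ozonecluster/",
):
    print(candidate, check_webhdfs_url(candidate))
```

The check only covers syntax; it does not confirm that the endpoint is reachable or that it is the correct service.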

## Using Hue with Ozone via HttpFS (Recommended for Browsing)

After successful configuration using HttpFS, users logging into Hue should be able to use the **File Browser** application to navigate the Ozone namespace with filesystem semantics.

We can also add a below note here (wording can be improved):

If logged-in users can still not see the Ozone Icon in the left navigation to access the filesystem, then they should check with the Hue admin if they have Hue user/group level access permission for it.

For that, the Hue admin can go in the Administer User section in the bottom of left navigation, then check the permissions for the specific user.

Ref: https://docs.cloudera.com/cdp-private-cloud-base/7.3.1/administering-hue/topics/hue-browse-ozone-fs.html#:~:text=Click%20your%20username,save%20the%20changes.


- **Browsing:** Navigate through volumes, buckets, and directories (especially in FSO buckets).
- **Operations:** Upload, download, create directories, rename, move, copy, delete files/directories (subject to user permissions in Ozone and limitations based on bucket layout).
- **File Viewing/Editing:** View and edit text-based files directly.
Comment on lines +160 to +162

We also have few limitations: https://docs-archive.cloudera.com/cdw-runtime/1.5.1/administering-hue/topics/hue-browse-ozone-fs-limitations.html

Maybe good to call them out in the Ozone docs as well?


Data stored in Ozone can also be accessed by other Hue applications like the **Hive** and **Impala** query editors by referencing tables whose `LOCATION` points to `ofs://` paths (configured via `fs_defaultfs` or explicitly in table definitions).

## Using Hue with Ozone via S3 API (Alternative)

Hue also supports browsing S3-compatible storage directly. You can configure Hue to connect to Ozone's S3 Gateway endpoint. This method is primarily useful for browsing **OBS (Object Store)** buckets or when S3 access patterns are preferred.
Comment on lines +166 to +168

Not sure about this whole section, I'm not aware that we have ever tested this. Do you know where this came from? Any resource mentioning this?

Yes, Hue does not support accessing Ozone via S3 API, so we should drop the related sections


### Hue Configuration for S3 (`hue.ini`)

Add or modify the `[[[s3]]]` section within `[desktop][[filebrowser]]`:

```ini
[desktop]
[[filebrowser]]
[[[s3]]]
# S3 API endpoint for the Ozone S3 Gateway
host=ozone-s3g.example.com:9878 # Replace with your S3 Gateway host and port

# Set to false if using HTTP, true for HTTPS
use_ssl=false

# AWS Region (often arbitrary for Ozone, but might be needed by Hue)
region=us-east-1

# Authentication Type: Set to 'AWS_V4' for standard S3 auth
auth_provider_type=AWS_V4

# Credentials can be sourced from environment variables, EC2 metadata,
# or explicitly set here (less secure). For explicit setting:
# access_key_id=YOUR_OZONE_S3_ACCESS_KEY
# secret_access_key=YOUR_OZONE_S3_SECRET_KEY

# Path style access is usually required for Ozone S3 Gateway
use_path_style=true
```

- Replace `ozone-s3g.example.com:9878` with your S3 Gateway address.
- Configure `use_ssl` based on your S3 Gateway setup.
- Ensure Hue has access to the necessary S3 credentials (e.g., via environment variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` for the Hue process, or by configuring them directly in `hue.ini`).

### Considerations for S3 Browsing

- **Bucket Layout:** Browsing via S3 works best with **OBS buckets** due to their flat namespace matching S3 semantics. Browsing FSO buckets via S3 will show objects with `/` delimiters, but directory operations will have the limitations described previously (non-atomic, performance impact).
- **Functionality:** The Hue S3 browser might offer slightly different features compared to the HDFS/WebHDFS browser (e.g., regarding permission display or specific operations).
- **Primary Use:** This method is suitable if your primary interaction with certain Ozone buckets is through the S3 API and you want a consistent browsing experience within Hue for those buckets.

**In summary, while both HttpFS and S3 can be used to connect Hue to Ozone, HttpFS with FSO buckets provides a richer, more performant filesystem browsing experience, whereas S3 is better suited for interacting with OBS buckets.**

## Bucket Layout Considerations

- **FSO Recommended (via HttpFS):** For the best experience with Hue's File Browser using the default HttpFS/WebHDFS connection, use **File System Optimized (FSO)** buckets. FSO provides the hierarchical directory structure and filesystem semantics that Hue expects, leading to more intuitive browsing and efficient operations.
- **OBS (via S3):** If browsing **Object Store (OBS)** buckets, configuring Hue to connect directly via the S3 API is generally preferred, as it aligns better with OBS's flat namespace and object semantics.
- **FSO via S3:** Browsing FSO buckets via Hue's S3 connector is possible but inherits the limitations of S3 access to FSO (non-atomic directory operations, potential performance issues for directory-heavy tasks).

## Troubleshooting

Nice!


- **Cannot Connect / "Could not connect to WebHDFS":**
  - Verify the `webhdfs_url` in `hue.ini` is correct and points to the running OM HTTP(S) endpoint.
  - Check network connectivity and firewalls between Hue and OM nodes.
  - Ensure the OM HTTP endpoint is enabled (`ozone.om.http.enabled` or `hdds.http.policy`).
  - Check OM logs for errors related to HttpFS.
- **Authentication Errors (Secure Clusters):**
  - Verify Kerberos principal and keytab settings for `ozone.om.http.kerberos.*` in `ozone-site.xml`.
  - Ensure the Hue server has a valid Kerberos ticket if `security_enabled=true` in `hue.ini`.
  - Check SPNEGO negotiation logs in OM.
- **Permission Denied / Impersonation Errors:**
  - Verify `hadoop.proxyuser.<hue_user>.*` settings in OM's `core-site.xml`.
  - Check Ozone ACLs for the user attempting the operation via Hue. Ensure the *end-user* (not just the Hue service user) has the necessary permissions on the target Ozone path.
  - If using Ranger, check Ranger policies.
- **File Operations Fail:**
  - Check Ozone ACLs/Ranger policies.
  - Ensure the target bucket is an **FSO bucket** for operations relying on directory semantics.
  - Check OM logs for specific error messages related to the failed operation.
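
When reproducing a failure outside Hue with a plain HTTP client, the response body often narrows down which category above applies. This sketch assumes the endpoint returns the standard WebHDFS `RemoteException` JSON error body; the function name `explain_failure` and the status-to-hint mapping are illustrative assumptions, not part of Ozone or Hue.

```python
import json

# Heuristic hints keyed by HTTP status; an assumption for illustration.
HINTS = {
    401: "authentication: check SPNEGO/Kerberos settings and the Hue ticket",
    403: "authorization: check proxy user settings, Ozone ACLs, or Ranger policies",
    404: "not found: check the volume/bucket/key path",
}


def explain_failure(status, body):
    """Summarise a failed WebHDFS-style response as 'exception: message (hint)'."""
    try:
        remote = json.loads(body).get("RemoteException", {})
    except (ValueError, TypeError, AttributeError):
        # Body was not JSON, was missing, or was not a JSON object.
        remote = {}
    if not isinstance(remote, dict):
        remote = {}
    exception = remote.get("exception", "UnknownException")
    message = remote.get("message", "no message in response body")
    hint = HINTS.get(status, "see the server logs for details")
    return f"{exception}: {message} ({hint})"


# Hypothetical error body for illustration.
sample_body = json.dumps(
    {
        "RemoteException": {
            "exception": "AccessControlException",
            "message": "User hue is not allowed to impersonate alice",
        }
    }
)
print(explain_failure(403, sample_body))
```

The hints are a starting point; the server logs remain the authoritative source for the actual cause.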