Commit b37e494

fix: update tika page
1 parent 3283265 commit b37e494

1 file changed

Lines changed: 8 additions & 1 deletion

cellsflows/1_preset_flows/5_data-formatting-and-conversion/32_tika-contents-extraction.md

@@ -15,11 +15,17 @@ Extract and index textual contents using Tika service.
 [Apache Tika](https://tika.apache.org/) is an independent, open-source content extractor that supports a very wide range of file formats. It can
 even support OCR for extracting text from images. This flow sends file contents to Tika and gets the textual information to be indexed internally by the Cells search engine.

+### Prerequisites
+
+- **Index content enabled**. Enable **index content** in the Cells Admin console under Search engine > Index content.
+- Tika Docker image with OCR enabled. The **full** variant includes OCR capability, e.g. **apache/tika:3.2.3.0-full**.
+
 ### Install with Docker

 Installing with Docker is as simple as running the following command:
+
 ```
-docker run -d -p 9998:9998 apache/tika:latest
+docker run -d -p 9998:9998 apache/tika:latest-full
 ```

 ### How It Works
@@ -39,6 +45,7 @@ Tika provides also further metadata extraction, that can be indexed by Cells sea


 ### Trigger Type
+
 Event-based

 ### JSON Representation
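To check that the container is up and to see the kind of text the flow would receive, the Tika server's REST API can be called directly with `curl`. This is a minimal sketch, assuming the container from the `docker run` command above is listening on `localhost:9998` and exposes the standard tika-server endpoints (`GET /version`, and `PUT /tika` with `Accept: text/plain`). The `TIKA_URL` variable and the `tika_ready` / `tika_extract` helper names are hypothetical and are not part of Cells.

```shell
#!/usr/bin/env bash
# Minimal sketch (assumption): talk to a Tika server started with
#   docker run -d -p 9998:9998 apache/tika:latest-full
# TIKA_URL is a hypothetical variable; override it if Tika runs elsewhere.
TIKA_URL="${TIKA_URL:-http://localhost:9998}"

# Print the server version; a non-zero exit status means Tika is not reachable.
tika_ready() {
  curl -fsS "$TIKA_URL/version"
}

# Upload the file with PUT (curl -T) to /tika and ask for plain text back.
# With the "full" image, image files can be run through Tesseract OCR.
tika_extract() {
  local file="$1"
  if [ ! -f "$file" ]; then
    echo "tika_extract: no such file: $file" >&2
    return 1
  fi
  curl -fsS -T "$file" -H "Accept: text/plain" "$TIKA_URL/tika"
}
```

For example, `tika_ready && tika_extract scanned-receipt.png` (the file name is illustrative) should print the text recognized in the image if OCR is available in the running image; an empty result for an image file suggests the non-`full` variant is in use.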
