- Project work for the course Text Technology @ Uni Stuttgart
- Dramas from earlier centuries can be hard to follow.
- They’re filled with archaic language and references long-lost to time.
- This often breaks the flow of the story and makes it difficult to grasp how characters relate to one another.
- But what if we could visualize those relationships instead?
- What if we could visualize which characters interact with one another?
- That’s what this project is about: turning complex, classical drama into a clear visualization of character interaction.
- Collect: We collect the drama corpus from DraCor, specifically the English drama corpus
- Prepare: We convert the collected XML data to a format suitable for Neo4j
- Access: We expose a web UI to visualize the relationships
- The collected data is TEI-encoded XML
- Data relevant to the project is extracted using XPath
- We introduced eXist-DB, a native XML database, to store the TEI-encoded data for easy access
- We interact with the database using the RESTful APIs exposed by eXist-DB
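As a rough illustration of the XPath extraction step, the sketch below pulls the character list out of a TEI document using only the JDK's built-in XML APIs. It is not the project's actual extraction code: the class `TeiCharacterExtractor`, the record `DramaCharacter`, and the hand-written sample document are hypothetical, and the element and attribute names (`listPerson`, `person`, `persName`, `xml:id`, `sex`) are assumed to follow the usual DraCor TEI layout.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class TeiCharacterExtractor {

    /** Hypothetical value type for this sketch; not taken from the project. */
    record DramaCharacter(String id, String name, String sex) {}

    /** Hand-written sample, assumed to resemble DraCor's TEI character list. */
    static final String SAMPLE =
            "<TEI xmlns=\"http://www.tei-c.org/ns/1.0\"><teiHeader><profileDesc><particDesc><listPerson>"
          + "<person xml:id=\"hamlet\" sex=\"MALE\"><persName>Hamlet</persName></person>"
          + "<person xml:id=\"ophelia\" sex=\"FEMALE\"><persName>Ophelia</persName></person>"
          + "</listPerson></particDesc></profileDesc></teiHeader></TEI>";

    static List<DramaCharacter> extract(String teiXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(teiXml)));

            // local-name() sidesteps registering the TEI namespace with the XPath engine.
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList persons = (NodeList) xpath.evaluate(
                    "//*[local-name()='listPerson']/*[local-name()='person']",
                    doc, XPathConstants.NODESET);

            List<DramaCharacter> result = new ArrayList<>();
            for (int i = 0; i < persons.getLength(); i++) {
                Element person = (Element) persons.item(i);
                // Relative XPath, evaluated against the current <person> element.
                String name = xpath.evaluate(
                        "string(*[local-name()='persName'][1])", person).trim();
                result.add(new DramaCharacter(
                        person.getAttribute("xml:id"), name, person.getAttribute("sex")));
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse TEI document", e);
        }
    }

    public static void main(String[] args) {
        extract(SAMPLE).forEach(System.out::println);
    }
}
```

Matching on `local-name()` keeps the sketch short; a production extractor would more likely register the TEI namespace through a `NamespaceContext`.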
- The project is organized as a monorepo, currently with four modules
- The `common` module contains the common utilities used across the project
- The `scraper` module contains the code related to the scraper
- The `processor` module contains code related to the processor
- The `visualizer` module contains code for visualizing character interactions
- The `api/` directory is the root directory storing the collection files for Bruno, a popular REST API client which we use for testing
.
├── README.md
├── docker-compose.yaml
├── pom.xml
├── api/
├── common/
├── processor/
├── visualizer/
├── samples
│ ├── anon-a-larum-for-london.xml
│ └── drama.xml
└── scraper/
- The easiest way to set up the project is to bring up the Docker Compose environment.
- Install Docker Desktop
- After setting up Docker, run the following command at the root of the project.
docker compose -f docker-compose.yaml up -d
⚠️ If you get the error `Cannot connect to the Docker daemon at unix:///Users/user1/.docker/run/docker.sock. Is the docker daemon running?`, please ensure Docker is up and running.
- Please note that eXist-DB takes some time to come up. Run the following command and verify that the log output below appears
docker logs exist-db
25 Jun 2025 19:03:53,165 [main] INFO (JettyStart.java [run]:289) - Server has started, listening on:
25 Jun 2025 19:03:53,165 [main] INFO (JettyStart.java [run]:291) - http://172.18.0.2:8080/
25 Jun 2025 19:03:53,165 [main] INFO (JettyStart.java [run]:291) - https://172.18.0.2:8443/
25 Jun 2025 19:03:53,165 [main] INFO (JettyStart.java [run]:294) - Configured contexts:
25 Jun 2025 19:03:53,165 [main] INFO (JettyStart.java [run]:300) - /exist (eXist XML Database)
25 Jun 2025 19:03:53,168 [main] INFO (JettyStart.java [run]:316) - /exist/iprange (IPrange filter)
25 Jun 2025 19:03:53,168 [main] INFO (JettyStart.java [run]:300) - / (eXist-db portal)
25 Jun 2025 19:03:53,169 [main] INFO (JettyStart.java [run]:316) - /iprange (IPrange filter)
⚠️ As a prerequisite, please have Java (preferably JDK 21+) and Maven installed
💡 If you have IntelliJ IDEA installed, opening the code as a project will simplify the process
- Make sure you are at the root directory
- Run `mvn compile` or `mvn package` to compile the project (`mvn package` is the one that builds the JARs)
- To run an application, run its respective JAR.
- Here is an example for the scraper module
java -jar scraper/target/scraper.jar
- This will bring up the scraper
- Similarly, one can bring up the processor and visualizer
⚠️ Please take care to bring up the dependent DB and modules for the application to work
💡 You can check whether the services are up using the health endpoint
curl -s -X GET http://localhost:<port>/health
Refer to the port mappings in the How to test the code section
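The same health check can be scripted with the JDK's built-in HTTP client. The sketch below is illustrative only: the class `HealthCheck` is hypothetical, it assumes the `/health` endpoint exists on the scraper, processor, and visualizer ports listed in this README, and it assumes an HTTP 200 response means the service is healthy.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

final class HealthCheck {

    // Ports taken from the port-mapping table in this README.
    static final Map<String, Integer> SERVICES = servicePorts();

    static Map<String, Integer> servicePorts() {
        Map<String, Integer> ports = new LinkedHashMap<>();
        ports.put("scraper", 8081);
        ports.put("processor", 8082);
        ports.put("visualizer", 8083);
        return ports;
    }

    static URI healthUri(int port) {
        return URI.create("http://localhost:" + port + "/health");
    }

    /** Returns true on HTTP 200; treating 200 as "healthy" is an assumption. */
    static boolean isUp(HttpClient client, int port) {
        HttpRequest request = HttpRequest.newBuilder(healthUri(port))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            HttpResponse<Void> response =
                    client.send(request, HttpResponse.BodyHandlers.discarding());
            return response.statusCode() == 200;
        } catch (IOException e) {
            // Connection refused or timed out: the service is not reachable yet.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        SERVICES.forEach((name, port) ->
                System.out.println(name + " -> " + (isUp(client, port) ? "UP" : "DOWN")));
    }
}
```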
- You can test the code with a REST API client like Bruno or using cURL
- All the cURL commands are provided in the file `curl-commands.sh`
- If you are using Bruno, open the folder `api` in Bruno to view the collection
- Following are the port mappings for the services
| Service | Port |
|---|---|
| eXist-DB | 8080 |
| Scraper | 8081 |
| Processor | 8082 |
| Visualizer | 8083 |
| Neo4j server | 7687 |
| Neo4j UI | 7474 |
- Once the services and the database are up and running, send a PUT request to the `/load/all` endpoint of the scraper to start the process.
- You can verify the data insertion using the RESTful API provided by eXist-DB, as described in the eXist-DB documentation.
💡 Some handy sample data is provided in the `samples` directory if you want to explore the data
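For those who prefer code over a REST client, the PUT request that starts the load could be sent from Java roughly as follows. This is a sketch under assumptions: the class `TriggerLoad` is hypothetical, the scraper is taken to listen on port 8081 as in the table above, and the endpoint is assumed to accept a PUT without a request body.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

final class TriggerLoad {

    /** Builds the PUT request; sending an empty body is an assumption. */
    static HttpRequest loadAllRequest(String scraperBaseUrl) {
        return HttpRequest.newBuilder(URI.create(scraperBaseUrl + "/load/all"))
                .timeout(Duration.ofMinutes(5)) // generous, since loading may take a while
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response = client.send(
                loadAllRequest("http://localhost:8081"),
                HttpResponse.BodyHandlers.ofString());
        System.out.println("Status: " + response.statusCode());
        System.out.println(response.body());
    }
}
```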
The visualizer module provides a web-based interface for visualizing character interactions in dramas:
- Make sure the stack is up and running (see the setup steps above)
- Open a web browser and navigate to http://localhost:8083
- Select a drama from the dropdown menu to visualize character interactions
- Interact with the graph:
  - Hover over nodes to see character details
  - Click on a character to view detailed interaction information
  - Drag nodes to rearrange the graph
The visualizer provides a force-directed graph where:
- Nodes represent characters (color-coded by gender)
- Links represent interactions between characters
- Link thickness indicates the number of interactions
The graph data in Neo4j can be visualized in the UI using queries written in Cypher.
- Open the Neo4j UI at http://localhost:7474/ in a browser
- Log in using the user `neo4j` and password `your_password`
- Query data using Cypher
Some interesting queries are
- Get a list of all drama titles
MATCH (d:Drama) RETURN d.title as title ORDER BY title;
- Get characters of a drama
Replace $dramaTitle with the name of the drama
MATCH (d:Drama {title: $dramaTitle})-[:HAS_CHARACTER]->(c:Character)
RETURN c.name as name, c.sex as gender
- Get character interactions in a drama
Replace $dramaTitle with the name of the drama
MATCH (d:Drama {title: $dramaTitle})-[:HAS_CHARACTER]->(c1:Character)
MATCH (c1)-[r:INTERACTS_WITH]-(c2:Character)
WHERE c1.name < c2.name // To avoid duplicate pairs
RETURN c1.name as source, c2.name as target, r.interactionCount as value
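The `WHERE c1.name < c2.name` clause is needed because the undirected pattern `(c1)-[r:INTERACTS_WITH]-(c2)` matches every relationship once from each end. The plain-Java sketch below mimics that filter on made-up rows to show the effect. The class `PairDedup`, the record `Interaction`, and the sample names and counts are hypothetical and are not data from the project.

```java
import java.util.List;

final class PairDedup {

    /** Mirrors the columns returned by the query: source, target, value. */
    record Interaction(String source, String target, int value) {}

    /** Keeps a row only if source sorts before target, like c1.name < c2.name. */
    static List<Interaction> keepOrderedPairs(List<Interaction> rows) {
        return rows.stream()
                .filter(row -> row.source().compareTo(row.target()) < 0)
                .toList();
    }

    public static void main(String[] args) {
        // Without the filter, each undirected relationship shows up twice.
        List<Interaction> rows = List.of(
                new Interaction("Hamlet", "Ophelia", 3),
                new Interaction("Ophelia", "Hamlet", 3),
                new Interaction("Hamlet", "Horatio", 5),
                new Interaction("Horatio", "Hamlet", 5));
        keepOrderedPairs(rows).forEach(System.out::println);
    }
}
```

One side effect of this approach: two characters with exactly the same name would never be reported as a pair, since neither name sorts before the other.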


