AI TINKERERS Hackathon 10/12/25
This project extracts structured railway safety knowledge from unstructured incident documents and prepares it for ingestion into a Neo4j graph-based safety system.
Its purpose is to help railway safety personnel analyze incidents, identify root causes, and justify maintenance or budget decisions using a connected safety knowledge graph. The data comes from SNCF open data and can be viewed here: https://eu.ftp.opendatasoft.com/sncf/rapports/rapport-securite-2023.PDF
This is a basic version of the graph; it should be enriched to demonstrate its full capability.
Documents are converted to markdown using Docling. This provides a uniform text format for analysis.
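A minimal sketch of this conversion step, assuming Docling's `DocumentConverter` API as described in the Docling documentation. The helper name `pdf_to_markdown` is hypothetical and is not taken from this project's code.

```python
def pdf_to_markdown(pdf_path: str) -> str:
    """Convert a PDF to markdown text with Docling (illustrative helper)."""
    # Imported lazily so this module still loads where Docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(pdf_path)
    # The converted document can be exported as a single markdown string.
    return result.document.export_to_markdown()
```

The returned string can then be written to a `.md` file and passed to the processing routes.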
The markdown document is segmented into sections based on titles (#, ##, etc.).
Each chunk is processed independently.
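The segmentation could look roughly like the sketch below. It assumes a new chunk starts at every `#`-style heading, whatever its level; the function name `split_by_headings` and the chunk fields (`level`, `title`, `body`) are illustrative choices, not the project's actual names.

```python
import re

# ATX headings: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_by_headings(markdown: str) -> list[dict]:
    """Split markdown into chunks, starting a new chunk at every heading."""
    chunks = []
    # Holds any text that appears before the first heading.
    current = {"level": 0, "title": "", "lines": []}
    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            chunks.append(current)
            current = {"level": len(match.group(1)), "title": match.group(2), "lines": []}
        else:
            current["lines"].append(line)
    chunks.append(current)

    # Join body lines and drop chunks that have neither a title nor any text.
    result = []
    for chunk in chunks:
        body = "\n".join(chunk["lines"]).strip()
        if chunk["title"] or body:
            result.append({"level": chunk["level"], "title": chunk["title"], "body": body})
    return result
```

Note that this simple version would also treat a `#` line inside a fenced code block as a heading, which is probably acceptable for report-style documents.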
Incident classifier outputs:
- incident_type
- possible_impacts
- possible_root_causes (with HIGH/LOW likelihood)
- Cypher defining:
  - Incident → Impact
  - Incident → RootCause
  - Impact → RootCause
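To illustrate how one chunk's output might be turned into Cypher, here is a hedged sketch. It assumes the classifier result is a dict with the three keys listed above, that each root cause is an object with `name` and `likelihood` fields, and that nodes are identified by a `name` property; none of these details are confirmed by the project. The relationship types are the ones listed further down in this document. Impact → RootCause links are left out because the pairing rule is not described here.

```python
def cypher_literal(value: str) -> str:
    """Quote a string as a Cypher literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def incident_to_cypher(result: dict) -> list[str]:
    """Build MERGE statements linking an incident type to impacts and causes."""
    incident = cypher_literal(result["incident_type"])
    statements = []
    # Incident -> Impact
    for impact in result["possible_impacts"]:
        statements.append(
            f"MERGE (i:IncidentType {{name: {incident}}}) "
            f"MERGE (m:Impact {{name: {cypher_literal(impact)}}}) "
            "MERGE (i)-[:HAS_IMPACT]->(m)"
        )
    # Incident -> RootCause, with the HIGH/LOW likelihood on the relationship
    for cause in result["possible_root_causes"]:
        name = cypher_literal(cause["name"])
        likelihood = cypher_literal(cause["likelihood"])
        statements.append(
            f"MERGE (i:IncidentType {{name: {incident}}}) "
            f"MERGE (c:RootCause {{name: {name}}}) "
            f"MERGE (i)-[:HAS_CAUSE {{likelihood: {likelihood}}}]->(c)"
        )
    return statements
```

Using `MERGE` rather than `CREATE` keeps the statements idempotent, so processing the same chunk twice does not duplicate nodes.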
Cause classifier outputs:
- root cause (chosen from a fixed taxonomy)
- normalized causal category
- Cypher linking:
  - Category → RootCause
Impact classifier outputs:
- impact category
- normalized sub-impact subtype
- optional inferred root cause
- Cypher linking:
  - SubImpact → Impact
Each classifier produces a JSON file:
├── incidents.json
├── causes.json
└── impacts.json
All files are ready for Neo4j ingestion.
POST /convert_pdf?pdf_path=path/to/file.pdf
POST /process_incidents?md_path=path/to/file.md
POST /process_causes?md_path=path/to/file.md
POST /process_impacts?md_path=path/to/file.md
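The four routes above take their file path as a query parameter. A possible way to call them in order from Python's standard library is sketched below. The base URL `http://localhost:8000` is an assumption about where the API is served, and `build_request` and `run_pipeline` are hypothetical helper names.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: adjust to the actual host and port


def build_request(route: str, param: str, path: str) -> Request:
    """Build a POST request whose single query parameter is URL-encoded."""
    query = urlencode({param: path})
    return Request(f"{BASE_URL}{route}?{query}", method="POST")


def run_pipeline(pdf_path: str, md_path: str) -> None:
    """Call the conversion route, then the three classifier routes."""
    steps = [
        ("/convert_pdf", "pdf_path", pdf_path),
        ("/process_incidents", "md_path", md_path),
        ("/process_causes", "md_path", md_path),
        ("/process_impacts", "md_path", md_path),
    ]
    for route, param, path in steps:
        with urlopen(build_request(route, param, path)) as response:
            print(route, response.status)
```

Encoding the path with `urlencode` avoids broken requests when a file name contains spaces or other reserved characters.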
The pipeline produces structured JSON files (incidents.json, causes.json, impacts.json) that already contain fully generated Cypher statements.
These Cypher statements express all required graph relationships for a safety-knowledge model.
The system generates and ingests five major types of nodes:
- IncidentType
- Impact
- SubImpact
- RootCause
- SubCause
And several relationship types:
- (:IncidentType)-[:HAS_IMPACT]->(:Impact)
- (:IncidentType)-[:HAS_CAUSE {likelihood}]->(:RootCause)
- (:Impact)-[:LINKED_TO_CAUSE]->(:RootCause)
- (:SubImpact)-[:EST_SOUS_IMPACT_DE]->(:Impact)
These relations form a multi-layered safety graph connecting:
Incident → Impact → Cause
This creates a navigable structure where analysts can:
- explore causal chains
- compare incident families
- trace common failure patterns
- justify maintenance decisions
- support risk scoring and prioritization
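As an example of exploring causal chains, the sketch below follows Incident → Impact → Cause paths for one incident type. It assumes the Neo4j Python driver's `execute_query` method and that nodes carry a `name` property, which this document does not state; the function name `causal_chains` is illustrative.

```python
CAUSAL_CHAIN_QUERY = """
MATCH (i:IncidentType {name: $incident_type})-[:HAS_IMPACT]->(m:Impact)
      -[:LINKED_TO_CAUSE]->(c:RootCause)
RETURN m.name AS impact, c.name AS root_cause
"""


def causal_chains(driver, incident_type: str) -> list[tuple[str, str]]:
    """Return (impact, root cause) pairs reachable from one incident type."""
    # The incident type is passed as a query parameter, not spliced into the text.
    records, _summary, _keys = driver.execute_query(
        CAUSAL_CHAIN_QUERY, incident_type=incident_type
    )
    return [(record["impact"], record["root_cause"]) for record in records]
```

Passing the incident type as a parameter lets the same query be reused across incident families when comparing them.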
The ingestion route (/neo4j_ingest) reads the three JSON files, extracts every Cypher statement, and executes each one:
driver.execute_query(cypher)
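The ingestion step above can be sketched as follows. The exact JSON layout is not documented here, so this version makes a loud assumption: statements are stored under a key named `cypher`, either as a string or as a list of strings, at any nesting depth. The helper names `collect_cypher` and `ingest` are hypothetical.

```python
import json
from pathlib import Path


def collect_cypher(node) -> list[str]:
    """Recursively gather every statement stored under a "cypher" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "cypher" and isinstance(value, str):
                found.append(value)
            elif key == "cypher" and isinstance(value, list):
                found.extend(v for v in value if isinstance(v, str))
            else:
                found.extend(collect_cypher(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_cypher(item))
    return found


def ingest(driver, directory: str) -> int:
    """Run every statement from the three JSON files; return how many ran."""
    count = 0
    for name in ("incidents.json", "causes.json", "impacts.json"):
        data = json.loads(Path(directory, name).read_text(encoding="utf-8"))
        for cypher in collect_cypher(data):
            driver.execute_query(cypher)
            count += 1
    return count
```

Here `driver` is expected to be a Neo4j driver instance, for example one created with `GraphDatabase.driver(...)` from the `neo4j` package.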