Automated LCI Data Extraction

💫This repository provides an automated data extraction and matrix construction tool for databases such as Ecoinvent. It can directly process native EcoSpold v2 (.spold) files without relying on commercial software, automatically handling data parsing, cleaning, and standardization, and ultimately generating a sparse flow × process matrix.

With this workflow, researchers can:

Efficiently extract and integrate LCI data, avoiding the complexity and errors of manual handling;

Obtain a high-quality numerical matrix consisting of tens of thousands of processes and flows;

Retain rich textual information to support semantic modeling and machine learning tasks;

Build a solid data foundation for missing data prediction and automated LCA analysis.
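As a rough illustration of the parsing step, the sketch below pulls intermediate and elementary exchanges out of an EcoSpold v2 document with the standard library only. It is not the notebook's actual code: the function names (local, extract_exchanges) and the output keys are hypothetical, and the element and attribute names (intermediateExchange, elementaryExchange, name, unitName, amount) reflect our reading of the EcoSpold v2 schema. The inline sample is made up for the demo.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def extract_exchanges(xml_text):
    """Return one dict per intermediate or elementary exchange (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    rows = []
    for elem in root.iter():
        kind = local(elem.tag)
        if kind not in ("intermediateExchange", "elementaryExchange"):
            continue
        # Direct children carry the human-readable name and unit.
        children = {local(c.tag): (c.text or "").strip() for c in elem}
        rows.append({
            "kind": kind,
            "name": children.get("name", ""),
            "unit": children.get("unitName", ""),
            "amount": float(elem.get("amount", "nan")),
        })
    return rows


# Made-up miniature document, only for demonstration.
SAMPLE = """<ecoSpold xmlns="http://www.EcoInvent.org/EcoSpold02">
  <childActivityDataset><flowData>
    <intermediateExchange id="a" amount="1.0">
      <name xml:lang="en">electricity, high voltage</name>
      <unitName xml:lang="en">kWh</unitName>
    </intermediateExchange>
    <elementaryExchange id="b" amount="0.25">
      <name xml:lang="en">Carbon dioxide, fossil</name>
      <unitName xml:lang="en">kg</unitName>
    </elementaryExchange>
  </flowData></childActivityDataset>
</ecoSpold>"""

rows = extract_exchanges(SAMPLE)
for row in rows:
    print(row)
```

Matching on the local tag name keeps the sketch independent of the exact XML namespace string used by a given release.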

Input and Output

Input

EcoSpold v2 .spold datasets from any of the ecoinvent 3.11 releases (drop them under data/spold/ or point ECOSPOLD_ROOT to another directory).
FilenameToActivityLookup.csv (semicolon separated) that maps file prefixes to activity names and locations (defaults to data/FilenameToActivityLookup.csv, override with FILENAME_LOOKUP).
batch_number.txt, which stores the rolling batch ID (created automatically in outputs/ unless you set LCA_BATCH_FILE).
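For orientation, a semicolon-separated lookup file can be read with the standard csv module as sketched below. The column names (Filename, ActivityName, Location) and the helper name load_lookup are assumptions for illustration; check the header row of your own FilenameToActivityLookup.csv before relying on them.

```python
import csv
import io


def load_lookup(handle):
    """Map each filename to (activity name, location). Column names are assumed."""
    reader = csv.DictReader(handle, delimiter=";")
    return {
        row["Filename"]: (row["ActivityName"], row["Location"])
        for row in reader
    }


# In-memory stand-in for the real file, with made-up values.
sample = io.StringIO(
    "Filename;ActivityName;Location\n"
    "abc_123.spold;electricity production, wind;DE\n"
)
lookup = load_lookup(sample)
print(lookup["abc_123.spold"])
```

For the real file, pass open(path, newline="", encoding="utf-8") instead of the StringIO object.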

Output

Each run creates a timestamped batch folder (<output_root>/<MMDD>_<batch_number>/) containing:

A CSV per .spold file with cleaned intermediate and elementary exchanges (same basename as the source file).
Logs and diagnostics: processing_debug.txt, summary.txt, failed_files.txt, and the spotlight lists non1_amount_files.csv and neg1_amount_files.csv.
global_activity_mapping.csv, recording every activity ID discovered across the dataset.
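The batch folder naming can be pictured with the following sketch. It assumes the counter file holds a single integer that is incremented on each run and that the batch number is not zero-padded; neither detail is confirmed here, and next_batch_dir is a hypothetical helper, not a function from the notebook.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def next_batch_dir(output_root, batch_file, now=None):
    """Bump the rolling counter and create <output_root>/<MMDD>_<batch_number>/."""
    now = now or datetime.now()
    batch_file = Path(batch_file)
    # Assumption: a missing counter file means no batch has run yet.
    last = int(batch_file.read_text().strip()) if batch_file.exists() else 0
    batch_number = last + 1
    batch_file.parent.mkdir(parents=True, exist_ok=True)
    batch_file.write_text(str(batch_number))
    batch_dir = Path(output_root) / f"{now:%m%d}_{batch_number}"
    batch_dir.mkdir(parents=True, exist_ok=True)
    return batch_dir


# Demo in a throwaway directory with a fixed date.
root = Path(tempfile.mkdtemp())
counter = root / "batch_number.txt"
first = next_batch_dir(root, counter, datetime(2025, 3, 7))
second = next_batch_dir(root, counter, datetime(2025, 3, 7))
print(first.name, second.name)
```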

The helper build_lca_matrix step then consolidates the per-activity CSVs into a sparse flow × process matrix and writes it to LCA_MATRIX_TARGET (defaults to the batch folder).
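The consolidation idea can be sketched as collecting (row, column, value) triplets, with one row per flow and one column per process. This is a minimal stand-in under stated assumptions, not the build_lca_matrix implementation: the CSV column names flow_id and amount, and the helper name build_triplets, are hypothetical.

```python
import csv
import io


def build_triplets(activity_csvs):
    """activity_csvs maps an activity ID to an open per-activity CSV handle."""
    flow_index, process_index = {}, {}
    rows, cols, data = [], [], []
    for activity_id, handle in activity_csvs.items():
        # Each activity (process) gets the next free column index.
        col = process_index.setdefault(activity_id, len(process_index))
        for rec in csv.DictReader(handle):
            # Each distinct flow gets the next free row index.
            row = flow_index.setdefault(rec["flow_id"], len(flow_index))
            rows.append(row)
            cols.append(col)
            data.append(float(rec["amount"]))
    return rows, cols, data, flow_index, process_index


# Two made-up per-activity CSVs held in memory.
sample = {
    "act-1": io.StringIO("flow_id,amount\nco2,0.5\nelectricity,1.0\n"),
    "act-2": io.StringIO("flow_id,amount\nco2,0.2\n"),
}
rows, cols, data, flow_index, process_index = build_triplets(sample)
print(len(flow_index), "flows x", len(process_index), "processes")
```

If SciPy is available, these triplets can be handed to scipy.sparse.coo_matrix((data, (rows, cols)), shape=(len(flow_index), len(process_index))) to obtain an actual sparse matrix object.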

Environment overrides

ECOSPOLD_ROOT — directory that holds raw .spold files (default: data/spold/).
FILENAME_LOOKUP — path to the lookup CSV (default: data/FilenameToActivityLookup.csv).
LCA_OUTPUT_ROOT — parent directory for batch outputs (default: outputs/).
LCA_BATCH_FILE — custom location for the batch counter file.
LCA_MATRIX_SOURCE/LCA_MATRIX_TARGET — override the matrix builder’s input/output paths.
COMPARE_DIR1/COMPARE_DIR2 — set when using the directory comparison helper.
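A small sketch of how such overrides are typically resolved against their defaults is shown below. The function name resolve_paths and the dictionary keys are hypothetical, and placing the default batch counter under the resolved output root is an assumption; the notebook may resolve it differently.

```python
import os
from pathlib import Path


def resolve_paths(env=os.environ):
    """Return pipeline paths, preferring environment overrides over defaults."""
    output_root = Path(env.get("LCA_OUTPUT_ROOT", "outputs"))
    return {
        "spold_root": Path(env.get("ECOSPOLD_ROOT", "data/spold")),
        "lookup": Path(env.get("FILENAME_LOOKUP", "data/FilenameToActivityLookup.csv")),
        "output_root": output_root,
        # Assumption: the counter lives in the output root unless overridden.
        "batch_file": Path(env.get("LCA_BATCH_FILE", output_root / "batch_number.txt")),
    }


# Pass a plain dict to try overrides without touching the real environment.
paths = resolve_paths({"ECOSPOLD_ROOT": "/tmp/spold"})
for key, value in paths.items():
    print(f"{key}: {value}")
```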

📝Run

⛏️Installation

We recommend using a virtual environment.

git clone https://github.com/IceLab-X/AI4LCA_LCI_Data_Extraction_from_ecoinvent.git
cd AI4LCA_LCI_Data_Extraction_from_ecoinvent
Linux/macOS:
python3 -m venv .venv && source .venv/bin/activate
Windows:
python -m venv .venv && .venv\Scripts\activate
pip install -r requirements.txt

▶️ Running the pipeline

Place licensed .spold files under data/spold/ (or export ECOSPOLD_ROOT).
Copy FilenameToActivityLookup.csv into data/ (or set FILENAME_LOOKUP).
Launch the notebook (JUPYTER_CONFIG_DIR=. jupyter lab on Linux, or jupyter lab on Windows) and run Automated_LCI_Data_Extraction_Protocol.ipynb top to bottom.
The run takes a while (about 2-3 hours for the ecoinvent 3.11 dataset). Check the Python 3 (ipykernel) status indicator in Jupyter to see whether the kernel is still busy.
Inspect the new batch folder under outputs/ for CSVs, logs, and diagnostics. The CSV file for ecoinvent 3.11 is larger than 2 GB, which exceeds what Microsoft Excel can open; use VS Code or another application instead.
(Optional) Rerun the matrix builder with LCA_MATRIX_SOURCE/LCA_MATRIX_TARGET to aggregate a specific batch elsewhere.
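When an output CSV is too large for a spreadsheet application, it can also be inspected by streaming it row by row so that the whole file never sits in memory. The sketch below uses only the standard library; summarize_csv is a hypothetical helper and the sample data is made up.

```python
import csv
import io


def summarize_csv(handle, preview_rows=3):
    """Return the header, the first few rows, and the total data-row count."""
    reader = csv.reader(handle)
    header = next(reader)
    preview, count = [], 0
    for row in reader:
        # Keep only a short preview; later rows are counted and discarded.
        if count < preview_rows:
            preview.append(row)
        count += 1
    return header, preview, count


# In-memory stand-in for a large output file.
sample = io.StringIO("flow,amount\nco2,0.5\nch4,0.1\nn2o,0.01\nso2,0.3\n")
header, preview, count = summarize_csv(sample, preview_rows=2)
print(header, preview, count)
```

For a real file, pass open(path, newline="", encoding="utf-8") as the handle.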

Data Preparation

To run this project, you need to download the ecoinvent 3.11 datasets (ecoSpold02 format). Go to the ecoinvent website and download one of the following archives (choose the system model you need):

ecoinvent 3.11_cutoff_ecoSpold02.7z
ecoinvent 3.11_consequential_ecoSpold02.7z
ecoinvent 3.11_apos_ecoSpold02.7z

⚠️ Download only one of the three, depending on your application (cut-off, consequential, or APOS).

After extracting, you will get a folder named like:

ecoinvent 3.11_cutoff_ecoSpold02
ecoinvent 3.11_consequential_ecoSpold02
ecoinvent 3.11_apos_ecoSpold02

Inside this folder you should see at least two subfolders:

datasets/ (contains .spold files)
MasterData/ (contains .xml files)

Place the extracted folder under the data/ directory of this repository, for example:

project_root/
│
├─ data/
│   ├─ spold/                        # (symlink here if desired)
│   ├─ ecoinvent 3.11_cutoff_ecoSpold02/
│   │    ├─ datasets/
│   │    └─ MasterData/
│
└─ outputs/
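Before starting a multi-hour run, it can be worth confirming that the extracted folder has the expected layout. The following sketch checks for the datasets/ and MasterData/ subfolders and counts the .spold files; check_release is a hypothetical helper, and the demo builds a throwaway folder rather than touching real data.

```python
import tempfile
from pathlib import Path


def check_release(release_dir):
    """Verify the expected subfolders and return the number of .spold files."""
    release_dir = Path(release_dir)
    datasets = release_dir / "datasets"
    master = release_dir / "MasterData"
    missing = [p.name for p in (datasets, master) if not p.is_dir()]
    if missing:
        raise FileNotFoundError(f"missing subfolders: {missing}")
    return sum(1 for _ in datasets.glob("*.spold"))


# Demo: build a fake release folder containing two empty .spold files.
release = Path(tempfile.mkdtemp()) / "ecoinvent 3.11_cutoff_ecoSpold02"
(release / "datasets").mkdir(parents=True)
(release / "MasterData").mkdir()
for name in ("a.spold", "b.spold"):
    (release / "datasets" / name).touch()

n_files = check_release(release)
print(f"{n_files} .spold files found")
```

If the check passes, ECOSPOLD_ROOT can point at the datasets/ subfolder instead of copying files into data/spold/.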

Now you can run the processing scripts as described above.

💐Contributing to Automated LCI Data Extraction Protocol

Reporting bugs. To report a bug, open an issue in GitHub Issues.
Suggesting enhancements. To suggest an enhancement, whether a completely new feature or a minor improvement to an existing one, open an issue in GitHub Issues.
Pull requests. If you have improved this project, fixed a bug, or written a new example, feel free to send us a pull request.
Asking questions. For help with using this project or its functionality, open a discussion on GitHub.

🤗Citation

💥Please cite our paper if you find it helpful :) SemaNet: Bridging Words and Numbers For Predicting Missing Environmental Data in Life Cycle Assessment. DOI: https://doi.org/10.1021/acs.est.5c07557
