Atombridge is a Streamlit app that helps you go from a PDF (with STEM/TEM figures) or a STEM/TEM image crop to valid crystal structures in CIF format. It combines:
- Figure extraction and LLM-assisted figure selection from papers
- Slider-based image cropping with lattice analysis from the crop
- ASE RAG–guided code generation to produce structures and CIFs
- Built-in validation: 3D viewer, atom-overlap checks, and optional M3GNET relaxation
- Create an environment (Windows/macOS/Linux; Python 3.10 strongly recommended)

      python -m venv .venv
      . .venv/bin/activate # Windows: .\.venv\Scripts\activate
      python -m pip install --upgrade pip
      pip install -r requirements.txt

- Optional: set your Google GenAI key for Gemini

      macOS/Linux:  export GOOGLE_API_KEY=your_key_here
      Windows cmd:  set GOOGLE_API_KEY=your_key_here
      PowerShell:   $env:GOOGLE_API_KEY='...'

- Run the app

      streamlit run streamlit_app.py

  Open the URL printed by Streamlit (usually http://localhost:8501).
- Select a paper (upload or from `papers/`) and click "Extract Figures".
- Pick a CIF and click "Auto-select figure for CIF" to let the LLM choose the most relevant STEM/TEM micrograph (not plots/graphs).
- Crop a region using sliders and preview the crop.
- Click "Analyze lattice (STEM)" and enter a scale (nm per pixel). The app estimates two in‑plane lattice vectors and the inter‑vector angle.
- Click "Create CIF" to generate structures guided by paper text and (if present) your measured lattice constraints.
- Visualize in 3D (py3Dmol), check atom overlaps, and (optionally) validate stability with M3GNET.
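
The lattice-analysis step boils down to converting two pixel-space vectors into physical lengths and an angle. A minimal sketch of that arithmetic, assuming the two vectors have already been measured in pixels; the function name `lattice_from_pixels` and the sample numbers are illustrative and not taken from the app:

```python
import math

def lattice_from_pixels(v1_px, v2_px, nm_per_pixel):
    """Convert two in-plane lattice vectors (pixels) to (a_nm, b_nm, gamma_deg)."""
    len1 = math.hypot(*v1_px)
    len2 = math.hypot(*v2_px)
    if len1 == 0 or len2 == 0:
        raise ValueError("Lattice vectors must be non-zero.")
    # Angle between the vectors from the normalized dot product.
    dot = v1_px[0] * v2_px[0] + v1_px[1] * v2_px[1]
    cos_gamma = max(-1.0, min(1.0, dot / (len1 * len2)))  # clamp rounding noise
    gamma_deg = math.degrees(math.acos(cos_gamma))
    return len1 * nm_per_pixel, len2 * nm_per_pixel, gamma_deg

# Hypothetical measurement: a roughly hexagonal lattice at 0.0125 nm per pixel.
a_nm, b_nm, gamma_deg = lattice_from_pixels((20.0, 0.0), (10.0, 17.32), 0.0125)
print(f"a = {a_nm:.4f} nm, b = {b_nm:.4f} nm, gamma = {gamma_deg:.2f} deg")
```

An incorrect scale changes both lengths proportionally but leaves the angle untouched, so a wrong nm-per-pixel value shows up as implausible lattice constants rather than a distorted cell shape.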
- LLM‑guided figure selection: Chooses only STEM/TEM micrographs using caption/page text and CIF hints.
- Robust cropping + analysis: Simple slider crop, CLAHE enhancement, multi‑strategy peak detection, and DBSCAN clustering. If that fails, a Fourier‑domain fallback recovers lattice vectors from FFT peaks.
- Improved figure extraction: `extract_figures_v2` merges nearby image regions and segments subfigures with adaptive thresholding.
- Codegen via ASE RAG: Uses Gemini with a small retrieval set from the ASE source tree to produce solid starter Python code and CIFs.
- Validation built-in:
  - 3D unit-cell viewer (py3Dmol)
  - Atom-overlap check with pass/fail banners and per-file min-distance details
  - Optional M3GNET relaxation ("Validate stable structure") with per-file energies
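
The idea behind the Fourier-domain fallback can be illustrated with NumPy alone. This is a sketch under stated assumptions, not the app's implementation: it finds only the single strongest non-DC peak and converts it to a real-space period, whereas a full fallback would pick two non-collinear peaks. The function name `fft_lattice_period` is hypothetical.

```python
import numpy as np

def fft_lattice_period(image, nm_per_pixel):
    """Estimate the dominant real-space period (nm) from the strongest FFT peak."""
    img = np.asarray(image, dtype=float)
    img = img - img.mean()                       # remove the DC component
    power = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
    if power.max() == 0:
        raise ValueError("Image has no periodic signal.")
    ny, nx = img.shape
    fy = np.fft.fftshift(np.fft.fftfreq(ny))     # cycles per pixel along rows
    fx = np.fft.fftshift(np.fft.fftfreq(nx))     # cycles per pixel along columns
    iy, ix = np.unravel_index(np.argmax(power), power.shape)
    freq = np.hypot(fy[iy], fx[ix])              # spatial frequency magnitude
    return nm_per_pixel / freq                   # period in nm

# Synthetic test image: fringes repeating every 8 pixels along x.
x = np.arange(64)
image = np.tile(np.cos(2 * np.pi * x / 8.0), (64, 1))
period_nm = fft_lattice_period(image, nm_per_pixel=0.05)
print(f"period = {period_nm:.3f} nm")
```

Note that an FFT peak gives the spacing of a set of lattice fringes; for a non-orthogonal cell that spacing is not the same as the length of a lattice vector, so a conversion step is still needed.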
- Google GenAI (Gemini): The app uses LangChain's `init_chat_model` with provider `google_genai`. Supply `GOOGLE_API_KEY` in your environment or in the app sidebar. The default model is `gemini-2.5-flash`; you can choose `gemini-2.5-pro`.
- Materials Project API (optional): Enter your MP API key in the sidebar to enable MP validation tools.
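
For orientation, the model setup described above might look roughly like the following. The helper names `resolve_api_key` and `build_chat_model` are hypothetical, and the rule that a sidebar key takes precedence over the environment is an assumption. The `init_chat_model` call also requires the `langchain-google-genai` package to be installed.

```python
import os

DEFAULT_MODEL = "gemini-2.5-flash"
ALLOWED_MODELS = ("gemini-2.5-flash", "gemini-2.5-pro")

def resolve_api_key(sidebar_value=None, env=os.environ):
    """Return the sidebar key if given, else GOOGLE_API_KEY (assumed precedence)."""
    key = (sidebar_value or "").strip() or env.get("GOOGLE_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set GOOGLE_API_KEY or enter a key in the sidebar.")
    return key

def build_chat_model(model=DEFAULT_MODEL, sidebar_key=None):
    """Create a Gemini chat model through LangChain's init_chat_model."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"Unsupported model: {model}")
    # Imported lazily so resolve_api_key works without LangChain installed.
    from langchain.chat_models import init_chat_model
    os.environ["GOOGLE_API_KEY"] = resolve_api_key(sidebar_key)
    return init_chat_model(model, model_provider="google_genai")
```

Calling `build_chat_model("gemini-2.5-pro")` would then select the larger model, provided a key is available.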
- Python: 3.10 recommended (for best compatibility with optional scientific stacks).
- Requirements: see `requirements.txt`. The core app avoids heavy scientific deps by default.
- Optional M3GNET: If you click "Validate stable structure," the app attempts a one-time `conda install --no-deps m3gnet` into your active conda env (requires Conda to be available). If that's not possible, the UI will instruct you to run the command manually.
- Windows: The app forces UTF-8 I/O to avoid charmap errors when handling non-ASCII text. The file watcher is set to "poll" to avoid cross-drive errors.
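
One way such Windows safeguards can be implemented is sketched below. This is an assumption about the general technique, not a copy of the app's code. The function names are hypothetical, and the environment variable `STREAMLIT_SERVER_FILE_WATCHER_TYPE` is assumed to map to Streamlit's `server.fileWatcherType` option; it only takes effect if set before Streamlit reads its configuration.

```python
import os
import sys

def force_utf8_io():
    """Switch stdout/stderr to UTF-8 where supported; return the streams changed."""
    changed = []
    for name in ("stdout", "stderr"):
        stream = getattr(sys, name, None)
        if hasattr(stream, "reconfigure"):
            try:
                stream.reconfigure(encoding="utf-8", errors="replace")
                changed.append(name)
            except (ValueError, OSError):
                pass  # stream does not allow reconfiguration; leave it as-is
    # Child Python processes started later inherit this setting.
    os.environ.setdefault("PYTHONIOENCODING", "utf-8")
    return changed

def use_polling_file_watcher():
    """Ask Streamlit to poll for file changes instead of using OS-level watchers."""
    os.environ.setdefault("STREAMLIT_SERVER_FILE_WATCHER_TYPE", "poll")
```

Both helpers use `setdefault`, so a value the user has already exported wins over the default.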
- `streamlit_app.py`: Main UI and orchestration.
- `src/figures.py`:
  - `extract_figures_v2`: Merged-bbox figure discovery + subfigure segmentation + caption association.
  - Heuristics for `Figure.is_tem` and TEM relevance scoring.
- `src/stem_analysis.py`:
  - `measure_lattice_vectors`: CLAHE enhancement + peak detection (skimage/OpenCV/morphology) + neighbor clustering + lattice vectors; FFT fallback when needed.
  - `minimal_cif_from_lattice`: Writes a minimal 2D CIF with the measured lattice.
- `src/create_ASE_RAG.py`: ASE RAG builder; retrieves relevant ASE snippets and queries Gemini to synthesize code.
- `src/utils_paper_and_code.py`: PDF parsing, code extraction, and robust subprocess execution.
- `src/structure_checks.py`:
  - `check_atom_distances` + `distance_summary`: Per-CIF overlap validation and min-distance reporting.
  - `validate_m3gnet`: Optional M3GNET relaxation. Tries to auto-install `m3gnet` into the active conda env (no deps) if missing.
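
To give a feel for what a minimal 2D CIF built from measured lattice values could contain, here is a hedged sketch. It is not the body of `minimal_cif_from_lattice`: the function name `minimal_cif_text`, the placeholder out-of-plane length `c_angstrom`, and the choice to write only cell parameters (no atom sites) are all assumptions.

```python
def minimal_cif_text(a_nm, b_nm, gamma_deg, c_angstrom=20.0, name="measured_lattice"):
    """Build CIF text holding only the cell; in-plane values come from the measurement."""
    a = a_nm * 10.0  # CIF cell lengths are given in angstroms (1 nm = 10 angstroms)
    b = b_nm * 10.0
    lines = [
        f"data_{name}",
        f"_cell_length_a {a:.4f}",
        f"_cell_length_b {b:.4f}",
        f"_cell_length_c {c_angstrom:.4f}",  # placeholder: not measurable from a 2D image
        "_cell_angle_alpha 90.000",
        "_cell_angle_beta 90.000",
        f"_cell_angle_gamma {gamma_deg:.3f}",
        "_symmetry_space_group_name_H-M 'P 1'",
        "_symmetry_Int_Tables_number 1",
    ]
    return "\n".join(lines) + "\n"

cif_text = minimal_cif_text(0.25, 0.25, 60.0)
print(cif_text)
```

Some CIF readers reject files without atom sites, so a file like this is best treated as a carrier for lattice constraints rather than a complete structure.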
- Load a paper and extract figures.
- If you already have a CIF, use "Auto‑select figure for CIF". Otherwise, proceed to cropping a likely STEM region.
- Crop with sliders and preview.
- Analyze lattice (enter nm/pixel). If DBSCAN fails, the FFT fallback usually succeeds on periodic images.
- Generate CIFs. The app injects your measured lattice constraints into the LLM prompt.
- Inspect results:
  - View in 3D
  - Overlap check (green success or red details expander)
  - Optional: Validate stable structure (M3GNET)
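
The overlap check in the last step amounts to finding the smallest interatomic distance under periodic boundary conditions. The sketch below uses only the standard library and is not the app's `check_atom_distances`: the function name `min_distance_report`, the 0.7 Å threshold, and the restriction to the 27 neighbouring cell images (adequate for fractional coordinates in [0, 1) and cells that are not strongly skewed) are assumptions.

```python
import itertools
import math

def min_distance_report(frac_coords, cell, threshold=0.7):
    """Return (min_distance, pairs_below_threshold) for a periodic structure.

    frac_coords: list of (x, y, z) fractional positions.
    cell: three lattice vectors as rows, in angstroms.
    """
    def length(df):
        # Convert a fractional difference to Cartesian and take its norm.
        cart = [sum(df[i] * cell[i][k] for i in range(3)) for k in range(3)]
        return math.sqrt(sum(c * c for c in cart))

    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    min_d, too_close = math.inf, []
    for i, j in itertools.combinations(range(len(frac_coords)), 2):
        # Shortest distance between atom i and any nearby image of atom j.
        best = min(
            length([frac_coords[j][k] + s[k] - frac_coords[i][k] for k in range(3)])
            for s in shifts
        )
        min_d = min(min_d, best)
        if best < threshold:
            too_close.append((i, j, best))
    return min_d, too_close

# Hypothetical 4 angstrom cubic cell; atom 2 nearly coincides with an image of atom 0.
cell = [(4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0)]
coords = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.98, 0.0, 0.0)]
min_d, too_close = min_distance_report(coords, cell)
print(min_d, [(i, j) for i, j, _ in too_close])
```

The example shows why periodic images matter: atoms 0 and 2 look far apart inside the cell, yet they are flagged because atom 2 sits next to the neighbouring image of atom 0.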
- "Atom‑overlap check failed …"
  - Open the warnings expander to see which atoms are too close. Regenerate with better constraints or adjust structure code.
- "M3GNET validation unavailable …"
  - Ensure you are running in a conda environment and have `conda` in PATH. The app attempts `conda install --no-deps m3gnet -p <your_env>` when you first validate. If it fails, run that command manually.
- "Lattice analysis failed: Could not find two primary lattice directions."
  - Provide a slightly larger crop with consistent contrast and a correct scale. The FFT fallback will kick in automatically in most cases.
- Encoding errors like `charmap`
  - Handled by the app's UTF-8 settings; if you see any, report where it occurred.
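
If the automatic M3GNET install fails, it can help to check what command would be run for your environment. The sketch below assembles the command quoted above from `conda` on PATH and the `CONDA_PREFIX` variable that conda sets on activation. The function name `m3gnet_install_command` is hypothetical, and this is not necessarily how the app builds the command.

```python
import os
import shutil

def m3gnet_install_command(env_prefix=None, conda_exe=None):
    """Return the install command as a list, or None if conda or the env is missing."""
    conda_exe = conda_exe or shutil.which("conda")           # conda must be on PATH
    env_prefix = env_prefix or os.environ.get("CONDA_PREFIX")  # set by `conda activate`
    if not conda_exe or not env_prefix:
        return None
    return [conda_exe, "install", "--no-deps", "m3gnet", "-p", env_prefix]

cmd = m3gnet_install_command()
if cmd is None:
    print("No active conda environment or conda not on PATH.")
else:
    print("Run manually:", " ".join(cmd))
```

A `None` result points at the two preconditions named in the troubleshooting entry: either no conda environment is active or `conda` is not on PATH.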
Issues and PRs are welcome! Please include screenshots or sample PDFs/images when reporting figure‑extraction or lattice‑analysis problems so we can reproduce and tune the detectors.
See LICENSE.