An intelligent system that automatically generates Failure Mode and Effects Analysis (FMEA) from both structured and unstructured data using Large Language Models
## Table of Contents

- Overview
- Features
- Architecture
- Installation
- Quick Start
- Usage
- Configuration
- Project Structure
- Examples
- API Reference
- Contributing
- License
## Overview

Traditional FMEA is manual, time-consuming, and expert-dependent. This system streamlines the process by:
- Automating extraction of failure information from customer reviews, complaints, and reports
- Processing structured data from Excel/CSV files
- Using LLMs for intelligent semantic understanding
- Computing risk scores (Severity, Occurrence, Detection)
- Generating actionable insights with recommended actions
Organizations receive failure information in multiple formats:
- Unstructured: Customer reviews, complaint text, incident reports
- Structured: Excel spreadsheets, CSV files with failure data
This system provides a unified, intelligent solution to convert all these inputs into a standardized FMEA.
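As a concrete illustration of the risk-scoring step, the Risk Priority Number (RPN) is the product of the three 1-10 scores. This standalone snippet is a sketch of the standard FMEA formula, not code from the package:

```python
def compute_rpn(severity: int, occurrence: int, detection: int) -> int:
    """Risk Priority Number: product of the three 1-10 FMEA scores."""
    for score in (severity, occurrence, detection):
        if not 1 <= score <= 10:
            raise ValueError("FMEA scores must be in the range 1-10")
    return severity * occurrence * detection

# A severe, fairly frequent, hard-to-detect failure:
print(compute_rpn(10, 7, 8))  # 560
```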
## Features

- ✅ Dual Input Support: Process both structured and unstructured data
- 🤖 LLM-Powered Extraction: Uses Mistral/LLaMA/GPT models for intelligent entity extraction
- 📊 Automated Risk Scoring: Calculates S, O, D scores and RPN automatically
- 🎯 Action Priority Classification: Categorizes risks as Critical, High, Medium, Low
- 📈 Visual Analytics: Interactive dashboards with charts and risk matrices
- 💾 Multiple Export Formats: Excel, CSV, JSON
- 🔄 Hybrid Processing: Combine multiple data sources seamlessly
- 🚀 Production-Ready: Modular, extensible, well-documented code
- NLP Processing: Sentiment analysis, keyword extraction, text cleaning
- Rule-Based Fallback: Works even without LLM for faster processing
- Batch Processing: Handle large datasets efficiently
- Deduplication: Intelligent removal of similar failure modes
- Configurable: YAML-based configuration for easy customization
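The deduplication feature above can be approximated with simple string similarity. This hypothetical helper (not the package's actual implementation) keeps only failure modes that are sufficiently distinct from ones already seen:

```python
from difflib import SequenceMatcher

def deduplicate_failure_modes(modes, threshold=0.85):
    """Drop failure-mode strings that closely match an already-kept one."""
    kept = []
    for mode in modes:
        normalized = mode.strip().lower()
        # Keep only if no previously kept mode is too similar
        if all(SequenceMatcher(None, normalized, k).ratio() < threshold for k in kept):
            kept.append(normalized)
    return kept

modes = ["Brake failure", "brake failure ", "Engine overheating"]
print(deduplicate_failure_modes(modes))  # ['brake failure', 'engine overheating']
```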
## Architecture

```
User Input (Text/CSV/Excel)
          ↓
┌────────────────────┐
│ Data Preprocessing │ ← Text cleaning, validation, sentiment analysis
└────────────────────┘
          ↓
┌────────────────────┐
│   LLM Extraction   │ ← Extract: Failure Mode, Effect, Cause, Component
└────────────────────┘
          ↓
┌────────────────────┐
│    Risk Scoring    │ ← Calculate: Severity, Occurrence, Detection
└────────────────────┘
          ↓
┌────────────────────┐
│   FMEA Generator   │ ← Compute RPN, prioritize, recommend actions
└────────────────────┘
          ↓
┌────────────────────┐
│  Output & Export   │ ← Dashboard, Excel, CSV, JSON
└────────────────────┘
```
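The data flow above can be sketched as a simple function pipeline. The names below are illustrative stand-ins, not the package's API (the real modules live in `src/`):

```python
def preprocess(texts):
    """Stand-in for cleaning/validation: strip whitespace, drop empties."""
    return [t.strip() for t in texts if t.strip()]

def extract(texts):
    """Stand-in for LLM/rule-based extraction of failure entities."""
    return [{"failure_mode": t, "severity": 5, "occurrence": 5, "detection": 5}
            for t in texts]

def score(rows):
    """Stand-in for the scoring stage: attach RPN to each record."""
    for row in rows:
        row["rpn"] = row["severity"] * row["occurrence"] * row["detection"]
    return rows

rows = score(extract(preprocess([" Brake failure on highway "])))
print(rows[0]["rpn"])  # 125
```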
## Installation

### Prerequisites

- Python 3.9 or higher
- 8GB RAM minimum (16GB recommended for LLM)
- GPU (optional, for faster LLM inference)
1. Clone the repository:

```bash
git clone <repository-url>
cd Symboisis
```

2. Create and activate a virtual environment:

```bash
# Windows
python -m venv venv
venv\Scripts\activate

# Linux/Mac
python3 -m venv venv
source venv/bin/activate
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

4. Download NLP resources:

```bash
python -c "import nltk; nltk.download('punkt'); nltk.download('stopwords'); nltk.download('averaged_perceptron_tagger')"
python -m spacy download en_core_web_sm
```

5. Set up environment variables:

```bash
# Copy example environment file
copy .env.example .env
# Edit .env with your settings (optional)
```

## Quick Start

FASTEST WAY - Process your actual datasets:
```bash
python process_my_data.py
```

This will automatically process:
- ✅ Your FMEA.csv (161 industrial failure modes)
- ✅ Car reviews from archive (3) folder (Ford, Toyota, Honda)
- ✅ Create hybrid analysis combining both
- ✅ Export all results to the `output/` folder
📖 See YOUR_DATA_GUIDE.md for detailed instructions on working with your datasets!
Or launch the interactive dashboard:

```bash
streamlit run app.py
```

Navigate to http://localhost:8501 in your browser.
Or use the command line:

```bash
# From unstructured text
python cli.py --text reviews.csv --output fmea_output.xlsx

# From structured data
python cli.py --structured failures.csv --output fmea_output.xlsx

# Hybrid mode
python cli.py --text reviews.csv --structured failures.csv --output fmea_output.xlsx
```

Or run the demos:

```bash
python examples.py
```

This will run 3 demonstration examples and generate sample FMEAs.
## Usage

### Web Dashboard

1. Start the dashboard:

   ```bash
   streamlit run app.py
   ```

2. Select input type: Unstructured, Structured, or Hybrid
3. Upload files or paste text
4. Click "Generate FMEA"
5. View results: metrics, tables, charts
6. Export: download as Excel or CSV
### Python API

```python
from fmea_generator import FMEAGenerator
import yaml

# Load configuration
with open('config/config.yaml', 'r') as f:
    config = yaml.safe_load(f)

# Initialize generator
generator = FMEAGenerator(config)

# Generate from text
reviews = ["Brake failure on highway...", "Engine overheated..."]
fmea_df = generator.generate_from_text(reviews, is_file=False)

# Generate from structured file
fmea_df = generator.generate_from_structured('data.csv')

# Export
generator.export_fmea(fmea_df, 'output/fmea.xlsx', format='excel')
```

### Command-Line Interface

```bash
# Basic usage
python cli.py --text input.csv --output result.xlsx

# With summary report
python cli.py --text input.csv --output result.xlsx --summary

# Faster rule-based mode (no LLM)
python cli.py --text input.csv --output result.xlsx --no-model

# Custom configuration
python cli.py --text input.csv --config custom_config.yaml --output result.xlsx
```

## Configuration

Edit config/config.yaml to customize:
```yaml
model:
  name: "mistralai/Mistral-7B-Instruct-v0.2"  # LLM model
  max_length: 512
  temperature: 0.3
  device: "auto"        # auto, cuda, cpu
  quantization: true    # Use 4-bit quantization

risk_scoring:
  severity:
    high_keywords: ["critical", "catastrophic", "severe"]
    medium_keywords: ["moderate", "significant"]
    low_keywords: ["minor", "slight"]
    default: 5

text_processing:
  min_review_length: 10
  negative_threshold: 0.3   # Sentiment threshold
  max_reviews_per_batch: 100
  enable_sentiment_filter: true
```

## Project Structure

```
Symboisis/
├── src/
│   ├── preprocessing.py      # Data preprocessing module
│   ├── llm_extractor.py      # LLM-based extraction
│   ├── risk_scoring.py       # Risk scoring engine
│   ├── fmea_generator.py     # Main FMEA generator
│   └── utils.py              # Utility functions
├── config/
│   └── config.yaml           # Configuration file
├── output/                   # Generated FMEAs
├── archive (3)/              # Sample car review data
├── app.py                    # Streamlit dashboard
├── cli.py                    # Command-line interface
├── examples.py               # Usage examples
├── requirements.txt          # Python dependencies
├── .env.example              # Environment variables template
└── README.md                 # This file
```
## Examples

### Example 1: Unstructured Text

```python
reviews = [
    "Brake failure during heavy rain, very dangerous!",
    "Engine overheated and seized, no warning lights."
]
fmea_df = generator.generate_from_text(reviews, is_file=False)
```

Output:
| Failure Mode | Effect | Severity | Occurrence | Detection | RPN | Priority |
|---|---|---|---|---|---|---|
| Brake failure | Unable to stop | 10 | 7 | 8 | 560 | Critical |
| Engine seized | Vehicle breakdown | 9 | 6 | 7 | 378 | High |
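The Priority column follows RPN bands. A sketch of such a mapping, using illustrative thresholds (the actual bands are configurable; these values are assumptions consistent with the table above):

```python
def classify_priority(rpn: int) -> str:
    """Map an RPN (1-1000) to an action-priority band (illustrative thresholds)."""
    if rpn >= 500:
        return "Critical"
    elif rpn >= 200:
        return "High"
    elif rpn >= 80:
        return "Medium"
    return "Low"

print(classify_priority(560))  # Critical
print(classify_priority(378))  # High
```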
### Example 2: Structured Data

Input CSV:

```csv
failure_mode,effect,cause,component
Brake system failure,Cannot stop vehicle,Worn brake pads,Brake System
Engine overheating,Engine damage,Coolant leak,Cooling System
```

```python
fmea_df = generator.generate_from_structured('failures.csv')
```

### Example 3: Real Car Reviews

```python
# Process actual car review data
fmea_df = generator.generate_from_text('archive (3)/Scraped_Car_Review_ford.csv', is_file=True)

# Generate summary
from utils import generate_summary_report
print(generate_summary_report(fmea_df))
```

## API Reference

### FMEAGenerator

Main class for FMEA generation.
#### `generate_from_text(text_input, is_file=False)`

Generate FMEA from unstructured text.

- Args: `text_input` (str or list), `is_file` (bool)
- Returns: DataFrame

#### `generate_from_structured(file_path)`

Generate FMEA from structured CSV/Excel.

- Args: `file_path` (str)
- Returns: DataFrame

#### `generate_hybrid(structured_file, text_input)`

Generate FMEA from both sources.

- Args: `structured_file` (str), `text_input` (str or list)
- Returns: DataFrame

#### `export_fmea(fmea_df, output_path, format='excel')`

Export FMEA to file.

- Args: `fmea_df` (DataFrame), `output_path` (str), `format` (str)
Other modules:

- `preprocessing.py`: handles data cleaning and preprocessing
- `llm_extractor.py`: extracts failure information using LLMs
- `risk_scoring.py`: calculates risk scores and RPN
Run the examples to test the system:

```bash
python examples.py
```

This will:

- Generate FMEA from sample reviews
- Process structured data
- Analyze real car reviews (if available)

## Use Cases

Manufacturing:

- Analyze equipment failure reports
- Process quality control data
- Generate preventive maintenance schedules

Automotive:

- Process customer complaints
- Analyze warranty claims
- Identify safety issues

Healthcare:

- Analyze adverse event reports
- Process medical device failures
- Improve patient safety

Software/IT:

- Analyze bug reports
- Process incident tickets
- Identify system vulnerabilities
This system is suitable for:
- Academic research papers
- Case studies
- Benchmarking studies
- Tool comparisons
- Industry reports
Key Advantages:
- Reproducible results
- Configurable parameters
- Comprehensive logging
- Export capabilities
## Troubleshooting

If the LLM fails to load or download:

```bash
# Use rule-based mode instead
python cli.py --text input.csv --output result.xlsx --no-model
```

If you run out of memory:

- Enable quantization in config.yaml
- Use smaller batch sizes
- Use rule-based mode

If processing is slow:

- Use GPU if available
- Enable quantization
- Reduce batch size
- Use rule-based mode for faster results
## Performance

Processing speed:

| Mode | Speed | Accuracy |
|---|---|---|
| LLM (GPU) | ~2 reviews/sec | High |
| LLM (CPU) | ~0.3 reviews/sec | High |
| Rule-based | ~50 reviews/sec | Medium |
Hardware requirements:

| Component | Minimum | Recommended |
|---|---|---|
| RAM | 8GB | 16GB |
| GPU | None | 8GB VRAM |
| Disk | 2GB | 10GB |
## Contributing

Contributions are welcome! Please:

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests
5. Submit a pull request
## License

This project is licensed under the MIT License.
Developed as a production-grade academic and industry project for automated FMEA generation.
- HuggingFace for transformer models
- Streamlit for dashboard framework
- Open-source community
For issues, questions, or suggestions:
- Open an issue on GitHub
- Check the documentation
- Run examples for guidance
Planned enhancements:

- Fine-tuned domain-specific models
- Fuzzy FMEA support
- Real-time monitoring
- Multi-language support
- Integration with PLM systems
- Advanced analytics
- Mobile app
Transforming failure analysis with AI