Releases: MyWebIntelligence/My-Web-Intelligence-v2
MWI 1.0.0 — Reproducible Digital Methods for Web Controversy Analysis Flagship Release
My Web Intelligence 1.0.0
First stable release of the flagship repository for reproducible web corpus analysis.
MWI is a Python-based research tool designed for digital humanities and information-communication sciences (ICS). It enables researchers to collect, qualify, and analyze web corpora with full methodological transparency.
Purpose
MWI operationalizes the "pragmatics of digital enunciation" framework, providing computational tools to:
- Map digital ecosystems and online controversies
- Identify strategic positions of speakers in heterogeneous web corpora
- Analyze discourse dynamics through network and semantic analysis
Key Features
Data Collection
- Focused crawling with configurable depth and relevance scoring
- SerpAPI integration for bootstrapping corpora from search engines
- Mercury Parser pipeline for high-quality content extraction
- SEO Rank enrichment for domain authority metrics
- Media analysis (images, videos, audio) with metadata extraction
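As an illustration of depth-limited crawling with relevance scoring, the sketch below runs a breadth-first crawl over a toy in-memory link graph. It is not MWI's implementation: the names `relevance`, `focused_crawl`, `toy_fetch`, and `TOY_WEB`, the keyword-count score, and the threshold semantics are all assumptions made for this example.

```python
from collections import deque


def relevance(text, keywords):
    """Hypothetical score: total number of keyword occurrences in the text."""
    lowered = text.lower()
    return sum(lowered.count(k.lower()) for k in keywords)


def focused_crawl(seeds, fetch, keywords, max_depth=2, min_relevance=1):
    """Breadth-first crawl that only keeps and expands relevant pages.

    `fetch(url)` must return (text, outgoing_links). It is injected so the
    sketch stays independent of any HTTP library.
    """
    queue = deque((url, 0) for url in seeds)
    seen = set(seeds)
    kept = {}
    while queue:
        url, depth = queue.popleft()
        text, links = fetch(url)
        score = relevance(text, keywords)
        if score < min_relevance:
            continue  # irrelevant page: keep neither the page nor its links
        kept[url] = {"depth": depth, "relevance": score}
        if depth >= max_depth:
            continue  # depth limit reached: keep the page, do not expand it
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return kept


# Toy in-memory "web" standing in for real HTTP requests (assumed data).
TOY_WEB = {
    "a": ("Nuclear energy debate and nuclear policy", ["b", "c"]),
    "b": ("Cooking recipes", ["d"]),
    "c": ("Energy transition controversy", ["d"]),
    "d": ("Energy markets", []),
}


def toy_fetch(url):
    return TOY_WEB[url]


result = focused_crawl(["a"], toy_fetch, ["nuclear", "energy"], max_depth=1)
print(result)
```

With `max_depth=1`, page `d` is never fetched: `b` is discarded as irrelevant and `c` sits at the depth limit, so neither expands its links.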
Semantic Analysis
- Paragraph-level embeddings (OpenAI, Mistral, Gemini, HuggingFace, Ollama)
- Pseudolinks computation — semantic similarity between paragraphs across pages
- NLI classification (entailment/neutral/contradiction) via Cross-Encoder models
- Cosine and LSH-based similarity for scalable corpus analysis
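The pseudolink idea (semantic similarity between paragraphs on different pages) can be sketched with plain cosine similarity over embedding vectors. This is a minimal example under stated assumptions, not MWI's code: the function names, the `(page_id, paragraph_id, embedding)` tuple layout, the 0.8 threshold, and the toy three-dimensional vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pseudolinks(paragraphs, threshold=0.8):
    """Return (paragraph_a, paragraph_b, score) for similar cross-page pairs.

    `paragraphs` is a list of (page_id, paragraph_id, embedding) tuples.
    """
    links = []
    for i, (page_a, para_a, vec_a) in enumerate(paragraphs):
        for page_b, para_b, vec_b in paragraphs[i + 1:]:
            if page_a == page_b:
                continue  # only link paragraphs that sit on different pages
            score = cosine(vec_a, vec_b)
            if score >= threshold:
                links.append((para_a, para_b, round(score, 3)))
    return links


# Toy embeddings; real ones would come from an embedding provider.
toy = [
    ("page1", "p1", [1.0, 0.0, 0.0]),
    ("page1", "p2", [1.0, 0.1, 0.0]),
    ("page2", "p3", [0.9, 0.1, 0.0]),
    ("page3", "p4", [0.0, 0.0, 1.0]),
]

links = pseudolinks(toy)
for link in links:
    print(link)
```

The all-pairs loop is quadratic in the number of paragraphs, which is the usual motivation for an approximate method such as LSH on larger corpora.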
Export and Visualization
- CSV exports (pages, domains, media, pseudolinks)
- GEXF graphs for Gephi network visualization
- Aggregated exports at page and domain levels
- Raw corpus export for external NLP pipelines
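To show what aggregating an export to the page level could look like, the sketch below groups paragraph-level links by page pair and writes the result as CSV. The column names (`source_page`, `target_page`, `score`, `links`, `mean_score`) and both helper functions are assumptions for illustration; MWI's actual export schema may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical output columns for the page-level export.
FIELDS = ["source_page", "target_page", "links", "mean_score"]


def aggregate_to_pages(rows):
    """Group paragraph-level links by page pair; count them and average scores."""
    totals = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for row in rows:
        key = (row["source_page"], row["target_page"])
        totals[key]["count"] += 1
        totals[key]["sum"] += float(row["score"])
    return [
        {
            "source_page": source,
            "target_page": target,
            "links": stats["count"],
            "mean_score": round(stats["sum"] / stats["count"], 3),
        }
        for (source, target), stats in sorted(totals.items())
    ]


def to_csv(records):
    """Serialize the aggregated records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Toy paragraph-level rows, shaped like csv.DictReader output (strings).
paragraph_links = [
    {"source_page": "page1", "target_page": "page2", "score": "0.9"},
    {"source_page": "page1", "target_page": "page2", "score": "0.8"},
    {"source_page": "page2", "target_page": "page3", "score": "0.85"},
]

page_links = aggregate_to_pages(paragraph_links)
csv_text = to_csv(page_links)
print(csv_text)
```

The same grouping applies at the domain level by keying on a domain column instead of the page identifier.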
Infrastructure
- Docker and Docker Compose support
- SQLite database with migration system
- Bilingual documentation (English/French)
- Automated tests with pytest
Installation
# Docker Compose (recommended)
./scripts/docker-compose-setup.sh
# Local Python
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python scripts/install-basic.py
python mywi.py db setup
Documentation
Citation
If you use MWI in your research, please cite:
@article{lakel2021mwi,
author = {Lakel, Amar},
title = {My web intelligence : un outil pour l'analyse du web et des réseaux},
journal = {I2D - Information, données \& documents},
year = {2021},
volume = {1},
number = {1},
pages = {96--103},
doi = {10.3917/i2d.211.0096}
}
Related Resources
- R Package: mwiR — R analysis bridge for MWI exports
- Data Repository: NAKALA Collection
- Project Website: mywebintelligence.net
Institutional Context
Developed at the MICA Laboratory (Mediation, Information, Communication, Arts), Université Bordeaux Montaigne, as part of the E3D research team (Études Digitales: du Dispositif au Document).
Migration Notice
This repository (mwi) is the active flagship. The legacy JavaScript repository (MyWebIntelligence) is now archived.
What's New in 1.0.0
- Complete rewrite in Python with modular architecture
- Paragraph-level embedding pipeline with multiple providers
- NLI-based semantic relation classification
- Pseudolinks computation and export (paragraph/page/domain levels)
- Docker Compose deployment with interactive setup
- Mercury Parser integration for content extraction
- SEO Rank API integration
- Comprehensive test suite
- Bilingual documentation
Contributing
Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.
License
MIT License — see LICENSE
Maintainer: Amar Lakel (@alakel)
Lab: MICA, Université Bordeaux Montaigne
Contact: amar.lakel@u-bordeaux-montaigne.fr