Releases: MyWebIntelligence/My-Web-Intelligence-v2

MWI 1.0.0: Reproducible Digital Methods for Web Controversy Analysis (Flagship Release)

26 Jan 13:01
ccdd6eb

My Web Intelligence 1.0.0

First stable release of the flagship repository for reproducible web corpus analysis.

MWI is a Python-based research tool designed for the digital humanities and for information and communication sciences (ICS). It enables researchers to collect, qualify, and analyze web corpora with full methodological transparency.

Purpose

MWI operationalizes the "pragmatics of digital enunciation" framework, providing computational tools to:

  • Map digital ecosystems and online controversies
  • Identify strategic positions of speakers in heterogeneous web corpora
  • Analyze discourse dynamics through network and semantic analysis

Key Features

Data Collection

  • Focused crawling with configurable depth and relevance scoring
  • SerpAPI integration for bootstrapping corpora from search engines
  • Mercury Parser pipeline for high-quality content extraction
  • SEO Rank enrichment for domain authority metrics
  • Media analysis (images, videos, audio) with metadata extraction
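Focused crawling combines two limits: a maximum depth from the seed pages and a relevance score that decides whether a page is kept and expanded. The following minimal sketch shows how these limits interact. It is a sketch under stated assumptions. It uses a toy in-memory link graph instead of live HTTP fetching and a simple term-overlap score. `focused_crawl`, `relevance`, `max_depth` and `min_relevance` are hypothetical names, not MWI's API, and MWI's actual relevance scoring may differ.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real crawler would fetch pages over HTTP and extract links instead.
WebGraph = Dict[str, Tuple[str, List[str]]]


def relevance(text: str, terms: Set[str]) -> float:
    """Assumed scoring: share of (lowercase) query terms present in the page, 0.0 to 1.0."""
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return len(terms & words) / len(terms)


def focused_crawl(web: WebGraph, seeds: List[str], terms: Set[str],
                  max_depth: int = 2, min_relevance: float = 0.5) -> Dict[str, int]:
    """Breadth-first crawl that stores and expands only relevant pages.

    Returns a mapping of relevant URL -> depth at which it was first reached.
    """
    kept: Dict[str, int] = {}
    seen: Set[str] = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue:
        url, depth = queue.popleft()
        if url not in web:
            continue  # unknown or unreachable page
        text, links = web[url]
        if relevance(text, terms) < min_relevance:
            continue  # off-topic: neither kept nor expanded
        kept[url] = depth
        if depth < max_depth:  # the depth limit bounds how far the crawl spreads
            for link in links:
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
    return kept


if __name__ == "__main__":
    web = {
        "a": ("climate controversy debate", ["b", "c"]),
        "b": ("climate policy debate", ["d"]),
        "c": ("cooking recipes", ["e"]),
        "d": ("climate debate archive", []),
        "e": ("climate debate", []),
    }
    print(focused_crawl(web, ["a"], {"climate", "debate"}, max_depth=1))
    # {'a': 0, 'b': 1}
```

With `max_depth=2`, page `d` would also be kept. Page `e` stays out of reach because its only inbound link is on the off-topic page `c`.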

Semantic Analysis

  • Paragraph-level embeddings (OpenAI, Mistral, Gemini, HuggingFace, Ollama)
  • Pseudolinks: semantic-similarity links between paragraphs on different pages
  • NLI classification (entailment/neutral/contradiction) via cross-encoder models
  • Cosine similarity and LSH (locality-sensitive hashing) for scalable corpus analysis
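The pseudolink idea can be sketched as follows. The sketch assumes that paragraph embeddings have already been computed; toy 3-dimensional vectors stand in for provider output. Random-hyperplane LSH sorts paragraphs into buckets, so cosine similarity is computed only within each bucket rather than for every pair. The threshold, the number of hyperplanes and the function names are illustrative assumptions. This is not a description of how MWI combines cosine similarity and LSH internally.

```python
import math
import random
from itertools import combinations
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def lsh_signature(v: Vector, planes: List[Vector]) -> Tuple[int, ...]:
    """Random-hyperplane LSH: one bit per plane, set when v lies on its positive side."""
    return tuple(int(sum(a * b for a, b in zip(v, p)) >= 0) for p in planes)


def pseudolinks(paragraphs: Dict[str, Tuple[str, Vector]], threshold: float = 0.8,
                n_planes: int = 8, seed: int = 0) -> List[Tuple[str, str, float]]:
    """Return sorted (para_a, para_b, score) links between similar paragraphs.

    paragraphs maps a hypothetical paragraph id -> (page id, embedding).
    """
    if not paragraphs:
        return []
    dim = len(next(iter(paragraphs.values()))[1])
    rng = random.Random(seed)  # fixed seed keeps the hyperplanes reproducible
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
    buckets: Dict[Tuple[int, ...], List[str]] = {}
    for pid, (_, vec) in paragraphs.items():
        buckets.setdefault(lsh_signature(vec, planes), []).append(pid)
    links = []
    for members in buckets.values():
        for a, b in combinations(sorted(members), 2):
            (page_a, va), (page_b, vb) = paragraphs[a], paragraphs[b]
            if page_a == page_b:
                continue  # only link paragraphs that sit on different pages
            score = cosine(va, vb)
            if score >= threshold:
                links.append((a, b, round(score, 3)))
    return sorted(links)


if __name__ == "__main__":
    paras = {
        "p1": ("A", [1.0, 0.0, 0.0]),
        "p2": ("B", [1.0, 0.0, 0.0]),
        "p3": ("A", [1.0, 0.0, 0.0]),
        "p4": ("C", [0.0, 0.0, 1.0]),
    }
    print(pseudolinks(paras))
    # [('p1', 'p2', 1.0), ('p2', 'p3', 1.0)]
```

A single hash table can miss similar pairs that happen to fall into different buckets. Practical LSH setups usually combine several tables (or bands) and take the union of their candidates, trading extra comparisons for better recall.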

Export and Visualization

  • CSV exports (pages, domains, media, pseudolinks)
  • GEXF graphs for Gephi network visualization
  • Aggregated exports at page and domain levels
  • Raw corpus export for external NLP pipelines
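Domain-level aggregation essentially collapses links between pages into weighted links between their domains. The sketch below does this for a list of (source URL, target URL) pairs and serializes the result as a minimal GEXF 1.2 document of the kind Gephi can open. It assumes that a domain is identified by the URL host (`netloc`). The function names are hypothetical, and MWI's own exporter likely writes richer node and edge attributes.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, Tuple
from urllib.parse import urlparse

GEXF_NS = "http://www.gexf.net/1.2draft"


def domain_edges(page_links: List[Tuple[str, str]]) -> Counter:
    """Collapse page-to-page links into weighted domain-to-domain edges."""
    edges: Counter = Counter()
    for src, dst in page_links:
        a, b = urlparse(src).netloc, urlparse(dst).netloc
        if a and b and a != b:  # assumption: drop links that stay inside one domain
            edges[(a, b)] += 1
    return edges


def to_gexf(edges: Counter) -> str:
    """Serialize weighted directed edges as a minimal GEXF 1.2 document."""
    gexf = ET.Element("gexf", {"xmlns": GEXF_NS, "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"defaultedgetype": "directed"})
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")
    ids: Dict[str, str] = {}
    for pair in edges:
        for domain in pair:
            if domain not in ids:  # one node per distinct domain
                ids[domain] = str(len(ids))
                ET.SubElement(nodes_el, "node", {"id": ids[domain], "label": domain})
    for i, ((a, b), weight) in enumerate(sorted(edges.items())):
        ET.SubElement(edges_el, "edge", {"id": str(i), "source": ids[a],
                                         "target": ids[b], "weight": str(float(weight))})
    return ET.tostring(gexf, encoding="unicode")
```

Dropping intra-domain links is one possible choice. Keeping them as self-loops is equally valid, depending on the analysis.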

Infrastructure

  • Docker and Docker Compose support
  • SQLite database with a schema migration system
  • Bilingual documentation (English/French)
  • Automated tests with pytest
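A common way to build a migration system on SQLite is to keep an ordered list of schema changes and record the applied version in the database's built-in `PRAGMA user_version`. The sketch below follows that design as an assumption. The tables and the `migrate` function are hypothetical. They do not reproduce MWI's schema or its `db setup` command.

```python
import sqlite3
from typing import List

# Hypothetical ordered migrations; MWI's real schema is not reproduced here.
MIGRATIONS: List[str] = [
    "CREATE TABLE domain (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)",
    "CREATE TABLE page (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL, "
    "domain_id INTEGER REFERENCES domain(id))",
    "ALTER TABLE page ADD COLUMN relevance REAL DEFAULT 0",
]


def migrate(conn: sqlite3.Connection, migrations: List[str] = MIGRATIONS) -> int:
    """Apply pending migrations in order and return the resulting schema version.

    The applied version lives in PRAGMA user_version, so a second call is a no-op.
    Assumes a connection opened with default transaction settings.
    """
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(migrations[current:], start=current + 1):
        conn.execute("BEGIN")  # explicit transaction: DDL is not wrapped implicitly
        try:
            conn.execute(statement)
            conn.execute(f"PRAGMA user_version = {version}")
            conn.commit()
        except Exception:
            conn.rollback()  # keep the schema at the last good version
            raise
    return conn.execute("PRAGMA user_version").fetchone()[0]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    print(migrate(db))  # 3
    print(migrate(db))  # 3 (nothing left to apply)
```

Because each step runs in its own transaction and SQLite DDL is transactional, a failing migration leaves the schema at the last successfully applied version.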

Installation

# Docker Compose (recommended)
./scripts/docker-compose-setup.sh

# Local Python
python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
python scripts/install-basic.py
python mywi.py db setup

Documentation

Citation

If you use MWI in your research, please cite:

@article{lakel2021mwi,
  author = {Lakel, Amar},
  title = {My web intelligence : un outil pour l'analyse du web et des réseaux},
  journal = {I2D - Information, données \& documents},
  year = {2021},
  volume = {1},
  number = {1},
  pages = {96--103},
  doi = {10.3917/i2d.211.0096}
}

Related Resources

Institutional Context

Developed at MICA Laboratory (Mediation, Information, Communication, Arts), Université Bordeaux Montaigne, as part of the E3D research team (Études Digitales: du Dispositif au Document).

Migration Notice

This repository (mwi) is the active flagship. The legacy JavaScript repository (MyWebIntelligence) is now archived.

What's New in 1.0.0

  • Complete rewrite in Python with a modular architecture
  • Paragraph-level embedding pipeline with multiple providers
  • NLI-based semantic relation classification
  • Pseudolinks computation and export (paragraph/page/domain levels)
  • Docker Compose deployment with interactive setup
  • Mercury Parser integration for content extraction
  • SEO Rank API integration
  • Comprehensive test suite
  • Bilingual documentation

Contributing

Contributions are welcome. Please open an issue to discuss proposed changes before submitting a pull request.

License

MIT License — see LICENSE


Maintainer: Amar Lakel (@alakel)
Lab: MICA, Université Bordeaux Montaigne
Contact: amar.lakel@u-bordeaux-montaigne.fr