HI4HC and AAAAD: Exploring a Hierarchical Method and Dataset Using Hybrid Intelligence for Remote Sensing Scene Captioning


This is an ongoing project. We will continuously enhance the model's performance and update the dataset. Click the Star in the top right corner to follow us and stay updated with the latest research outcomes!

🔥 Updates

  • [2025.03.05] The key value in the AAAAD has been modified to "element" to maintain consistency with the paper.
  • [2024.12.10] We have released AAAAD. Feel free to use it! See "How to use AAAAD?" below.

⭐ Visualization

  • Our HI4HC-WebUI for automatically filtering and supplementing geographical element labels for GUT, CUT, and CDT:

  • HI4HC_demonstration_video.mp4

    Please wear headphones for a better understanding of HI4HC-WebUI.

Note: The complete code will be released after the paper is accepted.

Table of contents

  1. Overall Strategy for Hierarchical Captioning for Remote Sensing Scenes
  2. HI4HC: Hybrid Intelligence for Remote Sensing Scene Hierarchical Captioning
  3. AAAAD: A Hierarchical Caption Dataset for Remote Sensing Scene Based on Hybrid Intelligence
  4. Paper
  5. Acknowledgement
  6. License

1. Overall Strategy for Hierarchical Captioning for Remote Sensing Scenes

2. HI4HC: Hybrid Intelligence for Remote Sensing Scene Hierarchical Captioning

  • Global and category knowledge graphs used for automatic cleansing and supplementing geographical element labels.

3. AAAAD: A Hierarchical Caption Dataset for Remote Sensing Scene Based on Hybrid Intelligence

How to use AAAAD?

The AAAAD dataset consists of two parts: a remote sensing imagery dataset and a hierarchical caption dataset. The imagery is derived from the AID dataset, while the hierarchical caption dataset contains three levels of captions for each image: geographical element captions, spatial relation captions, and scene-level captions.

  1. Download the remote sensing imagery dataset from AAAAD: The remote sensing imagery in AAAAD is sourced from the AID dataset. You can either download the AID dataset and center-crop the images to 512x512 resolution, or directly download our preprocessed version:

  2. Download the hierarchical caption dataset from AAAAD:

    • The Dataset_AAAAD_Hierarchical_Caption.json file in this repository contains the hierarchical caption dataset of AAAAD, structured as follows:
{
    "dataset": "AAAAD",
    "category": 
    {
        "church": 
        [
            {
                "image_id": 1,
                "file_name": "church_1.png",
                "split": "test",
                "hierarchical_caption": 
                {
                    "element": "building, city, cityscape, skyscraper, scenery, architecture, library, tower, street, real world location, town, outdoors, road, house, from above, car, tree, water, fountain",
                    "relation": "this aerial photo depicts a city area with multiple buildings and structures. the most striking feature is a large elliptical building with a blue-green roof, possibly a stadium or auditorium. surrounding this central structure are various other buildings of different shapes and sizes, including a semi-circular design adjacent to the elliptical structure.",
                    "scene": "commercial"
                }
            }
        ]
    }
}
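As a rough illustration of how the structure above might be consumed, the following Python sketch flattens the nested `category` mapping into one record per image and splits the comma-separated `element` string into a list. The helper names `iter_records` and `load_records` are hypothetical and not part of this repository, and the in-memory `SAMPLE` abbreviates the caption texts so the sketch runs without the real file.

```python
import json
from typing import Iterator

# Assumption: a small in-memory sample mirroring the structure shown above.
# Caption texts are shortened here; the real file holds the full captions.
SAMPLE = {
    "dataset": "AAAAD",
    "category": {
        "church": [
            {
                "image_id": 1,
                "file_name": "church_1.png",
                "split": "test",
                "hierarchical_caption": {
                    "element": "building, city, tower, street, road, tree",
                    "relation": "this aerial photo depicts a city area "
                                "with multiple buildings and structures.",
                    "scene": "commercial",
                },
            }
        ]
    },
}


def iter_records(data: dict) -> Iterator[dict]:
    """Yield one flat dict per image, with its category name attached."""
    for category, images in data["category"].items():
        for image in images:
            caption = image["hierarchical_caption"]
            yield {
                "category": category,
                "image_id": image["image_id"],
                "file_name": image["file_name"],
                "split": image["split"],
                # Element labels are stored as one comma-separated string.
                "elements": [
                    label.strip()
                    for label in caption["element"].split(",")
                    if label.strip()
                ],
                "relation": caption["relation"],
                "scene": caption["scene"],
            }


def load_records(path: str) -> list[dict]:
    """Read the caption JSON from a user-supplied path and flatten it."""
    with open(path, encoding="utf-8") as f:
        return list(iter_records(json.load(f)))


if __name__ == "__main__":
    records = list(iter_records(SAMPLE))
    test_split = [r for r in records if r["split"] == "test"]
    print(test_split[0]["file_name"], test_split[0]["elements"])
```

With the real file, `load_records("Dataset_AAAAD_Hierarchical_Caption.json")` would be the entry point, assuming the file sits in the working directory and every entry carries the same keys as the example above.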
  • Qualitative comparison between existing RSI caption datasets and AAAAD (ours).

  • Quantitative statistical results of AAAAD.

  • Quantitative comparison between AAAAD and existing remote sensing caption datasets.

  • Comparison of AAAAD and existing remote sensing caption datasets across different dimensions (element, attributes, spatial relations).

  • Statistical analysis of AAAAD and existing remote sensing caption datasets.

  • Comparison of semantic similarity between AAAAD and existing remote sensing caption datasets.

  • Direct comparison of remote sensing scenes generated by different algorithms using traditional single-level captions and hierarchical captions as prompts.

4. Paper

HI4HC and AAAAD: Exploring a Hierarchical Method and Dataset Using Hybrid Intelligence for Remote Sensing Scene Captioning

Please cite the following paper if you find it useful for your research:

@article{ren2024hi4hc,
title = {HI4HC and AAAAD: Exploring a hierarchical method and dataset using hybrid intelligence for remote sensing scene captioning},
journal = {International Journal of Applied Earth Observation and Geoinformation},
author = {Jiaxin Ren and Wanzeng Liu and Jun Chen and Shunxi Yin},
volume = {139},
pages = {104491},
year = {2025},
issn = {1569-8432},
doi = {10.1016/j.jag.2025.104491},
url = {https://www.sciencedirect.com/science/article/pii/S1569843225001384}
}

5. Acknowledgement

  • Kohya's GUI. This repository primarily provides a Gradio GUI for Kohya's Stable Diffusion trainers. Moreover, we drew inspiration from its annotator's WebUI to implement automatic filtering of geographical element labels for GUT, CUT, and CDT.
  • Deep Danbooru. A deep learning model trained on the Danbooru dataset using the ResNet architecture, specifically designed for recognizing and tagging content and attributes in anime-style images.
  • WD14. An advanced version of Deep Danbooru, combining a larger dataset and deeper network structure to support a broader range of tags and improve tag prediction accuracy.
  • BLIP-2. A model that unifies the framework for visual-language pre-training and fine-tuning, enabling multimodal learning and cross-modal understanding.

6. License

This repo is distributed under the MIT License. The code can be used for academic purposes only.
