HI4HC and AAAAD: Exploring a Hierarchical Method and Dataset Using Hybrid Intelligence for Remote Sensing Scene Captioning

This is an ongoing project. We will continuously enhance the model's performance and update the dataset. Click the Star in the top right corner to follow us and stay updated with the latest research outcomes!

🔥 Updates

[2025.03.05] The key value in the AAAAD has been modified to "element" to maintain consistency with the paper.
[2024.12.10] We have released AAAAD. Feel free to use it! See How to use AAAAD?

⭐ Visualization

Our HI4HC-WebUI for automatically filtering and supplementing geographical element labels for GUT, CUT, and CDT:
HI4HC_demonstration_video.mp4

Please wear headphones for a better understanding of HI4HC-WebUI.

Note: The complete code will be released after the paper is accepted.

Table of contents

Overall Strategy for Hierarchical Captioning for Remote Sensing Scenes
HI4HC: Hybrid Intelligence for Remote Sensing Scene Hierarchical Captioning
AAAAD: A Hierarchical Caption Dataset for Remote Sensing Scene Based on Hybrid Intelligence
Paper
Acknowledgement
License

1.Overall Strategy for Hierarchical Captioning for Remote Sensing Scenes

2.HI4HC: Hybrid Intelligence for Remote Sensing Scene Hierarchical Captioning

Global and category knowledge graphs used for automatic cleansing and supplementing geographical element labels.

3.AAAAD: A Hierarchical Caption Dataset for Remote Sensing Scene Based on Hybrid Intelligence

How to use AAAAD?

The AAAAD dataset consists of two parts: a remote sensing imagery dataset and a hierarchical caption dataset. The remote sensing imagery dataset is derived from the AID dataset, while the hierarchical caption dataset includes geographical element captions, spatial relation captions, and scene-level captions.

Download the remote sensing imagery dataset from AAAAD: The remote sensing imagery in AAAAD is sourced from the AID dataset. You can either download the AID dataset and center-crop it to 512x512 resolution yourself, or directly download our preprocessed version:
- Dataset_AAAAD_Imagery: Hugging Face or Baidu NetDisk (code: cjql)
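If you preprocess AID yourself, the center crop comes down to computing a crop box. The sketch below is only an illustration, not the authors' preprocessing script: the function name `center_crop_box` is hypothetical, and it assumes the original AID tiles are 600x600 pixels. The returned tuple follows the (left, upper, right, lower) convention that Pillow's `Image.crop` accepts.

```python
from typing import Tuple


def center_crop_box(width: int, height: int, size: int = 512) -> Tuple[int, int, int, int]:
    """Return the (left, upper, right, lower) box of a centered size x size crop.

    Hypothetical helper for illustration; the target size of 512 comes from
    the README, everything else is an assumption.
    """
    if size > width or size > height:
        raise ValueError("crop size must not exceed the image dimensions")
    left = (width - size) // 2
    upper = (height - size) // 2
    return (left, upper, left + size, upper + size)


# Assuming a 600x600 AID tile, the crop trims 44 pixels from each side.
box = center_crop_box(600, 600)
print(box)
```

With Pillow installed, the box could be passed to `Image.open(path).crop(box)`. The downloadable preprocessed version remains the safer choice if exact reproducibility matters.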
Download the hierarchical caption dataset from AAAAD:
- The Dataset_AAAAD_Hierarchical_Caption.json file in this repository contains the hierarchical caption dataset of AAAAD, structured as follows:

{
    "dataset": "AAAAD",
    "category": 
    {
        "church": 
        [
            {
                "image_id": 1,
                "file_name": "church_1.png",
                "split": "test",
                "hierarchical_caption": 
                {
                    "element": "building, city, cityscape, skyscraper, scenery, architecture, library, tower, street, real world location, town, outdoors, road, house, from above, car, tree, water, fountain",
                    "relation": "this aerial photo depicts a city area with multiple buildings and structures. the most striking feature is a large elliptical building with a blue-green roof, possibly a stadium or auditorium. surrounding this central structure are various other buildings of different shapes and sizes, including a semi-circular design adjacent to the elliptical structure.",
                    "scene": "commercial"
                }
            }
        ]
    }
}
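The structure above can be read with the Python standard library alone. The following is a minimal sketch, assuming the file keeps exactly the layout shown; the helper names `load_captions` and `iter_records` are hypothetical and not part of this repository. It flattens the per-category lists into one stream of records, optionally filtered by `split`, and turns the comma-separated `element` string into a list of labels.

```python
import json
from typing import Iterator, Optional


def load_captions(path: str) -> dict:
    """Read the hierarchical caption JSON file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_records(data: dict, split: Optional[str] = None) -> Iterator[dict]:
    """Yield one flat record per image, optionally restricted to one split."""
    for category, records in data["category"].items():
        for rec in records:
            if split is not None and rec["split"] != split:
                continue
            caption = rec["hierarchical_caption"]
            yield {
                "category": category,
                "file_name": rec["file_name"],
                # "element" is a comma-separated string of labels
                "elements": [t.strip() for t in caption["element"].split(",") if t.strip()],
                "relation": caption["relation"],
                "scene": caption["scene"],
            }


# Small in-memory sample mirroring the layout shown above (captions shortened).
sample = {
    "dataset": "AAAAD",
    "category": {
        "church": [
            {
                "image_id": 1,
                "file_name": "church_1.png",
                "split": "test",
                "hierarchical_caption": {
                    "element": "building, city, tree",
                    "relation": "this aerial photo depicts a city area.",
                    "scene": "commercial",
                },
            }
        ]
    },
}

for record in iter_records(sample, split="test"):
    print(record["file_name"], record["elements"], record["scene"])
```

For the real file, replace `sample` with `load_captions("Dataset_AAAAD_Hierarchical_Caption.json")`. Copies of the dataset downloaded before the 2025.03.05 update may use a different key in place of "element", so re-download the file if a `KeyError` appears.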

Qualitative comparison between existing RSI caption datasets and AAAAD (ours).
Quantitative statistical results of AAAAD.
Quantitative comparison between AAAAD and existing remote sensing caption datasets.
Comparison of AAAAD and existing remote sensing caption datasets across different dimensions (element, attributes, spatial relations).
Statistical analysis of AAAAD and existing remote sensing caption datasets.
Comparison of semantic similarity between AAAAD and existing remote sensing caption datasets.
Direct comparison of remote sensing scenes generated by different algorithms using traditional single-level captions and hierarchical captions as prompts.

4.Paper

HI4HC and AAAAD: Exploring a Hierarchical Method and Dataset Using Hybrid Intelligence for Remote Sensing Scene Captioning

Please cite the following paper if you find it useful for your research:

@article{ren2024hi4hc,
title = {HI4HC and AAAAD: Exploring a hierarchical method and dataset using hybrid intelligence for remote sensing scene captioning},
journal = {International Journal of Applied Earth Observation and Geoinformation},
author = {Jiaxin Ren and Wanzeng Liu and Jun Chen and Shunxi Yin},
volume = {139},
pages = {104491},
year = {2025},
issn = {1569-8432},
doi = {10.1016/j.jag.2025.104491},
url = {https://www.sciencedirect.com/science/article/pii/S1569843225001384}
}

5.Acknowledgement

Kohya's GUI. A repository that primarily provides a Gradio GUI for Kohya's Stable Diffusion trainers. We drew inspiration from its annotator's WebUI to implement automatic filtering of geographical element labels for GUT, CUT, and CDT.
Deep Danbooru. A deep learning model trained on the Danbooru dataset using the ResNet architecture, specifically designed for recognizing and tagging content and attributes in anime-style images.
WD14. An advanced version of Deep Danbooru, combining a larger dataset and deeper network structure to support a broader range of tags and improve tag prediction accuracy.
BLIP-2. A model that unifies the framework for visual-language pre-training and fine-tuning, enabling multimodal learning and cross-modal understanding.

6.License

This repo is distributed under the MIT License. The code can be used for academic purposes only.
