Best practices for combining multi-hazard damage imagery datasets to train deep learning models for building damage detection

January 6, 2021 | by Earth Numerics Team

Importance of Rapid Damage Assessment

Accurate and timely assessment of building damage after natural disasters is essential for effective resource allocation and response. Traditional manual interpretation of satellite and aerial imagery is slow, labor-intensive, and costly, often involving large teams or crowdsourced labor to identify damaged structures.

The xBD Dataset

To address these challenges, the xBD dataset was created in partnership with the Defense Innovation Unit (DIU) and humanitarian organizations. It contains pre- and post-disaster imagery from 19 natural disaster events (earthquakes, floods, hurricanes, wildfires, tsunamis, etc.). Each building is labeled with one of four damage categories:

- No damage
- Minor damage
- Major damage
- Destroyed

xBD is one of the largest publicly available multi-disaster damage datasets, but it is highly imbalanced, with “no-damage” labels dominating.
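Before choosing a balancing strategy, it helps to measure how skewed a label set actually is. The sketch below is a minimal, hypothetical helper. It assumes each building carries exactly one label string, using xBD-style names (`no-damage`, `minor-damage`, `major-damage`, `destroyed`). It reports per-class counts, per-class fractions, and an imbalance ratio, defined as the largest class count divided by the smallest non-empty class count.

```python
from collections import Counter

# Assumed xBD-style label strings (hypothetical naming for this sketch).
DAMAGE_CLASSES = ["no-damage", "minor-damage", "major-damage", "destroyed"]


def class_distribution(labels):
    """Summarize how imbalanced a list of per-building damage labels is.

    Returns (counts, fractions, imbalance_ratio), where imbalance_ratio is the
    largest class count divided by the smallest non-empty class count.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = len(labels)
    # Fractions are reported for all four classes; absent classes get 0.0.
    fractions = {c: counts.get(c, 0) / total for c in DAMAGE_CLASSES}
    imbalance_ratio = max(counts.values()) / min(counts.values())
    return dict(counts), fractions, imbalance_ratio


# Toy example: eight undamaged buildings, one with minor damage, one destroyed.
counts, fractions, ratio = class_distribution(
    ["no-damage"] * 8 + ["minor-damage", "destroyed"]
)
print(counts, fractions, ratio)
```

Because absent classes still appear in `fractions` with a value of 0.0, a class that is missing entirely from an event's labels is easy to spot.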

Techniques for Handling Imbalanced Data

- Data-level techniques: oversampling, undersampling, and synthetic sample generation (e.g., SMOTE).
- Algorithm-level methods: adjusting class weights, modifying loss functions, or altering network architecture.
- Hybrid strategies: combining data and algorithm enhancements.
- Transfer learning: leveraging pre-trained models such as ImageNet-trained CNNs.
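The two simplest data-level techniques are random oversampling and random undersampling. Here is a minimal sketch of both in plain Python. It assumes each training example is a hypothetical `(sample_id, label)` pair. SMOTE is not shown: it creates synthetic examples by interpolating between the feature vectors of neighboring minority samples.

```python
import random
from collections import defaultdict


def _group_by_label(samples):
    """Bucket (sample_id, label) pairs by their label."""
    groups = defaultdict(list)
    for sample in samples:
        groups[sample[1]].append(sample)
    return groups


def random_oversample(samples, seed=0):
    """Duplicate randomly chosen minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Draw with replacement to fill the shortfall (k=0 for the largest class).
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced


def random_undersample(samples, seed=0):
    """Randomly discard samples from larger classes until every class
    has as many samples as the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_label(samples)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Draw without replacement, keeping `target` samples per class.
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


data = [(i, "no-damage") for i in range(6)] + [(6, "destroyed"), (7, "destroyed")]
print(len(random_oversample(data)), len(random_undersample(data)))
```

The two approaches have opposite trade-offs. Oversampling keeps every image but repeats minority examples, which can encourage memorization of those examples. Undersampling avoids repetition but discards majority-class images.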

Results of Balancing and Pooling Datasets

Models trained on balanced datasets performed significantly better than those trained on imbalanced ones. Key results include:
- Wind-related disaster datasets performed best, owing to their large sample sizes and balanced damage distributions.
- Earthquake data performed worst, because most images contained only “no-damage” buildings.
- Flooding and tsunami datasets improved when pooled with other datasets, since flood and tsunami damage is harder to identify visually.
- Wildfire and volcanic events performed well individually and when pooled, as severe damage is visually obvious in imagery.
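Mechanically, pooling is simple. The real judgment lies in choosing which events to combine. The sketch below pools events by hazard type and tags every record with its source event, so per-event performance can still be checked after training on the pooled set. The event names, the `event_hazard` mapping, and the `(sample_id, label)` record format are all hypothetical.

```python
import random


def pool_by_hazard(event_datasets, event_hazard, hazards, seed=0):
    """Combine the records of every event whose hazard type is in `hazards`.

    event_datasets: dict mapping event name -> list of (sample_id, label)
    event_hazard:   dict mapping event name -> hazard type, e.g. "flood"
    Returns a shuffled list of (sample_id, label, event) triples.
    """
    rng = random.Random(seed)
    pooled = [
        (sample_id, label, event)
        for event, records in sorted(event_datasets.items())
        if event_hazard[event] in hazards
        for sample_id, label in records
    ]
    rng.shuffle(pooled)
    return pooled


# Hypothetical events: pool the flood and tsunami events, leave the wildfire out.
datasets = {
    "flood_event_1": [(1, "no-damage"), (2, "major-damage")],
    "tsunami_event_1": [(3, "destroyed")],
    "wildfire_event_1": [(4, "destroyed")],
}
hazard_of = {
    "flood_event_1": "flood",
    "tsunami_event_1": "tsunami",
    "wildfire_event_1": "wildfire",
}
water_pool = pool_by_hazard(datasets, hazard_of, {"flood", "tsunami"})
print(len(water_pool))
```

Whether a given pool is beneficial has to be decided empirically. One way is to compare a model trained on the pool against one trained on the single event, evaluating both on the same held-out images from that event.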

Challenges of Class Imbalance

Deep learning models trained on imbalanced datasets tend to overfit to the majority class and perform poorly on the rarer damage classes. Gathering balanced training data is difficult because of high costs, limited satellite coverage, and the need for expert labeling or site visits. As a result, practitioners commonly pool data across events or train on imbalanced datasets, and both approaches can limit model performance.

Model Architecture and Training

The study modifies the xBD baseline model by enhancing a ResNet50 architecture with:
- Additional convolution, max-pooling, and batch normalization layers
- A dropout layer to reduce overfitting
- More training epochs, with early stopping
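Early stopping lets a model train for many epochs while halting once validation performance stops improving. The post does not say which metric or patience value the study used. The framework-agnostic sketch below assumes validation loss as the metric and a hypothetical patience of three epochs.

```python
class EarlyStopping:
    """Signal a stop once validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = None
        self.epochs_without_improvement = 0

    def step(self, epoch, val_loss):
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            # New best: remember it (a real loop would also checkpoint weights here).
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.epochs_without_improvement = 0
        else:
            self.epochs_without_improvement += 1
        return self.epochs_without_improvement >= self.patience


# Toy validation losses: improvement stops after epoch 2.
stopper = EarlyStopping(patience=3)
for epoch, val_loss in enumerate([0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]):
    if stopper.step(epoch, val_loss):
        print(f"stop at epoch {epoch}, best epoch {stopper.best_epoch}")  # prints: stop at epoch 5, best epoch 2
        break
```

In Keras, the built-in `tf.keras.callbacks.EarlyStopping` callback provides the same behavior. Configure it with `monitor="val_loss"`, a `patience` value, and `restore_best_weights=True` to roll back to the best epoch's weights.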

Individual building images were clipped from the satellite imagery using bounding boxes expanded to capture the surrounding context.
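The clipping step can be sketched in two parts. First, expand each building's bounding box by a fraction of its own width and height. Then clamp the box to the image edges. The post does not give the expansion amount, so the 25% margin, the `(x_min, y_min, x_max, y_max)` pixel convention, and the function names are all assumptions for illustration.

```python
import math


def expand_and_clip_bbox(bbox, image_width, image_height, margin=0.25):
    """Grow a pixel bounding box by `margin` of its size on every side, then clamp it.

    bbox is (x_min, y_min, x_max, y_max) with x to the right, y downward,
    and max coordinates exclusive (slice-style).
    """
    x_min, y_min, x_max, y_max = bbox
    pad_x = (x_max - x_min) * margin
    pad_y = (y_max - y_min) * margin
    return (
        max(0, math.floor(x_min - pad_x)),
        max(0, math.floor(y_min - pad_y)),
        min(image_width, math.ceil(x_max + pad_x)),
        min(image_height, math.ceil(y_max + pad_y)),
    )


def crop(image_rows, bbox):
    """Cut a box out of an image stored as a list of pixel rows."""
    x_min, y_min, x_max, y_max = bbox
    return [row[x_min:x_max] for row in image_rows[y_min:y_max]]


# A 20-wide, 40-tall building box in a 100x100 image grows to 30 wide, 60 tall.
print(expand_and_clip_bbox((10, 20, 30, 60), 100, 100))
```

With a NumPy image array, the crop is simply `image[y_min:y_max, x_min:x_max]`. Because of the clamping, buildings near a tile edge receive less context on that side.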

Conclusions

Deep learning can effectively classify building damage from satellite imagery if:

- Datasets are balanced to reduce overfitting
- Models use techniques such as dropout, batch normalization, and class weighting
- Training length is controlled with early stopping
- Disaster datasets are pooled only when pooling improves performance
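Class weighting can be made concrete with the common "balanced" heuristic, weight_c = n_samples / (n_classes × n_c). This is the formula scikit-learn uses for `class_weight="balanced"`. The post does not state which weighting scheme the study used, so treat this as one plausible choice. The label strings are hypothetical.

```python
import math
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * count_of_class).

    Each class ends up contributing the same total weight to the loss.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


def weighted_cross_entropy(probs, true_label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class.

    probs maps each class label to the model's predicted probability.
    """
    return -weights[true_label] * math.log(probs[true_label])


labels = ["no-damage"] * 6 + ["destroyed"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

Keras' `Model.fit` accepts such a mapping through its `class_weight` argument, keyed by integer class index rather than by label string.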

Improving automated damage classification enables faster response, better planning, and stronger resilience in the face of increasing climate-driven disasters.