
Retrieving Satellite Imagery

This section describes the process of retrieving and preparing satellite imagery for use in machine learning models. Topics include:

  1. Data sources: Overview of publicly available satellite imagery datasets.
  2. Image retrieval: How to retrieve the images.
  3. Preprocessing: Steps to clean and prepare the imagery for analysis.

Data sources

Allowed for use in commercial applications:

NOT allowed for commercial purposes:

  • MapTiler tiles
  • Google Maps data
  • OpenStreetMap (only if the resulting dataset is made public)

Image retrieval

We have a script that retrieves aerial imagery from the allowed providers above. To use it, all site boundary geometries must be in a single folder.

For the steps below, we'll use ./datasets/test_dataset/ as the base folder.

  1. Create the dataset folder:
mkdir datasets/test_dataset
mkdir datasets/test_dataset/site_boundaries
  2. Put all GeoJSON boundary files inside the datasets/test_dataset/site_boundaries folder.
  3. Run the image retrieval script:
python scripts/image_retrieval.py --provider naip --zoom 18 ./datasets/test_dataset/site_boundaries ./datasets/test_dataset/naip_imagery --workers 32

Check the image_retrieval.py documentation for more details.
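
Before running the retrieval, it can help to confirm that every boundary file parses and contains a polygon. The following is a minimal sketch, not part of image_retrieval.py: the function names are hypothetical, and it assumes the boundary files use the .geojson extension and that site boundaries are Polygon or MultiPolygon geometries.

```python
import json
from pathlib import Path

# Assumption: site boundaries are expected to be polygonal geometries.
POLYGON_TYPES = {"Polygon", "MultiPolygon"}


def geometries_of(geojson: dict) -> list:
    """Return the geometry objects found in a GeoJSON document."""
    kind = geojson.get("type")
    if kind == "FeatureCollection":
        return [feature.get("geometry") for feature in geojson.get("features", [])]
    if kind == "Feature":
        return [geojson.get("geometry")]
    return [geojson]  # treat anything else as a bare geometry object


def find_invalid_boundaries(folder: str) -> list:
    """List (file name, reason) pairs for files that look unusable."""
    problems = []
    # Assumption: boundary files use the .geojson extension.
    for path in sorted(Path(folder).glob("*.geojson")):
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError) as exc:
            problems.append((path.name, f"unreadable: {exc}"))
            continue
        if not isinstance(data, dict):
            problems.append((path.name, "top-level value is not a JSON object"))
            continue
        types = {g.get("type") for g in geometries_of(data) if isinstance(g, dict)}
        if not types & POLYGON_TYPES:
            problems.append((path.name, "no Polygon or MultiPolygon geometry"))
    return problems
```

Calling find_invalid_boundaries("datasets/test_dataset/site_boundaries") returns an empty list when every file passes both checks.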

Preprocessing

After the previous step completes, you'll have two folders:

  • One containing all GeoJSON with site boundaries.
  • Another containing georeferenced TIFF images (GeoTIFFs) of each site plus a surrounding buffer.

This is not the format our training pipeline expects. Image models work on pixel arrays and pixel-space annotations, not on georeferenced TIFFs or GeoJSON geometries, so the data must be converted to a format the pipeline can read.
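
The core of such a conversion is mapping boundary coordinates from world space into pixel space. The sketch below illustrates the idea for a north-up raster; it is not the conversion script's actual implementation. The function names are hypothetical, and it assumes the boundary is already in the same coordinate reference system as the image (reprojection is not shown).

```python
def world_to_pixel(x, y, origin_x, origin_y, pixel_width, pixel_height):
    """Map a world coordinate to (column, row) in a north-up raster.

    origin_x, origin_y: world coordinates of the image's top-left corner.
    pixel_width, pixel_height: size of one pixel in world units (both positive).
    """
    col = (x - origin_x) / pixel_width
    # Rows grow downwards while world y grows upwards, hence the flipped sign.
    row = (origin_y - y) / pixel_height
    return col, row


def ring_to_pixels(ring, origin_x, origin_y, pixel_width, pixel_height):
    """Convert a polygon ring of (x, y) world coordinates to pixel coordinates."""
    return [
        world_to_pixel(x, y, origin_x, origin_y, pixel_width, pixel_height)
        for x, y in ring
    ]
```

For example, with the top-left corner at (10.0, 20.0) and 0.5-unit pixels, the world point (10.5, 19.0) lands at column 1.0, row 2.0.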

The COCO dataset format is one of several industry-standard formats for training image recognition models.
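
For orientation, a COCO annotation file is a JSON object with "images", "annotations", and "categories" lists. The sketch below builds a minimal example for one image with one polygon. The file name, the category name "site", and the helper function are hypothetical; the exact fields our conversion script writes may differ.

```python
import json


def polygon_to_coco_annotation(annotation_id, image_id, category_id, pixel_ring):
    """Build a COCO annotation dict from a list of (x, y) pixel vertices."""
    xs = [x for x, _ in pixel_ring]
    ys = [y for _, y in pixel_ring]
    x_min, y_min = min(xs), min(ys)
    # Polygon area via the shoelace formula.
    pairs = zip(pixel_ring, pixel_ring[1:] + pixel_ring[:1])
    area = 0.5 * abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs))
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO stores each polygon as a flat [x1, y1, x2, y2, ...] list.
        "segmentation": [[coord for point in pixel_ring for coord in point]],
        # COCO bounding boxes are [x_min, y_min, width, height].
        "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        "area": area,
        "iscrowd": 0,
    }


coco = {
    # Hypothetical file name and image size, for illustration only.
    "images": [{"id": 1, "file_name": "site_0001.jpg", "width": 512, "height": 512}],
    "categories": [{"id": 1, "name": "site"}],
    "annotations": [
        polygon_to_coco_annotation(
            1, 1, 1, [(100, 100), (300, 100), (300, 250), (100, 250)]
        )
    ],
}

print(json.dumps(coco, indent=2))
```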

To convert our geospatial data into a COCO dataset (a JSON annotation file plus JPEG images), follow these instructions:

  1. Create a temporary folder and move all GeoJSON and GeoTIFF files into it.
  2. Run the convert_geotiff_and_polygon_to_coco.py script:
python scripts/convert_geotiff_and_polygon_to_coco.py --input_dir datasets/test_dataset/MY_TMP_DIR --output_dir datasets/test_dataset/coco_dataset
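
Step 1 above can be scripted. This is a minimal sketch under a few assumptions: the helper name is hypothetical, boundary files end in .geojson, imagery files end in .tif or .tiff, and files are copied rather than moved so the original folders stay intact.

```python
import shutil
from pathlib import Path


def stage_inputs(boundaries_dir, imagery_dir, staging_dir):
    """Copy GeoJSON and GeoTIFF files into one staging folder.

    Returns the names of the copied files.
    """
    staging = Path(staging_dir)
    staging.mkdir(parents=True, exist_ok=True)

    # Assumption: these extensions cover the files produced by earlier steps.
    sources = (
        (boundaries_dir, ("*.geojson",)),
        (imagery_dir, ("*.tif", "*.tiff")),
    )
    copied = []
    for source_dir, patterns in sources:
        for pattern in patterns:
            for path in sorted(Path(source_dir).glob(pattern)):
                # copy2 keeps file metadata; the originals are left in place.
                shutil.copy2(path, staging / path.name)
                copied.append(path.name)
    return copied
```

With the folders used in this section, the call would be stage_inputs("datasets/test_dataset/site_boundaries", "datasets/test_dataset/naip_imagery", "datasets/test_dataset/MY_TMP_DIR").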

The output should be a set of JPEG files plus a JSON file containing all the site annotations. These can be used to train the networks directly, or imported into Roboflow to review and improve the annotations.
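
A quick consistency check on the output can catch problems before training or upload. The sketch below uses a hypothetical helper and assumes the JSON follows the standard COCO layout ("images" and "annotations" lists). The annotation file name is not specified here, so the path is passed in explicitly.

```python
import json
from pathlib import Path


def check_coco_dataset(annotation_file, image_dir):
    """Report missing image files and annotations that reference unknown images."""
    coco = json.loads(Path(annotation_file).read_text(encoding="utf-8"))
    images = coco.get("images", [])
    annotations = coco.get("annotations", [])
    image_ids = {image["id"] for image in images}

    return {
        "images": len(images),
        "annotations": len(annotations),
        # Images listed in the JSON but absent from the image folder.
        "missing_files": [
            image["file_name"]
            for image in images
            if not (Path(image_dir) / image["file_name"]).is_file()
        ],
        # Annotations whose image_id matches no entry in "images".
        "orphan_annotations": [
            annotation["id"]
            for annotation in annotations
            if annotation["image_id"] not in image_ids
        ],
    }
```

Both "missing_files" and "orphan_annotations" should be empty for a consistent dataset.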