Retrieving Satellite Imagery
This section describes the process of retrieving and preparing satellite imagery for use in machine learning models. Topics include:
- Data sources: Overview of publicly available satellite imagery datasets.
- Image retrieval: How to retrieve the images.
- Preprocessing: Steps to clean and prepare the imagery for analysis.
Data sources
Allowed for use in commercial applications:
- USGS NAIP:
  - Raw files: https://nrcs.app.box.com/v/naip/folder/294895386975
  - Tile server: https://gis.apfo.usda.gov/arcgis/rest/services/NAIP/USDA_CONUS_PRIME/
  - Max zoom level: 18
- ESRI (need to confirm license!):
  - Tile server: https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery
  - Max zoom level: 19
NOT allowed for commercial purposes:
- MapTiler tiles
- Google Maps data
- OpenStreetMap (usable only if the resulting dataset is made public)
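The zoom levels above follow the usual Web Mercator tiling scheme, so it can help to see what a zoom level means in practice. The sketch below converts a longitude/latitude to tile indices, estimates the ground resolution, and builds a tile URL. The `/MapServer/tile/{z}/{y}/{x}` layout is an assumption based on how cached ArcGIS map services are commonly exposed; it may not apply to every server listed here, and the function names are illustrative, not part of our scripts.

```python
import math

# Base URL taken from the list above; the path suffix added in tile_url()
# is an assumption and should be checked against the service's REST page.
ESRI_WORLD_IMAGERY = "https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery"

EARTH_RADIUS_M = 6378137.0  # WGS84 semi-major axis used by Web Mercator


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> tuple[int, int]:
    """Convert a WGS84 lon/lat to Web Mercator XYZ tile indices (x, y)."""
    n = 2 ** zoom
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge still land on a valid tile.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def ground_resolution(lat: float, zoom: int, tile_size: int = 256) -> float:
    """Approximate meters per pixel at a given latitude and zoom level."""
    circumference = 2.0 * math.pi * EARTH_RADIUS_M
    return circumference * math.cos(math.radians(lat)) / (tile_size * 2 ** zoom)


def tile_url(base: str, zoom: int, x: int, y: int) -> str:
    """Build a tile URL; ArcGIS caches order the path as level/row/column."""
    return f"{base.rstrip('/')}/MapServer/tile/{zoom}/{y}/{x}"


if __name__ == "__main__":
    x, y = lonlat_to_tile(-93.62, 41.59, 18)
    print(tile_url(ESRI_WORLD_IMAGERY, 18, x, y))
    print(f"{ground_resolution(41.59, 18):.2f} m/pixel")
```

At the equator, zoom 18 works out to roughly 0.6 m per pixel with 256-pixel tiles, and each extra zoom level halves that figure.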
Image retrieval
We have a script that retrieves aerial imagery from the allowed providers above. To use it, you'll need all the site boundary geometries in a single folder.
For the steps below, we'll use ./datasets/test_dataset/ as the base folder.
- Create the dataset folder:
mkdir datasets/test_dataset
mkdir datasets/test_dataset/site_boundaries
- Put all GeoJSON boundary files inside the datasets/test_dataset/site_boundaries folder.
- Run the image retrieval script:
python scripts/image_retrieval.py --provider naip --zoom 18 ./datasets/test_dataset/site_boundaries ./datasets/test_dataset/naip_imagery --workers 32
Check the image_retrieval.py documentation for more details.
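Before launching a long retrieval run, it can be worth checking that the boundary folder contains what you expect. The following is a minimal sketch, not part of image_retrieval.py: it assumes the boundary files use the .geojson extension and hold Polygon or MultiPolygon geometries, which may differ from what the script actually accepts. The function names are hypothetical.

```python
import json
from pathlib import Path

# Assumption: site boundaries are polygons; adjust if the script accepts more.
POLYGON_TYPES = {"Polygon", "MultiPolygon"}


def geometries_in(doc: dict) -> list:
    """Collect geometry objects from a FeatureCollection, Feature, or bare geometry."""
    kind = doc.get("type")
    if kind == "FeatureCollection":
        return [feature.get("geometry") or {} for feature in doc.get("features", [])]
    if kind == "Feature":
        return [doc.get("geometry") or {}]
    return [doc]


def check_boundaries(folder) -> tuple:
    """Split the *.geojson files in `folder` into (usable, problematic) lists."""
    usable, problematic = [], []
    for path in sorted(Path(folder).glob("*.geojson")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
            geoms = geometries_in(doc)
        except (json.JSONDecodeError, AttributeError):
            # Not valid JSON, or the top-level value is not a JSON object.
            problematic.append(path)
            continue
        if geoms and all(g.get("type") in POLYGON_TYPES for g in geoms):
            usable.append(path)
        else:
            problematic.append(path)
    return usable, problematic


if __name__ == "__main__":
    ok, bad = check_boundaries("datasets/test_dataset/site_boundaries")
    print(f"{len(ok)} usable boundary files, {len(bad)} with problems")
    for path in bad:
        print("  check:", path)
```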
Preprocessing
After the previous step completes, you'll have two folders:
- One containing all the GeoJSON files with the site boundaries.
- Another containing georeferenced TIFF images (GeoTIFFs) of each site plus some buffer around it.
This is not the format our training pipeline expects. Machine learning models cannot consume GeoTIFF images or GeoJSON files directly, so we need to convert the data to a format the pipeline can read.
The COCO dataset format is one of several industry-standard formats for training image recognition models.
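To make the target format concrete, here is a rough sketch of how one site polygon could become a COCO annotation. It is an illustration only and not the logic of our conversion script: it assumes a north-up raster with no rotation, a polygon already in the raster's coordinate system, and a single hypothetical category named "site". The file name and image size in the usage part are made up.

```python
def geo_to_pixel(point, origin, pixel_size):
    """Map a (x, y) map coordinate to (column, row) pixel coordinates.

    `origin` is the map coordinate of the raster's top-left corner and
    `pixel_size` the (width, height) of one pixel in map units.
    Assumes a north-up raster without rotation.
    """
    x, y = point
    origin_x, origin_y = origin
    pixel_w, pixel_h = pixel_size
    return (x - origin_x) / pixel_w, (origin_y - y) / pixel_h


def polygon_to_annotation(ann_id, image_id, category_id, pixel_polygon):
    """Build one COCO annotation dict from a list of (x, y) pixel vertices."""
    xs = [x for x, _ in pixel_polygon]
    ys = [y for _, y in pixel_polygon]
    # Shoelace formula over consecutive vertex pairs (closing the ring).
    closed = pixel_polygon[1:] + pixel_polygon[:1]
    area = 0.5 * abs(sum(x1 * y2 - x2 * y1
                         for (x1, y1), (x2, y2) in zip(pixel_polygon, closed)))
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO stores each polygon as one flat list [x1, y1, x2, y2, ...].
        "segmentation": [[coord for vertex in pixel_polygon for coord in vertex]],
        # COCO bounding boxes are [x_min, y_min, width, height].
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": area,
        "iscrowd": 0,
    }


if __name__ == "__main__":
    polygon = [(10, 20), (30, 20), (30, 50), (10, 50)]
    coco = {
        "images": [{"id": 1, "file_name": "site_0001.jpg", "width": 512, "height": 512}],
        "categories": [{"id": 1, "name": "site"}],
        "annotations": [polygon_to_annotation(1, 1, 1, polygon)],
    }
    print(coco["annotations"][0]["bbox"])
```

The three top-level lists (images, categories, annotations) are linked only by their ids, which is why a single JSON file can describe every JPEG in the dataset.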
To convert our geospatial data to a COCO dataset (a JSON file plus JPEGs), follow these instructions:
- Create a temporary folder and move all GeoJSON and GeoTIFF files into it.
- Run the convert_geotiff_and_polygon_to_coco.py script:
python scripts/convert_geotiff_and_polygon_to_coco.py --input_dir datasets/test_dataset/MY_TMP_DIR --output_dir datasets/test_dataset/coco_dataset
The output should be JPEG files plus a JSON file with all the site annotations. These can be used to train the networks directly or imported into Roboflow to review and improve the annotations.
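A quick sanity check on the converted dataset can catch problems before training or uploading. The sketch below assumes the JSON file follows the standard COCO layout (images, annotations, categories) and that the JPEGs sit in a known folder. The annotation file name is not specified here, so it is passed in as an argument, and the helper name is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_coco(annotation_file, image_dir) -> dict:
    """Report basic counts plus images that are missing on disk or unannotated."""
    coco = json.loads(Path(annotation_file).read_text(encoding="utf-8"))
    images = coco.get("images", [])
    annotations = coco.get("annotations", [])

    # Number of annotations attached to each image id.
    per_image = Counter(ann["image_id"] for ann in annotations)

    missing_files = [img["file_name"] for img in images
                     if not Path(image_dir, img["file_name"]).is_file()]
    unannotated = [img["file_name"] for img in images
                   if per_image[img["id"]] == 0]

    return {
        "images": len(images),
        "annotations": len(annotations),
        "missing_files": missing_files,
        "unannotated": unannotated,
    }


if __name__ == "__main__":
    import sys

    # Usage: python check_coco.py <annotation.json> <folder with JPEGs>
    report = summarize_coco(sys.argv[1], sys.argv[2])
    for key, value in report.items():
        print(f"{key}: {value}")
```

An image with no annotations is not necessarily an error, but a long list of them usually means the boundaries and images were not matched as intended.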