Implementing product thinking for satellite imagery datasets
Transitioning from centralized raster warehouses to domain-scoped data products shifts the primary failure vector from storage capacity to metadata contract enforcement. This guide isolates a single, high-impact implementation step: configuring a versioned, domain-bound SpatioTemporal Asset Catalog (STAC) ingestion pipeline that treats multispectral satellite imagery as a consumable product rather than a raw archive. The workflow operationalizes Product Thinking for GIS Datasets by enforcing strict spatial scoping rules, automated schema validation, and deterministic rollback paths.
Architectural Baseline: Domain-Scoped Ingestion
In legacy monolithic systems, raster data is ingested into shared buckets with implicit coordinate reference system (CRS) assumptions and deferred cataloging. The Geospatial Data Mesh Fundamentals paradigm inverts this by treating each spatial domain as an autonomous product boundary. Under this model, Data Mesh vs Traditional GIS Architecture distinctions manifest in explicit contract enforcement at the ingestion edge rather than post-hoc ETL reconciliation.
Spatial Domain Boundary Design dictates that every tile must be validated against a deterministic polygon envelope before entering the product catalog. This prevents silent CRS drift and ensures downstream consumers receive spatially consistent assets. The pipeline must apply Scoping Rules for Spatial Products at the ingestion layer, rejecting tiles that fall outside the declared boundary_wkt or lack mandatory projection extensions.
Declarative Pipeline Configuration
The tactical objective is to deploy a declarative configuration that binds Sentinel-2 L2A tiles to a specific spatial domain while publishing them as versioned data products. This requires coupling rio-tiler tiling parameters with pystac validation hooks.
# product-config.yaml
spatial_domain:
crs: "EPSG:4326"
boundary_wkt: "POLYGON((-122.5 37.7, -122.3 37.7, -122.3 37.9, -122.5 37.9, -122.5 37.7))"
product_metadata:
product_id: "sentinel2-msi-l2a-v2.1"
version_strategy: "semantic"
lifecycle_stage: "production"
ingestion:
driver: "rio-tiler"
tile_size: 256
resampling: "bilinear"
validation_hook: "pystac_validator --strict"
output_format: "cloud-optimized-geotiff"
checksum_algorithm: "sha256"
Idempotent Execution & Atomic Commit
Idempotency is enforced through pre-flight validation, content-addressable storage, and atomic staging. The pipeline must never mutate existing catalog entries without explicit version increments.
Execute the pipeline with explicit dry-run validation before committing to the production bucket:
stac-ingest run \
--config product-config.yaml \
--dry-run=true \
--output-bucket s3://geo-mesh-prod/sentinel2/ \
--log-level DEBUG \
--checksum-verify
The --dry-run=true flag triggers schema validation, spatial envelope intersection checks, and hash computation without writing to the target bucket. If the dry-run passes, remove the flag and append --commit to trigger atomic writes via multipart upload with server-side encryption:
stac-ingest run \
--config product-config.yaml \
--commit \
--output-bucket s3://geo-mesh-prod/sentinel2/ \
--log-level INFO
Diagnostic Workflow: Metadata Contract Drift
When the pipeline throws STACValidationError: Missing required field 'proj:geometry', the root cause typically traces to upstream gdalwarp operations stripping projection metadata during domain clipping. Traditional architectures mask this by relying on implicit CRS assumptions; domain-driven ingestion requires explicit contract validation. Metadata Cataloging for Raster/Vector workflows must intercept this drift before catalog publication.
Execute the following diagnostic sequence to isolate the drift:
# 1. Inspect raw tile metadata for projection extensions
rio info --format json s3://raw-bucket/tile_123.tif | jq '.proj'
# 2. Validate STAC item schema against strict spec
pystac validate s3://geo-mesh-prod/sentinel2/items/tile_123.json --strict
# 3. Trace clipping command history for missing creation options
grep "gdalwarp" /var/log/ingest/pipeline.log | tail -n 5
If proj returns null, the clipping step bypassed -co "PROFILE=GeoTIFF". Remediate by injecting explicit creation options into the warp call. Refer to the official GDAL Warp Documentation for creation option specifications:
gdalwarp \
-t_srs EPSG:4326 \
-te -122.5 37.7 -122.3 37.9 \
-co "TILED=YES" -co "COMPRESS=DEFLATE" -co "PROFILE=GeoTIFF" \
s3://raw-bucket/tile_123.tif s3://staging/tile_123_clipped.tif
Re-run validation. If the error persists, verify that the proj:geometry field is populated with a valid GeoJSON geometry object matching the clipped extent.
Spatial Product Versioning & Lifecycle Enforcement
Spatial Product Versioning Strategies must align with semantic versioning (MAJOR.MINOR.PATCH) to guarantee deterministic consumer upgrades. Major increments require schema-breaking changes (e.g., CRS migration). Minor increments denote additive metadata or processing improvements. Patch increments cover reprocessing of corrupted tiles without altering the spatial contract.
Spatial Product Lifecycle Management dictates that assets transition through staging → validation → production states. Promotion to production requires:
- Successful
pystac validate --strictexecution - SHA-256 checksum match between source and staged COG
- Explicit domain owner sign-off via governance webhook
The STAC specification mandates strict adherence to extension schemas. Consult the official SpatioTemporal Asset Catalog Specification for required fields and extension compliance matrices.
Cross-Team Governance & Escalation Paths
Cross-Team Governance Workflows require automated routing of validation failures to the appropriate ownership tier. The following escalation matrix standardizes incident response:
| Severity | Trigger Pattern | Automated Action | Escalation Path |
|---|---|---|---|
| P3 | STACValidationError: Missing optional field |
Log warning, proceed with --allow-missing |
Domain Data Steward (Slack #geo-catalog) |
| P2 | proj:geometry null or mismatched CRS |
Halt pipeline, quarantine tile in s3://geo-mesh-quarantine/ |
Platform Engineer + GIS Steward (Jira GEO-INGEST) |
| P1 | boundary_wkt intersection failure or checksum mismatch |
Rollback staging, invalidate cache, trigger re-clipping | Architecture Review Board + Incident Commander |
Idempotent rollback is executed via:
stac-ingest rollback \
--product-id sentinel2-msi-l2a-v2.1 \
--version 2.1.3 \
--target-bucket s3://geo-mesh-prod/sentinel2/ \
--confirm
This command removes the failed version from the catalog index, restores the previous manifest, and emits a PRODUCT_ROLLBACK_COMPLETE event to the governance bus. All pipeline executions must be logged with structured JSON output for audit trails and drift analysis.