Implementing product thinking for satellite imagery datasets

Transitioning from centralized raster warehouses to domain-scoped data products shifts the primary failure vector from storage capacity to metadata contract enforcement. This guide isolates a single, high-impact implementation step: configuring a versioned, domain-bound SpatioTemporal Asset Catalog (STAC) ingestion pipeline that treats multispectral satellite imagery as a consumable product rather than a raw archive. The workflow operationalizes Product Thinking for GIS Datasets by enforcing strict spatial scoping rules, automated schema validation, and deterministic rollback paths.

Architectural Baseline: Domain-Scoped Ingestion

In legacy monolithic systems, raster data is ingested into shared buckets with implicit coordinate reference system (CRS) assumptions and deferred cataloging. The Geospatial Data Mesh Fundamentals paradigm inverts this by treating each spatial domain as an autonomous product boundary. Under this model, Data Mesh vs Traditional GIS Architecture distinctions manifest in explicit contract enforcement at the ingestion edge rather than post-hoc ETL reconciliation.

Spatial Domain Boundary Design dictates that every tile must be validated against a deterministic polygon envelope before entering the product catalog. Because the check runs in the domain's declared CRS, a tile that was silently reprojected or mislabeled surfaces as an out-of-bounds rejection rather than as a misaligned asset in a consumer's workflow. The pipeline must apply Scoping Rules for Spatial Products at the ingestion layer, rejecting tiles that fall outside the declared boundary_wkt or lack mandatory projection extensions.
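
A minimal sketch of the envelope check, using only the standard library. The function names are hypothetical, the WKT parser handles only a single-ring POLYGON, and comparing against the envelope is exact only when the boundary is itself a rectangle, as it is in this guide's configuration. A production pipeline would more likely delegate to a geometry library.

```python
import re

# Boundary taken from this guide's product-config.yaml.
# Tile bounding boxes are (minx, miny, maxx, maxy) in the same CRS.
BOUNDARY_WKT = "POLYGON((-122.5 37.7, -122.3 37.7, -122.3 37.9, -122.5 37.9, -122.5 37.7))"


def parse_polygon_wkt(wkt):
    """Parse a single-ring POLYGON WKT string into a list of (x, y) vertices."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.+?)\s*\)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unsupported WKT: {wkt!r}")
    return [tuple(float(v) for v in pair.split()) for pair in match.group(1).split(",")]


def envelope(points):
    """Return the (minx, miny, maxx, maxy) envelope of a vertex list."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def tile_in_domain(tile_bbox, boundary_wkt=BOUNDARY_WKT):
    """True when the tile bbox lies fully inside the boundary envelope."""
    minx, miny, maxx, maxy = envelope(parse_polygon_wkt(boundary_wkt))
    t_minx, t_miny, t_maxx, t_maxy = tile_bbox
    return minx <= t_minx and miny <= t_miny and t_maxx <= maxx and t_maxy <= maxy
```

A tile at (-122.45, 37.75, -122.40, 37.80) passes, while one whose western edge sits at -122.55 is rejected before it reaches the catalog.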

Declarative Pipeline Configuration

The tactical objective is to deploy a declarative configuration that binds Sentinel-2 L2A tiles to a specific spatial domain while publishing them as versioned data products. This requires coupling rio-tiler tiling parameters with pystac validation hooks.

yaml
# product-config.yaml
spatial_domain:
  crs: "EPSG:4326"
  boundary_wkt: "POLYGON((-122.5 37.7, -122.3 37.7, -122.3 37.9, -122.5 37.9, -122.5 37.7))"
product_metadata:
  product_id: "sentinel2-msi-l2a-v2.1"
  version_strategy: "semantic"
  lifecycle_stage: "production"
ingestion:
  driver: "rio-tiler"
  tile_size: 256
  resampling: "bilinear"
  validation_hook: "pystac_validator --strict"
  output_format: "cloud-optimized-geotiff"
  checksum_algorithm: "sha256"
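
Before the pipeline runs, the parsed configuration can be checked against the contract it is supposed to express. The sketch below assumes the YAML has already been loaded into a plain dict (for example with a YAML parser) and that the keys listed in REQUIRED_KEYS are the mandatory ones; both the function name and that choice of required keys are illustrative assumptions, not part of any tool named in this guide.

```python
import hashlib
import re

# Assumed set of mandatory keys, mirroring product-config.yaml above.
REQUIRED_KEYS = {
    "spatial_domain": {"crs", "boundary_wkt"},
    "product_metadata": {"product_id", "version_strategy", "lifecycle_stage"},
    "ingestion": {"driver", "tile_size", "checksum_algorithm"},
}


def config_problems(config):
    """Return a list of human-readable contract violations (empty when valid)."""
    problems = []
    for section, keys in REQUIRED_KEYS.items():
        present = config.get(section) or {}
        for key in sorted(keys - set(present)):
            problems.append(f"missing {section}.{key}")

    crs = (config.get("spatial_domain") or {}).get("crs")
    if crs is not None and not re.fullmatch(r"EPSG:\d+", str(crs)):
        problems.append(f"unrecognized crs {crs!r}")

    ingestion = config.get("ingestion") or {}
    tile_size = ingestion.get("tile_size")
    if tile_size is not None and not (isinstance(tile_size, int) and tile_size > 0):
        problems.append(f"tile_size must be a positive integer, got {tile_size!r}")

    algorithm = ingestion.get("checksum_algorithm")
    if algorithm is not None and algorithm not in hashlib.algorithms_guaranteed:
        problems.append(f"unsupported checksum_algorithm {algorithm!r}")
    return problems
```

Returning a list rather than raising on the first failure lets a dry run report every violation in one pass.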

Idempotent Execution & Atomic Commit

Idempotency is enforced through pre-flight validation, content-addressable storage, and atomic staging. The pipeline must never mutate existing catalog entries without explicit version increments.
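
The content-addressable part of that guarantee can be illustrated in a few lines. This is a sketch under stated assumptions: a dict stands in for the object store, and the key layout (product ID plus SHA-256 digest) and function names are hypothetical.

```python
import hashlib


def content_key(product_id, payload):
    """Derive the storage key from the bytes themselves, not from a filename."""
    digest = hashlib.sha256(payload).hexdigest()
    return f"{product_id}/{digest}.tif"


def put_if_absent(store, product_id, payload):
    """Write the payload once; repeat calls with the same bytes are no-ops.

    Returns (key, written), where written is False when the object already existed.
    """
    key = content_key(product_id, payload)
    if key in store:
        return key, False
    store[key] = payload
    return key, True
```

Because the key is a function of the content, re-running ingestion on an unchanged tile resolves to the same key and leaves the store untouched, while any changed byte produces a new key instead of overwriting an existing entry.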

Execute the pipeline with explicit dry-run validation before committing to the production bucket:

bash
stac-ingest run \
  --config product-config.yaml \
  --dry-run=true \
  --output-bucket s3://geo-mesh-prod/sentinel2/ \
  --log-level DEBUG \
  --checksum-verify

The --dry-run=true flag triggers schema validation, spatial envelope intersection checks, and hash computation without writing to the target bucket. If the dry-run passes, remove the flag and append --commit to trigger atomic writes via multipart upload with server-side encryption:

bash
stac-ingest run \
  --config product-config.yaml \
  --commit \
  --output-bucket s3://geo-mesh-prod/sentinel2/ \
  --log-level INFO

Diagnostic Workflow: Metadata Contract Drift

When the pipeline throws STACValidationError: Missing required field 'proj:geometry', the root cause typically traces to upstream gdalwarp operations dropping georeferencing during domain clipping, which leaves the item builder with nothing to derive the footprint from. The STAC projection extension itself treats proj:geometry as optional, so this requirement comes from the product's own metadata contract. Traditional architectures mask the problem by relying on implicit CRS assumptions; domain-driven ingestion requires explicit contract validation. Metadata Cataloging for Raster/Vector workflows must intercept this drift before catalog publication.

Execute the following diagnostic sequence to isolate the drift:

bash
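# 1. Inspect the raw tile's CRS (rio info emits JSON by default)
rio info s3://raw-bucket/tile_123.tif | jq '.crs'

# 2. Validate the STAC item through pystac's Python API
#    (pystac ships no CLI; validation needs the pystac[validation] extra)
aws s3 cp s3://geo-mesh-prod/sentinel2/items/tile_123.json - \
  | python -c "import json, sys, pystac; pystac.Item.from_dict(json.load(sys.stdin)).validate()"

# 3. Trace clipping command history for missing creation options
grep "gdalwarp" /var/log/ingest/pipeline.log | tail -n 5

If crs returns null, the tile carries no georeferencing. For a clipped output this usually means the warp wrote with a profile that omits GeoTIFF tags (for example -co "PROFILE=BASELINE") instead of -co "PROFILE=GeoTIFF". Remediate by setting the target CRS and creation options explicitly in the warp call. Refer to the official GDAL Warp Documentation for creation option specifications: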

bash
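# GDAL reads S3 objects through the /vsis3/ virtual filesystem, not s3:// URLs
gdalwarp \
  -t_srs EPSG:4326 \
  -te -122.5 37.7 -122.3 37.9 \
  -co "TILED=YES" -co "COMPRESS=DEFLATE" -co "PROFILE=GeoTIFF" \
  /vsis3/raw-bucket/tile_123.tif tile_123_clipped.tif
# Upload the clipped tile to staging once the warp succeeds
aws s3 cp tile_123_clipped.tif s3://staging/tile_123_clipped.tif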

Re-run validation. If the error persists, verify that the proj:geometry field is populated with a valid GeoJSON geometry object matching the clipped extent.
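
That final check can be automated. The sketch below assumes the STAC item is available as a plain dict, that proj:geometry is stored under the item's properties (it may instead live on individual assets), and that the clipped extent is axis-aligned so comparing bounding boxes is sufficient. The function names and the tolerance value are illustrative.

```python
def ring_bbox(geometry):
    """Return (minx, miny, maxx, maxy) of a GeoJSON Polygon's exterior ring."""
    if geometry.get("type") != "Polygon" or not geometry.get("coordinates"):
        raise ValueError("proj:geometry must be a non-empty GeoJSON Polygon")
    ring = geometry["coordinates"][0]
    xs = [point[0] for point in ring]
    ys = [point[1] for point in ring]
    return min(xs), min(ys), max(xs), max(ys)


def proj_geometry_matches(item, expected_bbox, tol=1e-6):
    """True when proj:geometry exists and its bbox equals the clipped extent."""
    geometry = item.get("properties", {}).get("proj:geometry")
    if geometry is None:
        return False
    actual = ring_bbox(geometry)
    return all(abs(a - e) <= tol for a, e in zip(actual, expected_bbox))
```

For the warp shown in this guide, expected_bbox would be (-122.5, 37.7, -122.3, 37.9), the same values passed to -te.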

Spatial Product Versioning & Lifecycle Enforcement

Spatial Product Versioning Strategies must align with semantic versioning (MAJOR.MINOR.PATCH) to guarantee deterministic consumer upgrades. Schema-breaking changes (e.g., CRS migration) require a major increment. Minor increments denote additive metadata or processing improvements. Patch increments cover reprocessing of corrupted tiles without altering the spatial contract.
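
The mapping from change type to version increment is mechanical enough to encode directly. A minimal sketch, with hypothetical names for the change categories:

```python
from enum import Enum


class Change(Enum):
    BREAKING = "breaking"    # e.g. CRS migration or another schema-breaking change
    ADDITIVE = "additive"    # new metadata fields or processing improvements
    REPROCESS = "reprocess"  # corrupted tiles regenerated, spatial contract unchanged


def bump(version, change):
    """Return the next MAJOR.MINOR.PATCH version for the given change type."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change is Change.BREAKING:
        return f"{major + 1}.0.0"
    if change is Change.ADDITIVE:
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

Starting from 2.1.3, a CRS migration yields 3.0.0, an added metadata field yields 2.2.0, and a tile reprocess yields 2.1.4.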

Spatial Product Lifecycle Management dictates that assets transition through staging → validation → production states. Promotion to production requires:

  1. Successful pystac validate --strict execution
  2. SHA-256 checksum match between source and staged COG
  3. Explicit domain owner sign-off via governance webhook
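
The three promotion requirements can be expressed as a single gate in front of the production transition. This sketch assumes the validation outcome, both checksums, and the sign-off have already been collected by earlier pipeline steps; the class and function names are hypothetical.

```python
from dataclasses import dataclass

# Allowed lifecycle transitions; any other move is rejected.
TRANSITIONS = {"staging": "validation", "validation": "production"}


@dataclass(frozen=True)
class PromotionEvidence:
    schema_valid: bool     # outcome of the strict STAC validation step
    source_sha256: str     # checksum recorded for the source object
    staged_sha256: str     # checksum recomputed on the staged COG
    owner_approved: bool   # sign-off received via the governance webhook


def promotion_blockers(evidence):
    """List every unmet requirement for promotion to production."""
    blockers = []
    if not evidence.schema_valid:
        blockers.append("schema validation failed")
    if evidence.source_sha256.lower() != evidence.staged_sha256.lower():
        blockers.append("checksum mismatch")
    if not evidence.owner_approved:
        blockers.append("domain owner sign-off missing")
    return blockers


def next_stage(current, evidence):
    """Advance one lifecycle stage, enforcing the gate before production."""
    target = TRANSITIONS.get(current)
    if target is None:
        raise ValueError(f"no transition defined from {current!r}")
    if target == "production":
        blockers = promotion_blockers(evidence)
        if blockers:
            raise PermissionError("; ".join(blockers))
    return target
```

Keeping the transitions in one table makes it impossible to skip validation: there is no entry that leads from staging straight to production.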

The STAC specification mandates strict adherence to extension schemas. Consult the official SpatioTemporal Asset Catalog Specification for required fields and extension compliance matrices.

Cross-Team Governance & Escalation Paths

Cross-Team Governance Workflows require automated routing of validation failures to the appropriate ownership tier. The following escalation matrix standardizes incident response:

| Severity | Trigger Pattern | Automated Action | Escalation Path |
|---|---|---|---|
| P3 | STACValidationError: Missing optional field | Log warning, proceed with --allow-missing | Domain Data Steward (Slack #geo-catalog) |
| P2 | proj:geometry null or mismatched CRS | Halt pipeline, quarantine tile in s3://geo-mesh-quarantine/ | Platform Engineer + GIS Steward (Jira GEO-INGEST) |
| P1 | boundary_wkt intersection failure or checksum mismatch | Rollback staging, invalidate cache, trigger re-clipping | Architecture Review Board + Incident Commander |
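
Automated routing amounts to matching a failure message against the matrix, most severe rule first. The sketch below is illustrative only: the regular expressions assume the pipeline's messages contain the trigger phrases verbatim, and the Route type and classify function are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Route(NamedTuple):
    severity: str
    pattern: "re.Pattern[str]"
    action: str
    escalation: str


# Ordered most severe first so a message matching several rules escalates highest.
ESCALATION_MATRIX = [
    Route("P1", re.compile(r"boundary_wkt intersection failure|checksum mismatch", re.I),
          "rollback staging, invalidate cache, trigger re-clipping",
          "Architecture Review Board + Incident Commander"),
    Route("P2", re.compile(r"proj:geometry null|mismatched CRS", re.I),
          "halt pipeline, quarantine tile",
          "Platform Engineer + GIS Steward (Jira GEO-INGEST)"),
    Route("P3", re.compile(r"Missing optional field", re.I),
          "log warning, proceed",
          "Domain Data Steward (Slack #geo-catalog)"),
]


def classify(message: str) -> Optional[Route]:
    """Return the first matching route, or None for unrecognized failures."""
    for route in ESCALATION_MATRIX:
        if route.pattern.search(message):
            return route
    return None
```

Messages that match no rule return None, which a real router would likely treat as its own escalation case rather than dropping silently.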

Idempotent rollback is executed via:

bash
stac-ingest rollback \
  --product-id sentinel2-msi-l2a-v2.1 \
  --version 2.1.3 \
  --target-bucket s3://geo-mesh-prod/sentinel2/ \
  --confirm

This command removes the failed version from the catalog index, restores the previous manifest, and emits a PRODUCT_ROLLBACK_COMPLETE event to the governance bus. All pipeline executions must be logged with structured JSON output for audit trails and drift analysis.
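
One way to satisfy the structured logging requirement is a JSON formatter on Python's standard logging module. This is a sketch, not the pipeline's actual logger: the field names (ts, level, event) and the convention of passing per-event fields through a context entry in extra are assumptions.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object on one line."""

    def format(self, record):
        payload = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "event": record.getMessage(),
        }
        # Per-event fields arrive via logger.info(..., extra={"context": {...}}).
        payload.update(getattr(record, "context", {}))
        return json.dumps(payload, sort_keys=True)


def build_logger(name="stac-ingest"):
    """Create a logger that writes JSON lines to stdout."""
    logger = logging.getLogger(name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler(sys.stdout)
        handler.setFormatter(JsonFormatter())
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger


if __name__ == "__main__":
    build_logger().info(
        "PRODUCT_ROLLBACK_COMPLETE",
        extra={"context": {"product_id": "sentinel2-msi-l2a-v2.1", "version": "2.1.3"}},
    )
```

One JSON object per line keeps the audit trail greppable and lets drift analysis load the log with any JSON-lines reader.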