s3-orchestrator

Monitoring

The orchestrator emits three observability streams:

  • Prometheus metrics: /metrics (95+ metrics under the s3o_ prefix); see the full reference table below.
  • OpenTelemetry traces: every HTTP request, manager operation, and backend S3 call; OTLP-exported to Tempo or a compatible collector.
  • Structured JSON logs: slog to stdout, including a dedicated audit subset with "audit": true.

A ready-to-import Grafana dashboard covering every metric ships at grafana/s3-orchestrator.json.

Web dashboard

When ui.enabled is true, the dashboard at {path}/ shows a live snapshot of:

  • Storage summary — total bytes used/capacity across all backends
  • Backend quota — bytes used/limit with progress bars per backend, object counts, active multipart uploads
  • Monthly usage — API requests, egress, and ingress per backend with limits
  • Object tree — interactive collapsible file browser. Buckets and directories are collapsed by default; click to expand. Each directory shows a rollup file count and total size.
  • Configuration — virtual bucket names, write routing strategy, replication factor, rebalance status, rate limiting, encryption status
  • Logs — recent structured log output from an in-memory ring buffer (last 5,000 entries). Filter by severity level and search by text. Logs are available immediately on page load — no need to SSH into the host.

The dashboard also provides management actions:

  • Upload — upload files to any virtual bucket directly from the browser (up to 512 MiB per file)
  • Download — download individual objects by clicking the download icon on any file in the tree
  • Delete — delete individual objects by clicking the delete icon on any file in the tree
  • Rebalance — trigger an on-demand rebalance using the configured strategy and settings
  • Clean Excess — remove over-replicated copies that exceed the replication factor
  • Sync — import pre-existing objects from a backend’s S3 bucket into the proxy database. Select a backend and a virtual bucket — objects already in the database are skipped, and objects belonging to other virtual buckets are excluded.

On-demand reconciliation

When a backend loses data (expired credentials, provider outage, accidental deletion), the metadata database retains stale entries that cause log noise in the rebalancer, replicator, and scrubber. The reconcile endpoint walks both the backend (paginated ListObjects) and the metadata DB (paginated cursor over object_locations ordered by key) as ascending key streams, then merges them in lockstep. Memory is bounded by a 1000-entry page size on each side regardless of total object count, so a backend holding millions of objects reconciles without OOM. Keys belonging to sibling virtual buckets stored on the same backend are skipped — each virtual bucket reconciles in its own pass.

# Reconcile all backends
curl -X POST -H "X-Admin-Token: $TOKEN" http://localhost:9000/admin/api/reconcile

# Reconcile a single backend
curl -X POST -H "X-Admin-Token: $TOKEN" "http://localhost:9000/admin/api/reconcile?backend=g3"

Response:

{"status":"ok","imported":0,"removed":57,"backends_scanned":1}
  • imported: objects on the backend but not in the DB (brought under management)
  • removed: DB entries whose objects no longer exist on the backend (stale metadata cleaned up)
  • backends_scanned: how many backends were processed

The background reconciler (reconcile.enabled: true) runs the same logic on a timer. The admin endpoint is for immediate use after incidents.
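The lockstep merge described above can be sketched in a few lines. This is an illustrative Python sketch, not the orchestrator's implementation: reconcile_streams and summarize are hypothetical names, and both inputs are assumed to arrive already sorted in ascending key order, as the paginated ListObjects walk and the ordered DB cursor are described to be. Only one key from each side is held at a time, which is why memory stays bounded regardless of stream length.

```python
from typing import Dict, Iterable, Iterator, Tuple


def reconcile_streams(
    backend_keys: Iterable[str], db_keys: Iterable[str]
) -> Iterator[Tuple[str, str]]:
    """Merge two ascending key streams and yield ("import", key) or ("remove", key)."""
    backend_iter, db_iter = iter(backend_keys), iter(db_keys)
    b = next(backend_iter, None)
    d = next(db_iter, None)
    while b is not None or d is not None:
        if d is None or (b is not None and b < d):
            yield ("import", b)  # on the backend, missing from the DB
            b = next(backend_iter, None)
        elif b is None or d < b:
            yield ("remove", d)  # in the DB, missing from the backend
            d = next(db_iter, None)
        else:
            # Same key on both sides: nothing to do, advance both streams.
            b = next(backend_iter, None)
            d = next(db_iter, None)


def summarize(actions: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count actions using the same field names as the endpoint's response."""
    counts = {"imported": 0, "removed": 0}
    for action, _key in actions:
        counts["imported" if action == "import" else "removed"] += 1
    return counts
```

For example, summarize(reconcile_streams(["a", "c"], ["a", "b"])) counts one import ("c") and one removal ("b").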

The dashboard requires authentication. Users log in at {path}/login with the admin_key and admin_secret configured in the ui section. Sessions last 24 hours.

The dashboard is server-rendered HTML. The object tree uses JavaScript for lazy-loaded directory expansion — directories fetch their children on click via the /ui/api/tree endpoint.

JSON endpoints at {path}/api/dashboard, {path}/api/tree, and {path}/api/logs return data for programmatic access or integration with other tools. The logs endpoint accepts optional query parameters: level (minimum severity: DEBUG, INFO, WARN, ERROR), since (RFC3339 timestamp), component, and limit. Management endpoints ({path}/api/delete, {path}/api/upload, {path}/api/rebalance, {path}/api/clean-excess, {path}/api/sync) accept POST requests. The download endpoint ({path}/api/download?key=...) accepts GET requests. All API endpoints require authentication.
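As a rough illustration of the logs endpoint's query parameters, the helper below builds a request URL. It is a sketch under assumptions: logs_url is a hypothetical name, the base URL and the /ui value used for {path} are placeholders for whatever the deployment configures, and authentication (required by every API endpoint) is not handled here.

```python
from urllib.parse import urlencode


def logs_url(base, path="/ui", level=None, since=None, component=None, limit=None):
    """Build a {path}/api/logs URL, including only the filters that were set."""
    params = {"level": level, "since": since, "component": component, "limit": limit}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{base.rstrip('/')}{path.rstrip('/')}/api/logs"
    return f"{url}?{query}" if query else url
```

urlencode percent-encodes the colons in an RFC3339 since value, so the timestamp can be passed as-is.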

Health endpoints

Two health endpoints serve different purposes:

Liveness (/health) — always returns HTTP 200. Use this for liveness probes (Consul checks, K8s livenessProbe). The service stays in rotation during temporary database outages.

curl http://localhost:9000/health
# {"status":"ok"}
# or {"status":"degraded"} when the database circuit breaker is open

Readiness (/health/ready) — returns HTTP 200 when the service is ready to handle traffic, HTTP 503 during startup (before migrations and backend initialization complete) and during shutdown drain. Use this for readiness probes (K8s readinessProbe, Nomad on_update = "require_healthy").

curl http://localhost:9000/health/ready
# {"status":"ready"}      — 200
# {"status":"not ready"}  — 503

The HTTP response body is intentionally minimal — only the status field is returned, so log aggregators can grep on a fixed pattern. To identify which instance answered a probe in a multi-instance deployment, query GET /admin/api/workers (which includes the instance identifier) or correlate by source IP.
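The two probes carry different signals: /health reports breaker state in the body while always returning 200, and /health/ready reports readiness through the status code. A minimal sketch of how a monitoring script might combine them follows; probe_summary is a hypothetical helper, and fetching the two responses is left to the caller.

```python
import json


def probe_summary(health_body: str, ready_status_code: int) -> dict:
    """Combine a /health body and a /health/ready status code into one view."""
    status = json.loads(health_body).get("status", "unknown")
    return {
        "alive": True,  # /health answered at all; it always returns HTTP 200
        "database_breaker_open": status == "degraded",
        "ready": ready_status_code == 200,
    }
```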

Background worker health

/health only reflects database breaker state. Background services (replicator, cleanup queue, lifecycle, pending reaper, …) are supervised by the lifecycle manager and recover on their own, but a service that is running yet failing every tick looks identical to a healthy one in /health.

GET /admin/api/workers returns a JSON snapshot of every registered background service’s last-tick health, including last_success, last_failure, last_error, and consecutive_failures. Use it during incidents to distinguish a service that is ticking successfully from one that is running but failing every tick:

curl -H "X-Admin-Token: $TOKEN" http://localhost:9000/admin/api/workers
{
  "workers": [
    {"name": "cleanup_queue", "last_success": "2026-05-12T18:42:01Z", "consecutive_failures": 0},
    {"name": "replicator", "last_failure": "2026-05-12T18:45:14Z", "last_error": "connection refused", "consecutive_failures": 3}
  ]
}

In proxy-only deployments this endpoint returns 503 because no worker pool is registered.

The same data flows into Prometheus as s3o_worker_last_success_timestamp_seconds, s3o_worker_consecutive_failures, and s3o_worker_ticks_total{result} so alerting can run without scraping the admin endpoint. Suggested alert shapes are in the metrics table below.
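During an incident, the /admin/api/workers snapshot can be filtered down to the services that are currently failing. The sketch below assumes the response shape shown in the example above; failing_workers is a hypothetical helper, and fields absent from an entry are treated as zero or empty.

```python
def failing_workers(snapshot: dict, threshold: int = 1) -> list:
    """Return (name, consecutive_failures, last_error) for workers at or above threshold."""
    return [
        (w["name"], w.get("consecutive_failures", 0), w.get("last_error", ""))
        for w in snapshot.get("workers", [])
        if w.get("consecutive_failures", 0) >= threshold
    ]
```

Applied to the example response, only the replicator entry is returned, with its three consecutive failures and the "connection refused" error.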

Grafana dashboard

A comprehensive Grafana dashboard is included at grafana/s3-orchestrator.json. Import it via Grafana’s UI (Dashboards → Import → Upload JSON file) or provision it from disk. It expects a Prometheus datasource with UID prometheus.

The dashboard covers all emitted metrics, organised by domain into rows: overview, quota & storage, request performance, backend operations, manager operations, circuit breaker & degraded mode, replication, usage tracking, rate limiting & rejections, rebalancer, drain & lifecycle, cleanup queue & audit, encryption, object data cache, integrity verification, Redis, over-replication cleanup, pending PUT intents, and authentication (streaming SigV4). Rows for less frequently inspected domains are collapsed by default.

Key Prometheus metrics

If telemetry.metrics.enabled is true, metrics are exposed at /metrics. Two listener modes:

  • Inline (telemetry.metrics.listen empty, the default): /metrics is served on the same listener as the public S3 API. Convenient for single-port deployments and the docker-compose / local-dev workflow. In this mode /debug/pprof/* endpoints are not registered — mounting pprof on the public S3 listener would leak runtime internals (de-anonymized stack frames via /debug/pprof/heap, the command line and flag values via /debug/pprof/cmdline) and offer a DoS amplifier (/debug/pprof/profile?seconds=300 triggers minutes of CPU profiling on demand).
  • Dedicated listener (telemetry.metrics.listen set, e.g. 127.0.0.1:9001): /metrics and /debug/pprof/* are both mounted on the dedicated listener. Bind to 127.0.0.1 or a private network interface so the surface is only reachable from operators inside the trust boundary. The Nomad demo uses 0.0.0.0:9001 so the docker-compose Prometheus container can scrape via the bridge gateway; production deployments should tighten this to 127.0.0.1:9001 or a private network address.

Once the dedicated listener is configured, captures look like:

# 60-second allocation profile during a load spike
curl -o /tmp/allocs.pb.gz "http://127.0.0.1:9001/debug/pprof/allocs?seconds=60"
go tool pprof -top -cum -alloc_space /tmp/allocs.pb.gz

# Instantaneous live-heap snapshot
curl -o /tmp/heap.pb.gz "http://127.0.0.1:9001/debug/pprof/heap"
go tool pprof -top -cum -inuse_space /tmp/heap.pb.gz

The standard net/http/pprof endpoints are available: /debug/pprof/, /debug/pprof/profile, /debug/pprof/heap, /debug/pprof/allocs, /debug/pprof/goroutine, /debug/pprof/block, /debug/pprof/mutex, /debug/pprof/trace, /debug/pprof/cmdline, /debug/pprof/symbol.

Key metrics to alert on:

| Metric | What to watch |
| --- | --- |
| s3o_quota_bytes_available{backend="..."} | Alert when approaching 0: backend is almost full (accounts for orphan bytes) |
| s3o_quota_orphan_bytes{backend="..."} | Elevated values mean backends have physically unreleased space from pending cleanups |
| s3o_circuit_breaker_state{name="database"} | Alert when > 0: database is unreachable (1=open, 2=half-open) |
| s3o_circuit_breaker_state{name="<backend>"} | Alert when > 0: backend is unreachable or credentials expired |
| s3o_replication_pending | Alert when consistently > 0: replicas are falling behind |
| s3o_replication_health_copies_total | Non-zero means health-aware replication is creating replacement copies for circuit-broken backends |
| s3o_over_replication_pending | Objects with more copies than the replication factor; should return to 0 after cleanup runs |
| s3o_over_replication_errors_total | Cleanup errors; indicates backends or metadata issues preventing excess copy removal |
| s3o_requests_total{status_code="5xx"} | Alert on elevated 5xx rates |
| s3o_http_panic_recovered_total{route} | Any non-zero rate is an alert: a handler panicked and the recovery middleware returned a 500. Pivot via the matching http.PanicRecovered audit entry for the captured stack and request id |
| s3o_degraded_write_rejections_total | Writes being rejected due to degraded mode |
| s3o_usage_limit_rejections_total | Operations rejected by usage limits |
| s3o_rate_limit_rejections_total | Requests rejected by per-IP rate limiting |
| s3o_admission_rejections_total | Requests rejected at the hard admission limit |
| s3o_load_shed_total | Requests probabilistically shed before the hard admission limit |
| s3o_early_rejections_total | Uploads rejected before body transmission (no backend capacity) |
| s3o_list_pages_capped_total | Non-zero rate means real workloads are hitting the ListObjects page cap; profile before tuning |
| s3o_cleanup_queue_depth | Alert when consistently > 0: orphaned objects are failing cleanup |
| s3o_cleanup_queue_processed_total{status="exhausted"} | Items that exceeded max retries; graduated to the DLQ |
| s3o_cleanup_dlq_depth | Alert when > 0: at least one unrecoverable orphan needs operator action |
| s3o_cleanup_dlq_enqueued_total{backend="..."} | Rate of graduations per backend; one backend dominating means its delete path is broken |
| s3o_cleanup_queue_stale_claims_recovered_total{backend="..."} | Non-zero rate means a worker died mid-process or cleanup_queue.claim_grace_period is shorter than realistic worst-case row processing time |
| s3o_cleanup_enqueue_failures_total{backend,reason,stage} | Alert on any non-zero rate of stage="enqueue": backend object exists with no cleanup_queue row (orphan-leak risk). Pivot via the matching storage.OrphanEnqueueFailed audit event for backend/key/size, then run POST /admin/api/reconcile after DB recovery |
| s3o_audit_events_total{event="..."} | Audit log volume by event type; useful for detecting unusual activity |
| s3o_pending_intents_enqueued_total | Rate of PUT intents inserted (write-path PUT-before-COMMIT pattern). Should track the PutObject rate closely; a sustained gap suggests the pending pattern is bypassed |
| s3o_pending_intents_resolved_total{status} | Pending intents resolved by status (committed = atomic commit happy path, promoted = reaper found backend object and promoted the intent, dropped = reaper found HEAD 404 and dropped the intent, ambiguous = HEAD failed for non-404 reasons, already_resolved = race). A sustained ambiguous rate means the reaper cannot reach the backend |
| s3o_pending_intents_depth | Current unresolved intents. Alert when consistently above the batch_size of the reaper: the reaper is not keeping up |
| s3o_drain_active | Count of in-flight backend drain operations (Inc/Dec so concurrent drains compose); 0 means no drains are running. Page on s3o_drain_active > 0 for 6h (drain stuck) |
| s3o_drain_race_aborted_total | PutObject attempts aborted after drain started mid-write. Any non-zero rate is benign (the orchestrator recovers automatically) but a sustained rate suggests longer-than-expected gaps between EligibleForWrite and the backend PUT, typically very large objects against a fast-draining backend |
| time() - s3o_worker_last_success_timestamp_seconds{service="..."} | Alert when greater than the worker’s expected tick interval times a margin (e.g. > 4 * interval): the service has not completed a successful tick in that window |
| s3o_worker_consecutive_failures{service="..."} | Alert when consistently > 0: the service is running but every tick fails; logs and /admin/api/workers carry the underlying error |
| rate(s3o_worker_ticks_total{result="error"}[15m]) | Persistent error rate; alongside consecutive_failures distinguishes flapping from sustained failure |
| s3o_auth_streaming_requests_total{variant} | Rate of streaming-payload SigV4 PUTs by variant; track which client SDKs are sending streaming uploads |
| s3o_auth_streaming_rejections_total{reason} | Alert on any non-zero rate: every increment is a chunk-validation failure (tampered body, malformed framing, length mismatch, or signature mismatch) |
| s3o_encryption_errors_total{op,error_type} | Any non-zero rate indicates encryption/decryption failures. error_type="stream_failed" specifically flags transport errors that surfaced mid-stream (after the encryptor/decryptor was constructed). |
| s3o_encrypt_existing_objects_total{status="error"} | Failures during bulk encryption of existing data |
| s3o_decrypt_existing_objects_total{status="error"} | Failures during bulk decryption of existing data |
| s3o_key_rotation_objects_total{status="error"} | Failures during key rotation |
| s3o_redis_fallback_active | Alert when 1: Redis is unavailable, using local counters |
| s3o_redis_operations_total{operation,status} | Track Redis operation success/error rates |
| s3o_cache_hits_total / s3o_cache_misses_total | Cache hit ratio; low hit rate may indicate the cache is undersized or the workload is not read-heavy |
| s3o_cache_evictions_total | High eviction rate suggests max_size is too small for the working set |
| s3o_cache_size_bytes / s3o_cache_entries | Current cache utilization; watch for the cache staying near max_size |
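Two of the alert shapes in the table reduce to simple arithmetic, sketched here in Python for clarity. The function names are hypothetical, the 4x margin is the example value from the table rather than a recommended default, and in a real deployment these checks would normally be written as Prometheus alert rules over rates instead of raw counter values.

```python
from typing import Optional


def worker_is_stale(now: float, last_success_ts: float, interval_s: float, margin: float = 4.0) -> bool:
    """True when time() - last_success exceeds margin * the expected tick interval."""
    return (now - last_success_ts) > margin * interval_s


def cache_hit_ratio(hits: float, misses: float) -> Optional[float]:
    """Hit ratio from hit and miss counts; None when there were no lookups at all."""
    total = hits + misses
    return hits / total if total else None
```

For a worker that ticks every 60 seconds, a last success 300 seconds ago is stale at the 4x margin, while one 200 seconds ago is not.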

Full metric reference

All metrics are prefixed with s3o_. Exposed at /metrics when telemetry.metrics.enabled is true.

| Metric | Type | Labels | Description |
| --- | --- | --- | --- |
| s3o_build_info | Gauge | version, go_version | Build metadata |
| s3o_requests_total | Counter | method, status_code | HTTP request count |
| s3o_request_duration_seconds | Histogram | method | Request latency |
| s3o_request_size_bytes | Histogram | method | Upload sizes |
| s3o_response_size_bytes | Histogram | method | Download sizes |
| s3o_inflight_requests | Gauge | method | Currently processing |
| s3o_backend_requests_total | Counter | operation, backend, status | Backend S3 API calls |
| s3o_backend_duration_seconds | Histogram | operation, backend | Backend latency |
| s3o_manager_requests_total | Counter | operation, backend, status | Manager-level operations |
| s3o_manager_duration_seconds | Histogram | operation, backend | Manager latency |
| s3o_quota_bytes_used | Gauge | backend | Current bytes used |
| s3o_quota_bytes_limit | Gauge | backend | Quota limit |
| s3o_quota_orphan_bytes | Gauge | backend | Bytes reserved by pending cleanup items |
| s3o_quota_bytes_available | Gauge | backend | Remaining space (limit − used − orphan) |
| s3o_objects_count | Gauge | backend | Stored object count |
| s3o_active_multipart_uploads | Gauge | backend | In-progress uploads |
| s3o_rebalance_objects_moved_total | Counter | strategy, status | Objects moved by rebalancer |
| s3o_rebalance_bytes_moved_total | Counter | strategy | Bytes moved by rebalancer |
| s3o_rebalance_runs_total | Counter | strategy, status | Rebalancer executions |
| s3o_rebalance_duration_seconds | Histogram | strategy | Rebalancer execution time |
| s3o_rebalance_skipped_total | Counter | reason | Rebalancer runs skipped |
| s3o_rebalance_pending | Gauge |  | Objects planned for rebalance |
| s3o_replication_pending | Gauge |  | Objects below replication factor |
| s3o_replication_copies_created_total | Counter |  | Replica copies created |
| s3o_replication_errors_total | Counter |  | Replication errors |
| s3o_replication_duration_seconds | Histogram |  | Replication cycle time |
| s3o_replication_runs_total | Counter | status | Replication worker executions |
| s3o_replication_health_copies_total | Counter |  | Copies created to replace copies on circuit-broken backends |
| s3o_over_replication_pending | Gauge |  | Objects exceeding the replication factor |
| s3o_over_replication_removed_total | Counter |  | Excess copies removed |
| s3o_over_replication_errors_total | Counter |  | Over-replication cleanup errors |
| s3o_over_replication_runs_total | Counter | status | Over-replication worker executions |
| s3o_over_replication_duration_seconds | Histogram |  | Over-replication cleanup cycle time |
| s3o_circuit_breaker_state | Gauge | name | 0=closed, 1=open, 2=half-open (name: “database” or backend name) |
| s3o_circuit_breaker_transitions_total | Counter | name, from, to | State transitions per component |
| s3o_degraded_reads_total | Counter | operation | Broadcast reads in degraded mode |
| s3o_degraded_cache_hits_total | Counter |  | Cache hits during degraded reads |
| s3o_degraded_write_rejections_total | Counter | operation | Writes rejected in degraded mode |
| s3o_usage_api_requests | Gauge | backend | Current month API request count |
| s3o_usage_egress_bytes | Gauge | backend | Current month egress bytes |
| s3o_usage_ingress_bytes | Gauge | backend | Current month ingress bytes |
| s3o_usage_limit_rejections_total | Counter | operation, limit_type | Operations rejected by usage limits |
| s3o_cleanup_queue_enqueued_total | Counter | reason | Items added to the cleanup retry queue |
| s3o_cleanup_queue_processed_total | Counter | status | Items processed from the cleanup queue (success/success_absent/retry/exhausted) |
| s3o_cleanup_queue_depth | Gauge |  | Current pending items in the cleanup queue |
| s3o_cleanup_queue_stale_claims_recovered_total | Counter | backend | Cleanup_queue rows whose stale claim was reclaimed by a later worker tick |
| s3o_cleanup_dlq_depth | Gauge |  | Unrecoverable orphans waiting in the cleanup dead-letter table |
| s3o_cleanup_dlq_enqueued_total | Counter | backend | Cleanup rows graduated to the dead-letter after exhausting retries |
| s3o_cleanup_enqueue_failures_total | Counter | backend, reason, stage | Cleanup-queue enqueue attempts that failed after a successful backend write (orphan risk) |
| s3o_pending_intents_enqueued_total | Counter |  | In-flight PUT intents inserted before the backend write |
| s3o_pending_intents_resolved_total | Counter | status | Pending PUT intents resolved by status (committed, promoted, dropped, ambiguous, already_resolved) |
| s3o_pending_intents_depth | Gauge |  | Current number of unresolved pending PUT intents |
| s3o_rate_limit_rejections_total | Counter |  | Requests rejected by per-IP rate limiting |
| s3o_admission_rejections_total | Counter |  | Requests rejected by server-level admission control |
| s3o_list_pages_capped_total | Counter |  | ListObjects calls that exited at the per-request page cap |
| s3o_lifecycle_deleted_total | Counter |  | Objects deleted by lifecycle expiration |
| s3o_lifecycle_failed_total | Counter |  | Objects that failed lifecycle deletion |
| s3o_lifecycle_runs_total | Counter | status | Lifecycle worker executions |
| s3o_worker_ticks_total | Counter | service, result | Locked-ticker service ticks by service and outcome |
| s3o_worker_last_success_timestamp_seconds | Gauge | service | Unix time of the most recent successful tick per service |
| s3o_worker_consecutive_failures | Gauge | service | Consecutive failed ticks per service since the last success |
| s3o_audit_events_total | Counter | event | Audit log entries emitted |
| s3o_drain_active | Gauge |  | Count of in-flight backend drain operations |
| s3o_drain_objects_moved_total | Counter |  | Objects migrated during drain |
| s3o_drain_bytes_moved_total | Counter |  | Bytes migrated during drain |
| s3o_drain_race_aborted_total | Counter |  | PutObject attempts aborted after drain started mid-write |
| s3o_encryption_operations_total | Counter | op | Encrypt/decrypt operations |
| s3o_encryption_errors_total | Counter | op, error_type | Encryption/decryption failures |
| s3o_encryption_unknown_key_id_total | Counter |  | Decryption attempts with unknown keyID (primary key fallback) |
| s3o_encrypt_existing_objects_total | Counter | status | Objects processed by encrypt-existing |
| s3o_decrypt_existing_objects_total | Counter | status | Objects processed by decrypt-existing |
| s3o_key_rotation_objects_total | Counter | status | DEKs re-wrapped by key rotation |
| s3o_redis_operations_total | Counter | operation, status | Redis command outcomes |
| s3o_redis_fallback_active | Gauge |  | 1 when Redis is unavailable and using local counters |
| s3o_cache_hits_total | Counter |  | Object data cache hits |
| s3o_cache_misses_total | Counter |  | Object data cache misses |
| s3o_cache_evictions_total | Counter |  | Object data cache evictions (LRU or TTL) |
| s3o_cache_size_bytes | Gauge |  | Current memory used by cached objects |
| s3o_cache_entries | Gauge |  | Current number of cached objects |
| s3o_cache_flush_total | Counter |  | Admin-triggered object data cache flushes |
| s3o_cache_admin_invalidations_total | Counter |  | Admin-triggered single-key cache invalidations |
| s3o_integrity_checks_total | Counter | operation | Integrity hash verifications performed (read, scrub) |
| s3o_integrity_errors_total | Counter | operation | Hash mismatches detected (corrupted copies enqueued for cleanup) |
| s3o_auth_streaming_requests_total | Counter | variant | Streaming-payload SigV4 PUTs by variant |
| s3o_auth_streaming_rejections_total | Counter | reason | Chunk-validation failures (tampered body, signature mismatch, etc.) |
| s3o_notification_outbox_depth | Gauge |  | Pending webhook events queued for delivery |
| s3o_notifications_delivered_total | Counter | endpoint | Successfully POSTed webhook events |
| s3o_notifications_failed_total | Counter | endpoint | Webhook POST failures (before retry) |
| s3o_notifications_dropped_total | Counter | endpoint | Webhook events dropped after exhausting max_retries |
| s3o_http_panic_recovered_total | Counter | route | Handler panics caught by the recovery middleware |

Quota metrics are refreshed from PostgreSQL every 30 seconds (no backend API calls).
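To cross-check the quota gauges by hand, a scrape of /metrics can be parsed and the available space recomputed as limit minus used minus orphan bytes. The sketch below is a deliberately small parser for the Prometheus text format, not a replacement for a real client library: parse_samples and quota_available are hypothetical names, and label values containing escaped quotes are not handled.

```python
import re

# One sample line: name, optional {labels}, then the value.
SAMPLE = re.compile(r'^(s3o_\w+)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text: str) -> list:
    """Return (name, labels, value) tuples for s3o_ samples, skipping comments."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line, or a HELP/TYPE comment
        match = SAMPLE.match(line)
        if match:
            name, labels, value = match.groups()
            samples.append((name, dict(LABEL.findall(labels or "")), float(value)))
    return samples


def quota_available(samples: list, backend: str) -> float:
    """Recompute remaining space for one backend: limit - used - orphan."""
    values = {name: value for name, labels, value in samples if labels.get("backend") == backend}
    return (
        values["s3o_quota_bytes_limit"]
        - values["s3o_quota_bytes_used"]
        - values["s3o_quota_orphan_bytes"]
    )
```

The result should agree with s3o_quota_bytes_available for the same backend, within the 30-second refresh window.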

OpenTelemetry tracing

Spans are emitted for every HTTP request, manager operation, and backend S3 call. The service registers as s3-orchestrator (resource.service.name). Traces propagate via W3C traceparent headers and are exported over OTLP/gRPC to Tempo or any OTLP-compatible collector.

Trace-to-log correlation — every JSON log line emitted within an active span automatically includes trace_id and span_id fields. Log aggregators (Loki, etc.) can use these fields to link logs to their corresponding traces in Tempo or any OpenTelemetry-compatible tracing backend. Only log calls that receive a context.Context with an active span include trace context; application-level logs without a span context are unaffected.
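Outside a log aggregator, the same correlation can be done by hand. The sketch below collects the trace IDs attached to ERROR-level lines so they can be looked up in the tracing backend. error_trace_ids is a hypothetical helper; it assumes the level field carries the string "ERROR" and silently skips lines that are not JSON objects or that carry no trace context.

```python
import json


def error_trace_ids(lines) -> list:
    """Return distinct trace_id values from ERROR-level JSON log lines, in first-seen order."""
    ids = []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(entry, dict):
            continue
        trace_id = entry.get("trace_id")
        if entry.get("level") == "ERROR" and trace_id and trace_id not in ids:
            ids.append(trace_id)
    return ids
```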

Structured logs

All logs are written to stdout as JSON. Key fields: msg, level, error, backend, operation.

Audit logs are a subset of structured logs with "audit": true. Every S3 API request and significant internal operation emits an audit entry with a request_id for correlation. Filter audit entries in your log pipeline with a JSON query on the audit field.

Key audit events:

| Event | Source | Description |
| --- | --- | --- |
| s3.PutObject, s3.GetObject, s3.DeleteObjects, etc. | HTTP layer | S3 API request with method, path, bucket, status, duration |
| storage.PutObject, storage.GetObject, storage.DeleteObjects, etc. | Storage layer | Backend operation with key, backend name, size |
| rebalance.start, rebalance.move, rebalance.complete | Rebalancer | Object redistribution runs |
| replication.start, replication.copy, replication.complete | Replicator | Replica creation runs |
| storage.MultipartCleanup | Multipart cleanup | Stale upload cleanup |
| cleanup_queue.processed | Cleanup queue | Orphaned object successfully deleted on retry |
| cleanup_queue.already_absent | Cleanup queue | Backend DELETE returned 404: the object is already gone. Row dropped as idempotent success instead of retrying nine more times |
| cleanup_queue.claim_recovered | Cleanup queue | A row whose claim aged past the grace period was reclaimed by a different worker tick (typical after a process crash or rolling-deploy overlap) |
| cleanup_queue.exhausted_to_dlq | Cleanup queue | Row graduated to cleanup_dlq after exhausting retries |
| over_replication.start, over_replication.complete, over_replication.remove | Over-replication cleaner | Surplus replica removal cycle; .remove carries the per-copy decision |
| pending_reaper.promoted | Pending reaper | The reaper HEADed the backend, found the object, and promoted the pending intent into a committed object_locations row |
| pending_reaper.dropped | Pending reaper | The reaper HEADed the backend and got 404. No orphan exists; the intent row is dropped |
| pending_reaper.superseded | Pending reaper | A later write for the same key completed and superseded the pending intent before the reaper resolved it |
| storage.OrphanEnqueueFailed | Coordinator | The cleanup-queue enqueue path itself failed after a successful backend write (DB outage). Carries backend / key / size / stage so an operator can reconcile manually once DB connectivity returns |
| storage.UploadPart | Multipart | A multipart part upload completed successfully |
| http.PanicRecovered | HTTP panic-recovery middleware | A handler panicked and the recovery layer returned a 500 to the client. Carries route, method, path, and the panic value. The matching error-level slog entry carries the captured stack trace |

Each S3 API request produces two correlated audit entries (HTTP-level and storage-level) sharing the same request_id. Internal operations (rebalance, replication) generate their own correlation IDs. The request_id also appears as a s3o.request_id attribute on OpenTelemetry spans.
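The pairing of HTTP-level and storage-level entries can be reproduced from raw log output by keeping only lines with "audit": true and grouping them on request_id. This is a sketch under assumptions: audit_by_request is a hypothetical helper, and the name of the field holding the event name (event here) is a guess that should be checked against real log lines.

```python
import json
from collections import defaultdict


def audit_by_request(lines) -> dict:
    """Group audit event names by request_id, ignoring non-audit and non-JSON lines."""
    grouped = defaultdict(list)
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log line
        if isinstance(entry, dict) and entry.get("audit") is True and "request_id" in entry:
            # "event" is an assumed field name for the audit event type.
            grouped[entry["request_id"]].append(entry.get("event", ""))
    return dict(grouped)
```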

HTTP panic recovery

A panic inside an HTTP handler is caught by the panic-recovery middleware applied to the S3 and admin route groups. The recovery contract:

  • The client gets a structured response, not a connection reset. S3 routes receive an XML <Error><Code>InternalError</Code><Message>...Request ID: ...</Message></Error> body with HTTP 500; admin routes receive the same shape as a JSON {"error": "...Request ID: ..."}.
  • The Prometheus counter s3o_http_panic_recovered_total{route} increments. A non-zero value is an immediate alert candidate.
  • A slog.ErrorContext line is written at level ERROR with component=httputil, the route, the method and path, the panic value, the captured Go stack, and a trace_id / span_id if a span was active when the panic occurred.
  • An http.PanicRecovered audit entry is emitted with the same correlation request_id as the failing request.
  • If an OpenTelemetry span was active on the request, it is marked as failed via SetStatus(Error) + RecordError so traces in Tempo highlight the failure.

UI routes are intentionally not wrapped in the first iteration (mounted on the same mux as the S3 catch-all; the bulk of panic risk is on S3 and admin anyway). The recovery message deliberately does not echo the panic value to the client; only the request ID is returned so support tickets can be correlated back to the orchestrator log line.
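On the client side, the request ID can be pulled out of the S3-style error body for inclusion in a support ticket. The sketch below is illustrative only: parse_s3_error is a hypothetical helper, it assumes the XML carries no namespace, and it assumes the ID is the first whitespace-delimited token after "Request ID:", since the exact message wording is not specified here.

```python
import re
import xml.etree.ElementTree as ET

# Assumed layout: the ID is the token that follows "Request ID:" in <Message>.
REQUEST_ID = re.compile(r"Request ID:\s*(\S+)")


def parse_s3_error(body: str):
    """Return (code, request_id) from an <Error> body; request_id is None if absent."""
    root = ET.fromstring(body)
    code = root.findtext("Code")
    message = root.findtext("Message") or ""
    match = REQUEST_ID.search(message)
    return code, (match.group(1) if match else None)
```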

Clients can supply their own correlation ID via the X-Request-Id request header; otherwise the orchestrator generates one. The ID is returned in the X-Amz-Request-Id response header.
