s3-orchestrator

Background Services

The orchestrator runs a set of long-running background workers that keep the metadata layer consistent with the backends, handle replication, cleanup, and lifecycle, and refresh observability state. Every task that takes an advisory lock applies a random startup jitter of up to half its tick interval before the first tick, which prevents a thundering herd on the advisory lock when multiple instances start simultaneously.

Worker reference

| Task | Interval | Advisory Lock | Description |
|------|----------|---------------|-------------|
| Usage flush + metrics | configurable (default 30s) | When Redis configured | Flushes usage counters to PostgreSQL, then refreshes quota stats, usage baselines, object counts, and multipart counts. Updates Prometheus gauges. Adaptive mode shortens interval near limits. Advisory lock is acquired whenever Redis is configured (regardless of health) to prevent double-counting during recovery. |
| Stale multipart cleanup | 1h | Yes | Aborts multipart uploads older than 24h and deletes their temporary part objects. |
| Cleanup queue | 1m | Yes | Retries failed backend object deletions with exponential backoff (1m to 24h, max 10 attempts). On the tenth consecutive failure the row graduates to cleanup_dlq for operator action; orphan_bytes stays incremented because the bytes are still on disk. |
| Rebalancer | configurable (default 6h) | Yes | Moves objects between backends per strategy. Only runs when enabled. |
| Replicator | configurable (default 5m) | Yes | Creates copies of under-replicated objects. Only runs when factor > 1. Runs once at startup. |
| Over-replication cleaner | configurable (default 5m) | Yes | Removes excess copies of objects that exceed the replication factor. Only runs when factor > 1. |
| Lifecycle | 1h | Yes | Deletes objects matching lifecycle rules whose created_at exceeds expiration_days. Only runs when rules are configured. |
| Reconciler | configurable (default 24h) | Yes | Scans each backend for untracked objects and imports them into the metadata database via SyncBackend. Only runs when reconcile.enabled: true. |
| Pending reaper | configurable (default 1m) | Yes | Resolves PUT-before-COMMIT intents that survived a failed metadata commit. HEADs the destination backend and either promotes the row into object_locations (object present) or drops the intent (object absent). Skips intents younger than min_age so in-flight PUTs are not interrupted. |
| Scrubber | configurable (default 6h) | Yes | Random-samples objects, fetches and re-hashes them, and enqueues a cleanup if the stored content_hash does not match. Only runs when integrity.enabled: true and scrubber_interval > 0. |
| Notification drainer | 5s | No | Drains notification_outbox rows by POSTing CloudEvents JSON to configured webhook endpoints. Optional HMAC signing per endpoint. |
| CB watchdog | 1m | No | Checks all circuit breakers for stale half-open probes. If a probe has been in flight longer than 2 minutes, resets the circuit to open so a new probe can be dispatched. Prevents circuits from getting stuck half-open when traffic stops. |

Concurrency

Background services (rebalancer, replicator, over-replication cleaner, cleanup queue) share the admission semaphore with HTTP requests, so max_concurrent_requests is the total budget for both HTTP and background backend operations.

Multi-instance behavior

The advisory locks are PostgreSQL session-scoped: if one instance holds the lock for a task, other instances skip that tick silently. Each locked task therefore runs on at most one instance at any moment. The Notification drainer and CB watchdog are not locked because they are idempotent and per-instance state is acceptable for them.

See docs/deployment.md for the multi-instance deployment model.