
Background Services
The orchestrator runs a set of long-running background workers that keep the metadata layer consistent with the backends, handle replication, cleanup, and lifecycle, and refresh observability state. All locked tasks apply a random startup jitter of up to half the tick interval before the first tick, which prevents a thundering herd on the advisory lock when multiple instances start simultaneously.
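As a rough illustration of the startup jitter described above, the delay could be drawn uniformly from zero to half the tick interval. This is a minimal sketch: the function name `startup_jitter` and the uniform distribution are assumptions, not taken from the orchestrator's code.

```python
import random
from typing import Optional


def startup_jitter(tick_interval_s: float, rng: Optional[random.Random] = None) -> float:
    """Return a random delay in seconds, between 0 and half the tick interval.

    Hypothetical helper: the uniform distribution is an assumption; the
    source only states the upper bound (half the tick interval).
    """
    rng = rng or random.Random()
    return rng.uniform(0.0, tick_interval_s / 2)
```

A worker with a 1h interval would sleep for `startup_jitter(3600)` seconds (at most 30 minutes) before its first tick, so instances started together reach the advisory lock at different times.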
Worker reference
| Task | Interval | Advisory Lock | Description |
|---|---|---|---|
| Usage flush + metrics | configurable (default 30s) | When Redis configured | Flushes usage counters to PostgreSQL, then refreshes quota stats, usage baselines, object counts, and multipart counts. Updates Prometheus gauges. Adaptive mode shortens interval near limits. Advisory lock is acquired whenever Redis is configured (regardless of health) to prevent double-counting during recovery. |
| Stale multipart cleanup | 1h | Yes | Aborts multipart uploads older than 24h and deletes their temporary part objects. |
| Cleanup queue | 1m | Yes | Retries failed backend object deletions with exponential backoff (1m to 24h, max 10 attempts). On the tenth consecutive failure the row graduates to cleanup_dlq for operator action; orphan_bytes stays incremented because the bytes are still on disk. |
| Rebalancer | configurable (default 6h) | Yes | Moves objects between backends per strategy. Only runs when enabled. |
| Replicator | configurable (default 5m) | Yes | Creates copies of under-replicated objects. Only runs when factor > 1. Runs once at startup. |
| Over-replication cleaner | configurable (default 5m) | Yes | Removes excess copies of objects that exceed the replication factor. Only runs when factor > 1. |
| Lifecycle | 1h | Yes | Deletes objects matching lifecycle rules whose created_at exceeds expiration_days. Only runs when rules are configured. |
| Reconciler | configurable (default 24h) | Yes | Scans each backend for untracked objects and imports them into the metadata database via SyncBackend. Only runs when reconcile.enabled: true. |
| Pending reaper | configurable (default 1m) | Yes | Resolves PUT-before-COMMIT intents that survived a failed metadata commit. HEADs the destination backend and either promotes the row into object_locations (object present) or drops the intent (object absent). Skips intents younger than min_age so in-flight PUTs are not interrupted. |
| Scrubber | configurable (default 6h) | Yes | Random-samples objects, fetches and re-hashes them, and enqueues a cleanup if the stored content_hash does not match. Only runs when integrity.enabled: true and scrubber_interval > 0. |
| Notification drainer | 5s | No | Drains notification_outbox rows by POSTing CloudEvents JSON to configured webhook endpoints. Optional HMAC signing per endpoint. |
| CB watchdog | 1m | No | Checks all circuit breakers for stale half-open probes. If a probe has been in flight longer than 2 minutes, resets the circuit to open so a new probe can be dispatched. Prevents circuits from getting stuck half-open when traffic stops. |
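The cleanup queue's retry schedule from the table above can be sketched as follows. The 1m floor, 24h ceiling, and 10-attempt limit come from the table; the growth factor is not documented, so the doubling used here is an assumption, and `next_retry_delay` is a hypothetical name.

```python
from typing import Optional

BASE_DELAY_S = 60            # 1m, from the table
MAX_DELAY_S = 24 * 60 * 60   # 24h, from the table
MAX_ATTEMPTS = 10            # from the table
MULTIPLIER = 2.0             # assumption: the actual growth factor is not documented


def next_retry_delay(failed_attempts: int, multiplier: float = MULTIPLIER) -> Optional[float]:
    """Return the delay before the next retry, in seconds.

    Returns None once the attempt limit is reached, meaning the row
    should move to cleanup_dlq instead of being retried.
    """
    if failed_attempts < 1:
        raise ValueError("failed_attempts must be >= 1")
    if failed_attempts >= MAX_ATTEMPTS:
        return None
    delay = BASE_DELAY_S * multiplier ** (failed_attempts - 1)
    return min(delay, MAX_DELAY_S)
```

With the assumed factor of 2, the first failure waits 60s and the second 120s; a larger factor hits the 24h ceiling before the tenth attempt.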
Concurrency
Background services (rebalancer, replicator, over-replication cleaner, cleanup queue) share the admission semaphore with HTTP requests, so max_concurrent_requests is the total budget for both HTTP and background backend operations.
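The shared budget can be pictured as a single semaphore sized by max_concurrent_requests that both HTTP handlers and background workers draw from. This is a minimal sketch under assumptions: the class `AdmissionBudget` is hypothetical, and whether the real semaphore blocks or rejects when full is not stated in this document. Non-blocking acquisition is used here only to make the exhausted budget visible.

```python
import threading


class AdmissionBudget:
    """One semaphore shared by HTTP requests and background operations.

    Hypothetical class for illustration; not the orchestrator's API.
    """

    def __init__(self, max_concurrent_requests: int) -> None:
        self._sem = threading.BoundedSemaphore(max_concurrent_requests)

    def try_admit(self) -> bool:
        # Returns False immediately when the whole budget is in use,
        # no matter whether HTTP or background work consumed it.
        return self._sem.acquire(blocking=False)

    def release(self) -> None:
        self._sem.release()
```

With a budget of 2, one in-flight HTTP request plus one rebalancer copy exhausts it, and a third operation of either kind has to wait until a slot is released.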
Multi-instance behavior
The advisory locks are PostgreSQL session-scoped: if one instance holds the lock for a task, other instances skip that tick silently. Each locked task therefore runs on at most one instance at any moment. The Notification drainer and CB watchdog are not locked because they’re idempotent and per-instance state is acceptable for them.
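The skip-on-contention behavior can be sketched with PostgreSQL's `pg_try_advisory_lock`, which returns immediately instead of waiting. The helper name `run_tick_if_lock_free`, the lock key, and the choice to unlock after every tick are assumptions; `conn` is assumed to be a DB-API connection using `%s` placeholders, as psycopg does.

```python
from typing import Callable


def run_tick_if_lock_free(conn, lock_key: int, task: Callable[[], None]) -> bool:
    """Run task only if this session wins the advisory lock.

    Returns True if the task ran, False if the tick was skipped.
    Hypothetical helper; releasing the lock after each tick is an assumption.
    """
    cur = conn.cursor()
    cur.execute("SELECT pg_try_advisory_lock(%s)", (lock_key,))
    (acquired,) = cur.fetchone()
    if not acquired:
        return False  # another instance holds the lock: skip this tick silently
    try:
        task()
    finally:
        # Session-scoped locks persist until unlocked or the session ends.
        cur.execute("SELECT pg_advisory_unlock(%s)", (lock_key,))
        cur.fetchone()
    return True
```

Because the lock is session-scoped, it is also released automatically if the holding instance's database connection drops, so a crashed instance cannot keep a task blocked.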
See docs/deployment.md for the multi-instance deployment model.