
Style Guide
Author: Alex Freidah
Table of Contents
- Core Principles
- Comment Types and Spacing
- File Headers
- Go Conventions
- Project Structure and Layers
- Dependency Injection
- Adding a New Component
- Error Handling
- Logging and Audit
- Tracing
- Metrics
- Testing
- Code Style
- Versioning
- Documentation Updates
- Branch Naming
Core Principles
- ASCII-only characters - Never use Unicode em-dashes, en-dashes, or box-drawing characters
- Dashes, not equals - Always use
-for dividers, never= - Box comment spacing - ALL box comments (79-char file headers and 73-char sections) ALWAYS have a blank line after
- Professional tone - No personal references, no numbered lists, no casual language
- Self-documenting - Code explains why, not just what
- Streaming over buffering - Use
io.Pipeand streaming patterns for object data; never buffer entire objects in memory - Buffer pooling - Use
bufpool.Copyinstead ofio.Copyfor all streaming I/O to reuse buffers and reduce GC pressure - Context propagation - Pass
context.Contextthrough all function chains for cancellation, tracing, and audit correlation
Comment Types and Spacing
File Header (79 characters)
Format:
Spacing Rules:
- Blank line after title
- Blank line after metadata
- Blank line before closing divider
- Blank line after closing divider - always separate box from code
Major Section Box (73 characters)
Format:
Spacing Rules:
- Use ALL CAPS for section name
- Blank line AFTER closing divider - separates section from code
- Used for major logical divisions (e.g., PUBLIC API, INTERNALS, TYPES)
Single-Line Comments
Standard Go comments placed directly above the code they describe:
- NO blank line before code - placed directly above the block
- Use lowercase or sentence case
- Used for minor divisions or labels within functions
Inline Comments
- Use sparingly
- Explain why, not what
- Keep concise (< 50 characters)
Comment Type Decision Tree
Key Rule: ALL box comments (79-char and 73-char) have a blank line after. Single-line comments have no extra spacing.
File Headers
Every .go file starts with a 79-char header block:
Rules:
- Use
//comments (not/* */blocks) - Title line describes the file’s scope, not the package
- Description covers purpose, key behaviors, and dependencies
- The
packagedeclaration follows immediately after the closing divider + blank line
Go Conventions
Indentation
- 1 tab - Go standard (
gofmtenforced)
Imports
Group imports in three blocks separated by blank lines:
Order: stdlib, internal packages, external packages.
Naming
- Exported types get standard Go doc comments placed directly above the declaration
- Constants grouped by concern with
constblocks, named inCamelCase - Sentinel errors use
Errprefix:ErrObjectNotFound,ErrDBUnavailable - S3Error type wraps HTTP status + S3 error code for typed error responses
- Operation names are consistent PascalCase strings matching S3 API names:
"PutObject","GetObject","CreateMultipartUpload"
Struct Organization
Group related fields with inline comments explaining non-obvious fields:
Interface Design: Consumer-Declared Interfaces
This codebase follows the Go-idiomatic “accept interfaces, return structs” pattern: producer packages export concrete *Type values with no producer-side interface, and each consumer declares its own narrow interface listing only the methods it actually calls. The concrete type satisfies every consumer’s local interface because Go interfaces are structurally typed.
Applied across internal/store (the per-role store interfaces consumed at use sites), internal/worker (Ops, CleanupOps, ScrubberOps in ops_runtime.go), and the internal/proxy/* subpackages (MultipartCore/MultipartCoordinator, ObjectCore/ObjectCoordinator, WritepathCore).
Rationale:
- A consumer’s dependency footprint is documented in its own source file.
- Adding a method to a producer type never bloats existing consumer mocks.
- Tests can mock at the granularity of what is used (typically 4–10 methods), not the full producer surface.
- Aligns with Rob Pike’s “accept interfaces, return structs” guideline.
Trade-offs:
- Each consumer declares its own small interface (extra text, but localized).
- The composition layer (the root
proxypackage,internal/di) still holds concrete types — it owns construction and is the seam where interfaces meet implementations.
Where the interfaces live:
| Location | Holds |
|---|---|
internal/proxy/<consumer>/consumer_interfaces.go | All narrow interfaces this consumer declares against other proxy subpackages and external clients |
internal/worker/ops_runtime.go | Worker-side Ops / CleanupOps / ScrubberOps interfaces against the proxy infrastructure |
internal/store/core/interfaces.go | Per-role narrow store interfaces (consumers compose them when they need to declare a minimal store dependency) |
Naming convention: <Consumer><Provider> — e.g. the multipart subpackage’s view of *infra.Core is MultipartCore; the object subpackage’s view of *writepath.Coordinator is ObjectCoordinator. The prefix names the consumer, the suffix names the producer concept.
Constructor shape: consumers take the interfaces, not concrete pointers. Composition-layer code (the root proxy package, DI providers) passes the concrete *infra.Core, *writepath.Coordinator, *object.Manager, etc., and the concrete types satisfy the interfaces implicitly.
Mocking: generated mocks are not produced eagerly. When a test actually needs to mock a consumer-declared interface, add a //go:generate mockgen -source=consumer_interfaces.go -destination=mock/<file>.go -package=<pkg>mock directive at the top of the consumer interface file and run make generate. Until a mock is needed, the interface declaration alone documents the dependency surface — generating unused mocks is busywork.
When NOT to declare a narrow interface (#918). A consumer-side interface earns its keep when at least one of these is true:
- Multiple implementations actually exist (a real polymorphism point, e.g.
keySourceover S3 iter + DB iter). - A test fake genuinely benefits from the seam — a hand-rolled fake or
gomock-generated mock that lets tests exercise the consumer without standing up the real producer (e.g.worker.Opsmocked across ~60 test sites;admin.ReplicatorOps/OverReplicationOps/ScrubberOps/Reconcilerwhose fakes drive admin handler branches). - An import cycle would otherwise form (e.g.
readpath.LocationCache— without the interface,readpathwould have to importobjectwhich already importsreadpath). - The interface models a real domain boundary between subsystems (
drain.Core,MultipartCore,ObjectCore,WritepathCore).
If none apply — single impl, single consumer, no test fake, no cycle, no boundary — pass the concrete *Type directly. Examples cut under #918: readpath.ObjectLocationLister, multipart.MultipartCoordinator, multipart.StaleCleaner, accounting.UsageTracker, the worker.RebalancerStore/ReplicatorStore/OverReplicationStore/CleanupWorkerStore/ScrubberStore/PendingReaperStore store-role wrappers (workers now take core.MetadataStore directly).
Producer-side interfaces are an anti-pattern. *infra.Core, *writepath.Coordinator, *object.Manager, *multipart.Manager are exported as concrete pointer types with no sibling Core/Coordinator/Manager interface that mirrors their public surface. Producer-side interfaces force every consumer to mock the full producer API, which is exactly what this pattern is built to avoid.
Logger is not a behavior dependency. Never include Log() *slog.Logger in a consumer-declared interface. The logger is observability infrastructure: it has no return value the consumer depends on, and the per-component scope is a property of the consumer itself, not of the producer. Components build their own log *slog.Logger field in the constructor body via slog.Default().With(logfmt.Component("<slug>")) (see Logging and Audit), so each subsystem owns its component name and tests do not need to thread a logger through dependency interfaces.
Single-method interface names follow the Go -er convention. A single-method interface is named after its method, with the verb in agent-noun form: Read → Reader, Close → Closer, Stringer for String(). Names ending in -Ops, -Store, -ing, or other shapes that do not describe an actor get flagged by golangci-lint / sonar (rule S8196) and should be renamed:
| Method | Good | Bad |
|---|---|---|
Read(p []byte) (int, error) | Reader | IOOps |
Close() error | Closer | Closeable |
GetAllObjectLocations(...) | ObjectLocationLister | LocationStore |
CleanupStaleMultipartUploads(...) | staleMultipartCleaner | multipartCleanupOps |
UpdateUsageLimits(...) | usageLimitsApplier | usageLimitsHook |
UpdateQuotaMetrics(...) | quotaMetricsRefresher | metricsHook |
For interfaces that exist to provide a value (typical “Acct” / “Stores” / “Config” getters), name the interface after the returned type plus Provider or Source — RecorderProvider for Acct() *Recorder, ConfigSource for Config() *Config. The Provider / Source suffix is also an agent noun and satisfies the rule.
Multi-method interfaces are exempt: core.MetadataStore, worker.Ops, ObjectCore, MultipartCoordinator describe a role (or a composite of sub-roles), not a single action, so the -er form does not apply.
Per-Backend Accounting: Use the Recorder
Per-backend usage and per-operation metric accounting flow through one shared helper: internal/proxy/accounting.Recorder. Every consumer holds the same *accounting.Recorder (resolved via core.Acct()) and calls the named methods rather than the underlying Usage().Record(...) or RecordOperation(...):
| Intent | Use | Avoid |
|---|---|---|
| Backend call attempted, no bytes (success or failure) | acct.APICall(backend) | Usage().Record(b, 1, 0, 0) |
| N backend calls (paginated lists, multipart bulks) | acct.APICalls(backend, n) | Usage().Record(b, n, 0, 0) |
| Successful GET-like (bytes left the backend) | acct.Egress(backend, size) | Usage().Record(b, 1, size, 0) |
| Successful PUT-like (bytes arrived at the backend) | acct.Ingress(backend, size) | Usage().Record(b, 1, 0, size) |
| Per-operation Prometheus histogram | acct.Operation(op, backend, start, err) | RecordOperation(op, b, start, err) |
| Common combo: Operation + Ingress | acct.PutSuccess(op, backend, size, start) | both above, two lines |
| Common combo: Operation + Egress | acct.GetSuccess(op, backend, size, start) | both above, two lines |
| The call reached the backend and failed | acct.OperationFailed(op, backend, start, err) | Operation + APICall, two lines |
The cardinal rule “every backend call costs one API-call charge regardless of outcome” lives in the method bodies, not in repeated inline comments at every call site. The argument-order risk of the bare Record(b, 1, 0, size) vs Record(b, 1, size, 0) form is eliminated because ingress vs egress is named in the method.
Exceptions — not every Usage call goes through Recorder:
BackendManager.RecordUsage(b, apiCalls, egress, ingress)is the admin escape hatch that takes an arbitrary tuple. It bypasses the per-attempt rule on purpose and usesUsage().Recorddirectly.- Tests that exercise the tracker’s own semantics (
manager_usage_test.go) callUsage().Recorddirectly — they’re verifying the tracker, not the accounting rule.
Variable Naming
Avoid shadowing package imports with local variable names. When a function receives or creates a backend.ObjectBackend, name the variable be, not backend:
Typed Constants
Use typed string constants for values compared in multiple places. Bare string comparisons (e.g., routingStrategy == "spread") are error-prone:
No Empty Cleanup Funcs
Never return func() {} to satisfy a (T, cleanup func(), error) signature when the “no-cleanup” branch has nothing to do. SonarQube flags empty function literals (rule go:S1186) and the empty literal hides whether the branch meant to be empty or someone forgot to wire the cleanup.
Why: Past SonarQube findings on PRs that returned func() {} as the no-op cleanup arm of an in-memory-vs-tempfile sink. The repo has hit the same lint multiple times across different abstractions.
How to apply: Attach the cleanup as a method on the returned type so the caller writes defer x.Cleanup() once and the method internally branches on which underlying resource (if any) actually needs releasing. The lint never sees an empty literal, and the caller code stays identical across branches.
When the cleanup truly must be a callback (e.g., interface contract), wrap it in sync.OnceFunc and have it do something meaningful (clear a pointer, decrement a counter) so the body is never literally empty.
Concurrency Patterns
syncutil.AtomicConfig[T]for hot-reloadable config (wrapsatomic.Pointer[T]withStore/Load)syncutil.TTLCache[K,V]for generic TTL-based caches with background eviction- Atomic counters for usage tracking, flushed periodically to the database
- Context-scoped timeouts via helper methods (
m.WithTimeout(ctx)) for backend calls - Graceful shutdown via
context.WithCancel+ signal handling
Project Structure and Layers
The codebase splits along strict layers. New code goes in the layer whose responsibility matches; an outer layer must not import a more outer layer, and inner layers must never import the transport packages.
Layer Responsibilities
| Layer | Imports | Responsibility |
|---|---|---|
cmd/, cli/ | Everything | Wire flags, build the DI injector, invoke top-level services |
transport/ | proxy, auth, observe | Decode HTTP, authenticate, dispatch to manager, encode S3-XML response |
proxy/ | worker, store, backend, observe, counter | Routing strategy, broadcast reads, location cache, worker hosts |
worker/ | store, backend, observe | Background services driven by the lifecycle manager |
store/ | nothing app-specific | Metadata persistence; engine-agnostic orchestration in core/, engines under postgres/ and sqlite/ |
backend/ | nothing app-specific | Per-provider S3 client wrappers behind one ObjectBackend interface |
observe/ | nothing app-specific | Logging, tracing, metrics, audit |
di/ | All public surfaces | Single wiring point that registers every provider |
Inner layers must not import outer layers (no store importing transport).
The compiler does not enforce this, so reviewers must - it is the rule that
keeps the code testable and the dependency graph acyclic.
Layered Read of a Single PUT
The arrow direction is strict: store/core never calls back into proxy.
Dependency Injection
DI uses samber/do/v2 and is centralised
in internal/di/di.go. Every external dependency, store role, worker, and
top-level handler has a Provide<Name> function registered there. Callers
resolve dependencies via do.Invoke[Foo](inj) at the moment they are
needed; nothing is constructed until it is asked for.
Two Foundational Rules
- Lazy providers: a provider returns the dependency it builds; it
never has side effects beyond construction. The injector calls it on
the first
do.Invokeand memoises the result for subsequent calls. - Wide composite store, consumer-defined narrow interfaces at the
boundary: consumers receive the wide
core.MetadataStore(a composite that embeds every per-role interface) and either typehint- narrow it at the field declaration or declare a local interface listing only the methods they call. Producer-side narrow roles (ObjectStore,QuotaStore, etc.) exist as embedding building blocks forMetadataStore; consumer code does not import them directly.
Provider Pattern
Every provider follows the same shape:
Providers take a do.Injector, resolve their own dependencies through it,
and return either the value or an error. They never panic, never log, and
never spawn goroutines. CB protection lives inside each driver’s DBTX/DB
chokepoint, so providers do not wrap stores with per-role decorators.
Wiring Point
The internal/di package is the single wiring point. It is the only
package that imports the concrete engine packages (store/postgres,
store/sqlite, each backend driver, etc.). Providers are split across
focused files by concern:
injector.go-NewInjectorand the top-level registration orderstore.go- database, role aliases, instance ID, metrics depsbackend.go- S3 backends, breaker registry, backend manager, and the optional providers feeding it (encryption, Redis counters, object cache)workers.go- background worker providerslifecycle.go- lifecycle manager and per-mode service registrationtransport.go- S3, admin, UI, notifier, rate limiter, login throttleoptional.go-invokeOptional[T]andresolveOptional*helpersservices.go- lifecycle Runner wrappers (locked-ticker background jobs)
New providers go in the file matching their concern, with 73-char
section dividers when a single file hosts multiple groups.
Consuming a Service
The transport layer and CLI commands resolve services from the injector:
The handler holds do.Injector, not the resolved service - that keeps
the handler decoupled from concrete provider behaviour and lets tests
substitute fakes by registering a different provider.
Adding a New Provider
- Add a
Provide<Name>function in the appropriate section ofinternal/di/di.go. - Inside the function, call
do.Invoke[Dep](i)for each dependency. - Construct the new component and return
(value, nil). - If the new component is consumed somewhere, the consumer should call
do.Invoke[Name](inj)rather than holding the value as a field.
Optional Providers
Some providers register only when a feature is enabled (encryption,
Redis counters, UI, notifications) or only in specific run modes
(reconciler is worker/all-only). Consumers that may run in either mode
use the invokeOptional[T] helper, which returns the zero value of T
when the service is not registered:
Required providers should still bail with the do.Invoke error so a
genuinely missing dependency surfaces at boot, not at first use.
Anti-Patterns
- Constructing a real dependency in a constructor that already has a
provider. Always go through
do.Invoke. - Passing a
*slog.Loggerthrough the constructor argument list. Long-lived components hold a privatelog *slog.Loggerfield set inside the constructor body viaslog.Default().With(logfmt.Component("name")). Metrics use theobserve/telemetrypackage-level vars directly. Free helper functions callslog.XContext(ctx, ...)directly. Seedocs/contributing/logging.md. - Storing the injector as a field on a non-handler struct. Workers, managers, and stores receive their dependencies as constructor args resolved in the provider; only handlers (HTTP/CLI entry points) carry the injector itself.
Constructor Patterns
Constructors at the DI/wiring boundary panic on missing required
dependencies via internal/util/must. The boundary is the set of
constructors a DI provider or a test fixture builds directly:
proxy.NewBackendManager, every worker.New*, every
transport/{s3api,admin,ui}.New*, and the proxy/{drain,readpath, multipart,object,writepath} package constructors that take a *Deps
or interface bag. Internal helpers that are only called from one
already-validated site (accounting.New, metrics.New,
dashboard.New, infra.New) trust their caller and skip the panic.
Why panic rather than return (*T, error): a wiring bug is a programmer
error, not an operator-recoverable condition. The panic surfaces at boot
with a clear “required dependency X is nil” message; the alternative
defers to the NPE that fires several call frames deep on the first
request. Operator-facing config invariants (negative timeouts, missing
required strings) belong in internal/config.SetDefaultsAndValidate
which returns errors the loader can format and report.
Fallible-constructor exemption. Constructors that legitimately
fail at boot for I/O, parsing, or network reasons keep the
(*T, error) signature and do not use must.NotNil. Examples:
httputil.NewCertReloader (reads cert files),
httpserver.New (binds a port). The DI provider chains the error
upstream.
Tests must satisfy the boundary. When a test constructs one of
these types directly, it provides real or gomock-generated deps for
every required field. Don’t loosen production validation to fit lazy
test wiring; widen the test fixture instead. The shared fixtures in
internal/proxy/proxytest, internal/testutil, and the per-package
newTestHandler* helpers exist for this purpose.
Adding a New Component
Each component type has a fixed checklist; following it keeps the code shape uniform across contributors.
Adding a New Backend
- Implement the
backend.ObjectBackendinterface ininternal/backend/<name>.go. Constructor isNew<Name>(cfg config.BackendConfig). - Add the new backend type to the config validator in
internal/config/backends.go. - Wire it into the backend factory in
internal/di/di.goso existingProvideBackendsreturns the new type when configured. - Add a unit test in
internal/backend/<name>_test.go. Integration coverage comes from the existing MinIO testcontainer suite for any S3-compatible provider. - Update
README.mdanddocs/admin-guide.mdconfig sections.
Adding a New Store Role
The narrow-role layering means new operations slot into one of the
existing role interfaces in internal/store/core/interfaces.go, or - if
they truly belong to a new bounded concern - into a new role.
- Add the SQL in
internal/store/postgres/sqlc/queries/<role>.sqland the matching method ininternal/store/sqlite/<role>.go. - Run
make generateto regenerate the sqlc-typed wrappers. - Surface the operation on the role interface in
internal/store/core/interfaces.go. If business logic spans multiple tables or transactions, add the orchestration tointernal/store/core/<topic>.goagainstTxAdapterso both engines share one implementation. - Add the CB-decorator method in
internal/store/cb_<role>.go. - Update
MockStoreininternal/testutil/mock_store.goand any handwritten test mocks (internal/proxy/mock_store_test.goetc.). - Add a
Provide<Role>provider ininternal/di/di.goif it is a new role; otherwise the existing provider already covers it. - Schema changes go in a new
internal/store/postgres/migrations/000NN_*.sqlANDinternal/store/sqlite/schema.sql. Bumppostgres.ExpectedSchemaVersion.
Adding a New Worker
- Implement the worker in
internal/worker/<name>.go. Define a narrow ops interface (e.g.<Name>Ops) and a narrow store interface (e.g.<Name>WorkerStore); the worker takes both as constructor args. - The worker exposes
Run(ctx) error(long-lived) orProcess<X>(ctx)(called per tick) - lifetime determines which. - Register a
Provide<Name>provider ininternal/di/di.go. - Register the worker with the lifecycle manager in
registerWorkerServicesinsidedi.go. Long-lived workers go throughlifecycle.Manager; periodic workers wrap a ticker and an advisory lock. - Add metrics in a per-worker file under
internal/observe/telemetry/metrics_<worker>.goif needed.
Adding a New HTTP Handler
- Define the handler in the appropriate
internal/transport/sub- package. Handler signatures arefunc (h *Handler) handle<X>(w http.ResponseWriter, r *http.Request). - The handler resolves its dependencies through the injector held on
*Handler; it never constructs services itself. - Authentication middleware is applied at the
Serverlevel - do not reimplement auth inside individual handlers. - Audit-log every state-changing request via
audit.Log(ctx, "<event>", ...). - Update
docs/api-reference.mdfor any new admin or UI endpoint.
Adding a New Metric
- Add the metric variable in the appropriate
internal/observe/telemetry/metrics_<domain>.gofile usingpromauto.NewXxx. Group by domain (cleanup, replication, breaker, etc.) - do not create a global metrics file. - Use the
s3o_<domain>_<noun>_<unit>naming convention. Counters end in_total. Histograms end in_secondsor_bytes. Gauges take no suffix. - Add the metric to the dashboard JSON if it is operator-facing
(
grafana/s3-orchestrator.json). - Document the metric in
README.mdanddocs/admin-guide.md.
Error Handling
S3 Errors
Use the S3Error type for errors that map to S3 HTTP responses:
Handlers use writeStorageError to convert S3Error instances into XML responses. Untyped errors fall back to 502 InternalError.
Error Classification
The classifyWriteError helper distinguishes database unavailability (circuit breaker open) from other failures, returning appropriate S3 error codes:
ErrDBUnavailable->503 ServiceUnavailable- Other errors ->
502 InternalError
Background Operation Errors
Background workers (rebalancer, replicator, cleanup) log errors and continue rather than crashing. Individual item failures are logged with slog.Warn and skipped; the batch proceeds with remaining items.
Logging and Audit
All logs are structured log/slog JSON with the JSON handler set in
internal/cli/serve/serve.go initLogging. The full operational
logging conventions (helper package, attribute glossary, banned keys,
component scoping, outcome rollups) live in
docs/contributing/logging.md. This
section is the short summary; the contributing doc is the source of
truth.
Structured operational logs
Every long-lived component holds a *slog.Logger field initialised
once in its constructor with the canonical component attribute:
Three rules every call site must follow:
- Always render errors through
logfmt.Err. Passing"error", errdirectly serialises the error struct as{}in the JSON handler, which JS log viewers render as[object Object], hiding the actual failure mode from operators. - Component is an attribute, not a message prefix. Messages are
plain English (“HEAD probe failed”), not “Pending reaper: HEAD
probe failed”. The
componentattr added by the scoped logger makes every line filterable in Loki/Grafana without text matching. - Use the canonical attribute glossary in
docs/contributing/logging.md. The CI lint rejects banned keys (err,from_backend,remote_addr, etc.).
golangci-lint enforces context: all on every slog call, snake-case
keys, and the banned-key list. See .golangci.yml for the active
sloglint configuration.
Audit Logging
The internal/observe/audit package emits a separate stream of
structured entries marked "audit": true for security-relevant
operations. Audit logs are not subject to the operational-log
conventions above and continue to use slog.LogAttrs directly.
S3 API requests produce two correlated audit entries sharing the
same request_id:
- HTTP layer (
s3.PutObject,s3.GetObject, etc.) — method, path, bucket, status, duration. - Storage layer (
storage.PutObject,storage.GetObject, etc.) — key, backend, size.
Internal operations generate their own correlation IDs:
Rules:
- Use
audit.Logfor operations that change state or serve data (not for debug/health checks). - Always pass context so the request ID propagates.
- Event names use dotted notation:
"s3.PutObject","storage.DeleteObject","rebalance.move". - Include enough attributes to reconstruct the operation without reading other log lines.
Request ID Propagation
Request IDs flow through context via audit.WithRequestID /
audit.RequestID. The logfmt package’s init wires audit.RequestID
as the accessor logfmt.RequestIDFromCtx reads, so worker logs can
surface the inbound request ID without importing the audit package
directly.
- S3 API requests: extracted from
X-Request-Idheader or generated, set on context before auth. - Internal operations: generated at the start of each background task tick or batch run.
- The ID is also set as a
s3o.request_idattribute on OpenTelemetry spans. trace_idandspan_idare automatically injected into JSON log output bytelemetry.TraceHandlerfor any log call with an active span in context — use*Contextslog variants (InfoContext/WarnContext/ErrorContext).
Log Levels
| Level | Use |
|---|---|
slog.LevelDebug | Verbose state; off by default. |
slog.LevelInfo | Lifecycle (startup/shutdown), terminal success of a notable operation, audit entries. |
slog.LevelWarn | Recoverable failure — caller proceeds, operator should know (failover, degraded mode, retry-able errors). |
slog.LevelError | Unrecoverable failure of an operation — request fails, background tick aborts, integrity violation. |
Tracing
OpenTelemetry tracing is wired in internal/observe/telemetry. Every
S3 request, manager call, backend call, and significant background tick
produces a span; spans inherit the request id from audit.RequestID(ctx)
so log lines and traces cross-link.
Starting a Span
Use telemetry.StartSpan rather than the OTel API directly. The helper
guarantees the s3o.request_id attribute is set when the context carries
one and that the span is registered on the global tracer:
Always defer span.End() on the line after the StartSpan call so an
early return cannot leave the span open.
Span Naming
Span names are stable strings (no per-request data) so traces aggregate cleanly:
| Layer | Prefix | Example |
|---|---|---|
| Manager | Manager <Op> | Manager PutObject, Manager GetObject |
| Backend | Backend <Op> | Backend PutObject, Backend GetObject |
| Background worker | <Worker>.<Op> | Replicator.Replicate, CleanupWorker.ProcessCleanupQueue |
| Store engine | <Engine> <Op> | Postgres RecordObject, SQLite ListObjects |
The Manager /Backend prefixes are constants (managerSpanPrefix,
spanPrefix) so the layer attribution stays consistent if the prefix
ever changes.
Attributes
Standard attribute keys live in internal/observe/telemetry/attrs.go so
every span uses the same string. Add new keys there rather than inlining
strings at call sites.
High-cardinality values (object keys, request IDs as attributes) are allowed only on leaf spans. Do not put object keys on long-running worker spans because that explodes trace storage cardinality.
Recording Errors
Span errors are recorded via the OTel RecordError + SetStatus pair
so trace UIs flag the span as failing. Do not log the error inside the
span block - the audit/slog log line already carries the error and
trace correlation links the two.
Trace-to-Log Correlation
Every slog.InfoContext, WarnContext, ErrorContext call inside an
active span automatically attaches trace_id and span_id fields via
telemetry.TraceHandler. This is why context-aware slog must be used
(see Logging and Audit). Never use slog.Info (no context) when an
active span exists - that produces a log line with no trace correlation.
Metrics
Prometheus metrics live in internal/observe/telemetry/metrics_*.go,
one file per domain. The promauto constructors register the metric
with the default registry on package init; nothing else needs to be
called to make a metric visible at /metrics.
Naming
| Type | Format | Example |
|---|---|---|
| Counter | s3o_<domain>_<noun>_total{labels} | s3o_cleanup_queue_enqueued_total{reason} |
| Gauge | s3o_<domain>_<noun>{labels} | s3o_cleanup_dlq_depth |
| Histogram | s3o_<domain>_<noun>_seconds or _bytes | s3o_request_duration_seconds |
The s3o_ prefix is mandatory so a multi-service Prometheus instance
can pick out orchestrator metrics with one label match. The unit suffix
goes at the end (_seconds, _bytes, _total) to match Prometheus
conventions.
File Layout
Group metrics by domain in internal/observe/telemetry/metrics_<domain>.go.
Existing domains: audit, breaker, cache, cleanup, encryption,
meta, quota, rebalance, replication, request. Add a new file
when a new feature has its own logical domain - do not append to a
random existing file just to avoid creating one.
Each metric variable carries a 2-3 line godoc explaining what it measures and which dashboard panel reads it.
Label Cardinality
Labels multiply storage cost. Hard rules:
- Allowed labels:
backend,operation,status,reason, enumerated states. These come from a small fixed set per dimension. - Forbidden labels:
key,request_id,user, IP, anything user-supplied or identity-bearing. These create unbounded label cardinality and will eventually crash Prometheus. - Buckets for histograms must be explicit (
prometheus.LinearBucketsorExponentialBuckets); never default. Pick buckets that span the expected p50 to p99.9 range with at most 10-15 buckets total.
Updating a Metric
Counters use Inc or Add. Gauges use Set. Histograms use Observe.
Always pass label values in the order they appear in the metric
declaration; Prometheus does not protect against argument-order swaps.
Tying Metrics, Tracing, and Audit Together
Every state-changing operation should produce all three signals at the relevant scope:
| Signal | Where |
|---|---|
| Metric | At the layer where the count makes sense (manager for per-request counters, worker for per-tick counters) |
| Span | One per layer call (Manager <Op>, Backend <Op>, store engine) |
| Audit log | At the outer layer responsible for the operation, with the request id from audit.RequestID(ctx) |
Skipping one degrades observability: missing metric means dashboards cannot alert; missing span means a slow request cannot be traced; missing audit log means a customer dispute cannot be reconstructed.
Testing
Unit Tests
- Test files live alongside the code they test:
server_test.go,objects_test.go - Use table-driven tests for operations with multiple input/output combinations
- Use
go.uber.org/mock/mockgenfor generating mocks from interfaces. Add//go:generate mockgendirectives and runmake generate. Generated mocks live alongside the interface they mock. - Legacy hand-written mocks exist in
internal/testutil/and some packages; prefer generated mocks for new tests - Test names follow
TestFunctionName_Scenarioconvention
Integration Tests
- Located in
internal/integration/ - Gated behind the
integrationbuild tag - Run in-process with real MinIO and PostgreSQL containers
- Cover end-to-end flows: CRUD, quota enforcement, multipart, replication, circuit breaker
Benchmarks
- Benchmark files use the
_bench_test.gosuffix:auth_bench_test.go,helpers_bench_test.go - Integration benchmarks live in
internal/integration/bench_test.go(gated behind theintegrationbuild tag) - Run benchmarks:
make bench,make bench-auth,make bench-crypto, etc. Override duration withBENCH_TIME=5s make bench - Use
b.Loop()(Go 1.24+) for all benchmark loops — notfor i := 0; i < b.N; i++ - Use
b.SetBytes()for throughput benchmarks so Go reports MB/s - Use
b.ResetTimer()after setup/population steps - Use
b.RunParallelwithpb.Next()for concurrent benchmarks (b.Loop()cannot replacepb.Next()insideRunParallelcallbacks) - Pre-compute keys and test data outside the measured loop —
fmt.Sprintfinside a benchmark loop measures string formatting, not the code under test - Benchmark names follow
BenchmarkFunctionName/variantwith sub-benchmarks for different input sizes
Fuzz Tests
- Fuzz test files use the
_fuzz_test.gosuffix:auth_fuzz_test.go,xml_fuzz_test.go - Run fuzz targets:
make fuzz(default 30s per target). Override withFUZZ_TIME=5m make fuzzfor deeper exploration - CI runs a 10s smoke test per target to catch regressions
- Seed the corpus with valid inputs, edge cases, and adversarial inputs via
f.Add() - Fuzz callbacks must never panic — verify invariants with
t.Errorfinstead - Match the production code path — use
xml.NewDecoder().Decodenotxml.Unmarshalwhen production uses streaming decoders - Differential oracles: When two implementations exist for the same operation (e.g.,
parseSigV4FieldsandparseSigV4FieldsDirect), fuzz both and assert they agree on every input - Structural invariants: Assert properties of the output that must always hold (e.g., “canonical request has exactly N newlines”, “bucket never contains a slash”, “nonce is always 12 bytes”)
- Do not assert application-level validation at the parsing layer — XML decoders accept negative integers; the handler layer validates range constraints
Test Patterns
- Generated mocks (
mockgen) are the preferred approach for new tests — they stay in sync with interfaces automatically - FailableStore wraps a store to inject errors for circuit breaker testing
- Test assertions use standard
testing.Tmethods, not external assertion libraries
Coverage Exclusions
Go has no built-in coverage ignore directive. Use the Codecov // codecov:ignore inline comment to exclude untestable code (process entry points, os.Exit wrappers) from coverage reports. Always include a reason after the directive:
Use this sparingly and only for code that genuinely cannot be tested:
main()and subcommand entry points that callos.Exit- Signal handlers and process lifecycle glue
Do not use codecov:ignore for code that requires a database, S3 backend, or Redis — integration tests run with testcontainers and contribute coverage. Extract testable logic into separate functions that return errors instead of calling os.Exit directly.
Code Style
Character Rules
ALWAYS USE:
- ASCII dash:
-(hyphen-minus, U+002D) - Standard ASCII characters only
NEVER USE:
- Unicode em-dash (U+2014)
- Unicode en-dash (U+2013)
- Unicode box-drawing (U+2500)
- Equals signs for dividers
Professional Tone
Avoid:
- Personal references: “Let me show you…”, “We need to…”
- Numbered lists in comments: “1. First do this”, “2. Then do that”
- Conversational tone: “Now we’re going to…”
- Future tense: “This will create…”, “We’ll configure…”
Use:
- Present tense: “Creates”, “Configures”, “Manages”
- Declarative statements: “Service runs on port 9000”
- Technical precision: “Uses SigV4 for request authentication”
- Impersonal voice: “The manager selects…”, “The circuit breaker wraps…”
Versioning
The .version file in the repository root controls the version baked into binaries, Docker images, and Debian packages. Every PR that changes Go or SQL files must bump the version.
Patch bump (v0.X.Y -> v0.X.Y+1) for:
- Bug fixes
- Refactoring (no behavior change)
- Documentation updates
- Test improvements
- Dependency updates
- Cleanup and code style changes
Minor bump (v0.X.0 -> v0.X+1.0) for:
- New features (new config fields, new API endpoints, new CLI commands)
- Breaking changes (required config fields, changed defaults, removed options)
- Schema migrations that add tables or columns
Documentation Updates
When a PR changes config fields, API behavior, or deployment requirements, update all affected documentation in the same PR:
packaging/config.yaml— sample configconfig.yaml— local dev configREADME.md— config reference sectiondocs/admin-guide.md— operational documentationdocs/security-hardening.md— if security-relevantdocs/disaster-recovery.md— if it affects failure modesweb/content/guides/*.md— deployment guidesdeploy/— Nomad, Kubernetes, and Helm manifests
Search across all docs with grep -rn 'field_name' README.md docs/ web/content/ packaging/ deploy/ before committing to catch every reference.
Branch Naming
When a branch corresponds to a GitHub issue, use this format:
Examples:
GH_ISSUE_251-worker-pool-parallelismGH_ISSUE_42-multipart-upload-cleanup
For branches without a linked issue, use a short kebab-case description of the topic.
Quick Reference
| Comment Type | Length | Spacing After | Use Case |
|---|---|---|---|
| File header | 79 chars | 1 blank line | Top of every .go file |
| Major section | 73 chars | 1 blank line | Major divisions (types, API, internals) |
| Single-line comment | Variable | None | Minor divisions within functions |
| Inline | Brief | N/A | Specific line explanation |
Examples
Good
Bad
Remember: Comments should explain why decisions were made, not what the code does. The code itself should be clear enough to understand what it does.