Metrics¶
obskit provides a layered metrics API built on top of prometheus_client. You choose the abstraction level that fits your needs: opinionated RED/USE/Golden-Signals collectors for common patterns, or raw Prometheus primitives when you need full control.
Quick Start¶
```python
from obskit.metrics.red import REDMetrics

red = REDMetrics(service="my-service")
red.record_request(method="GET", endpoint="/users", duration=0.042, status_code=200)
red.record_error(method="GET", endpoint="/users", error_type="DatabaseError")
```
That's it — a Histogram and two Counters are registered in the default Prometheus registry and will be scraped at /metrics.
RED Method¶
The Rate–Errors–Duration method is the recommended starting point for any request-handling service.
REDMetrics API¶
```python
from obskit.metrics.red import REDMetrics

red = REDMetrics(
    service="payment-service",  # Becomes a label on all metrics
    namespace="myapp",          # Optional Prometheus namespace prefix
)
```
record_request()¶
Records a completed request — increments the rate counter and observes the duration histogram.
```python
red.record_request(
    method="POST",       # HTTP method or RPC method name
    endpoint="/charge",  # Route pattern (not raw URL — avoid high cardinality)
    duration=0.142,      # Seconds (float)
    status_code=200,     # HTTP status or gRPC status code
)
```
Generated metrics:
| Metric name | Type | Description |
|---|---|---|
| `myapp_requests_total` | Counter | Total requests, labelled by `method`, `endpoint`, `status_code` |
| `myapp_request_duration_seconds` | Histogram | Latency distribution, same labels |
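With these metrics in place, the three RED signals fall out of standard PromQL. The queries below assume the metric names above and the `service` label added by the constructor; window sizes are illustrative:

```promql
# Rate: requests per second, per endpoint
sum by (endpoint) (rate(myapp_requests_total{service="payment-service"}[5m]))

# Errors: fraction of requests returning 5xx
sum(rate(myapp_requests_total{service="payment-service", status_code=~"5.."}[5m]))
  / sum(rate(myapp_requests_total{service="payment-service"}[5m]))

# Duration: p99 latency from the histogram
histogram_quantile(0.99,
  sum by (le) (rate(myapp_request_duration_seconds_bucket{service="payment-service"}[5m])))
```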
record_error()¶
Increments the error counter separately so error rate can be computed independently of status codes.
```python
red.record_error(
    method="POST",
    endpoint="/charge",
    error_type="PaymentGatewayTimeout",  # Exception class name or custom category
)
```
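In a handler, the two calls are typically paired: time the work, always record the request, and record an error only on failure. The sketch below shows that pattern duck-typed against any REDMetrics-like recorder (the `timed_handler` helper and `work` callable are illustrative, not part of obskit):

```python
import time


def timed_handler(red, method, endpoint, work):
    """Run work(), recording duration and any error on a REDMetrics-like object."""
    start = time.perf_counter()
    status_code = 200
    try:
        return work()
    except Exception as exc:
        status_code = 500
        # Label by exception class, not message, to keep cardinality bounded
        red.record_error(method=method, endpoint=endpoint, error_type=type(exc).__name__)
        raise
    finally:
        # Record every request, successful or not, with its wall-clock duration
        red.record_request(
            method=method,
            endpoint=endpoint,
            duration=time.perf_counter() - start,
            status_code=status_code,
        )
```

Recording the request in `finally` ensures failed requests still contribute to the rate and duration series, which is what keeps the error counter independently computable.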
FastAPI integration¶
obskit's FastAPI middleware wires RED metrics automatically:
```python
from fastapi import FastAPI
from obskit.middleware.fastapi import ObskitMiddleware
from obskit.metrics.red import REDMetrics

app = FastAPI()
red = REDMetrics(service="api")
app.add_middleware(ObskitMiddleware, red_metrics=red)
```
Every request is timed and recorded with zero boilerplate.
prometheus_client Integration¶
obskit is built on prometheus_client and exposes the full API. You can mix obskit collectors with raw Prometheus primitives in the same registry.
Histograms¶
```python
from prometheus_client import Histogram

# Custom buckets for a payment processing service
payment_duration = Histogram(
    "payment_processing_duration_seconds",
    "Time to process a payment",
    labelnames=["gateway", "currency"],
    buckets=[0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, 30.0],
)

payment_duration.labels(gateway="stripe", currency="USD").observe(0.234)
```
**Choose buckets deliberately.** prometheus_client's default buckets (`.005, .01, .025, .05, .075, .1, .25, .5, .75, 1, 2.5, 5, 7.5, 10`) work for most web APIs. For slower operations (video processing, ML inference, batch jobs), shift your buckets to the right. For very fast operations (cache hits, in-memory lookups), add sub-millisecond buckets.
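When shifting buckets to the right, exponentially spaced boundaries usually cover a wide latency range better than linear ones. A small helper in that spirit (hypothetical, not part of obskit; the Go Prometheus client ships a similar `prometheus.ExponentialBuckets`):

```python
def exponential_buckets(start: float, factor: float, count: int) -> list[float]:
    """Generate count bucket boundaries: start, start*factor, start*factor**2, ..."""
    buckets = []
    current = start
    for _ in range(count):
        buckets.append(round(current, 6))
        current *= factor
    return buckets


# Buckets for a batch job that takes seconds to minutes: 1 s up to ~8.5 min
slow_buckets = exponential_buckets(start=1.0, factor=2.0, count=10)
# [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0]
```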
Counters¶
```python
from prometheus_client import Counter

cache_ops = Counter(
    "cache_operations_total",
    "Cache operation counts",
    labelnames=["operation", "result"],  # operation=get/set/delete, result=hit/miss/error
)

cache_ops.labels(operation="get", result="hit").inc()
cache_ops.labels(operation="get", result="miss").inc()
```
Gauges¶
```python
from prometheus_client import Gauge

active_websockets = Gauge(
    "active_websocket_connections",
    "Number of currently open WebSocket connections",
    labelnames=["tenant"],
)

# In connection handler:
active_websockets.labels(tenant=tenant_id).inc()

# In disconnect handler:
active_websockets.labels(tenant=tenant_id).dec()
```
Info and Enum¶
```python
from prometheus_client import Enum, Info

# Build-time metadata (version, git SHA)
build_info = Info("myapp_build", "Build information")
build_info.info({"version": "2.0.0", "git_sha": "abc1234", "python": "3.12"})

# State machine state
circuit_state = Enum(
    "payment_circuit_state",
    "Circuit breaker state",
    states=["closed", "open", "half_open"],
)
circuit_state.state("closed")
```
Exemplars: Linking Metrics to Traces¶
An exemplar is a sample data point stored alongside a Prometheus histogram or counter that carries a trace ID. When you click a metric data point in Grafana, the exemplar takes you directly to the trace for that specific request.
```python
from obskit.metrics import get_trace_exemplar, observe_with_exemplar
from prometheus_client import Histogram

request_duration = Histogram(
    "http_request_duration_seconds",
    "HTTP request duration",
    labelnames=["method", "endpoint"],
)


def handle_request(method, endpoint, duration):
    # observe_with_exemplar automatically attaches the current OTel trace_id
    observe_with_exemplar(
        histogram=request_duration.labels(method=method, endpoint=endpoint),
        value=duration,
    )
```
You can also retrieve the exemplar dict manually for custom use:
```python
exemplar = get_trace_exemplar()
# Returns: {"trace_id": "4bf92f3577b34da6a3ce929d0e0e4736", "span_id": "00f067aa0ba902b7"}
# Returns: {} when no OTel span is active
```
**OpenMetrics required.** Exemplars are part of the OpenMetrics exposition format, not the classic Prometheus text format, so Prometheus must scrape your targets using OpenMetrics. Prometheus must also be started with `--enable-feature=exemplar-storage` to store exemplars; Grafana's Prometheus data source can then display them on compatible panels.
Prometheus configuration for exemplars¶
```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: "myapp"
    static_configs:
      - targets: ["myapp:8000"]
    # Enable OpenMetrics to receive exemplars
    scrape_protocols:
      - OpenMetricsText1.0.0
      - PrometheusText0.0.4
```
Cardinality Management¶
Why Cardinality Matters¶
Every unique combination of label values creates a separate time series in Prometheus. A metric with three labels, each with 100 possible values, creates up to 1,000,000 time series. This is called the cardinality explosion problem and causes:
- Prometheus memory exhaustion (each series uses ~3–5 KB of RAM)
- Slow query performance
- Expensive storage
```mermaid
graph TD
    Metric["http_requests_total"]
    L1["method\n(GET, POST, PUT, DELETE)\n4 values"]
    L2["endpoint\n(/users, /orders, /items)\n3 values"]
    L3["status\n(200, 400, 404, 500)\n4 values"]
    Series["4 × 3 × 4 = 48 series\n(manageable)"]
    Metric --> L1
    Metric --> L2
    Metric --> L3
    L1 --> Series
    L2 --> Series
    L3 --> Series
```
The danger is unbounded label values:
- User IDs, session IDs, email addresses as labels → millions of series
- Raw URL paths (e.g., /users/12345/orders/67890) → unbounded
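Before shipping a new metric, it is cheap to estimate its worst-case series count: multiply the number of distinct values of each label. A back-of-the-envelope helper (illustrative only, not part of obskit):

```python
from math import prod


def estimated_series(label_cardinalities: dict[str, int]) -> int:
    """Worst-case series count = product of distinct values per label."""
    return prod(label_cardinalities.values())


# The manageable example from the diagram above:
estimated_series({"method": 4, "endpoint": 3, "status": 4})  # 48

# The same metric with a user_id label added:
estimated_series({"method": 4, "endpoint": 3, "status": 4, "user_id": 100_000})  # 4,800,000
```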
CardinalityGuard¶
CardinalityGuard enforces a maximum number of distinct label combinations and drops metrics that would exceed the limit:
```python
from obskit.metrics.cardinality import CardinalityGuard

guard = CardinalityGuard(max_series=1000)

# Safe to use:
guard.observe("http_requests_total", labels={"method": "GET", "endpoint": "/users"})

# Will be rejected (silently dropped + warning logged) if it would push
# the series count over 1000:
guard.observe("http_requests_total", labels={"method": "GET", "endpoint": user_id})
```
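The guard's behavior can be approximated in a few lines of plain Python. This is a sketch of the idea, not obskit's actual implementation: track the set of (metric, sorted-label) keys seen so far, always admit existing series, and drop new combinations once the cap is reached.

```python
import logging

logger = logging.getLogger(__name__)


class SimpleCardinalityGuard:
    """Sketch of a cardinality cap: admit a label combination only while the
    total number of distinct series stays at or below max_series."""

    def __init__(self, max_series: int):
        self.max_series = max_series
        self._seen: set[tuple] = set()

    def admit(self, metric: str, labels: dict[str, str]) -> bool:
        key = (metric, tuple(sorted(labels.items())))
        if key in self._seen:
            return True  # Existing series: always allowed
        if len(self._seen) >= self.max_series:
            logger.warning("dropping high-cardinality series %s%s", metric, labels)
            return False  # A new series would exceed the cap: drop it
        self._seen.add(key)
        return True
```

Sorting the label items makes `{"a": "1", "b": "2"}` and `{"b": "2", "a": "1"}` count as the same series, matching Prometheus semantics.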
Best Practices for Low Cardinality¶
```python
# BAD: user_id in labels → millions of series
counter.labels(user_id="u_abc123", endpoint="/cart").inc()

# GOOD: user_id in log context, not metric label
log.info("cart.viewed", user_id="u_abc123")
counter.labels(endpoint="/cart").inc()

# BAD: raw URL path
counter.labels(path=request.url.path).inc()  # /users/12345 creates a series per user

# GOOD: route pattern from framework router
counter.labels(path=request.route.path).inc()  # /users/{user_id} — fixed cardinality
```
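When the framework's route pattern is not available (for example when labelling from a raw access log or a reverse proxy), numeric and UUID-like path segments can be collapsed before use. A hypothetical normalizer, not part of obskit:

```python
import re

# Matches purely numeric segments and standard 8-4-4-4-12 UUIDs
_ID_SEGMENT = re.compile(
    r"^(\d+|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)


def normalize_path(path: str) -> str:
    """Replace numeric or UUID path segments with {id} to bound cardinality."""
    parts = [
        "{id}" if _ID_SEGMENT.match(part) else part
        for part in path.split("/")
    ]
    return "/".join(parts)


normalize_path("/users/12345/orders/67890")  # -> "/users/{id}/orders/{id}"
```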
OTLP Export¶
In addition to Prometheus scraping, obskit can push metrics to any OTLP-compatible metrics backend (Grafana Mimir, VictoriaMetrics, an OpenTelemetry Collector, etc.):
```python
from obskit.metrics.otlp import configure_otlp_metrics

configure_otlp_metrics(
    endpoint="http://otel-collector:4317",
    export_interval=15,  # Push every 15 seconds
    resource_attributes={
        "service.name": "payment-service",
        "service.version": "2.0.0",
        "deployment.environment": "production",
    },
)
```
The OTLP exporter runs in a background thread and does not block your application.
Prometheus Output Example¶
Here is what obskit metrics look like in Prometheus text format:
```text
# HELP myapp_requests_total Total number of requests
# TYPE myapp_requests_total counter
myapp_requests_total{endpoint="/charge",method="POST",service="payment-service",status_code="200"} 14823.0
myapp_requests_total{endpoint="/charge",method="POST",service="payment-service",status_code="500"} 42.0
myapp_requests_total{endpoint="/refund",method="POST",service="payment-service",status_code="200"} 891.0
# HELP myapp_request_duration_seconds Request duration in seconds
# TYPE myapp_request_duration_seconds histogram
myapp_request_duration_seconds_bucket{endpoint="/charge",le="0.05",...} 8123.0
myapp_request_duration_seconds_bucket{endpoint="/charge",le="0.1",...} 12904.0
myapp_request_duration_seconds_bucket{endpoint="/charge",le="0.25",...} 14201.0
myapp_request_duration_seconds_bucket{endpoint="/charge",le="+Inf",...} 14823.0
myapp_request_duration_seconds_sum{endpoint="/charge",...} 897.341
myapp_request_duration_seconds_count{endpoint="/charge",...} 14823.0
# HELP myapp_errors_total Total number of errors
# TYPE myapp_errors_total counter
myapp_errors_total{endpoint="/charge",error_type="PaymentGatewayTimeout",method="POST",...} 38.0
```
Business Metrics¶
Beyond technical signals, track business-level metrics that reflect the health of your product:
```python
from prometheus_client import Counter, Gauge

# Revenue processed
revenue_processed = Counter(
    "revenue_processed_usd_cents_total",
    "Total revenue processed in USD cents",
    labelnames=["payment_method"],
)

# Active subscriptions (from a background sync)
active_subscriptions = Gauge(
    "active_subscriptions",
    "Number of active subscriptions",
    labelnames=["plan"],
)

# Conversion funnel
checkout_funnel = Counter(
    "checkout_funnel_events_total",
    "Checkout funnel events",
    labelnames=["step", "outcome"],  # step: cart/address/payment/confirm, outcome: proceed/abandon
)
```
**Business metrics in SLOs.** Business metrics make excellent SLO signals: "99.9% of payments must succeed" is more meaningful to stakeholders than "99.9% of HTTP requests must return 2xx".
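As a sketch, a payment-success SLI over the `checkout_funnel_events_total` metric defined above might look like this in PromQL (the window and target are illustrative):

```promql
# Fraction of payment steps that proceed, over a 30-day window (SLO target: 99.9%)
sum(rate(checkout_funnel_events_total{step="payment", outcome="proceed"}[30d]))
  / sum(rate(checkout_funnel_events_total{step="payment"}[30d]))
```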
Best Practices Summary¶
| Practice | Why |
|---|---|
| Use `snake_case` metric names | Prometheus convention; `_total` suffix for counters |
| Bound all label cardinality | Unbounded labels → Prometheus OOM |
| Use route patterns, not raw URLs | `/users/{id}`, not `/users/12345` |
| Separate success/error histograms | Timeouts skew latency percentiles |
| Add `service.name` to all metrics | Essential for cross-service dashboards |
| Use exemplars on latency histograms | Connects metric anomalies directly to traces |
| Record business metrics alongside technical ones | Bridges engineering and product/business |
| Use `CardinalityGuard` for dynamic systems | Prevents runaway series from bad inputs |