Metrics Standards
Emit quantitative metrics to Prometheus/Mimir and Datadog using CustomMetricsService.
Metric Types
- Counter - Only increases (requests, errors, events)
- Gauge - Can go up or down (connections, queue depth, memory)
- Histogram - Distribution of observed values across buckets (latency, sizes)
- Summary - Percentiles over a sliding window (p50, p95, p99)
Basic Usage
import { Injectable } from '@nestjs/common';
import { CustomMetricsService } from '../../common/metrics/metrics.service';

// `User`, `UserCreateInput`, `this.repository`, and `this.toUser` are assumed to be defined elsewhere.
@Injectable()
export class UserService {
  constructor(private readonly metricsService: CustomMetricsService) {}

  async createUser(values: UserCreateInput): Promise<User> {
    const startTime = Date.now();
    try {
      const user = await this.repository.save(values);

      // Increment success counter
      this.metricsService.incrementCounter('users_created_total', {
        status: 'success',
      });

      // Record duration in seconds
      const duration = (Date.now() - startTime) / 1000;
      this.metricsService.recordHistogram('user_creation_duration_seconds', duration);

      return this.toUser(user);
    } catch (error) {
      // Count errors under the same metric name, distinguished by the status label
      this.metricsService.incrementCounter('users_created_total', {
        status: 'error',
      });
      throw error;
    }
  }
}
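The count-and-time pattern in createUser repeats for most operations, and as written it records the duration only on the success path. A minimal sketch of a reusable wrapper follows. It assumes an interface that mirrors the `incrementCounter` and `recordHistogram` signatures documented under Methods; `MetricsLike`, `MetricNames`, `withMetrics`, and the `status` label on the histogram are hypothetical names for illustration and are not part of CustomMetricsService.

```typescript
// Hypothetical subset of CustomMetricsService, mirroring the documented signatures.
interface MetricsLike {
  incrementCounter(name: string, labels?: Record<string, string>, value?: number): void;
  recordHistogram(name: string, value: number, labels?: Record<string, string>): void;
}

// Hypothetical pair of metric names for one operation.
interface MetricNames {
  counter: string;
  histogram: string;
}

// Runs `operation`, counts its outcome, and records its duration in seconds
// on both the success and the error path.
async function withMetrics<T>(
  metrics: MetricsLike,
  names: MetricNames,
  operation: () => Promise<T>,
): Promise<T> {
  const startTime = Date.now();
  let status = 'success';
  try {
    return await operation();
  } catch (error) {
    status = 'error';
    throw error; // rethrow so callers still see the failure
  } finally {
    const duration = (Date.now() - startTime) / 1000;
    metrics.incrementCounter(names.counter, { status });
    metrics.recordHistogram(names.histogram, duration, { status });
  }
}
```

With such a helper, the body of createUser could shrink to a single `withMetrics(this.metricsService, names, () => this.repository.save(values))` call, provided the real service accepts the same arguments.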
Methods
incrementCounter(name, labels, value)
Increment a counter metric.
this.metricsService.incrementCounter('operations_total', {
  operation: 'create',
  status: 'success',
});
setGauge(name, value, labels)
Set a gauge to a specific value.
this.metricsService.setGauge('active_connections', 42, {
  pool: 'database',
});
incrementGauge(name, value, labels)
Increase a gauge value.
decrementGauge(name, value, labels)
Decrease a gauge value.
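incrementGauge and decrementGauge are typically used as a pair around a unit of work. A sketch of tracking in-flight jobs follows, assuming the (name, value, labels) argument order listed above; `GaugeLike`, `trackInFlight`, and the metric name `jobs_in_flight` are hypothetical and chosen only for illustration.

```typescript
// Hypothetical subset of CustomMetricsService, mirroring the documented signatures.
interface GaugeLike {
  incrementGauge(name: string, value: number, labels?: Record<string, string>): void;
  decrementGauge(name: string, value: number, labels?: Record<string, string>): void;
}

// Raises the gauge while `job` runs and lowers it again afterwards.
async function trackInFlight<T>(
  metrics: GaugeLike,
  queue: string,
  job: () => Promise<T>,
): Promise<T> {
  metrics.incrementGauge('jobs_in_flight', 1, { queue });
  try {
    return await job();
  } finally {
    // Always decrement, even when the job throws, so the gauge does not drift upward.
    metrics.decrementGauge('jobs_in_flight', 1, { queue });
  }
}
```

Placing the decrement in `finally` is the important design choice here: a gauge that is only lowered on success slowly climbs after every failure.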
recordHistogram(name, value, labels)
Record an observation in a histogram.
const duration = (Date.now() - startTime) / 1000;
this.metricsService.recordHistogram('operation_duration_seconds', duration, {
  operation: 'query',
});
recordSummary(name, value, labels)
Record an observation in a summary, which reports percentiles.
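To make concrete what a summary reports, here is a simplified, self-contained model of percentiles over a sliding window. It is an illustration only: it keeps the last N observations and uses the nearest-rank method, whereas production metric libraries generally use time-based windows and streaming estimators. `SlidingWindowSummary` is a hypothetical class and says nothing about how CustomMetricsService is implemented.

```typescript
// Simplified model: percentiles over the most recent `windowSize` observations.
class SlidingWindowSummary {
  private readonly values: number[] = [];
  private readonly windowSize: number;

  constructor(windowSize: number) {
    this.windowSize = windowSize;
  }

  observe(value: number): void {
    this.values.push(value);
    // Drop the oldest observation once the window is full.
    if (this.values.length > this.windowSize) {
      this.values.shift();
    }
  }

  // `q` is a quantile in (0, 1], for example 0.95 for p95. Returns NaN when empty.
  quantile(q: number): number {
    if (this.values.length === 0) return NaN;
    const sorted = [...this.values].sort((a, b) => a - b);
    const rank = Math.ceil(q * sorted.length); // nearest-rank, 1-based
    return sorted[Math.max(rank, 1) - 1];
  }
}

// Usage: only the last five observations (3, 4, 5, 6, 7) remain in the window.
const summary = new SlidingWindowSummary(5);
for (const v of [1, 2, 3, 4, 5, 6, 7]) summary.observe(v);
const p50 = summary.quantile(0.5); // 5
const p95 = summary.quantile(0.95); // 7
```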
Naming Conventions
Format: {domain}_{metric}_{unit}
Examples:
- http_requests_total - Total HTTP requests (counter)
- http_request_duration_seconds - Request duration (histogram)
- active_connections - Current connections (gauge)
- database_query_duration_seconds - Query time (histogram)
Units:
- Seconds: _seconds
- Bytes: _bytes
- Counts: _total (counter) or no suffix (gauge)
- Ratios: _ratio (0.0-1.0)
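These conventions can be checked mechanically, for example in a unit test or lint step. The sketch below is one possible interpretation of the rules above, not an existing utility: `checkMetricName` and `MetricType` are hypothetical names, and treating `_total` as reserved for counters is an assumption drawn from the Counts rule.

```typescript
type MetricType = 'counter' | 'gauge' | 'histogram' | 'summary';

// Hypothetical helper: returns a list of convention violations (empty when the name is fine).
function checkMetricName(name: string, type: MetricType): string[] {
  const problems: string[] = [];

  // Lowercase snake_case, starting with a letter.
  if (!/^[a-z][a-z0-9]*(_[a-z0-9]+)*$/.test(name)) {
    problems.push('use lowercase snake_case');
  }
  // Counters carry the _total suffix; other types should not.
  if (type === 'counter' && !name.endsWith('_total')) {
    problems.push('counters should end in _total');
  }
  if (type !== 'counter' && name.endsWith('_total')) {
    problems.push('_total is reserved for counters');
  }
  // Durations are recorded in seconds, never milliseconds.
  if (/_(ms|millis|milliseconds)$/.test(name)) {
    problems.push('record durations in seconds and use the _seconds suffix');
  }
  return problems;
}

// Usage
const okProblems = checkMetricName('http_request_duration_seconds', 'histogram'); // no problems
const msProblems = checkMetricName('duration_ms', 'histogram'); // flags the millisecond suffix
```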
Labels
Use labels to add dimensions to metrics.
this.metricsService.incrementCounter('api_requests_total', {
method: 'POST',
route: '/v1/users',
status: '201',
});
Common labels:
- method - HTTP method (GET, POST, etc.)
- route - API route
- status - HTTP status code or operation status
- operation - Operation type (create, update, delete)
- error_type - Error class name
Anti-Patterns
❌ Don't use high-cardinality labels
Each distinct combination of label values creates a new time series. Avoid user IDs, timestamps, and other unbounded values.
// Bad - creates unlimited time series
this.metricsService.incrementCounter('requests_total', {
  user_id: userId, // WRONG - unbounded
  timestamp: Date.now().toString(), // WRONG - unique every time
});
// Good - use bounded label values
this.metricsService.incrementCounter('requests_total', {
  method: 'GET', // Limited values
  status: '200', // Limited values
});
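Raw request paths are a common way for unbounded values to slip into the route label, because every resource ID produces a new series. A minimal sketch of collapsing ID-like segments before labeling follows; `normalizeRoute` is a hypothetical helper, and the choice to treat numeric and UUID segments as IDs is an assumption. Where the framework exposes the matched route pattern, using that directly is usually preferable.

```typescript
// Hypothetical helper: replaces numeric and UUID path segments with a fixed placeholder.
function normalizeRoute(path: string): string {
  const uuid = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
  return path
    .split('?')[0] // drop the query string
    .split('/')
    .map((segment) => (/^\d+$/.test(segment) || uuid.test(segment) ? ':id' : segment))
    .join('/');
}

// Usage: both requests map to the same bounded label value.
const routeA = normalizeRoute('/v1/users/42'); // '/v1/users/:id'
const routeB = normalizeRoute('/v1/users/1337?expand=profile'); // '/v1/users/:id'
```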
❌ Don't forget to track errors
Count both successes and failures.
// Bad - only counts successes
this.metricsService.incrementCounter('operations_total');
// Good - label success/error
try {
  await operation();
  this.metricsService.incrementCounter('operations_total', { status: 'success' });
} catch (error) {
  this.metricsService.incrementCounter('operations_total', { status: 'error' });
  throw error;
}
❌ Don't record durations in milliseconds
Use seconds for duration metrics.
// Bad - milliseconds
this.metricsService.recordHistogram('duration_ms', Date.now() - start);
// Good - seconds
const duration = (Date.now() - start) / 1000;
this.metricsService.recordHistogram('duration_seconds', duration);
❌ Don't use counters for values that decrease
Use gauges for values that go up and down.
// Bad - counter for connections
this.metricsService.incrementCounter('active_connections'); // Can't decrease!
// Good - gauge for connections
this.metricsService.setGauge('active_connections', count);
Other mistakes:
- ❌ Not tracking operation duration
- ❌ Missing error metrics
- ❌ Inconsistent naming conventions
- ❌ Not exposing /metrics endpoint