Prometheus integration

The Prometheus integration enables you to query and visualize Coder's platform metrics.

Requirements

  • A Coder deployment on Kubernetes
  • Prometheus Operator installed on your cluster

Configuration

Coder exposes Prometheus-formatted metrics on port 2112 of the coderd container. Use the following PodMonitor resource to point the Prometheus Operator at this endpoint:

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: master-monitor
  namespace: coder
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: coderd
  podMetricsEndpoints:
    - port: prom-coderd

Workspace Metrics

Each Coder workspace has an agent that connects to a single coderd instance. Each coderd instance exposes the metrics of the workspaces it manages. Workspace metrics all have the following form:

coderd_workspace_<workspace_metric_name>{user_id="<user_id>",workspace_id="<workspace_id>"}

Because every workspace has a unique ID, these metrics have high label cardinality, which can be a problem for some Prometheus configurations. If specific workspace metrics are not of interest, or are causing issues, you can configure your metric scraping service to drop them.

Note that if a workspace connects to a new coderd instance (after a rebuild, network issue, Coder update, etc.), the metrics for that workspace move to the new coderd metrics endpoint. The labels on the new metrics will likely carry the new coderd pod name. When tracking a single workspace, track it only by workspace_id for the lifetime of the workspace, until it is deleted.
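Tracking by workspace_id can be sketched as a recording rule that aggregates away the pod-level labels. This is only an illustration: the PrometheusRule name, the group name, the recorded series name, and the choice of max as the aggregation are assumptions, and <workspace_metric_name> is the same placeholder used above and must be replaced with a real metric name.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: coder-workspace-rules   # hypothetical name
  namespace: coder
spec:
  groups:
    - name: coder-workspace     # hypothetical group name
      rules:
        # Collapse series from different coderd pods into one series
        # per workspace, keyed only by workspace_id.
        - record: "workspace_id:coderd_workspace_<workspace_metric_name>:max"
          expr: "max by (workspace_id) (coderd_workspace_<workspace_metric_name>)"
```

Whether max, sum, or another aggregation is appropriate depends on the metric in question; max is a reasonable default for gauges that briefly appear on two pods during a handover.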

Drop workspace metrics config

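See the Prometheus documentation on relabeling metrics for details. The rule below drops every metric that carries the workspace_id label. The explicit regex is required: the default regex also matches an empty value, so without it the rule would drop metrics that have no workspace_id label as well.

metric_relabel_configs:
  # Drop any series with a non-empty workspace_id label
  - source_labels: ["workspace_id"]
    regex: ".+"
    action: drop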

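With the Prometheus Operator, add the equivalent rule to the coderd PodMonitor spec under metricRelabelings, which is applied to scraped samples. (The relabelings field is applied to targets before the scrape and cannot see the workspace_id label.)

apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: master-monitor
  namespace: coder
spec:
  selector:
    matchLabels:
      app.kubernetes.io/component: coderd
  podMetricsEndpoints:
    - port: prom-coderd
      metricRelabelings:
        # Drop any series with a non-empty workspace_id label
        - action: drop
          sourceLabels:
            - workspace_id
          regex: ".+"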
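
If only some workspace metrics are a problem, a narrower rule can match on the metric name instead of the workspace_id label. The fragment below is a sketch of the podMetricsEndpoints section only; <workspace_metric_name> is a placeholder that must be replaced with the name (or a regex covering the names) of the metrics you want to drop.

```yaml
  podMetricsEndpoints:
    - port: prom-coderd
      metricRelabelings:
        # Drop only the series whose metric name fully matches the regex;
        # all other workspace metrics are kept.
        - action: drop
          sourceLabels:
            - __name__
          regex: "coderd_workspace_<workspace_metric_name>"
```

Prometheus anchors relabeling regexes, so the pattern must match the entire metric name; use alternation (for example "a|b") to list several metrics.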

Coderd Metrics

Coder's Prometheus endpoint exposes the following metrics:

Metric | Type | Description
coderd_agent_aggregator_agent_push_backlog | gauge | Total number of agent metric bundles waiting to be processed.
coderd_agent_aggregator_collect_backlog | gauge | Total amount of gathers waiting to collect metrics.
coderd_agent_aggregator_collect_nanoseconds | summary | Time taken to collect all metrics.
coderd_agent_aggregator_count_total | gauge | Total number of agent metrics being reported by this coderd.
coderd_agent_aggregator_delete_backlog | gauge | Total number of agents waiting to be deleted in aggregator.
coderd_agent_aggregator_workspace_count_total | gauge | Total number of workspace agents pushing metrics to this coderd.
coderd_api_concurrent_requests | gauge | The total number of concurrent API requests
coderd_api_concurrent_websockets | gauge | The total number of concurrent API websockets
coderd_api_request_latencies_ms | histogram | Latency distribution of requests in milliseconds
coderd_api_requests_processed_total | counter | The total number of processed API requests
coderd_api_websocket_durations_ms | histogram | Websocket duration distribution of requests in milliseconds
coderd_background_workspace_build_duration_s | histogram | Duration distribution of workspace builds in seconds
coderd_backgroundjob_completed_total | counter | Total number of jobs completed since startup.
coderd_backgroundjob_current_enqueued_jobs | gauge | Current number of enqueued and not started background jobs.
coderd_backgroundjob_enqueue_time_seconds | histogram | Histogram of total time taken by job type to transition from Enqueue to Running.
coderd_backgroundjob_enqueued_total | counter | Total number of jobs enqueued.
coderd_backgroundjob_execution_time_seconds | histogram | Histogram of total time taken by job type to transition from Running to Completed.
coderd_backgroundjob_started_total | counter | Total number of jobs started.
coderd_db_sql_queries_executed_total | counter | The total number of executed SQL queries
coderd_db_sql_query_latencies_ms | histogram | Latency distribution of SQL queries in milliseconds
coderd_license_expires_at_unix | gauge | Unix timestamp of the license expiry date.
coderd_license_issued_at_unix | gauge | Unix timestamp of the license issue date.
coderd_license_time_until_expires_days | gauge | Number of days until the license expires.
coderd_license_user_count | gauge | Number of active (non-dormant) users.
coderd_license_user_limit | gauge | Number of users allowed by the license.
coderd_rtc_agent_listeners_concurrent | gauge | The total number of concurrent RTC agent listener websockets.
coderd_rtc_client_connections_total | counter | The total number of RTC client connections.
coderd_rtc_turn_connections_concurrent | gauge | The number of concurrent TURN connections.
coderd_rtc_turn_connections_total | counter | The total number of TURN connections opened.
coderd_rtc_workspace_connections_current | gauge | The number of concurrent wsnet workspace connections.
coderd_rtc_workspace_connections_total | counter | The total number of wsnet workspace connections opened.
go_gc_cycles_automatic_gc_cycles_total | counter | Count of completed GC cycles generated by the Go runtime.
go_gc_cycles_forced_gc_cycles_total | counter | Count of completed GC cycles forced by the application.
go_gc_cycles_total_gc_cycles_total | counter | Count of all completed GC cycles.
go_gc_duration_seconds | summary | A summary of the pause duration of garbage collection cycles.
go_gc_heap_allocs_by_size_bytes | histogram | Distribution of heap allocations by approximate size. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
go_gc_heap_allocs_bytes_total | counter | Cumulative sum of memory allocated to the heap by the application.
go_gc_heap_allocs_objects_total | counter | Cumulative count of heap allocations triggered by the application. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
go_gc_heap_frees_by_size_bytes | histogram | Distribution of freed heap allocations by approximate size. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
go_gc_heap_frees_bytes_total | counter | Cumulative sum of heap memory freed by the garbage collector.
go_gc_heap_frees_objects_total | counter | Cumulative count of heap allocations whose storage was freed by the garbage collector. Note that this does not include tiny objects as defined by /gc/heap/tiny/allocs:objects, only tiny blocks.
go_gc_heap_goal_bytes | gauge | Heap size target for the end of the GC cycle.
go_gc_heap_objects_objects | gauge | Number of objects, live or unswept, occupying heap memory.
go_gc_heap_tiny_allocs_objects_total | counter | Count of small allocations that are packed together into blocks. These allocations are counted separately from other allocations because each individual allocation is not tracked by the runtime, only their block. Each block is already accounted for in allocs-by-size and frees-by-size.
go_gc_pauses_seconds | histogram | Distribution individual GC-related stop-the-world pause latencies.
go_goroutines | gauge | Number of goroutines that currently exist.
go_info | gauge | Information about the Go environment.
go_memory_classes_heap_free_bytes | gauge | Memory that is completely free and eligible to be returned to the underlying system, but has not been. This metric is the runtime's estimate of free address space that is backed by physical memory.
go_memory_classes_heap_objects_bytes | gauge | Memory occupied by live objects and dead objects that have not yet been marked free by the garbage collector.
go_memory_classes_heap_released_bytes | gauge | Memory that is completely free and has been returned to the underlying system. This metric is the runtime's estimate of free address space that is still mapped into the process, but is not backed by physical memory.
go_memory_classes_heap_stacks_bytes | gauge | Memory allocated from the heap that is reserved for stack space, whether or not it is currently in-use.
go_memory_classes_heap_unused_bytes | gauge | Memory that is reserved for heap objects but is not currently used to hold heap objects.
go_memory_classes_metadata_mcache_free_bytes | gauge | Memory that is reserved for runtime mcache structures, but not in-use.
go_memory_classes_metadata_mcache_inuse_bytes | gauge | Memory that is occupied by runtime mcache structures that are currently being used.
go_memory_classes_metadata_mspan_free_bytes | gauge | Memory that is reserved for runtime mspan structures, but not in-use.
go_memory_classes_metadata_mspan_inuse_bytes | gauge | Memory that is occupied by runtime mspan structures that are currently being used.
go_memory_classes_metadata_other_bytes | gauge | Memory that is reserved for or used to hold runtime metadata.
go_memory_classes_os_stacks_bytes | gauge | Stack memory allocated by the underlying operating system.
go_memory_classes_other_bytes | gauge | Memory used by execution trace buffers, structures for debugging the runtime, finalizer and profiler specials, and more.
go_memory_classes_profiling_buckets_bytes | gauge | Memory that is used by the stack trace hash map used for profiling.
go_memory_classes_total_bytes | gauge | All memory mapped by the Go runtime into the current process as read-write. Note that this does not include memory mapped by code called via cgo or via the syscall package. Sum of all metrics in /memory/classes.
go_memstats_alloc_bytes | gauge | Number of bytes allocated and still in use.
go_memstats_alloc_bytes_total | counter | Total number of bytes allocated, even if freed.
go_memstats_buck_hash_sys_bytes | gauge | Number of bytes used by the profiling bucket hash table.
go_memstats_frees_total | counter | Total number of frees.
go_memstats_gc_sys_bytes | gauge | Number of bytes used for garbage collection system metadata.
go_memstats_heap_alloc_bytes | gauge | Number of heap bytes allocated and still in use.
go_memstats_heap_idle_bytes | gauge | Number of heap bytes waiting to be used.
go_memstats_heap_inuse_bytes | gauge | Number of heap bytes that are in use.
go_memstats_heap_objects | gauge | Number of allocated objects.
go_memstats_heap_released_bytes | gauge | Number of heap bytes released to OS.
go_memstats_heap_sys_bytes | gauge | Number of heap bytes obtained from system.
go_memstats_last_gc_time_seconds | gauge | Number of seconds since 1970 of last garbage collection.
go_memstats_lookups_total | counter | Total number of pointer lookups.
go_memstats_mallocs_total | counter | Total number of mallocs.
go_memstats_mcache_inuse_bytes | gauge | Number of bytes in use by mcache structures.
go_memstats_mcache_sys_bytes | gauge | Number of bytes used for mcache structures obtained from system.
go_memstats_mspan_inuse_bytes | gauge | Number of bytes in use by mspan structures.
go_memstats_mspan_sys_bytes | gauge | Number of bytes used for mspan structures obtained from system.
go_memstats_next_gc_bytes | gauge | Number of heap bytes when next garbage collection will take place.
go_memstats_other_sys_bytes | gauge | Number of bytes used for other system allocations.
go_memstats_stack_inuse_bytes | gauge | Number of bytes in use by the stack allocator.
go_memstats_stack_sys_bytes | gauge | Number of bytes obtained from system for stack allocator.
go_memstats_sys_bytes | gauge | Number of bytes obtained from system.
go_sched_goroutines_goroutines | gauge | Count of live goroutines.
go_sched_latencies_seconds | histogram | Distribution of the time goroutines have spent in the scheduler in a runnable state before actually running.
go_sql_idle_connections | gauge | The number of idle connections.
go_sql_in_use_connections | gauge | The number of connections currently in use.
go_sql_max_idle_closed_total | counter | The total number of connections closed due to SetMaxIdleConns.
go_sql_max_idle_time_closed_total | counter | The total number of connections closed due to SetConnMaxIdleTime.
go_sql_max_lifetime_closed_total | counter | The total number of connections closed due to SetConnMaxLifetime.
go_sql_max_open_connections | gauge | Maximum number of open connections to the database.
go_sql_open_connections | gauge | The number of established connections both in use and idle.
go_sql_wait_count_total | counter | The total number of connections waited for.
go_sql_wait_duration_seconds_total | counter | The total time blocked waiting for a new connection.
go_threads | gauge | Number of OS threads created.
process_cpu_seconds_total | counter | Total user and system CPU time spent in seconds.
process_max_fds | gauge | Maximum number of open file descriptors.
process_open_fds | gauge | Number of open file descriptors.
process_resident_memory_bytes | gauge | Resident memory size in bytes.
process_start_time_seconds | gauge | Start time of the process since unix epoch in seconds.
process_virtual_memory_bytes | gauge | Virtual memory size in bytes.
process_virtual_memory_max_bytes | gauge | Maximum amount of virtual memory available in bytes.
promhttp_metric_handler_requests_in_flight | gauge | Current number of scrapes being served.
promhttp_metric_handler_requests_total | counter | Total number of scrapes by HTTP status code.
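
As an illustration of how these metrics might be used, the sketch below defines two alerts on the license gauges listed above. The metric names come from the list; everything else (the PrometheusRule name, group name, alert names, thresholds, durations, and severity labels) is an assumption to adapt to your own deployment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: coder-license-alerts    # hypothetical name
  namespace: coder
spec:
  groups:
    - name: coder-license       # hypothetical group name
      rules:
        # Fire when the license expires in fewer than 30 days
        # (the threshold is an arbitrary example value).
        - alert: CoderLicenseExpiringSoon
          expr: coderd_license_time_until_expires_days < 30
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "The Coder license expires in fewer than 30 days."
        # Fire when active users exceed 90% of the licensed seats
        # (again an example threshold).
        - alert: CoderLicenseSeatsNearlyExhausted
          expr: coderd_license_user_count / coderd_license_user_limit > 0.9
          for: 1h
          labels:
            severity: warning
          annotations:
            summary: "More than 90% of licensed Coder users are in use."
```

Depending on how your Prometheus resource selects rules, the PrometheusRule may also need labels that match its ruleSelector.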