
Monitoring your ClickHouse Cloud deployment

Overview

This guide describes the monitoring and observability capabilities available to enterprise teams running production deployments of ClickHouse Cloud. Enterprise customers frequently ask about out-of-the-box monitoring features, integration with existing observability stacks such as Datadog and AWS CloudWatch, and how monitoring in ClickHouse Cloud compares to self-hosted deployments.

You can monitor your ClickHouse Cloud deployment using the following methods:

| Section | Description | Wakes idle services? | Setup required |
| --- | --- | --- | --- |
| Cloud Console dashboards | Day-to-day monitoring with built-in dashboards for service health, resource utilization, and query performance | No | None |
| Notifications | Alerts for scaling events, errors, mutations, and billing | No | None (customizable) |
| Prometheus endpoint | Export metrics to Grafana, Datadog, or other Prometheus-compatible tools | No | API key + scraper config |
| System table queries | Deep debugging and custom analysis via direct SQL queries against system tables | Yes | SQL queries |
| Community and partner integrations | Datadog agent integration, community monitoring tools, and the Billing & Usage API | Varies | Tool-specific |
| Advanced dashboard reference | Detailed reference for each advanced dashboard visualization, including troubleshooting examples | No | None |

Quick start

Open the Monitoring tab in the ClickHouse Cloud console. This blog captures common things to watch out for when getting started.

For most users, the Cloud Console dashboards provide everything needed to monitor service health, resource utilization, and query performance without any configuration. If you need to integrate with an external monitoring stack, start with the Prometheus-compatible metrics endpoint.
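As a rough illustration of what the "API key + scraper config" setup involves, the Python sketch below builds an authenticated request for the metrics endpoint and parses a Prometheus text response. The URL shape, the use of HTTP Basic authentication with an API key ID and secret, and the function names are assumptions made for this example; confirm them against the Prometheus endpoint documentation before relying on them.

```python
import base64
import urllib.request

# Assumed URL shape for the Prometheus-compatible endpoint (not confirmed by
# this page); check the Prometheus endpoint documentation for the exact form.
ENDPOINT_TEMPLATE = (
    "https://api.clickhouse.cloud/v1/organizations/{org_id}"
    "/services/{service_id}/prometheus"
)


def build_scrape_request(org_id, service_id, key_id, key_secret):
    """Build (but do not send) an authenticated GET request for the endpoint.

    Assumes the API key ID and secret are passed as HTTP Basic credentials.
    """
    url = ENDPOINT_TEMPLATE.format(org_id=org_id, service_id=service_id)
    token = base64.b64encode(f"{key_id}:{key_secret}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def parse_metrics(text):
    """Parse Prometheus text exposition lines into a {series: value} dict.

    Comment lines (# HELP, # TYPE) are skipped. Assumes samples carry no
    trailing timestamp, so the last space-separated field is the value.
    """
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        series, _, value = line.rpartition(" ")
        metrics[series] = float(value)
    return metrics
```

To fetch live metrics, pass the request to `urllib.request.urlopen` and feed the decoded response body to `parse_metrics`. In practice a Prometheus server or agent would do the scraping; the sketch only shows which pieces of configuration are needed.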

System impact considerations

The approaches above fall into three groups: those that rely on the Prometheus endpoint, those managed by ClickHouse Cloud, and those that query system tables directly. The last of these queries the production ClickHouse service itself, which adds query load to the system under observation and prevents ClickHouse Cloud instances from idling, which can increase costs. Additionally, because monitoring is coupled to the production system, monitoring may also be affected if that system fails.

Querying system tables directly works well for deep introspection and debugging but is less appropriate for real-time production monitoring. The Cloud Console dashboards and the Prometheus endpoint both use pre-scraped metrics that do not wake idle services, making them better suited for ongoing production monitoring. Weigh the depth of analysis that system tables offer against the operational overhead they add when choosing an approach.
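For the debugging case, a minimal Python sketch of an ad-hoc system table query might look like the following. It builds a request to the ClickHouse HTTP interface that lists the slowest finished queries from `system.query_log` over the last hour. The host name, credentials, and function name are hypothetical placeholders, and port 8443 is assumed for the HTTPS interface. Sending this request runs a query on the service, so it adds load and wakes an idle service.

```python
import urllib.request

# Slowest finished queries in the last hour, read from system.query_log.
# Running this executes SQL on the production service (it will wake an
# idle service), so treat it as a debugging aid, not a polling loop.
SLOW_QUERIES_SQL = """
SELECT event_time, query_duration_ms, memory_usage, query
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time > now() - INTERVAL 1 HOUR
ORDER BY query_duration_ms DESC
LIMIT 10
FORMAT JSONEachRow
"""


def build_query_request(host, user, password, sql=SLOW_QUERIES_SQL):
    """Build (but do not send) a POST to the ClickHouse HTTP interface.

    `host`, `user`, and `password` are placeholders supplied by the caller;
    port 8443 is an assumption for the HTTPS endpoint.
    """
    return urllib.request.Request(
        f"https://{host}:8443/",
        data=sql.encode("utf-8"),
        headers={"X-ClickHouse-User": user, "X-ClickHouse-Key": password},
        method="POST",
    )
```

Pass the request to `urllib.request.urlopen` to run it. On a service with several replicas, a query like this may only reflect the replica that handled the request.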