Integrate TiDB Cloud with Prometheus and Grafana
TiDB Cloud provides a Prometheus API endpoint. If you have a Prometheus service, you can monitor key metrics of TiDB Cloud from the endpoint easily.
This document describes how to configure your Prometheus service to read key metrics from the TiDB Cloud endpoint and how to view the metrics using Grafana.
Prerequisites
To integrate TiDB Cloud with Prometheus, you must have a self-hosted or managed Prometheus service.
To edit third-party integration settings of TiDB Cloud, you must have Organization Owner access to your organization or Project Member access to the target project in TiDB Cloud.
Limitation
You cannot use the Prometheus and Grafana integration in Serverless Tier.
Steps
Step 1. Get a scrape_config file for Prometheus
Before configuring your Prometheus service to read metrics of TiDB Cloud, you need to generate a scrape_config YAML file in TiDB Cloud first. The scrape_config file contains a unique bearer token that allows the Prometheus service to monitor any database clusters in the current project.
To get the scrape_config file for Prometheus, do the following:
Log in to the TiDB Cloud console.
In the left navigation pane of the Clusters page, do one of the following:
- If you have multiple projects, switch to the target project, and then click Admin > Integrations.
- If you only have one project, click Admin > Integrations.
Click Integration to Prometheus.
Click Add File to generate and show the scrape_config file for the current project.
Make a copy of the scrape_config file content for later use.
Step 2. Integrate with Prometheus
In the monitoring directory specified by your Prometheus service, locate the Prometheus configuration file. For example, /etc/prometheus/prometheus.yml.
In the Prometheus configuration file, locate the scrape_configs section, and then copy the scrape_config file content obtained from TiDB Cloud to the section.
In your Prometheus service, check Status > Targets to confirm that the new scrape_config file has been read. If not, you might need to restart the Prometheus service.
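The Status > Targets check can also be scripted against the Prometheus HTTP API. The following is a minimal sketch, not an official tool: it assumes Prometheus is reachable at http://localhost:9090, and the job name "tidbcloud" in the comments is a hypothetical placeholder for the job_name in your generated scrape_config file.

```python
import json
import urllib.request

# Assumption: Prometheus listens on its default port on the local machine.
PROMETHEUS_URL = "http://localhost:9090"


def fetch_targets(base_url=PROMETHEUS_URL):
    """Return the parsed JSON body of the Prometheus /api/v1/targets endpoint."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=10) as resp:
        return json.load(resp)


def job_health(payload, job_name):
    """Map each scrape URL of the given job to its reported health.

    Prometheus reports health as "up", "down", or "unknown".
    An empty result means no active target belongs to the job.
    """
    targets = payload.get("data", {}).get("activeTargets", [])
    return {
        target.get("scrapeUrl", "?"): target.get("health", "unknown")
        for target in targets
        if target.get("labels", {}).get("job") == job_name
    }


def main():
    # "tidbcloud" is a hypothetical job name; replace it with your own.
    for url, health in job_health(fetch_targets(), "tidbcloud").items():
        print(f"{health:8} {url}")
```

If job_health returns an empty dictionary for your job name, Prometheus has probably not loaded the new configuration yet.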
Step 3. Use Grafana GUI dashboards to visualize the metrics
After your Prometheus service is reading metrics from TiDB Cloud, you can use Grafana GUI dashboards to visualize the metrics as follows:
- Download the Grafana dashboard JSON of TiDB Cloud here.
- Import this JSON to your own Grafana GUI to visualize the metrics.
- (Optional) Customize the dashboard as needed by adding or removing panels, changing data sources, and modifying display options.
For more information about how to use Grafana, see Grafana documentation.
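If you prefer to script the import instead of using the Grafana GUI, the Grafana HTTP API accepts a dashboard model through POST /api/dashboards/db. The sketch below is one possible approach under stated assumptions: grafana_url and api_token are values you supply, and the downloaded JSON is assumed to be a plain dashboard model. If the file contains data source placeholders, importing through the GUI may be the simpler route.

```python
import json
import urllib.request


def build_import_payload(dashboard, overwrite=True):
    """Wrap a dashboard model in the request body for POST /api/dashboards/db."""
    model = dict(dashboard)  # shallow copy so the caller's dict is not modified
    model["id"] = None       # a null id asks Grafana to create the dashboard
    return {"dashboard": model, "overwrite": overwrite}


def import_dashboard(grafana_url, api_token, dashboard):
    """Send the dashboard to Grafana and return the parsed JSON response.

    grafana_url and api_token are supplied by the caller; neither comes
    from TiDB Cloud.
    """
    body = json.dumps(build_import_payload(dashboard)).encode("utf-8")
    request = urllib.request.Request(
        f"{grafana_url}/api/dashboards/db",
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)
```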
Best practice of rotating scrape_config
To improve data security, it is a general best practice to periodically rotate scrape_config file bearer tokens.
- Follow Step 1 to create a new scrape_config file for Prometheus.
- Add the content of the new file to your Prometheus configuration file.
- Once you have confirmed that your Prometheus service is still able to read from TiDB Cloud, remove the content of the old scrape_config file from your Prometheus configuration file.
- On the Integration page of your project, delete the corresponding old scrape_config file to block anyone else from using it to read from the TiDB Cloud Prometheus endpoint.
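The add-then-remove order of the rotation can be illustrated on an already parsed Prometheus configuration. This is a simplified sketch: it assumes the configuration file has been loaded into a Python dictionary (for example, with a YAML library), and the job names used in the comments are hypothetical. Prometheus requires job_name values to be unique, so the sketch rejects a duplicate.

```python
def add_job(config, new_job):
    """Return a copy of the config with new_job appended to scrape_configs."""
    jobs = list(config.get("scrape_configs", []))
    if any(job.get("job_name") == new_job["job_name"] for job in jobs):
        raise ValueError(f"job {new_job['job_name']!r} already exists")
    return {**config, "scrape_configs": jobs + [new_job]}


def remove_job(config, job_name):
    """Return a copy of the config without the scrape job named job_name."""
    jobs = [
        job for job in config.get("scrape_configs", [])
        if job.get("job_name") != job_name
    ]
    return {**config, "scrape_configs": jobs}


# Rotation order, using hypothetical job names:
#   1. both = add_job(current, new_job)          # old and new tokens both active
#   2. reload Prometheus and confirm that the new job is being scraped
#   3. final = remove_job(both, "tidbcloud-old") # retire the old token
```

Keeping both jobs active between steps 1 and 3 avoids a gap in metrics if the new token does not work as expected.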
Metrics available to Prometheus
Prometheus tracks the following metric data for your TiDB clusters.
Metric name | Metric type | Labels | Description |
---|---|---|---|
tidbcloud_db_queries_total | counter | sql_type: Select\|Insert\|... cluster_name: <cluster name> instance: tidb-0\|tidb-1… component: tidb | The total number of statements executed |
tidbcloud_db_failed_queries_total | counter | type: planner:xxx\|executor:2345\|... cluster_name: <cluster name> instance: tidb-0\|tidb-1… component: tidb | The total number of execution errors |
tidbcloud_db_connections | gauge | cluster_name: <cluster name> instance: tidb-0\|tidb-1… component: tidb | Current number of connections in your TiDB server |
tidbcloud_db_query_duration_seconds | histogram | sql_type: Select\|Insert\|... cluster_name: <cluster name> instance: tidb-0\|tidb-1… component: tidb | The duration histogram of statements |
tidbcloud_node_storage_used_bytes | gauge | cluster_name: <cluster name> instance: tikv-0\|tikv-1…\|tiflash-0\|tiflash-1… component: tikv\|tiflash | The disk usage bytes of TiKV/TiFlash nodes |
tidbcloud_node_storage_capacity_bytes | gauge | cluster_name: <cluster name> instance: tikv-0\|tikv-1…\|tiflash-0\|tiflash-1… component: tikv\|tiflash | The disk capacity bytes of TiKV/TiFlash nodes |
tidbcloud_node_cpu_seconds_total | counter | cluster_name: <cluster name> instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0… component: tidb\|tikv\|tiflash | The CPU usage of TiDB/TiKV/TiFlash nodes |
tidbcloud_node_cpu_capacity_cores | gauge | cluster_name: <cluster name> instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0… component: tidb\|tikv\|tiflash | The CPU limit cores of TiDB/TiKV/TiFlash nodes |
tidbcloud_node_memory_used_bytes | gauge | cluster_name: <cluster name> instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0… component: tidb\|tikv\|tiflash | The used memory bytes of TiDB/TiKV/TiFlash nodes |
tidbcloud_node_memory_capacity_bytes | gauge | cluster_name: <cluster name> instance: tidb-0\|tidb-1…\|tikv-0…\|tiflash-0… component: tidb\|tikv\|tiflash | The memory capacity bytes of TiDB/TiKV/TiFlash nodes |
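Once the metrics are in Prometheus, they can be combined in PromQL. The sketch below builds a few example queries from the metric and label names in the table and runs them through the Prometheus HTTP API. Several things here are assumptions: the Prometheus address, the cluster name you pass in, the 5-minute rate window, and the _bucket suffix, which is the standard Prometheus naming for histogram series but is not confirmed by the table.

```python
import json
import urllib.parse
import urllib.request


def _selector(cluster_name):
    """Build a label selector that restricts a query to one cluster."""
    return f'{{cluster_name="{cluster_name}"}}'


def qps_query(cluster_name, window="5m"):
    """Statements per second, grouped by SQL type."""
    sel = _selector(cluster_name)
    return f"sum by (sql_type) (rate(tidbcloud_db_queries_total{sel}[{window}]))"


def storage_usage_ratio_query(cluster_name):
    """Fraction of disk capacity in use on each TiKV/TiFlash node."""
    sel = _selector(cluster_name)
    return (
        f"tidbcloud_node_storage_used_bytes{sel}"
        f" / tidbcloud_node_storage_capacity_bytes{sel}"
    )


def p99_latency_query(cluster_name, window="5m"):
    """Approximate 99th percentile statement duration in seconds.

    Assumes the histogram is exposed with the standard _bucket series.
    """
    sel = _selector(cluster_name)
    return (
        "histogram_quantile(0.99, sum by (le) "
        f"(rate(tidbcloud_db_query_duration_seconds_bucket{sel}[{window}])))"
    )


def instant_query(promql, base_url="http://localhost:9090"):
    """Run an instant query and return the list of result samples."""
    params = urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(f"{base_url}/api/v1/query?{params}", timeout=10) as resp:
        return json.load(resp)["data"]["result"]
```

The same query strings can be pasted into a Grafana panel or the Prometheus expression browser.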