Migrate from MySQL-Compatible Databases to TiDB Cloud Using AWS DMS
If you want to migrate heterogeneous databases, such as PostgreSQL, Oracle, and SQL Server to TiDB Cloud, it is recommended to use AWS Database Migration Service (AWS DMS).
AWS DMS is a cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into TiDB Cloud.
This document uses Amazon RDS as an example to show how to migrate data to TiDB Cloud using AWS DMS. The procedure also applies to migrating data from self-hosted MySQL databases or Amazon Aurora to TiDB Cloud.
In this example, the data source is Amazon RDS, and the data destination is a Dedicated Tier cluster in TiDB Cloud. Both upstream and downstream databases are in the same region.
Prerequisites
Before you start the migration, make sure you have read the following:
- If the source database is Amazon RDS or Amazon Aurora, you need to set the
binlog_format
parameter toROW
. If the database uses the default parameter group, thebinlog_format
parameter isMIXED
by default and cannot be modified. In this case, you need to create a new parameter group, for examplenewset
, and set itsbinlog_format
toROW
. Then, modify the default parameter group tonewset
. Note that modifying the parameter group will restart the database. - Check and ensure that the source database uses collations that are compatible with TiDB. The default collation for the utf8mb4 character set in TiDB is
utf8mb4_bin
. But in MySQL 8.0, the default collation isutf8mb4_0900_ai_ci
. If the upstream MySQL uses the default collation, because TiDB is not compatible withutf8mb4_0900_ai_ci
, AWS DMS cannot create the target tables in TiDB and cannot migrate the data. To resolve this problem, you need to modify the collation of the source database toutf8mb4_bin
before the migration. For a complete list of TiDB supported character sets and collations, see Character Set and Collation. - TiDB contains the following system databases by default:
INFORMATION_SCHEMA
,PERFORMANCE_SCHEMA
,mysql
,sys
, andtest
. When you create an AWS DMS migration task, you need to filter out these system databases instead of using the default%
to select the migration object. Otherwise, AWS DMS will try to migrate these system databases from the source database to the target TiDB, which will cause the task to fail. To avoid this issue, it is recommended to fill in the specific database and table names. - Add the public and private network IP addresses of AWS DMS to the IP access lists of both source and target databases. Otherwise, the network connection might fail in some scenarios.
- Use VPC Peerings or Private Endpoint connections to connect AWS DMS and the TiDB cluster.
- It is recommended to use the same region for AWS DMS and the TiDB cluster to get better data writing performance.
- It is recommended to use AWS DMS
dms.t3.large
(2 vCPUs and 8 GiB memory) or a higher instance class. Small instance classes will possibly cause out of memory (OOM) errors. - AWS DMS will automatically create the
awsdms_control
database in the target database.
Limitation
AWS DMS does not support replicating DROP TABLE
.
Step 1. Create an AWS DMS replication instance
Go to the Replication instances page in the AWS DMS console, and switch to the corresponding region. It is recommended to use the same region for AWS DMS as TiDB Cloud. In this document, the upstream and downstream databases and the DMS instance are all in the us-west-2 region.
Click Create replication instance.
Fill in an instance name, ARN, and description.
Fill in the instance configuration:
- Instance class: select an appropriate instance class. It is recommended to use
dms.t3.large
or a higher instance class to get better performance. - Engine version: use the default configuration.
- Multi-AZ: select Single-AZ or Multi-AZ based on your business needs.
- Instance class: select an appropriate instance class. It is recommended to use
Configure the storage in the Allocated storage (GiB) field. Use the default configuration.
Configure connectivity and security.
- Network type - new: select IPv4.
- Virtual private cloud (VPC) for IPv4: select the VPC that you need. It is recommended to use the same VPC as the upstream database to simplify the network configuration.
- Replication subnet group: choose a subnet group for your replication instance.
- Public accessible: use the default configuration.
Configure the Advanced settings, Maintenance, and Tags if needed. Click Create replication instance to finish the instance creation.
Step 2. Create the source database endpoint
In the AWS DMS console, click the replication instance that you just created. Copy the public and private network IP addresses as shown in the following screenshot.
Configure the security group rules for Amazon RDS. In this example, add the public and private IP addresses of the AWS DMS instance to the security group.
Click Create endpoint to create the source database endpoint.
In this example, click Select RDS DB instance and then select the source RDS instance. If the source database is a self-hosted MySQL, you can skip this step and fill in the information in the following steps.
Configure the following information:
Endpoint identifier: create a label for the source endpoint to help you identify it in the subsequent task configuration.
Descriptive Amazon Resource Name (ARN) - optional: create a friendly name for the default DMS ARN.
Source engine: select MySQL.
Access to endpoint database: select Provide access information manually.
Server name: fill in the name of the data server for the data provider. You can copy it from the database console. If the upstream is Amazon RDS or Amazon Aurora, the name will be automatically filled in. If it is a self-hosted MySQL without a domain name, you can fill in the IP address.
Fill in the source database Port, Username, and Password.
Secure Socket Layer (SSL) mode: you can enable SSL mode as needed.
Use default values for Endpoint settings, KMS key, and Tags. In the Test endpoint connection (optional) section, it is recommended to select the same VPC as the source database to simplify the network configuration. Select the corresponding replication instance, and then click Run test. The status needs to be successful.
Click Create endpoint.
Step 3. Create the target database endpoint
In the AWS DMS console, click the replication instance that you just created. Copy the public and private network IP addresses as shown in the following screenshot.
In the TiDB Cloud console, go to the Clusters page, click the name of your target cluster, and then click Connect in the upper-right corner to get the TiDB Cloud database connection information.
Under Step 1: Create traffic filter in the dialog, click Edit, enter the public and private network IP addresses that you copied from the AWS DMS console, and then click Update Filter. It is recommended to add the public IP address and private IP address of the AWS DMS replication instance to the TiDB cluster traffic filter at the same time. Otherwise, AWS DMS might not be able to connect to the TiDB cluster in some scenarios.
Click Download TiDB cluster CA to download the CA certificate. Under Step 3: Connect with a SQL client in the dialog, take a note of the
-u
,-h
, and-P
information in the connection string for later use.Click the VPC Peering tab in the dialog, and then click Add under Step 1: Set up VPC to create a VPC Peering connection for the TiDB cluster and AWS DMS.
Configure the corresponding information. See Set Up VPC Peering Connections.
Configure the target endpoint for the TiDB cluster.
- Endpoint type: select Target endpoint.
- Endpoint identifier: fill in a name for the endpoint.
- Descriptive Amazon Resource Name (ARN) - optional: create a friendly name for the default DMS ARN.
- Target engine: select MySQL.
In the AWS DMS console, click Create endpoint to create the target database endpoint, and then configure the following information:
- Server name: fill in the hostname of your TiDB cluster, which is the
-h
information you have recorded. - Port: enter the port of your TiDB cluster, which is the
-P
information you have recorded. The default port of a TiDB cluster is 4000. - User name: enter the user name of your TiDB cluster, which is the
-u
information you have recorded. - Password: enter the password of your TiDB cluster.
- Secure Socket Layer (SSL) mode: select Verify-ca.
- Click Add new CA certificate to import the CA file downloaded from the TiDB Cloud console in the previous steps.
- Server name: fill in the hostname of your TiDB cluster, which is the
Import the CA file.
Use the default values for Endpoint settings, KMS key, and Tags. In the Test endpoint connection (optional) section, select the same VPC as the source database. Select the corresponding replication instance, and then click Run test. The status needs to be successful.
Click Create endpoint.
Step 4. Create a database migration task
In the AWS DMS console, go to the Data migration tasks page. Switch to your region. Then click Create task in the upper-right corner of the window.
Configure the following information:
- Task identifier: fill in a name for the task. It is recommended to use a name that is easy to remember.
- Descriptive Amazon Resource Name (ARN) - optional: create a friendly name for the default DMS ARN.
- Replication instance: select the AWS DMS instance that you just created.
- Source database endpoint: select the source database endpoint that you just created.
- Target database endpoint: select the target database endpoint that you just created.
- Migration type: select a migration type as needed. In this example, select Migrate existing data and replicate ongoing changes.
Configure the following information:
- Editing mode: select Wizard.
- Custom CDC stop mode for source transactions: use the default setting.
- Target table preparation mode: select Do nothing or other options as needed. In this example, select Do nothing.
- Stop task after full load completes: use the default setting.
- Include LOB columns in replication: select Limited LOB mode.
- Maximum LOB size in (KB): use the default value 32.
- Turn on validation: select it according to your needs.
- Task logs: select Turn on CloudWatch logs for troubleshooting in future. Use the default settings for the related configurations.
In the Table mappings section, specify the database to be migrated.
The schema name is the database name in the Amazon RDS instance. The default value of the Source name is "%", which means that all databases in the Amazon RDS will be migrated to TiDB. It will cause the system databases such as
mysql
andsys
in Amazon RDS to be migrated to the TiDB cluster, and result in task failure. Therefore, it is recommended to fill in the specific database name, or filter out all system databases. For example, according to the settings in the following screenshot, only the database namedfranktest
and all the tables in that database will be migrated.Click Create task in the lower-right corner.
Go back to the Data migration tasks page. Switch to your region. You can see the status and progress of the task.
If you encounter any issues or failures during the migration, you can check the log information in CloudWatch to troubleshoot the issues.
See also
If you want to migrate from MySQL-compatible databases, such as Aurora MySQL and Amazon Relational Database Service (RDS), to TiDB Cloud, it is recommended to use Data Migration on TiDB Cloud.
If you want to migrate from Amazon RDS for Oracle to TiDB Cloud Serverless Tier Using AWS DMS, see Migrate from Amazon RDS for Oracle to TiDB Cloud Serverless Tier Using AWS DMS.