Naming Conventions for Data Import
You can import data into TiDB Cloud in the following file formats: CSV, Parquet, Aurora Snapshot, and SQL. To make sure that your data is imported successfully, you need to prepare the following two types of files:
- Schema file. Prepare the database schema file (optional) and the table schema file, both in SQL format (.sql). If the table schema file is not provided, you need to create the corresponding table manually in the target database in advance.
- Data file. Prepare a data file that conforms to the naming conventions for importing data. If the data file name cannot meet the requirements, use the File Pattern feature to perform the import task. Otherwise, the import task cannot scan the data files you want to import.
Naming conventions for schema files
This section describes the naming conventions for database and table schema files. The naming conventions for schema files are the same for all the following types of source files: CSV, Parquet, Aurora Snapshot, and SQL.
The naming conventions for schema files are as follows:
- Database schema file (optional):
${db_name}-schema-create.sql
- Table schema file:
${db_name}.${table_name}-schema.sql
The following is an example of a database schema file:
Name:
import_db-schema-create.sql
File content:
CREATE DATABASE import_db;
The following is an example of a table schema file:
Name:
import_db.test_table-schema.sql
File content:
CREATE TABLE test_table ( id INTEGER PRIMARY KEY, val VARCHAR(255) );
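The two naming rules above can be sketched as a small helper. This is only an illustration: the function name schema_file_names is hypothetical and is not part of any TiDB Cloud tooling.

```python
from typing import Tuple


def schema_file_names(db_name: str, table_name: str) -> Tuple[str, str]:
    """Return (database schema file name, table schema file name).

    Hypothetical helper that applies the two templates literally:
      ${db_name}-schema-create.sql
      ${db_name}.${table_name}-schema.sql
    """
    db_file = f"{db_name}-schema-create.sql"
    table_file = f"{db_name}.{table_name}-schema.sql"
    return db_file, table_file


db_file, table_file = schema_file_names("import_db", "test_table")
print(db_file)     # import_db-schema-create.sql
print(table_file)  # import_db.test_table-schema.sql
```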
Naming conventions for data files
This section describes the naming conventions for data files. The conventions differ by source file type.
CSV
When you import CSV files, name the data files as follows:
${db_name}.${table_name}[.XXXXXX].csv
([.XXXXXX] is optional)
For example:
import_db.test_table.csv
import_db.test_table.01.csv
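To check a CSV file name before uploading, a regular expression along the following lines may help. It is a sketch under two assumptions that the convention above does not spell out: database and table names contain no dots, and the optional [.XXXXXX] part is any non-empty token without dots. The name parse_csv_name is hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumptions (not stated in the convention): db/table names contain no dots,
# and the optional middle part is any dot-free, non-empty token.
CSV_NAME = re.compile(
    r"^(?P<db>[^./]+)\.(?P<table>[^./]+)(?:\.(?P<part>[^./]+))?\.csv$"
)


def parse_csv_name(file_name: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (db, table, part) if the name fits the CSV convention, else None."""
    match = CSV_NAME.match(file_name)
    if match is None:
        return None
    return match.group("db"), match.group("table"), match.group("part")


print(parse_csv_name("import_db.test_table.csv"))     # ('import_db', 'test_table', None)
print(parse_csv_name("import_db.test_table.01.csv"))  # ('import_db', 'test_table', '01')
print(parse_csv_name("export.csv"))                   # None
```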
Parquet
When you import Parquet files, name the data files as follows:
${db_name}.${table_name}[.XXXXXX].parquet[.{snappy|gz|lzo}]
([.XXXXXX] and [.{snappy|gz|lzo}] are optional)
For example:
import_db.test_table.parquet
import_db.test_table.01.parquet
import_db.test_table.parquet.gz
import_db.test_table.01.parquet.gz
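The Parquet pattern has two optional parts, which makes it easy to misread. The sketch below splits a name into its components. As assumptions, database and table names contain no dots and the [.XXXXXX] part is any dot-free token. The function name parse_parquet_name is hypothetical.

```python
import re
from typing import Dict, Optional

# Assumptions: db/table names contain no dots; the optional part is dot-free.
# The compression suffix is limited to the three values listed in the convention.
PARQUET_NAME = re.compile(
    r"^(?P<db>[^./]+)\.(?P<table>[^./]+)"
    r"(?:\.(?P<part>[^./]+))?"
    r"\.parquet"
    r"(?:\.(?P<compression>snappy|gz|lzo))?$"
)


def parse_parquet_name(file_name: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named components of a conforming Parquet file name, else None."""
    match = PARQUET_NAME.match(file_name)
    return match.groupdict() if match else None


for name in (
    "import_db.test_table.parquet",
    "import_db.test_table.01.parquet.gz",
    "import_db.test_table.parquet.zip",  # unsupported suffix, so None
):
    print(name, "->", parse_parquet_name(name))
```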
Aurora Snapshot
For Aurora Snapshot files, all files with the .parquet suffix in the ${db_name}.${table_name}/ folder conform to the naming convention. A data file name can contain any prefix consisting of the characters a-z, 0-9, -, _, and ., followed by the .parquet suffix.
For example:
import_db.test_table/mydata.parquet
import_db.test_table/part001/mydata.parquet
import_db.test_table/part002/mydata-part002.parquet
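A rough way to check an Aurora Snapshot path against this rule is sketched below. It makes several assumptions: paths are relative to the import root, the first path component is the ${db_name}.${table_name} folder, the database name contains no dots, and the allowed character list is applied literally (lowercase only). The name aurora_target is hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional, Tuple

# File name: characters from "a-z, 0-9, -, _, ." followed by ".parquet".
FILE_NAME = re.compile(r"^[a-z0-9._-]+\.parquet$")


def aurora_target(path: str) -> Optional[Tuple[str, str]]:
    """Return (db, table) for a conforming snapshot path, else None."""
    parts = PurePosixPath(path).parts
    # Need at least "<db>.<table>/<file>.parquet"; subfolders in between are allowed.
    if len(parts) < 2 or not FILE_NAME.match(parts[-1]):
        return None
    # Assumption: the database name has no dots, so split at the first dot.
    db, sep, table = parts[0].partition(".")
    if not sep or not db or not table:
        return None
    return db, table


print(aurora_target("import_db.test_table/mydata.parquet"))
print(aurora_target("import_db.test_table/part002/mydata-part002.parquet"))
print(aurora_target("mydata.parquet"))  # no table folder, so None
```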
SQL
When you import SQL files, name the data files as follows:
${db_name}.${table_name}[.XXXXXXX].sql
([.XXXXXXX] is optional)
For example:
import_db.test_table.sql
import_db.test_table.01.sql
If the SQL file is exported through TiDB Dumpling with the default configuration, it conforms to the naming convention by default.
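Because schema files and SQL data files share the .sql extension, it can be useful to tell them apart by name before starting an import. The following sketch classifies a .sql file name using the conventions described in this document. It assumes database and table names contain no dots, and the name classify_sql_file is hypothetical.

```python
import re

# Assumption: db/table names contain no dots. Patterns follow the templates
# ${db_name}-schema-create.sql, ${db_name}.${table_name}-schema.sql and
# ${db_name}.${table_name}[.XXXXXXX].sql.
DB_SCHEMA = re.compile(r"^[^./]+-schema-create\.sql$")
TABLE_SCHEMA = re.compile(r"^[^./]+\.[^./]+-schema\.sql$")
DATA_FILE = re.compile(r"^[^./]+\.[^./]+(?:\.[^./]+)?\.sql$")


def classify_sql_file(file_name: str) -> str:
    """Classify a .sql file name as a schema file, a data file, or unrecognized."""
    # Schema patterns are checked first: a table schema name would otherwise
    # also match the more general data-file pattern.
    if DB_SCHEMA.match(file_name):
        return "database schema"
    if TABLE_SCHEMA.match(file_name):
        return "table schema"
    if DATA_FILE.match(file_name):
        return "data"
    return "unrecognized"


for name in (
    "import_db-schema-create.sql",
    "import_db.test_table-schema.sql",
    "import_db.test_table.01.sql",
    "notes.sql",
):
    print(name, "->", classify_sql_file(name))
```

One consequence of checking schema patterns first is that a table whose name ends in -schema would be read as a schema file, so such names are best avoided in this sketch.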
File pattern
If a CSV or Parquet source data file does not conform to the naming convention, you can use the file pattern feature to map the source data file name to the target table. This feature does not support Aurora Snapshot and SQL data files.
- For CSV files, see File Pattern in Step 4. Import CSV files to TiDB Cloud
- For Parquet files, see File Pattern in Step 4. Import Parquet files to TiDB Cloud