Naming Conventions for Data Import

You can import data into TiDB Cloud in the following file formats: CSV, Parquet, Aurora Snapshot, and SQL. To make sure that your data is imported successfully, you need to prepare the following two types of files:

  • Schema file. Prepare the database schema file (optional) and the table schema file, both in SQL format (.sql). If the table schema file is not provided, you need to create the corresponding table manually in the target database in advance.
  • Data file. Prepare a data file that conforms to the naming conventions for importing data. If a data file name cannot meet these requirements, use the file pattern feature to perform the import task. Otherwise, the import task cannot find the data files you want to import.

Naming conventions for schema files

This section describes the naming conventions for database and table schema files. The naming conventions for schema files are the same for all the following types of source files: CSV, Parquet, Aurora Snapshot, and SQL.

The naming conventions for schema files are as follows:

  • Database schema file (optional): ${db_name}-schema-create.sql
  • Table schema file: ${db_name}.${table_name}-schema.sql

The following is an example of a database schema file:

  • Name: import_db-schema-create.sql

  • File content:

    CREATE DATABASE import_db;
    

The following is an example of a table schema file:

  • Name: import_db.test_table-schema.sql

  • File content:

    CREATE TABLE test_table (
        id INTEGER PRIMARY KEY,
        val VARCHAR(255)
    );
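
To make the two schema file patterns concrete, the following minimal Python sketch builds both file names from a database name and a table name. The helper name `schema_file_names` is hypothetical and is not part of TiDB Cloud or any TiDB tool.

```python
from typing import Tuple


def schema_file_names(db_name: str, table_name: str) -> Tuple[str, str]:
    """Return (database schema file name, table schema file name).

    Hypothetical helper: it only applies the two naming patterns
    ${db_name}-schema-create.sql and ${db_name}.${table_name}-schema.sql.
    """
    db_schema = f"{db_name}-schema-create.sql"
    table_schema = f"{db_name}.{table_name}-schema.sql"
    return db_schema, table_schema


if __name__ == "__main__":
    db_file, table_file = schema_file_names("import_db", "test_table")
    print(db_file)     # import_db-schema-create.sql
    print(table_file)  # import_db.test_table-schema.sql
```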
    

Naming conventions for data files

This section describes the naming conventions for data files. The conventions differ depending on the type of source file.

CSV

When you import CSV files, name the data files as follows:

  • ${db_name}.${table_name}[.XXXXXX].csv ([.XXXXXX] is optional)

For example:

  • import_db.test_table.csv
  • import_db.test_table.01.csv
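
Before uploading, you might want to check CSV file names locally. The following Python sketch does this with a regular expression. It rests on two assumptions that this page does not state: database and table names contain no dots, and the optional `[.XXXXXX]` segment is any dot-free run of characters. The names `CSV_NAME` and `parse_csv_name` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: db/table names contain no "." or "/", and the optional
# [.XXXXXX] segment is any dot-free run of characters.
CSV_NAME = re.compile(
    r"^(?P<db>[^./]+)\.(?P<table>[^./]+)(?:\.(?P<part>[^./]+))?\.csv$"
)


def parse_csv_name(file_name: str) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (db_name, table_name, optional segment), or None if no match."""
    m = CSV_NAME.match(file_name)
    if m is None:
        return None
    return m.group("db"), m.group("table"), m.group("part")


if __name__ == "__main__":
    print(parse_csv_name("import_db.test_table.csv"))     # ('import_db', 'test_table', None)
    print(parse_csv_name("import_db.test_table.01.csv"))  # ('import_db', 'test_table', '01')
    print(parse_csv_name("test_table.csv"))               # None
```

A name that returns `None` here is a candidate for the file pattern feature rather than the default naming convention.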

Parquet

When you import Parquet files, name the data files as follows:

  • ${db_name}.${table_name}[.XXXXXX].parquet[.{snappy|gz|lzo}] ([.XXXXXX] and [.{snappy|gz|lzo}] are optional)

For example:

  • import_db.test_table.parquet
  • import_db.test_table.01.parquet
  • import_db.test_table.parquet.gz
  • import_db.test_table.01.parquet.gz
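
The Parquet pattern adds an optional compression suffix after `.parquet`. The sketch below shows one way to express it as a regular expression. As before, it assumes dot-free database and table names and a dot-free optional segment, and the names `PARQUET_NAME` and `parse_parquet_name` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumption: db/table names and the optional [.XXXXXX] segment are dot-free.
# The compression suffix is limited to the three values listed in the pattern.
PARQUET_NAME = re.compile(
    r"^(?P<db>[^./]+)\.(?P<table>[^./]+)"
    r"(?:\.(?P<part>[^./]+))?"
    r"\.parquet"
    r"(?:\.(?P<codec>snappy|gz|lzo))?$"
)


def parse_parquet_name(
    file_name: str,
) -> Optional[Tuple[str, str, Optional[str], Optional[str]]]:
    """Return (db_name, table_name, optional segment, compression) or None."""
    m = PARQUET_NAME.match(file_name)
    if m is None:
        return None
    return m.group("db"), m.group("table"), m.group("part"), m.group("codec")


if __name__ == "__main__":
    for name in (
        "import_db.test_table.parquet",
        "import_db.test_table.01.parquet.gz",
        "import_db.test_table.parquet.zip",  # unsupported suffix, so no match
    ):
        print(name, "->", parse_parquet_name(name))
```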

Aurora Snapshot

For Aurora Snapshot files, every file with the .parquet suffix in the ${db_name}.${table_name}/ folder (including its subfolders, as in the examples below) conforms to the naming convention. A data file name can have any prefix made up of the characters a-z, 0-9, "-", "_", and ".", followed by the suffix ".parquet".

For example:

  • import_db.test_table/mydata.parquet
  • import_db.test_table/part001/mydata.parquet
  • import_db.test_table/part002/mydata-part002.parquet
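
For Aurora Snapshot files, the target table comes from the top-level folder name rather than from the file name. The Python sketch below illustrates that rule under stated assumptions: paths are relative to the import root and use "/" separators, the database name contains no dot, and the file name prefix is non-empty. The helper name `aurora_target` is hypothetical.

```python
import re
from pathlib import PurePosixPath
from typing import Optional, Tuple

# File name: a prefix of a-z, 0-9, "-", "_", "." followed by ".parquet".
# Assumption: the prefix must be non-empty.
AURORA_FILE = re.compile(r"^[a-z0-9._-]+\.parquet$")


def aurora_target(path: str) -> Optional[Tuple[str, str]]:
    """Return (db_name, table_name) taken from the top-level folder, or None."""
    parts = PurePosixPath(path).parts
    if len(parts) < 2:
        return None  # the file is not inside a ${db_name}.${table_name}/ folder
    # Assumption: the database name has no dot, so split at the first one.
    db, sep, table = parts[0].partition(".")
    if not sep or not db or not table:
        return None
    if AURORA_FILE.match(parts[-1]) is None:
        return None
    return db, table


if __name__ == "__main__":
    print(aurora_target("import_db.test_table/mydata.parquet"))
    # ('import_db', 'test_table')
    print(aurora_target("import_db.test_table/part002/mydata-part002.parquet"))
    # ('import_db', 'test_table')
    print(aurora_target("mydata.parquet"))
    # None
```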

SQL

When you import SQL files, name the data files as follows:

  • ${db_name}.${table_name}[.XXXXXX].sql ([.XXXXXX] is optional)

For example:

  • import_db.test_table.sql
  • import_db.test_table.01.sql

If the SQL file is exported through TiDB Dumpling with the default configuration, it conforms to the naming convention by default.
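
Because schema files and SQL data files share the .sql extension, it can help to tell them apart by name before an import. The following Python sketch classifies a file name against the three .sql patterns on this page. It assumes dot-free database and table names and checks the schema patterns first, so a table whose name ends in "-schema" would be misclassified. The function name `classify_sql_file` is hypothetical.

```python
import re

# Assumption: db/table names and the optional segment contain no "." or "/".
DB_SCHEMA = re.compile(r"^[^./]+-schema-create\.sql$")
TABLE_SCHEMA = re.compile(r"^[^./]+\.[^./]+-schema\.sql$")
DATA_FILE = re.compile(r"^[^./]+\.[^./]+(?:\.[^./]+)?\.sql$")


def classify_sql_file(file_name: str) -> str:
    """Classify a .sql file name as a schema file, a data file, or neither."""
    # Schema patterns are checked first because a table schema file name
    # would otherwise also match the more general data file pattern.
    if DB_SCHEMA.match(file_name):
        return "database schema"
    if TABLE_SCHEMA.match(file_name):
        return "table schema"
    if DATA_FILE.match(file_name):
        return "data"
    return "unrecognized"


if __name__ == "__main__":
    for name in (
        "import_db-schema-create.sql",
        "import_db.test_table-schema.sql",
        "import_db.test_table.01.sql",
        "notes.sql",
    ):
        print(f"{name}: {classify_sql_file(name)}")
```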

File pattern

If a CSV or Parquet source data file does not conform to the naming convention, you can use the file pattern feature to map the source data file name to the target table. This feature does not support Aurora Snapshot or SQL data files.
