diff --git a/tidb-cloud/import-csv-files-serverless.md b/tidb-cloud/import-csv-files-serverless.md
index a38102b427262..bc75aae79bc01 100644
--- a/tidb-cloud/import-csv-files-serverless.md
+++ b/tidb-cloud/import-csv-files-serverless.md
@@ -99,43 +99,44 @@ To import the CSV files to TiDB Cloud Serverless, take the following steps:
     2. Click the name of your target cluster to go to its overview page, and then click **Data** > **Import** in the left navigation pane.
 
-2. Select **Import data from Cloud Storage**, and then click **Amazon S3**.
+2. Click **Import data from Cloud Storage**.
 
-3. On the **Import Data from Amazon S3** page, provide the following information for the source CSV files:
+3. On the **Import Data from Cloud Storage** page, provide the following information:
 
-    - **Import File Count**: select **One file** or **Multiple files** as needed.
-    - **Included Schema Files**: this field is only visible when importing multiple files. If the source folder contains the target table schemas, select **Yes**. Otherwise, select **No**.
-    - **Data Format**: select **CSV**.
-    - **File URI** or **Folder URI**:
+    - **Storage Provider**: select **Amazon S3**.
+    - **Source Files URI**:
         - When importing one file, enter the source file URI and name in the following format `s3://[bucket_name]/[data_source_folder]/[file_name].csv`. For example, `s3://sampledata/ingest/TableName.01.csv`.
         - When importing multiple files, enter the source file URI and name in the following format `s3://[bucket_name]/[data_source_folder]/`. For example, `s3://sampledata/ingest/`.
-    - **Bucket Access**: you can use either an AWS Role ARN or an AWS access key to access your bucket. For more information, see [Configure Amazon S3 access](/tidb-cloud/serverless-external-storage.md#configure-amazon-s3-access).
+    - **Credential**: you can use either an AWS Role ARN or an AWS access key to access your bucket. For more information, see [Configure Amazon S3 access](/tidb-cloud/serverless-external-storage.md#configure-amazon-s3-access).
         - **AWS Role ARN**: enter the AWS Role ARN value.
         - **AWS Access Key**: enter the AWS access key ID and AWS secret access key.
 
-4. Click **Connect**.
+4. Click **Next**.
 
-5. In the **Destination** section, select the target database and table.
+5. In the **Destination Mapping** section, specify how source files are mapped to target tables.
 
-    When importing multiple files, you can use **Advanced Settings** > **Mapping Settings** to define a custom mapping rule for each target table and its corresponding CSV file. After that, the data source files will be re-scanned using the provided custom mapping rule.
+    When a directory is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is selected by default.
 
-    When you enter the source file URI and name in **Source File URIs and Names**, make sure it is in the following format `s3://[bucket_name]/[data_source_folder]/[file_name].csv`. For example, `s3://sampledata/ingest/TableName.01.csv`.
+    > **Note:**
+    >
+    > When a single file is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is not displayed, and TiDB Cloud automatically populates the **Source** field with the file name. In this case, you only need to select the target database and table for data import.
 
-    You can also use wildcards to match the source files. For example:
+    - To let TiDB Cloud automatically map all source files that follow the [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) to their corresponding tables, keep this option selected and select **CSV** as the data format.
 
-    - `s3://[bucket_name]/[data_source_folder]/my-data?.csv`: all CSV files starting with `my-data` followed by one character (such as `my-data1.csv` and `my-data2.csv`) in that folder will be imported into the same target table.
+    - To manually configure the mapping rules to associate your source CSV files with the target database and table, unselect this option, and then fill in the following fields:
 
-    - `s3://[bucket_name]/[data_source_folder]/my-data*.csv`: all CSV files in the folder starting with `my-data` will be imported into the same target table.
+        - **Source**: enter the file name pattern in the `[file_name].csv` format. For example: `TableName.01.csv`. You can also use wildcards to match multiple files. Only `*` and `?` wildcards are supported.
 
-    Note that only `?` and `*` are supported.
+            - `my-data?.csv`: matches all CSV files that start with `my-data` followed by a single character, such as `my-data1.csv` and `my-data2.csv`.
+            - `my-data*.csv`: matches all CSV files that start with `my-data`, such as `my-data-2023.csv` and `my-data-final.csv`.
 
-    > **Note:**
-    >
-    > The URI must contain the data source folder.
+        - **Target Database** and **Target Table**: select the target database and table to import the data to.
+
+6. Click **Next**. TiDB Cloud scans the source files accordingly.
 
-6. Click **Start Import**.
+7. Review the scan results, check the data files found and corresponding target tables, and then click **Start Import**.
 
-7. When the import progress shows **Completed**, check the imported tables.
+8. When the import progress shows **Completed**, check the imported tables.
 
@@ -151,41 +152,42 @@ To import the CSV files to TiDB Cloud Serverless, take the following steps:
     2. Click the name of your target cluster to go to its overview page, and then click **Data** > **Import** in the left navigation pane.
 
-2. Select **Import data from Cloud Storage**, and then click **Google Cloud Storage**.
+2. Click **Import data from Cloud Storage**.
 
-3. On the **Import Data from Google Cloud Storage** page, provide the following information for the source CSV files:
+3. On the **Import Data from Cloud Storage** page, provide the following information:
 
-    - **Import File Count**: select **One file** or **Multiple files** as needed.
-    - **Included Schema Files**: this field is only visible when importing multiple files. If the source folder contains the target table schemas, select **Yes**. Otherwise, select **No**.
-    - **Data Format**: select **CSV**.
-    - **File URI** or **Folder URI**:
+    - **Storage Provider**: select **Google Cloud Storage**.
+    - **Source Files URI**:
         - When importing one file, enter the source file URI and name in the following format `[gcs|gs]://[bucket_name]/[data_source_folder]/[file_name].csv`. For example, `[gcs|gs]://sampledata/ingest/TableName.01.csv`.
         - When importing multiple files, enter the source file URI and name in the following format `[gcs|gs]://[bucket_name]/[data_source_folder]/`. For example, `[gcs|gs]://sampledata/ingest/`.
-    - **Bucket Access**: you can use a service account key to access your bucket. For more information, see [Configure GCS access](/tidb-cloud/serverless-external-storage.md#configure-gcs-access).
+    - **Credential**: you can use a GCS IAM Role Service Account key to access your bucket. For more information, see [Configure GCS access](/tidb-cloud/serverless-external-storage.md#configure-gcs-access).
 
-4. Click **Connect**.
+4. Click **Next**.
 
-5. In the **Destination** section, select the target database and table.
+5. In the **Destination Mapping** section, specify how source files are mapped to target tables.
 
-    When importing multiple files, you can use **Advanced Settings** > **Mapping Settings** to define a custom mapping rule for each target table and its corresponding CSV file. After that, the data source files will be re-scanned using the provided custom mapping rule.
+    When a directory is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is selected by default.
 
-    When you enter the source file URI and name in **Source File URIs and Names**, make sure it is in the following format `[gcs|gs]://[bucket_name]/[data_source_folder]/[file_name].csv`. For example, `[gcs|gs]://sampledata/ingest/TableName.01.csv`.
+    > **Note:**
+    >
+    > When a single file is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is not displayed, and TiDB Cloud automatically populates the **Source** field with the file name. In this case, you only need to select the target database and table for data import.
 
-    You can also use wildcards to match the source files. For example:
+    - To let TiDB Cloud automatically map all source files that follow the [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) to their corresponding tables, keep this option selected and select **CSV** as the data format.
 
-    - `[gcs|gs]://[bucket_name]/[data_source_folder]/my-data?.csv`: all CSV files starting with `my-data` followed by one character (such as `my-data1.csv` and `my-data2.csv`) in that folder will be imported into the same target table.
+    - To manually configure the mapping rules to associate your source CSV files with the target database and table, unselect this option, and then fill in the following fields:
 
-    - `[gcs|gs]://[bucket_name]/[data_source_folder]/my-data*.csv`: all CSV files in the folder starting with `my-data` will be imported into the same target table.
+        - **Source**: enter the file name pattern in the `[file_name].csv` format. For example: `TableName.01.csv`. You can also use wildcards to match multiple files. Only `*` and `?` wildcards are supported.
 
-    Note that only `?` and `*` are supported.
+            - `my-data?.csv`: matches all CSV files that start with `my-data` followed by a single character, such as `my-data1.csv` and `my-data2.csv`.
+            - `my-data*.csv`: matches all CSV files that start with `my-data`, such as `my-data-2023.csv` and `my-data-final.csv`.
 
-    > **Note:**
-    >
-    > The URI must contain the data source folder.
+        - **Target Database** and **Target Table**: select the target database and table to import the data to.
+
+6. Click **Next**. TiDB Cloud scans the source files accordingly.
 
-6. Click **Start Import**.
+7. Review the scan results, check the data files found and corresponding target tables, and then click **Start Import**.
 
-7. When the import progress shows **Completed**, check the imported tables.
+8. When the import progress shows **Completed**, check the imported tables.
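+The `?` and `*` wildcard rules described in the mapping steps above can be previewed locally before you start an import. The following is a minimal sketch, not part of TiDB Cloud, that uses Python's standard `fnmatch` module, whose handling of `?` and `*` matches the behavior described in this document. The file names are hypothetical:
+
+```python
+from fnmatch import fnmatch
+
+# Hypothetical file names in the data source folder.
+files = ["my-data1.csv", "my-data2.csv", "my-data-2023.csv", "my-data-final.csv", "other.csv"]
+
+for pattern in ("my-data?.csv", "my-data*.csv"):
+    # fnmatch treats `?` as exactly one character and `*` as any run of characters.
+    print(pattern, "->", [f for f in files if fnmatch(f, pattern)])
+```
+
+Running this prints `['my-data1.csv', 'my-data2.csv']` for `my-data?.csv` and all four `my-data` files for `my-data*.csv`, mirroring the examples in the **Source** field description.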
@@ -201,41 +203,42 @@ To import the CSV files to TiDB Cloud Serverless, take the following steps:
     2. Click the name of your target cluster to go to its overview page, and then click **Data** > **Import** in the left navigation pane.
 
-2. Select **Import data from Cloud Storage**, and then click **Azure Blob Storage**.
+2. Click **Import data from Cloud Storage**.
 
-3. On the **Import Data from Azure Blob Storage** page, provide the following information for the source CSV files:
+3. On the **Import Data from Cloud Storage** page, provide the following information:
 
-    - **Import File Count**: select **One file** or **Multiple files** as needed.
-    - **Included Schema Files**: this field is only visible when importing multiple files. If the source folder contains the target table schemas, select **Yes**. Otherwise, select **No**.
-    - **Data Format**: select **CSV**.
-    - **File URI** or **Folder URI**:
+    - **Storage Provider**: select **Azure Blob Storage**.
+    - **Source Files URI**:
        - When importing one file, enter the source file URI and name in the following format `[azure|https]://[bucket_name]/[data_source_folder]/[file_name].csv`. For example, `[azure|https]://sampledata/ingest/TableName.01.csv`.
        - When importing multiple files, enter the source file URI and name in the following format `[azure|https]://[bucket_name]/[data_source_folder]/`. For example, `[azure|https]://sampledata/ingest/`.
-    - **Bucket Access**: you can use a shared access signature (SAS) token to access your bucket. For more information, see [Configure Azure Blob Storage access](/tidb-cloud/serverless-external-storage.md#configure-azure-blob-storage-access).
+    - **Credential**: you can use a shared access signature (SAS) token to access your bucket. For more information, see [Configure Azure Blob Storage access](/tidb-cloud/serverless-external-storage.md#configure-azure-blob-storage-access).
 
-4. Click **Connect**.
+4. Click **Next**.
 
-5. In the **Destination** section, select the target database and table.
+5. In the **Destination Mapping** section, specify how source files are mapped to target tables.
 
-    When importing multiple files, you can use **Advanced Settings** > **Mapping Settings** to define a custom mapping rule for each target table and its corresponding CSV file. After that, the data source files will be re-scanned using the provided custom mapping rule.
+    When a directory is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is selected by default.
 
-    When you enter the source file URI and name in **Source File URIs and Names**, make sure it is in the following format `[azure|https]://[bucket_name]/[data_source_folder]/[file_name].csv`. For example, `[azure|https]://sampledata/ingest/TableName.01.csv`.
+    > **Note:**
+    >
+    > When a single file is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is not displayed, and TiDB Cloud automatically populates the **Source** field with the file name. In this case, you only need to select the target database and table for data import.
 
-    You can also use wildcards to match the source files. For example:
+    - To let TiDB Cloud automatically map all source files that follow the [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) to their corresponding tables, keep this option selected and select **CSV** as the data format.
 
-    - `[azure|https]://[bucket_name]/[data_source_folder]/my-data?.csv`: all CSV files starting with `my-data` followed by one character (such as `my-data1.csv` and `my-data2.csv`) in that folder will be imported into the same target table.
+    - To manually configure the mapping rules to associate your source CSV files with the target database and table, unselect this option, and then fill in the following fields:
 
-    - `[azure|https]://[bucket_name]/[data_source_folder]/my-data*.csv`: all CSV files in the folder starting with `my-data` will be imported into the same target table.
+        - **Source**: enter the file name pattern in the `[file_name].csv` format. For example: `TableName.01.csv`. You can also use wildcards to match multiple files. Only `*` and `?` wildcards are supported.
 
-    Note that only `?` and `*` are supported.
+            - `my-data?.csv`: matches all CSV files that start with `my-data` followed by a single character, such as `my-data1.csv` and `my-data2.csv`.
+            - `my-data*.csv`: matches all CSV files that start with `my-data`, such as `my-data-2023.csv` and `my-data-final.csv`.
 
-    > **Note:**
-    >
-    > The URI must contain the data source folder.
+        - **Target Database** and **Target Table**: select the target database and table to import the data to.
+
+6. Click **Next**. TiDB Cloud scans the source files accordingly.
 
-6. Click **Start Import**.
+7. Review the scan results, check the data files found and corresponding target tables, and then click **Start Import**.
 
-7. When the import progress shows **Completed**, check the imported tables.
+8. When the import progress shows **Completed**, check the imported tables.
 
@@ -251,41 +254,42 @@ To import the CSV files to TiDB Cloud Serverless, take the following steps:
     2. Click the name of your target cluster to go to its overview page, and then click **Data** > **Import** in the left navigation pane.
 
-2. Select **Import data from Cloud Storage**, and then click **Alibaba Cloud OSS**.
+2. Click **Import data from Cloud Storage**.
 
-3. On the **Import Data from Alibaba Cloud OSS** page, provide the following information for the source CSV files:
+3. On the **Import Data from Cloud Storage** page, provide the following information:
 
-    - **Import File Count**: select **One file** or **Multiple files** as needed.
-    - **Included Schema Files**: this field is only visible when importing multiple files. If the source folder contains the target table schemas, select **Yes**. Otherwise, select **No**.
-    - **Data Format**: select **CSV**.
-    - **File URI** or **Folder URI**:
+    - **Storage Provider**: select **Alibaba Cloud OSS**.
+    - **Source Files URI**:
        - When importing one file, enter the source file URI and name in the following format `oss://[bucket_name]/[data_source_folder]/[file_name].csv`. For example, `oss://sampledata/ingest/TableName.01.csv`.
        - When importing multiple files, enter the source file URI and name in the following format `oss://[bucket_name]/[data_source_folder]/`. For example, `oss://sampledata/ingest/`.
-    - **Bucket Access**: you can use an AccessKey pair to access your bucket. For more information, see [Configure Alibaba Cloud Object Storage Service (OSS) access](/tidb-cloud/serverless-external-storage.md#configure-alibaba-cloud-object-storage-service-oss-access).
+    - **Credential**: you can use an AccessKey pair to access your bucket. For more information, see [Configure Alibaba Cloud Object Storage Service (OSS) access](/tidb-cloud/serverless-external-storage.md#configure-alibaba-cloud-object-storage-service-oss-access).
 
-4. Click **Connect**.
+4. Click **Next**.
 
-5. In the **Destination** section, select the target database and table.
+5. In the **Destination Mapping** section, specify how source files are mapped to target tables.
 
-    When importing multiple files, you can use **Advanced Settings** > **Mapping Settings** to define a custom mapping rule for each target table and its corresponding CSV file. After that, the data source files will be re-scanned using the provided custom mapping rule.
+    When a directory is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is selected by default.
 
-    When you enter the source file URI and name in **Source File URIs and Names**, make sure it is in the following format `oss://[bucket_name]/[data_source_folder]/[file_name].csv`. For example, `oss://sampledata/ingest/TableName.01.csv`.
+    > **Note:**
+    >
+    > When a single file is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is not displayed, and TiDB Cloud automatically populates the **Source** field with the file name. In this case, you only need to select the target database and table for data import.
 
-    You can also use wildcards to match the source files. For example:
+    - To let TiDB Cloud automatically map all source files that follow the [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) to their corresponding tables, keep this option selected and select **CSV** as the data format.
 
-    - `oss://[bucket_name]/[data_source_folder]/my-data?.csv`: all CSV files starting with `my-data` followed by one character (such as `my-data1.csv` and `my-data2.csv`) in that folder will be imported into the same target table.
+    - To manually configure the mapping rules to associate your source CSV files with the target database and table, unselect this option, and then fill in the following fields:
 
-    - `oss://[bucket_name]/[data_source_folder]/my-data*.csv`: all CSV files in the folder starting with `my-data` will be imported into the same target table.
+        - **Source**: enter the file name pattern in the `[file_name].csv` format. For example: `TableName.01.csv`. You can also use wildcards to match multiple files. Only `*` and `?` wildcards are supported.
 
-    Note that only `?` and `*` are supported.
+            - `my-data?.csv`: matches all CSV files that start with `my-data` followed by a single character, such as `my-data1.csv` and `my-data2.csv`.
+            - `my-data*.csv`: matches all CSV files that start with `my-data`, such as `my-data-2023.csv` and `my-data-final.csv`.
 
-    > **Note:**
-    >
-    > The URI must contain the data source folder.
+        - **Target Database** and **Target Table**: select the target database and table to import the data to.
+
+6. Click **Next**. TiDB Cloud scans the source files accordingly.
 
-6. Click **Start Import**.
+7. Review the scan results, check the data files found and corresponding target tables, and then click **Start Import**.
 
-7. When the import progress shows **Completed**, check the imported tables.
+8. When the import progress shows **Completed**, check the imported tables.
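+To make the automatic mapping behavior easier to reason about, the following is a hedged illustration, not TiDB Cloud's actual implementation, of how file names that follow the `${db_name}.${table_name}[.${suffix}].csv` pattern from the [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) document could be resolved to target tables. The regular expression and sample file names are assumptions for the sketch; files that do not match the convention need a manual mapping rule:
+
+```python
+import re
+
+# Assumed pattern: `${db_name}.${table_name}[.${suffix}].csv` with a numeric suffix.
+PATTERN = re.compile(r"^(?P<db>[^.]+)\.(?P<table>[^.]+?)(?:\.(?P<suffix>\d+))?\.csv$")
+
+for name in ["mydb.orders.csv", "mydb.orders.01.csv", "orders.csv"]:
+    m = PATTERN.match(name)
+    if m:
+        print(f"{name} -> database {m['db']}, table {m['table']}")
+    else:
+        print(f"{name} -> no automatic match; configure a manual mapping rule")
+```
+
+With these inputs, the first two files resolve to table `orders` in database `mydb`, while `orders.csv` falls through to manual mapping because it lacks a database component.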
diff --git a/tidb-cloud/import-parquet-files-serverless.md b/tidb-cloud/import-parquet-files-serverless.md
index c7ba5cf642fba..e56be8b584b0c 100644
--- a/tidb-cloud/import-parquet-files-serverless.md
+++ b/tidb-cloud/import-parquet-files-serverless.md
@@ -107,43 +107,44 @@ To import the Parquet files to TiDB Cloud Serverless, take the following steps:
     2. Click the name of your target cluster to go to its overview page, and then click **Data** > **Import** in the left navigation pane.
 
-2. Select **Import data from Cloud Storage**, and then click **Amazon S3**.
+2. Click **Import data from Cloud Storage**.
 
-3. On the **Import Data from Amazon S3** page, provide the following information for the source Parquet files:
+3. On the **Import Data from Cloud Storage** page, provide the following information:
 
-    - **Import File Count**: select **One file** or **Multiple files** as needed.
-    - **Included Schema Files**: this field is only visible when importing multiple files. If the source folder contains the target table schemas, select **Yes**. Otherwise, select **No**.
-    - **Data Format**: select **Parquet**.
-    - **File URI** or **Folder URI**:
+    - **Storage Provider**: select **Amazon S3**.
+    - **Source Files URI**:
        - When importing one file, enter the source file URI and name in the following format `s3://[bucket_name]/[data_source_folder]/[file_name].parquet`. For example, `s3://sampledata/ingest/TableName.01.parquet`.
        - When importing multiple files, enter the source file URI and name in the following format `s3://[bucket_name]/[data_source_folder]/`. For example, `s3://sampledata/ingest/`.
-    - **Bucket Access**: you can use either an AWS Role ARN or an AWS access key to access your bucket. For more information, see [Configure Amazon S3 access](/tidb-cloud/serverless-external-storage.md#configure-amazon-s3-access).
+    - **Credential**: you can use either an AWS Role ARN or an AWS access key to access your bucket. For more information, see [Configure Amazon S3 access](/tidb-cloud/serverless-external-storage.md#configure-amazon-s3-access).
        - **AWS Role ARN**: enter the AWS Role ARN value.
        - **AWS Access Key**: enter the AWS access key ID and AWS secret access key.
 
-4. Click **Connect**.
+4. Click **Next**.
 
-5. In the **Destination** section, select the target database and table.
+5. In the **Destination Mapping** section, specify how source files are mapped to target tables.
 
-    When importing multiple files, you can use **Advanced Settings** > **Mapping Settings** to define a custom mapping rule for each target table and its corresponding Parquet file. After that, the data source files will be re-scanned using the provided custom mapping rule.
+    When a directory is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is selected by default.
 
-    When you enter the source file URI and name in **Source File URIs and Names**, make sure it is in the following format `s3://[bucket_name]/[data_source_folder]/[file_name].parquet`. For example, `s3://sampledata/ingest/TableName.01.parquet`.
+    > **Note:**
+    >
+    > When a single file is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is not displayed, and TiDB Cloud automatically populates the **Source** field with the file name. In this case, you only need to select the target database and table for data import.
 
-    You can also use wildcards to match the source files. For example:
+    - To let TiDB Cloud automatically map all source files that follow the [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) to their corresponding tables, keep this option selected and select **Parquet** as the data format.
 
-    - `s3://[bucket_name]/[data_source_folder]/my-data?.parquet`: all Parquet files starting with `my-data` followed by one character (such as `my-data1.parquet` and `my-data2.parquet`) in that folder will be imported into the same target table.
+    - To manually configure the mapping rules to associate your source Parquet files with the target database and table, unselect this option, and then fill in the following fields:
 
-    - `s3://[bucket_name]/[data_source_folder]/my-data*.parquet`: all Parquet files in the folder starting with `my-data` will be imported into the same target table.
+        - **Source**: enter the file name pattern in the `[file_name].parquet` format. For example: `TableName.01.parquet`. You can also use wildcards to match multiple files. Only `*` and `?` wildcards are supported.
 
-    Note that only `?` and `*` are supported.
+            - `my-data?.parquet`: matches all Parquet files that start with `my-data` followed by a single character, such as `my-data1.parquet` and `my-data2.parquet`.
+            - `my-data*.parquet`: matches all Parquet files that start with `my-data`, such as `my-data-2023.parquet` and `my-data-final.parquet`.
 
-    > **Note:**
-    >
-    > The URI must contain the data source folder.
+        - **Target Database** and **Target Table**: select the target database and table to import the data to.
+
+6. Click **Next**. TiDB Cloud scans the source files accordingly.
 
-6. Click **Start Import**.
+7. Review the scan results, check the data files found and corresponding target tables, and then click **Start Import**.
 
-7. When the import progress shows **Completed**, check the imported tables.
+8. When the import progress shows **Completed**, check the imported tables.
 
@@ -159,41 +160,42 @@ To import the Parquet files to TiDB Cloud Serverless, take the following steps:
     2. Click the name of your target cluster to go to its overview page, and then click **Data** > **Import** in the left navigation pane.
 
-2. Select **Import data from Cloud Storage**, and then click **Google Cloud Storage**.
+2. Click **Import data from Cloud Storage**.
 
-3. On the **Import Data from Google Cloud Storage** page, provide the following information for the source Parquet files:
+3. On the **Import Data from Cloud Storage** page, provide the following information:
 
-    - **Import File Count**: select **One file** or **Multiple files** as needed.
-    - **Included Schema Files**: this field is only visible when importing multiple files. If the source folder contains the target table schemas, select **Yes**. Otherwise, select **No**.
-    - **Data Format**: select **Parquet**.
-    - **File URI** or **Folder URI**:
+    - **Storage Provider**: select **Google Cloud Storage**.
+    - **Source Files URI**:
        - When importing one file, enter the source file URI and name in the following format `[gcs|gs]://[bucket_name]/[data_source_folder]/[file_name].parquet`. For example, `[gcs|gs]://sampledata/ingest/TableName.01.parquet`.
        - When importing multiple files, enter the source file URI and name in the following format `[gcs|gs]://[bucket_name]/[data_source_folder]/`. For example, `[gcs|gs]://sampledata/ingest/`.
-    - **Bucket Access**: you can use a GCS IAM Role to access your bucket. For more information, see [Configure GCS access](/tidb-cloud/serverless-external-storage.md#configure-gcs-access).
+    - **Credential**: you can use a GCS IAM Role Service Account key to access your bucket. For more information, see [Configure GCS access](/tidb-cloud/serverless-external-storage.md#configure-gcs-access).
 
-4. Click **Connect**.
+4. Click **Next**.
 
-5. In the **Destination** section, select the target database and table.
+5. In the **Destination Mapping** section, specify how source files are mapped to target tables.
 
-    When importing multiple files, you can use **Advanced Settings** > **Mapping Settings** to define a custom mapping rule for each target table and its corresponding Parquet file. After that, the data source files will be re-scanned using the provided custom mapping rule.
+    When a directory is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is selected by default.
 
-    When you enter the source file URI and name in **Source File URIs and Names**, make sure it is in the following format `[gcs|gs]://[bucket_name]/[data_source_folder]/[file_name].parquet`. For example, `[gcs|gs]://sampledata/ingest/TableName.01.parquet`.
+    > **Note:**
+    >
+    > When a single file is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is not displayed, and TiDB Cloud automatically populates the **Source** field with the file name. In this case, you only need to select the target database and table for data import.
 
-    You can also use wildcards to match the source files. For example:
+    - To let TiDB Cloud automatically map all source files that follow the [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) to their corresponding tables, keep this option selected and select **Parquet** as the data format.
 
-    - `[gcs|gs]://[bucket_name]/[data_source_folder]/my-data?.parquet`: all Parquet files starting with `my-data` followed by one character (such as `my-data1.parquet` and `my-data2.parquet`) in that folder will be imported into the same target table.
+    - To manually configure the mapping rules to associate your source Parquet files with the target database and table, unselect this option, and then fill in the following fields:
 
-    - `[gcs|gs]://[bucket_name]/[data_source_folder]/my-data*.parquet`: all Parquet files in the folder starting with `my-data` will be imported into the same target table.
+        - **Source**: enter the file name pattern in the `[file_name].parquet` format. For example: `TableName.01.parquet`. You can also use wildcards to match multiple files. Only `*` and `?` wildcards are supported.
 
-    Note that only `?` and `*` are supported.
+            - `my-data?.parquet`: matches all Parquet files that start with `my-data` followed by a single character, such as `my-data1.parquet` and `my-data2.parquet`.
+            - `my-data*.parquet`: matches all Parquet files that start with `my-data`, such as `my-data-2023.parquet` and `my-data-final.parquet`.
 
-    > **Note:**
-    >
-    > The URI must contain the data source folder.
+        - **Target Database** and **Target Table**: select the target database and table to import the data to.
+
+6. Click **Next**. TiDB Cloud scans the source files accordingly.
 
-6. Click **Start Import**.
+7. Review the scan results, check the data files found and corresponding target tables, and then click **Start Import**.
 
-7. When the import progress shows **Completed**, check the imported tables.
+8. When the import progress shows **Completed**, check the imported tables.
 
@@ -209,41 +211,42 @@ To import the Parquet files to TiDB Cloud Serverless, take the following steps:
     2. Click the name of your target cluster to go to its overview page, and then click **Data** > **Import** in the left navigation pane.
 
-2. Select **Import data from Cloud Storage**, and then click **Azure Blob Storage**.
+2. Click **Import data from Cloud Storage**.
 
-3. On the **Import Data from Azure Blob Storage** page, provide the following information for the source Parquet files:
+3. On the **Import Data from Cloud Storage** page, provide the following information:
 
-    - **Import File Count**: select **One file** or **Multiple files** as needed.
-    - **Included Schema Files**: this field is only visible when importing multiple files. If the source folder contains the target table schemas, select **Yes**. Otherwise, select **No**.
-    - **Data Format**: select **Parquet**.
-    - **File URI** or **Folder URI**:
+    - **Storage Provider**: select **Azure Blob Storage**.
+    - **Source Files URI**:
        - When importing one file, enter the source file URI and name in the following format `[azure|https]://[bucket_name]/[data_source_folder]/[file_name].parquet`. For example, `[azure|https]://sampledata/ingest/TableName.01.parquet`.
        - When importing multiple files, enter the source file URI and name in the following format `[azure|https]://[bucket_name]/[data_source_folder]/`. For example, `[azure|https]://sampledata/ingest/`.
-    - **Bucket Access**: you can use a shared access signature (SAS) token to access your bucket. For more information, see [Configure Azure Blob Storage access](/tidb-cloud/serverless-external-storage.md#configure-azure-blob-storage-access).
+    - **Credential**: you can use a shared access signature (SAS) token to access your bucket. For more information, see [Configure Azure Blob Storage access](/tidb-cloud/serverless-external-storage.md#configure-azure-blob-storage-access).
 
-4. Click **Connect**.
+4. Click **Next**.
 
-5. In the **Destination** section, select the target database and table.
+5. In the **Destination Mapping** section, specify how source files are mapped to target tables.
 
-    When importing multiple files, you can use **Advanced Settings** > **Mapping Settings** to define a custom mapping rule for each target table and its corresponding Parquet file. After that, the data source files will be re-scanned using the provided custom mapping rule.
+    When a directory is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is selected by default.
 
-    When you enter the source file URI and name in **Source File URIs and Names**, make sure it is in the following format `[azure|https]://[bucket_name]/[data_source_folder]/[file_name].parquet`. For example, `[azure|https]://sampledata/ingest/TableName.01.parquet`.
+    > **Note:**
+    >
+    > When a single file is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is not displayed, and TiDB Cloud automatically populates the **Source** field with the file name. In this case, you only need to select the target database and table for data import.
 
-    You can also use wildcards to match the source files. For example:
+    - To let TiDB Cloud automatically map all source files that follow the [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) to their corresponding tables, keep this option selected and select **Parquet** as the data format.
 
-    - `[azure|https]://[bucket_name]/[data_source_folder]/my-data?.parquet`: all Parquet files starting with `my-data` followed by one character (such as `my-data1.parquet` and `my-data2.parquet`) in that folder will be imported into the same target table.
+    - To manually configure the mapping rules to associate your source Parquet files with the target database and table, unselect this option, and then fill in the following fields:
 
-    - `[azure|https]://[bucket_name]/[data_source_folder]/my-data*.parquet`: all Parquet files in the folder starting with `my-data` will be imported into the same target table.
+        - **Source**: enter the file name pattern in the `[file_name].parquet` format. For example: `TableName.01.parquet`. You can also use wildcards to match multiple files. Only `*` and `?` wildcards are supported.
 
-    Note that only `?` and `*` are supported.
+            - `my-data?.parquet`: matches all Parquet files that start with `my-data` followed by a single character, such as `my-data1.parquet` and `my-data2.parquet`.
+            - `my-data*.parquet`: matches all Parquet files that start with `my-data`, such as `my-data-2023.parquet` and `my-data-final.parquet`.
 
-    > **Note:**
-    >
-    > The URI must contain the data source folder.
+        - **Target Database** and **Target Table**: select the target database and table to import the data to.
+
+6. Click **Next**. TiDB Cloud scans the source files accordingly.
 
-6. Click **Start Import**.
+7. Review the scan results, check the data files found and corresponding target tables, and then click **Start Import**.
 
-7. When the import progress shows **Completed**, check the imported tables.
+8. When the import progress shows **Completed**, check the imported tables.
 
@@ -259,41 +262,42 @@ To import the Parquet files to TiDB Cloud Serverless, take the following steps:
     2. Click the name of your target cluster to go to its overview page, and then click **Data** > **Import** in the left navigation pane.
 
-2. Select **Import data from Cloud Storage**, and then click **Alibaba Cloud OSS**.
+2. Click **Import data from Cloud Storage**.
 
-3. On the **Import Data from Alibaba Cloud OSS** page, provide the following information for the source Parquet files:
+3. On the **Import Data from Cloud Storage** page, provide the following information:
 
-    - **Import File Count**: select **One file** or **Multiple files** as needed.
-    - **Included Schema Files**: this field is only visible when importing multiple files. If the source folder contains the target table schemas, select **Yes**. Otherwise, select **No**.
-    - **Data Format**: select **Parquet**.
-    - **File URI** or **Folder URI**:
+    - **Storage Provider**: select **Alibaba Cloud OSS**.
+    - **Source Files URI**:
        - When importing one file, enter the source file URI and name in the following format `oss://[bucket_name]/[data_source_folder]/[file_name].parquet`. For example, `oss://sampledata/ingest/TableName.01.parquet`.
        - When importing multiple files, enter the source file URI and name in the following format `oss://[bucket_name]/[data_source_folder]/`. For example, `oss://sampledata/ingest/`.
-    - **Bucket Access**: you can use an AccessKey pair to access your bucket. For more information, see [Configure Alibaba Cloud Object Storage Service (OSS) access](/tidb-cloud/serverless-external-storage.md#configure-alibaba-cloud-object-storage-service-oss-access).
+    - **Credential**: you can use an AccessKey pair to access your bucket. For more information, see [Configure Alibaba Cloud Object Storage Service (OSS) access](/tidb-cloud/serverless-external-storage.md#configure-alibaba-cloud-object-storage-service-oss-access).
 
-4. Click **Connect**.
+4. Click **Next**.
 
-5. In the **Destination** section, select the target database and table.
+5. In the **Destination Mapping** section, specify how source files are mapped to target tables.
 
-    When importing multiple files, you can use **Advanced Settings** > **Mapping Settings** to define a custom mapping rule for each target table and its corresponding Parquet file. After that, the data source files will be re-scanned using the provided custom mapping rule.
+    When a directory is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is selected by default.
 
-    When you enter the source file URI and name in **Source File URIs and Names**, make sure it is in the following format `oss://[bucket_name]/[data_source_folder]/[file_name].parquet`. For example, `oss://sampledata/ingest/TableName.01.parquet`.
+    > **Note:**
+    >
+    > When a single file is specified in **Source Files URI**, the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option is not displayed, and TiDB Cloud automatically populates the **Source** field with the file name. In this case, you only need to select the target database and table for data import.
 
-    You can also use wildcards to match the source files. For example:
+    - To let TiDB Cloud automatically map all source files that follow the [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) to their corresponding tables, keep this option selected and select **Parquet** as the data format.
 
-    - `oss://[bucket_name]/[data_source_folder]/my-data?.parquet`: all Parquet files starting with `my-data` followed by one character (such as `my-data1.parquet` and `my-data2.parquet`) in that folder will be imported into the same target table.
+    - To manually configure the mapping rules to associate your source Parquet files with the target database and table, unselect this option, and then fill in the following fields:
 
-    - `oss://[bucket_name]/[data_source_folder]/my-data*.parquet`: all Parquet files in the folder starting with `my-data` will be imported into the same target table.
+        - **Source**: enter the file name pattern in the `[file_name].parquet` format. For example: `TableName.01.parquet`. You can also use wildcards to match multiple files. Only `*` and `?` wildcards are supported.
 
-    Note that only `?` and `*` are supported.
+            - `my-data?.parquet`: matches all Parquet files that start with `my-data` followed by a single character, such as `my-data1.parquet` and `my-data2.parquet`.
+            - `my-data*.parquet`: matches all Parquet files that start with `my-data`, such as `my-data-2023.parquet` and `my-data-final.parquet`.
 
-    > **Note:**
-    >
-    > The URI must contain the data source folder.
+        - **Target Database** and **Target Table**: select the target database and table to import the data to.
+
+6. Click **Next**. TiDB Cloud scans the source files accordingly.
 
-6. Click **Start Import**.
+7. Review the scan results, check the data files found and corresponding target tables, and then click **Start Import**.
 
-7. When the import progress shows **Completed**, check the imported tables.
+8. When the import progress shows **Completed**, check the imported tables.
diff --git a/tidb-cloud/import-sample-data-serverless.md b/tidb-cloud/import-sample-data-serverless.md
index ebd2cc08c2b1a..b1a506766a673 100644
--- a/tidb-cloud/import-sample-data-serverless.md
+++ b/tidb-cloud/import-sample-data-serverless.md
@@ -21,19 +21,25 @@ This document describes how to import the sample data into TiDB Cloud Serverless
     2. Click the name of your target cluster to go to its overview page, and then click **Data** > **Import** in the left navigation pane.
 
-2. Select **Import data from Cloud Storage**, and then click **Amazon S3**.
+2. Click **Import data from Cloud Storage**.
 
-3. On the **Import Data from Amazon S3** page, configure the following source data information:
+3. On the **Import Data from Cloud Storage** page, provide the following information:
 
-    - **Import File Count**: for the sample data, select **Multiple files**.
-    - **Included Schema Files**: for the sample data, select **Yes**.
-    - **Data Format**: select **SQL**.
-    - **Folder URI** or **File URI**: enter the sample data URI `s3://tidbcloud-sample-data/data-ingestion/`.
-    - **Bucket Access**: for the sample data, you can only use a Role ARN to access its bucket. For your own data, you can use either an AWS access key or a Role ARN to access your bucket.
-        - **AWS Role ARN**: enter `arn:aws:iam::801626783489:role/import-sample-access`.
-        - **AWS Access Key**: skip this option for the sample data.
+    - **Storage Provider**: select **Amazon S3**.
+    - **Source Files URI**: enter the sample data URI `s3://tidbcloud-sample-data/data-ingestion/`.
+    - **Credential**:
+        - **AWS Role ARN**: enter `arn:aws:iam::801626783489:role/import-sample-access`.
+        - **AWS Access Key**: skip this option for the sample data.
 
-4. Click **Connect** > **Start Import**.
+4. Click **Next**.
+
+5. In the **Destination Mapping** section, keep the **Use [File naming conventions](/tidb-cloud/naming-conventions-for-data-import.md) for automatic mapping** option selected and select **SQL** as the data format.
+
+6. Click **Next**.
+
+7. Review the scan results, check the data files found and corresponding target tables, and then click **Start Import**.
+
+8. When the import progress shows **Completed**, check the imported tables.
 
 When the data import progress shows **Completed**, you have successfully imported the sample data and the database schema to your database in TiDB Cloud Serverless.
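+To check the imported tables from a client after step 8, you can connect to the cluster and inspect the schema. The following is a minimal sketch assuming Python with the `pymysql` package installed; the host, user, password, and CA path are placeholders to replace with the connection parameters shown in your cluster's connection dialog in the TiDB Cloud console:
+
+```python
+import pymysql
+
+# Placeholders: copy the real values from your cluster's connection dialog.
+conn = pymysql.connect(
+    host="gateway01.us-west-2.prod.aws.tidbcloud.com",  # placeholder host
+    port=4000,
+    user="xxxxxxxxxxxxxxx.root",  # placeholder user
+    password="your_password",     # placeholder password
+    ssl={"ca": "/etc/ssl/cert.pem"},  # TLS is required; point to your CA bundle
+)
+with conn.cursor() as cur:
+    cur.execute("SHOW DATABASES")
+    print([row[0] for row in cur.fetchall()])  # the imported databases should appear here
+conn.close()
+```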