Integrate TiDB with Amazon AppFlow
Amazon AppFlow is a fully managed API integration service that you use to connect your software as a service (SaaS) applications to AWS services, and securely transfer data. With Amazon AppFlow, you can import and export data from and to TiDB into many types of data providers, such as Salesforce, Amazon S3, LinkedIn, and GitHub. For more information, see Supported source and destination applications in AWS documentation.
This document describes how to integrate TiDB with Amazon AppFlow and takes integrating a TiDB Serverless cluster as an example.
If you do not have a TiDB cluster, you can create a TiDB Serverless cluster, which is free and can be created in approximately 30 seconds.
Prerequisites
JDK 11 or above
Maven 3.8 or above
AWS CLI version 2
AWS Serverless Application Model Command Line Interface (AWS SAM CLI) 1.58.0 or above
An AWS Identity and Access Management (IAM) user with the following requirements:
The user can access AWS using an access key.
The user has the following permissions:
AWSCertificateManagerFullAccess
: used for reading and writing the AWS Secrets Manager.AWSCloudFormationFullAccess
: SAM CLI uses AWS CloudFormation to proclaim the AWS resources.AmazonS3FullAccess
: AWS CloudFormation uses Amazon S3 to publish.AWSLambda_FullAccess
: currently, AWS Lambda is the only way to implement a new connector for Amazon AppFlow.IAMFullAccess
: SAM CLI needs to create aConnectorFunctionRole
for the connector.
- A SalesForce account.
Step 1. Register a TiDB connector
Clone the code
Clone the integration example code repository for TiDB and Amazon AppFlow:
git clone https://github.com/pingcap-inc/tidb-appflow-integration
Build and upload a Lambda
Build the package:
cd tidb-appflow-integration
mvn clean package
(Optional) Configure your AWS access key ID and secret access key if you have not.
aws configure
Upload your JAR package as a Lambda:
sam deploy --guided
Note
- The
--guided
option uses prompts to guide you through the deployment. Your input will be stored in a configuration file, which issamconfig.toml
by default. stack_name
specifies the name of AWS Lambda that you are deploying.- This prompted guide uses AWS as the cloud provider of TiDB Serverless. To use Amazon S3 as the source or destination, you need to set the
region
of AWS Lambda as the same as that of Amazon S3. - If you have already run
sam deploy --guided
before, you can just runsam deploy
instead, and SAM CLI will use the configuration filesamconfig.toml
to simplify the interaction.
If you see a similar output as follows, this Lambda is successfully deployed.
Successfully created/updated stack - <stack_name> in <region>
Go to the AWS Lambda console, and you can see the Lambda that you just uploaded. Note that you need to select the correct region in the upper-right corner of the window.
Use Lambda to register a connector
In the AWS Management Console, navigate to Amazon AppFlow > Connectors and click Register new connector.
In the Register a new connector dialog, choose the Lambda function you uploaded and specify the connector label using the connector name.
Click Register. Then, a TiDB connector is registered successfully.
Step 2. Create a flow
Navigate to Amazon AppFlow > Flows and click Create flow.
Set the flow name
Enter the flow name, and then click Next.
Set the source and destination tables
Choose the Source details and Destination details. TiDB connector can be used in both of them.
Choose the source name. This document uses Salesforce as an example source.
After you register to Salesforce, Salesforce will add some example data to your platform. The following steps will use the Account object as an example source object.
Click Connect.
In the Connect to Salesforce dialog, specify the name of this connection, and then click Continue.
Click Allow to confirm that AWS can read your Salesforce data.
Note
If your company has already used the Professional Edition of Salesforce, the REST API is not enabled by default. You might need to register a new Developer Edition to use the REST API. For more information, refer to Salesforce Forum Topic.
In the Destination details area, choose TiDB-Connector as the destination. The Connect button is displayed.
Before clicking Connect, you need to create a
sf_account
table in TiDB for the Salesforce Account object. Note that this table schema is different from the sample data in Tutorial of Amazon AppFlow.CREATE TABLE `sf_account` (
`id` varchar(255) NOT NULL,
`name` varchar(150) NOT NULL DEFAULT '',
`type` varchar(150) NOT NULL DEFAULT '',
`billing_state` varchar(255) NOT NULL DEFAULT '',
`rating` varchar(255) NOT NULL DEFAULT '',
`industry` varchar(255) NOT NULL DEFAULT '',
PRIMARY KEY (`id`)
);
After the
sf_account
table is created, click Connect. A connection dialog is displayed.In the Connect to TiDB-Connector dialog, enter the connection properties of the TiDB cluster. If you use a TiDB Serverless cluster, you need to set the TLS option to
Yes
, which lets the TiDB connector use the TLS connection. Then, click Connect.Now you can get all tables in the database that you specified for connection. Choose the sf_account table from the drop-down list.
The following screenshot shows the configurations to transfer data from the Salesforce Account object to the
sf_account
table in TiDB:In the Error handling area, choose Stop the current flow run. In the Flow trigger area, choose the Run on demand trigger type, which means you need to run the flow manually. Then, click Next.
Set mapping rules
Map the fields of the Account object in Salesforce to the sf_account
table in TiDB, and then click Next.
The
sf_account
table is newly created in TiDB and it is empty.test> SELECT * FROM sf_account;
+----+------+------+---------------+--------+----------+
| id | name | type | billing_state | rating | industry |
+----+------+------+---------------+--------+----------+
+----+------+------+---------------+--------+----------+
To set a mapping rule, you can select a source field name on the left, and select a destination field name on the right. Then, click Map fields, and a rule is set.
The following mapping rules (Source field name -> Destination field name) are needed in this document:
Account ID -> id
Account Name -> name
Account Type -> type
Billing State/Province -> billing_state
Account Rating -> rating
Industry -> industry
(Optional) Set filters
If you want to add some filters to your data fields, you can set them here. Otherwise, skip this step and click Next.
Confirm and create the flow
Confirm the information of the flow to be created. If everything looks fine, click Create flow.
Step 3. Run the flow
On the page of the newly created flow, click Run flow in the upper-right corner.
The following screenshot shows an example that the flow runs successfully:
Query the sf_account
table, and you can see that the records from the Salesforce Account object have been written to it:
test> SELECT * FROM sf_account;
+--------------------+-------------------------------------+--------------------+---------------+--------+----------------+
| id | name | type | billing_state | rating | industry |
+--------------------+-------------------------------------+--------------------+---------------+--------+----------------+
| 001Do000003EDTlIAO | Sample Account for Entitlements | null | null | null | null |
| 001Do000003EDTZIA4 | Edge Communications | Customer - Direct | TX | Hot | Electronics |
| 001Do000003EDTaIAO | Burlington Textiles Corp of America | Customer - Direct | NC | Warm | Apparel |
| 001Do000003EDTbIAO | Pyramid Construction Inc. | Customer - Channel | null | null | Construction |
| 001Do000003EDTcIAO | Dickenson plc | Customer - Channel | KS | null | Consulting |
| 001Do000003EDTdIAO | Grand Hotels & Resorts Ltd | Customer - Direct | IL | Warm | Hospitality |
| 001Do000003EDTeIAO | United Oil & Gas Corp. | Customer - Direct | NY | Hot | Energy |
| 001Do000003EDTfIAO | Express Logistics and Transport | Customer - Channel | OR | Cold | Transportation |
| 001Do000003EDTgIAO | University of Arizona | Customer - Direct | AZ | Warm | Education |
| 001Do000003EDThIAO | United Oil & Gas, UK | Customer - Direct | UK | null | Energy |
| 001Do000003EDTiIAO | United Oil & Gas, Singapore | Customer - Direct | Singapore | null | Energy |
| 001Do000003EDTjIAO | GenePoint | Customer - Channel | CA | Cold | Biotechnology |
| 001Do000003EDTkIAO | sForce | null | CA | null | null |
+--------------------+-------------------------------------+--------------------+---------------+--------+----------------+
Noteworthy things
- If anything goes wrong, you can navigate to the CloudWatch page on the AWS Management Console to get logs.
- The steps in this document are based on Building custom connectors using the Amazon AppFlow Custom Connector SDK.
- TiDB Serverless is NOT a production environment.
- To prevent excessive length, the examples in this document only show the
Insert
strategy, butUpdate
andUpsert
strategies are also tested and can be used.