Importing data from S3-compatible storage
Running the import s3 command starts a server-side import of data and schema object definitions from S3-compatible storage, in the format described in the File structure article:
ydb [connection options] import s3 [options]
where [connection options] are database connection options.
Unlike tools restore, the import s3 command always creates entire objects, so none of the objects being imported (neither directories nor tables) may already exist for the command to run successfully.
If you need to import additional data from S3 to existing tables, you can copy the S3 contents to the file system (for example, using S3cmd) and run the tools restore command.
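The workaround above can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not a definitive procedure: restore_from_s3 is a hypothetical helper name, s3cmd is assumed to be already configured for the storage endpoint and credentials, and db1 is the connection profile used in the examples of this article.

```sh
# Hypothetical helper: download an export from S3 and load it into
# existing tables with tools restore.
# Usage: restore_from_s3 <bucket> <s3-prefix> <db-path>
restore_from_s3() {
    bucket=$1
    prefix=$2
    db_path=$3
    # Temporary local directory that receives the downloaded export files.
    workdir=$(mktemp -d) || return 1
    # Assumption: s3cmd already knows the endpoint and access keys.
    s3cmd sync "s3://${bucket}/${prefix}/" "${workdir}/" || return 1
    # tools restore reads the export from the local file system.
    ydb -p db1 tools restore -p "$db_path" -i "$workdir"
}
```

For example, restore_from_s3 mybucket export1 . would copy the export1 prefix and restore it to the database root. The function leaves the temporary directory in place, so remove it once the restore has been verified.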
Command line parameters
[options]: Command parameters:
S3 connection parameters
To run the command to import data from S3, make sure to specify the S3 connection parameters. Since data import is performed asynchronously by the YDB server, the specified endpoint must be available to establish a server-side connection.
List of imported objects
--item STRING: Description of the object to import. The --item parameter can be specified several times if you need to import multiple objects. The STRING format is <property>=<value>,..., with the following properties required:
source, src, or s: Path to S3 (key prefix) specifying the directory or table to import.
destination, dst, or d: Path to the DB that will store the imported directory or table. The final element of the path must not exist. All directories specified in the path will be created if they don’t exist.
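When many directories share one S3 key prefix, the --item arguments can be generated instead of typed by hand. The sketch below shows one way to do that in POSIX shell. The import_dirs function is a hypothetical helper, and the db1 profile, endpoint, and bucket name are taken from the examples in this article as assumptions. S3 credentials are assumed to come from environment variables or the ~/.aws/credentials file.

```sh
# Hypothetical helper: import several directories that live under one
# S3 key prefix into same-name database directories.
# Usage: import_dirs <s3-prefix> <dir>...
import_dirs() {
    prefix=$1
    shift
    count=$#
    # Append one "--item src=...,dst=..." pair per directory name.
    # The loop's word list is expanded once, before the first iteration,
    # so growing the positional parameters inside the loop is safe.
    for dir in "$@"; do
        set -- "$@" --item "src=${prefix}/${dir},dst=${dir}"
    done
    # Drop the original directory names, keeping only the --item pairs.
    shift "$count"
    ydb -p db1 import s3 \
        --s3-endpoint storage.yandexcloud.net --bucket mybucket \
        "$@"
}
```

Calling import_dirs export dir1 dir2 runs a single import s3 command with two --item arguments, so all listed objects are enqueued as one operation.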
Additional parameters
--description STRING: Operation text description stored in the history of operations.
--retries NUM: Number of import retries the server will make. Defaults to 10.
--format STRING: Result output format.
pretty: Human-readable format (default).
proto-json-base64: Protobuf in JSON format, with binary strings encoded in base64.
Importing data
Import result
If successful, the import s3 command outputs summary information about the enqueued operation for importing data from S3, in the format specified in the --format option. The actual import operation is performed by the server asynchronously. The summary includes the operation ID, which you can use later to check the status of the operation and perform actions on it:
In the pretty output mode used by default, the operation identifier is output in the id field with semigraphics formatting:
┌───────────────────────────────────────────┬───────┬─────...
| id                                        | ready | stat...
├───────────────────────────────────────────┼───────┼─────...
| ydb://import/8?id=281474976788395&kind=s3 | true  | SUCC...
├╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴┴╴╴╴╴╴...
| Items:
...
In proto-json-base64 output mode, the ID is in the “id” attribute:
{"id":"ydb://import/8?id=281474976788395&kind=s3","ready":true, ... }
Import status
Data is imported in the background. You can get information about the status and progress of the import operation by running the operation get command with the quoted operation ID passed as the command parameter. For example:
ydb -p db1 operation get "ydb://import/8?id=281474976788395&kind=s3"
The output format of the operation get command is also set with the --format option.
Although the operation ID looks like a URL, there is no guarantee that this format will be retained in the future. Treat it only as an opaque string.
You can track the completion of the import operation by changes in the “progress” attribute:
In the pretty output mode used by default, a successful operation is indicated by the “Done” value in the progress field with semigraphics formatting:
┌───── ... ──┬───────┬─────────┬──────────┬─...
| id         | ready | status  | progress | ...
├──────... ──┼───────┼─────────┼──────────┼─...
| ydb:/...   | true  | SUCCESS | Done     | ...
├╴╴╴╴╴ ... ╴╴┴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴╴╴┴╴╴╴╴╴╴╴╴╴╴┴╴...
...
In proto-json-base64 output mode, a completed operation is indicated by the PROGRESS_DONE value of the progress attribute:
{"id":"ydb://...", ...,"progress":"PROGRESS_DONE",... }
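A script that has to wait for the import to finish can poll operation get until the progress attribute reaches PROGRESS_DONE. The following is a minimal sketch rather than a definitive implementation: wait_for_import is a hypothetical helper, db1 is the profile used in the examples of this article, and matching the output with grep assumes that the string PROGRESS_DONE appears in the JSON only once the import is complete.

```sh
# Hypothetical helper: poll an import operation until it reports
# PROGRESS_DONE or the attempt limit is reached.
# Usage: wait_for_import <operation-id> [max-attempts] [delay-seconds]
wait_for_import() {
    op_id=$1
    max_attempts=${2:-60}
    delay=${3:-10}
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Assumption: the JSON output contains the literal PROGRESS_DONE
        # only after the import has finished.
        if ydb -p db1 operation get "$op_id" --format proto-json-base64 \
            | grep -q 'PROGRESS_DONE'; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    # The limit was reached without seeing PROGRESS_DONE.
    return 1
}
```

The operation ID is always passed in double quotes because it contains the ? and & characters. The sketch cannot tell a failed import from a slow one: both end with a non-zero return code after the last attempt, so inspect the operation get output manually in that case.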
Ending the import operation
Once the data is imported, use the operation forget command to remove the import operation from the list of operations:
ydb -p db1 operation forget "ydb://import/8?id=281474976788395&kind=s3"
List of import operations
To get a list of import operations, run the operation list import/s3 command:
ydb -p db1 operation list import/s3
The output format of the operation list command is also set with the --format option.
Examples
The examples use a profile named db1. For information about how to create it, see the Getting started with the YDB CLI article in the “Getting started” section.
Importing data to the DB root
Importing the contents of the export1 directory in the mybucket bucket to the root of the database, using S3 authentication parameters from environment variables or the ~/.aws/credentials file:
ydb -p db1 import s3 \
--s3-endpoint storage.yandexcloud.net --bucket mybucket \
--item src=export1,dst=.
Importing multiple directories
Importing objects from the dir1 and dir2 directories under the export prefix of the mybucket S3 bucket to the same-name DB directories, using explicitly specified S3 authentication parameters:
ydb -p db1 import s3 \
--s3-endpoint storage.yandexcloud.net --bucket mybucket \
--access-key VJGSOScgs-5kDGeo2hO9 --secret-key fZ_VB1Wi5-fdKSqH6074a7w0J4X0 \
--item src=export/dir1,dst=dir1 --item src=export/dir2,dst=dir2
Getting operation IDs
To get a list of import operation IDs in a format that is convenient for processing in bash scripts, use jq:
ydb -p db1 operation list import/s3 --format proto-json-base64 | jq -r ".operations[].id"
The output contains one operation ID per line. For example:
ydb://import/8?id=281474976789577&kind=s3
ydb://import/8?id=281474976789526&kind=s3
ydb://import/8?id=281474976788779&kind=s3
These IDs can be used, for example, to run a loop that forgets all current operations. The ID is quoted because it contains the ? and & characters, which the shell would otherwise treat specially:
ydb -p db1 operation list import/s3 --format proto-json-base64 | jq -r ".operations[].id" | while read -r line; do ydb -p db1 operation forget "$line"; done