CREATE REPOSITORY
You can use the CREATE REPOSITORY statement to register a new repository that you can use to create, manage, and restore snapshots.
Synopsis
CREATE REPOSITORY repository_name TYPE type
[ WITH (parameter_name [= value] [, ...]) ]
Description
The CREATE REPOSITORY statement creates a repository with a repository name and a repository type. You can configure each repository type with additional parameters in the WITH clause.
Note
If the back-end data storage (specific to the repository type) already contains CrateDB snapshots, they will become available to the cluster.
See also
System information: Repositories
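The following is a minimal sketch of registering a repository and confirming that it appears in sys.repositories. The repository name my_backup_repo and the path are hypothetical, and the example assumes the path is already allowed by the path.repo setting.

```sql
-- Hypothetical name and path; the location must be allowed by path.repo.
CREATE REPOSITORY my_backup_repo TYPE fs
WITH (location = '/mnt/backups/crate');

-- List registered repositories along with their type and settings.
SELECT name, type, settings
FROM sys.repositories
ORDER BY name;
```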
Parameters
repository_name
The name of the repository to register.
type
The repository type.
Caution
You cannot change any repository parameters after creating the repository (including parameters set by the WITH clause).
To use new parameters for an existing repository, you must first drop the repository using the DROP REPOSITORY statement and then recreate it with a new CREATE REPOSITORY statement.
When you drop a repository, CrateDB deletes the corresponding record from sys.repositories but does not delete any snapshots from the corresponding backend data storage. If you create a new repository using the same backend data storage, any existing snapshots will become available again.
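As an illustration of the drop-and-recreate procedure, the sketch below changes the compress parameter of a repository. It assumes a repository with the hypothetical name my_backup_repo already exists and points at the hypothetical path shown.

```sql
-- Remove the repository record; snapshots in the backend storage are kept.
DROP REPOSITORY my_backup_repo;

-- Recreate the repository on the same storage with the new parameter value.
CREATE REPOSITORY my_backup_repo TYPE fs
WITH (location = '/mnt/backups/crate', compress = false);

-- Snapshots already present in the storage should be listed again.
SELECT name, state
FROM sys.snapshots
WHERE repository = 'my_backup_repo';
```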
Clauses
WITH
You can use the WITH clause to specify one or more repository parameter values:
[ WITH (parameter_name [= value] [, ...]) ]
Parameters
The following parameters apply to all repository types:
max_restore_bytes_per_sec
The maximum rate (bytes per second) at which a single CrateDB node will read snapshot data from this repository.
Default: 40mb
max_snapshot_bytes_per_sec
The maximum rate (bytes per second) at which a single CrateDB node will write snapshot data to this repository.
Default: 40mb
All other parameters (see the next section) are specific to the repository type.
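A short sketch of setting the two rate limits when creating a repository. The repository name, path, and chosen rates are hypothetical; the values use the same size notation as the 40mb default.

```sql
-- Hypothetical example: slow down snapshot writes, allow faster restores.
CREATE REPOSITORY throttled_repo TYPE fs
WITH (
    location = '/mnt/backups/throttled',   -- must be allowed by path.repo
    max_snapshot_bytes_per_sec = '20mb',   -- per-node write limit
    max_restore_bytes_per_sec = '80mb'     -- per-node read limit
);
```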
Types
CrateDB includes built-in support for the following types: fs, hdfs, s3, azure, and url.
CrateDB can support additional types via plugins.
fs
An fs repository stores snapshots on the local file system. If a cluster has multiple nodes, you must use a shared data storage volume mounted locally on all master nodes and data nodes.
Note
To create fs repositories, you must configure the list of allowed file system paths using the path.repo setting.
Parameters
location
Type: text
Required
An absolute or relative path to the directory where CrateDB will store snapshots. If the path is relative, CrateDB will append it to the first entry in the path.repo setting.
Windows UNC paths are allowed if the server name and shares are specified and backslashes are escaped.
The path must be allowed by the path.repo setting.
compress
Type: boolean
Default: true
Whether CrateDB should compress the metadata part of the snapshot or not.
CrateDB does not compress the actual table data.
chunk_size
Type: bigint or text
Default: null
Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB will not split big files into smaller chunks. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.
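The fs parameters above can be combined as in the following sketch. The repository name and path are hypothetical, and the path is assumed to be listed in path.repo on every master node and data node.

```sql
-- Hypothetical fs repository on a shared volume.
CREATE REPOSITORY shared_fs_repo TYPE fs
WITH (
    location = '/mnt/shared/crate-snapshots',  -- absolute path allowed by path.repo
    compress = true,                           -- compress snapshot metadata only
    chunk_size = '1g'                          -- split files larger than 1g
);
```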
hdfs
An hdfs repository stores snapshots on a Hadoop Distributed File System (HDFS).
Parameters
uri
Type: text
Default: The default URI for the given HDFS configuration
HDFS URIs take the form of:
hdfs://<host>:<port>/
security.principal
Type: text
A qualified Kerberos principal used to authenticate against HDFS.
path
Type: text
The HDFS file system path to use for snapshots.
load_defaults
Type: boolean
Default: true
Whether to load the default Hadoop configuration.
conf.<key>
Type: various
Dynamic configuration values to be added to the Hadoop configuration.
compress
Type: boolean
Default: true
Whether CrateDB should compress the metadata part of the snapshot or not.
CrateDB does not compress the actual table data.
chunk_size
Type: bigint or text
Default: null
Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB will not split big files into smaller chunks. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.
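A sketch of an hdfs repository definition using the parameters above. The repository name, host name, port, and path are hypothetical and must be adapted to the actual HDFS deployment.

```sql
-- Hypothetical hdfs repository; host, port, and path are placeholders.
CREATE REPOSITORY hdfs_repo TYPE hdfs
WITH (
    uri = 'hdfs://namenode.example.com:8020/',  -- hdfs://<host>:<port>/
    path = 'crate/snapshots',                   -- HDFS path used for snapshots
    load_defaults = true,                       -- load the default Hadoop configuration
    compress = true
);
```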
s3
An s3 repository stores snapshots on the Amazon Simple Storage Service (Amazon S3).
Note
If you are using Amazon S3 in conjunction with IAM roles, the access_key and secret_key parameters must be left undefined.
Additionally, make sure to attach the IAM role to each EC2 instance that will run a CrateDB master node or data node. The attached IAM role will provide the necessary credentials when required.
Parameters
access_key
Type: text
Required: false
Access key used for authentication against Amazon Web Services (AWS).
Note
CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.
secret_key
Type: text
Required: false
The secret key used for authentication against AWS.
Note
CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.
endpoint
Type: text
Default: The default AWS API endpoint
The AWS API endpoint to use.
Tip
You can specify a regional endpoint to force the use of a specific AWS region.
protocol
Type: text
Values: http, https
Default: https
Protocol to use.
bucket
Type: text
Name of the Amazon S3 bucket used for storing snapshots.
If the bucket does not yet exist, CrateDB will attempt to create a new bucket on Amazon S3.
base_path
Type: text
Default: root directory
The bucket path to use for snapshots.
The path is relative, so the base_path value must not start with a / character.
compress
Type: boolean
Default: true
Whether CrateDB should compress the metadata part of the snapshot or not.
CrateDB does not compress the actual table data.
chunk_size
Type: bigint or text
Default: null
Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB will not split big files into smaller chunks. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.
readonly
Type: boolean
Default: false
If true, the repository is read-only.
server_side_encryption
Type: boolean
Default: false
If true, files are server-side encrypted by AWS using the AES256 algorithm.
buffer_size
Type: text
Default: 5mb
Minimum: 5mb
If a chunk is smaller than buffer_size, CrateDB will upload the chunk with a single request.
If a chunk exceeds buffer_size, CrateDB will split the chunk into multiple parts of buffer_size length and upload them separately.
max_retries
Type: integer
Default: 3
The number of retries in case of errors.
use_throttle_retries
Type: boolean
Default: true
Whether CrateDB should throttle retries (i.e., should back off).
canned_acl
Type: text
Values: private, public-read, public-read-write, authenticated-read, log-delivery-write, bucket-owner-read, or bucket-owner-full-control
Default: private
CrateDB applies the specified canned ACL to the buckets and objects it creates.
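The sketch below shows one way the s3 parameters might be combined. The repository name, bucket name, base path, and credential placeholders are hypothetical. When IAM roles are used, the access_key and secret_key lines should be omitted.

```sql
-- Hypothetical s3 repository; replace the placeholders with real values.
CREATE REPOSITORY s3_repo TYPE s3
WITH (
    access_key = '<your-access-key>',   -- omit when using IAM roles
    secret_key = '<your-secret-key>',   -- omit when using IAM roles
    bucket = 'my-crate-snapshots',
    base_path = 'cluster-a/snapshots',  -- relative: no leading / character
    protocol = 'https',
    server_side_encryption = true,      -- AES256 encryption by AWS
    canned_acl = 'private'
);
```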
azure
An azure repository stores snapshots on the Azure Blob storage service.
Parameters
account
Type: text
The Azure Storage account name.
Note
CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.
key
Type: text
The Azure Storage account secret key.
Note
CrateDB masks this parameter. You cannot query the parameter value from the sys.repositories table.
endpoint_suffix
Type: text
Default: core.windows.net
The Azure Storage account endpoint suffix.
Tip
You can use an endpoint suffix to force the use of a specific Azure service region.
container
Type: text
Default: crate-snapshots
The blob container name.
Note
You must create the container before creating the repository.
base_path
Type: text
Default: root directory
The container path to use for snapshots.
compress
Type: boolean
Default: true
Whether CrateDB should compress the metadata part of the snapshot or not.
CrateDB does not compress the actual table data.
chunk_size
Type: bigint or text
Default: 256mb
Maximum: 256mb
Minimum: 1b
Defines the maximum size of any single file that comprises the snapshot. If set to null, CrateDB will not split big files into smaller chunks. You can specify the chunk size with units (e.g., 1g, 5m, or 9k). If no unit is specified, the unit defaults to bytes.
readonly
Type: boolean
Default: false
If true, the repository is read-only.
location_mode
Type: text
Values: primary_only, secondary_only
Default: primary_only
The location mode for storing blob data.
Note
If you set location_mode to secondary_only, readonly will be forced to true.
max_retries
Type: integer
Default: 3
The number of retries (in the case of failures) before considering the snapshot to be failed.
timeout
Type: text
Default: 30s
The client-side timeout for any single request to Azure.
proxy_type
Type: text
Values: http, socks, or direct
Default: direct
The type of proxy to use when connecting to Azure.
proxy_host
Type: text
The hostname of the proxy.
proxy_port
Type: integer
Default: 0
The port number of the proxy.
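A sketch of an azure repository definition using the parameters above. The repository name, base path, and the account and key placeholders are hypothetical, and the container is assumed to have been created beforehand.

```sql
-- Hypothetical azure repository; replace the placeholders with real values.
CREATE REPOSITORY azure_repo TYPE azure
WITH (
    account = '<your-storage-account>',
    key = '<your-storage-account-key>',
    container = 'crate-snapshots',      -- must already exist
    base_path = 'cluster-a/snapshots',
    max_retries = 3,
    timeout = '30s'
);
```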
url
A url repository provides read-only access to an fs repository via one of the supported network access protocols.
You can use a url repository to restore snapshots.
Parameters
url
Type: text
The root URL of the fs repository.
Note
The URL must match one of the URLs configured by the repositories.url.allowed_urls setting.
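A minimal sketch of a url repository. The repository name and the URL are hypothetical; the example assumes the URL serves the root of an existing fs repository and matches an entry in the repositories.url.allowed_urls setting.

```sql
-- Hypothetical read-only repository backed by an fs repository served over HTTPS.
CREATE REPOSITORY readonly_url_repo TYPE url
WITH (url = 'https://snapshots.example.com/crate/');
```

Because a url repository is read-only, it can be used to restore snapshots but not to create new ones.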