SageMaker Node
Overview
Amazon SageMaker is a fully managed machine learning service. With Amazon SageMaker, data scientists and developers can quickly build and train machine learning models, and then deploy them into a production-ready hosted environment.
Amazon SageMaker Model Building Pipelines is a tool for building machine learning pipelines that take advantage of direct SageMaker integration.
For users who work with big data and machine learning, the SageMaker task plugin helps connect big data workflows with SageMaker usage scenarios.
The DolphinScheduler SageMaker task plugin provides the following feature:
- Start a SageMaker pipeline execution, then poll the execution status until the pipeline finishes.
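The start-then-poll behavior can be sketched in Python as follows. This is a minimal illustration, not the plugin's own implementation. It assumes `client` is a boto3 SageMaker client (for example `boto3.client("sagemaker")`), and the helper name `run_pipeline`, the polling interval, and the injectable `sleep` argument are hypothetical choices made for this example.

```python
import time

# Terminal values of PipelineExecutionStatus in the SageMaker API.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Stopped"}


def run_pipeline(client, request, poll_seconds=10.0, sleep=time.sleep):
    """Start a pipeline execution and poll until it reaches a terminal status.

    `client` is assumed to be a boto3 SageMaker client.
    `request` holds the StartPipelineExecution parameters as a dict.
    """
    arn = client.start_pipeline_execution(**request)["PipelineExecutionArn"]
    while True:
        response = client.describe_pipeline_execution(PipelineExecutionArn=arn)
        status = response["PipelineExecutionStatus"]
        if status in TERMINAL_STATUSES:
            return arn, status
        # Still "Executing" or "Stopping": wait before asking again.
        sleep(poll_seconds)
```

Passing `sleep` as a parameter keeps the loop easy to exercise without real waiting; in normal use the default `time.sleep` applies.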
Create Task
- Click Project Management -> Project Name -> Workflow Definition, and click the “Create Workflow” button to enter the DAG editing page.
- Drag the SageMaker task node from the toolbar to the canvas.
Task Example
- Please refer to DolphinScheduler Task Parameters Appendix for default parameters.
Here are the parameters specific to the SageMaker plugin:
- SagemakerRequestJson: Request parameters of StartPipelineExecution; see also the AWS API documentation.
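As an illustration of what a SagemakerRequestJson value might look like, the sketch below builds a request in Python and prints it as JSON. The key names follow the StartPipelineExecution API; every value (pipeline name, display name, parameter name, S3 URL) is a hypothetical placeholder, and only `PipelineName` is required by the API.

```python
import json

# All values below are placeholders; only the key names come from the
# StartPipelineExecution API.
request = {
    "PipelineName": "my-pipeline",  # hypothetical pipeline name
    "PipelineExecutionDisplayName": "nightly-run",
    "PipelineParameters": [
        {"Name": "InputDataUrl", "Value": "s3://my-bucket/input/"},
    ],
    "ParallelismConfiguration": {"MaxParallelExecutionSteps": 2},
}

# Serialize to the JSON text that would be pasted into the task parameter.
sagemaker_request_json = json.dumps(request, indent=2)
print(sagemaker_request_json)
```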
The task plugin is shown as follows:
Environment to prepare
Some AWS configuration is required. Modify the following fields in the file common.properties:
# The AWS access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.access.key.id=<YOUR AWS ACCESS KEY>
# The AWS secret access key. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.secret.access.key=<YOUR AWS SECRET KEY>
# The AWS Region to use. if resource.storage.type=S3 or use EMR-Task, This configuration is required
resource.aws.region=<AWS REGION>
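Before starting the services, it can help to confirm that the three fields above are actually filled in. The following Python sketch is one possible check, not part of DolphinScheduler: the helper names `parse_properties` and `missing_aws_keys` are hypothetical, and the parser is deliberately simplified (it handles `key=value` lines and comments only, not line continuations or `:` separators).

```python
REQUIRED_KEYS = (
    "resource.aws.access.key.id",
    "resource.aws.secret.access.key",
    "resource.aws.region",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def missing_aws_keys(text):
    """Return required keys that are absent, empty, or still a <PLACEHOLDER>."""
    props = parse_properties(text)
    return [
        key for key in REQUIRED_KEYS
        if not props.get(key) or props[key].startswith("<")
    ]
```

For example, calling `missing_aws_keys(open("common.properties").read())` returns an empty list when all three values have been set, assuming the file is in the current directory.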