# Amazon EMR

## Overview
The Amazon EMR task type is used to operate EMR clusters on AWS and run computing tasks. Under the hood, the task uses aws-java-sdk to convert the JSON parameters into request objects and submit them to AWS (see the sketch after the list below). Two program types are currently supported:

- `RUN_JOB_FLOW`: uses API_RunJobFlow to submit a `RunJobFlowRequest` object.
- `ADD_JOB_FLOW_STEPS`: uses API_AddJobFlowSteps to submit an `AddJobFlowStepsRequest` object.
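The following is a minimal sketch of this flow, assuming aws-java-sdk v1 and Jackson for the JSON-to-object mapping; it is not the actual DolphinScheduler plugin code, and the class and method names (`EmrJobFlowSketch`, `submitJobFlow`) are illustrative only.

```java
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.fasterxml.jackson.databind.DeserializationFeature;
import com.fasterxml.jackson.databind.MapperFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public class EmrJobFlowSketch {

    // Maps "Name", "ReleaseLabel", ... (the field names used in the AWS API reference)
    // onto the SDK's setters, and ignores anything it does not recognise.
    private static final ObjectMapper MAPPER = new ObjectMapper()
            .configure(MapperFeature.ACCEPT_CASE_INSENSITIVE_PROPERTIES, true)
            .configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);

    /** Hypothetical helper: turn a jobFlowDefineJson string into an EMR cluster. */
    public static String submitJobFlow(String jobFlowDefineJson) throws Exception {
        RunJobFlowRequest request =
                MAPPER.readValue(jobFlowDefineJson, RunJobFlowRequest.class);

        // Credentials and region are resolved from the default provider chain.
        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();
        RunJobFlowResult result = emr.runJobFlow(request);
        return result.getJobFlowId(); // e.g. "j-XXXXXXXXXXXXX"
    }
}
```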
## Create Task
- Click `Project Management -> Project Name -> Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag the `AmazonEMR` task from the toolbar to the canvas to complete the creation.
## Task Parameters
- Please refer to DolphinScheduler Task Parameters Appendix for default parameters.
| Parameter | Description |
|---|---|
| Program Type | Select the program type. If it is `RUN_JOB_FLOW`, you need to fill in `jobFlowDefineJson`; if it is `ADD_JOB_FLOW_STEPS`, you need to fill in `stepsDefineJson`. |
| jobFlowDefineJson | JSON corresponding to the `RunJobFlowRequest` object; for details refer to API_RunJobFlow_Examples. |
| stepsDefineJson | JSON corresponding to the `AddJobFlowStepsRequest` object; for details refer to API_AddJobFlowSteps_Examples. |
## Task Example

### Create an EMR cluster and run Steps
This example shows how to create an EMR task node of type `RUN_JOB_FLOW`. Taking the execution of `SparkPi` as an example, the task will create an EMR cluster and execute the `SparkPi` sample program.
**jobFlowDefineJson example**

```json
{
  "Name": "SparkPi",
  "ReleaseLabel": "emr-5.34.0",
  "Applications": [
    {
      "Name": "Spark"
    }
  ],
  "Instances": {
    "InstanceGroups": [
      {
        "Name": "Primary node",
        "InstanceRole": "MASTER",
        "InstanceType": "m4.xlarge",
        "InstanceCount": 1
      }
    ],
    "KeepJobFlowAliveWhenNoSteps": false,
    "TerminationProtected": false
  },
  "Steps": [
    {
      "Name": "calculate_pi",
      "ActionOnFailure": "CONTINUE",
      "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": [
          "/usr/lib/spark/bin/run-example",
          "SparkPi",
          "15"
        ]
      }
    }
  ],
  "JobFlowRole": "EMR_EC2_DefaultRole",
  "ServiceRole": "EMR_DefaultRole"
}
```
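For readers more familiar with the SDK than with its JSON representation, the following hedged sketch shows how the `jobFlowDefineJson` above maps onto a `RunJobFlowRequest` built with aws-java-sdk v1 fluent setters. You do not need to write this code; the task itself only takes the JSON, and the class name `SparkPiJobFlowSketch` is illustrative.

```java
import com.amazonaws.services.elasticmapreduce.model.*;

public class SparkPiJobFlowSketch {

    /** Builds the same request that the jobFlowDefineJson above describes. */
    public static RunJobFlowRequest sparkPiRequest() {
        return new RunJobFlowRequest()
                .withName("SparkPi")
                .withReleaseLabel("emr-5.34.0")
                .withApplications(new Application().withName("Spark"))
                .withInstances(new JobFlowInstancesConfig()
                        .withInstanceGroups(new InstanceGroupConfig()
                                .withName("Primary node")
                                .withInstanceRole("MASTER")
                                .withInstanceType("m4.xlarge")
                                .withInstanceCount(1))
                        // false: the cluster shuts down once all steps have finished
                        .withKeepJobFlowAliveWhenNoSteps(false)
                        .withTerminationProtected(false))
                .withSteps(new StepConfig()
                        .withName("calculate_pi")
                        .withActionOnFailure("CONTINUE")
                        .withHadoopJarStep(new HadoopJarStepConfig()
                                .withJar("command-runner.jar")
                                .withArgs("/usr/lib/spark/bin/run-example", "SparkPi", "15")))
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withServiceRole("EMR_DefaultRole");
    }
}
```

Because `KeepJobFlowAliveWhenNoSteps` and `TerminationProtected` are both `false`, the cluster created by this task terminates automatically once the `calculate_pi` step has finished.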
### Add a Step to a Running EMR Cluster

This example shows how to create an EMR task node of type `ADD_JOB_FLOW_STEPS`. Taking the execution of `SparkPi` as an example, the task will add a `SparkPi` sample program step to a running EMR cluster.
**stepsDefineJson example**

```json
{
  "JobFlowId": "j-3V628TKAERHP8",
  "Steps": [
    {
      "Name": "calculate_pi",
      "ActionOnFailure": "CONTINUE",
      "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": [
          "/usr/lib/spark/bin/run-example",
          "SparkPi",
          "15"
        ]
      }
    }
  ]
}
```
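Similarly, the sketch below shows roughly what this `stepsDefineJson` corresponds to in aws-java-sdk v1. The `JobFlowId` (here `j-3V628TKAERHP8`) must identify a cluster that is already running or waiting, otherwise AWS rejects the request; the class name `EmrAddStepSketch` is illustrative and this is not the actual plugin implementation.

```java
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.*;

public class EmrAddStepSketch {

    public static void main(String[] args) {
        // Equivalent of the stepsDefineJson above, expressed with the SDK's fluent setters.
        AddJobFlowStepsRequest request = new AddJobFlowStepsRequest()
                .withJobFlowId("j-3V628TKAERHP8") // ID of the already running cluster
                .withSteps(new StepConfig()
                        .withName("calculate_pi")
                        .withActionOnFailure("CONTINUE")
                        .withHadoopJarStep(new HadoopJarStepConfig()
                                .withJar("command-runner.jar")
                                .withArgs("/usr/lib/spark/bin/run-example", "SparkPi", "15")));

        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();
        AddJobFlowStepsResult result = emr.addJobFlowSteps(request);
        System.out.println("Added step IDs: " + result.getStepIds());
    }
}
```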
## Notice

- Failover for the EMR task type has not been implemented. At present, DolphinScheduler only supports failover for the YARN task type; other task types, such as the EMR task and Kubernetes task, are not supported yet.
- `stepsDefineJson`: a task definition supports the association of only a single step, which better ensures the reliability of the task state.