Amazon EMR
Overview
Amazon EMR task type, for creating EMR clusters on AWS and running computing tasks. Using aws-java-sdk in the background code, to transfer JSON parameters to RunJobFlowRequest object and submit to AWS.
Create Task
- Click
Project Management -> Project Name -> Workflow Definition
, click theCreate Workflow
button to enter the DAG editing page. - Drag
AmazonEMR
task from the toolbar to the artboard to complete the creation.
Task Parameters
Parameter | Description |
---|---|
Node name | The node name in a workflow definition is unique. |
Run flag | Identifies whether this node schedules normally, if it does not need to execute, select the prohibition execution . |
Description | Describe the function of the node. |
Task priority | When the number of worker threads is insufficient, execute in the order of priority from high to low, and tasks with the same priority will execute in a first-in first-out order. |
Worker grouping | Assign tasks to the machines of the worker group to execute. If Default is selected, randomly select a worker machine for execution. |
Times of failed retry attempts | The number of times the task failed to resubmit. You can select from drop-down or fill-in a number. |
Failed retry interval: The time interval for resubmitting the task after a failed task. You can select from drop-down or fill-in a number. | |
Timeout alarm | Check the timeout alarm and timeout failure. When the task runs exceed the “timeout”, an alarm email will send and the task execution will fail. |
JSON | JSON corresponding to the RunJobFlowRequest object, for details refer to API_RunJobFlow_Examples. |
JSON example
{
"Name": "SparkPi",
"ReleaseLabel": "emr-5.34.0",
"Applications": [
{
"Name": "Spark"
}
],
"Instances": {
"InstanceGroups": [
{
"Name": "Primary node",
"InstanceRole": "MASTER",
"InstanceType": "m4.xlarge",
"InstanceCount": 1
}
],
"KeepJobFlowAliveWhenNoSteps": false,
"TerminationProtected": false
},
"Steps": [
{
"Name": "calculate_pi",
"ActionOnFailure": "CONTINUE",
"HadoopJarStep": {
"Jar": "command-runner.jar",
"Args": [
"/usr/lib/spark/bin/run-example",
"SparkPi",
"15"
]
}
}
],
"JobFlowRole": "EMR_EC2_DefaultRole",
"ServiceRole": "EMR_DefaultRole"
}
当前内容版权归 DolphinScheduler 或其关联方所有,如需对内容或内容相关联开源项目进行关注与资助,请访问 DolphinScheduler .