Jupyter
Overview
Use `Jupyter Task` to create a jupyter-type task and execute Jupyter notebooks. When the worker executes a `Jupyter Task`, it uses `papermill` to evaluate the notebooks. See the papermill documentation for details.
Conda Configuration
- Set `conda.path` in `common.properties` to the path of your `conda.sh`. It should belong to the same `conda` installation you use to manage the Python environment of your `papermill` and `jupyter`. See the conda documentation for more information about `conda`. `conda.path` is set to `/opt/anaconda3/etc/profile.d/conda.sh` by default. If you do not know where your `conda` is installed, run `conda info | grep -i 'base environment'`.
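As a minimal sketch, the value for `conda.path` can be derived from the `base environment` line that `conda info` prints. The helper name `conda_sh_from_info` is hypothetical, and the sketch assumes `conda.sh` sits under `etc/profile.d/` of the base environment (as in the default above) and that the path contains no spaces.

```python
import re
from pathlib import PurePosixPath
from typing import Optional


def conda_sh_from_info(info_output: str) -> Optional[str]:
    """Derive the conda.sh path from the text printed by `conda info`.

    Hypothetical helper: assumes conda.sh lives in etc/profile.d/ under
    the base environment and that the path contains no spaces.
    """
    for line in info_output.splitlines():
        # Expected shape: "base environment : /opt/anaconda3  (writable)"
        match = re.match(r"\s*base environment\s*:\s*(\S+)", line, re.IGNORECASE)
        if match:
            base = PurePosixPath(match.group(1))
            return str(base / "etc" / "profile.d" / "conda.sh")
    return None


sample = "       base environment : /opt/anaconda3  (writable)\n"
print(conda_sh_from_info(sample))  # /opt/anaconda3/etc/profile.d/conda.sh
```

In practice the input would be the captured output of `conda info` on the worker; verify that the resulting file exists before writing it to `common.properties`.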
NOTE: `Jupyter Task Plugin` uses the `source` command to activate the conda environment. If your tenant does not have permission to use `source`, `Jupyter Task Plugin` will not function.
Python Dependency Management
Use Pre-Installed Conda Environment
- Create a conda environment manually or using a `shell task` on your target worker.
- In your `jupyter task`, set `condaEnvName` to the name of the conda environment you just created.
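To make the role of `conda.path` and `condaEnvName` concrete, the sketch below composes the kind of shell prefix that activates a pre-installed environment with `source`. The exact command the plugin builds is not documented here, so treat the function `activation_command` and its output as an assumption for illustration only.

```python
import shlex


def activation_command(conda_sh: str, env_name: str) -> str:
    """Compose a shell prefix that sources conda.sh and activates an env.

    Hypothetical helper: the real plugin may build its command differently.
    Arguments are quoted defensively in case they contain shell metacharacters.
    """
    return (
        f"source {shlex.quote(conda_sh)} && "
        f"conda activate {shlex.quote(env_name)}"
    )


print(activation_command("/opt/anaconda3/etc/profile.d/conda.sh", "jupyter_env"))
# source /opt/anaconda3/etc/profile.d/conda.sh && conda activate jupyter_env
```

Running the printed line manually as the tenant user on the worker is a quick way to confirm that the tenant is allowed to use `source` and that the environment name resolves.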
Use Packed Conda Environment
- Use Conda-Pack to pack your conda environment into a tarball.
- Upload the packed conda environment to the `resource center`.
- In your `jupyter task`, set `condaEnvName` to the name of your packed conda environment, e.g. `jupyter_env.tar.gz`.
- In your `jupyter task`, select your packed conda environment as a `resource`, e.g. `jupyter_env.tar.gz`.
NOTE: Make sure you follow the Conda-Pack official instructions. If you unpack your packed conda environment, the directory structure should be the same as below:
.
├── bin
├── conda-meta
├── etc
├── include
├── lib
├── share
└── ssl
NOTICE: Follow the `conda pack` instructions above strictly, and DO NOT modify `bin/activate`. `Jupyter Task Plugin` uses the `source` command to activate your packed conda environment. If you are concerned about using `source`, choose another option to manage your Python dependencies.
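Before uploading a packed environment, it can help to check that the tarball has the top-level layout shown above. The following is a minimal sketch using only the standard library; the names `EXPECTED_DIRS`, `missing_top_level_dirs`, and `build_demo_tarball` are hypothetical, and the demo archive is built in memory purely for illustration.

```python
import io
import tarfile

# Top-level directories taken from the layout listed in this section.
EXPECTED_DIRS = {"bin", "conda-meta", "etc", "include", "lib", "share", "ssl"}


def missing_top_level_dirs(tar: tarfile.TarFile) -> set:
    """Return the expected top-level entries that the archive lacks."""
    top_level = set()
    for member in tar.getmembers():
        # Drop a leading "./" and keep only the first path component.
        parts = [p for p in member.name.split("/") if p not in ("", ".")]
        if parts:
            top_level.add(parts[0])
    return EXPECTED_DIRS - top_level


def build_demo_tarball(dirs) -> io.BytesIO:
    """Build a small in-memory .tar.gz containing only empty directories."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in dirs:
            info = tarfile.TarInfo(name)
            info.type = tarfile.DIRTYPE
            tar.addfile(info)
    buffer.seek(0)
    return buffer


with tarfile.open(fileobj=build_demo_tarball(["bin", "etc", "lib"]), mode="r:gz") as tar:
    print(sorted(missing_top_level_dirs(tar)))
# ['conda-meta', 'include', 'share', 'ssl']
```

For a real archive, open it with `tarfile.open("jupyter_env.tar.gz", mode="r:gz")` instead of the demo buffer; an empty result means every expected directory is present at the top level.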
Construct From Requirements
- Upload or create a `.txt` requirements file listing your Python dependencies in the `Resource Center`.
- In your `jupyter task`, set `condaEnvName` to the name of your requirements file, e.g. `requirements.txt`.
- In your `jupyter task`, select your requirements file as a `resource`, e.g. `requirements.txt`.
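The three options above all reuse the same `condaEnvName` field, and the examples suggest that the file suffix tells them apart. The sketch below illustrates that reading; the suffix-based dispatch and the function `classify_conda_env_name` are assumptions inferred from the examples on this page, not the plugin's confirmed logic.

```python
def classify_conda_env_name(conda_env_name: str) -> str:
    """Guess which dependency option a condaEnvName value refers to.

    Assumption: the option is inferred from the suffix, as the examples
    jupyter_env.tar.gz and requirements.txt suggest.
    """
    if conda_env_name.endswith(".tar.gz"):
        return "packed conda environment"
    if conda_env_name.endswith(".txt"):
        return "requirements file"
    return "pre-installed conda environment"


for value in ("jupyter_env", "jupyter_env.tar.gz", "requirements.txt"):
    print(f"{value} -> {classify_conda_env_name(value)}")
```

Under this reading, naming a pre-installed environment with a `.txt` or `.tar.gz` suffix would be ambiguous, so plain names are the safer choice for pre-installed environments.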
Here is an example requirements file, from which the `jupyter task plugin` will automatically construct your Python dependencies, run your Python code, and finally tear down the environment:
fastjsonschema==2.15.3
fonttools==4.33.3
geojson==2.5.0
identify==2.4.11
idna==3.3
importlib-metadata==4.11.3
importlib-resources==5.7.1
ipykernel==5.5.6
ipython==8.2.0
ipython-genutils==0.2.0
jedi==0.18.1
Jinja2==3.1.1
json5==0.9.6
jsonschema==4.4.0
jupyter-client==7.3.0
jupyter-core==4.10.0
jupyter-server==1.17.0
jupyterlab==3.3.4
jupyterlab-pygments==0.2.2
jupyterlab-server==2.13.0
kiwisolver==1.4.2
MarkupSafe==2.1.1
matplotlib==3.5.2
matplotlib-inline==0.1.3
mistune==0.8.4
nbclassic==0.3.7
nbclient==0.6.0
nbconvert==6.5.0
nbformat==5.3.0
nest-asyncio==1.5.5
notebook==6.4.11
notebook-shim==0.1.0
numpy==1.22.3
packaging==21.3
pandas==1.4.2
pandocfilters==1.5.0
papermill==2.3.4
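Every entry in the example above is pinned with `==`. Since the environment is constructed for the run and torn down afterwards, pinned versions help keep runs reproducible. The sketch below flags lines that are not pinned; the function `unpinned_requirements` is a hypothetical pre-upload check, not part of the plugin, and it deliberately handles only simple `name==version` lines and `#` comments.

```python
def unpinned_requirements(text: str) -> list:
    """Return requirement lines that are not of the form name==version."""
    unpinned = []
    for raw in text.splitlines():
        # Ignore trailing comments and blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, separator, version = line.partition("==")
        if not separator or not name.strip() or not version.strip():
            unpinned.append(line)
    return unpinned


sample = """papermill==2.3.4
ipykernel==5.5.6
numpy
"""
print(unpinned_requirements(sample))  # ['numpy']
```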
Create Task
- Click `Project Management-Project Name-Workflow Definition`, and click the `Create Workflow` button to enter the DAG editing page.
- Drag the Jupyter task node from the toolbar to the canvas.
Task Parameters
- Please refer to DolphinScheduler Task Parameters Appendix for default parameters.
| Parameter | Description |
|---|---|
| Conda Env Name | Name of conda environment or packed conda environment tarball. |
| Input Note Path | Path of input jupyter note template. |
| Out Note Path | Path of output note. |
| Jupyter Parameters | Parameters in json format used for jupyter note parameterization. |
| Kernel | Jupyter notebook kernel. |
| Engine | Engine to evaluate jupyter notes. |
| Jupyter Execution Timeout | Timeout set for each jupyter notebook cell. |
| Jupyter Start Timeout | Timeout set for jupyter notebook kernel. |
| Others | Other command options for papermill. |
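To show how the parameters in the table relate to papermill, the sketch below maps them onto a papermill command line. The flags used (`-p`, `--kernel`, `--engine`, `--execution-timeout`, `--start-timeout`) are papermill CLI options; the function `build_papermill_command`, the file paths, and the exact way the plugin assembles its command are assumptions for illustration. Only scalar JSON values are handled.

```python
import json
import shlex


def build_papermill_command(input_note, output_note, parameters_json="{}",
                            kernel=None, engine=None,
                            execution_timeout=None, start_timeout=None,
                            others=""):
    """Map the task parameters onto a papermill command line (a sketch)."""
    args = ["papermill", input_note, output_note]
    # Jupyter Parameters: each JSON key/value becomes one "-p key value" pair.
    for key, value in json.loads(parameters_json).items():
        args += ["-p", str(key), str(value)]
    if kernel:
        args += ["--kernel", kernel]
    if engine:
        args += ["--engine", engine]
    if execution_timeout is not None:
        args += ["--execution-timeout", str(execution_timeout)]
    if start_timeout is not None:
        args += ["--start-timeout", str(start_timeout)]
    # Others: extra papermill options passed through unchanged.
    args += shlex.split(others)
    return shlex.join(args)


print(build_papermill_command(
    "/opt/notes/in.ipynb",    # hypothetical input note path
    "/opt/notes/out.ipynb",   # hypothetical output note path
    '{"alpha": 0.1, "runs": 3}',
    kernel="python3",
    execution_timeout=600,
))
# papermill /opt/notes/in.ipynb /opt/notes/out.ipynb -p alpha 0.1 -p runs 3 --kernel python3 --execution-timeout 600
```

Running the printed command inside the activated conda environment is a convenient way to reproduce a task outside the scheduler when debugging a notebook.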
Task Example
Jupyter Task Example
This example illustrates how to create a jupyter task node.