Data Analysis with Jupyter Lab

Analyze data with Python and SQL, powered by Jupyter Lab

Jupyter Lab is a complete data science R&D environment based on IPython Notebook for data analysis and visualization. It is currently an optional beta feature and is enabled by default only in the demo environment.

Because JupyterLab includes a web terminal, which is a security risk, deploying it manually on the meta node with the infra-jupyter playbook is recommended.

Figure 1: Jupyter Lab

TL;DR

    ./infra-jupyter.yml                                 # install Jupyter on the meta node, port 8888, user jupyter, password pigsty
    ./infra-jupyter.yml -e jupyter_port=8887            # use another port (8888 by default)
    ./infra-jupyter.yml -e jupyter_domain=lab.pigsty.cc # use another domain (lab.pigsty by default)
    ./infra-jupyter.yml -e jupyter_username=osuser_jupyter -e jupyter_password=pigsty2 # use another user and password
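When several parameters are overridden at once, it is easy to drop an `-e` flag. The following is a minimal sketch of a helper that assembles the command from keyword arguments, passing each override as its own `-e name=value` pair. The function names `playbook_command` and `playbook_command_line` are hypothetical and are not part of Pigsty; only the four parameter names come from this page.

```python
import shlex

# The four parameters documented for the infra-jupyter playbook
KNOWN_PARAMS = ("jupyter_username", "jupyter_password", "jupyter_port", "jupyter_domain")


def playbook_command(**overrides):
    """Return the argv list for ./infra-jupyter.yml with one -e per override."""
    unknown = sorted(set(overrides) - set(KNOWN_PARAMS))
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    argv = ["./infra-jupyter.yml"]
    for name, value in overrides.items():
        argv += ["-e", f"{name}={value}"]
    return argv


def playbook_command_line(**overrides):
    """Return the same command as a single shell-quoted string."""
    return shlex.join(playbook_command(**overrides))
```

Calling `playbook_command_line(jupyter_port=8887)` yields a string that can be pasted into a shell, while `playbook_command(...)` returns a list suitable for `subprocess.run`.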

Jupyter Config

Name              Type     Level  Comment
jupyter_port      integer  G      Jupyter port
jupyter_domain    string   G      Jupyter domain name
jupyter_username  string   G      OS user used by Jupyter
jupyter_password  string   G      Password for Jupyter Lab

Default values

    jupyter_username: jupyter       # os user name, special names: default|root (dangerous!)
    jupyter_password: pigsty        # default password for jupyter lab (important!)
    jupyter_port: 8888              # default port for jupyter lab
    jupyter_domain: lab.pigsty      # domain name used to distinguish jupyter

jupyter_username

The OS user used by Jupyter, type: string, level: G, default value: "jupyter".

Any other username works the same way, but the special username default runs Jupyter Lab as the user currently running the installation (usually admin).

jupyter_password

Password for Jupyter Lab, type: string, level: G, default value: "pigsty".

If Jupyter is enabled, changing this password is strongly recommended. The salted and hashed password is written to ~jupyter/.jupyter/jupyter_server_config.json by default.
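To illustrate what storing a salted password means, here is a generic sketch using only the Python standard library. It is not Jupyter's own implementation, and the `pbkdf2_sha256:<iterations>:<salt>:<digest>` string format and the function names are assumptions made for this example. The point is that only the salt and a derived digest are stored, never the plain password.

```python
import hashlib
import hmac
import secrets


def hash_password(password, salt=None, iterations=100_000):
    """Return 'pbkdf2_sha256:<iterations>:<salt>:<hex digest>' (illustrative format)."""
    if salt is None:
        salt = secrets.token_hex(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt.encode("ascii"), iterations
    ).hex()
    return f"pbkdf2_sha256:{iterations}:{salt}:{digest}"


def verify_password(password, stored):
    """Re-derive the hash with the stored salt and compare in constant time."""
    _, iterations, salt, _ = stored.split(":")
    candidate = hash_password(password, salt, int(iterations))
    return hmac.compare_digest(candidate, stored)
```

Because the salt is random, hashing the same password twice produces different stored strings, yet `verify_password` accepts the correct password for either one.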

jupyter_port

Jupyter server listen port, type: int, level: G, default value: 8888.

When JupyterLab is enabled, Pigsty runs the local Notebook server as the user specified by the jupyter_username parameter.

In addition, make sure that the node_packages_meta_pip parameter includes 'jupyterlab', as it does by default.

JupyterLab can be reached from the Pigsty home page or through the default domain lab.pigsty. It listens on port 8888 by default.
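After running the playbook, a quick way to confirm that something is listening on the configured port is a plain TCP connection test. The sketch below uses only the standard library; the helper name `is_port_open` is hypothetical, and it only shows that the port accepts connections, not that the service behind it is JupyterLab.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

On the meta node itself, `is_port_open("127.0.0.1", 8888)` checks the default jupyter_port; substitute the node's address and your own port when checking remotely.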

jupyter_domain

Jupyter domain name, type: string, level: G, default value: "lab.pigsty".

This domain name will be written to /etc/nginx/conf.d/jupyter.conf as nginx upstream.


Jupyter Playbook

infra-jupyter

Playbook infra-jupyter.yml will install JupyterLab on the meta node.

JupyterLab is a handy data analysis IDE for Python, but its web shell functionality also makes it risky. It is therefore disabled by default and enabled only in the demo environment.

Refer to Config: Jupyter to configure Jupyter, then execute this playbook.

If Jupyter is enabled in the production environment, be sure to change the password of Jupyter.


Access PostgreSQL in Jupyter

You can access PostgreSQL through the Python driver psycopg2:

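    import psycopg2

    # Connect to the meta database as dbuser_meta (the URL gives no host)
    conn = psycopg2.connect('postgres://dbuser_meta:DBUser.Meta@:5432/meta')
    cursor = conn.cursor()
    # Fetch the new-case series for country code 'CN'
    cursor.execute("""SELECT date, new_cases FROM covid.country_history WHERE country_code = 'CN';""")
    data = cursor.fetchall()  # list of (date, new_cases) tuples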
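For repeated queries in a notebook, the snippet above can be wrapped in small helpers. This is a sketch under stated assumptions: `build_dsn` and `fetch_cases` are hypothetical names, the credentials and the covid.country_history table are the ones from the example, and the host defaults to empty as in the original connection string. The country code is passed as a query parameter rather than embedded in the SQL text, and the password is percent-encoded so that special characters do not break the URL.

```python
from urllib.parse import quote


def build_dsn(user, password, dbname, host="", port=5432):
    """Build a postgres:// URL, percent-encoding the user and password."""
    return (
        f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{dbname}"
    )


def fetch_cases(dsn, country_code):
    """Return (date, new_cases) rows for one country code."""
    import psycopg2  # imported lazily so build_dsn works without the driver

    conn = psycopg2.connect(dsn)
    try:
        # 'with conn' ends the transaction on exit; it does not close the connection
        with conn, conn.cursor() as cursor:
            cursor.execute(
                "SELECT date, new_cases FROM covid.country_history "
                "WHERE country_code = %s;",
                (country_code,),
            )
            return cursor.fetchall()
    finally:
        conn.close()
```

With the demo credentials, `fetch_cases(build_dsn("dbuser_meta", "DBUser.Meta", "meta"), "CN")` runs the same query as the example above.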

Last modified 2022-06-03: add scaffold for en docs (6a6eded)