Quick Deployment
Reminder: If you want to try the full Linkis family bucket (DSS + Linkis + Qualitis + Visualis + Azkaban), please refer to DSS One-Key Deployment.
1 Determine the installation environment
Linkis provides three installation modes of increasing complexity. The differences are as follows:
Lite:
Minimal environment dependencies, single-node installation mode, only includes the Python engine; the only requirement is that the user's Linux environment supports Python.
Please note: the Lite version only allows users to submit Python scripts.
Simple version:
Depends on Python, Hadoop and Hive; distributed installation mode, including the Python engine and the Hive engine. Requires Hadoop and Hive to already be installed in the user's Linux environment.
The Simple version allows users to submit HiveQL and Python scripts.
Standard version:
Depends on Python, Hadoop, Hive and Spark; distributed installation mode, including the Python, Hive and Spark engines. Requires Hadoop, Hive and Spark to already be installed in the user's Linux environment. The Linkis machines only rely on the cluster's hadoop/hive/spark configuration files, so they do not need to be deployed on the DataNode and NameNode machines and can be deployed on a separate client machine.
The Standard version allows users to submit Spark scripts (including SparkSQL, PySpark and Scala), HiveQL and Python scripts. **Please note: the Standard version requires at least 10G of memory on the installation machine.** If the machine does not have enough memory, you need to add or modify the environment variable `export SERVER_HEAP_SIZE="512M"`, as shown in the sketch below.
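A minimal sketch of how you might check memory and lower the heap size before installing (assumption: exporting the variable in the deployment user's shell before running the install script is enough for your setup; adjust to wherever your installation reads it):
# Check how much memory the machine has (standard Linux command)
free -g
# If it is below the recommended 10G, lower the JVM heap size for the Linkis services,
# e.g. by exporting it in the deployment user's shell before installation:
export SERVER_HEAP_SIZE="512M"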
2 Lite version Linkis environment preparation
2.1 Basic software installation
The following software must be installed:
- MySQL (5.5+), How to install MySQL
- JDK (above 1.8.0_141), How to install JDK
- Python (support both 2.x and 3.x), How to install Python
2.2 Create User
For example: **the deployment user is the hadoop account**
- Create a deployment user on the deployment machine for the installation:
sudo useradd hadoop
- Because the Linkis services use sudo -u ${linux-user} to switch users when performing engine operations, the deployment user needs password-free sudo permissions:
vi /etc/sudoers
hadoop ALL=(ALL) NOPASSWD: ALL
- If you want Python to support plotting, you also need to install the plotting module on the installation node. The command is as follows:
python -m pip install matplotlib
2.3 Installation package preparation
Download the latest installation package from the Linkis releases ([click here to enter the download page](https://github.com/apache/incubator-linkis/releases)).
First extract the installation package into the installation directory, then modify the configuration of the extracted files.
tar -xvf wedatasphere-linkis-x.x.x-dist.tar.gz
(1) Modify the basic configuration
vi conf/config.sh
SSH_PORT=22 #Specify the SSH port; can be left unconfigured for a single-node installation
deployUser=hadoop #Specify deployment user
LINKIS_HOME=/appcom/Install/Linkis # Specify the installation directory
WORKSPACE_USER_ROOT_PATH=file:///tmp/hadoop # Specify the user root directory, which is generally used to store the user's script files and log files, etc. It is the user's workspace.
RESULT_SET_ROOT_PATH=file:///tmp/linkis # The result set file path, used to store the result set file of the job
#HDFS_USER_ROOT_PATH=hdfs:///tmp/linkis #This parameter must remain commented out for the Lite installation
(2) Modify the database configuration
vi conf/db.sh
# Set the connection information of the database
# Including IP address, port, database name, user name and password
# Mainly used to store user-defined variables, configuration parameters, UDFs and small functions, and to provide the underlying storage for JobHistory
MYSQL_HOST=
MYSQL_PORT=
MYSQL_DB=
MYSQL_USER=
MYSQL_PASSWORD=
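A purely illustrative filled-in example (all values below are hypothetical placeholders; use your own MySQL host, port, database and credentials):
MYSQL_HOST=127.0.0.1       # hypothetical MySQL host
MYSQL_PORT=3306            # default MySQL port
MYSQL_DB=linkis            # database created for Linkis
MYSQL_USER=linkis          # database user
MYSQL_PASSWORD=change_me   # database password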
The environment is now ready; continue with [5 Installation and deployment](#5-installation-and-deployment).
3 Simple version of Linkis environment preparation
3.1 Basic software installation
The following software must be installed:
- MySQL (5.5+), How to install MySQL
- JDK (above 1.8.0_141), How to install JDK
- Python (support both 2.x and 3.x), How to install Python
- Hadoop (Community version and versions below CDH3.0 are supported)
- Hive (1.2.1; versions 2.0 and above may have compatibility issues)
3.2 Create User
For example: **the deployment user is the hadoop account**
- Create the deployment user on all machines that will be used for the deployment:
sudo useradd hadoop
- Because the Linkis services use sudo -u ${linux-user} to switch users when performing engine operations, the deployment user needs password-free sudo permissions:
vi /etc/sudoers
hadoop ALL=(ALL) NOPASSWD: ALL
Set the following global environment variables on each installation node so that Linkis can use Hadoop and Hive normally.
Modify the deployment user's .bashrc; the command is as follows:
vim /home/hadoop/.bashrc
The following is an example of environment variables:
#JDK
export JAVA_HOME=/nemo/jdk1.8.0_141
#HADOOP
export HADOOP_HOME=/appcom/Install/hadoop
export HADOOP_CONF_DIR=/appcom/config/hadoop-config
#Hive
export HIVE_HOME=/appcom/Install/hive
export HIVE_CONF_DIR=/appcom/config/hive-config
3.3 SSH password-free configuration (required for distributed mode)
If all of Linkis is deployed on the same server, this step can be skipped.
If Linkis is deployed across multiple servers, you also need to configure password-free SSH login between these servers; a sketch follows the link below.
[How to configure SSH password-free login](https://www.jianshu.com/p/0922095f69f3)
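A minimal sketch of the usual key-exchange steps (assuming the hadoop deployment user and OpenSSH; hostnames are placeholders):
# Run as the deployment user on the machine that executes the install script
ssh-keygen -t rsa                       # generate a key pair if one does not exist yet
ssh-copy-id hadoop@<other-server>       # repeat for every other deployment server
ssh hadoop@<other-server> "hostname"    # verify that login works without a password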
3.4 Installation package preparation
Download the latest installation package from the Linkis releases ([click here to enter the download page](https://github.com/apache/incubator-linkis/releases)).
First extract the installation package into the installation directory, then modify the configuration of the extracted files.
tar -xvf wedatasphere-linkis-x.x.x-dist.tar.gz
(1) Modify the basic configuration
vi conf/config.sh
SSH_PORT=22 #Specify the SSH port; can be left unconfigured for a single-node installation
deployUser=hadoop #Specify deployment user
LINKIS_HOME=/appcom/Install/Linkis # Specify the installation directory
WORKSPACE_USER_ROOT_PATH=file:///tmp/hadoop # Specify the user root directory, which is generally used to store the user's script files and log files, etc. It is the user's workspace.
HDFS_USER_ROOT_PATH=hdfs:///tmp/linkis # Specify the user's HDFS root directory, which is generally used to store the result set files of the job
# If you want to use Linkis with Scriptis and a CDH version of Hive, you also need to configure the following parameters (the community version of Hive can ignore this configuration)
HIVE_META_URL=jdbc://... # URL of the HiveMeta metadata database
HIVE_META_USER= # User of the HiveMeta metadata database
HIVE_META_PASSWORD= # Password of the HiveMeta metadata database
# Configure hadoop/hive/spark configuration directory
HADOOP_CONF_DIR=/appcom/config/hadoop-config #hadoop's conf directory
HIVE_CONF_DIR=/appcom/config/hive-config #hive's conf directory
(2) Modify the database configuration
vi conf/db.sh
# Set the connection information of the database
# Including IP address, port, database name, user name and password
# Mainly used to store user-defined variables, configuration parameters, UDFs and small functions, and to provide the underlying storage for JobHistory
MYSQL_HOST=
MYSQL_PORT=
MYSQL_DB=
MYSQL_USER=
MYSQL_PASSWORD=
The environment is now ready; continue with [5 Installation and deployment](#5-installation-and-deployment).
4 Standard Linkis Environment Preparation
4.1 Basic software installation
The following software must be installed:
- MySQL (5.5+), How to install MySQL
- JDK (above 1.8.0_141), How to install JDK
- Python (support both 2.x and 3.x), How to install Python
- Hadoop (Community version and versions below CDH3.0 are supported)
- Hive (1.2.1; versions 2.0 and above may have compatibility issues)
- Spark (starting from Linkis release 0.7.0, all versions of Spark 2.0 and above are supported)
4.2 Create User
For example: **the deployment user is the hadoop account**
- Create the deployment user on all machines that will be used for the deployment:
sudo useradd hadoop
- Because the Linkis services use sudo -u ${linux-user} to switch users when performing engine operations, the deployment user needs password-free sudo permissions:
vi /etc/sudoers
hadoop ALL=(ALL) NOPASSWD: ALL
Set the following global environment variables on each installation node so that Linkis can use Hadoop, Hive and Spark normally.
Modify the deployment user's .bashrc; the command is as follows:
vim /home/hadoop/.bashrc
The following is an example of environment variables:
#JDK
export JAVA_HOME=/nemo/jdk1.8.0_141
#HADOOP
export HADOOP_HOME=/appcom/Install/hadoop
export HADOOP_CONF_DIR=/appcom/config/hadoop-config
#Hive
export HIVE_HOME=/appcom/Install/hive
export HIVE_CONF_DIR=/appcom/config/hive-config
#Spark
export SPARK_HOME=/appcom/Install/spark
export SPARK_CONF_DIR=/appcom/config/spark-config/spark-submit
export PYSPARK_ALLOW_INSECURE_GATEWAY=1 # Parameter required by PySpark
- If you want PySpark to support plotting, you also need to install the plotting module on all installation nodes. The command is as follows:
python -m pip install matplotlib
4.3 SSH password-free configuration (required for distributed mode)
If all of Linkis is deployed on the same server, this step can be skipped.
If Linkis is deployed across multiple servers, you also need to configure password-free SSH login between these servers (the key-exchange sketch in section 3.3 applies here as well).
[How to configure SSH password-free login](https://www.jianshu.com/p/0922095f69f3)
4.4 Installation package preparation
Download the latest installation package from the Linkis releases ([click here to enter the download page](https://github.com/apache/incubator-linkis/releases)).
First extract the installation package into the installation directory, then modify the configuration of the extracted files.
tar -xvf wedatasphere-linkis-x.x.x-dist.tar.gz
(1) Modify the basic configuration
vi conf/config.sh
SSH_PORT=22 #Specify the SSH port; can be left unconfigured for a single-node installation
deployUser=hadoop #Specify deployment user
LINKIS_HOME=/appcom/Install/Linkis # Specify the installation directory
WORKSPACE_USER_ROOT_PATH=file:///tmp/hadoop # Specify the user root directory, which is generally used to store the user's script files and log files, etc. It is the user's workspace.
HDFS_USER_ROOT_PATH=hdfs:///tmp/linkis # Specify the user's HDFS root directory, which is generally used to store the result set files of the job
# If you want to use Linkis with Scriptis and a CDH version of Hive, you also need to configure the following parameters (the community version of Hive can ignore this configuration)
HIVE_META_URL=jdbc://... # URL of the HiveMeta metadata database
HIVE_META_USER= # User of the HiveMeta metadata database
HIVE_META_PASSWORD= # Password of the HiveMeta metadata database
# Configure hadoop/hive/spark configuration directory
HADOOP_CONF_DIR=/appcom/config/hadoop-config #hadoop's conf directory
HIVE_CONF_DIR=/appcom/config/hive-config #hive's conf directory
SPARK_CONF_DIR=/appcom/config/spark-config #spark's conf directory
(2) Modify the database configuration
vi conf/db.sh
# Set the connection information of the database
# Including IP address, port, database name, user name and password
# Mainly used to store user-defined variables, configuration parameters, UDFs and small functions, and to provide the underlying storage for JobHistory
MYSQL_HOST=
MYSQL_PORT=
MYSQL_DB=
MYSQL_USER=
MYSQL_PASSWORD=
5 Installation and deployment
5.1 Execute the installation script:
sh bin/install.sh
5.2 Installation steps
The install.sh script will ask you about the installation mode.
The installation mode can be Lite, Simple or Standard; choose the mode that matches the environment you have prepared.
The install.sh script will also ask whether you need to initialize the database and import metadata.
To avoid accidentally clearing existing user data in the database when install.sh is executed repeatedly, the script asks this question every time it runs.
You must answer Yes for the first installation.
5.3 Check whether the installation was successful:
Check whether the installation succeeded by viewing the log information printed on the console.
If there is an error message, you can use it to determine the specific cause of the failure.
5.4 Quick start Linkis
(1) Start the services:
Execute the following command in the installation directory to start all services:
./bin/start-all.sh > start.log 2>start_error.log
(2) Check whether the startup was successful
You can check whether the services started successfully on the Eureka page, as follows:
Open http://${EUREKA_INSTALL_IP}:${EUREKA_PORT} in a browser and check whether the services registered successfully.
If you did not specify EUREKA_INSTALL_IP and EUREKA_PORT in config.sh, the HTTP address is: http://127.0.0.1:20303
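Besides the browser, you can also query the registry from the command line; a quick check assuming the default address above (/eureka/apps is Eureka's standard registry endpoint):
# Lists all services currently registered with Eureka (response is XML)
curl http://127.0.0.1:20303/eureka/apps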
As shown in the figure below, if the following microservices appear on your Eureka homepage, the services have started successfully and can serve requests normally:
Note: the services marked in red are DSS services, and the rest are Linkis services. If you only use Linkis, you can ignore the parts marked in red.
6 Quickly use Linkis
6.1 Overview
Linkis provides users with a Java client implementation, and users can use UJESClient to quickly access Linkis back-end services.
6.2 Quick run
We provide two UJESClient test classes under the ujes/client/src/test module:
com.webank.wedatasphere.linkis.ujes.client.UJESClientImplTestJ # Java-based test class
com.webank.wedatasphere.linkis.ujes.client.UJESClientImplTest # Test class based on Scala implementation
If you cloned the source code of Linkis, you can run these two test classes directly.
6.3 Quick implementation
**The following describes how to quickly implement code submission and execution through Linkis.**
6.3.1 maven dependency
<dependency>
<groupId>com.webank.wedatasphere.linkis</groupId>
<artifactId>linkis-ujes-client</artifactId>
<version>0.11.0</version>
</dependency>
6.3.2 Reference Implementation
-JAVA
package com.webank.bdp.dataworkcloud.ujes.client;
import com.webank.wedatasphere.linkis.common.utils.Utils;
import com.webank.wedatasphere.linkis.httpclient.dws.authentication.StaticAuthenticationStrategy;
import com.webank.wedatasphere.linkis.httpclient.dws.config.DWSClientConfig;
import com.webank.wedatasphere.linkis.httpclient.dws.config.DWSClientConfigBuilder;
import com.webank.wedatasphere.linkis.ujes.client.UJESClient;
import com.webank.wedatasphere.linkis.ujes.client.UJESClientImpl;
import com.webank.wedatasphere.linkis.ujes.client.request.JobExecuteAction;
import com.webank.wedatasphere.linkis.ujes.client.request.ResultSetAction;
import com.webank.wedatasphere.linkis.ujes.client.response.JobExecuteResult;
import com.webank.wedatasphere.linkis.ujes.client.response.JobInfoResult;
import com.webank.wedatasphere.linkis.ujes.client.response.JobProgressResult;
import com.webank.wedatasphere.linkis.ujes.client.response.JobStatusResult;
import org.apache.commons.io.IOUtils;
import java.util.concurrent.TimeUnit;
public class UJESClientImplTestJ {
public static void main(String[] args){
// 1. Configure DWSClientBuilder, get a DWSClientConfig through DWSClientBuilder
DWSClientConfig clientConfig = ((DWSClientConfigBuilder) (DWSClientConfigBuilder.newBuilder()
.addUJESServerUrl("http://${ip}:${port}") //Specify ServerUrl, the address of the Linkis server-side gateway, such as http://{ip}:{port}
.connectionTimeout(30000) //connectionTimeOut client connection timeout
.discoveryEnabled(true).discoveryFrequency(1, TimeUnit.MINUTES) //Whether to enable registration discovery, if enabled, the newly launched Gateway will be automatically discovered
.loadbalancerEnabled(true) // Whether to enable load balancing, if registration discovery is not enabled, load balancing is meaningless
.maxConnectionSize(5) //Specify the maximum number of connections, that is, the maximum number of concurrent
.retryEnabled(false).readTimeout(30000) //retryEnabled: whether to retry on failure; readTimeout: client read timeout
.setAuthenticationStrategy(new StaticAuthenticationStrategy()) //AuthenticationStrategy Linkis authentication method
.setAuthTokenKey("johnnwang").setAuthTokenValue("Abcd1234"))) //Authentication key, generally the user name; authentication value, generally the password corresponding to the user name
.setDWSVersion("v1").build(); //Linkis backend protocol version, the current version is v1
// 2. Get a UJESClient through DWSClientConfig
UJESClient client = new UJESClientImpl(clientConfig);
// 3. Start code execution
JobExecuteResult jobExecuteResult = client.execute(JobExecuteAction.builder()
.setCreator("LinkisClient-Test") //creator, requesting the system name of the Linkis client, used for system-level isolation
.addExecuteCode("show tables") //ExecutionCode The code to be executed
.setEngineType(JobExecuteAction.EngineType$.MODULE$.HIVE()) // The execution engine type of Linkis that you want to request, such as Spark hive, etc.
.setUser("johnnwang") //User, requesting user; used for user-level multi-tenant isolation
.build());
System.out.println("execId: "+ jobExecuteResult.getExecID() + ", taskId:" + jobExecuteResult.taskID());
// 4. Get the execution status of the script
JobStatusResult status = client.status(jobExecuteResult);
while(!status.isCompleted()) {
// 5. Get the execution progress of the script
JobProgressResult progress = client.progress(jobExecuteResult);
Utils.sleepQuietly(500);
status = client.status(jobExecuteResult);
}
// 6. Get the job information of the script
JobInfoResult jobInfo = client.getJobInfo(jobExecuteResult);
// 7. Get the list of result sets (if the user submits multiple SQL at a time, multiple result sets will be generated)
String resultSet = jobInfo.getResultSetList(client)[0];
// 8. Get a specific result set through a result set information
Object fileContents = client.resultSet(ResultSetAction.builder().setPath(resultSet).setUser(jobExecuteResult.getUser()).build()).getFileContent();
System.out.println("fileContents: "+ fileContents);
IOUtils.closeQuietly(client);
}
}
-SCALA
import java.util.concurrent.TimeUnit
import com.webank.wedatasphere.linkis.common.utils.Utils
import com.webank.wedatasphere.linkis.httpclient.dws.authentication.StaticAuthenticationStrategy
import com.webank.wedatasphere.linkis.httpclient.dws.config.DWSClientConfigBuilder
import com.webank.wedatasphere.linkis.ujes.client.UJESClient
import com.webank.wedatasphere.linkis.ujes.client.request.JobExecuteAction.EngineType
import com.webank.wedatasphere.linkis.ujes.client.request.{JobExecuteAction, ResultSetAction}
import org.apache.commons.io.IOUtils
object UJESClientImplTest extends App {
// 1. Configure DWSClientBuilder, get a DWSClientConfig through DWSClientBuilder
val clientConfig = DWSClientConfigBuilder.newBuilder()
.addUJESServerUrl("http://${ip}:${port}") //Specify ServerUrl, the address of the Linkis server-side gateway, such as http://{ip}:{port}
.connectionTimeout(30000) //connectionTimeOut client connection timeout
.discoveryEnabled(true).discoveryFrequency(1, TimeUnit.MINUTES) //Whether to enable registration discovery, if enabled, the newly launched Gateway will be automatically discovered
.loadbalancerEnabled(true) // Whether to enable load balancing, if registration discovery is not enabled, load balancing is meaningless
.maxConnectionSize(5) //Specify the maximum number of connections, that is, the maximum number of concurrent
.retryEnabled(false).readTimeout(30000) //retryEnabled: whether to retry on failure; readTimeout: client read timeout
.setAuthenticationStrategy(new StaticAuthenticationStrategy()) //AuthenticationStrategy Linkis authentication method
.setAuthTokenKey("${username}").setAuthTokenValue("${password}") //Authentication key, generally the user name; authentication value, generally the password corresponding to the user name
.setDWSVersion("v1").build() //Linkis backend protocol version, the current version is v1
// 2. Get a UJESClient through DWSClientConfig
val client = UJESClient(clientConfig)
// 3. Start code execution
val jobExecuteResult = client.execute(JobExecuteAction.builder()
.setCreator("LinkisClient-Test") //creator, requesting the system name of the Linkis client, used for system-level isolation
.addExecuteCode("show tables") //ExecutionCode The code to be executed
.setEngineType(EngineType.SPARK) // The execution engine type of Linkis that you want to request, such as Spark hive, etc.
.setUser("${username}").build()) //User, request user; used for user-level multi-tenant isolation
println("execId: "+ jobExecuteResult.getExecID + ", taskId:" + jobExecuteResult.taskID)
// 4. Get the execution status of the script
var status = client.status(jobExecuteResult)
while(!status.isCompleted) {
// 5. Get the execution progress of the script
val progress = client.progress(jobExecuteResult)
val progressInfo = if(progress.getProgressInfo != null) progress.getProgressInfo.toList else List.empty
println("progress: "+ progress.getProgress + ", progressInfo:" + progressInfo)
Utils.sleepQuietly(500)
status = client.status(jobExecuteResult)
}
// 6. Get the job information of the script
val jobInfo = client.getJobInfo(jobExecuteResult)
// 7. Get the list of result sets (if the user submits multiple SQL at a time, multiple result sets will be generated)
val resultSet = jobInfo.getResultSetList(client).head
// 8. Get a specific result set through a result set information
val fileContents = client.resultSet(ResultSetAction.builder().setPath(resultSet).setUser(jobExecuteResult.getUser).build()).getFileContent
println("fileContents: "+ fileContents)
IOUtils.closeQuietly(client)
}