Production Deployment Reference Guide
1 Introduction
Linkis has been running stably on the WeBank big data production platform for more than two years. The development and operations staff have summarized a set of Linkis production deployment guidelines so that Linkis delivers its maximum performance on the basis of stable operation, while also saving server resources and reducing usage costs. The guide covers two major topics: deployment method selection and parameter configuration. Finally, Linkis has also been tested in the test environment for a long time. We will give our stress test practice and experience in Chapter 4.
2 Deployment plan selection
Linkis's stand-alone deployment is simple, but it cannot be used in a production environment, because too many processes on the same server put the server under too much pressure.
The choice of deployment plan depends on the company's user scale, user habits, and the number of simultaneous users of the cluster. Generally speaking, we choose the deployment method based on the number of people using Linkis at the same time and on the users' preference for execution engines.
The following sections describe the deployment plan by number of simultaneous users. They assume that users prefer Spark the most and Hive second, and it is recommended that each server host has 64G of memory or more.
**On the machine where EngineManager is installed, the memory load will be relatively high because the users' engine processes are started there. Other types of microservices place a relatively low load on their machines.**
**We generally recommend reserving about 20G on the server where EM is installed for the Linux system, EM's own process and other processes. For example, on a server with 128G of memory, after setting aside 20G there is still about 100G of memory that can be used to start engine processes. If a Spark Driver uses 4G of memory, the server can start up to 25 Spark engines.**
The formula for calculating the total resources used: Total resources used by Linkis = total memory + total number of cores = number of people online at the same time * (Driver or Hive client memory) + number of people online at the same time * (Driver or Hive client cores)
For example, if there are 50 people using at the same time, Spark’s Driver memory is 2G, Hive Client memory is 2G, and each engine uses two cores, then it is 50 * 2G + 50 * 2 cores = 100G memory + 100 CPU cores
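The sizing rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic in this guide; the function names are hypothetical and not part of Linkis.

```python
def max_engines_per_server(usable_memory_gb, engine_memory_gb):
    """Number of engines that fit into the memory left after the reservation."""
    return usable_memory_gb // engine_memory_gb


def total_resources(concurrent_users, engine_memory_gb, engine_cores):
    """Apply the formula: users * engine memory, and users * engine cores."""
    total_memory_gb = concurrent_users * engine_memory_gb
    total_cores = concurrent_users * engine_cores
    return total_memory_gb, total_cores


# About 100G usable on a 128G server, 4G per Spark Driver.
engines = max_engines_per_server(100, 4)

# 50 simultaneous users, 2G and 2 cores per engine.
memory_gb, cores = total_resources(50, 2, 2)
print(engines, memory_gb, cores)  # prints: 25 100 100
```

The sketch assumes one engine per online user, which is the same simplification the formula makes.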
Conventions to know before configuring parameters (must read):
1. Parameters are generally configured in linkis.properties in the conf directory under the microservice installation directory, in the form key=value, such as wds.linkis.enginemanager.cores.max=20. The only exception is the engine microservices, whose parameters are configured in linkis-engine.properties.
2. After a parameter is changed, the microservice must be restarted for the change to take effect. After an engine parameter is changed, kill the engine in the engine manager on the page and then start the engine again for the change to take effect.
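Before restarting a microservice, it can help to confirm which value a properties file actually sets. The following is a minimal sketch of reading key=value lines. The parser is deliberately simplified (it does not handle escapes or line continuations of the full Java properties format), and the sample text stands in for a hypothetical linkis.properties file.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


# Hypothetical content of conf/linkis.properties for an EngineManager.
SAMPLE = """
# maximum cores for all engines started by this EM
wds.linkis.enginemanager.cores.max=20
"""

settings = parse_properties(SAMPLE)
print(settings["wds.linkis.enginemanager.cores.max"])  # prints: 20
```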
A reference deployment plan is provided below.
2.1 Number of simultaneous users 10-50
1). Recommended server configuration: 4 servers, named S1, S2, S3, S4
Service Name | Deployment Selection | Description |
---|---|---|
SparkEngineManager | S1 | SparkEM requires an exclusive server, because it is assumed that users prefer Spark the most (adjust this if Hive is preferred) |
SparkEntrance | S2 | |
HiveEngineManager | S3 | |
HiveEntrance | S2 | |
PythonEngineManager | S3 | |
PythonEntrance | S2 | |
Others (Eureka, gateway, etc.) | S4 | If this machine is under too much pressure, you can add another server to deploy services separately |
2). Minimum server configuration: 2 servers
3). Parameter configuration
Parameters are configured in linkis.properties and linkis-engine.properties in the conf directory under the microservice installation directory. They are generally divided into two types: Entrance parameters and EngineManager parameters.
a) Entrance microservice
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.rpc.receiver.asyn.queue.size.max | Specify the queue size of RPC messages received by the entrance microservice | 2000 |
wds.linkis.rpc.receiver.asyn.consumer.thread.max | Specify Entrance microservice RPC consumption thread pool size | 100 |
b) EngineManager microservice
Note: Linkis defines the concept of protected resources. Their purpose is to reserve a certain amount of resources so that EM never uses up the configured maximum, which protects the machine.
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.enginemanager.memory.max | Used to specify the total memory of all engines started by the EM process | 40G (64G server) or 100G (128G server) |
wds.linkis.enginemanager.cores.max | Used to specify the total number of cores of all engines started by the EM process | 20 |
wds.linkis.enginemanager.engine.instances.max | Used to specify the total number of all engines started by the EM process | 20 |
wds.linkis.enginemanager.protected.memory | Used to specify the memory used by the EM process for protection | 2G (meaning that up to 38 (40-2) G of memory can be used) |
wds.linkis.enginemanager.protected.cores.max | Used to specify the number of cores used for protection by the EM process | 2 (meaning that up to 18 (20-2) cores can be used) |
wds.linkis.enginemanager.protected.engine.instances | Used to specify the number of engines used for protection by the EM process | 1 (meaning that up to 19 (20-1) engines can be started) |
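The relationship between the maximum and protected values in the table can be checked with a short calculation. This is a sketch of the subtraction described in the table only; the dictionary keys are illustrative names, not Linkis parameters.

```python
def usable_resources(maximum, protected):
    """Subtract the protected amount from each configured maximum."""
    return {name: maximum[name] - protected[name] for name in maximum}


# Suggested values for a 64G server, taken from the table above.
maximum = {"memory_gb": 40, "cores": 20, "instances": 20}
protected = {"memory_gb": 2, "cores": 2, "instances": 1}

usable = usable_resources(maximum, protected)
print(usable)  # prints: {'memory_gb': 38, 'cores': 18, 'instances': 19}
```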
2.2 Number of simultaneous users 50-100
1). Recommended server configuration: 7 servers, named S1, S2…S7
Service Name | Deployment Selection | Description |
---|---|---|
SparkEngineManager | S1, S2 | |
SparkEntrance | S5 | |
HiveEngineManager | S3, S4 | |
HiveEntrance | S5 | |
PythonEngineManager | S4 | |
PythonEntrance | S4 | |
Eureka, Gateway, RM | S6 | Eureka and RM require high availability deployment |
PublicService, RM, Datasource, Eureka | S7 | Eureka and RM require high availability deployment |
2). Minimum server configuration: 4 servers
3). Parameter configuration
a) Entrance microservice
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.rpc.receiver.asyn.queue.size.max | Specify the queue size of RPC messages received by the entrance microservice | 3000 |
wds.linkis.rpc.receiver.asyn.consumer.thread.max | Specify Entrance microservice RPC consumption thread pool size | 120 |
b) EngineManager microservice
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.enginemanager.memory.max | Used to specify the total memory of all engines started by the EM process | 40G (64G server) or 100G (128G server) |
wds.linkis.enginemanager.cores.max | Used to specify the total number of cores of all engines started by the EM process | 20 |
wds.linkis.enginemanager.engine.instances.max | Used to specify the total number of all engines started by the EM process | 20 |
wds.linkis.enginemanager.protected.memory | Used to specify the memory used by the EM process for protection | 2G (meaning that up to 38 (40-2) G of memory can be used) |
wds.linkis.enginemanager.protected.cores.max | Used to specify the number of cores used for protection by the EM process | 2 (meaning that up to 18 (20-2) cores can be used) |
wds.linkis.enginemanager.protected.engine.instances | Used to specify the number of engines used for protection by the EM process | 1 (meaning that up to 19 (20-1) engines can be started) |
2.3 Number of simultaneous users 100-300
1). Recommended server configuration: 11 servers, named S1, S2…S11
Service Name | Deployment Selection | Description |
---|---|---|
SparkEngineManager | S1, S2, S3, S4 | |
SparkEntrance | S8 | |
HiveEngineManager | S5, S6, S7 | |
HiveEntrance | S8 | |
PythonEngineManager | S9 | |
PythonEntrance | S9 | |
Eureka, Gateway, RM | S10 | Eureka and RM require high availability deployment |
PublicService, RM, Datasource, Eureka | S11 | Eureka and RM require high availability deployment |
2). Minimum server configuration: 6 servers
3). Parameter configuration
a) Entrance microservice
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.rpc.receiver.asyn.queue.size.max | Specify the queue size of RPC messages received by the entrance microservice | 4000 |
wds.linkis.rpc.receiver.asyn.consumer.thread.max | Specify Entrance microservice RPC consumption thread pool size | 150 |
b) EngineManager microservice
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.enginemanager.memory.max | Used to specify the total memory of all engines started by the EM process | 40G (64G server) or 100G (128G server) |
wds.linkis.enginemanager.cores.max | Used to specify the total number of cores of all engines started by the EM process | 20 |
wds.linkis.enginemanager.engine.instances.max | Used to specify the total number of all engines started by the EM process | 20 |
wds.linkis.enginemanager.protected.memory | Used to specify the memory used by the EM process for protection | 2G (meaning that up to 38 (40-2) G of memory can be used) |
wds.linkis.enginemanager.protected.cores.max | Used to specify the number of cores used for protection by the EM process | 2 (meaning that up to 18 (20-2) cores can be used) |
wds.linkis.enginemanager.protected.engine.instances | Used to specify the number of engines used for protection by the EM process | 1 (meaning that up to 19 (20-1) engines can be started) |
2.4 Number of simultaneous users 300-500
1). Recommended server configuration: 15 servers, named S1, S2…S15
Service Name | Deployment Selection | Description |
---|---|---|
SparkEngineManager | S1, S2, S3, S4, S5, S6, S7 | |
SparkEntrance | S12 | |
HiveEngineManager | S8, S9, S10, S11 | |
HiveEntrance | S12 | |
PythonEngineManager | S13 | |
PythonEntrance | S13 | |
Eureka, Gateway, RM | S14 | Eureka and RM require high availability deployment |
PublicService, RM, Datasource, Eureka | S15 | Eureka and RM require high availability deployment |
2). Minimum server configuration: 10 servers
3). Parameter configuration
a) Entrance microservice
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.rpc.receiver.asyn.queue.size.max | Specify the queue size of RPC messages received by the entrance microservice | 5000 |
wds.linkis.rpc.receiver.asyn.consumer.thread.max | Specify Entrance microservice RPC consumption thread pool size | 150 |
b) EngineManager microservice
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.enginemanager.memory.max | Used to specify the total memory of all engines started by the EM process | 40G (64G server) or 100G (128G server) |
wds.linkis.enginemanager.cores.max | Used to specify the total number of cores of all engines started by the EM process | 20 |
wds.linkis.enginemanager.engine.instances.max | Used to specify the total number of all engines started by the EM process | 20 |
wds.linkis.enginemanager.protected.memory | Used to specify the memory used by the EM process for protection | 2G (meaning that up to 38 (40-2) G of memory can be used) |
wds.linkis.enginemanager.protected.cores.max | Used to specify the number of cores used for protection by the EM process | 2 (meaning that up to 18 (20-2) cores can be used) |
wds.linkis.enginemanager.protected.engine.instances | Used to specify the number of engines used for protection by the EM process | 1 (meaning that up to 19 (20-1) engines can be started) |
2.5 Number of simultaneous users more than 500
1). Recommended server configuration: 25 servers, named S1, S2…S25
Service Name | Deployment Selection | Description |
---|---|---|
SparkEngineManager | S1, S2, S3, S4, S5, S6, S7, S8, S9, S10 | |
SparkEntrance | S17 | |
HiveEngineManager | S11, S12, S13, S14, S15, S16 | |
HiveEntrance | S17 | |
PythonEngineManager | S18, S19 | |
PythonEntrance | S20 | |
Eureka, RM | S21 | Eureka and RM require high availability deployment |
RM, Eureka | S22 | Eureka and RM require high availability deployment |
Eureka, PublicService | S23 | Eureka and RM require high availability deployment |
Gateway, Datasource | S24 | |
2). Minimum server configuration: 15 servers
3). Parameter configuration
a) Entrance microservice
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.rpc.receiver.asyn.queue.size.max | Specify the queue size of RPC messages received by the entrance microservice | 5000 |
wds.linkis.rpc.receiver.asyn.consumer.thread.max | Specify Entrance microservice RPC consumption thread pool size | 200 |
b) EngineManager microservice
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.enginemanager.memory.max | Used to specify the total memory of all engines started by the EM process | 40G (64G server) or 100G (128G server) |
wds.linkis.enginemanager.cores.max | Used to specify the total number of cores of all engines started by the EM process | 20 |
wds.linkis.enginemanager.engine.instances.max | Used to specify the total number of all engines started by the EM process | 20 |
wds.linkis.enginemanager.protected.memory | Used to specify the memory used by the EM process for protection | 2G (meaning that up to 38 (40-2) G of memory can be used) |
wds.linkis.enginemanager.protected.cores.max | Used to specify the number of cores used for protection by the EM process | 2 (meaning that up to 18 (20-2) cores can be used) |
wds.linkis.enginemanager.protected.engine.instances | Used to specify the number of engines used for protection by the EM process | 1 (meaning that up to 19 (20-1) engines can be started) |
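The server counts and Entrance values from sections 2.1 to 2.5 can be collected into one lookup, which makes the differences between the tiers easier to see. This is a sketch only: the names are hypothetical, and because the user ranges in the headings overlap at their boundaries, it assumes that a boundary value (for example exactly 50 users) belongs to the smaller tier.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    max_users: float           # inclusive upper bound (assumption, see above)
    recommended_servers: int
    minimum_servers: int
    rpc_queue_size: int        # wds.linkis.rpc.receiver.asyn.queue.size.max
    rpc_consumer_threads: int  # wds.linkis.rpc.receiver.asyn.consumer.thread.max


# Values copied from the tables in sections 2.1 to 2.5.
TIERS = [
    Tier(50, 4, 2, 2000, 100),
    Tier(100, 7, 4, 3000, 120),
    Tier(300, 11, 6, 4000, 150),
    Tier(500, 15, 10, 5000, 150),
    Tier(float("inf"), 25, 15, 5000, 200),
]


def pick_tier(concurrent_users):
    """Return the first tier whose upper bound covers the user count."""
    for tier in TIERS:
        if concurrent_users <= tier.max_users:
            return tier
    raise ValueError("concurrent_users must be a number")


tier = pick_tier(80)
print(tier.recommended_servers, tier.rpc_queue_size, tier.rpc_consumer_threads)
# prints: 7 3000 120
```

The EngineManager values are not part of the lookup because the guide suggests the same values for every tier.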
3 Other general parameter configuration
In addition to the two types of microservices, Entrance and EngineManager, Linkis has other microservices that also have their own parameters for configuration.
3.1 PublicService custom configuration
The PublicService microservice provides various auxiliary functions of Linkis, including file editing and saving, and result set reading.
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.workspace.filesystem.get.timeout | Used to specify the timeout for obtaining the file system | 10000 (unit is ms) |
wds.linkis.workspace.resultset.download.maxsize | Used to specify the maximum number of rows of a downloaded result set | 5000 (download at most 5000 rows) or -1 (full download) |
3.2 Engine Microservice
Engine microservices, including the Spark, Hive and Python engines, are available at any time. Their configuration parameters are modified in linkis-engine.properties in the conf directory under the EngineManager installation directory.
Parameter name | Parameter function | Suggested parameter value |
---|---|---|
wds.linkis.engine.max.free.time | Used to specify how long an engine may stay idle before it is killed | 3h (meaning that an engine is automatically killed after three hours without performing a task) |
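The effect of the idle limit can be illustrated with a small check. This is not how Linkis implements the timeout; it is a sketch of the rule in the table, and both the function names and the accepted unit suffixes (s, m, h) are assumptions made for the example.

```python
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(value):
    """Convert a value such as '3h' into seconds (assumed suffixes: s, m, h)."""
    return int(value[:-1]) * UNIT_SECONDS[value[-1]]


def should_kill(idle_seconds, max_free_time="3h"):
    """True once the engine has been idle for at least the configured time."""
    return idle_seconds >= parse_duration(max_free_time)


print(should_kill(2 * 3600))  # prints: False
print(should_kill(3 * 3600))  # prints: True
```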
4 Summary
The deployment plan of Linkis is closely related to how it is used, and the number of simultaneous users is the biggest influencing factor. To let users work comfortably while reducing the cost of cluster servers, operations and development staff need to experiment and listen to user feedback. If the deployed plan turns out to be inappropriate, it needs to be changed in a timely and appropriate manner.