Process

  • [Doc] Process
  • [Doc] Child Processes
  • [Doc] Cluster
  • [Basic] IPC
  • [Basic] Daemon

Introduction

In this chapter, "process" refers to two things: ① the operating-system process, and ② the process object in Node.js. Operating-system processes are as fundamental to server-side work as HTML is to the front end, and most server-side programming involves Unix/Linux sooner or later. Run the ps -ef command on a Linux/Unix/Mac system to list the processes currently running.
The columns of its output are as follows:

Column name   Meaning
UID           User ID of the process’s owner
PID           Process ID number
PPID          ID number of the process’s parent process
C             CPU usage
STIME         Time when the process started
TTY           Terminal associated with the process
TIME          Total CPU time used by the process since it started
CMD           Command and arguments for the process

For more details about processes and the operating system, see APUE (Advanced Programming in the UNIX® Environment).

Process

This section covers the process object in Node.js. You can print it with console.log(process) and see that it exposes many useful properties and methods. The official documentation describes them in detail, including but not limited to:

  • The basic information of the process
  • The usage of the process
  • Process Events
  • Dependencies/versions
  • The basic information of the operating system platform
  • The information of the user
  • Signal Events
  • The three standard streams
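
Several of these categories can be read directly off the process object. The snippet below is a minimal sketch; the helper name processSnapshot and the field names it returns are illustrative choices, not part of Node.js.

```javascript
// Collect a few basic facts about the current process.
// Every property and method used here is a documented member of `process`.
function processSnapshot() {
  return {
    pid: process.pid,                    // this process's ID
    ppid: process.ppid,                  // the parent's process ID
    platform: process.platform,          // e.g. 'linux', 'darwin', 'win32'
    nodeVersion: process.versions.node,  // version of the Node.js runtime
    cwd: process.cwd(),                  // current working directory
    rssBytes: process.memoryUsage().rss, // resident set size in bytes
    uptimeSeconds: process.uptime(),     // seconds since the process started
  };
}

console.log(processSnapshot());
```

The values change from run to run, so the output is not shown here.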

process.nextTick

The previous chapter has already mentioned process.nextTick, which is an important, basic method you have to know.

     ┌───────────────────────┐
  ┌─>│        timers         │
  │  └──────────┬────────────┘
  │  ┌──────────┴────────────┐
  │  │     I/O callbacks     │
  │  └──────────┬────────────┘
  │  ┌──────────┴────────────┐
  │  │     idle, prepare     │
  │  └──────────┬────────────┘      ┌───────────────┐
  │  ┌──────────┴────────────┐      │   incoming:   │
  │  │         poll          │<─────┤  connections, │
  │  └──────────┬────────────┘      │   data, etc.  │
  │  ┌──────────┴────────────┐      └───────────────┘
  │  │        check          │
  │  └──────────┬────────────┘
  │  ┌──────────┴────────────┐
  └──┤    close callbacks    │
     └───────────────────────┘

process.nextTick is not technically part of the event loop. Instead, the nextTickQueue is processed after the current operation completes, regardless of the current phase of the event loop. This raises a question: what happens if you call process.nextTick recursively?

  function test() {
    process.nextTick(() => test());
  }

What is the difference between this situation and the following? Why?

  function test() {
    setTimeout(() => test(), 0);
  }
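
One way to explore the question is to make the recursion finite and record the order in which callbacks run. The sketch below is only an illustration: compareScheduling and its depth parameter are hypothetical names, and the recursion is bounded so that the program terminates.

```javascript
// Resolves with the order in which the callbacks actually ran.
function compareScheduling(depth) {
  return new Promise((resolve) => {
    const order = [];

    // A timer that is due immediately still has to wait for the timers phase.
    setTimeout(() => {
      order.push('timeout');
      resolve(order);
    }, 0);

    // Each callback re-queues itself with process.nextTick, `depth` times in total.
    // The nextTickQueue is drained completely before the event loop moves on.
    function tick(remaining) {
      order.push('tick');
      if (remaining > 1) {
        process.nextTick(() => tick(remaining - 1));
      }
    }
    process.nextTick(() => tick(depth));
  });
}

compareScheduling(3).then((order) => console.log(order.join(' -> ')));
// prints: tick -> tick -> tick -> timeout
```

The timer callback only runs after the whole chain of nextTick callbacks has finished. With unbounded recursion the chain never finishes, whereas a setTimeout chain returns to the event loop between iterations.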

Configuration

Configuration is a common concern in development and deployment. There are usually two approaches: define a configuration file, or use environment variables.

You can supply configuration by setting environment variables and reading them through process.env, or by reading a configuration file. There are many good libraries for this, such as dotenv and node-config. When such a library loads a configuration file, however, a common pitfall is the current working directory.

What’s the current working directory of the process? What’s it for?

You can get the current working directory with process.cwd(). It is usually the directory from which the command was launched, and it can also be specified at startup. File operations that take a relative path resolve it against the current working directory.

Some third-party configuration modules look for the configuration file relative to the current working directory, so running the start script from the wrong directory gives wrong results. You can change the working directory in your code with process.chdir().
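
The two configuration sources and the working-directory pitfall can be sketched as follows. The helper names readPort and configPath, the PORT variable, and the file name config.json are assumptions made for illustration only.

```javascript
const path = require('path');

// Read a numeric setting from an environment object, with a fallback.
function readPort(env, fallback) {
  const parsed = Number.parseInt(env.PORT, 10);
  return Number.isNaN(parsed) ? fallback : parsed;
}

// Resolve a config file against an explicit base directory.
// Because baseDir is absolute, the result does not depend on process.cwd().
function configPath(baseDir, fileName) {
  return path.resolve(baseDir, fileName);
}

console.log('port:', readPort(process.env, 3000));

// A bare relative path is resolved against the current working directory,
// so this value changes with the directory the process was started from.
console.log('relative to cwd:', path.resolve('config.json'));
```

In a CommonJS module, passing __dirname as baseDir ties the lookup to the location of the source file rather than to wherever the start script happened to be run.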

Standard Stream

The process object also exposes three standard streams: process.stderr, process.stdout, and process.stdin. If you have used C/C++/Java, these will be familiar. Common interview questions follow: Is console.log synchronous or asynchronous? How would you implement console.log?
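
As a starting point for the second question, console.log can be approximated as "format the arguments, then write one line to a stream". The sketch below assumes that simplification; createLog is a hypothetical helper, not a Node.js API. Whether the underlying write is synchronous depends on what the stream is connected to and on the platform.

```javascript
const util = require('util');

// Build a console.log-like function on top of any object with a write() method.
function createLog(stream) {
  return function log(...args) {
    // util.format handles printf-style placeholders (%s, %d, ...)
    // and joins any remaining arguments with spaces.
    stream.write(util.format(...args) + '\n');
  };
}

const log = createLog(process.stdout);
log('pid %d on %s', process.pid, process.platform);
```

Taking the stream as a parameter keeps the sketch easy to try against process.stderr or a file stream as well.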

If your resume mentions C/C++, you may be asked how to implement synchronous input (similar to scanf in C, cin in C++, or raw_input in Python).

Maintaining

You should be familiar with basic process-related commands such as top, ps, and pstree.

Child Process

Child processes are an important part of working with processes. In Node.js, the child_process module lets you run executables and shell commands, including programs written in other languages. You can also run JavaScript code as a child process with it. NetEase’s well-known distributed framework pomelo builds its multi-process architecture on this module (not on cluster).

What are the differences between child_process.fork and fork in POSIX?

Although the names match, child_process.fork() does not clone the current process the way POSIX fork(2) does. It spawns a new Node.js process (on Unix, libuv creates it with fork(2) followed by exec) and opens an IPC channel to it. With POSIX fork you must manage the release of resources in the child process yourself. With child_process.fork you do not need to, because Node.js releases them automatically and provides an option that controls whether the child process can survive after the parent process exits.

  • spawn() - Spawns a child process to execute the command
    • options.detached - Whether the child process can survive after the parent process is destroyed
    • options.stdio - Configure the three pipes that are established between the parent and child process
  • spawnSync() - A synchronous version of spawn(). You can set a timeout; the returned object describes the finished child (pid, exit status, stdout, stderr)
  • exec() - Spawns a shell to execute the command and passes the child's stdout and stderr to a callback. You can set a timeout for the process
  • execSync() - A synchronous version of exec(). You can set a timeout. It returns the stdout of the child process
  • execFile() - Spawns a child process to execute an executable file. You can set a timeout for the process
  • execFileSync() - A synchronous version of execFile(). It returns the stdout of the child process, and throws an Error if the process times out or exits with a non-zero code
  • fork() - Enhanced version of spawn(). It returns a child process object and allows sending messages between parent and child.

The exec/execSync methods run the command through a shell (/bin/sh by default on Unix, not necessarily bash). If any part of the command comes from external input, you need to guard against command injection.
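
One common mitigation is to avoid the shell altogether and pass external input as a separate argument with execFile/execFileSync. The sketch below assumes that approach; echoViaChild is a hypothetical helper, and it launches a second Node.js process only so that the example stays portable.

```javascript
const { execFileSync } = require('child_process');

// Echo untrusted text back through a child process without involving a shell.
// The input is one element of the argument array, so characters such as
// ';', '&&' or '$(...)' are never interpreted as shell syntax.
function echoViaChild(untrusted) {
  const script = 'process.stdout.write(process.argv[process.argv.length - 1])';
  return execFileSync(process.execPath, ['-e', script, '--', untrusted], {
    encoding: 'utf8',
  });
}

console.log(echoViaChild('hello; echo injected'));
// prints: hello; echo injected

// By contrast, exec('echo ' + untrusted) would hand the whole string to a
// shell, which would treat "; echo injected" as a second command.
```

The same input is returned verbatim because no shell ever parses it.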

child.kill and child.send

A common interview question is: what are the differences between child.kill and child.send? One is based on the signal system; the other is based on IPC.
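
The difference can be seen in a small experiment. The sketch below is an illustration under a few assumptions: the child's source is kept inline and started with spawn plus an 'ipc' stdio entry so that the example fits in one file (a real project would more likely call child_process.fork on a worker module), and sendThenKill is a hypothetical name.

```javascript
const { spawn } = require('child_process');

// Child program: reply to every IPC message it receives.
const childSource = `
  process.on('message', (msg) => process.send({ echo: msg }));
`;

function sendThenKill() {
  return new Promise((resolve, reject) => {
    // The 'ipc' entry opens the message channel that send() relies on.
    const child = spawn(process.execPath, ['-e', childSource], {
      stdio: ['ignore', 'ignore', 'inherit', 'ipc'],
    });
    let reply = null;

    child.on('error', reject);
    child.on('message', (msg) => {
      reply = msg;
      // kill() delivers a signal: it carries no payload and does not
      // need an IPC channel.
      child.kill('SIGTERM');
    });
    child.on('exit', (code, signal) => resolve({ reply, code, signal }));

    // send() delivers structured data over IPC as a 'message' event.
    child.send('ping');
  });
}

sendThenKill().then(({ reply, signal }) => console.log(reply, signal));
```

On a POSIX system the exit event should report the signal name SIGTERM, while the reply arrives as an ordinary JavaScript object.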

Does the death of parent process or child process affect each other? What is an orphan process?

The death of a child process does not terminate its parent. When a child dies (that is, when the last thread of its thread group exits, usually the “lead” thread), the kernel sends SIGCHLD to the parent process. When the parent dies first, the child is not killed automatically: a child that is still running is adopted by process 1 (the init system process) and becomes an orphan process. A child can still be terminated around the same time for other reasons, for example when the controlling terminal closes and its processes receive SIGHUP. Finally, if a child has terminated but the parent has not yet called wait() or waitpid() to collect its exit information, its PCB remains in the process table. Such a child is called a zombie process.

Cluster

Cluster is the common way to make use of multi-core systems in Node.js. It is built on child_process.fork(), so worker processes communicate with the parent via IPC and do not copy the parent’s memory space. Checking cluster.isMaster (named cluster.isPrimary since Node.js 16) lets the same script tell the parent process from the child processes, which makes the code look similar to fork in POSIX.

  // Executed by both the parent and the child processes
  const cluster = require('cluster');
  const http = require('http');
  const numCPUs = require('os').cpus().length;

  if (cluster.isMaster) {
    // Executed only by the parent process (think of it as a.js)
    // Fork workers.
    for (let i = 0; i < numCPUs; i++) {
      cluster.fork();
    }
    cluster.on('exit', (worker) => {
      console.log(`${worker.process.pid} died`);
    });
  } else {
    // Executed only by the child processes (think of it as b.js)
    // Workers can share any TCP connection
    // In this case it is an HTTP server
    http.createServer((req, res) => {
      res.writeHead(200);
      res.end('hello world\n');
    }).listen(8000);
  }

  // Executed by both the parent and the child processes
  console.log('hello');
In the code above, numCPUs is a global variable. However, modifying it in the parent process does not change it in a child process, because the parent and the child run in separate memory spaces. "Shared" here only means that both processes execute the same code, each in its own memory space.

You can think of the parent's execution as a.js and the child's execution as b.js: node a.js runs first, and then each cluster.fork() call runs node b.js. The cluster module is the bridge between them, and the two sides communicate through the methods it provides.

How It Works

The worker processes are spawned using the child_process.fork() method, so that they can communicate with the parent via IPC and pass server handles back and forth.

The cluster module supports two methods of distributing incoming connections.

The first one (and the default one on all platforms except Windows), is the round-robin approach, where the master process listens on a port, accepts new connections and distributes them across the workers in a round-robin fashion, with some built-in smarts to avoid overloading a worker process.

The second approach is where the master process creates the listen socket and sends it to interested workers. The workers then accept incoming connections directly.

The second approach should, in theory, give the best performance. In practice however, distribution tends to be very unbalanced due to operating system scheduler vagaries. Loads have been observed where over 70% of all connections ended up in just two processes, out of a total of eight.
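
The policy can be selected in code through cluster.schedulingPolicy, or from outside through the NODE_CLUSTER_SCHED_POLICY environment variable (values rr and none). A minimal sketch, assuming round-robin is the desired policy:

```javascript
const cluster = require('cluster');

// The setting is global and must be chosen before the first worker is forked.
cluster.schedulingPolicy = cluster.SCHED_RR; // the parent accepts and hands out connections
// cluster.schedulingPolicy = cluster.SCHED_NONE; // leave distribution to the operating system

console.log(cluster.schedulingPolicy === cluster.SCHED_RR); // prints: true
```

Setting it explicitly is mostly useful when the same code has to behave consistently across platforms with different defaults.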

IPC(Inter-process communication)

Inter-process communication techniques can be divided into various types. These are:

Type                   Connectionless   Reliable   Flow Control   Priority
Pipe                   N                Y          Y              N
Named Pipe             N                Y          Y              N
Message Queues         N                Y          Y              N
Semaphores             N                Y          Y              Y
Shared Memory          N                Y          Y              Y
UNIX Stream Socket     N                Y          Y              N
UNIX Datagram Socket   Y                Y          N              N

IPC in Node.js is implemented through pipes provided by libuv: Named Pipe (the second row in the table above) on Windows, and UDS (Unix Domain Socket) on *nix.

An ordinary socket is designed for network communication, and the network itself is unreliable. A socket used for IPC is different, because local communication is reliable by default. It can therefore skip much of the encoding/decoding and checksum calculation, which makes UDS communication more efficient.

If you understand IPC in Node.js, you may be asked an interesting question:

Before the IPC channel is set up, how do the parent process and the child process communicate with each other? If they cannot communicate, how is the IPC channel set up?

This question is simple once you think it through. When you create a child process with child_process, you can specify the env (environment variables) of the child process. When Node.js starts a child process, the parent sets up the IPC channel first and then passes the fd (file descriptor) of the channel to the child through the environment variable NODE_CHANNEL_FD. The child process then connects to the parent through that fd.

Finally, in interviews we generally do not ask about the IPC implementation directly. Instead we ask under what conditions you need IPC, and which business scenarios you have handled with it.

Daemon Process

The daemon process is a basic server-side concept. Many people only know that tools such as pm2 can start a process as a daemon, without knowing what a daemon is or why you would use one. Strong candidates should also know how a daemon is implemented.

An ordinary process is shut down when the user exits the terminal. A process started with & to run in the background is shut down when its session (session group) is released. A daemon process does not depend on the terminal (tty) and is not shut down when the user exits the terminal.

  // Daemon process implementation (written in C)
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>
  #include <sys/stat.h>

  void init_daemon(void)
  {
      pid_t pid;
      int i = 0;

      if ((pid = fork()) == -1) {
          printf("Fork error !\n");
          exit(1);
      }
      if (pid != 0) {
          exit(0); // the parent process exits
      }

      setsid(); // the child starts a new session and becomes the session leader and the process group leader

      if ((pid = fork()) == -1) {
          printf("Fork error !\n");
          exit(-1);
      }
      if (pid != 0) {
          exit(0); // end the first child; the second child is not a session leader,
                   // so it cannot re-acquire a controlling tty
      }

      chdir("/tmp"); // change the working directory
      umask(0);      // reset the file mode creation mask
      for (; i < getdtablesize(); ++i) {
          close(i);  // close every inherited file descriptor
      }
      return;
  }

Code for Daemon Process in Node.js
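
A rough Node.js counterpart can be built from child_process.spawn. This is only a sketch under stated assumptions: it relies on the detached option (which, on non-Windows platforms, makes the child the leader of a new session and process group) instead of the double fork in the C version, and startDetached is a hypothetical helper name.

```javascript
const { spawn } = require('child_process');
const os = require('os');

// Start a command in the background so that it can outlive this process.
function startDetached(command, args) {
  const child = spawn(command, args, {
    detached: true,   // new session and process group on non-Windows platforms
    stdio: 'ignore',  // do not keep the parent's stdin/stdout/stderr open
    cwd: os.tmpdir(), // counterpart of chdir("/tmp") in the C version
  });
  child.unref();      // let the parent exit without waiting for the child
  return child.pid;
}

// Demo: a short-lived background Node.js process.
const pid = startDetached(process.execPath, ['-e', 'setTimeout(() => {}, 500)']);
console.log(`started background process ${pid}`);
```

Tools such as pm2 add supervision, restarts, and log handling on top of this basic idea.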