8.5 Discussion
As data scientists, we work with data, and sometimes a lot of data. This means that we may need to run a command many times or distribute data-intensive commands over multiple cores. In this chapter we have shown you how easy it is to parallelize commands. GNU Parallel is a powerful and flexible tool for speeding up ordinary command-line tools by distributing them over multiple cores and remote machines. It offers a lot of functionality, and in this chapter we’ve only been able to scratch the surface. Some features of GNU Parallel that we haven’t covered are:
- Different ways of specifying input.
- Keeping a log of all the jobs.
- Only starting new jobs when the machine’s load is below a given threshold.
- Timing out, resuming, and retrying jobs.
Once you have a basic understanding of GNU Parallel and its most important options, we recommend that you take a look at its tutorial listed in the Further Reading section.
The copyright of this content belongs to Jeroen Janssens or affiliated parties.