8.1 Overview
This intermezzo chapter discusses several approaches to speeding up tasks that require commands and pipelines to be run many times. Its main goal is to demonstrate the flexibility and power of a tool called GNU Parallel. Because this tool can be combined with any other tool discussed in this book, it can substantially change the way you use the command line for data science. In this chapter, you’ll learn about:
- Running commands in serial over a range of numbers, lines, and files.
- Breaking a large task into several smaller tasks.
- Running pipelines in parallel using GNU Parallel.
- Distributing pipelines on multiple machines.
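
To make the contrast between the serial and parallel approaches concrete, here is a minimal sketch. The function names `run_serial` and `run_lines` are hypothetical and used only for illustration, and the last step is skipped unless GNU Parallel is actually installed.

```shell
# Serial: run a command once for each number in a range.
# (run_serial is a hypothetical helper name.)
run_serial() {
  for i in $(seq "$1" "$2"); do
    echo "number $i"
  done
}

# Serial: run a command once for each line on standard input.
# (run_lines is a hypothetical helper name.)
run_lines() {
  while read -r line; do
    echo "line $line"
  done
}

run_serial 1 3
printf 'alpha\nbeta\n' | run_lines

# Parallel: the same range handed to GNU Parallel, one job per input line.
# --keep-order prints the results in input order. The check below guards
# against systems where GNU Parallel is missing or where a different
# program named parallel is installed.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  seq 1 3 | parallel --keep-order echo "number {}"
fi
```

The serial loops process one item at a time, whereas GNU Parallel can run several jobs at once, which is where the speed-up comes from when each job takes a noticeable amount of time.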
Copyright of this content belongs to Jeroen Janssens or his affiliates.