Chapter 4 Creating Reusable Command-line Tools
Throughout the book, we use a lot of commands and pipelines that basically fit on one line. Let us call those one-liners. Being able to perform complex tasks with just a one-liner is what makes the command line powerful. It’s a very different experience from writing traditional programs.
Some tasks you perform only once, and some you perform more often. Some tasks are very specific and others can be generalized. If you foresee or notice that you need to repeat a certain one-liner on a regular basis, it is worthwhile to turn it into a command-line tool of its own. So, both one-liners and command-line tools have their uses. Recognizing the opportunity requires practice and skill. The advantage of a command-line tool is that you do not have to remember the entire one-liner, and that it improves readability when you include it in some other pipeline.
The benefit of working with a programming language, however, is that you have the code in a file. This means that you can easily reuse that code. If the code has parameters, it can even be applied to problems that follow a similar pattern.
Command-line tools have the best of both worlds: they can be used from the command line, accept parameters, and only have to be created once. In this chapter we’re going to get familiar with creating reusable command-line tools in two ways. First, we explain how to turn one-liners into reusable command-line tools. By adding parameters to our commands, we can add the same flexibility that a programming language offers. Subsequently, we demonstrate how to create reusable command-line tools from code you have written in a programming language. By following the UNIX philosophy, your code can be combined with other command-line tools, which may be written in an entirely different language. We will focus on three programming languages: Python, R, and Java.
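As a minimal sketch of the first approach, consider a one-liner that lists the most frequent words in a text read from standard input. The function name top_words and its single parameter are hypothetical, chosen only for illustration. The point is that the hard-coded number of words becomes an argument with a default value.

```shell
# The original one-liner, with the number of words (10) hard-coded:
#   tr '[:upper:]' '[:lower:]' | grep -oE '[a-z]+' | sort | uniq -c | sort -rn | head -n 10

# The same pipeline wrapped in a function that accepts a parameter.
# top_words is a hypothetical name, not a tool that ships with any system.
top_words() {
    # Use the first argument as the number of words; fall back to 10.
    num_words="${1:-10}"
    tr '[:upper:]' '[:lower:]' |   # convert the text to lowercase
        grep -oE '[a-z]+' |        # put every word on its own line
        sort |                     # group identical words together
        uniq -c |                  # count how often each word occurs
        sort -rn |                 # most frequent words first
        head -n "$num_words"       # keep only the requested number
}
```

Because the function reads from standard input and writes to standard output, it can be placed in a pipeline like any other tool, for example: cat book.txt | top_words 5 (where book.txt is a placeholder for any text file).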
We believe that creating reusable command-line tools makes you a more efficient and productive data scientist in the long run. You gradually build up your own data science toolbox from which you can draw existing tools and apply them to problems you have encountered before. Recognizing the opportunity to turn a one-liner or existing code into a command-line tool takes practice.
In order to turn a one-liner into a shell script, we need to use some shell scripting. We shall only demonstrate the usefulness of a small subset of concepts from shell scripting. This subset includes variables, conditionals, and loops. A complete course in shell scripting deserves a book of its own, and is therefore beyond the scope of this one. If you want to dive deeper into shell scripting, we recommend Classic Shell Scripting by Robbins and Beebe (2005).
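To give a first impression of these three concepts, here is a small sketch that uses a variable, a conditional, and a loop together. The function name count_lines is hypothetical and serves only as an illustration. It prints the number of lines in each file passed as an argument.

```shell
# count_lines is a hypothetical helper, written only to illustrate
# a conditional, a loop, and a variable in one place.
count_lines() {
    # Conditional: print a usage message and fail when no files are given.
    if [ "$#" -eq 0 ]; then
        echo "usage: count_lines FILE..." >&2
        return 1
    fi
    # Loop: visit every argument in turn.
    for file in "$@"; do
        # Variable: store the output of a command. Some versions of wc
        # pad the number with spaces, so tr removes them.
        n=$(wc -l < "$file" | tr -d ' ')
        echo "$file: $n"
    done
}
```

Calling count_lines with two file names prints one line per file, each consisting of the file name, a colon, and the line count. Calling it without arguments prints the usage message to standard error and returns a non-zero exit status.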