1 Preface
There are many programming languages, and each one has its strengths and weaknesses. Some languages are very fast, but verbose. Other languages are very easy to write in, but slow. Having to prototype in an easy language and then rewrite the performance-critical parts in a fast one is known as the two-language problem, and Julia aims to circumvent it. Even though all three of us come from different fields, we all found the Julia language more effective for our research than the languages we had used before. We discuss some of our arguments in Section 2. However, Julia is one of the newest languages around. This means that the ecosystem around the language is sometimes difficult to navigate: it is hard to figure out where to start and how all the different packages fit together. That is why we decided to create this book! We wanted to make it easier for researchers, and especially our colleagues, to start using this awesome language.
As discussed above, each language has its strengths and weaknesses. In our opinion, data science is definitely a strength of Julia. At the same time, all three of us used data science tools in our day-to-day lives. And, probably, you want to use data science too! That is why this book has a focus on data science.
In the next part of this section, we emphasize the “data” part of data science and why data skills are, and will remain, in high demand in industry as well as in academia. We also argue for incorporating software engineering practices into data science, which should reduce friction when updating and sharing code with collaborators. Most data analyses are collaborative endeavors; that is why these software practices will help you.
1.0.1 Data is Everywhere
Data is abundant and will be even more so in the near future. A report from late 2012 concluded that, from 2005 to 2020, the amount of data stored digitally would grow by a factor of 300, from 130 exabytes to a whopping 40,000 exabytes (Gantz & Reinsel, 2012). This is equal to 40 trillion gigabytes and, to put it into perspective, more than 5.2 terabytes for every living human currently on this planet! It was estimated that by 2020, on average, every person would create 1.7 MB of data per second (Domo, 2018). A recent report predicted that almost two thirds (65%) of national GDPs would have undergone digitization by 2022 (Fitzgerald et al., 2020).
Every profession will be impacted by the increasing availability of data and data’s increased importance (Chen et al., 2014; Khan et al., 2014). Data is used to communicate and build knowledge, and to make decisions. This is why data skills are important. If you become comfortable with handling data, you will become a valuable researcher or professional. In other words, you will become data literate.
CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso