1.1 What is Data Science?
Data science is not only machine learning and statistics, and it’s not all about prediction. Alas, it is not even a discipline fully contained within STEM (Science, Technology, Engineering, and Mathematics) fields (Meng, 2019). But one thing that we can assert with high confidence is that data science is always about data. Our aims for this book are twofold:
- We focus on the backbone of data science: data.
- We use the Julia programming language to process the data.
We cover why Julia is an extremely effective language for data science in Section 2. For now, let’s turn our attention towards data.
1.1.1 Data Literacy
According to Wikipedia, the formal definition of data literacy is “the ability to read, understand, create, and communicate data as information.” We also like the informal idea that, being data literate, you won’t feel overwhelmed by data, but instead can use it to make the right decisions. Data literacy can be seen as a highly competitive skill to possess. In this book we’ll cover two aspects of data literacy:
- Data Manipulation with DataFrames.jl (Chapter 4) and DataFramesMeta.jl (Chapter 5). In these chapters you will learn how to:
  - Read CSV and Excel data into Julia.
  - Process data in Julia, that is, learn how to answer data questions.
  - Filter and subset data.
  - Handle missing data.
  - Join multiple data sources together.
  - Group and summarize data.
  - Export data out of Julia to CSV and Excel files.
- Data Visualization with Makie.jl (Chapter 6). In this chapter you will learn how to:
  - Plot data with different Makie.jl backends.
  - Save visualizations in several formats such as PNG or PDF.
  - Use different plotting functions to make diverse data visualizations.
  - Customize visualizations with attributes.
  - Use and create new plotting themes.
  - Add \(\LaTeX\) elements to plots.
  - Manipulate color and palettes.
  - Create complex figure layouts.
CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso