4.1 Load and Save Files
Having only data inside Julia programs and not being able to load or save it would be very limiting. Therefore, we start by mentioning how to store files to and load files from disk. We focus on CSV, see Section 4.1.1, and Excel, see Section 4.1.2, file formats since those are the most common data storage formats for tabular data.
4.1.1 CSV
Comma-separated values (CSV) files are are very effective way to store tables. CSV files have two advantages over other data storage files. First, it does exactly what the name indicates it does, namely storing values by separating them using commas ,
. This acronym is also used as the file extension. So, be sure that you save your files using the “.csv” extension such as “myfile.csv.” To demonstrate how a CSV file looks, we can add the CSV.jl package using the Pkg
REPL mode (Section 3.5.4.2):
julia> ]
pkg> add CSV
and load it via:
using CSV
We can now use our previous data:
grades_2020()
name | grade_2020 |
---|---|
Sally | 1.0 |
Bob | 5.0 |
Alice | 8.5 |
Hank | 4.0 |
and read it from a file after writing it:
function write_grades_csv()
path = "grades.csv"
CSV.write(path, grades_2020())
end
path = write_grades_csv()
read(path, String)
name,grade_2020
Sally,1.0
Bob,5.0
Alice,8.5
Hank,4.0
Here, we also see the second benefit of CSV data format: the data can be read by using a simple text editor. This differs from many alternative data formats which require proprietary software, e.g. Excel.
This works wonders, but what if our data contains commas ,
as values? If we were to naively write data with commas, it would make the files very hard to convert back to a table. Luckily, CSV.jl
handles this for us automatically. Consider the following data with commas ,
:
function grades_with_commas()
df = grades_2020()
df[3, :name] = "Alice,"
df
end
grades_with_commas()
name | grade_2020 |
---|---|
Sally | 1.0 |
Bob | 5.0 |
Alice, | 8.5 |
Hank | 4.0 |
If we write this, we get:
function write_comma_csv()
path = "grades-commas.csv"
CSV.write(path, grades_with_commas())
end
path = write_comma_csv()
read(path, String)
name,grade_2020
Sally,1.0
Bob,5.0
"Alice,",8.5
Hank,4.0
So, CSV.jl
adds quotation marks "
around the comma-containing values. Another common way to solve this problem is to write the data to a tab-separated values (TSV) file format. This assumes that the data doesn’t contain tabs, which holds in most cases.
Also, note that TSV files can also be read using a simple text editor, and these files use the “.tsv” extension.
function write_comma_tsv()
path = "grades-comma.tsv"
CSV.write(path, grades_with_commas(); delim='\t')
end
read(write_comma_tsv(), String)
name grade_2020
Sally 1.0
Bob 5.0
Alice, 8.5
Hank 4.0
Text file formats like CSV and TSV files can also be found that use other delimiters, such as semicolons “;” spaces “ ,” or even something as unusual as “π.”
function write_space_separated()
path = "grades-space-separated.csv"
CSV.write(path, grades_2020(); delim=' ')
end
read(write_space_separated(), String)
name grade_2020
Sally 1.0
Bob 5.0
Alice 8.5
Hank 4.0
By convention, it’s still best to give files with special delimiters, such as “;” the “.csv” extension.
Loading CSV files using CSV.jl
is done in a similar way. You can use CSV.read
and specify in what kind of format you want the output. We specify a DataFrame
.
path = write_grades_csv()
CSV.read(path, DataFrame)
name | grade_2020 |
---|---|
Sally | 1.0 |
Bob | 5.0 |
Alice | 8.5 |
Hank | 4.0 |
Conveniently, CSV.jl
will automatically infer column types for us:
path = write_grades_csv()
df = CSV.read(path, DataFrame)
4×2 DataFrame
Row │ name grade_2020
│ String7 Float64
─────┼─────────────────────
1 │ Sally 1.0
2 │ Bob 5.0
3 │ Alice 8.5
4 │ Hank 4.0
It works even for far more complex data:
my_data = """
a,b,c,d,e
Kim,2018-02-03,3,4.0,2018-02-03T10:00
"""
path = "my_data.csv"
write(path, my_data)
df = CSV.read(path, DataFrame)
1×5 DataFrame
Row │ a b c d e
│ String3 Date Int64 Float64 DateTime
─────┼──────────────────────────────────────────────────────────
1 │ Kim 2018-02-03 3 4.0 2018-02-03T10:00:00
These CSV basics should cover most use cases. For more information, see the CSV.jl documentation and especially the CSV.File constructor docstring.
4.1.2 Excel
There are multiple Julia packages to read Excel files. In this book, we will only look at XLSX.jl, because it is the most actively maintained package in the Julia ecosystem that deals with Excel data. As a second benefit, XLSX.jl
is written in pure Julia, which makes it easy for us to inspect and understand what’s going on under the hood.
Load XLSX.jl
via
using XLSX:
eachtablerow,
readxlsx,
writetable
To write files, we define a little helper function for data and column names:
function write_xlsx(name, df::DataFrame)
path = "$name.xlsx"
data = collect(eachcol(df))
cols = names(df)
writetable(path, data, cols)
end
Now, we can easily write the grades to an Excel file:
function write_grades_xlsx()
path = "grades"
write_xlsx(path, grades_2020())
"$path.xlsx"
end
When reading it back, we will see that XLSX.jl
puts the data in a XLSXFile
type and we can access the desired sheet
much like a Dict
:
path = write_grades_xlsx()
xf = readxlsx(path)
XLSXFile("grades.xlsx") containing 1 Worksheet
sheetname size range
-------------------------------------------------
Sheet1 5x2 A1:B5
xf = readxlsx(write_grades_xlsx())
sheet = xf["Sheet1"]
eachtablerow(sheet) |> DataFrame
name | grade_2020 |
---|---|
Sally | 1.0 |
Bob | 5.0 |
Alice | 8.5 |
Hank | 4.0 |
Notice that we cover just the basics of XLSX.jl
but more powerful usage and customizations are available. For more information and options, see the XLSX.jl documentation.
Support this project
CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso