5.3 Column Transformation
Whereas the @select
macro variants performs column selection, the @transform
macro variants do not perform any column selection. It can either overwrite existent columns or create new columns that will be added to the right of our DataFrame
.
For example, the previous operation on :grade
can be invoked as a transformation with:
@rtransform df :grade_100 = :grade * 10
name | grade | grade_100 |
---|---|---|
Sally | 1.0 | 10.0 |
Bob | 5.0 | 50.0 |
Alice | 8.5 | 85.0 |
Hank | 4.0 | 40.0 |
Bob | 9.5 | 95.0 |
Sally | 9.5 | 95.0 |
Hank | 6.0 | 60.0 |
As you can see, @transform
does not perform column selection, and the :grade_100
column is created as a new column and added to the right of our DataFrame
.
DataFramesMeta.jl
macros also support begin ... end
statements. For example, suppose that you are creating two columns in a @transform
macro:
@rtransform df :grade_100 = :grade * 10 :grade_5 = :grade / 2
name | grade | grade_100 | grade_5 |
---|---|---|---|
Sally | 1.0 | 10.0 | 0.5 |
Bob | 5.0 | 50.0 | 2.5 |
Alice | 8.5 | 85.0 | 4.25 |
Hank | 4.0 | 40.0 | 2.0 |
Bob | 9.5 | 95.0 | 4.75 |
Sally | 9.5 | 95.0 | 4.75 |
Hank | 6.0 | 60.0 | 3.0 |
It can be cumbersome and difficult to read the performed transformations. To facilitate that, we can use begin ... end
statements and put one transformation per line:
@rtransform df begin
:grade_100 = :grade * 10
:grade_5 = :grade / 2
end
name | grade | grade_100 | grade_5 |
---|---|---|---|
Sally | 1.0 | 10.0 | 0.5 |
Bob | 5.0 | 50.0 | 2.5 |
Alice | 8.5 | 85.0 | 4.25 |
Hank | 4.0 | 40.0 | 2.0 |
Bob | 9.5 | 95.0 | 4.75 |
Sally | 9.5 | 95.0 | 4.75 |
Hank | 6.0 | 60.0 | 3.0 |
We can also use other columns in our transformations, which makes DataFramesMeta.jl
more appealing than DataFrames.jl
due to the easier syntax.
First, let’s revisit the leftjoined DataFrame
from Chapter 4:
leftjoined = leftjoin(grades_2020(), grades_2021(); on=:name)
name | grade_2020 | grade_2021 |
---|---|---|
Sally | 1.0 | 9.5 |
Hank | 4.0 | 6.0 |
Bob | 5.0 | missing |
Alice | 8.5 | missing |
Additionally, we’ll replace the missing values with 5
(Section 4.9, also note the !
in in-place variant @rtransform!
):
@rtransform! leftjoined :grade_2021 = coalesce(:grade_2021, 5)
name | grade_2020 | grade_2021 |
---|---|---|
Sally | 1.0 | 9.5 |
Hank | 4.0 | 6.0 |
Bob | 5.0 | 5 |
Alice | 8.5 | 5 |
This is how you calculate the mean of grades in both years using DataFramesMeta.jl
:
@rtransform leftjoined :mean_grades = (:grade_2020 + :grade_2021) / 2
name | grade_2020 | grade_2021 | mean_grades |
---|---|---|---|
Sally | 1.0 | 9.5 | 5.25 |
Hank | 4.0 | 6.0 | 5.0 |
Bob | 5.0 | 5 | 5.0 |
Alice | 8.5 | 5 | 6.75 |
This is how you would perform it in DataFrames.jl
:
transform(leftjoined, [:grade_2020, :grade_2021] => ByRow((x, y) -> (x + y) / 2) => :mean_grades)
name | grade_2020 | grade_2021 | mean_grades |
---|---|---|---|
Sally | 1.0 | 9.5 | 5.25 |
Hank | 4.0 | 6.0 | 5.0 |
Bob | 5.0 | 5 | 5.0 |
Alice | 8.5 | 5 | 6.75 |
As you can see, the case for easier syntax is not hard to argue for DataFramesMeta.jl
.
Support this project
CC BY-NC-SA 4.0 Jose Storopoli, Rik Huijzer, Lazaro Alonso