Going Deeper into fastai’s Layered API
The fastai library is built on a layered API. In the very top layer there are applications that allow us to train a model in five lines of code, as we saw in <>. In the case of creating `DataLoaders` for a text classifier, for instance, we used the line:
In [ ]:
from fastai.text.all import *
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
The factory method `TextDataLoaders.from_folder` is very convenient when your data is arranged in exactly the same way as the IMDb dataset, but in practice, that often won’t be the case. The data block API offers more flexibility. As we saw in the last chapter, we can get the same result with:
In [ ]:
path = untar_data(URLs.IMDB)
dls = DataBlock(
    blocks=(TextBlock.from_folder(path), CategoryBlock),
    get_y=parent_label,
    get_items=partial(get_text_files, folders=['train', 'test']),
    splitter=GrandparentSplitter(valid_name='test')
).dataloaders(path)
But it’s sometimes not flexible enough. For debugging purposes, for instance, we might need to apply just parts of the transforms that come with this data block. Or we might want to create a `DataLoaders` for some application that isn’t directly supported by fastai. In this section, we’ll dig into the pieces that are used inside fastai to implement the data block API. Understanding these will enable you to leverage the power and flexibility of this mid-tier API.
note: Mid-Level API: The mid-level API contains more than just functionality for creating `DataLoaders`. It also has the callback system, which allows us to customize the training loop any way we like, and the general optimizer. Both will be covered in <>.
Transforms
When we studied tokenization and numericalization in the last chapter, we started by grabbing a bunch of texts:
In [ ]:
files = get_text_files(path, folders = ['train', 'test'])
txts = L(o.open().read() for o in files[:2000])
We then showed how to tokenize them with a `Tokenizer`:
In [ ]:
tok = Tokenizer.from_folder(path)
tok.setup(txts)
toks = txts.map(tok)
toks[0]
Out[ ]:
(#374) ['xxbos','xxmaj','well',',','"','cube','"','(','1997',')'...]
and how to numericalize, including automatically creating the vocab for our corpus:
In [ ]:
num = Numericalize()
num.setup(toks)
nums = toks.map(num)
nums[0][:10]
Out[ ]:
tensor([ 2, 8, 76, 10, 23, 3112, 23, 34, 3113, 33])
The classes also have a `decode` method. For instance, `Numericalize.decode` gives us back the string tokens:
In [ ]:
nums_dec = num.decode(nums[0][:10]); nums_dec
Out[ ]:
(#10) ['xxbos','xxmaj','well',',','"','cube','"','(','1997',')']
and `Tokenizer.decode` turns this back into a single string (it may not, however, be exactly the same as the original string; this depends on whether the tokenizer is reversible, which the default word tokenizer is not at the time we’re writing this book):
In [ ]:
tok.decode(nums_dec)
Out[ ]:
'xxbos xxmaj well , " cube " ( 1997 )'
`decode` is used by fastai’s `show_batch` and `show_results`, as well as some other inference methods, to convert predictions and mini-batches into a human-understandable representation.
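For instance, with the `dls` we created at the start of this chapter still in scope, `show_batch` decodes each numericalized, tokenized sample back into readable text before displaying it:
In [ ]:
# show_batch decodes a mini-batch back to text before displaying it
dls.show_batch(max_n=2)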
For each of `tok` or `num` in the preceding example, we created an object, called the `setup` method (which trains the tokenizer if needed for `tok` and creates the vocab for `num`), applied it to our raw texts (by calling the object as a function), and then finally decoded the result back to an understandable representation. These steps are needed for most data preprocessing tasks, so fastai provides a class that encapsulates them. This is the `Transform` class. Both `Tokenizer` and `Numericalize` are `Transform`s.
In general, a `Transform` is an object that behaves like a function and has an optional `setup` method that will initialize some inner state (like the vocab inside `num`) and an optional `decode` method that will reverse the function (this reversal may not be perfect, as we saw with `tok`).
A good example of `decode` is found in the `Normalize` transform that we saw in <>: to be able to plot the images, its `decode` method undoes the normalization (i.e., it multiplies by the standard deviation and adds back the mean). On the other hand, data augmentation transforms do not have a `decode` method, since we want to show the effects on images to make sure the data augmentation is working as we want.
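To make this concrete, here is a minimal sketch of a `Normalize`-style transform whose `decodes` undoes its `encodes`. The class name `NormalizeSketch` and the statistics are made up for illustration; this is not fastai’s actual implementation:
In [ ]:
# A made-up Normalize-style transform: decodes exactly reverses encodes
class NormalizeSketch(Transform):
    def __init__(self, mean, std): self.mean,self.std = mean,std
    def encodes(self, x): return (x-self.mean)/self.std
    def decodes(self, x): return x*self.std + self.mean

tfm = NormalizeSketch(0.5, 0.2)
tfm.decode(tfm(tensor([0.1, 0.9])))  # recovers the original values (up to float precision)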
A special behavior of `Transform`s is that they always get applied over tuples. In general, our data is always a tuple `(input,target)` (sometimes with more than one input or more than one target). When applying a transform on an item like this, such as `Resize`, we don’t want to resize the tuple as a whole; instead, we want to resize the input (if applicable) and the target (if applicable) separately. It’s the same for batch transforms that do data augmentation: when the input is an image and the target is a segmentation mask, the transform needs to be applied (the same way) to the input and the target.
We can see this behavior if we pass a tuple of texts to `tok`:
In [ ]:
tok((txts[0], txts[1]))
Out[ ]:
((#374) ['xxbos','xxmaj','well',',','"','cube','"','(','1997',')'...],
(#207) ['xxbos','xxmaj','conrad','xxmaj','hall','went','out','with','a','bang'...])
Writing Your Own Transform
If you want to write a custom transform to apply to your data, the easiest way is to write a function. As you can see in this example, a `Transform` will only be applied to a matching type, if a type is provided (otherwise it will always be applied). In the following code, the `:int` in the function signature means that `f` only gets applied to `int`s. That’s why `tfm(2.0)` returns `2.0`, but `tfm(2)` returns `3` here:
In [ ]:
def f(x:int): return x+1
tfm = Transform(f)
tfm(2),tfm(2.0)
Out[ ]:
(3, 2.0)
Here, `f` is converted to a `Transform` with no `setup` and no `decode` method.
Python has a special syntax for passing a function (like `f`) to another function (or something that behaves like a function, known as a callable in Python), called a decorator. A decorator is used by prepending a callable with `@` and placing it before a function definition (there are lots of good online tutorials about Python decorators, so take a look at one if this is a new concept for you). The following is identical to the previous code:
In [ ]:
@Transform
def f(x:int): return x+1
f(2),f(2.0)
Out[ ]:
(3, 2.0)
If you need either `setup` or `decode`, you will need to subclass `Transform` to implement the actual encoding behavior in `encodes`, then (optionally) the setup behavior in `setups` and the decoding behavior in `decodes`:
In [ ]:
class NormalizeMean(Transform):
    def setups(self, items): self.mean = sum(items)/len(items)
    def encodes(self, x): return x-self.mean
    def decodes(self, x): return x+self.mean
Here, `NormalizeMean` will initialize some state during the setup (the mean of all elements passed), then the transformation is to subtract that mean. For decoding purposes, we implement the reverse of that transformation by adding the mean. Here is an example of `NormalizeMean` in action:
In [ ]:
tfm = NormalizeMean()
tfm.setup([1,2,3,4,5])
start = 2
y = tfm(start)
z = tfm.decode(y)
tfm.mean,y,z
Out[ ]:
(3.0, -1.0, 2.0)
Note that, for each of these methods, the method you call and the method you implement are different:
[options="header"]
|======
| Class | To call | To implement
| `nn.Module` (PyTorch) | `()` (i.e., call as function) | `forward`
| `Transform` | `()` | `encodes`
| `Transform` | `decode()` | `decodes`
| `Transform` | `setup()` | `setups`
|======
So, for instance, you would never call `setups` directly, but instead would call `setup`. The reason for this is that `setup` does some work before and after calling `setups` for you. To learn more about `Transform`s and how you can use them to implement different behavior depending on the type of the input, be sure to check the tutorials in the fastai docs.
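As a small sketch of that type dispatch (the class name `AddImpl` is made up for illustration): a single `Transform` subclass can define several `encodes` methods, each selected by the type annotation on its argument, and types with no matching method pass through unchanged:
In [ ]:
# Made-up example of type dispatch: the annotation on x picks which
# encodes runs; unmatched types pass through untouched
class AddImpl(Transform):
    def encodes(self, x:int): return x+1
    def encodes(self, x:float): return x+0.5

tfm = AddImpl()
tfm(2), tfm(2.0), tfm('a')  # -> (3, 2.5, 'a')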
Pipeline
To compose several transforms together, fastai provides the `Pipeline` class. We define a `Pipeline` by passing it a list of `Transform`s; it will then compose the transforms inside it. When you call a `Pipeline` on an object, it will automatically call the transforms inside, in order:
In [ ]:
tfms = Pipeline([tok, num])
t = tfms(txts[0]); t[:20]
Out[ ]:
tensor([ 2, 8, 76, 10, 23, 3112, 23, 34, 3113, 33, 10, 8, 4477, 22, 88, 32, 10, 27, 42, 14])
And you can call `decode` on the result of your encoding, to get back something you can display and analyze:
In [ ]:
tfms.decode(t)[:100]
Out[ ]:
'xxbos xxmaj well , " cube " ( 1997 ) , xxmaj vincenzo \'s first movie , was one of the most interesti'
The only part that doesn’t work the same way as in `Transform` is the setup. To properly set up a `Pipeline` of `Transform`s on some data, you need to use a `TfmdLists`.
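As a sketch of the idea (assuming the `files` and `path` from earlier in this section), a `TfmdLists` takes the raw items together with the list of transforms, and runs each transform’s `setup` in order, on items already processed by the earlier transforms:
In [ ]:
# Preview: TfmdLists calls each transform's setup in turn, so Numericalize
# builds its vocab from already-tokenized texts
tls = TfmdLists(files, [Tokenizer.from_folder(path), Numericalize])
tls[0][:10]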