Learner for the text application
All the functions necessary to build a Learner suitable for transfer learning in NLP
The most important functions of this module are language_model_learner and text_classifier_learner. They will help you define a Learner using a pretrained model. See the text tutorial for examples of use.
Loading a pretrained model
In text, to load a pretrained model, we need to adapt the embeddings of the vocabulary used for the pre-training to the vocabulary of our current corpus.
match_embeds
[source]
match_embeds(old_wgts, old_vocab, new_vocab)
Convert the embedding in old_wgts to go from old_vocab to new_vocab.
For words in new_vocab that don’t have a corresponding match in old_vocab, we use the mean of all pretrained embeddings.
wgts = {'0.encoder.weight': torch.randn(5,3)}
new_wgts = match_embeds(wgts.copy(), ['a', 'b', 'c'], ['a', 'c', 'd', 'b'])
old,new = wgts['0.encoder.weight'],new_wgts['0.encoder.weight']
test_eq(new[0], old[0])
test_eq(new[1], old[2])
test_eq(new[2], old.mean(0))
test_eq(new[3], old[1])
load_ignore_keys
[source]
load_ignore_keys(model, wgts)
Load wgts in model ignoring the names of the keys, just taking parameters in order.
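As a quick illustration, here is a minimal sketch with toy modules made up for this example: the parameters are copied in order, so the shapes must line up even though the key names differ.
import torch, torch.nn as nn

src = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))
dst = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))
# Same tensors as src, but stored under different key names
renamed_wgts = {f'renamed.{k}': v for k, v in src.state_dict().items()}

load_ignore_keys(dst, renamed_wgts)   # key names ignored, parameters copied in order
test_eq(dst[0].weight, src[0].weight)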
clean_raw_keys
[source]
clean_raw_keys(wgts)
load_model_text
[source]
load_model_text(file, model, opt, with_opt=None, device=None, strict=True)
Load model from file along with opt (if available, and if with_opt).
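A hedged sketch of how it might be called (the checkpoint name is hypothetical, and learn is assumed to be an existing Learner):
# Restore weights (and the optimizer state, if it was saved alongside them)
# onto an existing model/optimizer pair, mapping tensors to the CPU
load_model_text('models/finetuned.pth', learn.model, learn.opt, with_opt=True, device='cpu')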
class TextLearner
[source]
TextLearner(dls, model, alpha=2.0, beta=1.0, moms=(0.8, 0.7, 0.8), loss_func=None, opt_func=Adam, lr=0.001, splitter=trainable_params, cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True) :: Learner
Basic class for a Learner in NLP.
Adds a ModelResetter and an RNNRegularizer with alpha and beta to the callbacks; the rest is the same as the Learner init.
This Learner adds functionality to the base class:
TextLearner.load_pretrained
[source]
TextLearner.load_pretrained(wgts_fname, vocab_fname, model=None)
Load a pretrained model and adapt it to the data vocabulary.
wgts_fname should point to the weights of the pretrained model and vocab_fname to the vocabulary used to pretrain it.
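For instance, assuming learn is a TextLearner (the file names below are hypothetical), the pretrained embeddings are remapped to the vocabulary of learn.dls with match_embeds:
# Hypothetical files: a .pth with the pretrained weights and a .pkl with the
# vocabulary used during pretraining
learn = learn.load_pretrained('wt103_wgts.pth', 'wt103_vocab.pkl')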
TextLearner.save_encoder
[source]
TextLearner.save_encoder(file)
Save the encoder to file in the model directory.
The model directory is Learner.path/Learner.model_dir.
TextLearner.load_encoder
[source]
TextLearner.load_encoder(file, device=None)
Load the encoder file from the model directory, optionally ensuring it’s on device.
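These two methods support the usual ULMFiT-style workflow: fine-tune a language model, save its encoder, then load that encoder into a classifier. A sketch (the DataLoaders, training schedule, and file name are illustrative, and both learners are assumed to share the same path/model_dir):
# Fine-tune a language model and keep its encoder
learn_lm = language_model_learner(dls_lm, AWD_LSTM)
learn_lm.fine_tune(1)
learn_lm.save_encoder('finetuned')      # saved as finetuned.pth in the model directory

# Reuse the fine-tuned encoder in a text classifier
learn_clas = text_classifier_learner(dls_clas, AWD_LSTM)
learn_clas.load_encoder('finetuned')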
Language modeling predictions
For language modeling, the predict method is quite different from the one in the other applications, which is why it needs its own subclass.
decode_spec_tokens
[source]
decode_spec_tokens(tokens)
Decode the special tokens in tokens
test_eq(decode_spec_tokens(['xxmaj', 'text']), ['Text'])
test_eq(decode_spec_tokens(['xxup', 'text']), ['TEXT'])
test_eq(decode_spec_tokens(['xxrep', '3', 'a']), ['aaa'])
test_eq(decode_spec_tokens(['xxwrep', '3', 'word']), ['word', 'word', 'word'])
class LMLearner
[source]
LMLearner(dls, model, alpha=2.0, beta=1.0, moms=(0.8, 0.7, 0.8), loss_func=None, opt_func=Adam, lr=0.001, splitter=trainable_params, cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True) :: TextLearner
Add functionality to TextLearner when dealing with a language model.
LMLearner.predict
[source]
LMLearner.predict(text, n_words=1, no_unk=True, temperature=1.0, min_p=None, no_bar=False, decoder=decode_spec_tokens, only_last_word=False)
Return text and the n_words that come after.
The words are picked randomly among the predictions, depending on the probability of each index. no_unk means we never pick the UNK token, and temperature is applied to the predictions. If min_p is passed, we don’t consider indices with a probability lower than it. Set no_bar to True if you don’t want any progress bar, and you can pass along a custom decoder to process the predicted tokens.
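For example, assuming learn is an LMLearner (such as the one built with language_model_learner below; the generated text will vary):
# Lower temperature makes sampling more conservative, min_p discards
# low-probability tokens, and no_bar turns off the progress bar
learn.predict('This movie is about', n_words=15, temperature=0.75, min_p=0.05, no_bar=True)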
Learner convenience functions
language_model_learner
[source]
language_model_learner(dls, arch, config=None, drop_mult=1.0, backwards=False, pretrained=True, pretrained_fnames=None, loss_func=None, opt_func=Adam, lr=0.001, splitter=trainable_params, cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95))
Create a Learner with a language model from dls and arch.
You can use config to customize the architecture used (change the values from awd_lstm_lm_config for this). pretrained will use fastai’s pretrained model for this arch (if available), or you can pass specific pretrained_fnames containing your own pretrained model and the corresponding vocabulary. All other arguments are passed to Learner.
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
dls = TextDataLoaders.from_df(df, path=path, text_col='text', is_lm=True, valid_col='is_valid')
learn = language_model_learner(dls, AWD_LSTM)
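If you trained your own language model, you can pass its weights and vocabulary through pretrained_fnames instead of using fastai’s downloaded model. A sketch (the file name stems are hypothetical; they refer to your own weights and vocabulary files in the learner’s model directory):
# Hypothetical stems pointing at your own pretrained weights (.pth) and vocabulary (.pkl)
learn_custom = language_model_learner(dls, AWD_LSTM,
                                      pretrained_fnames=['my_lm', 'my_lm_vocab'])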
You can then use the .predict method to generate new text.
learn.predict('This movie is about', n_words=20)
'This movie is about a couple of years distant , but Lucas tells the story of a boy who wants her to become'
By default, the entire sentence is fed again to the model after each predicted word; this little trick improves the quality of the generated text. If you want to feed only the last word, pass only_last_word=True.
learn.predict('This movie is about', n_words=20, only_last_word=True)
'This movie is about a mature parent and the final nine - minded , and other uses their work with Diary of the'
text_classifier_learner
[source]
text_classifier_learner(dls, arch, seq_len=72, config=None, backwards=False, pretrained=True, drop_mult=0.5, n_out=None, lin_ftrs=None, ps=None, max_len=1440, y_range=None, loss_func=None, opt_func=Adam, lr=0.001, splitter=trainable_params, cbs=None, metrics=None, path=None, model_dir='models', wd=None, wd_bn_bias=False, train_bn=True, moms=(0.95, 0.85, 0.95))
Create a Learner with a text classifier from dls and arch.
You can use config to customize the architecture used (change the values from awd_lstm_clas_config for this). pretrained will use fastai’s pretrained model for this arch (if available). drop_mult is a global multiplier applied to control all the dropouts. n_out is usually inferred from the dls, but you may pass it.
The model uses a SentenceEncoder, which means the texts are passed seq_len tokens at a time, and gradients are only computed on the last max_len steps. lin_ftrs and ps are passed to get_text_classifier.
All other arguments are passed to Learner.
path = untar_data(URLs.IMDB_SAMPLE)
df = pd.read_csv(path/'texts.csv')
dls = TextDataLoaders.from_df(df, path=path, text_col='text', label_col='label', valid_col='is_valid')
learn = text_classifier_learner(dls, AWD_LSTM)