FastText Tutorial
What is FastText?
FastText is an open-source, free, lightweight library that allows users to learn text representations and text classifiers. It works on standard, generic hardware. Models can later be reduced in size to even fit on mobile devices.
How to use FastText?
We can use fastText either as a command-line tool or as a Python module. In this tutorial, I will take you through fastText as a Python module.
Clone the GitHub repository onto your local machine to start working with it.
git clone https://github.com/facebookresearch/fastText.git
Once you have cloned it, go to the fastText directory and install the package:
sudo pip install .
Now we are ready to use it, but first confirm that the set-up was successful. You can do that by running import fasttext.
Getting Started
There is a help function which basically works as a manual/documentation at a very high level, so try help(fasttext.FastText).
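For example:

import fasttext

help(fasttext.FastText)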
In this tutorial, I will try my best to cover all the functionality available in fastText, but our prime focus will be on train_supervised.
Getting and Organizing data
As part of our ritual, we need labeled data to train our supervised classifier. We are taking the data from cooking.stackexchange.com. Download here.
Unzip it and you will find the file cooking.stackexchange.txt.
Now we have all the data. The next job on our hands is to split it into training and validation sets.
Let us create the files cooking.train and cooking.valid:
wc cooking.stackexchange.txt
This gives the number of data points in the dataset. Let us split it in roughly an 80:20 ratio:
head -n 12404 cooking.stackexchange.txt > cooking.train
tail -n 3000 cooking.stackexchange.txt > cooking.valid
Training our classifier
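A minimal sketch of training a supervised classifier on the file we just created (default hyperparameters; the exact call behind the numbers in this post may differ):

import fasttext

model = fasttext.train_supervised(input="cooking.train")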
The input argument indicates the file containing the training examples.
So we have our trained model in the model variable. We can also call save_model to save it to a file and load it later with the load_model function.
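Something like this (model_cooking.bin is just a hypothetical filename):

model.save_model("model_cooking.bin")
model = fasttext.load_model("model_cooking.bin")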
Let's play around now with our model:
We are good to predict based on the training we did. Let’s do it:
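For instance, predicting the label for a sample question with and without a trailing question mark (the sentence below is a stand-in, not necessarily the one used in the original run):

model.predict("Which baking dish is best to bake a banana bread")
model.predict("Which baking dish is best to bake a banana bread ?")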
Did you observe? Adding ? changed the prediction value a bit, but the predicted label is the same. From this we can expect our model to perform badly in certain scenarios, so we need to improve it. Let's dig deeper into that now.
Making the model Accurate:
The first thing that comes to my mind is: why can't we pre-process the data? Like removing unnecessary spaces and special characters, and doing some cool pre-processing stuff like handling @ specially. It's cool, but we shall not go that far now.
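We will, however, do the basic clean-up: lowercase the text and put spaces around punctuation so that, for example, "bread?" and "bread" map to the same word. A minimal Python sketch (assuming we simply overwrite cooking.train and cooking.valid with the normalized text):

import re

def normalize(line):
    # Lowercase and separate punctuation from words
    line = line.lower()
    return re.sub(r"([.!?,'/()])", r" \1 ", line)

with open("cooking.stackexchange.txt") as f:
    lines = [normalize(l) for l in f]

# Re-split into the same 80:20 training and validation files as before
with open("cooking.train", "w") as f:
    f.writelines(lines[:12404])
with open("cooking.valid", "w") as f:
    f.writelines(lines[-3000:])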
Now, let’s train the model
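Retraining on the cleaned files and checking how large the vocabulary is now (a sketch):

model = fasttext.train_supervised(input="cooking.train")
print(len(model.words))   # number of distinct words seen in the training data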
You can clearly see the number of words came down from 14543 to 8952. This means our pre-processing merged various forms of the same word.
Let's predict the same sentences as earlier:
You can clearly see the prediction value has gone up significantly.
How can we improve the performance even further?
By altering the learning rate, the number of epochs, or the batch size?
More epochs
By default, fastText sees each training example only five times during training, which is pretty small given that our training set only has about 12k examples.
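In the Python module this is controlled by the epoch parameter; here is a sketch bumping it up to 25 (the value used in the official fastText tutorial, which may differ from the original run here):

model = fasttext.train_supervised(input="cooking.train", epoch=25)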
Let's test it now:
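Evaluating on the held-out validation file returns the number of samples, the precision at one, and the recall at one:

model.test("cooking.valid")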
Here, 3000 is the number of samples we tested on, 0.518 is the precision, and 0.224 is the recall.
The precision is the number of correct labels among the labels predicted by fastText. The recall is the number of labels that were successfully predicted, among all the real labels.
Example:
model.predict("is it safe to go to an restaurant?",k=5)
(('__label__food-safety', '__label__beef', '__label__storage-method', '__label__chicken', '__label__storage-lifetime'), array([0.67451811, 0.05916279, 0.02460619, 0.02096862, 0.02020765]))
Assume the query has two real labels.
Thus, one out of five labels predicted by the model is correct, giving a precision of 0.20. Out of the two real labels, only one is predicted by the model, giving a recall of 0.50.
See, the numbers are rising, which is a good sign that we are training the model well.
Now, let's play with the learning rate.
Learning Rate
Learning rate corresponds to how much the model changes after processing each example. A learning rate of 0 would mean that the model does not change at all, and thus, does not learn anything. Good values of the learning rate are in the range 0.1 - 1.0
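A sketch with a higher learning rate, keeping the longer training (lr=1.0 follows the official tutorial; the exact value behind this post's numbers is an assumption):

model = fasttext.train_supervised(input="cooking.train", lr=1.0, epoch=25)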
Just wow!! The prediction value keeps getting bigger and bigger.
Now, let's treat combinations of words as single features. In other words, use word n-grams instead of the unigrams we have used so far.
Word N-grams
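A sketch adding bigrams on top of the previous settings (wordNgrams=2, as in the official tutorial):

model = fasttext.train_supervised(input="cooking.train", lr=1.0, epoch=25, wordNgrams=2)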
A 96% prediction value is really good to have. However, let's test on the validation set.
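As before:

model.test("cooking.valid")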
Now we have achieved 60% precision, which is good for a beginner.
Using loss functions of our choice:
Let's try using hierarchical softmax. We can enable this using loss='hs'. Let's try it out with bucket=200000 and dim=50:
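A sketch of that configuration (hierarchical softmax trades a little accuracy for much faster training):

model = fasttext.train_supervised(input="cooking.train", lr=1.0, epoch=25, wordNgrams=2, bucket=200000, dim=50, loss='hs')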
Interesting, the performance went down.
When we want to assign a document to multiple labels, we can still use the softmax loss and play with the parameters for prediction, namely the number of labels to predict and the threshold for the predicted probability. However, playing with these arguments can be tricky and unintuitive since the probabilities must sum to 1.
A convenient way to handle multiple labels (multi-label classification) is to use independent binary classifiers for each label. This can be done with -loss one-vs-all or -loss ova.
Multi-label classification
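In the Python module the one-vs-all loss is loss='ova'. A sketch, reusing the query from the example above (the other hyperparameters are assumptions based on the official tutorial):

model = fasttext.train_supervised(input="cooking.train", lr=0.5, epoch=25, wordNgrams=2, bucket=200000, dim=50, loss='ova')

# Ask for every label above a probability threshold instead of a fixed top-k
model.predict("is it safe to go to an restaurant?", k=-1, threshold=0.5)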
I'll try to improvise, take a deeper look at precision and recall, and continue the blog later sometime.
References
The official fastText documentation is the reference, and this tutorial is written as part of a learn-by-writing philosophy.