Use TensorFlow and JavaScript to train a binary classification model and make predictions with it. Includes code samples and a GitHub link to the notebook.
TensorFlow is a really powerful library and I’m still discovering what it can do! It not only handles basic Machine Learning tasks, but from what I’ve seen in the documentation it also covers more advanced topics such as image recognition, audio processing and even large language models. Slowly but steadily I want to reach the level where I can tackle these subjects as well 😉.
The code for this tutorial can be found in this Github notebook.
If you have not configured Miniconda + Jupyter yet, please read through this article before proceeding.
If you have Jupyter up and running but no JavaScript support with Deno, please follow the steps outlined here before continuing with this tutorial.
What are we trying to predict
A binary classification problem has 2 possible output values: 0 or 1. One example of a binary classification problem is predicting whether a patient has cancer or not. Usually, the model outputs a probability between 0 and 1, where values closer to 1 show that the model is very confident the output should be classified as 1 (e.g. the patient has cancer).
I came up with a simple problem for our model to predict: is AFI going for a run today? The answer can be 0 or 1. We’ll need to generate our training data according to some rules. In practice, AFI goes for a run if:
- The current day is a Monday, Wednesday or Friday
- It is not the last day of the month, which has been decreed cleaning day
Generating this data is not super complicated because we’ll make good use of Luxon’s DateTime class. We’ll generate data for 5 months initially. First, let’s import the DateTime object from Luxon:
Then generate the data from 1st June to the end of October 2024 using this code:
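The notebook's own code relies on Luxon. As a dependency-free sketch of the same rules, the version below uses the built-in Date object in UTC instead. The function name and the field names weekday, isLastDay and runs are my own assumptions; only dayStr and learningData are mentioned in this article.

```javascript
// Sketch only: built-in Date (UTC) instead of Luxon's DateTime.
// Field names other than dayStr are hypothetical.
function generateLearningData(startIso, endIso) {
  const rows = [];
  const end = new Date(`${endIso}T00:00:00Z`);
  for (
    let d = new Date(`${startIso}T00:00:00Z`);
    d <= end;
    d.setUTCDate(d.getUTCDate() + 1)
  ) {
    // Convert JS weekdays (0 = Sunday) to Luxon's convention (1 = Monday ... 7 = Sunday).
    const weekday = ((d.getUTCDay() + 6) % 7) + 1;

    // A day is the last of its month when the following day is the 1st.
    const tomorrow = new Date(d);
    tomorrow.setUTCDate(d.getUTCDate() + 1);
    const isLastDay = tomorrow.getUTCDate() === 1;

    // Rule from the article: run on Monday, Wednesday, Friday, except on cleaning day.
    const runs = [1, 3, 5].includes(weekday) && !isLastDay;

    rows.push({ dayStr: d.toISOString().slice(0, 10), weekday, isLastDay, runs });
  }
  return rows;
}

const learningData = generateLearningData("2024-06-01", "2024-10-31");
console.log(learningData.length, learningData[2]);
```

Working in UTC avoids daylight-saving surprises when stepping one day at a time.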
Prepare data for fitting the model
In this stage we will use the NodeJS-Polars library to import data from the generated learningData array above. We start by importing the library in our notebook:
Now, machine learning models do not understand strings or booleans, they only work with numbers, so we need to convert the boolean values that we have into numbers (0 and 1). For fitting the model we will not need the dayStr property, so we’ll use only the remaining 3. Here’s how we can use the select method and the when expression from Polars to achieve what we want:
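A minimal sketch of that conversion could look like the following. It assumes the nodejs-polars module is passed in as pl, and that the three remaining columns are called weekday, isLastDay and runs; those column names and the function name are hypothetical, so adjust them to match your own data.

```javascript
// `pl` is the nodejs-polars module, passed in so the snippet does not depend
// on how your notebook imports it. Column names are hypothetical.
function toModelFrame(pl, learningData) {
  return pl.DataFrame(learningData).select(
    // Keep the numeric weekday as it is; dayStr is simply not selected.
    pl.col("weekday"),
    // Map each boolean column to 1 (true) or 0 (false).
    pl.when(pl.col("isLastDay")).then(pl.lit(1)).otherwise(pl.lit(0)).alias("isLastDay"),
    pl.when(pl.col("runs")).then(pl.lit(1)).otherwise(pl.lit(0)).alias("runs"),
  );
}

// Usage in a notebook cell, assuming `pl` has already been imported:
// const modelDf = toModelFrame(pl, learningData);
```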
If we display our newly created modelDf data frame, we’ll see the conversion was achieved smoothly:
Still, we cannot feed the data to the model as it is, because we want to separate the inputs from the output. Also, TensorFlow expects data in multidimensional arrays. We can use the toRecords function to convert the data frame to an array of records and then map each record to an input row and an output row:
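To illustrate the mapping step, here is a small sketch that starts from records shaped like the result of toRecords. The sample values and the column names weekday, isLastDay and runs are assumptions for illustration.

```javascript
// Records shaped like the result of modelDf.toRecords(); column names are hypothetical.
const records = [
  { weekday: 1, isLastDay: 0, runs: 1 },
  { weekday: 2, isLastDay: 0, runs: 0 },
  { weekday: 3, isLastDay: 1, runs: 0 },
];

// One row of two features per day: [weekday, isLastDay].
const inputs = records.map((r) => [r.weekday, r.isLastDay]);

// One row with a single label per day: [runs].
const outputs = records.map((r) => [r.runs]);

console.log(inputs, outputs);
```

Keeping the labels as one-element rows means both arrays are already 2D, which is the shape a 2D tensor expects.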
Now, at this stage, we are ready to instantiate the model and train it.
Instantiate and train the model
To use the TensorFlow library, we first need to import the package that matches our hardware:
- If you have a CUDA-enabled GPU: 'npm:@tensorflow/tfjs-node-gpu'
- Otherwise use the CPU version: 'npm:@tensorflow/tfjs-node'
After creating our main model variable, we’ll define the 3 composing dense layers of this model:
- Input layer: with 16 units, input shape of 2 and “relu” activation
- Learning (hidden) layer: with 16 units and “relu” activation
- Output layer: with 1 unit and “sigmoid” activation to enable probability output
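The three layers described above can be sketched as follows. The buildModel helper is my own naming, not something from the notebook; it takes the imported TensorFlow.js module as a parameter so the same code works with either the GPU or the CPU package.

```javascript
// `tf` is whichever TensorFlow.js build you imported (tfjs-node or tfjs-node-gpu).
// buildModel is a hypothetical helper name.
function buildModel(tf) {
  const model = tf.sequential();

  // Input layer: 16 units, 2 input features, relu activation.
  model.add(tf.layers.dense({ units: 16, inputShape: [2], activation: "relu" }));

  // Learning (hidden) layer: 16 units, relu activation.
  model.add(tf.layers.dense({ units: 16, activation: "relu" }));

  // Output layer: a single unit with sigmoid, so the output is a probability in [0, 1].
  model.add(tf.layers.dense({ units: 1, activation: "sigmoid" }));

  return model;
}

// Usage in a Deno notebook cell (import specifier taken from the list above):
// import * as tf from "npm:@tensorflow/tfjs-node";
// const model = buildModel(tf);
// model.summary();
```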
Super nice, we just have 3 more steps to complete. First, compile the model with the “binaryCrossentropy” loss function and the “adam” optimizer. Binary cross-entropy is the loss suited to a model that outputs a single probability for a two-class problem. Second, define the 2D tensors from the input and output arrays built in the previous stage.
Now, the last step of this stage is to fit (train) the model with our data. To start, we’ll use 100 epochs and a 30% validation split. Training should not take very long because the number of epochs is fairly low. The code for this step looks like this:
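Put together, the compile, tensor and fit steps might look like the sketch below. The trainModel helper and its parameter names are hypothetical, and the accuracy metric is my own addition for monitoring rather than something the article prescribes.

```javascript
// Hypothetical helper: `tf` is the imported TensorFlow.js module, `model` the
// sequential model, `inputs` rows of [weekday, isLastDay], `outputs` rows of [label].
async function trainModel(tf, model, inputs, outputs) {
  // Step 1: compile with the loss and optimizer named in the article.
  // The accuracy metric is an addition for monitoring only.
  model.compile({
    optimizer: "adam",
    loss: "binaryCrossentropy",
    metrics: ["accuracy"],
  });

  // Step 2: wrap the plain arrays in 2D tensors with explicit shapes.
  const xs = tf.tensor2d(inputs, [inputs.length, 2]);
  const ys = tf.tensor2d(outputs, [outputs.length, 1]);

  // Step 3: train for 100 epochs, holding back 30% of the rows for validation.
  return await model.fit(xs, ys, { epochs: 100, validationSplit: 0.3 });
}

// Usage in a notebook cell:
// const history = await trainModel(tf, model, inputs, outputs);
```

As far as I understand the TensorFlow.js documentation, the validation rows are taken from the end of the data before shuffling, so with chronologically ordered days the last weeks end up as the validation set.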
Test our lovely model and check the predictions
Before the magic happens, we need some data to predict on, supplied in the form of a 2D array. In each row, the first element represents the day of the week and the second represents whether it’s the last day of the month or not. Since every input in TensorFlow is a tensor, we create a 2D tensor from this array and pass it to the model’s predict function.
If we take a brief look at our predictions, we can notice they are not necessarily accurate, which means that the process has to be tweaked a bit. We can try changing the following:
- The number of epochs in the fit function
- The number of units in the input and learning layers
- The amount of training data, to increase the likelihood of better predictions (in our scenario more data can also drag the model down, since a lot of it repeats itself)
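A sketch of the prediction step, under the same assumptions as before: predictRuns is a hypothetical helper, and the sample rows in the usage comment are made up for illustration.

```javascript
// Hypothetical helper: returns one probability per [weekday, isLastDay] row.
async function predictRuns(tf, model, samples) {
  // Wrap the rows in a 2D tensor of shape [rows, 2].
  const input = tf.tensor2d(samples, [samples.length, 2]);

  // predict returns a tensor of shape [rows, 1] holding the sigmoid outputs.
  const prediction = model.predict(input);

  // data() resolves to a flat typed array; convert it to a plain array.
  const probabilities = Array.from(await prediction.data());

  // Free the memory held by the tensors once the values are copied out.
  input.dispose();
  prediction.dispose();

  return probabilities;
}

// Usage in a notebook cell (weekday 1 = Monday, second value 1 = last day of month):
// const probabilities = await predictRuns(tf, model, [[1, 0], [4, 0], [5, 1]]);
```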
Tweaking the model
I tweaked the model by changing the number of epochs to 3000 and by extending the training data back to January 2024, which gives more than 6 months of data. With these changes the predictions improved noticeably, except for the case when it’s Thursday and not the last day of the month. I also added a threshold to turn each probability into a clear 0 or 1 when displaying the result.
The final predictions still leave some room for improvement, but at this stage I’m very satisfied with the result. I will do some research on how to improve them further in due time.
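The thresholding idea can be sketched in a few lines. The toLabel name, the 0.5 cut-off and the sample probabilities are assumptions for illustration; the notebook may use a different value.

```javascript
// Turn a probability into a 0/1 label. The 0.5 default is an assumed cut-off.
function toLabel(probability, threshold = 0.5) {
  return probability >= threshold ? 1 : 0;
}

// Made-up probabilities standing in for the model's output.
const probabilities = [0.93, 0.41, 0.07];

// Use an arrow function so map's index argument is not mistaken for the threshold.
const labels = probabilities.map((p) => toLabel(p));

console.log(labels); // [ 1, 0, 0 ]
```

Raising the threshold makes the model more reluctant to answer 1, which can help if false positives are the bigger problem.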
Conclusion
Performing binary classifications with TensorFlow is really fun and adventurous. I do feel that you need to understand a lot of what’s happening under the hood to tweak the model more efficiently. As with any learning process, be it machine or human, we take slow steps to improve and achieve more. I hope you really enjoyed today’s lesson on binary classification with TensorFlow.js!
Thanks for reading, I hope you found this article useful and interesting. If you have any suggestions don’t hesitate to contact me. If you found my content useful please consider a small donation. Any support is greatly appreciated! Cheers  😉