The year 2022 was all about Artificial Intelligence (AI). AI found mainstream success and reached a wide audience through applications such as PhotoAI, Lensa, Dream by Wombo and, most notably, GPT-3 by OpenAI (fun fact: the T in GPT stands for Transformer!).
Many successful app creators have used OpenAI’s API to create their own spin on AI, and the fine-tuning possibilities have been numerous. But what if you could create your own model free of charge and put it online, the open-source way, for everyone to use? To explain how this idea became a reality, let us dive deeper into transformer models.
Into the Tech of It
Transformer models are a type of deep learning model based on attention mechanisms. They are most prominently used in natural language processing (NLP) but can also be used for other tasks, such as image generation, computer vision or text-to-speech, just to name a few. The original transformer uses an encoder-decoder architecture, which consists of an encoder that processes the input sequence and a decoder that generates an output sequence.
The encoder reads the input sequence and uses an attention mechanism to learn the context of each word. The decoder then uses this context to generate the output sequence. Many transformer models are trained on large numbers of GPUs, which is almost impossible for an individual user to arrange (unless said individual has NVIDIA as a sponsor, and unlimited electricity). This is why it is so important that these base models are shared!
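To make the attention step described above a little more concrete, here is a minimal sketch of scaled dot-product attention in plain Python. The tiny two-dimensional embeddings and the function names are made up for illustration. A real transformer also applies learned projection matrices and several attention heads, which are left out here.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    For every query, score each key, normalise the scores with softmax,
    and return the weighted average of the value vectors.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        mixed = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights


# Hypothetical toy embeddings, one 2-d vector per token.
tokens = ["the", "cat", "sat"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Self-attention: the same vectors act as queries, keys and values.
context, weights = attention(embeddings, embeddings, embeddings)
for token, row in zip(tokens, weights):
    print(token, [round(w, 2) for w in row])
```

Each printed row shows how strongly one token attends to every token in the sequence, and the matching row in `context` is the resulting context-aware vector for that token.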
There are two main families of transformer models: autoregressive models and autoencoding models. Autoregressive models guess the next token based on the previous ones and are used for text generation. GPT is an example. Autoencoding models are trained by corrupting input tokens and trying to reconstruct the original sentence, and are typically used for classification. BERT is an example.
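One way to picture the difference between the two families is through which positions each token is allowed to look at. The sketch below builds that visibility table for both cases. The helper name `visible_positions` is hypothetical and only serves this illustration.

```python
def visible_positions(seq_len, causal):
    """Return a table where row i marks the positions token i may attend to.

    1 means visible, 0 means hidden. With causal=True a token only sees
    itself and earlier tokens (autoregressive, GPT-style). With
    causal=False every token sees the whole sequence (autoencoding,
    BERT-style).
    """
    return [
        [1 if (not causal or j <= i) else 0 for j in range(seq_len)]
        for i in range(seq_len)
    ]


autoregressive_mask = visible_positions(4, causal=True)
autoencoding_mask = visible_positions(4, causal=False)

print("autoregressive:")
for row in autoregressive_mask:
    print(row)

print("autoencoding:")
for row in autoencoding_mask:
    print(row)
```

The triangular table is what lets an autoregressive model generate text left to right, while the full table is what gives an autoencoding model its view of the context on both sides.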
Importance of Open Source in AI
The proverb ‘Knowledge is Power’ remains true. However, together we are smarter! We can only speculate about what the future holds, and hopefully more institutions will share their knowledge. Because transformers are based on self-supervised learning, meaning they train on unlabeled data, they are not limited to language: they could also be applied to molecular structures, paving the way to discoveries of new materials or medicines. Additionally, there could be transformer models trained on the laws of each country, helping to make legal services more accessible, including to low-income families.
Another reason why it is so important to open this up to everyone is that it creates new use cases. Let me give an example with one of the most important models in use today: BERT. BERT is pre-trained in a self-supervised way on two tasks:
Masked Language Modeling (MLM): In this task, 15% of the words in the input are randomly masked, and the model has to predict the masked words. This is different from traditional recurrent neural networks (RNNs), which usually see the words one after the other, and from autoregressive models like GPT, which internally mask the future tokens. It allows the model to learn a bidirectional representation of the sentence.
Next Sentence Prediction (NSP): In this task, the model receives two masked sentences concatenated as a single input during pre-training. Sometimes the two sentences were next to each other in the original text, and sometimes they were not. The model has to predict whether the two sentences followed each other or not.
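The two pre-training tasks above can be sketched in a few lines of plain Python. This is a simplified illustration, not BERT's actual data pipeline: the function names are hypothetical, whitespace-separated words stand in for real tokens, and every selected word is replaced by `[MASK]` (BERT itself swaps a small share of the selected tokens for random words or leaves them unchanged). The NSP helper assumes at least three sentences so that a non-adjacent sentence is always available.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """MLM sketch: hide roughly mask_prob of the tokens.

    Returns the masked sequence plus labels, where a label is the
    original token at masked positions and None everywhere else.
    """
    if rng is None:
        rng = random.Random()
    masked, labels = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            masked.append(MASK)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


def make_nsp_pair(sentences, index, rng):
    """NSP sketch: pair sentence `index` with its real successor about
    half of the time, otherwise with some other, non-adjacent sentence.

    Returns (first, second, is_next). Assumes len(sentences) >= 3.
    """
    first = sentences[index]
    if index + 1 < len(sentences) and rng.random() < 0.5:
        return first, sentences[index + 1], True
    candidates = [
        s for i, s in enumerate(sentences) if i not in (index, index + 1)
    ]
    return first, rng.choice(candidates), False


# Toy data, invented for this example.
rng = random.Random(42)
words = "the quick brown fox jumps over the lazy dog".split()
masked_words, labels = mask_tokens(words, rng=rng)
print(masked_words)

document = [
    "I went to the shop.",
    "I bought some milk.",
    "The weather was nice.",
    "Then I walked home.",
]
print(make_nsp_pair(document, 0, rng))
```

During pre-training the model only sees `masked_words` and the sentence pair. The `labels` and the `is_next` flag come for free from the raw text, which is what makes the setup self-supervised.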
Although these tasks are interesting and far from trivial, let us now look at the real power of open source! These tasks are by no means the only things transformer models are used for today. Many variations have been uploaded to model-sharing platforms such as the Hugging Face Hub, covering hate-speech detection, sentiment analysis and many more. It could be compared to a LEGO set: you can build what is on the packaging, or some variation of it. That’s why the next section gives an example of how to fine-tune your own BERT model.
The next chapter will happen in a Jupyter Notebook!
Although I chose a hard dataset for this demo, the intention is that you play around with it and change certain variables, like the learning rate and the number of categories, to make it less overwhelming. You can also change the data and the model if you feel like it!