Implement Transformer with Self-Attention
- Introduction
- Task
Introduction
Over the past few decades, significant efforts have been made to understand language. Language, as a primary means of human communication, reflects our culture and evolution and facilitates connections between individuals. Linguistics formally studies the fundamental structure of languages and investigates whether any common ground exists across different languages. It is remarkable to witness a computer algorithm learning the intricacies of various languages and comprehending the context of a sentence. However, it is important to recognize that this portrayal of an algorithm's capabilities may be somewhat exaggerated.
Language models are continuously evolving, and a substantial amount of human effort goes into their development, as you will discover while completing this project. One of the most compelling aspects of the attention mechanism is its ability to learn and accurately preserve context in both the input and the output. The work of Vaswani et al., Attention Is All You Need, NeurIPS 2017, significantly advanced the state of the art in language modeling, and it showcased the ability of transformers well beyond translation; indeed, it has been quite challenging for me to find an area or task to which transformers are not applicable. Note also that BERT is an encoder-only model, while the GPT models are decoder-only.
In this project, we implement the English-to-German translation task from the paper [2]. Reference source code is available at [4] and [6]: [4] is in PyTorch, while [6] is in TensorFlow; you may choose either. Alternatively, a clean PyTorch-only implementation of pre-training and evaluation is available via TorchTune.
The Crux: Attention Mechanism
A clear understanding of three building blocks helps in reading the paper and completing this work. First, $d_{\text{model}}$ is the dimension of the embedding vectors. Second, the attention used in the paper is scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(QK^\top/\sqrt{d_k}\right)V$, with per-head dimension $d_k = d_{\text{model}}/h$, where $h$ is the number of attention heads. Third, multi-head attention first applies learned linear projections to the $Q$, $K$, and $V$ matrices, computes scaled dot-product attention in each head, and concatenates the results before a final output projection.
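As a minimal PyTorch sketch of these three building blocks (the class and variable names are my own, not taken from the reference implementations):

```python
import math
import torch
import torch.nn as nn

class MultiHeadAttention(nn.Module):
    """Multi-head scaled dot-product attention (Vaswani et al., 2017)."""

    def __init__(self, d_model: int = 512, h: int = 8):
        super().__init__()
        assert d_model % h == 0, "d_model must be divisible by h"
        self.d_k = d_model // h  # per-head dimension d_k = d_model / h
        self.h = h
        # Learned projections for queries, keys, values, and the output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        batch, seq_q, _ = q.shape
        seq_k = k.shape[1]
        # Project, then split d_model into h heads of width d_k.
        q = self.w_q(q).view(batch, seq_q, self.h, self.d_k).transpose(1, 2)
        k = self.w_k(k).view(batch, seq_k, self.h, self.d_k).transpose(1, 2)
        v = self.w_v(v).view(batch, seq_k, self.h, self.d_k).transpose(1, 2)
        # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))
        out = scores.softmax(dim=-1) @ v
        # Concatenate the heads and apply the output projection.
        out = out.transpose(1, 2).contiguous().view(batch, seq_q, -1)
        return self.w_o(out)
```

Note how $d_k = d_{\text{model}}/h$ falls out of the reshape: the projections keep the total width at $d_{\text{model}}$, and the concatenation at the end restores it, so `MultiHeadAttention(512, 8)` maps a `(batch, seq, 512)` tensor to one of the same shape.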
Task
- Implement the English-to-German translation task.
- Get the dataset from Hugging Face. (A data-loading sketch is given after Table 1.)
- Since the dataset is large (4.5 million entries), use just the first 80,000 translations. Feel free to use more rows if you have access to the appropriate computing resources; finishing the task with more entries may require significantly more compute.
- Reserve 0.1% of the 80,000 entries, i.e., 80 entries, for testing.
- Use either byte-pair encoding or word-level tokenization; feel free to try other schemes as well. (A tokenizer-training sketch is given after Table 1.)
- Use exactly the parameters described in the paper, except for the number of entries in the dataset. (This is to make sure the code runs smoothly; see the configuration sketch after Table 1.)
- Run for 100 epochs; the rest of the architecture from the paper must remain the same.
- To save disk space, save the trained model only once, after all epochs have been completed.
- Use the attached validation.py file to run validations on the final trained model. The validation.py file contains two small translation datasets (Table 1 displays some of the entries in the validation set). Print a table with the predicted translation produced by your implementation and the correct translation for both datasets.
Table 1: Sample entries from the validation set.

| German | English |
|---|---|
| Guten Morgen! | Good morning! |
| Wie geht es dir? | How are you? |
| Ich bin hungrig. | I am hungry. |
| Entschuldigung, wo ist die Toilette? | Excuse me, where is the restroom? |
| Wie viel kostet das? | How much does that cost? |
| Ich spreche kein Deutsch. | I don't speak German. |
| Was ist dein Name? | What is your name? |
| Es tut mir leid. | I'm sorry. |
| Woher kommst du? | Where are you from? |
| Ich liebe dich. | I love you. |
| Wie spät ist es? | What time is it? |
| Kannst du mir helfen? | Can you help me? |
| Ich verstehe nicht. | I don't understand. |
| Auf Wiedersehen! | Goodbye! |
| Wo ist der Bahnhof? | Where is the train station? |
| Ich habe eine Frage. | I have a question. |
| Wie alt bist du? | How old are you? |
| Ich bin müde. | I am tired. |
| Was machst du gerne in deiner Freizeit? | What do you like to do in your free time? |
| Was ist das? | What is that? |
| Mein Name ist John. | My name is John. |
| Wie heißt das auf Deutsch/Englisch? | What is that called in German/English? |
| Ich bin beschäftigt. | I am busy. |
| Wie war dein Tag? | How was your day? |
| Ich habe Hunger. | I am hungry. |
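As promised in the task list, here is a minimal data-loading sketch. It assumes the dataset in [1] is the WMT14 German-English corpus on Hugging Face (its roughly 4.5 million sentence pairs match the figure above); swap in the actual dataset name from [1] if it differs.

```python
from datasets import load_dataset

# Assumption: reference [1] is the WMT14 German-English corpus
# (~4.5M sentence pairs). Replace the name if the assignment points elsewhere.
dataset = load_dataset("wmt14", "de-en", split="train")

# Keep only the first 80,000 translation pairs.
subset = dataset.select(range(80_000))

# Reserve 0.1% (80 entries) for testing.
split = subset.train_test_split(test_size=0.001, shuffle=False)
train_data, test_data = split["train"], split["test"]
print(len(train_data), len(test_data))  # 79920 80

# (English, German) pairs for tokenizer training below.
pairs = [(ex["translation"]["en"], ex["translation"]["de"]) for ex in train_data]
```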
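For the tokenization step, one hedged option is to train a byte-pair-encoding tokenizer with the Hugging Face tokenizers library; the vocabulary size of 37,000 mirrors the shared source-target BPE vocabulary reported in the paper, and the special-token names are my own convention.

```python
from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.pre_tokenizers import Whitespace
from tokenizers.trainers import BpeTrainer

# A single BPE vocabulary shared by source and target, as in the paper.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    vocab_size=37_000,
    special_tokens=["[UNK]", "[PAD]", "[BOS]", "[EOS]"],  # my own convention
)

def corpus():
    # Train on both languages so source and target share one vocabulary.
    for en, de in pairs:  # `pairs` comes from the data-loading sketch above
        yield en
        yield de

tokenizer.train_from_iterator(corpus(), trainer=trainer)
print(tokenizer.encode("Guten Morgen!").ids)
```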
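Finally, for "exactly the parameters described in the paper," the base-model hyperparameters from Vaswani et al. are collected below in a plain configuration dict, along with a save helper that writes the model to disk only once, after the final epoch. The dict keys and file name are illustrative, not prescribed by the assignment.

```python
import torch

# Base-model hyperparameters from "Attention Is All You Need"
# (Table 3, "base" row, and Section 5.3 of the paper).
config = {
    "num_layers": 6,          # N: identical layers in encoder and decoder
    "d_model": 512,           # embedding / model dimension
    "d_ff": 2048,             # inner dimension of the feed-forward sublayer
    "num_heads": 8,           # h, so d_k = d_v = d_model / h = 64
    "dropout": 0.1,           # P_drop
    "label_smoothing": 0.1,   # epsilon_ls
    "adam_betas": (0.9, 0.98),
    "adam_eps": 1e-9,
    "warmup_steps": 4000,     # for the paper's learning-rate schedule
}

def save_final_model(model: torch.nn.Module,
                     path: str = "transformer_en_de.pt") -> None:
    """Save the model once, after the 100th epoch, to keep disk usage down."""
    torch.save(model.state_dict(), path)
```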
References
1. Dataset
2. Transformer paper: Vaswani et al., "Attention Is All You Need," NeurIPS 2017
3. PyTorch transformer code
4. Example source code for the assignment (PyTorch)
5. Streamlit application for building a ChatGPT-like chat interface
6. Tensor2Tensor (T2T for short), a library of deep learning models (TensorFlow)
7. Tokenizer implementations