Implement Transformer with Self-Attention

Language models are continuously evolving, and a substantial amount of human effort goes into their development, as you will discover while completing this project. One of the most compelling aspects of the attention mechanism is its ability to learn and accurately preserve context in both the input and the output. The work of Vaswani et al., "Attention Is All You Need" (NeurIPS 2017), significantly advanced the state of the art in language modeling and showcased the capability of transformers well beyond translation. Interestingly, I have found it quite challenging to identify an area or task to which transformers are not applicable. Note also that BERT is an encoder-only model, while the GPT models are decoder-only.

In this project, we implement the English-to-German translation task from the paper [2]. Example source code is available at [4] and [6]: [4] is in PyTorch, while [6] is in TensorFlow; you may choose either. Alternatively, a clean PyTorch-only implementation of pre-training and evaluation is available through TorchTune.
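
Before diving into the reference code in [4] or [6], it may help to see how the overall encoder-decoder wiring looks. The sketch below uses PyTorch's built-in `nn.Transformer` module; the vocabulary sizes, sequence lengths, and class name are illustrative assumptions, and positional encodings are omitted for brevity (a real model must add them to both embeddings).

```python
import torch
import torch.nn as nn

# Hypothetical vocabulary sizes for the source (English) and target (German) tokenizers.
SRC_VOCAB, TGT_VOCAB, D_MODEL = 10_000, 10_000, 512

class TranslationModel(nn.Module):
    """Minimal encoder-decoder wrapper around nn.Transformer (sketch only)."""
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(SRC_VOCAB, D_MODEL)
        self.tgt_emb = nn.Embedding(TGT_VOCAB, D_MODEL)
        # nn.Transformer bundles the 6-layer encoder and decoder stacks from the paper.
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=6, num_decoder_layers=6,
            dim_feedforward=2048, dropout=0.1, batch_first=True,
        )
        self.generator = nn.Linear(D_MODEL, TGT_VOCAB)  # projects to target-vocab logits

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each target position attends only to earlier positions.
        tgt_mask = nn.Transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        out = self.transformer(self.src_emb(src_ids), self.tgt_emb(tgt_ids), tgt_mask=tgt_mask)
        return self.generator(out)

model = TranslationModel()
src = torch.randint(0, SRC_VOCAB, (4, 12))   # 4 source sentences, 12 tokens each
tgt = torch.randint(0, TGT_VOCAB, (4, 15))   # shifted target sentences, 15 tokens each
print(model(src, tgt).shape)                 # torch.Size([4, 15, 10000])
```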

The Crux: Attention Mechanism

A clear understanding of three building blocks helps in reading the paper and completing this work. First, $d_{\text{model}}$, the dimension of the embedding vectors. Second, scaled dot-product attention, $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(QK^{T}/\sqrt{d_k}\right)V$. Third, multi-head attention with per-head dimension $d_k = d_{\text{model}}/h$, where $h$ is the number of heads: we first project $Q$, $K$, and $V$ into $h$ heads, apply scaled dot-product attention in each head, and then concatenate the results before a final output projection, as in the sketch below.
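
To make these building blocks concrete, here is a minimal PyTorch sketch (not the reference implementation from [4]) of scaled dot-product attention and multi-head attention, using the base-model defaults $d_{\text{model}} = 512$ and $h = 8$ from the paper; all names are illustrative.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)      # (..., seq_q, seq_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=512, h=8):
        super().__init__()
        assert d_model % h == 0, "d_model must be divisible by the number of heads"
        self.d_k = d_model // h          # per-head dimension d_k = d_model / h
        self.h = h
        # Learned projections for Q, K, V and the final output.
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, q, k, v, mask=None):
        batch = q.size(0)
        # Project, then split d_model into h heads of size d_k.
        def split(x, proj):
            return proj(x).view(batch, -1, self.h, self.d_k).transpose(1, 2)
        q, k, v = split(q, self.w_q), split(k, self.w_k), split(v, self.w_v)
        out, _ = scaled_dot_product_attention(q, k, v, mask)
        # Concatenate the heads back into d_model and apply the output projection.
        out = out.transpose(1, 2).contiguous().view(batch, -1, self.h * self.d_k)
        return self.w_o(out)

# Quick shape check: batch of 2 sequences, 10 tokens, d_model = 512.
x = torch.randn(2, 10, 512)
mha = MultiHeadAttention(d_model=512, h=8)
print(mha(x, x, x).shape)   # torch.Size([2, 10, 512])
```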

Task

Table 1: German Sentences and their English Translations
| German | English |
| --- | --- |
| Guten Morgen! | Good morning! |
| Wie geht es dir? | How are you? |
| Ich bin hungrig. | I am hungry. |
| Entschuldigung, wo ist die Toilette? | Excuse me, where is the restroom? |
| Wie viel kostet das? | How much does that cost? |
| Ich spreche kein Deutsch. | I don't speak German. |
| Was ist dein Name? | What is your name? |
| Es tut mir leid. | I'm sorry. |
| Woher kommst du? | Where are you from? |
| Ich liebe dich. | I love you. |
| Wie spät ist es? | What time is it? |
| Kannst du mir helfen? | Can you help me? |
| Ich verstehe nicht. | I don't understand. |
| Auf Wiedersehen! | Goodbye! |
| Wo ist der Bahnhof? | Where is the train station? |
| Ich habe eine Frage. | I have a question. |
| Wie alt bist du? | How old are you? |
| Ich bin müde. | I am tired. |
| Was machst du gerne in deiner Freizeit? | What do you like to do in your free time? |
| Was ist das? | What is that? |
| Mein Name ist John. | My name is John. |
| Wie heißt das auf Deutsch/Englisch? | What is that called in German/English? |
| Ich bin beschäftigt. | I am busy. |
| Wie war dein Tag? | How was your day? |
| Ich habe Hunger. | I am hungry. |

References

  1. Dataset
  2. Transformer paper: Vaswani et al., "Attention Is All You Need", NeurIPS 2017
  3. PyTorch transformer code
  4. Example source code for the assignment (PyTorch)
  5. Streamlit application to build a chat interface like ChatGPT
  6. Tensor2Tensor (T2T), a library of deep learning models (TensorFlow)
  7. Tokenizer implementations