Implement Transformer with Self-Attention

- Introduction
- Task

Introduction

Over the past few decades, significant efforts have been made to understand language. Language, as a primary means of human communication, reflects our culture and evolution and facilitates connections between individuals. Linguistics formally studies the fundamental structure of languages and investigates whether common ground exists across different languages. It is remarkable to witness a computer algorithm learning the intricacies of various languages and comprehending the context of a sentence. However, it is important to recognize that this portrayal of an algorithm’s capabilities may be somewhat exaggerated.

Language models are continuously evolving, and a substantial amount of human effort goes into their development, as you will discover while completing this project. One of the most compelling aspects of the attention mechanism is its ability to learn and preserve context in both the input and the output. The work of Vaswani et al., "Attention Is All You Need" (NeurIPS 2017), significantly advanced the state of the art in language modeling and showed that transformers are useful for more than translation. Indeed, it has been quite challenging for me to find an area or task to which transformers are not applicable. Note also that BERT is an encoder-only model, while GPT models are decoder-only.
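To make the attention mechanism concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is not the implementation from the paper or from the referenced repositories: for brevity it assumes a single head and uses the input vectors directly as queries, keys, and values (no learned projections). The function name `self_attention`, the `causal` flag, and the toy `tokens` are illustrative choices. The `causal` flag shows the masking that distinguishes decoder-style attention from encoder-style attention.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, causal=False):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption for brevity: queries, keys, and values are the inputs
    themselves (single head, no learned projection matrices).
    """
    d_k = len(x[0])
    out = []
    for i, q in enumerate(x):
        # Score query i against every key, scaled by sqrt(d_k).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in x]
        if causal:
            # Decoder-style mask: position i may not attend to positions j > i.
            scores = [s if j <= i else float("-inf") for j, s in enumerate(scores)]
        weights = softmax(scores)
        # Output i is the attention-weighted average of the value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d_k)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d embeddings
full = self_attention(tokens)                  # encoder-style: every token sees all tokens
masked = self_attention(tokens, causal=True)   # decoder-style: tokens see only the past
```

With the causal mask, the first position can attend only to itself, so its output equals its input, while the last position sees the whole sequence and therefore matches the unmasked result.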

In this project, we implement the English-to-German translation task from the Transformer paper (reference 2). Source code is available in references 4 and 6: reference 4 uses PyTorch, while reference 6 uses TensorFlow. You may choose either. Alternatively, a clean, PyTorch-only implementation of pre-training and evaluation is available using TorchTune.

Task

Table 1: German Sentences and their English Translations

German                                  | English
Guten Morgen!                           | Good morning!
Wie geht es dir?                        | How are you?
Ich bin hungrig.                        | I am hungry.
Entschuldigung, wo ist die Toilette?    | Excuse me, where is the restroom?
Wie viel kostet das?                    | How much does that cost?
Ich spreche kein Deutsch.               | I don't speak German.
Was ist dein Name?                      | What is your name?
Es tut mir leid.                        | I'm sorry.
Woher kommst du?                        | Where are you from?
Ich liebe dich.                         | I love you.
Wie spät ist es?                        | What time is it?
Kannst du mir helfen?                   | Can you help me?
Ich verstehe nicht.                     | I don't understand.
Auf Wiedersehen!                        | Goodbye!
Wo ist der Bahnhof?                     | Where is the train station?
Ich habe eine Frage.                    | I have a question.
Wie alt bist du?                        | How old are you?
Ich bin müde.                           | I am tired.
Was machst du gerne in deiner Freizeit? | What do you like to do in your free time?
Was ist das?                            | What is that?
Mein Name ist John.                     | My name is John.
Wie heißt das auf Deutsch/Englisch?     | What is that called in German/English?
Ich bin beschäftigt.                    | I am busy.
Wie war dein Tag?                       | How was your day?
Ich habe Hunger.                        | I am hungry.
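Sentence pairs like those in Table 1 have to be turned into integer sequences before a transformer can consume them. The following is a rough sketch of that step, not the preprocessing used by the referenced code: the regex tokenizer is a stand-in for a real subword tokenizer, and the special tokens, the helper names (`tokenize`, `build_vocab`, `encode`), and `max_len=8` are all assumptions made for illustration. English is treated as the source and German as the target, matching the translation direction of the project.

```python
import re

# Hypothetical special tokens; real tokenizers define their own.
PAD, BOS, EOS, UNK = "<pad>", "<bos>", "<eos>", "<unk>"

def tokenize(sentence):
    """Lowercase, then split into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())

def build_vocab(sentences):
    """Map each token to an integer id; ids 0-3 are reserved for special tokens."""
    vocab = {tok: i for i, tok in enumerate([PAD, BOS, EOS, UNK])}
    for s in sentences:
        for tok in tokenize(s):
            vocab.setdefault(tok, len(vocab))
    return vocab

def encode(sentence, vocab, max_len):
    """Wrap in BOS/EOS, map unknown tokens to UNK, then pad to max_len.

    Note: sequences longer than max_len are simply truncated here,
    which can drop the EOS marker.
    """
    ids = [vocab[BOS]] + [vocab.get(t, vocab[UNK]) for t in tokenize(sentence)] + [vocab[EOS]]
    ids = ids[:max_len]
    return ids + [vocab[PAD]] * (max_len - len(ids))

# Two (German, English) pairs taken from Table 1.
pairs = [
    ("Guten Morgen!", "Good morning!"),
    ("Ich bin müde.", "I am tired."),
]
src_vocab = build_vocab(en for _, en in pairs)  # English source vocabulary
tgt_vocab = build_vocab(de for de, _ in pairs)  # German target vocabulary
src_ids = [encode(en, src_vocab, max_len=8) for _, en in pairs]
tgt_ids = [encode(de, tgt_vocab, max_len=8) for de, _ in pairs]
```

Every encoded sequence has the same length, so a batch can be stacked into a rectangular tensor, and the padding id can later be used to mask padded positions in attention.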

References

  1. Dataset
  2. Transformer paper
  3. PyTorch transformer code
  4. Example source code for the assignment
  5. Streamlit application to build a chat interface like ChatGPT
  6. Tensor2Tensor (T2T), a library of deep learning models
  7. Tokenizer implementations