I’ll admit it: my mind was blown (again, and that is roughly a daily occurrence in the lightspeed world of AI design & development) when I heard a professor proclaim: “I’m not sure if you got this vibe: that this is something incredibly amazing… because it’s a generic system… it’s just 4 lines of equations… and you give it data… and it learns to translate… it learns to speak fluent French…” (link: AI masterclass video: Łukasz Kaiser). So he claimed that the Transformer source code, at its heart, was a mere four lines of equations. Well, then.
If that were so, I should be able to take a look at those equations with my own two eyes. So I downloaded the seminal paper, “Attention Is All You Need” (Vaswani, Shazeer, et al., 2017), and scanned it. Urgh. A little above my pay grade. So, next up, I did what is, more and more, becoming second nature and well on its way to replacing my Google reflex: I asked ChatGPT.
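For the record, the candidate “four lines” are sitting right there in the paper. Reconstructed here in LaTeX, in the paper’s own notation (Q, K, V are the query/key/value matrices, d_k and d_model the dimensions, the Ws learned projection matrices):

\[ \mathrm{Attention}(Q,K,V)=\mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V \]
\[ \mathrm{MultiHead}(Q,K,V)=\mathrm{Concat}(\mathrm{head}_1,\dots,\mathrm{head}_h)W^{O},\qquad \mathrm{head}_i=\mathrm{Attention}(QW_i^{Q},\,KW_i^{K},\,VW_i^{V}) \]
\[ \mathrm{FFN}(x)=\max(0,\;xW_1+b_1)W_2+b_2 \]
\[ PE_{(pos,2i)}=\sin\!\left(pos/10000^{2i/d_{\mathrm{model}}}\right),\qquad PE_{(pos,2i+1)}=\cos\!\left(pos/10000^{2i/d_{\mathrm{model}}}\right) \]

Which four he had in mind is my guess, but these are the core formulas of the paper, and everything in the code below is essentially scaffolding around them.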
GR:
What are the four key algorithms that comprise a modern LLM
based on the paper “Attention is all you need”?
The four key algorithms that comprise a modern LLM based on the paper “Attention is all you need” are:
- Self-attention mechanism
- Multi-head attention mechanism
- Position-wise feed-forward network
- Layer normalization
Transformer Source Code: Structure
- The Scaled Dot-Product Attention — 14 lines (8 actual)
- Multi-head Attention — 50 lines (29 actual)
- Positional Encoding — 24 lines (14 actual)
- The Encoder
  - layer — 42 lines (24 actual)
  - component — 37 lines (17 actual)
- The Decoder
  - layer — 53 lines (30 actual)
  - component — 41 lines (17 actual)
- The Transformer — 62 lines (21 actual)
- Training
  - Custom Loss Function (mask padding tokens) — 8 lines (6 actual)
  - Adam Optimizer Schedule (variable learning rate) — 13 lines (9 actual)
  - Main Train Function — 42 lines (26 actual)
  - Perform the Training on the Dataset — 37 lines (13 actual)
- The actual runtime Program
  - Predict & Translate — 36 lines (19 actual)
TOTAL CODEBASE — 460 lines (233 actual lines of code)
- lines of code: the simple totals in the supplied codebase.
- actual lines: a truer measure, obtained by stripping out blank lines & comments.
Transformer Source Code: 233 Lines that Define AI Reality
Defining the Key Components of the Neural Net Creation Engine
1. The Scaled Dot-Product Attention — 14 lines (8 actual)
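Rather than paste the repository’s code verbatim, here is a minimal NumPy sketch of what those 8 lines do: the softmax(QKᵀ/√d_k)·V equation above. The function and variable names are mine, and the mask convention (1 marks a position to block, e.g. padding) is an assumption for illustration.

import numpy as np

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled to keep the softmax stable.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    if mask is not None:
        # Positions marked 1 in the mask are pushed towards -infinity (e.g. padding).
        scores = np.where(mask == 1, -1e9, scores)
    # Softmax over the key dimension gives the attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the values.
    return weights @ v, weights

# Toy usage: one sentence of 4 tokens, d_k = 8
q = k = v = np.random.rand(4, 8)
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)   # (4, 8) (4, 4)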
2. Multi-head Attention — 50 lines (29 actual)
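Again, a sketch rather than the repository’s 50 lines: multi-head attention runs the same attention h times in parallel on learned projections of Q, K and V, then concatenates the results and projects once more. The class name and the random W matrices below are mine; in a trained model the Ws are learned parameters.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MultiHeadAttention:
    def __init__(self, d_model, num_heads, rng=np.random.default_rng(0)):
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.depth = d_model // num_heads
        # In a trained model these four projections are learned parameters.
        self.wq, self.wk, self.wv, self.wo = (rng.standard_normal((d_model, d_model)) * 0.02
                                              for _ in range(4))

    def split_heads(self, x):
        seq_len, d_model = x.shape
        # (seq_len, d_model) -> (num_heads, seq_len, depth)
        return x.reshape(seq_len, self.num_heads, self.depth).transpose(1, 0, 2)

    def __call__(self, q, k, v):
        q, k, v = q @ self.wq, k @ self.wk, v @ self.wv
        q, k, v = self.split_heads(q), self.split_heads(k), self.split_heads(v)
        # Scaled dot-product attention, computed per head.
        scores = q @ k.transpose(0, 2, 1) / np.sqrt(self.depth)
        heads = softmax(scores) @ v                        # (num_heads, seq_len, depth)
        concat = heads.transpose(1, 0, 2).reshape(-1, self.num_heads * self.depth)
        return concat @ self.wo                            # final linear projection

mha = MultiHeadAttention(d_model=128, num_heads=8)
x = np.random.rand(10, 128)        # 10 tokens, d_model = 128
print(mha(x, x, x).shape)          # (10, 128)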
3. Positional Encoding — 24 lines (14 actual)
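The positional encoding is the sin/cos recipe from the equations above: every position gets a deterministic vector so the model can tell word order apart, because attention by itself is order-blind. A minimal NumPy version (function name mine):

import numpy as np

def positional_encoding(max_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(...)."""
    pos = np.arange(max_len)[:, None]                  # (max_len, 1)
    i = np.arange(d_model)[None, :]                    # (1, d_model)
    angle_rates = 1.0 / np.power(10000, (2 * (i // 2)) / d_model)
    angles = pos * angle_rates                         # (max_len, d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])              # even indices: sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])              # odd indices: cosine
    return pe

pe = positional_encoding(max_len=50, d_model=128)
print(pe.shape)    # (50, 128), added to the token embeddings before the first layer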
4a. The Encoder Layer — 42 lines (24 actual)
4b. The Encoder Component — 37 lines (17 actual)
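4a and 4b together: an encoder layer is multi-head self-attention plus the position-wise feed-forward network, each wrapped in a residual connection and layer normalization, and the encoder component simply stacks N of those layers (with token embedding and positional encoding applied before the first one). The repository builds these pieces itself in its ~42 and ~37 lines; the sketch below leans on tf.keras built-ins instead, so treat it as an illustration of the structure, not the repository’s code.

import tensorflow as tf

class EncoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, dff, dropout_rate=0.1):
        super().__init__()
        self.mha = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                                      key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation="relu"),   # position-wise feed-forward
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.dropout = tf.keras.layers.Dropout(dropout_rate)

    def call(self, x, training=False):
        # Self-attention, then residual connection + layer norm.
        attn = self.mha(query=x, value=x, key=x)
        x = self.norm1(x + self.dropout(attn, training=training))
        # Feed-forward, then residual connection + layer norm.
        ffn_out = self.ffn(x)
        return self.norm2(x + self.dropout(ffn_out, training=training))

class Encoder(tf.keras.layers.Layer):
    """The encoder component: N identical encoder layers applied in sequence."""
    def __init__(self, num_layers, d_model, num_heads, dff):
        super().__init__()
        self.enc_layers = [EncoderLayer(d_model, num_heads, dff) for _ in range(num_layers)]

    def call(self, x, training=False):
        for layer in self.enc_layers:
            x = layer(x, training=training)
        return x

x = tf.random.uniform((2, 10, 128))   # batch of 2, 10 tokens, d_model = 128
print(Encoder(num_layers=4, d_model=128, num_heads=8, dff=512)(x).shape)   # (2, 10, 128)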
5. The Decoder Layer — 53 lines (30 actual)
6. The Decoder Component — 41 lines (17 actual)
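The decoder layer adds two twists: its self-attention is masked so each position can only look at earlier positions, and a second attention block attends over the encoder’s output. Same caveat as above: a Keras-built sketch of the structure, not the repository’s code (use_causal_mask needs a reasonably recent TensorFlow; dropout is omitted for brevity).

import tensorflow as tf

class DecoderLayer(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads, dff):
        super().__init__()
        self.self_attn = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                                            key_dim=d_model // num_heads)
        self.cross_attn = tf.keras.layers.MultiHeadAttention(num_heads=num_heads,
                                                             key_dim=d_model // num_heads)
        self.ffn = tf.keras.Sequential([
            tf.keras.layers.Dense(dff, activation="relu"),
            tf.keras.layers.Dense(d_model),
        ])
        self.norm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.norm3 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

    def call(self, x, enc_output, training=False):
        # 1. Masked self-attention: a target position may only see earlier positions.
        attn1 = self.self_attn(query=x, value=x, key=x, use_causal_mask=True)
        x = self.norm1(x + attn1)
        # 2. Cross-attention over the encoder output (the source sentence).
        attn2 = self.cross_attn(query=x, value=enc_output, key=enc_output)
        x = self.norm2(x + attn2)
        # 3. Position-wise feed-forward.
        return self.norm3(x + self.ffn(x))

class Decoder(tf.keras.layers.Layer):
    """The decoder component: N identical decoder layers."""
    def __init__(self, num_layers, d_model, num_heads, dff):
        super().__init__()
        self.dec_layers = [DecoderLayer(d_model, num_heads, dff) for _ in range(num_layers)]

    def call(self, x, enc_output, training=False):
        for layer in self.dec_layers:
            x = layer(x, enc_output, training=training)
        return x

enc_out = tf.random.uniform((2, 12, 128))   # encoder output: 12 source tokens
tgt = tf.random.uniform((2, 10, 128))       # 10 target tokens so far
print(Decoder(num_layers=4, d_model=128, num_heads=8, dff=512)(tgt, enc_out).shape)   # (2, 10, 128)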
7. The Transformer — 62 lines (21 actual)
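The top-level Transformer is then mostly wiring: embed the source and target tokens, add positional encodings, run the encoder, run the decoder over the encoder output, and project to vocabulary-sized logits. A skeletal sketch; it assumes the Encoder and Decoder classes and the positional_encoding helper sketched above are in scope, and all the hyperparameter names are mine.

import tensorflow as tf

class Transformer(tf.keras.Model):
    def __init__(self, num_layers, d_model, num_heads, dff, src_vocab, tgt_vocab, max_len=200):
        super().__init__()
        self.src_embed = tf.keras.layers.Embedding(src_vocab, d_model)
        self.tgt_embed = tf.keras.layers.Embedding(tgt_vocab, d_model)
        self.pos_enc = tf.constant(positional_encoding(max_len, d_model), dtype=tf.float32)
        self.encoder = Encoder(num_layers, d_model, num_heads, dff)
        self.decoder = Decoder(num_layers, d_model, num_heads, dff)
        self.final_layer = tf.keras.layers.Dense(tgt_vocab)   # logits over the target vocabulary

    def call(self, inputs, training=False):
        src, tgt = inputs
        src_len, tgt_len = tf.shape(src)[1], tf.shape(tgt)[1]
        # Embeddings + positional encoding on both sides.
        enc_in = self.src_embed(src) + self.pos_enc[:src_len]
        dec_in = self.tgt_embed(tgt) + self.pos_enc[:tgt_len]
        enc_out = self.encoder(enc_in, training=training)
        dec_out = self.decoder(dec_in, enc_out, training=training)
        return self.final_layer(dec_out)                       # (batch, tgt_len, tgt_vocab)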
Training the Neural Net
8. Custom Loss Function (mask padding tokens) — 8 lines (6 actual)
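The custom loss is a small but important detail: sentences in a batch are padded to the same length, and the loss must not reward or punish the model for what it predicts at padding positions. A sketch of the usual masked cross-entropy, assuming padding token id 0:

import tensorflow as tf

loss_object = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True, reduction="none")

def loss_function(real, pred):
    """Cross-entropy averaged over the non-padding target tokens only."""
    mask = tf.cast(tf.not_equal(real, 0), pred.dtype)   # 1 where there is a real token, 0 at padding
    loss = loss_object(real, pred) * mask                # zero out the padded positions
    return tf.reduce_sum(loss) / tf.reduce_sum(mask)     # mean over real tokens only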
9. Adam Optimizer Schedule (variable learning rate) — 13 lines (9 actual)
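The “variable learning rate” is the paper’s warm-up schedule: lrate = d_model^-0.5 · min(step^-0.5, step · warmup_steps^-1.5), i.e. ramp up linearly for the first warmup_steps, then decay with the inverse square root of the step. A sketch as a Keras LearningRateSchedule, with the Adam settings from the paper:

import tensorflow as tf

class TransformerSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        # Linear warm-up, then inverse-square-root decay.
        return tf.math.rsqrt(self.d_model) * tf.minimum(
            tf.math.rsqrt(step), step * self.warmup_steps ** -1.5)

optimizer = tf.keras.optimizers.Adam(TransformerSchedule(d_model=128),
                                     beta_1=0.9, beta_2=0.98, epsilon=1e-9)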
10. Main Train Function — 42 lines (26 actual)
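The main train function boils down to one gradient step: feed the decoder the target sentence shifted right, compare its predictions against the target shifted left, apply the masked loss, and update the weights. A sketch of that step, assuming a transformer model, an optimizer and a loss_function like the ones above:

import tensorflow as tf

@tf.function
def train_step(transformer, optimizer, src, tgt):
    tgt_in = tgt[:, :-1]       # decoder input:   <start> w1 w2 ...
    tgt_real = tgt[:, 1:]      # expected output: w1 w2 ... <end>
    with tf.GradientTape() as tape:
        predictions = transformer((src, tgt_in), training=True)
        loss = loss_function(tgt_real, predictions)
    gradients = tape.gradient(loss, transformer.trainable_variables)
    optimizer.apply_gradients(zip(gradients, transformer.trainable_variables))
    return loss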
11. Perform the Training on the Dataset — 37 lines (13 actual)
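And “perform the training” is then just a loop over a batched tf.data pipeline for a few epochs, calling the step above. Roughly as follows; src_tensor and tgt_tensor are placeholders for the integer-encoded, padded sentence pairs, and the exact batching details depend on the tokenised corpus.

import tensorflow as tf

# src_tensor / tgt_tensor: integer-encoded, padded sentence pairs (placeholders).
dataset = (tf.data.Dataset.from_tensor_slices((src_tensor, tgt_tensor))
           .shuffle(20000)
           .batch(64, drop_remainder=True))

EPOCHS = 10
for epoch in range(EPOCHS):
    total_loss = 0.0
    for batch, (src, tgt) in enumerate(dataset):
        total_loss += train_step(transformer, optimizer, src, tgt)
    print(f"Epoch {epoch + 1}: loss {total_loss / (batch + 1):.4f}")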
The actual Program:
12. Predict & Translate — 36 lines (13 + 6 actual)
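Prediction is greedy decoding: encode the input sentence once, then feed the decoder its own output one token at a time until it emits the end-of-sentence token. A sketch, assuming hypothetical tokenizer_src / tokenizer_tgt objects with encode/decode methods and sos_id/eos_id attributes; the repository’s actual tokenisation details differ.

import tensorflow as tf

def translate(sentence, transformer, tokenizer_src, tokenizer_tgt, max_len=50):
    # Encode the source sentence and add a batch dimension.
    src = tf.constant([tokenizer_src.encode(sentence)], dtype=tf.int64)
    # Start the target sequence with the start-of-sentence token.
    output = tf.constant([[tokenizer_tgt.sos_id]], dtype=tf.int64)
    for _ in range(max_len):
        logits = transformer((src, output), training=False)    # (1, cur_len, vocab)
        next_id = tf.argmax(logits[:, -1, :], axis=-1)          # most likely next token
        output = tf.concat([output, next_id[:, None]], axis=-1)
        if int(next_id[0]) == tokenizer_tgt.eos_id:              # stop at end-of-sentence
            break
    return tokenizer_tgt.decode(output[0, 1:].numpy().tolist())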
Transformer Code Repository:
https://github.com/edumunozsala/Transformer-NMT
Transformer Source Code: Example I/O
Finally, wouldn’t you like to see this code running, see it in live action? Me too. Here are a few example results:
#Show some translations
sentence = "you should pay for it."
print("Input sentence: {}".format(sentence))
predicted_sentence = translate(sentence)
print("Output sentence: {}".format(predicted_sentence))

Input sentence: you should pay for it.
Output sentence: Deberías pagar por ello.

#Show some translations
sentence = "we have no extra money."
print("Input sentence: {}".format(sentence))
predicted_sentence = translate(sentence)
print("Output sentence: {}".format(predicted_sentence))

Input sentence: we have no extra money.
Output sentence: No tenemos dinero extra.

#Show some translations
sentence = "This is a problem to deal with."
print("Input sentence: {}".format(sentence))
predicted_sentence = translate(sentence)
print("Output sentence: {}".format(predicted_sentence))

Input sentence: This is a problem to deal with.
Output sentence: Este problema es un problema con eso.
So there it is. The codebase that changed a civilisation. Now, go enjoy your donuts.