Understanding Transformers Part 12: Building the Decoder Layers

In the previous article, we introduced the concept of decoders in a transformer.

In this article, we start building the decoder's layers, beginning with positional encoding.

Adding Positional Encoding in the Decoder

The decoder's embeddings need positional encoding too.

Just like before, we use sine and cosine curves, one per embedding value, and read off each curve at the token's position.

These are the same curves that were used earlier when encoding the input.
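To make the curves concrete, here is a minimal sketch that assumes the standard sinusoidal formula, with a sine curve for even embedding indices and a cosine curve for odd ones. The function name `positional_encoding` and the base of 10000 are assumptions for illustration; the curves drawn in this series may be scaled differently, although any sine and cosine pair gives the same values at the first position.

```python
import math

def positional_encoding(position, d_model):
    """Return one positional value per embedding dimension (assumed formula)."""
    values = []
    for i in range(d_model):
        # Each sine/cosine pair shares one frequency
        freq = 1.0 / (10000 ** ((2 * (i // 2)) / d_model))
        angle = position * freq
        # Even index: sine curve, odd index: cosine curve
        values.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return values

# First position (index 0) with two embedding values, as for <EOS>
first_position = positional_encoding(0, 2)
print(first_position)  # [0.0, 1.0]
```

Because the function depends only on the position and the embedding size, the encoder and the decoder can share it.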

Applying Positional Values

Since the <EOS> token is in the first position and has two embedding values, we take the corresponding positional values from the curves.

For the first embedding value, the positional value is 0.
For the second embedding value, the positional value is 1.

Now, we add these positional values to the embedding values, one pair at a time.

As a result, we get 2.70 and -0.34, which represent the <EOS> token after adding positional encoding.
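The addition can be sketched in a few lines. The raw embedding values for <EOS> are not listed here, so the `eos_embedding` values below are an assumption, back-computed from the results 2.70 and -0.34 and the positional values 0 and 1.

```python
# Assumed embedding values for <EOS>, back-computed from the stated results
eos_embedding = [2.70, -1.34]
# Positional values for the first position, as given in the text
positional_values = [0.0, 1.0]

# Element-wise addition; rounding only hides floating-point noise
eos_encoded = [round(e + p, 2) for e, p in zip(eos_embedding, positional_values)]
print(eos_encoded)  # [2.7, -0.34]
```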

Adding Self-Attention

Next, we add the self-attention layer so the decoder can keep track of relationships between output words.

The self-attention values for the <EOS> token are -2.8 and -2.3.

Note that the weights used in the decoder’s self-attention (for queries, keys, and values) are different from those used in the encoder.
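As a rough sketch of what this layer computes, the code below runs scaled dot-product self-attention over the decoder's tokens. All weight matrices are hypothetical placeholders, not the trained weights behind -2.8 and -2.3, and the scaling by the square root of the key size is the common textbook form, assumed here. The point to notice is that with only <EOS> in the decoder, the softmax covers a single score, so the attention weight is 1 and the output is just the token's value vector.

```python
import math

def matvec(x, W):
    """Multiply a row vector x (length n) by a matrix W (n x m)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a list of token vectors."""
    queries = [matvec(t, W_q) for t in tokens]
    keys = [matvec(t, W_k) for t in tokens]
    values = [matvec(t, W_v) for t in tokens]
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors
        outputs.append([sum(w * v[j] for w, v in zip(weights, values))
                        for j in range(len(values[0]))])
    return outputs

# Hypothetical decoder weights, kept separate from any encoder weights
dec_W_q = [[1.0, 0.0], [0.0, 1.0]]
dec_W_k = [[0.5, 0.5], [-0.5, 0.5]]
dec_W_v = [[0.5, -1.0], [1.0, 0.25]]

eos_encoded = [2.70, -0.34]  # <EOS> after positional encoding
attention_out = self_attention([eos_encoded], dec_W_q, dec_W_k, dec_W_v)
```

Passing the decoder's own `dec_W_q`, `dec_W_k`, and `dec_W_v` into the function mirrors the note above: the same computation is reused, but with a different set of weights than the encoder.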

Adding Residual Connections

Now, we add residual connections, just like we did in the encoder: the position-encoded values for <EOS> are added to its self-attention values.
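Using the numbers from this article, the residual step can be sketched as a plain element-wise addition. The variable names are illustrative, and the printed result is computed here from the values above rather than quoted from the article.

```python
# <EOS> after positional encoding (from the section above)
eos_encoded = [2.70, -0.34]
# Self-attention values for <EOS> (from the section above)
eos_attention = [-2.8, -2.3]

# Residual connection: add the layer's input to the layer's output
eos_residual = [round(x + a, 2) for x, a in zip(eos_encoded, eos_attention)]
print(eos_residual)  # [-0.1, -2.64]
```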

What’s Next?

So far, we have seen how self-attention helps the transformer understand relationships within the output sentence.

However, for tasks like translation, the model also needs to understand relationships between the input sentence and the output sentence.

We will explore this in the next article.


Source

This article was originally published by DEV Community and written by Rijul Rajesh.