Understanding Transformers Part 1: How Transformers Understand Word Order

Source: DEV Community
In this article, we will explore transformers. We will work on the same problem as before: translating a simple English sentence into Spanish using a transformer-based neural network. Since a transformer is a type of neural network, and neural networks operate on numerical data, the first step is to convert words into numbers; a network cannot process raw text directly.

There are several ways to convert words into numbers, but the method most commonly used in modern neural networks is word embedding. A word embedding represents each word as a vector of numbers, capturing its meaning and its relationships to other words.

Before going deeper into the transformer architecture, let us first understand positional encoding: the technique transformers use to keep track of the order of words in a sentence. Unlike traditional sequence models, transformers do not process words one at a time; they see all the words at once. Because of this, they need an additional way to encode each word's position.
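To make the embedding idea concrete, here is a minimal sketch of a lookup-based embedding in NumPy. The vocabulary, sentence, and embedding size are illustrative choices, not values from the article, and the table is randomly initialized rather than learned:

```python
import numpy as np

# Toy vocabulary for the English side of the translation task
# (hypothetical example words, not from the article).
vocab = {"<pad>": 0, "i": 1, "like": 2, "cats": 3}
d_model = 8  # embedding dimension (illustrative choice)

rng = np.random.default_rng(0)
# In a real model this table is learned during training; here we
# initialize it randomly just to show the lookup mechanics.
embedding_table = rng.normal(size=(len(vocab), d_model))

def embed(sentence: str) -> np.ndarray:
    """Map each word to its embedding vector via a table lookup."""
    ids = [vocab[w] for w in sentence.lower().split()]
    return embedding_table[ids]  # shape: (num_words, d_model)

vectors = embed("I like cats")
print(vectors.shape)  # (3, 8): one d_model-sized vector per word
```

During training, gradient descent adjusts the rows of this table so that words with similar meanings end up with similar vectors.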
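The standard way to encode position, introduced in the original transformer paper "Attention Is All You Need", is the sinusoidal positional encoding. A sketch in NumPy (the sequence length and dimension below are illustrative):

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding:
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]  # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions get sine
    pe[:, 1::2] = np.cos(angles)  # odd dimensions get cosine
    return pe

pe = positional_encoding(seq_len=4, d_model=8)
print(pe.shape)  # (4, 8)
```

Each position in the sentence gets a unique pattern of sines and cosines, and this matrix is simply added element-wise to the word embeddings, so every vector entering the transformer carries both meaning and position.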