Clip

The Key Attributes that Make the Transformer Algorithm Successful
The transformer algorithm is successful because it combines multiple ideas simultaneously, including attention, a great fit to the GPU, and being less deep, making it easier to optimize. It allows achieving better results for the same amount of compute.