Which of the following attention models uses a subset of the input to derive the output and cannot be trained directly with gradient methods? (Single choice)
A. Hard attention
B. Soft attention
Similar Questions
Aside from the transformer architecture itself, what major technological breakthrough in natural language techniques significantly advanced natural language AI capabilities, particularly in understanding the relationships between words and their context, even across interdependencies and long-range dependencies (e.g., within sentences, paragraphs, etc.)? As a bonus, this breakthrough also solved many of the problems that recurrent neural network (RNN) techniques such as LSTMs had with long-range dependencies and contextual understanding, and it is one of the major reasons that transformers are replacing RNNs.
What is the main role of the attention mechanism in an LLM?
On scaled dot-product attention and training stability of a transformer: I. Without scaling by $\sqrt{D_k}$, the variance of the dot product $q_i^\top k_j$ grows with dimensionality, producing large logits that can saturate the softmax. II. Scaling by $\sqrt{D_k}$ primarily solves exploding-gradient problems inside the value projection $V$. III. The softmax-normalized matrix $\mathrm{Softmax}(QK^\top)$ is applied row-wise, making each row represent how strongly a query attends to all keys. IV. Scaled dot-product attention computes $\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\left(\frac{QK^\top}{\sqrt{D_k}}\right)V$, and the resulting matrix always has the same dimension as $V$.
Which innovation is at the core of the transformer architecture and enables modeling long-range dependencies effectively?
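For the scaled dot-product attention question above, the shapes involved can be checked with a minimal pure-Python sketch. It is illustrative only: the function names, the toy matrices, and the use of `d_k` for the key dimension (written $D_k$ in the question) are assumptions, and masking, batching, and multiple heads are left out.

```python
import math


def softmax(row):
    # Subtract the row maximum before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """Q is n_q x d_k, K is n_k x d_k, V is n_k x d_v (lists of lists)."""
    d_k = len(K[0])
    scale = math.sqrt(d_k)
    # Logits: one row per query, one column per key, divided by sqrt(d_k).
    logits = [
        [sum(qc * kc for qc, kc in zip(q, k)) / scale for k in K]
        for q in Q
    ]
    # Row-wise softmax: each row holds one query's weights over all keys.
    weights = [softmax(row) for row in logits]
    # Each output row is a weighted sum of the rows of V.
    d_v = len(V[0])
    output = [
        [sum(w * v[c] for w, v in zip(w_row, V)) for c in range(d_v)]
        for w_row in weights
    ]
    return weights, output


# Hypothetical toy inputs: 2 queries, 3 keys/values, d_k = 2, d_v = 3.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]

weights, output = scaled_dot_product_attention(Q, K, V)
print(len(weights), len(weights[0]))  # number of queries, number of keys
print(len(output), len(output[0]))    # number of queries, d_v
```

In this sketch the weight matrix has one row per query and one column per key, and each row sums to 1. The output has one row per query and as many columns as `V`.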