The attention mechanism improves word embeddings by: (single-choice question)

A

Replacing traditional word embeddings like Word2Vec and GloVe with fixed vector representations

B

Increasing the number of parameters in a neural network without improving contextual understanding

C

Ignoring the sequential nature of text and treating words as independent entities

D

Dynamically adjusting word embeddings by weighting the relevance of surrounding words in context

Similar questions

Aside from the transformer architecture itself, what major breakthrough in natural language processing significantly advanced natural language AI capabilities, particularly in understanding the relationships between words and their context, even across interdependencies and long-range dependencies (e.g., within sentences or paragraphs)? As a bonus, this breakthrough also solved many of the problems that recurrent neural network (RNN) techniques such as LSTMs had with long-range dependencies and contextual understanding, and it is one of the major reasons that transformers are replacing RNNs.

What is the main role of the attention mechanism in an LLM?

On scaled dot-product attention and training stability of a transformer:

I. Without scaling by $\sqrt{D_k}$, the variance of the dot product $q_n^\top k_m$ grows with dimensionality, producing large logits that can saturate the softmax.
II. Scaling by $\sqrt{D_k}$ primarily solves exploding-gradient problems inside the value projection $V$.
III. The softmax-normalized matrix $\mathrm{Softmax}(QK^\top)$ is applied row-wise, making each row represent how strongly a query attends to all keys.
IV. Scaled dot-product attention computes $\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(\frac{QK^\top}{\sqrt{D_k}}\right)V$, and the resulting matrix always has the same dimension as $V$.
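The quantities in the statements above can be explored with a small sketch. It assumes the standard form of scaled dot-product attention, with logits divided by the square root of the key dimension, and uses plain Python lists instead of a tensor library. The function names (`softmax`, `attention`, `logit_variance`) and the toy matrices are illustrative choices, not taken from the question.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V, scale=True):
    """Scaled dot-product attention on lists of row vectors.

    Q is N x D_k, K is M x D_k, V is M x D_v; the output is N x D_v.
    """
    d_k = len(K[0])
    divisor = math.sqrt(d_k) if scale else 1.0
    weights = []
    for q in Q:
        logits = [sum(qi * ki for qi, ki in zip(q, k)) / divisor for k in K]
        weights.append(softmax(logits))  # row-wise: one distribution per query
    output = [
        [sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
        for row in weights
    ]
    return output, weights


def logit_variance(d_k, trials=2000, scale=True, seed=0):
    """Empirical variance of q.k for unit-variance Gaussian q and k (an assumption)."""
    rng = random.Random(seed)
    divisor = math.sqrt(d_k) if scale else 1.0
    samples = []
    for _ in range(trials):
        dot = sum(rng.gauss(0, 1) * rng.gauss(0, 1) for _ in range(d_k))
        samples.append(dot / divisor)
    mean = sum(samples) / trials
    return sum((s - mean) ** 2 for s in samples) / trials


# Toy example: 3 queries and 2 keys with D_k = 2, values with D_v = 3.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
out, w = attention(Q, K, V)
print(len(out), len(out[0]))          # rows = number of queries, columns = D_v
print([round(sum(r), 6) for r in w])  # each row of weights sums to 1
print(logit_variance(64, scale=False), logit_variance(64, scale=True))
```

Under the Gaussian assumption, the unscaled logit variance should come out close to the key dimension and the scaled one close to 1, which is the effect the scaling factor is meant to have on the softmax inputs.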

Which innovation is at the core of the transformer architecture and enables modeling long-range dependencies effectively?
