In the positional encoding of Transformers, sinusoidal functions are used with different formulas for even and odd indices, both incorporating the term 10000^(2i/d_model). Analyze the following statements and choose the correct explanations for the effects of increasing or decreasing the constant 10000. See the formula below:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
Hint: Lec 19, Slides 29-32.
Multiple-select question (choose all that apply)
A
Decreasing the constant from 10000 to 10 would result in a narrower frequency range of positional encodings, potentially making it harder for the model to differentiate positions in longer sequences.
B
Increasing the constant from 10000 to 10000000 would generate a wider range of frequencies in the positional encodings. This could enhance the model's ability to discern positions in longer sequences but might make the encoding too complex, leading to difficulties in learning positional relationships effectively.
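One way to check statements like these is to compute the encodings for several values of the constant and compare the wavelengths they produce. The following is a minimal sketch, assuming NumPy and an even d_model. The function names sinusoidal_pe and wavelength_range and the parameter name base are illustrative and do not come from the lecture.

```python
import numpy as np


def sinusoidal_pe(num_positions, d_model, base=10000.0):
    """Return a (num_positions, d_model) array of sinusoidal encodings.

    `base` stands in for the constant 10000 (hypothetical parameter name).
    """
    assert d_model % 2 == 0, "this sketch assumes an even d_model"
    pos = np.arange(num_positions)[:, None]      # shape (P, 1)
    two_i = np.arange(0, d_model, 2)[None, :]    # shape (1, d_model / 2), holds 2i
    angles = pos / base ** (two_i / d_model)     # pos / base^(2i/d_model)
    pe = np.zeros((num_positions, d_model))
    pe[:, 0::2] = np.sin(angles)                 # even indices use sine
    pe[:, 1::2] = np.cos(angles)                 # odd indices use cosine
    return pe


def wavelength_range(d_model, base=10000.0):
    """Return (shortest, longest) wavelength, in positions, over all sin/cos pairs."""
    two_i = np.arange(0, d_model, 2)
    wavelengths = 2 * np.pi * base ** (two_i / d_model)
    return wavelengths.min(), wavelengths.max()


# Compare the constants mentioned in options A and B with the default.
for base in (10.0, 10000.0, 10000000.0):
    shortest, longest = wavelength_range(64, base)
    print(f"base={base:>12.0f}  shortest={shortest:.2f}  longest={longest:.2f}")
```

In this sketch the pair at index 2i has wavelength 2π · base^(2i/d_model), so the shortest wavelength is always 2π, while the longest grows with the constant. A smaller constant therefore compresses the span of wavelengths, and a larger one stretches it.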