In a single iteration of the Monte-Carlo Tree Search (MCTS) algorithm, what is the primary purpose of the "Simulate" step (also known as rollout)?
Single-choice question
A
To estimate the long-term value (expected future reward) from a newly expanded state by playing out a random or heuristic-guided sequence of actions until a terminal state is reached.
B
To select the most promising unexpanded node in the current search tree using a multi-armed bandit strategy like UCB.
C
To add new child nodes to a selected leaf node, representing possible outcomes of an unexplored action.
D
To update the visit counts and Q-values of all ancestor nodes in the path from the root to the newly expanded node, based on the outcome of the simulation.
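The four steps named in the options (select, expand, simulate, backpropagate) can be sketched in one short program, which may help in seeing where the rollout fits. This is a minimal illustration under stated assumptions, not a reference implementation: the toy stone-taking game, the `Node` class, all function names, and the exploration constant `c=1.4` are hypothetical choices made for this example.

```python
import math
import random

# Hypothetical toy game (an assumption for illustration): a pile of stones,
# players alternate removing 1 or 2, and whoever takes the last stone wins.
# A state is the tuple (stones_left, player_to_move).
def legal_actions(state):
    stones, _ = state
    return [a for a in (1, 2) if a <= stones]

def step(state, action):
    stones, player = state
    return (stones - action, 1 - player)

def is_terminal(state):
    return state[0] == 0

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}                   # action -> Node
        self.untried = legal_actions(state)  # actions not yet expanded
        self.visits = 0
        self.total_reward = 0.0              # wins for the player who moved into this node

def select(node, c=1.4):
    # Descend with UCB1 while the node is fully expanded and has children.
    while not node.untried and node.children:
        parent = node
        node = max(
            parent.children.values(),
            key=lambda ch: ch.total_reward / ch.visits
            + c * math.sqrt(math.log(parent.visits) / ch.visits),
        )
    return node

def expand(node, rng):
    # Add one child for an untried action; a terminal node is returned as-is.
    if not node.untried:
        return node
    action = node.untried.pop(rng.randrange(len(node.untried)))
    child = Node(step(node.state, action), parent=node)
    node.children[action] = child
    return child

def simulate(state, rng):
    # Rollout: play uniformly random moves until a terminal state is reached.
    while not is_terminal(state):
        state = step(state, rng.choice(legal_actions(state)))
    # The player who took the last stone is the one who is NOT to move now.
    return 1 - state[1]

def backpropagate(node, winner):
    # Update visit counts and rewards on the path from the new node to the root.
    while node is not None:
        node.visits += 1
        mover = 1 - node.state[1]  # player who moved into this node
        node.total_reward += 1.0 if winner == mover else 0.0
        node = node.parent

def mcts_iteration(root, rng):
    leaf = select(root)
    child = expand(leaf, rng)
    winner = simulate(child.state, rng)
    backpropagate(child, winner)
```

In this sketch, `simulate` never touches the tree: it only plays the game forward from the new node's state and reports an outcome, which `backpropagate` then folds into the statistics. Calling `mcts_iteration(root, random.Random(0))` repeatedly on a root such as `Node((5, 0))` grows the tree one node per iteration.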