In a single iteration of the Monte-Carlo Tree Search (MCTS) algorithm, what is the primary purpose of the "Simulate" step (also known as rollout)? (Single-choice question)

A

To estimate the long-term value (expected future reward) from a newly expanded state by playing out a random or heuristic-guided sequence of actions until a terminal state is reached.

B

To select the most promising unexpanded node in the current search tree using a multi-armed bandit strategy like UCB.

C

To add new child nodes to a selected leaf node, representing possible outcomes of an unexplored action.

D

To update the visit counts and Q-values of all ancestor nodes in the path from the root to the newly expanded node, based on the outcome of the simulation.
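
The four options describe the four phases of one MCTS iteration, and a small sketch may help keep them apart. The code below is an illustrative toy, not a reference implementation: it assumes a single-pile Nim game (take 1 to 3 stones, whoever takes the last stone wins), and the names `Node`, `legal_actions`, `select`, `expand`, `simulate`, `backpropagate`, and `mcts` are all hypothetical. In this sketch, `simulate` is the rollout described in option A, `select` matches option B, `expand` matches option C, and `backpropagate` matches option D.

```python
import math
import random


class Node:
    def __init__(self, stones, parent=None):
        self.stones = stones    # toy state: stones left in the pile
        self.parent = parent
        self.children = {}      # action -> child Node
        self.visits = 0
        self.value = 0.0        # total reward for the player who moved INTO this node


def legal_actions(stones):
    return [a for a in (1, 2, 3) if a <= stones]


def select(node, c=1.4):
    # Selection: follow UCB1 while the node is fully expanded and non-terminal.
    while node.stones > 0 and len(node.children) == len(legal_actions(node.stones)):
        parent = node
        node = max(
            parent.children.values(),
            key=lambda ch: ch.value / ch.visits
            + c * math.sqrt(math.log(parent.visits) / ch.visits),
        )
    return node


def expand(node):
    # Expansion: add one child for an untried action (no-op at a terminal node).
    untried = [a for a in legal_actions(node.stones) if a not in node.children]
    if not untried:
        return node
    action = random.choice(untried)
    child = Node(node.stones - action, parent=node)
    node.children[action] = child
    return child


def simulate(stones):
    # Simulation (rollout): play uniformly random moves until the pile is empty.
    # Returns 1.0 if the player who moved into this state takes the last stone,
    # otherwise 0.0. This is only an estimate of the state's long-term value.
    last_mover_is_us = True
    while stones > 0:
        stones -= random.choice(legal_actions(stones))
        last_mover_is_us = not last_mover_is_us
    return 1.0 if last_mover_is_us else 0.0


def backpropagate(node, reward):
    # Backpropagation: update visit counts and value totals up to the root,
    # flipping the reward at each level because the players alternate.
    while node is not None:
        node.visits += 1
        node.value += reward
        reward = 1.0 - reward
        node = node.parent


def mcts(stones, iterations=2000, seed=0):
    random.seed(seed)
    root = Node(stones)
    for _ in range(iterations):
        leaf = expand(select(root))
        reward = simulate(leaf.stones)
        backpropagate(leaf, reward)
    # Recommend the most visited action at the root.
    return max(root.children, key=lambda a: root.children[a].visits)
```

Calling `mcts(5)` runs the loop from a five-stone pile and returns the most visited root action. Note that `simulate` never modifies the tree: it only produces a reward estimate, which `backpropagate` then writes into the statistics of the nodes on the path.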
