What is the "advantage," typically denoted A(s,a), used for in Actor-Critic methods? (Single-choice question)

A. It measures how much better an action is than the average at that state
B. It equals the value function minus the reward
C. It estimates the return from a state only
D. It equals the TD error plus entropy
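For reference, the advantage is commonly defined as A(s,a) = Q(s,a) - V(s), where V(s) is the expected value of Q(s,a) under the current policy, so a positive A(s,a) means action a does better than the policy's average at state s. The minimal tabular sketch below illustrates that definition, the one-step TD error r + γV(s') - V(s) as one common critic-based estimate of A(s,a), and how an actor uses the advantage to weight its policy-gradient step. The function names (`state_value`, `advantages`, `td_advantage`, `actor_step`), the softmax policy, and all numbers are hypothetical choices for illustration, not part of the question.

```python
import math

# Hypothetical tabular sketch of the advantage in actor-critic.
# All names and numbers are illustrative assumptions.

def state_value(q_values, policy_probs):
    """V(s) = sum_a pi(a|s) * Q(s, a): the policy's average action value at s."""
    return sum(p * q for p, q in zip(policy_probs, q_values))

def advantages(q_values, policy_probs):
    """A(s, a) = Q(s, a) - V(s) for every action a in state s."""
    v = state_value(q_values, policy_probs)
    return [q - v for q in q_values]

def td_advantage(reward, v_s, v_next, gamma=0.99, done=False):
    """One-step critic estimate: A(s, a) ~ r + gamma * V(s') - V(s) (the TD error).
    No bootstrap term is added when s' is terminal."""
    target = reward + (0.0 if done else gamma * v_next)
    return target - v_s

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def actor_step(logits, action, advantage, lr=0.1):
    """Policy-gradient step for a softmax policy: theta += lr * A * grad log pi(a|s).
    For softmax, d log pi(a) / d theta_b = 1[b == a] - pi(b)."""
    probs = softmax(logits)
    return [
        theta + lr * advantage * ((1.0 if b == action else 0.0) - p)
        for b, (theta, p) in enumerate(zip(logits, probs))
    ]

if __name__ == "__main__":
    q = [1.0, 3.0, 2.0]       # hypothetical Q(s, a) for three actions
    pi = [0.5, 0.25, 0.25]    # hypothetical policy pi(a | s)
    print(state_value(q, pi))  # 1.75
    print(advantages(q, pi))   # [-0.75, 1.25, 0.25]: action 1 beats the average most
    print(td_advantage(1.0, 2.0, 3.0, gamma=0.5))  # 1 + 0.5 * 3 - 2 = 0.5
    # A positive advantage raises the chosen action's probability; a negative one lowers it.
    print(softmax(actor_step([0.0, 0.0, 0.0], action=1, advantage=1.25)))
```

Because V(s) is the policy-weighted mean of Q(s,a), the advantages average to zero under π, which is what lets the sign of A(s,a) say whether an action is better or worse than usual. The TD error is only an estimate of the advantage, and entropy bonuses are a separate regularization term in the actor's objective rather than part of A(s,a).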
