Consider the GridWorld example from the notes. Using the inverse Manhattan distance as the potential function for reward shaping, calculate Q(s, West) for state s = (1,2) and next state s' = (0,2), with no immediate reward. Assume α = 0.5, γ = 0.9, and Q(s, a) = 0 for all states and actions. (Numerical question)
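A minimal sketch of the shaped Q-learning update, under stated assumptions. It assumes potential-based shaping F(s, s') = γΦ(s') − Φ(s) added to the reward, with Φ(s) = 1 / (Manhattan distance from s to the goal). It also assumes the goal sits at (3, 2), as in the classic 4×3 GridWorld with zero-based coordinates. The notes may place the goal elsewhere or normalize the potential differently, and then the number changes. With Q initialized to zero and r = 0, the update reduces to Q(s, West) = α(γΦ(s') − Φ(s)) = 0.5 × (0.9 × 1/3 − 1/2) = 0.5 × (−0.2) = −0.1. The names `GOAL`, `manhattan`, `potential`, and `shaped_q_update` are hypothetical helpers for illustration only.

```python
# Hypothetical goal cell: assumed (3, 2), as in the classic 4x3 GridWorld.
GOAL = (3, 2)


def manhattan(a, b):
    """Manhattan (L1) distance between two grid cells (x, y)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def potential(state):
    """Assumed potential: inverse Manhattan distance to the goal."""
    d = manhattan(state, GOAL)
    # Hypothetical convention to avoid division by zero at the goal itself.
    return 1.0 if d == 0 else 1.0 / d


def shaped_q_update(q_sa, reward, s, s_next, max_q_next, alpha, gamma):
    """One Q-learning step with potential-based shaping F = gamma*phi(s') - phi(s)."""
    shaping = gamma * potential(s_next) - potential(s)
    td_target = reward + shaping + gamma * max_q_next
    return q_sa + alpha * (td_target - q_sa)


# All Q-values start at 0, so Q(s', a') = 0 for every a' and max_a' Q(s', a') = 0.
q_new = shaped_q_update(q_sa=0.0, reward=0.0, s=(1, 2), s_next=(0, 2),
                        max_q_next=0.0, alpha=0.5, gamma=0.9)
print(round(q_new, 4))  # -0.1
```

The shaping term is negative here because moving West takes the agent farther from the assumed goal, so the potential drops from 1/2 to 1/3.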
