For the dice problem, the setup is now: every round we roll the dice. If you get to stay, you get $4. If you are kicked out, you get nothing. If you choose to quit, you get $10. We will now use a mixed strategy: "I want to first take the risk and earn at least X dollars before I quit and take my $10". What is the optimal X? Implement policy iteration to find X, supposing the maximum money you can get is $100. Define the state space to include the money you have earned so far.

https://colab.research.google.com/drive/13VwGV6JRm5_mwuKb2mtX6XE45cKC8t14?usp=sharing

The optimal X is: (short-answer question)
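One way to set up the policy iteration is sketched below. The question does not state the probability of being kicked out or exactly what is paid on each outcome, so everything in the model is an assumption: `p_kick` (default 1/3) is a hypothetical kick-out probability, being kicked out is taken to forfeit all money earned so far, quitting is taken to pay the money so far plus $10, and the only action allowed at the $100 cap is to quit. The function name and parameters are illustrative, not taken from the linked notebook.

```python
def policy_iteration(p_kick=1/3, step=4, quit_bonus=10, cap=100):
    """Policy iteration over money-so-far states 0, step, ..., cap.

    Assumed model (NOT given in the question):
      - stay: with prob. 1 - p_kick gain `step` dollars and continue,
              with prob. p_kick be kicked out and end with $0;
      - quit: end the game with (money so far + quit_bonus);
      - at `cap` the only available action is quit.
    `cap` is assumed to be a multiple of `step`.
    """
    states = list(range(0, cap + 1, step))  # money earned so far
    n = len(states)

    def q(i, action, V):
        # Expected final payout of taking `action` in state i, then following V.
        if action == "quit":
            return states[i] + quit_bonus
        return (1 - p_kick) * V[i + 1]  # kicked out contributes 0

    policy = ["stay"] * (n - 1) + ["quit"]  # initial guess: always stay
    while True:
        # Policy evaluation: money only increases, so one backward sweep is exact.
        V = [0.0] * n
        for i in range(n - 1, -1, -1):
            V[i] = q(i, policy[i], V)
        # Policy improvement: act greedily with respect to V (ties go to quit).
        new_policy = [
            "stay" if q(i, "stay", V) > q(i, "quit", V) else "quit"
            for i in range(n - 1)
        ] + ["quit"]
        if new_policy == policy:
            break
        policy = new_policy

    # Threshold X: the smallest amount of money at which the policy quits.
    x = next(s for s, a in zip(states, policy) if a == "quit")
    return policy, V, x
```

Calling `policy, V, x = policy_iteration()` returns the converged policy, its state values, and the threshold X. The resulting X depends entirely on the assumed `p_kick` and payout rules, so those should be replaced with the values from the original dice problem before reading off an answer.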
