For the dice problem, the setup is now as follows: every round we roll the dice. If you get to stay, you get $4. If you are kicked out, you get nothing. If you choose to quit, you get $10. We will now use a mixed strategy: "I want to first take the risk and earn at least X dollars before I quit and take my $10." What is the optimal X? Implement policy iteration to find X, supposing the maximum money you can get is $100. Define the state space to include the money you have earned so far.

https://colab.research.google.com/drive/13VwGV6JRm5_mwuKb2mtX6XE45cKC8t14?usp=sharing

The optimal X is:
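
One way to approach the question above is sketched below. It is a minimal policy-iteration sketch, not the notebook's reference solution, and it rests on assumptions the question does not state: the probability of surviving a round (`P_SURVIVE`, set to 2/3 here as a placeholder), the reading that being kicked out forfeits everything earned so far, and the reading that quitting pays the banked money plus $10. All function and variable names are hypothetical.

```python
REWARD_STAY = 4     # dollars earned per survived round (from the question)
REWARD_QUIT = 10    # dollars paid when you quit (from the question)
MAX_MONEY = 100     # cap on banked money (from the question)
P_SURVIVE = 2 / 3   # ASSUMPTION: the question does not give this probability

# State = money banked so far: 0, 4, 8, ..., 100.
STATES = list(range(0, MAX_MONEY + 1, REWARD_STAY))
ACTIONS = ("quit", "stay")


def q_value(m, action, V, p_survive):
    """Expected final payoff of taking `action` with m dollars banked."""
    if action == "quit" or m + REWARD_STAY > MAX_MONEY:
        # ASSUMPTION: quitting pays banked money plus $10.
        # At the cap, staying is impossible, so it is treated as quitting.
        return m + REWARD_QUIT
    # ASSUMPTION: being kicked out ends the game with a payoff of 0.
    return p_survive * V[m + REWARD_STAY]


def policy_iteration(p_survive=P_SURVIVE):
    policy = {m: "stay" for m in STATES}  # arbitrary initial policy
    while True:
        # Policy evaluation: money only increases, so a single sweep
        # from the largest state downward gives the exact values.
        V = {}
        for m in reversed(STATES):
            V[m] = q_value(m, policy[m], V, p_survive)
        # Policy improvement: act greedily with respect to V
        # (ties are broken in favour of quitting).
        new_policy = {
            m: max(ACTIONS, key=lambda a: q_value(m, a, V, p_survive))
            for m in STATES
        }
        if new_policy == policy:
            return policy, V
        policy = new_policy


def threshold(policy):
    """Smallest banked amount at which the policy quits, i.e. X."""
    return next(m for m in STATES if policy[m] == "quit")


if __name__ == "__main__":
    policy, V = policy_iteration()
    print("X =", threshold(policy))
```

Under these assumptions the threshold is sensitive to the survival probability: with 2/3 the policy quits immediately (X = 0), while with 5/6 it keeps playing until $12 is banked. The answer to the original question therefore depends on the probability and payoff rules defined in the linked notebook.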
