Our first approach was a simple hand-written rule. Using our road image, we separate the road-pixel colours from all the other colours, turning "what we see" into "what the computer sees as road". (To do: decide whether we want the car to be in the left lane or the right lane.)

With that road mask, the rule-based policy is:

1. If the front of the car is road (the state), count the road pixels in the left and right halves of the grid.
2. Turn in the direction with more road pixels (the action).
3. If the front of the car is not road, slow down and turn until it is.
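Here is a minimal sketch of that rule in Python. The grey-colour thresholds, the "front" region, and the [steering, gas, brake] action format are illustrative assumptions, not values from the post:

```python
import numpy as np

def is_road(frame: np.ndarray) -> np.ndarray:
    """Separate road-pixel colours from all other colours.

    Assumes the track is roughly grey: channels close together,
    mid-brightness. The thresholds here are guesses.
    """
    r = frame[..., 0].astype(int)
    g = frame[..., 1].astype(int)
    b = frame[..., 2].astype(int)
    greyish = (np.abs(r - g) < 20) & (np.abs(g - b) < 20)
    return greyish & (r > 80) & (r < 160)

def rule_policy(frame: np.ndarray) -> np.ndarray:
    """If the front of the car is road, steer toward the half of the
    grid with more road pixels; otherwise slow down and keep turning
    until the front is road again."""
    road = is_road(frame)
    h, w = road.shape
    front = road[: h // 2, w // 3 : 2 * w // 3]   # strip ahead of the car
    left = road[:, : w // 2].sum()                # road pixels, left half
    right = road[:, w // 2 :].sum()               # road pixels, right half
    steer = -0.3 if left > right else 0.3         # negative = turn left
    if front.mean() > 0.3:                        # front is mostly road
        return np.array([steer, 0.5, 0.0])        # [steering, gas, brake]
    return np.array([steer, 0.0, 0.3])            # slow down + turn
```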
A policy takes in a state and outputs an action. In behavioral cloning, the policy tries to mimic what a human would do: instead of a list of manual rules, we used a convolutional neural network. For data, a human played the driving simulation while the computer remembered the decision made at each time step; the CNN takes the image as input and outputs an action (accelerate, turn left, turn right, or stay), and we measure accuracy against the human's choices.

There are some issues with behavioral cloning:

- The inputs are images, so we do not have access to things like speed.
- There are data limitations too, so the model does not know what to do when the car veers off the track.
- The policy is based off human decisions, so it cannot achieve superhuman performance or recognize even more complex patterns.

The behavior-cloning approach is not perfect, but it is a solid starting point and a good way to transfer some human intuition to complex tasks. Let's move on to reinforcement-learning algorithms that learn a policy from scratch, without any human teacher at all.

A self-driving car has an environment in which it can perform actions and occupy states in order to collect rewards:

- State space: all the possible states or configurations our self-driving car can be in; here, the 96x96x3 images from the simulator.
- Action space: all the possible actions the self-driving car can take: direction (steering), stepping on the gas, and braking.
- Rewards: if the car is doing what we want it to do, that is a reward.

When the Q-learning agent is training, what policy should it follow? This is exploring versus exploiting (epsilon greedy). Exploring is basically sampling from a set of actions in order to find out new possibilities and obtain better rewards, while exploiting is taking advantage of the prior knowledge the agent has by repeating actions that lead to favored long-term rewards. Exploring more is ideal at the beginning, when you are still finding out about the world or environment; exploiting is best done after that initial exploration. (From the image, you can see that exploring has a relative uncertainty involved with it, while exploiting is more organized and certain while going for a goal.) During testing you exploit, because you are no longer changing the weights; during training you keep running Q-learning until the Q-function's estimates settle.

You get your policy for the model by taking the max of those Q-value estimates. The problem with this basic version of Q-learning is that it would take way too long, if you think about how much more complex driving is than playing poker. Deep Q-Learning addresses this: it is a kind of learning process that requires two neural networks, replacing the regular Q-table with them. A Deep Q-Learning agent requires three things, among them an environment to give us observations, rewards, and actions, and a neural network used to approximate the Q-function. These systems do not start off with any knowledge, and Deep Q-Learning takes a lot longer to train compared to behavior cloning.

Code sketches for each of these pieces follow below.
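First, a minimal sketch of the behavioral-cloning setup in PyTorch, assuming 96x96x3 input frames and the four discrete actions above; the layer sizes are illustrative, not the post's exact architecture:

```python
import torch
import torch.nn as nn

class DrivingPolicy(nn.Module):
    """CNN policy: a 96x96 RGB frame in, one logit per action out."""
    def __init__(self, n_actions: int = 4):  # accelerate, left, right, stay
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 10 * 10, 128), nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def train_step(model, optimizer, frames, human_actions):
    """One supervised step: frames are float tensors (batch, 3, 96, 96)
    scaled to [0, 1]; human_actions are the decisions the computer
    remembered at each time step, as a (batch,) tensor of indices."""
    logits = model(frames)
    loss = nn.functional.cross_entropy(logits, human_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```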
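The state/action/reward loop itself looks like this. The 96x96x3 observations and the steering/gas/brake actions match Gymnasium's CarRacing environment, so this sketch assumes it; the post never names its simulator, and the environment id varies by Gymnasium version:

```python
import gymnasium as gym

env = gym.make("CarRacing-v2")   # assumed environment; id may differ
obs, info = env.reset()          # state: a 96x96x3 image
total_reward = 0.0
for _ in range(1000):
    action = env.action_space.sample()   # [steering, gas, brake]
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward       # reward: the car doing what we want
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```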
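Epsilon greedy in code: with probability epsilon the agent explores by sampling a random action, otherwise it exploits its current estimates. `q_values` here is a stand-in for whatever estimates the agent has so far:

```python
import random

def epsilon_greedy(q_values, state, actions, epsilon: float):
    if random.random() < epsilon:
        return random.choice(actions)                      # explore
    return max(actions, key=lambda a: q_values(state, a))  # exploit

def linear_epsilon(step: int, start=1.0, end=0.05, decay_steps=10_000):
    """Explore more at the beginning, exploit later: epsilon decays
    from `start` to `end` over `decay_steps` training steps."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)
```

At test time you would call the policy with epsilon = 0, which matches the advice above to exploit once the weights are no longer changing.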
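Finally, the Q-learning core the last paragraph builds on: a table of estimates, an update rule, and a policy that takes the max of those estimates. This is a generic sketch, not the post's code; in Deep Q-Learning the table below is replaced by a neural network (the two networks mentioned above are, in the usual setup, an online network and a slowly updated target network that supplies the max term):

```python
from collections import defaultdict

Q = defaultdict(float)    # Q[(state, action)] -> estimated long-term reward
alpha, gamma = 0.1, 0.99  # learning rate, discount factor

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: nudge Q(s, a) toward the observed reward
    plus the best estimate available from the next state."""
    best_next = max(Q[(next_state, a)] for a in actions)
    target = reward + gamma * best_next
    Q[(state, action)] += alpha * (target - Q[(state, action)])

def greedy_policy(state, actions):
    """The policy: take the max of those estimates."""
    return max(actions, key=lambda a: Q[(state, a)])
```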