Getting stuck in a local optimum (a local minimum when minimizing, a local maximum when maximizing) is a common problem for reinforcement learning agents. There are a few ways to mitigate it.
One way is to find a local optimum first and then search outward from it. The agent locates the local optimum with hill climbing, a local search method in which the agent starts at some point and repeatedly moves to a neighboring point with a better objective value, stopping when no neighbor improves on the current one. That stopping point is, by definition, a local optimum and not necessarily the global one. Once the agent has reached a local optimum, it perturbs the solution and climbs again, keeping the best result seen so far; this climb-perturb-climb loop is commonly known as iterated local search.
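As a sketch of that climb-perturb-climb loop on a 1-D toy problem (the objective f, the step size, and the kick size below are illustrative assumptions, not anything prescribed above):

```python
import math
import random

def hill_climb(f, x, step=0.1, max_iters=1000):
    """Greedy local search: move to a better neighbor until none exists."""
    for _ in range(max_iters):
        best_neighbor = max((x - step, x + step), key=f)
        if f(best_neighbor) <= f(x):
            break  # no neighbor improves, so x is a local maximum
        x = best_neighbor
    return x

def iterated_local_search(f, x0, n_rounds=20, kick=1.0):
    """Climb to a local optimum, perturb it, and climb again."""
    best = hill_climb(f, x0)
    for _ in range(n_rounds):
        # Kick the incumbent into a nearby basin and re-run the climb.
        candidate = hill_climb(f, best + random.uniform(-kick, kick))
        if f(candidate) > f(best):
            best = candidate  # keep the better local optimum
    return best

# Toy multimodal objective with many local maxima (an assumption for
# illustration; any black-box scalar function works the same way).
f = lambda x: math.sin(5 * x) - 0.1 * x * x
print(iterated_local_search(f, x0=3.0))
```

The kick size is the key design choice here: too small and the search falls back into the same basin, too large and the method degenerates into random restarts.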
Another way is to make the climb itself less greedy. An agent that simply starts from a random point and moves to higher and higher points is doing ordinary hill climbing, and it will still stop at the first local optimum it reaches. Stochastic variants avoid this by picking randomly among the improving neighbors, or by occasionally accepting a move to a worse point; simulated annealing formalizes the latter idea by accepting downhill moves with a probability that shrinks over time.
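Here is a minimal sketch of that acceptance rule in the style of simulated annealing; the temperature schedule, step size, and toy objective are assumptions chosen for illustration:

```python
import math
import random

def anneal(f, x, step=0.5, temp=1.0, cooling=0.995, max_iters=5000):
    """Hill climbing that sometimes accepts a worse move.

    Improvements are always accepted; a worse candidate is accepted
    with probability exp(delta / temp), which shrinks as the
    temperature cools. Early on the search can step downhill and
    escape shallow local maxima; later it behaves greedily.
    """
    best = x
    for _ in range(max_iters):
        candidate = x + random.uniform(-step, step)
        delta = f(candidate) - f(x)
        if delta > 0 or random.random() < math.exp(delta / temp):
            x = candidate
            if f(x) > f(best):
                best = x  # remember the best point ever visited
        temp *= cooling
    return best

f = lambda x: math.sin(5 * x) - 0.1 * x * x
print(anneal(f, x=3.0))
```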
A third way to overcome the problem is to use random restarts. The agent runs hill climbing from a series of different random starting points and keeps the best local optimum found across all runs. This does not guarantee reaching the global optimum, but each additional restart increases the chance that one starting point lands in the global optimum's basin of attraction.
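A minimal sketch of random restarts, reusing the same greedy climb as above; the number of restarts and the sampling range for starting points are arbitrary illustrative choices:

```python
import math
import random

def hill_climb(f, x, step=0.1, max_iters=1000):
    """Greedy local search, as in the earlier sketch."""
    for _ in range(max_iters):
        best_neighbor = max((x - step, x + step), key=f)
        if f(best_neighbor) <= f(x):
            break
        x = best_neighbor
    return x

def random_restart_hill_climb(f, n_restarts=50, low=-5.0, high=5.0):
    """Run hill climbing from many random starts; keep the best result."""
    best = None
    for _ in range(n_restarts):
        x = hill_climb(f, random.uniform(low, high))
        if best is None or f(x) > f(best):
            best = x
    return best

f = lambda x: math.sin(5 * x) - 0.1 * x * x
print(random_restart_hill_climb(f))
```

The restarts are independent, so they parallelize trivially; in practice one stops after a fixed budget of restarts rather than "when the global optimum is found," since global optimality usually cannot be verified.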