Presentations : Select spatial resolution as a hyperparameter in reinforcement learning for optimal path finding

Presentations

Select spatial resolution as a hyperparameter in reinforcement learning for optimal path finding

Moskalev P.V., Stebulyanin M.M., Myagkov A.S.

Moscow State University of Technology "STANKIN", Russian Federation, 127055, Moscow, Vadkovsky lane 1, moskaleff@mail.ru, mmsteb@rambler.ru, almyagkov99@yandex.ru

!	You need a Javascript-capable browser to display math equations correctly. Please enable Javascript in browser preferences.

!	Your browser should support MathML to properly display equations. You may use IE6/7 with MathPlayer plugin or Firefox with additional fonts installed.

When searching for the optimal path of a mobile robot in a physical space with obstacles, the latter can be represented as a lattice. The lattice spacing $h$ then serves as a hyperparameter defining the feasible solution space. The proposed algorithm can be viewed as a variation of reinforcement learning. In the first stage, the Leath algorithm is used to perform a reachability analysis in percolation, which is equivalent to verifying the connectivity of the state space for a given discretization $h$. In the second stage, a modified A* algorithm is used to search for the optimal path, where the path cost function is determined by a heuristic predicting the state cost and a penalty for entering a safety zone, modeling the negative reward for risky path segments. In the third stage, a combination of the Gilbert–Johnson–Keerthi distance (GJK) and expanding polyhedron (EPA) algorithms is used to refine the constraints that ensure the safety of solutions found on a discrete lattice in continuous physical space.

Statistical modeling revealed a nonlinear dependence of the probability of a feasible solution $P(q|h)$ on the hyperparameter $h$ and the dimensionless obstacle density $q$. A correct approximation of this dependence by a logistic function indicates the presence of a critical value $h_c(q)$, similar to the percolation threshold in percolation theory, beyond which the environment becomes "pathless" for the agent. The effectiveness of the resulting solution $E(q|h)$ saturates at small $h$, which corresponds to the well-known principle of "diminishing returns" with increasing model complexity in reinforcement learning: after a certain threshold, further refinement only slightly improves the quality of the solution but greatly increases the difficulty of finding it. The critical value of the grid spacing $h_c(q)$ can then be considered a Pareto optimum in the space.

Thus, the grid spacing is a hyperparameter that determines the properties of the decision-making process when controlling a mobile robot. Optimization of this parameter should precede the learning process. The approximations $P(q|h)$ and $E(q|h)$ found in [1] form the basis for developing reinforcement learning algorithms that adaptively select the optimal resolution in control problems.

This work was supported by the Ministry of Education and Science of the Russian Federation (theme No. FSFS-2024-0012).

References.

1. Moskalev P.V., Stebulyanin M.M., Myagkov A.S. Impact of spatial resolution on mobile robot path optimality in two-dimensional lattice models // Computer Research and Modeling. Vol. 17, No. 6. 2025 [in press].

abstract in Russian (PDF)