Drones Tracking Adaptation Using Reinforcement Learning: Proximal Policy Optimization
Conference
2023 24th International Radar Symposium (IRS)
Date Issued
2023
Author(s)
Seghrouchni, Amal El Fallah
Barbaresco, Frederic
Abstract
This paper presents a reinforcement learning approach for automatic adaptation of the process noise covariance (Q). The Q value plays a crucial role in estimating future state values within a Kalman filter tracking system. Proximal Policy Optimization (PPO), a state-of-the-art policy optimization algorithm, was employed to determine the optimal Q value that enhances tracking performance, as measured by Root Mean Square Error (RMSE). Our results demonstrate that the PPO agent successfully learns over time to suggest the optimal Q value, capturing a reward-driven policy under varying environmental conditions. These outcomes were compared with those of a feed-forward neural network, the Castella innovation/Q-value mapping, and fixed Q values; the PPO algorithm yielded promising results. We employed the Stone Soup library to simulate ground truths, measurements, and the Kalman filter tracking process.
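To illustrate why Q matters for tracking RMSE, the following is a minimal NumPy sketch (not the paper's Stone Soup implementation) of a 1D constant-velocity Kalman filter. Q enters the predict step as additive covariance; on a maneuvering target, an undersized Q makes the filter lag, while a larger Q lets it adapt. The q_scale values, noise levels, and maneuver profile here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """One predict-update cycle of a linear Kalman filter."""
    # Predict: the process noise covariance Q inflates the state uncertainty
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: weigh the innovation by the Kalman gain
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# 1D constant-velocity model, unit time step (illustrative parameters)
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
R = np.array([[0.25]])                       # measurement noise variance

def run_filter(q_scale, measurements):
    """Track with a discrete white-noise-acceleration Q scaled by q_scale."""
    Q = q_scale * np.array([[dt**3 / 3, dt**2 / 2],
                            [dt**2 / 2, dt]])
    x, P, estimates = np.zeros(2), np.eye(2), []
    for z in measurements:
        x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
        estimates.append(x[0])
    return np.array(estimates)

# Target that abruptly changes velocity (a maneuver), plus sensor noise
rng = np.random.default_rng(0)
truth = np.concatenate([np.arange(20.0), 20.0 + 3.0 * np.arange(1, 21)])
measurements = truth + rng.normal(0.0, 0.5, truth.size)

# Compare position RMSE for a small vs. a large fixed Q
rmse = {q: float(np.sqrt(np.mean((run_filter(q, measurements) - truth) ** 2)))
        for q in (0.01, 1.0)}
```

In this maneuver scenario the larger Q yields the lower RMSE, while on a non-maneuvering segment the opposite would hold; that trade-off is what motivates learning Q adaptively, as the paper does with PPO, rather than fixing it.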
Scopus© citations
3
Acquisition Date
Dec 4, 2024