Continuously Updating Reinforcement Learning (CURL) rapidly maintains deployed ML models when a use case changes, such as a denied target, with minimal performance loss. The traditional Machine Learning (ML) lifecycle requires models to be retrained and redeployed in order to maintain the performance of deployed models when the underlying data changes, such as through data drift. Data drift covers a wide variety of changes: the addition of a new class, operation in an entirely new environment, mislabeled data, or subtle changes in targets over time. CURL deviates from this traditional lifecycle by using Reinforcement Learning (RL) to dynamically identify and capture data changes, then automatically retrain the model on them. CURL learns to identify changes in data through an RL policy designed to maximize the reward for doing so. Specifically, CURL's RL environment uses both the model's performance and its current prediction confidence as the observation space, a discrete action space for the agent to act on, and a reward function defined as the model's accuracy minus the cost of labeling the data changes. In a controlled experiment, our RL policy recovered the same distribution of denied-target data (3%), and the retrained model exceeded the initial classifier's performance. CURL can be considered a general-purpose technology applicable to a wide spectrum of fielded ML systems.
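The environment design described above can be sketched in a minimal, illustrative form. Everything below, including the class name `CurlEnv`, the per-label cost, and the toy accuracy dynamics, is an assumption for illustration only and not the paper's implementation; it shows only the abstract's stated structure: an observation of (accuracy, confidence), a discrete action space (skip vs. request a label), and a reward of accuracy minus labeling cost.

```python
import random

class CurlEnv:
    """Illustrative sketch of the CURL environment described in the abstract.

    Observation: (model accuracy, mean prediction confidence).
    Actions: 0 = skip a sample, 1 = request a label (incurs labeling cost).
    Reward: model accuracy minus cumulative labeling cost.
    All constants and dynamics here are hypothetical, not from the paper.
    """

    LABEL_COST = 0.01  # assumed per-label cost

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.accuracy = 0.90     # deployed model's current accuracy (toy value)
        self.confidence = 0.95   # mean prediction confidence (toy value)
        self.labels_bought = 0
        return (self.accuracy, self.confidence)

    def step(self, action):
        if action == 1:
            # Labeling a drifted sample slightly improves accuracy (toy dynamic).
            self.labels_bought += 1
            self.accuracy = min(1.0, self.accuracy + 0.005)
        else:
            # Unaddressed drift slowly erodes accuracy (toy dynamic).
            self.accuracy = max(0.0, self.accuracy - 0.002)
        self.confidence = max(0.0, min(1.0, self.accuracy + self.rng.uniform(-0.05, 0.05)))
        reward = self.accuracy - self.LABEL_COST * self.labels_bought
        return (self.accuracy, self.confidence), reward

# Example: a naive confidence-threshold policy acting in the sketch environment.
env = CurlEnv()
obs = env.reset()
for _ in range(5):
    action = 1 if obs[1] < 0.9 else 0  # request a label when confidence drops
    obs, reward = env.step(action)
```

A learned policy would replace the fixed threshold above, trading labeling cost against accuracy recovery as the reward function dictates.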