Reinforcement learning (RL) encompasses a class of machine learning (ML) techniques that can be used to solve sequential decision-making problems. RL techniques have found widespread applications in numerous domains, including financial services, autonomous navigation, industrial control, and e-commerce. The objective of an RL problem is to train an agent that, given an observation from its environment, will choose the optimal action that maximizes cumulative reward. Solving a business problem with RL involves specifying the agent’s environment, the space of actions, the structure of observations, and the right reward function for the target business outcome. In policy-based RL methods, the outcome of model training is often a policy, which defines a probability distribution over the actions given an observation. The optimal policy will maximize the cumulative returns obtained by the agent.
In constrained decision-making problems, the agent is tasked with choosing the optimal actions under constraints. A distinct class of such