Reward function – In RL, the reward function assigns a numerical score to each state-action outcome, quantifying the success (e.g., profit) or failure (e.g., loss) of an action. Formally, R depends on the state, the action, and the next state: R(s, a, s′). In a trading RL setup, the reward could be the daily profit or the Sharpe ratio; for example, it might be +1 for a winning trade and 0 for a losing one, as in the sketch below. Designing the reward function is critical: a well-crafted reward guides the agent toward desirable trading behaviors. RL algorithms maximize cumulative reward (e.g., total profit) over time.
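A minimal Python sketch of such a reward function and the cumulative (discounted) return it feeds into is shown below. The state dictionaries, the `portfolio_value` key, and the binary +1/0 scheme are illustrative assumptions, not a prescribed implementation.

```python
# Minimal sketch of a reward function R(s, a, s') for a trading RL setup.
# The state layout and "portfolio_value" key are assumptions for illustration.

def reward(state: dict, action: int, next_state: dict) -> float:
    """Score one step by realized profit and loss between s and s'."""
    pnl = next_state["portfolio_value"] - state["portfolio_value"]
    # Binary scheme from the text: +1 for a winning trade, 0 otherwise.
    # A P&L- or Sharpe-based reward would return a continuous value instead.
    return 1.0 if pnl > 0 else 0.0


def discounted_return(rewards: list[float], gamma: float = 0.99) -> float:
    """Cumulative (discounted) reward G = sum_t gamma^t * r_t that RL maximizes."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Example: three steps, two of them profitable.
steps = [
    ({"portfolio_value": 100.0}, 1, {"portfolio_value": 101.5}),
    ({"portfolio_value": 101.5}, 0, {"portfolio_value": 101.0}),
    ({"portfolio_value": 101.0}, 1, {"portfolio_value": 103.0}),
]
rewards = [reward(s, a, s_next) for s, a, s_next in steps]
print(rewards)                      # [1.0, 0.0, 1.0]
print(discounted_return(rewards))   # 1.0 + 0.99**2 ≈ 1.9801
```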