Adopting an integrated production, maintenance, and quality policy is of great importance in production systems because these three aspects influence one another. Consequently, investigating them in isolation may yield an infeasible solution. This paper aims to
address the joint optimal production, maintenance, and quality policy for a two-machine, single-product production system with an intermediate buffer and final-product storage. Each production machine passes through degradation levels ranging from as-good-as-new to breakdown. Failures raise a machine's degradation level, and maintenance activities restore it to the initial state. The quality of the final product depends on the machines' degradation levels. In particular, the model accounts for a correlation between machines in which high degradation of an upstream machine increases the probability that the downstream machine produces waste. The production system is modeled with agent-based simulation, and a reinforcement learning algorithm is used to obtain the optimal integrated policy. The goal is to find an integrated optimal policy that minimizes production costs, maintenance costs, inventory costs, lost orders, machine breakdowns, and low-quality output. The joint policy obtained by the decision-maker agent is evaluated using a metaheuristic technique. The results show that the joint policy learned by the reinforcement learning algorithm performs acceptably and can be applied to autonomous real-time decision-making in manufacturing systems.
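To illustrate the core idea of learning a maintenance/production policy over degradation states, the following is a minimal sketch under heavily simplified assumptions. It is not the paper's model: it uses a single hypothetical machine (no buffer, no second machine, no inventory or lost orders), made-up costs and probabilities, and plain tabular Q-learning in place of whatever reinforcement learning algorithm and agent-based simulation the paper actually uses. All names (`step`, `q_learning`, `P_DEGRADE`, `DEFECT_PROB`, and the cost constants) are hypothetical.

```python
import random

# Hypothetical toy parameters (illustrative only, not taken from the paper).
K = 4                                # degradation states 0..K; state K means "broken down"
PRODUCE, MAINTAIN = 0, 1
ACTIONS = (PRODUCE, MAINTAIN)
P_DEGRADE = 0.3                      # chance a production period raises degradation by one level
DEFECT_PROB = [0.0, 0.1, 0.3, 0.6]   # defect probability grows with degradation (states 0..K-1)
R_GOOD, C_DEFECT = 10.0, 5.0         # reward for a good unit, cost of a defective unit
C_PM, C_BREAKDOWN = 15.0, 60.0       # preventive maintenance cost, corrective repair cost


def step(state, action, rng):
    """Simulate one period of the toy machine; return (reward, next_state)."""
    if state == K:                   # broken down: forced repair back to as-good-as-new
        return -C_BREAKDOWN, 0
    if action == MAINTAIN:           # preventive maintenance resets degradation, no output
        return -C_PM, 0
    defective = rng.random() < DEFECT_PROB[state]
    reward = -C_DEFECT if defective else R_GOOD
    next_state = state + 1 if rng.random() < P_DEGRADE else state
    return reward, next_state


def q_learning(episodes=1000, horizon=200, alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration over the toy machine."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(K + 1)]   # q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(horizon):
            if rng.random() < eps:
                action = rng.choice(ACTIONS)  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])  # exploit
            reward, nxt = step(state, action, rng)
            # Off-policy temporal-difference update toward the greedy target.
            target = reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
    return q


q = q_learning()
policy = ["produce" if row[PRODUCE] >= row[MAINTAIN] else "maintain" for row in q]
print(policy)
```

In this toy, the learned policy produces when the machine is as-good-as-new and tends toward preventive maintenance as degradation (and therefore defect and breakdown risk) grows. The entry for the breakdown state is arbitrary, because both actions lead to the same forced repair. A faithful reproduction would need the paper's two-machine agent-based model, buffer and inventory dynamics, and full cost structure.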