In recent years, spare parts supply chains have faced severe disruptions due to global conflicts and the widespread outbreak of the coronavirus pandemic. As a result, manufacturers have been obliged to look for alternative suppliers to fulfill their needs for spare parts, which may come in different qualities and at different prices. In this research, an integrated decision-making problem covering production, maintenance, spare parts supplier selection, and inventory control for both finished products and spare parts is addressed with the objective of minimizing total expected costs. This thesis considers a production facility consisting of a single machine producing similar products through a single process, where the demand for the finished product is stochastic. At the beginning of each period, the manufacturer needs to decide on the throughput rate (production rate) in order to fulfill the demand for the finished product, assuming shortages are not allowed.
The machine has a critical spare part that deteriorates during production according to the throughput rate of the machine: the higher the throughput rate, the higher the deterioration rate of the part. The deterioration level is detected by means of condition monitoring. When the condition of the spare part reaches a predetermined threshold level, it must be replaced. If it fails, the machine becomes inoperable, production stops, and the manufacturer incurs a breakdown cost. At the beginning of each production period, the manufacturer needs to decide whether to replace the installed spare part and, if so, which of the spare parts available in inventory to use. Furthermore, the manufacturer must decide whether to place an order for spare parts to replenish inventory and, if so, from which supplier in the available set of trusted suppliers to purchase.
In this thesis, the studied problem is modelled as a Markov Decision Process (MDP), in which the state of the system is defined at the beginning of each period and appropriate actions are decided based upon it. An algorithmic approach based on policy iteration is used to generate the optimal policy. A small example is presented to verify the developed MDP model, and sensitivity analysis is then conducted to investigate the effects of changing key parameters of the studied problem on the generated optimal policy.
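For reference, the following is a minimal sketch of policy iteration for a generic finite MDP with cost minimization. The state space, transition matrices, and cost vectors below are illustrative placeholders, not the production-maintenance model developed in the thesis.

```python
import numpy as np

# Minimal policy iteration sketch for a generic finite MDP (cost minimization).
# P[a] is the |S| x |S| transition matrix under action a, R[a] the per-state
# expected cost vector; both are illustrative placeholders only.

def policy_iteration(P, R, gamma=0.95):
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)          # start with an arbitrary policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = R_pi for the current policy
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: pick the cost-minimizing action in every state
        Q = np.array([R[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = Q.argmin(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy

# Toy 3-state, 2-action instance purely for illustration
P = [np.array([[0.9, 0.1, 0.0], [0.0, 0.8, 0.2], [0.1, 0.0, 0.9]]),
     np.array([[0.5, 0.5, 0.0], [0.3, 0.6, 0.1], [0.2, 0.3, 0.5]])]
R = [np.array([2.0, 5.0, 10.0]), np.array([4.0, 3.0, 6.0])]
print(policy_iteration(P, R))
```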
However, the curse of dimensionality prevents the policy iteration algorithm from handling medium- and large-size instances. Therefore, this thesis proposes an approach based on deep reinforcement learning (DRL), which integrates Q-learning with neural networks (NN) to learn the optimal policy. In order to verify the proposed DRL approach, the small example is solved again, and the results are compared with those of the policy iteration algorithm.
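For illustration, the sketch below shows the core of a Q-learning update with a neural network approximator (in PyTorch) for a cost-minimization setting. The state dimension, action count, network sizes, and hyperparameters are assumptions for illustration only, not the exact DRL design of the thesis.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hypothetical problem size; not the thesis's actual state/action encoding.
state_dim, n_actions = 4, 6
gamma = 0.95

# Q-network and a periodically synchronized target network
q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)
buffer = deque(maxlen=10_000)   # replay buffer of (state, action, cost, next_state)

def select_action(state, epsilon):
    # Epsilon-greedy: explore with probability epsilon, otherwise pick the
    # action with the lowest predicted cost-to-go.
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(q_net(torch.as_tensor(state, dtype=torch.float32)).argmin())

def train_step(batch_size=32):
    if len(buffer) < batch_size:
        return
    batch = random.sample(buffer, batch_size)
    s, a, c, s2 = map(lambda x: torch.as_tensor(x, dtype=torch.float32), zip(*batch))
    # Q(s, a) for the actions actually taken
    q = q_net(s).gather(1, a.long().unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bellman target for cost minimization: cost plus cheapest next action
        target = c + gamma * target_net(s2).min(dim=1).values
    loss = nn.functional.mse_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```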
The comparison of the two techniques showed that both are effective in solving the developed MDP model, with small cost deviations of about 0.2%. For small-size problems, algorithmic approaches are more time-efficient, whereas for larger problems the use of DRL becomes necessary. The challenge with DRL is to tune the parameters of the model in order to achieve the best results. In this thesis, design of experiments is used to select the levels of the different DRL parameters that yield the best results.
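As an illustration of how such an experimental design can be organized, the sketch below enumerates a small full-factorial grid over hypothetical DRL parameter levels; the factors and values shown are assumptions, not the actual design of experiments used in the thesis.

```python
from itertools import product

# Hypothetical factor levels for a full-factorial design over DRL parameters;
# each combination would be trained and evaluated, e.g. by average total cost.
factors = {
    "learning_rate": [1e-4, 1e-3],
    "discount_factor": [0.90, 0.99],
    "hidden_units": [32, 64],
    "batch_size": [32, 64],
}

# Enumerate every combination (2^4 = 16 runs) as one experimental configuration.
for run_id, values in enumerate(product(*factors.values()), start=1):
    config = dict(zip(factors.keys(), values))
    print(run_id, config)
```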