A Reinforcement Learning Framework for Real-Time Personalized Treatment Planning in Clinical Environments
Received: 26 May 2025 | Revised: 9 June 2025 and 20 June 2025 | Accepted: 21 June 2025 | Online: 12 July 2025
Corresponding author: Leela Prasad Gorrepati
Abstract
This paper presents a Reinforcement Learning (RL) framework for real-time, personalized healthcare, aiming to optimize treatment strategies for individual patients using longitudinal clinical data. The system models the patient-treatment environment as a Partially Observable Markov Decision Process (POMDP), allowing decision-making under uncertainty while integrating multimodal patient information, including Electronic Health Records (EHRs), lab tests, and imaging data. A deep policy network, trained through Proximal Policy Optimization (PPO), dynamically chooses optimal interventions by balancing long-term clinical outcomes, risks, costs, and adherence to medical guidelines. The framework combines a model-based simulator for off-policy data augmentation, auxiliary risk predictors for safety-aware optimization, and interpretability mechanisms to foster clinician trust. Evaluated on more than 50,000 patient records and simulated environments, the proposed model surpassed existing methods in accuracy, F1-score, Receiver Operating Characteristic-Area Under the Curve (ROC-AUC), and treatment efficiency. Specifically, it achieved 93.6% accuracy and a 0.937 F1-score while reducing treatment cycles and enhancing safety compliance. These findings highlight the potential of RL to offer adaptive and interpretable decision support in clinical settings, although more real-world testing is necessary to confirm these results.
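The abstract describes a per-step reward that balances long-term clinical outcomes against risk, cost, and guideline adherence. A minimal sketch of such a composite reward is shown below; all weights, field names, and the function itself are illustrative assumptions, not the paper's actual implementation.

```python
def treatment_reward(outcome_gain, risk_score, cost, guideline_ok,
                     w_outcome=1.0, w_risk=0.5, w_cost=0.1, w_guideline=0.3):
    """Hypothetical scalar reward for one treatment step.

    outcome_gain : improvement in the patient's clinical state (higher is better)
    risk_score   : auxiliary risk-predictor output in [0, 1]
    cost         : resource cost of the chosen intervention
    guideline_ok : True if the action complies with medical guidelines
    """
    reward = w_outcome * outcome_gain
    reward -= w_risk * risk_score   # penalize predicted adverse-event risk
    reward -= w_cost * cost         # discourage expensive interventions
    if not guideline_ok:
        reward -= w_guideline       # penalty for off-guideline actions
    return reward

# Example: a beneficial, low-risk, guideline-compliant action
r = treatment_reward(outcome_gain=0.8, risk_score=0.1, cost=0.2, guideline_ok=True)
```

In a PPO setup, a reward of this shape would be returned by the POMDP environment at each decision step, so the policy network learns to trade off efficacy against safety and cost rather than maximizing outcomes alone.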
Keywords:
reinforcement learning, personalized healthcare, treatment optimization, Deep Q-Network (DQN), clinical decision-making
License
Copyright (c) 2025 Leela Prasad Gorrepati, Ravi Teja Potla

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain the copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) after its publication in ETASR with an acknowledgement of its initial publication in this journal.
