

Technology Group | Review Article | Article ID: igmin210

Exploring Markov Decision Processes: A Comprehensive Survey of Optimization Applications and Techniques

Machine Learning | Robotics | Data Science

Affiliation

    Department of Computer Engineering, Jeju National University, Jejusi 63243, Jeju Special Self-Governing Province, Republic of Korea

Abstract

The Markov decision process (MDP) is a mathematical framework, solvable by dynamic programming, that can be used to formulate and solve optimization problems. It has been applied in areas such as robotics, radar tracking, medical treatment, and decision-making. In the existing literature, researchers typically target only a few application areas of MDPs. This work, by contrast, surveys applications of the Markov decision process across many domains for solving optimization problems. In the survey, we compare MDP-based optimization techniques and perform a comparative analysis of work published by other researchers in recent years against a small set of parameters. These parameters cover the problem addressed, the methodology proposed to solve the optimization problem, and the results and outcomes of the optimization technique on the specific problem. Reinforcement learning is an emerging machine learning domain built on the Markov decision process. We conclude that the MDP-based approach is most widely used when an agent in some environment must decide, from its current state, which action to take to move to the next state.
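Since the survey's central point is that an MDP agent chooses an action in its current state so as to move to the next state, a small worked example may help. The sketch below is a minimal value-iteration solver for a finite MDP, assuming NumPy; the two-state transition tensor P, reward matrix R, and discount factor are hypothetical illustration values, not taken from any of the surveyed papers. Each sweep applies the Bellman optimality backup V(s) ← max_a [R(s,a) + γ Σ_s' P(s'|s,a) V(s')] until convergence.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-8):
    """Solve a finite MDP by value iteration.

    P: transition tensor of shape (S, A, S); P[s, a, s'] = Pr(s' | s, a)
    R: reward matrix of shape (S, A); R[s, a] = expected immediate reward
    gamma: discount factor in [0, 1)
    Returns the converged value function V and a greedy policy.
    """
    S, A, _ = P.shape
    V = np.zeros(S)
    while True:
        # Bellman optimality backup:
        # Q[s, a] = R[s, a] + gamma * sum_s' P[s, a, s'] * V[s']
        Q = R + gamma * (P @ V)        # shape (S, A)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    policy = Q.argmax(axis=1)          # greedy action in each state
    return V, policy

# Toy two-state, two-action MDP (hypothetical numbers for illustration only).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.1, 0.9]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
V, policy = value_iteration(P, R)
print("Converged values:", V)
print("Greedy policy:", policy)
```

Running the sketch prints the converged state values and the greedy action in each state. This model-based backup is what distinguishes dynamic programming from the reinforcement-learning methods the abstract mentions, which estimate the same quantities when the transition probabilities are unknown.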

