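Frontiers Science Center for Smart High-speed Railway System, Beijing Jiaotong University, Beijing 100044, China
GUO Bin (1988-), male, born in Tangshan, Hebei; lecturer, PhD; research interests include rail transport organization, machine learning, and optimization methods; E-mail: guobin@bjtu.edu.cn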
Received: 2025-12-04
Published online first: 2026-02-11
SU Xiangyu, YUE Yixiang, GUO Bin, et al. Optimization method for rolling stock circulation planning based on an improved PPO algorithm[J/OL]. Journal of Railway Science and Engineering, 2026, 1-14. DOI: 10.19713/j.cnki.43-1423/u.T20251880.
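As a key part of railway operations organization, the electric multiple unit (EMU) circulation plan directly affects transport efficiency and overall operating cost. To address the high-dimensional combinatorial state space, complex temporal dependencies, and multiple operational constraints of this optimization problem, a hybrid optimization method combining deep reinforcement learning with heuristic search is designed. The method first formulates EMU circulation optimization as a Markov decision process: the state space describes the remaining train tasks, the current circulation state, and depot resources; the action space defines train-task selection and circulation construction operations; and the reward function jointly accounts for the number of EMUs used, the reasonableness of train connections, and key constraints such as maintenance, so as to capture the optimization objective comprehensively. To model temporal dependencies effectively, a proximal policy optimization (PPO) algorithm based on a long short-term memory (LSTM) network is introduced, which strengthens the policy network's ability to model dynamic task sequences and improves global exploration and the stability of policy convergence. Because reinforcement learning tends to become trapped in local optima early in training, a simulated annealing mechanism is introduced as a local perturbation operator that perturbs and repairs the initial circulation plan output by the policy, with temperature control enabling escape from local optima. Numerical experiments on a case from the Northeast China railway network show that the hybrid algorithm significantly outperforms conventional deep reinforcement learning methods (such as PPO and DQN) and the commercial optimization solver Gurobi in the number of EMUs used and in overall solution efficiency. The results show that the method can generate high-quality circulation plans in a relatively short time, improving the intelligence and economic performance of railway transport organization.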
Rolling stock circulation planning is a fundamental component of high-speed railway operations, directly influencing transport efficiency and overall operating costs. To address the challenges arising from high-dimensional combinatorial states, complex temporal dependencies, and multiple operational constraints, this study proposes a hybrid intelligent optimization method that integrates deep reinforcement learning with heuristic search. The circulation planning problem is formulated as a Markov Decision Process, where the state space describes the remaining train tasks, current circulation structure, and depot resource status, while the action space defines task selection and circulation construction. An LSTM-enhanced Proximal Policy Optimization (PPO) algorithm is developed to capture temporal dependencies within task sequences and improve policy stability and global exploration. Furthermore, a simulated annealing mechanism is incorporated as a local search module to refine initial solutions and avoid premature convergence by probabilistically accepting perturbed solutions under a temperature-controlled scheme. Numerical experiments based on the Northeast China high-speed railway network demonstrate that the proposed method outperforms conventional PPO, DQN, and the commercial solver Gurobi in terms of rolling stock usage and computational efficiency. The results confirm that the hybrid approach can generate high-quality circulation plans within reasonable computational time, contributing to improved intelligence and economic performance in high-speed railway operations.
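
The temperature-controlled acceptance step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trip` record, the 20-minute `MIN_TURNAROUND`, the single-trip relocation move, the cooling schedule, and the toy timetable are all assumptions made for illustration. Maintenance and depot constraints are omitted, and a trivial "one trainset per trip" plan stands in for the initial plan that the LSTM-PPO policy would supply.

```python
import math
import random
from typing import List, NamedTuple


class Trip(NamedTuple):
    origin: str
    dest: str
    dep: int  # departure time, minutes after midnight
    arr: int  # arrival time, minutes after midnight


MIN_TURNAROUND = 20  # assumed minimum connection time (minutes)


def chain_feasible(chain: List[Trip]) -> bool:
    """A chain is feasible if consecutive trips connect in space and time."""
    ordered = sorted(chain, key=lambda t: t.dep)
    return all(
        a.dest == b.origin and b.dep - a.arr >= MIN_TURNAROUND
        for a, b in zip(ordered, ordered[1:])
    )


def anneal(trips: List[Trip], t0: float = 2.0, cooling: float = 0.995,
           steps: int = 5000, seed: int = 0) -> List[List[Trip]]:
    """Reduce the number of chains (trainsets) by relocating single trips.

    Assumes `trips` is non-empty.
    """
    rng = random.Random(seed)
    plan = [[t] for t in trips]  # placeholder start: one trainset per trip
    best = [c[:] for c in plan]
    temp = t0
    for _ in range(steps):
        src = rng.randrange(len(plan))
        dst = rng.randrange(len(plan) + 1)  # len(plan) means "open a new chain"
        if dst != src:
            trip = rng.choice(plan[src])
            cand = [c[:] for c in plan] + [[]]
            cand[src].remove(trip)
            cand[dst].append(trip)
            # Only the two modified chains need to be re-checked.
            if chain_feasible(cand[src]) and chain_feasible(cand[dst]):
                cand = [c for c in cand if c]  # drop emptied chains
                delta = len(cand) - len(plan)
                # Metropolis rule: always accept non-worsening moves,
                # accept worsening ones with probability exp(-delta / temp).
                if delta <= 0 or rng.random() < math.exp(-delta / temp):
                    plan = cand
                    if len(plan) < len(best):
                        best = [c[:] for c in plan]
        temp *= cooling  # geometric cooling (an assumed schedule)
    return best


# Hypothetical toy timetable between two stations A and B.
TRIPS = [
    Trip("A", "B", 480, 540),
    Trip("B", "A", 570, 630),
    Trip("A", "B", 660, 720),
    Trip("B", "A", 750, 810),
    Trip("A", "B", 490, 550),
    Trip("B", "A", 580, 640),
]

if __name__ == "__main__":
    for i, chain in enumerate(anneal(TRIPS), start=1):
        print(i, sorted(chain, key=lambda t: t.dep))
```

In this toy timetable two A-to-B trips overlap in time, so no feasible plan can use fewer than two trainsets. Early on, the high temperature lets the search occasionally open a new chain, which is the kind of move that helps it leave a poor local arrangement. As the temperature falls, only non-worsening relocations are accepted.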