Reinforcement Learning for Shortest Path Problem on Stochastic Time-dependent Road Network
Paper ID: 1971 · Access: attendees only · Updated: 2021-12-03 14:43:49

Presentation start: December 17, 2021, 08:44 (Asia/Shanghai)

Presentation duration: 1 min

Session: [P2] Poster 2021, [P2T1] Track 1: Advanced Transportation Information and Control Engineering


Abstract
Finding the shortest path between two locations on a stochastic time-dependent road network is an important component of vehicle guidance systems. However, traditional heuristic algorithms struggle to handle the complexity and stochasticity of such networks. In this paper, we model the stochastic time-dependent routing problem as a Markov decision process and apply several reinforcement learning methods to solve it: Sarsa, Q-learning, and Double Q-learning. Sarsa iterates using the Q value of the action actually taken rather than the maximum value function used by Q-learning, while Double Q-learning uses two estimators to compute the value function, which overcomes Q-learning's tendency to overestimate. Evaluated on ten stochastic time-dependent road networks, Double Q-learning outperforms the other methods. Finally, the optimal paths obtained at different training epochs are visualized to show the agent's exploration process.
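To make the Double Q-learning idea concrete, the following is a minimal sketch, not the paper's implementation: two tables are kept, a coin flip chooses which one to update, the updated table selects the greedy next action, and the other table evaluates it. The four-node network, its travel-time distributions, and all function and parameter names are hypothetical stand-ins for the paper's stochastic time-dependent road networks.

```python
import random
from collections import defaultdict

def double_q_update(Q_A, Q_B, s, a, r, s_next, actions, alpha=0.1, gamma=1.0):
    """One Double Q-learning step: flip a coin to pick which table to update.

    The updated table selects the greedy next action; the *other* table
    evaluates it, countering the max-operator overestimation bias of
    plain Q-learning.
    """
    learner, evaluator = (Q_A, Q_B) if random.random() < 0.5 else (Q_B, Q_A)
    if actions(s_next):                      # non-terminal state: bootstrap
        a_star = max(actions(s_next), key=lambda a2: learner[(s_next, a2)])
        target = r + gamma * evaluator[(s_next, a_star)]
    else:                                    # terminal state (destination)
        target = r
    learner[(s, a)] += alpha * (target - learner[(s, a)])

# Hypothetical toy network (not from the paper): a state is a node, an action
# is the next node to move to, and link travel times are stochastic (uniform
# noise around a mean), standing in for stochastic time-dependent link costs.
GRAPH = {0: [1, 2], 1: [3], 2: [3], 3: []}          # node -> successor nodes
MEAN_TIME = {(0, 1): 5.0, (0, 2): 2.0, (1, 3): 1.0, (2, 3): 2.5}

def actions(s):
    return GRAPH[s]

def sample_time(s, s_next):
    return MEAN_TIME[(s, s_next)] * random.uniform(0.8, 1.2)

def train(episodes=3000, eps=0.2, start=0):
    """Run epsilon-greedy episodes from `start` to the terminal node 3."""
    Q_A, Q_B = defaultdict(float), defaultdict(float)
    for _ in range(episodes):
        s = start
        while actions(s):
            if random.random() < eps:        # explore
                a = random.choice(actions(s))
            else:                            # act greedily on Q_A + Q_B
                a = max(actions(s), key=lambda a2: Q_A[(s, a2)] + Q_B[(s, a2)])
            r = -sample_time(s, a)           # reward = negative travel time
            double_q_update(Q_A, Q_B, s, a, r, a, actions)
            s = a                            # the chosen action is the next node
    return Q_A, Q_B

random.seed(0)
Q_A, Q_B = train()
# Route 0 -> 2 -> 3 (mean cost 4.5) should beat 0 -> 1 -> 3 (mean cost 6.0),
# so the greedy first move from node 0 should be node 2.
best = max(actions(0), key=lambda a: Q_A[(0, a)] + Q_B[(0, a)])
print(best)
```

Because rewards are negative travel times, maximizing the summed Q tables is equivalent to minimizing expected travel time, so the greedy policy after training traces the shortest expected path.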
Keywords
CICTP
Presenter
Ke Zhang
Tsinghua University

Authors
Ke Zhang, Tsinghua University
Important Dates
  • Conference dates: December 17–20, 2021
  • December 16, 2021: presentation submission deadline
  • December 24, 2021: registration deadline

Organizers
Chinese Overseas Transportation Association
Chang'an University