fix(StoreQueue): add nc_req_ack state to avoid duplicated request (#4625)
Bug Discovery
The Svpbmt CI of master at https://github.com/OpenXiangShan/XiangShan/actions/runs/14639358525/job/41077890352 reported the following implicit output error:
check_misa_h
PASSED
test_pbmt_perf
TEST: read 4 Bytes 1000 times
Svpbmt IO test...
addr:0x10006d000
start: 8589, end: 59845, ticks: 51256
Svpbmt NC test...
addr:0x10006c000
start: 67656, end: 106762, ticks: 39106
Svpbmt NC OUTSTANDING test...
smblockctl = 0x3f7
addr:0x10006c000
start: 118198, end: 134513, ticks: 16315
Svpbmt PMA test...
addr:0x100000000
start: 142696, end: 144084, ticks: 1388
PASSED
test_pbmt_ldld_violate
ERROR: untested exception! cause NO: 5 (mhandler, 219)
[FORK_INFO pid(1251274)] clear processes...
Core 0: HIT GOOD TRAP at pc = 0x80005d64
Core-0 instrCnt = 174,141, cycleCnt = 240,713, IPC = 0.723438
Design Background
For NC (Non-Cacheable) store operations, the handshake logic between the StoreQueue and Uncache is as follows:
Without Outstanding Enabled:
In the nc_idle state, when an executable nc store is encountered, it transitions to the nc_req state. After req.fire, it moves to the nc_resp state. Once resp.fire is triggered, it returns to nc_idle, and both rdataPtrExtNext and deqPtrExtNext are updated to handle the next request.
With Outstanding Enabled:
In the nc_idle state, upon encountering an executable nc store, it transitions to the nc_req state. After req.fire, it returns to nc_idle (Point A). Once the request is fully written into Uncache, i.e., upon receiving ncSlaveAck (Point B), it updates rdataPtrExtNext and deqPtrExtNext to handle the next request.
Bug Description
In the above scenario, the transition to nc_idle at Point A occurs two cycles earlier than Point B, so rdataPtr at Point A still points to the location of the previous uncache request (let’s call it NC1). The condition for sending an uncache request is still met at this moment, so a duplicate uncache request for NC1 is issued at Point A.
By the time Point B occurs, two identical requests for NC1 have already been sent. At Point B, rdataPtr is updated to proceed to the next request. However, when the second ncSlaveAck for NC1 returns, rdataPtr is updated again, so it moves forward twice for a single request. As a result, one of the subsequent requests is never executed.
Bug Fix
Given that multiple cycles are required to ensure that a request is fully written to Uncache, a new state called nc_req_ack is introduced. The revised handshake logic with outstanding enabled is as follows:
In the nc_idle state, when an executable nc store is encountered, it transitions to the nc_req state. After req.fire, it moves to the nc_req_ack state. Once the request is fully written to Uncache and ncSlaveAck is received, it transitions back to nc_idle, and updates rdataPtrExtNext and deqPtrExtNext to handle the next request.
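The difference between the two outstanding-enabled handshakes can be illustrated with a small cycle-level model. This is only a behavioral sketch in Python, not the actual Chisel code in StoreQueue: the function simulate, the arrival list, the ACK_LATENCY constant (set to 2 to mirror the two-cycle gap mentioned above) and the always-ready Uncache are assumptions made for illustration, and deqPtr is omitted because it advances together with rdataPtr.

```python
from collections import deque

ACK_LATENCY = 2  # assumed number of cycles from req.fire to ncSlaveAck


def simulate(arrival, use_req_ack, max_cycles=64):
    """Toy model of the NC store handshake with outstanding enabled.

    arrival[i] is the (hypothetical) cycle at which store i becomes
    executable. Returns (sent, rdata_ptr): the store indices sent to
    Uncache in order, and the final read pointer.
    """
    state, rdata_ptr = "nc_idle", 0
    sent, ack_cycles = [], deque()

    for cycle in range(max_cycles):
        # Registered semantics: decisions use the values held at the
        # start of the cycle; updates become visible in the next cycle.
        next_state, next_ptr = state, rdata_ptr

        # Point B: ncSlaveAck advances the pointer.
        ack = bool(ack_cycles) and ack_cycles[0] == cycle
        if ack:
            ack_cycles.popleft()
            next_ptr = rdata_ptr + 1

        if state == "nc_idle":
            executable = (rdata_ptr < len(arrival)
                          and cycle >= arrival[rdata_ptr])
            if executable:
                next_state = "nc_req"
        elif state == "nc_req":
            # req.fire: Uncache is assumed to be always ready.
            sent.append(rdata_ptr)
            ack_cycles.append(cycle + ACK_LATENCY)
            # Old behavior returns to nc_idle at once (Point A);
            # the fix waits in nc_req_ack until the ack arrives.
            next_state = "nc_req_ack" if use_req_ack else "nc_idle"
        elif state == "nc_req_ack" and ack:
            next_state = "nc_idle"

        state, rdata_ptr = next_state, next_ptr

    return sent, rdata_ptr


if __name__ == "__main__":
    stores = [0, 0, 20]  # the third store becomes executable late
    print("without nc_req_ack:", simulate(stores, use_req_ack=False))
    print("with nc_req_ack:   ", simulate(stores, use_req_ack=True))
```

In this model, the run without nc_req_ack sends store 0 twice, and the extra acknowledgement pushes the pointer past store 2 before it becomes executable, so store 2 is never sent. With nc_req_ack, each store is sent exactly once.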
XiangShan
XiangShan (香山) is an open-source high-performance RISC-V processor project.
The Chinese version of this README is available here.
Documentation
XiangShan’s documentation is available at docs.xiangshan.cc.
The microarchitecture documentation on docs.xiangshan.cc is currently outdated for the latest version (Kunminghu). An updated version is in progress.
XiangShan User Guide has been published separately. You can find it at XiangShan-User-Guide/releases.
Publications
MICRO 2022: Towards Developing High Performance RISC-V Processors Using Agile Methodology
Our paper introduces XiangShan and the practice of agile development methodology on high performance RISC-V processors. It covers some representative tools we have developed and used to accelerate the chip development process, including design, functional verification, debugging, performance validation, etc. This paper is awarded all three available badges for artifact evaluation (Available, Functional, and Reproduced).
Paper PDF | IEEE Xplore | BibTeX | Presentation Slides | Presentation Video
Follow us
Wechat/微信:香山开源处理器
Zhihu/知乎:香山开源处理器
Weibo/微博:香山开源处理器
You can contact us through our mailing list. All mails from this list will be archived here.
Architecture
The first stable micro-architecture of XiangShan is called Yanqihu (雁栖湖) and is on the yanqihu branch, which has been developed since June 2020.
The second stable micro-architecture of XiangShan is called Nanhu (南湖) and is on the nanhu branch.
The current version of XiangShan, also known as Kunminghu (昆明湖), is still under development on the master branch.
The micro-architecture overview of Kunminghu (昆明湖) is shown below.
Sub-directories Overview
Some of the key directories are shown below.
IDE Support
bsp
IDEA
Generate Verilog
Run make verilog to generate Verilog code. The output file is build/XSTop.v.
Refer to Makefile for more information.
Run Programs by Simulation
Prepare environment
Set NEMU_HOME to the absolute path of the NEMU project.
Set NOOP_HOME to the absolute path of the XiangShan project.
Set AM_HOME to the absolute path of the AM project.
Install mill. Refer to the Manual section in this guide.
Run make init to initialize submodules.
Run with simulator
Run make emu to build the C++ simulator ./build/emu with Verilator.
Refer to ./build/emu --help for run-time arguments of the simulator.
Refer to Makefile and verilator.mk for more information.
Example:
Troubleshooting Guide
Acknowledgement
The implementation of XiangShan is inspired by several key papers. These papers are listed in the XiangShan documentation; see Acknowledgements. We strongly encourage more academic innovations to be built on XiangShan in the future.
LICENSE
Copyright © 2020-2025 Institute of Computing Technology, Chinese Academy of Sciences.
Copyright © 2021-2025 Beijing Institute of Open Source Chip
Copyright © 2020-2022 by Peng Cheng Laboratory.
XiangShan is licensed under Mulan PSL v2.