Yanqin Li

fix(StoreQueue): add nc_req_ack state to avoid duplicated request (#4625)

Bug Discovery

The Svpbmt CI run on master at https://github.com/OpenXiangShan/XiangShan/actions/runs/14639358525/job/41077890352 reported the following error, which is visible only in the test output (see test_pbmt_ldld_violate):

check_misa_h                                                          PASSED
test_pbmt_perf                                                        
TEST: read 4 Bytes 1000 times

Svpbmt IO test...
addr:0x10006d000
start: 8589, end: 59845, ticks: 51256

Svpbmt NC test...
addr:0x10006c000
start: 67656, end: 106762, ticks: 39106

Svpbmt NC OUTSTANDING test...
smblockctl = 0x3f7
addr:0x10006c000
start: 118198, end: 134513, ticks: 16315

Svpbmt PMA test...
addr:0x100000000
start: 142696, end: 144084, ticks: 1388
PASSED
test_pbmt_ldld_violate                                                ERROR: untested exception! cause NO: 5
 (mhandler, 219)
[FORK_INFO pid(1251274)] clear processes...
Core 0: HIT GOOD TRAP at pc = 0x80005d64
Core-0 instrCnt = 174,141, cycleCnt = 240,713, IPC = 0.723438

Design Background

For NC (Non-Cacheable) store operations, the handshake logic between the StoreQueue and Uncache is as follows:

  1. Without outstanding enabled:
    In the nc_idle state, when an executable NC store is encountered, the state machine transitions to nc_req. After req.fire, it moves to nc_resp. Once resp.fire occurs, it returns to nc_idle, and both rdataPtrExtNext and deqPtrExtNext are updated to handle the next request.

  2. With outstanding enabled:
    In the nc_idle state, when an executable NC store is encountered, the state machine transitions to nc_req. After req.fire, it returns to nc_idle (Point A). Once the request has been fully written into Uncache, i.e., when ncSlaveAck is received (Point B), rdataPtrExtNext and deqPtrExtNext are updated to handle the next request.
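
The two handshake modes above can be summarised as a small transition function. This is a behavioural sketch in Python, not the Chisel code in StoreQueue: the function name `step`, its parameters, and the `advance_ptrs` flag (standing for the update of rdataPtrExtNext and deqPtrExtNext) are assumptions made for illustration.

```python
from enum import Enum


class NcState(Enum):
    NC_IDLE = "nc_idle"
    NC_REQ = "nc_req"
    NC_RESP = "nc_resp"


def step(state, outstanding, nc_store_ready=False, req_fire=False,
         resp_fire=False, nc_slave_ack=False):
    """Return (next_state, advance_ptrs) for one cycle of the original handshake.

    Hypothetical model: advance_ptrs stands for updating rdataPtrExtNext and
    deqPtrExtNext; the signal names mirror the prose, not the RTL.
    """
    advance_ptrs = False
    if state is NcState.NC_IDLE:
        next_state = NcState.NC_REQ if nc_store_ready else NcState.NC_IDLE
    elif state is NcState.NC_REQ:
        if not req_fire:
            next_state = NcState.NC_REQ
        elif outstanding:
            next_state = NcState.NC_IDLE      # Point A
        else:
            next_state = NcState.NC_RESP
    else:  # NC_RESP, only reached when outstanding is disabled
        next_state = NcState.NC_IDLE if resp_fire else NcState.NC_RESP
        advance_ptrs = resp_fire
    if outstanding and nc_slave_ack:          # Point B, independent of state
        advance_ptrs = True
    return next_state, advance_ptrs
```

Note that with outstanding enabled the pointer update is decoupled from the state: nothing in the state machine itself records that a request is still waiting for ncSlaveAck.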

Bug Description

In the outstanding-enabled case above, the return to nc_idle at Point A happens two cycles before Point B. At Point A, rdataPtr therefore still points to the entry of the previous uncache request (call it NC1). The condition for sending an uncache request is still met at that moment, so a duplicate uncache request for NC1 is issued.

By the time Point B occurs, two identical requests for NC1 have already been sent. At Point B, rdataPtr advances to the next request. When the second ncSlaveAck for NC1 returns, rdataPtr advances again, so it moves forward twice for a single request. As a result, one of the subsequent requests is never executed.
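
The failure can be reproduced in a small cycle-by-cycle model. The sketch below is a simplified Python simulation, not the RTL: it assumes req.fire succeeds in the first nc_req cycle, that ncSlaveAck arrives a fixed `ack_latency` cycles after the request, and that register updates take effect in the next cycle. The function name and the `ready_cycle` list (the cycle at which each store becomes executable) are hypothetical.

```python
def simulate_pre_fix(ready_cycle, ack_latency=2, cycles=20):
    """Model the outstanding-enabled handshake before the fix.

    ready_cycle[i] is the (hypothetical) cycle at which store i becomes
    executable. Returns (sent, rdata_ptr): the queue indices of the issued
    requests, in order, and the final read pointer.
    """
    state = "nc_idle"
    rdata_ptr = 0
    sent = []
    acks_due = []  # cycles at which an ncSlaveAck arrives
    for cycle in range(cycles):
        can_issue = (rdata_ptr < len(ready_cycle)
                     and cycle >= ready_cycle[rdata_ptr])
        next_state, next_ptr = state, rdata_ptr
        if state == "nc_idle":
            if can_issue:
                next_state = "nc_req"
        else:  # nc_req: req.fire is assumed to succeed immediately
            if can_issue:
                sent.append(rdata_ptr)
                acks_due.append(cycle + ack_latency)
            next_state = "nc_idle"      # Point A: back to idle right away
        if cycle in acks_due:           # Point B: ack advances the pointer
            next_ptr = rdata_ptr + 1
        state, rdata_ptr = next_state, next_ptr
    return sent, rdata_ptr


# NC1 (index 0) is ready at once; a second store (index 1) is ready at cycle 10.
sent, ptr = simulate_pre_fix([0, 10])
print(sent, ptr)  # prints: [0, 0] 2
```

In this run the request for index 0 is sent twice, both acknowledgements advance the pointer, and the pointer has already moved past index 1 before that store becomes executable, so it is never issued.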

Bug Fix

Because several cycles are needed before a request is fully written into Uncache, a new state, nc_req_ack, is introduced. The revised handshake logic with outstanding enabled is as follows:

In the nc_idle state, when an executable NC store is encountered, the state machine transitions to nc_req. After req.fire, it moves to nc_req_ack. Once the request has been fully written into Uncache and ncSlaveAck is received, it transitions back to nc_idle and updates rdataPtrExtNext and deqPtrExtNext to handle the next request.
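
The effect of the extra state can be checked with a simplified cycle-by-cycle model. As before, this is a Python sketch under stated assumptions rather than the actual Chisel change: req.fire is assumed to succeed in the first nc_req cycle, ncSlaveAck arrives a fixed `ack_latency` cycles later, and `ready_cycle` (the cycle at which each store becomes executable) is a hypothetical input.

```python
def simulate_with_req_ack(ready_cycle, ack_latency=2, cycles=20):
    """Model the outstanding-enabled handshake with the nc_req_ack state.

    Returns (sent, rdata_ptr): the queue indices of the issued requests,
    in order, and the final read pointer.
    """
    state = "nc_idle"
    rdata_ptr = 0
    sent = []
    ack_cycle = None  # cycle at which the pending ncSlaveAck arrives
    for cycle in range(cycles):
        can_issue = (rdata_ptr < len(ready_cycle)
                     and cycle >= ready_cycle[rdata_ptr])
        next_state, next_ptr = state, rdata_ptr
        if state == "nc_idle":
            if can_issue:
                next_state = "nc_req"
        elif state == "nc_req":
            # req.fire is assumed to succeed immediately
            sent.append(rdata_ptr)
            ack_cycle = cycle + ack_latency
            next_state = "nc_req_ack"   # wait here instead of going idle
        else:  # nc_req_ack
            if cycle == ack_cycle:      # ncSlaveAck received
                next_ptr = rdata_ptr + 1
                next_state = "nc_idle"
        state, rdata_ptr = next_state, next_ptr
    return sent, rdata_ptr


# Same scenario as the bug: index 0 ready at once, index 1 ready at cycle 10.
sent, ptr = simulate_with_req_ack([0, 10])
print(sent, ptr)  # prints: [0, 1] 2
```

Because the state machine cannot return to nc_idle until the acknowledgement arrives, the pointer has always advanced before the next request condition is evaluated, and each store is issued exactly once.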


XiangShan

XiangShan (香山) is an open-source high-performance RISC-V processor project.

The Chinese README is available here.

Documentation

XiangShan’s documentation is available at docs.xiangshan.cc.

The microarchitecture documentation on docs.xiangshan.cc is currently outdated for the latest version (Kunminghu). An updated version is in progress.

The XiangShan User Guide is published separately. You can find it at XiangShan-User-Guide/releases.

Publications

MICRO 2022: Towards Developing High Performance RISC-V Processors Using Agile Methodology

Our paper introduces XiangShan and the practice of agile development methodology on high-performance RISC-V processors. It covers representative tools we have developed and used to accelerate chip development, including design, functional verification, debugging, and performance validation. The paper was awarded all three available artifact evaluation badges (Available, Functional, and Reproduced).

Badges: Artifacts Available, Artifacts Evaluated (Functional), Results Reproduced

Paper PDF | IEEE Xplore | BibTeX | Presentation Slides | Presentation Video

Follow us

Wechat/微信:香山开源处理器

Zhihu/知乎:香山开源处理器

Weibo/微博:香山开源处理器

You can contact us through our mailing list. All mail sent to this list is archived here.

Architecture

The first stable micro-architecture of XiangShan is called Yanqihu (雁栖湖) and is on the yanqihu branch, which has been developed since June 2020.

The second stable micro-architecture of XiangShan is called Nanhu (南湖) and is on the nanhu branch.

The current version of XiangShan, also known as Kunminghu (昆明湖), is still under development on the master branch.

The micro-architecture overview of Kunminghu (昆明湖) is shown below.

[Figure: xs-arch-kunminghu]

Sub-directories Overview

Some of the key directories are shown below.

.
├── src
│   └── main/scala         # design files
│       ├── device         # virtual device for simulation
│       ├── system         # SoC wrapper
│       ├── top            # top module
│       ├── utils          # utility code
│       └── xiangshan      # main design code
│           └── transforms # some useful firrtl transforms
├── scripts                # scripts for agile development
├── fudian                 # floating-point unit submodule of XiangShan
├── huancun                # L2/L3 cache submodule of XiangShan
├── difftest               # difftest co-simulation framework
└── ready-to-run           # pre-built simulation images

IDE Support

bsp

make bsp

IDEA

make idea

Generate Verilog

  • Run make verilog to generate Verilog code. The output file is build/XSTop.v.
  • Refer to Makefile for more information.

Run Programs by Simulation

Prepare environment

  • Set environment variable NEMU_HOME to the absolute path of the NEMU project.
  • Set environment variable NOOP_HOME to the absolute path of the XiangShan project.
  • Set environment variable AM_HOME to the absolute path of the AM project.
  • Install mill. Refer to the Manual section in this guide.
  • Clone this project and run make init to initialize submodules.

Run with simulator

  • Install Verilator, the open-source Verilog simulator.
  • Run make emu to build the C++ simulator ./build/emu with Verilator.
  • Refer to ./build/emu --help for run-time arguments of the simulator.
  • Refer to Makefile and verilator.mk for more information.

Example:

make emu CONFIG=MinimalConfig EMU_THREADS=2 -j10
./build/emu -b 0 -e 0 -i ./ready-to-run/coremark-2-iteration.bin --diff ./ready-to-run/riscv64-nemu-interpreter-so

Troubleshooting Guide

Troubleshooting Guide

Acknowledgement

The implementation of XiangShan is inspired by several key papers. These papers are listed in the XiangShan documentation; see Acknowledgements. We strongly encourage, and look forward to, more academic innovations built on XiangShan in the future.

LICENSE

Copyright © 2020-2025 Institute of Computing Technology, Chinese Academy of Sciences.

Copyright © 2021-2025 Beijing Institute of Open Source Chip

Copyright © 2020-2022 by Peng Cheng Laboratory.

XiangShan is licensed under Mulan PSL v2.
