A FPGA friendly 32 bit RISC-V CPU implementation

tverbeure · 2025-01-26T02:11:05 1737857465

6 years ago, I wrote an in-depth blog post about the design principles of the Vexriscv. It’s unlike any other CPU I’ve seen.

https://tomverbeure.github.io/rtl/2018/12/06/The-VexRiscV-CP...

nynx · 2025-01-26T06:35:36 1737873336

Do you still use spinal? Have there been other advances in HDL that you've seen over the last 6 years?

tverbeure · 2025-01-26T07:37:50 1737877070

Yes, I still use it for all my hobby projects but I don't use any of the advanced techniques that are used in the Vexriscv. My Scale knowledge is way too limited for that. I use SpinalHDL as a more efficient way to write pure RTL.

awjlogan · 2025-01-26T06:40:43 1737873643

This was a really interesting read, thanks.

bri3d · 2025-01-25T18:48:17 1737830897

The most interesting thing about this isn’t that it’s a RISC-V implementation but that it’s written in a Scala HDL language, SpinalHDL. There are quite a few of these now - Chisel (which Spinal forked from long ago), Amaranth (Python), and Clash (Haskell) all come to mind.

fayalalebrun · 2025-01-26T02:37:53 1737859073

SpinalHDL did not fork from Chisel. You might be able to say that it was inspired by Chisel, but it does not share a commit history. See this comment by the author: https://www.reddit.com/r/chisel/comments/4ivevd/comment/d3lj...

bri3d · 2025-01-26T04:13:46 1737864826

Thank you for the correction! My original comment is too old to update. It’s so frequently described as one (for example in https://github.com/SpinalHDL/SpinalHDL/issues/202#issuecomme... ) that I assumed they shared a history.

dailykoder · 2025-01-25T19:50:56 1737834656

Do those languages get used in the industry, except of academia? There are so many HDLs and I am wondering if there is any other benefit of learning any of these, except of possible fun.

IshKebab · 2025-01-26T22:26:18 1737930378

Basically no. Almost everybody uses SystemVerilog. The main issue is that all the simulators only support SystemVerilog so every other HDL is compile-to-SV, and often they output truly awful code that is a nightmare to debug.

Also SV has an absolutely enormous feature set, and often alternative HDLs miss out important parts like support for verification, coverage, formal verification, etc.

Getting away from SV is like getting away from JavaScript. The network effects are insane.

There was an attempt to make a kind of IR for RTL that would break the tie with SV (kind of like WASM has for JS)... I can't remember the name (LL..something?) but it seemed to have died.

Maybe this is similar I'm not sure: https://github.com/llvm/circt

Anyway the only really interesting new HDL I've seen is https://filamenthdl.com/

jeff_ciesielski · 2025-01-25T20:34:55 1737837295

We do a fair bit of FPGA design in SpinalHDL, and have taped out several ASICs with parts of the design done in SpinalHDL at my dayjob.

In general: No, alternative HDLs don't see a lot of use, and I'd argue that we qualify as 'academia' since the ASICs are NIH funded and we tend to work with a lot of academic partners and on low-quantity R&D projects.

Having said that, every time we've deployed SpinalHDL for a commercial client they've been blown away by the results. The standard library, developer ergonomics, test capabilities, and little things like having clock domains as a part of the type system make development so much faster and less error prone that the NRE for doing it in verilog just doesn't make sense.

You get access to the entire Java and Scala ecosystem at elaboration and test time. We deploy ScalaCheck in our test harnesses to automatically generate test cases that can reduce inputs to identify edge cases. It's incredibly powerful.

dailykoder · 2025-01-25T20:47:49 1737838069

Huh, this sounds interesting. Maybe I'll give it a shot. Thanks!

wespiard · 2025-01-25T21:06:20 1737839180

I don't know much about SpinalHDL or Chisel, but one example of an alternate HDL is HardCaml, which is used by JaneStreet for FPGA designs:

https://github.com/janestreet/hardcaml

sdvsfqe · 2025-01-26T00:15:49 1737850549

In hardware everything boils down to volume and NRE.

If the design is low volume then minimizing NRE, which is mostly set by engineering hours, makes sense. At low volume, the semiconductor unit cost is mostly irrelevant so you can potentially use things like SpinalHDL to keep engineering hours down, and therefore potentially save NRE, and eat the higher unit cost which occur due to toolchain inefficiencies.

At high volume NRE is mostly irrelevant and unit cost is everything. So even if a tool or language is hard and annoying to use, if it gives a lower unit cost, you use it. Here you see things like an engineers hand tuning the layout of a single MUX to eek out a bit more of something good in the PPA space.

I only have experience with high volume HW and there something like Chisel or SpinalHDL wouldn't be considered as it just adds complexity to the flow, and makes it hard to do the optimizations that high volume enable us to consider, for a potential benefit we're not interested in.

aseipp · 2025-01-25T22:36:43 1737844603

They're overall more prevalent in the FPGA world, I think. I've used and done several jobs with them (Clash/Haskell, Bluespec, etc) and know others who have, too. But you basically need to know someone or do it yourself. Pretty marginal overall, but IME the results have basically been good (and more fun to write, too.)

rowanG077 · 2025-01-26T02:13:54 1737857634

At lumiguide we use clash for FPGA stuff. It's not perfect but we are very, very happy we didn't go the verilog route. What a horrible experience that is.

almostgotcaught · 2025-01-25T20:12:30 1737835950

no matter what anyone says to you on here (or elsewhere on the blagosphere): no. the answer is absolutely flat out no.

oldgradstudent · 2025-01-25T20:30:43 1737837043

No is a good first approximation.

There is a little bit of industry usage, with the biggest user being SiFive - the founders come from the UC Berkeley group that developed Chisel.

Also, VexRiscv has some industry presence.

almostgotcaught · 2025-01-25T20:43:44 1737837824

> There is a little bit of industry usage, with the biggest user being SiFive

do ask sifive how much they regret that decision though <shrug>

oldgradstudent · 2025-01-25T22:57:05 1737845825

I'm pretty sure they're going to say they don't regret it at all. Either because it's true, or because they are too invested in it.

When I've started doing FPGA consulting a few years ago I've started using Chisel, but eventually had to go back to SystemVerilog due to client reluctance.

I was dramatically more productive with Chisel than with SystemVerilog.

almostgotcaught · 2025-01-25T23:01:15 1737846075

> I'm pretty sure they're going to say they don't regret it at all.

i didn't say that as a supposition - i know that they regret it. the chisel compiler has been an enormous (enormous) technical debt/burden for them because of how slow/resource intensive it is.

d_tr · 2025-01-26T08:12:38 1737879158

Interesting. So, do you know what they'd choose now if they started over? SystemVerilog?

sdvsfqe · 2025-01-26T00:29:12 1737851352

> how slow/resource intensive it is

compared to what?

It's not like all the other EDA tools are really fast or not resource intensive. For smaller design firms I would think things like FireSim [1] would be a significant advantage.

I can imagine it is a disadvantage in other ways, i.e. it's only possible to do single phase positive edge synchronous design, which could be an impediment to high performance digital design.

But I wouldn't imagine that scala performance is particularly significant.

[1] https://fires.im

almostgotcaught · 2025-01-26T00:58:22 1737853102

It's pointless to argue with people on hn because you'll tell them "I have cold hard experience" and they'll respond with hype links and conjecture.

> But I wouldn't imagine that scala performance is particularly significant.

Imagine all you'd like - reality is much less imaginative though.

oldgradstudent · 2025-01-26T21:10:16 1737925816

Just curious, do they have a migration plan? Have they started new designs using Verilog/SystemVerilog/VHDL?

foota · 2025-01-25T21:17:08 1737839828

This is off topic, but I recognize your username from a thread a couple weeks ago but your account is relatively new. Out of curiosity did you just find hacker news and decide to make an account, or is this a new alias and you have an older account? I guess I'd be surprised if there's still new people joining lol.

almostgotcaught · 2025-01-25T22:20:33 1737843633

my account is 8 months old? also i'm sure new people join hn all the time because you know... new people are being born all the time...

foota · 2025-01-25T23:50:57 1737849057

I was just wondering. It seemed like you had a perspective that I wouldn't associate with someone new to the industry.

skissane · 2025-01-26T00:04:12 1737849852

I have no idea who they are, but I think you'd find there are lots of "old-timers" (even notable ones) who've never had HN accounts. Any of them could decide to join at any moment

bri3d · 2025-01-25T23:14:05 1737846845

In the ASIC space, sure, I don't think any of these tools scale in the way that most ASIC companies have forced their "traditional" HDL toolchains to scale.

In the FPGA-based space (accelerators, RF/SDR, trading), hard disagree. There's plenty of boutique FPGA work going on in these.

davidjade · 2025-01-25T18:41:43 1737830503

There is a successor project as well: https://github.com/SpinalHDL/VexiiRiscv

15155 · 2025-01-25T19:19:29 1737832769

And a spiritual sibling: https://github.com/SpinalHDL/NaxRiscv

disdi · 2025-01-25T22:03:14 1737842594

Latest presentation on this topic by main developer:

https://youtu.be/dR_jqS13D2c?si=bbZf7Oo5a3JsINYs

phendrenad2 · 2025-01-26T04:23:52 1737865432

What does "FPGA friendly" mean? I tried to figure it out from the README, which says "Implement multiplication using multiple sub multiplication operations in parallel ("FPGA friendly")". Put another way: what is the FPGA-UNfriendly way to do multiplication?

phire · 2025-01-26T07:12:22 1737875542

Most FPGAs have converged on 18 bit wide multiplier blocks. If you ask for a 64 bit multiplier, the router will automatically chain together four multiplier blocks and add them together in a single cycle, which is really going to hurt your maximum clock speed (fmax).

VexRiscv is aware of this unofficial standard, and asks for four 16x64 multiplies and adds the result together on the next cycle. This produces a much better fmax on FPGAs, but if you were targeting an ASIC, you would be better off asking for a 64-bit multiplier, or not trying for a single-cycle multiply.

Most modern CPUs tend to target a 3 cycle pipelined multiplication, which means 22-bit wide multipliers. Doing this on an FPGA each 22-bit multiplication would require two 18-bit multiplier blocks, for a total of six multipliers, wasting more resources.

-----

In general, "FPGA friendly" means optimizing your design to take advantage of the things which are cheap on FPGAs, like the 18-bit wide multipliers and the block ram. Such designs tend to run faster on FPGAs and use less resources, but it's wasteful to synthesize them to ASICs.

cjbillington · 2025-01-26T07:20:28 1737876028

It took me to the end of your comment to realise the crucial bit I was missing: that they're talking about implementing the CPU on an FPGA.

As opposed to, say, interfacing with an FPGA which could be totally different way to be "FPGA-friendly".

le-mark · 2025-01-25T18:14:37 1737828877

Fits on an ICE-40 fpga, that’s not nothing!

IshKebab · 2025-01-25T18:30:23 1737829823

How does it compare to the many other RISC-V CPUs?

tliltocatl · 2025-01-25T20:40:35 1737837635

The code is much more readable and modular than you typical verilog dump, so it's probably the best CPU for microarchitecture experimentation. Source: did my master thesis prototyping a specialized cache. Started on Rocket Core, which turned out to be a total mess with all of the pipeline in a single module, basically impossible to introduce a new datapath without rewriting everything. Vex was a breath of fresh air. Spinal is also awesome, lots of QoL features for separating concerns between modules in a way that's impossible on Verilog and fixes lots of rough edges of Chisel.

Performance on FPGA was better than most open-source RISC-V cores out there as of 2020. Rocket might have been better on silicon, but that's it. I haven't looked much into it since then through.

vollbrecht · 2025-01-25T19:00:58 1737831658

I find it fascinating, calling a CPU implementation FPGA friendly. I don't know why everybody always wants to run soft CPU's on an FPGA.

I mean I understand that its nice for the development stage of a CPU, but for all practical purposes, a FPGA is a thing where you can do hyper specialized things in massively parallel fashion, and essentially don't do something to run general purpose code.

I am not saying that people should stop doing this things, everybody is free to do what they want, still i don't understand why most of FPGA talks are about soft CPU's when the really interesting stuff is something completely different.

15155 · 2025-01-25T19:18:40 1737832720

This has nothing to do with performance or hardware CPU development.

FPGA-specific soft cores like VexRiscv and NaxRiscv are immensely useful for anything involving state machine logic or glue code that you do not want to implement in-fabric.

Peripherals like on-chip MMCMs/PLLs, on-board I2C and SPI peripherals, etc. with complicated initialization routines or communication flows or sequencing are very easily handled in a soft CPU.

Soft CPUs can also be used like high-powered programmable in-circuit logic analyzers: without rebuilding a potentially massive FPGA bitstream, you can probe/observe/inspect, inject/alter any signals or buses you pipe to the CPU. VexRiscv is far more pleasant to use than any vendor ILA IP.

Soft CPUs also normally utilize FPGA LUTRAM/BRAM resources, enabling whatever program to run with hard real-time latency consistency.

sdvsfqe · 2025-01-26T00:38:44 1737851924

Also, soft cpus without strict performance and power requirements are really easy to implement with modern toolchains. In one quarter you can take an undergraduate who doesn't even know digital design and have them make a cpu core as a final project, and they can make a decent one.

HW is actually really hard. If you can use a soft core to simplify the overall design and suck up a bunch of peripheral logic it's probably a good idea. Then the engineers can spend their time focusing on getting the hard parts of the design correct.

almostgotcaught · 2025-01-25T20:15:10 1737836110

this guy gets it - softcores are for giving people access to your IP without forcing them write their own RTL. it's literally the exact same thing as an embedded scripting language (i.e., vm interpreter.......) in a C/C++ program.

wildzzz · 2025-01-25T19:56:00 1737834960

Sometimes you just need a microcontroller to handle some tasks that would be immensely complicated to do yourself. Or maybe you want custom instructions that make use of extra logic on the fabric. I use a RISC-V in my design but most of the chip is dedicated as a modem, I just needed a way to easily send commands and receive telemetry without bit banging hundreds of pins. Another nice thing about using a CPU is that the logic blocks are reusable. I could write a bunch of verilog to receive data from an ADC once a second, average some samples, convert to units, and then send them out as ASCII but now those logic blocks are sitting idle 99.9% of the time. Instead I could have the CPU convert the data and then get back to work on other any other tasks using the same logic blocks. It's certainly possible to reduce area usage by trying to reuse blocks for other functions but it's a lot more work for the engineer.

You wouldn't have only a soft CPU on an FPGA, that's a waste of time and money.

tverbeure · 2025-01-26T02:06:26 1737857186

I have designed large FPGAs with 5 soft CPUs. They’re immensely useful as programmable replacements of very complex FSMs and their use of FPGA resources is marginal.

One example: our vendor had an FSM to quickly save and restore trained SERDES parameters. We replaced that with a tiny CPU and it allowed us to make training decisions that could be changed without resynthesis.

Similarly, Altera themselves use a Nios CPU for their DDR4 DRAM controller IO training.

There are so many other possibilities. In one case, we fixed a corner case bug in a HW I2C controller by bit-banging the protocol.

Soft CPUs cost a few thousands gates, one or 2 BRAMs which is totally fine if you have some left. It’s no different than having tons of tiny controller CPUs in large ASICs (which literally everybody does these days.)

What makes the Vexriscv (and Nios and Microblaze) FPGA friendly is that they don’t require zero latency access to the register file. You can use BRAM instead. FF based register files are murder on the FPGA routing fabric.

shadowpho · 2025-01-25T19:55:36 1737834936

Fpga are great at some things but they are pretty difficult with others. There are many applications where you can use the CPU as the control block while keeping fpga for other reasons. Furthermore keeping the cpu inside the fpga means you get to have direct access to many knobs and settings.

For example I worked on a project that used fpga to mux audio/video. It simply redirected digital pins. However the internal cpu was used to control/decide what to mux, when and how.

It could’ve been all done in fpga but that would’ve been more work (difficult/tricky/inflexible). Instead we had a small core that run a simple program and communicated to external world.

Philpax · 2025-01-25T21:41:49 1737841309

Aside from the legitimate use cases mentioned by the sibling comments, it's just fun to run a soft CPU. There's something that tickles me about setting up a computer that can run real software, especially if it's one you've had some part in designing.

nomad86 · 2025-01-25T18:34:13 1737830053

It reminded me of how, a long time ago, FPGAs were used in Bitcoin mining.

TheAmazingRace · 2025-01-25T18:42:04 1737830524

I thought it was ASICs?

zoenolan · 2025-01-25T18:49:20 1737830960

CPU to GPU to FPGA to ASIC

All the acronyms

dboreham · 2025-01-25T18:30:57 1737829857

Saw .scala files and thought "some verilog thing that uses that extension". Nope. Lots of Scala. That's not what I expected!