More

craigjb · on Dec 26, 2019

Thanks Chris! I didn't know you hang out on HN too :P

You're slowly becoming a software guy after all...

ChrisGammell · on Dec 26, 2019

Someone had mentioned Contextual Electronics as a way to learn KiCad, so I got a notification about that. But yes, I also hang out here from time to time :-)

craigjb · on Dec 26, 2019

Yeah! Embedded Rust on STM32* right now is pretty good! A lot of consolidation has happened in the device crates and HAL crates recently, so I'm adjusting a bit. But, it's looking good.

I'm actually working on a post about restarting my firmware using the latest and greatest (I started working on this firmware over a year ago).

I used the RFTM real-time OS (RTOS) because it's super light-weight and handy for checking at compile time that my interrupt handlers don't cause conflicts. It's really a nice framework for any event-based firmware (a big chunk).

craigjb · on Dec 26, 2019

Thanks! I did a video talking about the case a bit more: https://craigjb.com/2019/12/06/gameslab-case/

I had it fabricated for me by a company by Front Panel Express. In a DIY audiophile forum I saw people using them, and if they're good enough for audiophile gear, probably good enough for me :P

blakesmith · on Dec 26, 2019

Thanks! Love the video. I love the brushed and beveled panel. I'll definitely look at something like this for some of my designs.

craigjb · on Dec 25, 2019

Thanks! There was definitely a lot of learning involved--I wrote up some here: https://craigjb.com/2019/12/04/gameslab-fails/

To be honest, I really don't know a good KiCad tutorial. I kind of just banged myself against it until I learned. But, I had previous experience with Eagle, Altium, and Cadence Allegro.

Das U-Boot is the embedded bootloader and is pretty standard for embedded ARM systems (and others! https://github.com/u-boot/u-boot/tree/master/arch).

On a dev-board prototype before, I had also used Buildroot (https://buildroot.org) to create the minimal root filesystem image. This time around, I decided to use Debian since it makes installing packages I want much easier. For example, I can just apt-get install the USB wifi firmware packages instead of hunting them down and including in a manually generated image. It's still using a custom compiled kernel though, since I have custom drivers for things like the graphics hardware.

I'm trying to use Rust for as much as possible (I like Rust a lot). The STM32L0 runs rust, I bodged together a framebuffer driver in Rust, and the games in userspace are also in Rust. I'll post about the Rust framebuffer driver at some point.

de6u99er · on Dec 25, 2019

Thanks a lot! I love that you are using Rust.

craigjb · on Dec 25, 2019

Yeah, Rust is great! I've used it on several production projects for work, but not for extensive embedded work before this. Embedded Rust is getting better and better!

Shout out to the Real Time For the Masses project (it's an RTOS) I used: https://github.com/rtfm-rs/cortex-m-rtfm

craigjb · on Dec 25, 2019

I've seen a split. Some people are looking for CPU acceleration. Others say Zynq effectively covers the case where FPGA designs include a soft-core anyway for high-level control, so why not make it a hard block.

craigjb · on Dec 25, 2019

Yeah, I was going to mention the PS3 too. It was the most unique hardware of the generation and definitely took the longest for devs to adjust to and learn to leverage fully.

craigjb · on Dec 25, 2019

Thanks!

I actually didn't originally intend it to be for retro game emulation, more for experimenting with custom hardware for each game and with 3D acceleration hardware. But, the idea of putting retro emulators in the FPGA fabric is so easy.

In fact, over the holidays, I'm working on porting some of the MiSTer FPGA retro emulators to the Gameslab: https://github.com/MiSTer-devel/Main_MiSTer/wiki

Lerc · on Dec 25, 2019

This seems Just the sort of thing that I need to make a physical version of my fantasy console.

It's 64k ram 8-bit avr instruction set and a blitter doing a variety of data formats. It's not quite ready for prime-time yet, but you can see some of it online. The virtual hardware registers can be seen at https://k8.fingswotidun.com/static/docs/io_registers.html It might be quite amenable to something like this.

nerpderp82 · on Dec 27, 2019

How does your blitter compare to the Amiga blitter? http://amigadev.elowar.com/read/ADCD_2.1/Hardware_Manual_gui...

Lerc · on Dec 28, 2019

It's quite different for a number of reasons.

The read area and the write area are separate spaces. It is strictly main RAM to frame buffer.

It is pixel based instead of bit-planes, so a lot of the features of the Amiga blitter aren't required. The minterms , multiple sources and the shifter were great for masking and sliding the bits to the right place, but once you go to chunky pixels they aren't so useful.

It does pixel format conversions to convert various compact data forms to colour graphics. 8 pixels per byte in 2 colours, 4 pixels per byte in 4 colours, 3 pixels per byte (where each 3 pixel block can have 4 colours from one of 4 micropalettes), 2 pixels per byte in 16 colours.

It supports Cell modes were it can decode cells of pixels from data, mode 0 has 3x3 blocks described in two bytes each. Cells may have any two of 16 colours. mode 1 is quite similar to the NES tiled graphics mode with 8x8 cells chosen by index to a table and individually coloured and flipped.

I don't have a line drawing option but I was wondering about having some registers to accumulate with each pixel written and if the accumulators overflow the X or Y pixel position increments. It would add some rudimentary skewing for little cost.

A lot of it comes down to the fact that the blitter is doing much of the job of the display hardware in other systems. The frame buffer is the output device with no smarts at all.

craigjb · on Dec 25, 2019

I don't think 5W is too far off from what this thing draws under load. I have a 10,000mAh ~4.2V battery in it, and I get about 10 hours of life. The battery is clunky, 9.6mm thick, since it was originally intended for external phone battery packs.

craigjb · on Dec 25, 2019

Thanks! I always recommend getting hands-on for learning more about FPGAs. FPGA boards are cheaper than ever, and you have all different ones now. I'd say get a Lattice ECP5 based board (like this year's Hackaday Supercon badge). The open-source symbiflow toolchain works for these (Xilinx 7-series will be soon!).

With regard to Verilog/VHDL, you'll have to learn at least verilog at some point, but I stay away as much as I can. SpinalHDL (based on Scala) is my goto. Some people like the Python based ones like migen, but I like me some strong typing.

I have a couple of blog posts about starting to put together a Gameboy CPU on craigjb.com (not finished yet).

syntaxing · on Dec 25, 2019

Thank you for the response! I'm definitely going to check out your Gameboy posts! Any other projects you recommend similar to that? I really like the idea of building something I can use.

craigjb · on Dec 25, 2019

The ZipCPU tutorials section is also great! They include verilog, using verilator (for simulation), and formal verification. And, personally, I think learning formal verification early is great, since it will probably be used more and more.

It's verilog, but you're going to have to learn some anyway. All of the new-generation HDLs compile down to verilog, which then goes into the various synthesis tools.

https://zipcpu.com/tutorial/

SoylentYellow · on Dec 25, 2019

Was there a reason you decided to use SpinalHDL instead of Chisel?

craigjb · on Sept 25, 2019

It's been fun to see Dr. Subu present this concept and prototypes at several conferences, and the level of integration possible is absolutely insane. I think the industry is definitely moving toward chiplets, such as the latest AMD release.

I definitely think we will see more chiplets and more standardization on interfaces between chiplets. The focus will be on how to minimize energy per bit transferred (a big topic in Subu's talks) and how to minimize the die area used for inter-chiplet communication. In monolithic silicon, you don't have to think about die area, since your parallel wires between sections might just need a register or two along the way. With chiplets, you typically can't run wires at that density yet, so you still have some serialization/deserialization hardware. But, since it's not crossing multiple high inductance solder balls and PCB traces, you can get away with less. Hopefully also you can get away without area-intensive resynchronization, PLLS, etc.

I think it will definitely be awhile before this kind of integration is used outside of niche cases though. The costs are just insane. You have to pre-test all manufactured chiplets before integration, and that test engineering is nothing to sneeze at. If you don't then you have all kinds of commercials issues about who is liable for the $500k prototype one bad chip broke.

On the bright side, I see the chiplet approach benefitting other integration technologies. For example, wafer level and panel level embedded packaging technologies can be used for 1-2um interconnects now. You won't get a wafer sized system out of it with any kind of yield, but it's probably the direction mobile chips and wearables will go.

Anyway, disorganized info-dump over.

jabl · on Sept 25, 2019

I agree this looks promising, though I'm not an expert in this field.

But the title is a bit, well, overpromising or broad. I don't think we'll replace traditional motherboards anytime soon (except maybe in smartphones?). Rather, it will be an incremental progress.

- first, SoC's will be replaced with chiplets

- then we'll start seeing more and more stuff being integrated on this wafer.

- say, instead of a server motherboard with multiple sockets, have all the CPU chiplets on the same wafer and enjoy much better bandwidth than you get with a PCB

- integrate DRAM on the wafer. This will be painful as we're used to being able to simply add DIMM's, but the upside is massively higher bandwidth.

The motherboard pcb per se will live for a long time still, if nothing else then as the place to mount all the external connectors (network, display, pcie, usb, power, whatnot).

elihu · on Sept 26, 2019

> integrate DRAM on the wafer. This will be painful as we're used to being able to simply add DIMM's, but the upside is massively higher bandwidth.

One way I imagine this working out is that, instead of just replacing the plastic motherboard with a silicon motherboard, you eventually do away with a single monolithic motherboard entirely. Instead, you have "compute blocks" (comprised of chiplets bonded to a silicon chip, or conventional chips on a conventional circuitboard) that connect with each other via copper or fiber optic point-to-point communication cables, and you can just wire them together arbitrarily to build a complete computer. Like, you might have a couple blocks that house CPUs, one or two that have memory controllers and DRAM, and maybe one with a PCI bus so you can connect peripherals, and you can connect them all in a ring bus. You could house these blocks in a case and call it a server, or connect a lot more blocks and call it a cluster.

The main advantage of such a setup is that you don't have a single component (the motherboard) that determines how much memory, how many processors, or what sort of peripherals you can have.

DarkWiiPlayer · on Sept 26, 2019

This becomes specially interesting if you imagine these components becoming smart enough to support high(er)-level atomic operations and some form of access-control, so you could have shared resources between two subsystems.

Also if all these components are reasonably smart and interconnected, it could become more common for the CPU to merely coordinate communication in many cases, so larger chunks of data could easily be handed around different components and the processor only telling them what range of bytes to send where.

smrq · on Sept 27, 2019

I've been playing a lot of TIS-100 lately, so this architecture sounds like a nightmare to develop on.

Doxin · on Sept 27, 2019

You know, I bet the TIS-100 wouldn't even be all that tricky given a) way more cells and b) a compiler to abstract a bunch of stuff.

craigjb · on Sept 25, 2019

I think the embedded wafer level or "panel" level packaging technologies are the mid-ground. These technologies don't use expensive silicon, and instead surround the die with cheaper epoxy. Then the interconnects are built on top of that, and can connect multiple die together. Yield and interconnect pitch are the big issues here though, and that's why I think you're right, that we will see SoCs or mobile systems first, not whole motherboards.

With that said, some of these technologies can have a layer of surface mount pads on top. So you have a substrate of epoxy with all your chips and interconnects embedded in it, and then surface mount parts on top. For example, passives, connectors, etc. It would look almost like a motherboard, but with all the chips inside. Of course, for cost and yield reasons, this will be for mobile devices only at first.

jacobush · on Sept 25, 2019

Say what now? The die is the silicon, right?

craigjb · on Sept 25, 2019

I didn’t phrase that well. I meant that the wafer wafer and panel level embedded technologies embedded the silicon die inside of cheaper epoxy, instead of building expensive silicon interconnect to integrate them on. They basically make a plastic wafer with a bunch of die in it. Then interconnect is built up on that.

Edit: the links below show solder balls. Today this technology is used for packaging, and has been used on chips in phones for years now. In the near future, we should be able to embed or surface mount passives and mechanical components, so maybe we don’t need the PCB.

https://www.semanticscholar.org/paper/3D-eWLB-%28embedded-wa...

https://www.semanticscholar.org/paper/Latest-material-techno...

vvanders · on Sept 25, 2019

We already pretty much do #1 & #4 today, it's call POP[1], take a look at a RPi3 and you can see the gap between the DRAM and SoC.

[1] https://en.wikipedia.org/wiki/Package_on_package

craigjb · on Sept 25, 2019

Package on Package has many downsides though:

- The interconnect pitch is huge, 0.3mm-0.4mm. HBM memories have 1000s of I/Os

- The inductance of the solder balls and the impedance discontinuities in the path mean the logic below still has to have big energy-hungry I/O drivers

- If you want to stack more than one die, you need something expensive like through silicon vias (TSV)

nine_k · on Sept 26, 2019

Putting more dies closer together makes thermal issues worse.

Hot parts next to other hot parts increase thermal power density, more heat to remove from a small area.

Colder parts next to hot parts can overheat because of the hot neighbors.

I suspect water cooling may become a must, air just cannot take away enough heat.

pingyong · on Sept 26, 2019

Air is gonna do fine. The bottleneck in CPU cooling right now is pretty much always the transfer between the die, the heat spreader and the cooling plate, not the transfer from the fins to the air. Water cooling can do slightly better because you can keep the water cool, and with that the cooling plate, and through that increase the heat flow from the CPU to the plate, but it's really only marginally better than a big air cooler.

And if you put more dies below a heat spreader, you get more surface area, i.e. better heat flow overall (compared to a single die with the same power consumption) from the dies to the heat spreader and from the heat spreader to the cooling plate.

That's also the reason why bigger air coolers don't really do as much as you'd think they should in terms of cooling performance or overclocking, the difference between an NH-U14S and an NH-D15 is really quite small. If the problem is heat dissipation through the fins all you have to do is make the cooler bigger.

nine_k · on Sept 26, 2019

You can bring water closer to the crystal, and make it pass faster past / inside the dissipator plate, thus achieving a larger stream of heat. Effectively you can turn the dissipation plate into moving liquid with high specific thermal capacity (5-7x of the metal plate).

tenebrisalietum · on Sept 26, 2019

Put the darn thing in mineral oil or other heat-conductive liquid and have a radiator dissipate that heat?

Wouldn't that be easier if everything's on a big wafer? It's already been done for a normal motherboard.

jabl · on Sept 26, 2019

Water has the big advantage that it's plentiful, cheap, and environmentally benign.

Sure, it'll take some more upfront engineering to design a system/rack/datacenter for water cooling than just immersing a server in a tank of inert liquid (flourinert or whatever they use these days), but I'm quite sure that at some point water cooling will be the standard solution in data centers.

mjevans · on Sept 25, 2019

NUMA will be much more important. This will really push on memory hierarchy aware data structures and programs.

jabl · on Sept 25, 2019

Hmm, I would say the opposite. If all the memory and CPU cores are integrated on a single wafer, the penalty for off-chip access would be much less than if you had to go through a PCB.

mjevans · on Sept 26, 2019

It'll be less than a networked cluster, but it still mattered with Threadripper units and I'd expect a racked board of this nature to expose more disparity between accessing memory in other chiplet areas.

whatitdobooboo · on Sept 26, 2019

What niche cases do you think this applies to first? They will probably be the ones to propel this technology forward if I had to guess

Quequau · on Sept 26, 2019

Are you perhaps aware if videos of these presentations are available anywhere online?

test6554 · on Sept 26, 2019

He said he prefers dielets over chiplets, but we’ll see what sticks.