I've honestly been waiting for this for YEARS.
I've had a vision for "the future of computing": FPGAs that reconfigure themselves (<- these exist) to become whatever macro-level hardware assets your computer needs. Running tons of SHA-256 hashes per second? The CPU/OS/OpenSSL (or whatever) detects this condition and switches from the hand-coded software path to an IP core that ships with the CPU. The CPU flashes the FPGA to become SHA-256 "cores" and now you're getting 4096x the output with less heat. (CPUs are designed for one thing, "few, large, complex cases," while FPGAs are perfect for "many, parallel simple cases," even more than a GPU.) Now you shut down your hashing and switch to video encoding, or Doom 2019, and your CPU reflashes the fabric (Altera specialized in PARTIALLY reconfigurable FPGAs, so you don't have to nuke the entire FPGA, only sections) and adds cores for video, or physics, or "shader units".
This would be hard for a single person, but any large company could handle building it. You could even do it with off-the-shelf FPGAs. The biggest problems are 1) bandwidth/latency: the "macro" function has to be big enough to be worth the latency hit of asking the FPGA instead of computing it internally (Intel's on-CPU FPGA would give insanely fast access), and 2) how do you get people to use it? The simplest approach is, of course, only supporting people who explicitly request it. But you can also take libraries that perform common, encapsulated macro functionality, like OpenGL, a physics library, or OpenSSL, where people don't care about the inner code ("how it gets done") but only about the result. Asking for floats to be multiplied would be bad. Asking for a cross product would be much better. Asking for a SHA-512 digest would be super.
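To make the "macro granularity" point concrete, here's a rough sketch of what the dispatch inside a library like OpenSSL could look like. Everything FPGA-related in it (the fpga_sha256 stub, the "has the core been flashed yet" flag, the call-count heuristic) is invented for illustration, not a real OpenSSL or vendor API; the point is just that the public interface stays at whole-digest granularity, so you pay the trip to the FPGA once per hash instead of once per primitive operation.

    #include <atomic>
    #include <cstddef>
    #include <cstdint>
    #include <cstring>

    // Hypothetical flag the OS/driver would flip once a SHA-256 core has been
    // flashed into the FPGA fabric. Invented for this sketch.
    static std::atomic<bool> g_fpga_sha256_ready{false};

    // Stub standing in for the flashed IP core (in reality an MMIO/DMA transaction).
    static void fpga_sha256(const uint8_t* data, size_t len, uint8_t out[32]) {
        (void)data; (void)len;
        std::memset(out, 0, 32);  // placeholder: a real core would return the digest
    }

    // Stub standing in for the existing software path (the library's C/asm code).
    static void soft_sha256(const uint8_t* data, size_t len, uint8_t out[32]) {
        (void)data; (void)len;
        std::memset(out, 0, 32);  // placeholder
    }

    // The library's public entry point stays at "macro" granularity: one call = one
    // whole digest, so the latency of reaching the FPGA is amortized over the whole
    // message instead of being paid per multiply or add.
    void sha256(const uint8_t* data, size_t len, uint8_t out[32]) {
        static std::atomic<uint64_t> calls{0};
        // Toy heuristic: after enough hot calls, assume the OS decided to flash the core.
        if (++calls > 100000)
            g_fpga_sha256_ready.store(true, std::memory_order_relaxed);
        if (g_fpga_sha256_ready.load(std::memory_order_relaxed))
            fpga_sha256(data, len, out);
        else
            soft_sha256(data, len, out);
    }

Callers keep calling sha256() exactly as before; whether the work lands in software or in the flashed core is the library's business.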
And the benefit here is that you don't have to hardcode that functionality into the CPU. The FPGA can have NEW or improved IP cores downloaded with Windows Update every week.
Back when I was in college, I actually bought a Lattice dev kit with a PCI Express card, dual gigabit Ethernet, DDR3, and a near top-of-the-line FPGA on the board, and it cost a mere $100. Unfortunately, I was more of a software guy, so I really got in over my head (plus health issues set me back and have never let up since), and I never got a working prototype built.
But it's still there! A huge opportunity waiting to be seized that could really become another tier of "the standard PC," in the same way we think of SSDs as "almost RAM" scratchpads, or GPUs as "CPUs for massive amounts of simple decisions." Well, an FPGA is the "GPU of GPUs": even simpler decisions, insanely fast and parallel even at "low" (by CPU standards) clock rates of 400 MHz.
Here's an older (2009) project/research article that inspired me, called the 512 FPGA cube:
http://cc.doc.ic.ac.uk/projects/prj_cube/Welcome.html
http://cc.doc.ic.ac.uk/projects/prj_cube/spl09cube.pdf
And here's a direct link to the data table comparing a single FPGA, the FPGA cube, and a Xeon (and a cluster of Xeons) doing the same work:
https://i.imgur.com/byjmEDG.png
Those are massive differences in both power efficiency and compute rate: 72,000 watts of Xeons to match the speed of a single 832-watt cube. That's roughly 87x, nearly two orders of magnitude!
I mean, imagine a world where they bothered to make FPGAs you could plug into Ethernet and have them configure themselves from a simple programming tool that was "easy" for normal programmers to exploit, instead of requiring an intense understanding of logic gates, propagation delay, and so on. A tool that wasn't "as fast as" a dedicated engineer, but 90% (or even 70%) as fast at zero cost and effort. All of a sudden you could run tons of programs and macro-sized functions as if you had personally stamped them into a printed circuit yourself, but without spending millions on development.
I'm honestly not sure why this hasn't already happened. I can't be the only "smart" person who came up with this idea. And the research (and the practice with bitcoin miners) all points to a huge opportunity waiting to be exploited if someone could lower the knowledge barrier to entry so you can basically "push a button" and unleash an FPGA at a problem. Imagine LAPACK and BLAS with FPGA support.
It just doesn't deliver on performance or energy consumption. That was always going to be the case (you are adding a layer of abstraction at the silicon level), so for anything that is measured in "operations / second" or "operations / joule," FPGAs will always lose out. By now the industry has learned that the key is to tailor algorithms to what we can do fast (vector + branching on CPU, everything branchless on GPU), not to shoehorn silicon into algorithms.
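As a small illustration of tailoring the algorithm to the silicon (my example, not from any real codebase): summing the positive elements of an array. The branchy form is fine on a CPU with a decent branch predictor; the branchless form is the shape you want on a GPU, where divergent branches within a warp serialize.

    #include <vector>

    // CPU-friendly: rely on the branch predictor.
    float sum_positive_branchy(const std::vector<float>& v) {
        float s = 0.0f;
        for (float x : v)
            if (x > 0.0f) s += x;
        return s;
    }

    // GPU-friendly shape: no data-dependent branch, every lane does the same work.
    float sum_positive_branchless(const std::vector<float>& v) {
        float s = 0.0f;
        for (float x : v)
            s += (x > 0.0f) ? x : 0.0f;  // typically a conditional select, not a jump
        return s;
    }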
So what can an FPGA do? Fast, low-latency, high-bandwidth interaction with peripherals. The irony is that for this to work out, you pretty much want your peripheral connected directly to the FPGA... which takes away all the fun of the reconfigurable stuff, because you can't reroute your PCB. So 99% of deployed FPGAs end up running the same configuration forever, and companies with the necessary scale pour the design into ASICs.
FPGAs solve the niche problem of interacting with very fast, massively parallel data buses and systems (think CCD sensors, ADC sampling, ...) that a linear-execution, Turing-style processor isn't suitable for. And pretty much only for applications where you don't have the volume to convince a chip manufacturer to put your peripheral into silicon.
> As CPUs are designed for one thing, "few, large, complex cases," while FPGAs are perfect for "many, parallel simple cases," even more than a GPU.
FPGAs are quite good at parallel simple cases, that is correct, but they would lose to GPUs in performance/watt in most cases.
Where FPGAs really shine is in parallel complex, non-uniform cases, especially cases that don’t map well to the classic CPU instructions, but can easily be performed with small latency on FPGAs.
FPGAs own low latency computation (less than 1 microsecond) because GPUs really need 3-20 microseconds to initialize after a kernel launch. This is why they're used instead of GPUs at the front line of high frequency trading. When I was at a hedge fund, I tried in vain to get Nvidia to do something about this based on the unofficial work of another former Nvidia employee implying this could be improved dramatically.
All that said, these are golden years to be a low-level programmer who understands parallel algorithms whether you work in Tech or at a hedge fund because there just aren't that many of us.
But the real problem with FPGAs is that even if they find another lucrative application where they excel relative to GPUs, Nvidia can simply dedicate transistors in their next GPU family to erasing that advantage, as they did with the 8-bit and 16-bit MAD instructions in Pascal and with the tensor cores in Volta. Too bad they don't care about latency; otherwise I believe they could push FPGAs out of HFT within a year or two once someone started using GPUs there and started winning.
Especially if the level of parallelism isn't too large, or if the memory bandwidth requirement of each case is low. The memory bandwidth of FPGAs is typically comically small compared to GPUs whenever they have to go to off-chip memory, and internal memory is usually limited to a couple of megabits.
0) Trying to do automatic parallelization is something we've been working on for 50 years, and we still haven't solved it to any practical degree. You can't, at this point, just slap a #pragma on C/C++ code to say "run this on some non-CPU architecture" and expect to get good performance.
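For reference, this is roughly what "slapping a #pragma on it" buys you today: a single OpenMP annotation gets you passable multicore CPU parallelism on a SAXPY loop, but there is no comparable annotation that reliably turns the same loop into an efficient FPGA pipeline; HLS tools still want explicit unrolling, pipelining, and interface decisions from the programmer.

    #include <cstddef>
    #include <vector>

    // y = a*x + y, parallelized across CPU cores with a single annotation.
    // Compile with e.g. -fopenmp; without OpenMP support the pragma is simply ignored.
    void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
        #pragma omp parallel for
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(x.size()); ++i) {
            y[i] = a * x[i] + y[i];
        }
    }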
This (or something very close to it) was described in a 1993 (!) paper called "Processor Reconfiguration Through Instruction Set Metamorphosis," or PRISM.
Perhaps you might find it interesting. Have fun :)
I've been thinking of something similar recently - to extend your idea slightly, why couldn't the flashing of the FPGA happen per tick? Thus at every tick your FPGA could become entirely different hardware, tailored to whatever task is required at that tick.
Well, flashing is pretty slow compared to modern computers. You have to load the configuration from flash (no pun intended) memory, and Altera are the only ones (AFAIK) that support changing only subsections. (Which could be great, because you can reflash half or a quarter of your FPGA with new logic units while the rest keeps running.)
Also, you'd have to know what you'd need... before you need it. Which is kind of impossible. By the time you know you need tons of integer units, you probably could have already started working on them. That is, if you need to switch rapidly, your workload is probably completed pretty rapidly to begin with.
However, I don't think they need to reconfigure that fast. Once every second would be enough to keep up with most workloads; most "heavy duty" workloads aren't changing that rapidly. You load a video game, and it's a video game for the hours you play it. You load a web server, and you're going to be doing SSL.
If you need much finer control, it'd probably be better to treat the problem at a much higher level ("I need more SSL keys / sec" instead of "I need more integer adds to make SSL keys") or add another FPGA (one for each use case or set of use cases, a la one for web server keys, one for some other major web server feature).
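To make that "higher level" framing concrete, here's a toy sketch (all names invented, not any real FPGA-manager API): watch a macro-level rate like TLS handshakes per second and decide, at most once a second, whether to ask for another SHA core, instead of reasoning about individual integer adds. The thresholds are made up.

    #include <chrono>
    #include <cstdint>

    // Invented placeholder hooks; a real system would talk to an FPGA manager/driver.
    void request_extra_sha_core() {}
    void release_sha_core() {}

    class ReconfigPolicy {
    public:
        // Call this once per completed TLS handshake.
        void on_handshake() { ++count_; maybe_decide(); }

    private:
        void maybe_decide() {
            using namespace std::chrono;
            auto now = steady_clock::now();
            if (now - window_start_ < seconds(1)) return;  // decide at most once per second
            double per_sec = count_ / duration<double>(now - window_start_).count();
            if (per_sec > 50000.0)      request_extra_sha_core();  // thresholds are illustrative
            else if (per_sec < 5000.0)  release_sha_core();
            count_ = 0;
            window_start_ = now;
        }

        std::uint64_t count_ = 0;
        std::chrono::steady_clock::time_point window_start_ = std::chrono::steady_clock::now();
    };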
Of course, I'm no expert in the field. I'm just a guy with an idea and some experience / research into FPGAs as reconfigurable logic units.
Yes, it's the predictive aspect of it that I've struggled with also! I understand there are technical limitations at present; I was just trying to extend the idea beyond what's currently possible to see if it might be interesting.
> CPUs are designed for one thing "few, large, complex cases" while FPGAs are perfect for "many, parallel simple cases" even more than a GPU
<offtopic>Hmm, where have I heard something like this before ... ah, yes, the brain - CPUs/FPGAs are like reason and instinct, because reason deals with "few, large, complex cases" and instinct has "many, parallel simple cases". The brain has its own CPU/FPGA divide.</>
Wow, that is a solid plan. I propose you (or whoever builds it) call it Nitro or something to that effect. Speed how you need it. Imagine an open library of FPGA profiles for popular apps. Build one for Photoshop and you have yourself a nice biz.