Launch HN: Tensil (YC S19) – Open-Source ML Accelerators
96 points by tdba on March 11, 2022 | 87 comments
Hello HN! I'm Tom, co-founder at Tensil (https://www.tensil.ai/). We design free and open source machine learning accelerators that anyone can use.

A machine learning inference accelerator is a specialized chip that can run the operations used in ML models very quickly and efficiently. It can be either an ASIC or an FPGA, with ASICs giving better performance and FPGAs being more flexible.

Custom accelerators offer dramatically better performance per watt than existing GPU and CPU options. Massive companies like Google and Facebook use them to make training and inference cheaper. However, everyone else has been left out: small and mid-sized companies, students and academics, hobbyists and tinkerers currently have no chance of getting custom ML hardware. We aim to change that, starting with ML inference on embedded and edge FPGA platforms. Our dream is that our accelerators help people make new applications possible that simply weren't feasible before.

We believe that advances in AI go hand in hand with advances in computing hardware. As a couple of software and ML engineers hoping to live in a world alongside intelligent machines, we wanted to know why those hardware advances were taking so long! We taught ourselves digital design and gradually realized that the next generation of hardware will need to be finely customized to enable state of the art ML models at the edge, that is, running on your devices and not in the cloud. In the CPU world, the RISC-V RocketChip implementation has proven the value of customizable compute hardware. The problem was that no-one was building that kind of capability for ML acceleration. We started Tensil to build customizable ML accelerators and see what kind of applications people can create with them.

Tensil is a set of tools for running ML models on custom accelerator architectures. It includes an RTL generator, a model compiler, and a set of drivers. It enables you to create a custom accelerator, compile an ML model targeted at it, and then deploy and run that compiled model. To see how to do this and get it running on an FPGA platform, check out our tutorial at https://www.tensil.ai/docs/tutorials/resnet20-ultra96v2/.

We developed an accelerator generator in Chisel and then wrote a parameterizable graph compiler in Scala. (Fun fact: unlike in software, formal verification is actually a totally viable way to test digital circuits and we have made great use of this technique.) The accelerator generator takes in the desired architecture parameters and produces an instance of the accelerator which can be synthesized using standard EDA tools. The compiler implements ML models using the accelerator’s instruction set and can target any possible instance of the accelerator.

Currently, the accelerator architecture is based around a systolic array, similar to well-known ML ASICs. You can view the architecture spec in our documentation. The compiler performs a wide variety of tasks but is optimized for convolutional neural networks. There are also drivers for each supported platform, currently limited to FPGAs running bare-metal or with a host OS.
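For intuition, here is a minimal NumPy sketch of the output-stationary systolic-array idea: a grid of multiply-accumulate cells, one per output element, each holding a partial sum while slices of the inputs stream through. This is a conceptual illustration only, not Tensil's actual RTL or instruction scheduling.

    import numpy as np

    def systolic_matmul(a, b):
        # Conceptual output-stationary array: cell (i, j) accumulates
        # a[i, k] * b[k, j] as the k-th slices stream through.
        n, k = a.shape
        k2, m = b.shape
        assert k == k2
        acc = np.zeros((n, m))        # one accumulator register per cell
        for step in range(k):         # one slice per "cycle"
            acc += np.outer(a[:, step], b[step, :])
        return acc

    a = np.random.rand(4, 8)
    b = np.random.rand(8, 4)
    assert np.allclose(systolic_matmul(a, b), a @ b)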

When you tell the driver to run your ML model, it sets up the input data and then streams the compiled model into the accelerator. The accelerator independently accesses host memory during execution. When the accelerator is done, the driver is notified and looks for the output in the pre-assigned area of host memory.
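To make that flow concrete, here is a hypothetical host-side sketch in Python. The module, class, and file names (tensil_driver, Driver, load_model, run, accel.bit, resnet20.tmodel) are invented for illustration and are not Tensil's actual driver API; see the tutorial linked above for the real interface.

    import numpy as np
    import tensil_driver  # hypothetical module name, not the real Tensil driver API

    # Program the FPGA with the generated accelerator bitstream, then point
    # the driver at the artifacts produced by the Tensil compiler.
    driver = tensil_driver.Driver(bitstream="accel.bit")
    model = driver.load_model("resnet20.tmodel")

    # The driver places the input in host memory, streams the compiled model
    # into the accelerator, and reads the output from the pre-assigned region
    # of host memory once the accelerator signals completion.
    x = np.random.rand(1, 32, 32, 3).astype(np.float32)
    y = model.run(x)
    print(y.argmax())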

How are we different from other accelerator options? There are many ML ASICs out there but they are all locked into a single architecture, whereas we have customization at the core of our technology. This offers the potential for a better trade-off between performance/price/watts/accuracy. Compared with other FPGA options, Xilinx DPU is great but it’s closed source and can be difficult to work with if your model is in any way customized. By going open source, we aim to support the widest possible range of models. FINN is a very cool project but requires big changes to your model in order to work, and also typically requires large FPGAs which are unsuitable for edge deployments. We work out of the box with any model (no need to quantize), and on small edge FPGAs. For embedded systems, tflite/tfmicro are great for deploying very small ML models on extremely constrained edge devices, but they are limited in terms of the performance and accuracy that can be achieved. Our tools allow you to work with full size state of the art models at high accuracy and speed.

Currently we're focused on the edge and embedded ML inference use case. If you run ML models using any of the major frameworks (TensorFlow/Keras, PyTorch, etc.) on small, embedded or edge devices then Tensil is a good fit for you right now. If you primarily run inference in the data center or need lots of training acceleration, reach out to us and we can walk you through our roadmap. For now we are focused on CNN inference on edge FPGA platforms, but our aim is to support all model architectures on a wide variety of fabrics for both training and inference.

The core technology will always be free and open source, but we plan to offer a “pro” version with extra enterprise features under a dual license arrangement, similar to Gitlab. We are also working on a cloud service for running our tools in a hosted setup, in which you’ll be able to run a search across all possible Tensil architectures to automatically find the best FPGA for your model.

If you're interested to learn more, check out our docs (https://www.tensil.ai/docs), our Github repo (https://github.com/tensil-ai/tensil) and join our Discord (https://discord.gg/TSw34H3PXr). And feel free to reach out any time (email in profile).

We’re here to enable you to develop amazing new ML based applications, so we’d love to hear your experiences of working with ML compute hardware, whether it be CPU, GPU, or some other specialized platform. Have you had to make major changes to your ML models to get them to run on the available hardware? Are there any cool features or UX improvements that you wish hardware makers would add? Are there features that you’d like to add to your own applications but don’t know how you’d get them to work on an edge device? Looking forward to your comments!




Wow! This looks amazingly impressive. Super-duper good. If the software is as impressive as the website (haven't tried it out yet!), you'll make tons of $$$ on this product. If I were a VC with millions I'd be begging you to take some. I wonder if you plan to support CGRAs and LSTMs? Also, what about quantized and compressed models? Model compression is a sore point and AFAIK there aren't any good tools that let you make tradeoffs between accuracy and compute efficiency.


Thank you for the kind words! Just to clarify, the core technology here is free and open source, anyone can use it right now for free. We do have commercialization plans in addition - we may explore things like additional paid features for enterprise use or paid tiers of extra support.

Regarding LSTMs, yes. We're aiming to support all machine learning model architectures: do you have any particular models you're interested in that we should be prototyping with?

For CGRAs, we don't have any immediate plans to explicitly support them. What kind of use case do you have in mind? Generally, any platform that can implement a blob of generated RTL should be something we can work with quite easily.


I am guessing some kind of vivado replacement is in the works? As it stands now your product seems to be fully offline+CLI - pretty please just tell me how you can get revenue from that?!

That said, I am going to get a PYNQ-Z2 just to try this out! Btw, quick glance at the tutorial says Z1.. can I assume Z2 would be barely an inconvenience?


Yes, Pynq Z2 should work just as well (it's the exact same FPGA, just a slightly different board). We've been testing with Z1 which is why I recommended it in the tutorial.

For commercialization, the core technology will always be free and open source, but we plan to offer a “pro” version with extra enterprise features under a dual license arrangement, similar to Gitlab. We are also working on a cloud service for running our tools in a hosted setup, in which you’ll be able to run a search across all possible Tensil architectures to automatically find the best FPGA for your model. I'd love to hear your feedback on these plans!


Just saw your edit re: model compression. One thing that Tensil can do is help you avoid the need to quantize or compress your model entirely! For example, we've found that using a 16-bit fixed point numeric data type preserves almost all the model accuracy while not sacrificing performance thanks to the huge amount of parallelism available on FPGA.
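To make that concrete, here is a tiny NumPy sketch of round-tripping weights through a signed 16-bit fixed-point format (8 fractional bits here, chosen purely for illustration) and checking the rounding error introduced:

    import numpy as np

    def to_fixed(x, frac_bits=8, total_bits=16):
        # Round to the nearest multiple of 2^-frac_bits and clip to the signed range.
        scale = 1 << frac_bits
        lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
        return np.clip(np.round(x * scale), lo, hi).astype(np.int32)

    def from_fixed(q, frac_bits=8):
        return q.astype(np.float32) / (1 << frac_bits)

    w = 0.5 * np.random.randn(1000).astype(np.float32)   # stand-in for model weights
    err = np.abs(from_fixed(to_fixed(w)) - w)
    print("max rounding error:", err.max())               # about 2^-9 unless values clip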

The broader point is that Tensil is extremely flexible, so you can try out lots of different accelerator configurations to find the one that works best for your ML model. Think of it as optimizing the hardware first, then the software if needed.

We're actually working on a tool to manage and automate this hardware architecture search - watch this space!
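As a rough illustration of that kind of search, here is a toy Python sketch that sweeps systolic array sizes and ranks them with a naive cost model (ideal MACs per cycle = array size squared). The clock frequencies and MAC count are made-up, illustrative numbers; real results depend on utilization, memory bandwidth, and the compiler's schedule.

    # Candidate array sizes and made-up achievable clock rates per size (MHz).
    array_sizes = [8, 16, 32]
    clock_mhz = {8: 250, 16: 200, 32: 150}
    model_macs = 40e6  # rough order of magnitude for a small CNN like ResNet-20

    def toy_latency_ms(size):
        # Ideal case: size^2 multiply-accumulates retired per cycle.
        cycles = model_macs / (size ** 2)
        return cycles / (clock_mhz[size] * 1e3)

    for size in array_sizes:
        print(f"{size}x{size} array: ~{toy_latency_ms(size):.2f} ms (ideal, toy model)")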


Wow! After repeatedly and unsuccessfully trying to get an overview of NN accelerators today, I just found this on the HN homepage. It looks very promising, and to me this seems like a very logical approach in terms of efficiency (besides analog computers).

I would also be very interested in some benchmarks comparing the generated hardware with things like Google Coral or Nvidia Jetson.

I am sure this will be a success.


Glad this helped clarify things for you! The tricky thing about benchmarks is that one of the key benefits of Tensil is the flexibility to find a trade-off between performance, accuracy, cost and power usage that works for you. Benchmarks that only consider performance or performance per watt can be a bit narrow from that point of view. That said, this is a good idea and we'll add some comparisons that we think make sense to the docs!


I wanted to add something about the Xilinx DPU, and you touched on the subject, but I was quite unhappy with the soft-IP thing. It embeds the instructions for all kinds of networks, taking a lot of gates for unused features, it's not very customizable, and perf for anything other than vanilla conv2d stuff quickly drops off. Buying an Alveo board to get such low inference perf was a gut punch.

FINN seems far better there. At least you get millions of inferences/sec on simple quantized CNN1Ds.

The XRT API is simple and relatively OK, too. Stream data, execute inference, fetch results, mostly sync, so you have to wrap a lot of threading around it, but the basics are there.


Yep, this is something we've heard before. If you're really familiar with the Xilinx ecosystem, one way we've described Tensil is that it is the "Microblaze for ML" - easy to use, lots of flexibility and customizability, with performance good enough for most applications. The DPU and FINN would then be the more specialized tool for situations where you need specific features they are optimized for.


Ha, now you've made me curious. Let's see how everything progresses then. Thanks for the earnestness on these comment threads.


You're very welcome! Stay in touch - I've listed some contact methods here and there in the thread, and we'd love to hear from you again.


Thank you!


One more thing to keep an eye on in the NN accelerator world is Tenstorrent. That thing looks amazing, but it's mostly for datacenter and 'heavy' edge (PCIe board, at least 75W, so to measure against Alveo U50/U55 and Tesla T4/A30, and up to 300W, so A40/A100).


Thanks, we'll take a look!


How does this compare to Coral's USB Accelerator [1], which apparently uses Google's TPU? I'm guessing Tensil is better for companies that are already either working with an FPGA or producing custom silicon, but the Coral product might be easier to get started with when prototyping on something like a Raspberry Pi.

[1]: https://coral.ai/products/accelerator


Coral is a great project, especially if you are using a completely vanilla off-the-shelf model. However if you've ever tried compiling a custom ML model for it, you know how finicky it can be. There are lots of ways that you can accidentally make it impossible for Coral to run your model, and it can be difficult to figure out what went wrong.

With Tensil, you circumvent that problem by changing the hardware to make it work for your model. If you have modified an off-the-shelf model or trained your own from scratch, Tensil might be a better option from the point of view of ease of use and even performance.


Ah, thanks for that clarification. I see that your tutorial is using the Avnet Ultra96 V2 dev board. Do you have anything that would work with a Raspberry Pi? Maybe some kind of FPGA addon board? Or do you feel that the Raspberry Pi isn't a good starting point for developing a real commercial product?


This is a great idea, we're looking at boards that could be used in combination with a Raspberry Pi. The reason we haven't investigated this so far is that most of the dev boards we've tested with have an ARM core embedded in the FPGA fabric, so the additional CPU the Raspberry Pi would provide wasn't necessary.


Looks like pynq-z2 has header pins which connect with raspberry pi. https://www.tulembedded.com/FPGA/ProductsPYNQ-Z2.html


Heh very similar experience with myriad-x there. Going off the beaten path is a pain, especially since the low-level is now so hidden...


Absolutely, the UX for compiler tools often leaves a lot to be desired. This is something we want to fix!


This is hard, very hard stuff. Between MLIR, the XLA world, most HLS things (generalist tools leave a lot of perf on the table and you often end up in VHDL/asm anyway, while specialised stuff is often too restricted...) and the Vivado 'let's write HDL/RTL like C' approach, many have broken their teeth.

I wish you good luck there, but you're taking on a huge task. You have all my congrats for going open source, and I think it's now pretty much the only way forward. FINN is OSS and I'm very happy to have an OSS alternative. If only old Altera would go full OSS on new AI+FPGA stuff, maybe we'd see great cross-pollination.

Anyway, if Intel FPGA people aren't watching this, I can assure you they'll be looking soon.


Thank you - we'd love to see more OSS support from FPGA vendors too and we'll be watching closely for any developments there.


Apart from the discontinued Intel sticks, are there pytorch compatible USB accelerators on the market?


I'm not completely sure if the answer is no but I have had difficulty finding anything like that.


So, Tensil looks really cool. One of the constraints listed in the docs though is that it only supports convolutional networks at the moment.

What does the timeline look like for supporting some of the more popular transformer/attention-based architectures?


We're working on our roadmap right now and prioritizing support based on user interest. If there's a particular model or set of models you're interested in accelerating, I'd love to hear about it!

If there's a lot of interest in transformers, we'd aim to offer support in the next couple of months.


A lot of SOTA work seems to be gravitating towards transformer-based models. Obviously I can't speak for the entire field, but you can just go take a look at the most popular HuggingFace repos and see what I mean. They started out focused on language, but because transformers have become so popular, they're expanding into the audio and vision domains quickly. Their 'transformers' library is, outside of research, most people's go-to high-level framework, as it largely abstracts away a lot of the boilerplate that writing in pure TF, PyTorch, or JAX requires.

See:

https://huggingface.co/spaces

https://github.com/huggingface/transformers


Agreed, this is the way things seem to be trending. We'll definitely add support for transformers in the near future, the question is only whether there are other things we should work on first, especially with respect to the edge and embedded domain where smaller conv models still dominate. Thank you for the links!


Wow, congratulations on the launch! I agree whole-heartedly that custom accelerators will fuel the next era of AI/ML advances.

I'm the founder of PrintNanny https://printnanny.ai/, which seems to fit the current use case for Tensil. My model's architecture is a "classic" CNN feature extractor, SSD box/region proposals, with a final non-max suppression op. I currently run a uint8 quantized TensorFlow Lite model on Raspberry Pi, without additional acceleration - but I'm very familiar with the hassle of using partially-closed source accelerators like Coral's Edge TPU. Excited to read through the graph compiler!

I joined your Discord, looking forward to tracking Tensil's progress.

I confess I'm curious how you currently make money, or intend to. How much time are you giving yourselves to figure out a sustainable financial model?

For what it's worth, I'm also ex-Red Hat and thoroughly understand the advantages of paying for high-quality support. I also want to re-iterate that I think that the accessibility of open-source ASIC/FPGA tools will define the future of AI/ML. This is important work that will change the world - I'm excited to see someone tackling it!


Awesome, I think your use case would make a lot of sense for Tensil. Looking forward to chatting more!

The core technology will always be free and open source, so to commercialize Tensil we're planning to offer a "pro" version which would operate under a paid license and provide features specifically needed by enterprise users. We're also working on a web service that will let you run Tensil's tools in a hosted fashion, with one major feature being the ability to search across a large number of potential architectures to find the best FPGA for your needs. Extra paid support and other services will also be in the mix.


What kind of FPGAs can this reasonably run on? Is that model dependent? Could a small model run on an ICE40 FPGA? I looked over the doc but I can't find anything concrete.


It depends on the model, yes. Here are some examples in the benchmarks section of our docs: https://www.tensil.ai/docs/reference/benchmarks/

We haven't specifically tested on any ICE40 FPGAs yet - if this is something that you'd really like to see, let me know! Taking a look at the lineup, the ICE40 LP8K and LP4K would be suitable for running a very small version of the Tensil accelerator. You'd want to run a small model in order to get reasonable performance.

Generally speaking, FPGAs with some kind of DSP (digital signal processing) capability will work best, since they can most efficiently implement the multiply-accumulate operations needed.
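As a back-of-the-envelope sizing exercise, the number of parallel multiply-accumulates is roughly bounded by the number of hardware multipliers (DSP blocks) on the part. Here is a sketch; the DSP counts below are from memory and should be treated as assumptions to verify against the vendor datasheets.

    import math

    # Approximate DSP/multiplier counts; check datasheets before relying on them.
    dsp_blocks = {
        "iCE40 UP5K": 8,                 # SB_MAC16 multipliers
        "Zynq-7020 (Pynq-Z1/Z2)": 220,   # DSP48E1 slices
        "ZU3EG (Ultra96-V2)": 360,       # DSP48E2 slices
    }

    for part, dsps in dsp_blocks.items():
        # Largest square systolic array if each MAC cell consumes one multiplier.
        n = math.isqrt(dsps)
        print(f"{part}: up to ~{n}x{n} array ({n * n} of {dsps} DSPs)")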


I think iCE40 LP/HX series are the biggest ones, but the iCE40UP5K is also neat: it has hardware multipliers unlike the LP/HX, and a relatively large 1 megabit RAM on-chip. Unfortunately, I think the UP family is relatively slow (as in propagation delay/max clock frequency).


Thanks for pointing this out! The UP5K does look promising.


I'm curious if you have any benchmark (or anecdotal evidence) on the relative perf&power efficiency of using the DSP blocks of the FPGA boards or not?


I don't have hard numbers at hand, but I'd estimate something like an order of magnitude improvement for using DSP for multiplication vs not. If they're available on the fabric, you'll definitely want to use them! If this is an experiment you want to run, I'd be very happy to help you figure out how to do it.


Cool! Yeah I would be interested in that. I would actually have some use cases for edge compute if it can fit into tiny FPGAs like the ICE40.


Here is an example of deploying a basic ML application to an ICE40 using a custom Keras to NN generator https://github.com/edge-analytics/fpga-sleep-tracker

So it definitely can be done with some careful attention to the limited number of multipliers on the device. I’ll be curious to check out how Tensil does in terms of mapping with highly resource constrained FPGAs. Regardless, Tensil looks like a very cool tool.


Wow, awesome project! This is exactly the kind of thing we had in mind when we built Tensil. I'd be very curious to hear what happens if you make a v2 perhaps using Tensil for comparison.


That's excellent - feel free to join our Discord if you'd like to brainstorm ideas or get help choosing models and boards https://discord.gg/TSw34H3PXr


Very cool work, congratulations on the launch! Can you comment on how you see the trend of edge computing evolve in the future for SBCs? In terms of perf per watt, could FPGAs compete against a coral-style TPU? What if we had open Mali GPU or NPU APIs to program against the chips already present on SBCs? I'm just a hobbyist so I know very little of what people actually deploy in industrial settings - which would be your target customers.


Cheers, and great question! FPGAs are pretty amazing devices, but one thing that's been holding them back is how difficult they have been to work with. Typically to actually make use of an FPGA you'd need to have an FPGA expert and an embedded software engineer on your team, along with all the requisite tools and materials.

That has started to change dramatically in the last decade, with open source FPGA toolchains like yosys, runtimes like the PYNQ framework and RTL generator tools like Tensil being developed. When you put these things together, working with FPGAs starts to become as easy as using any other compute platform. For that reason, I think there are lots of applications involving FPGAs that will soon be invented to take advantage of this trend. One could speculate that the reason Intel and AMD are buying up FPGA vendors is because they see the potential there.

As far as head-to-head comparisons go, as long as you're running the workload it was designed for in the environment it was designed for, an ASIC will always be the best possible perf per watt. The question is what happens when you go outside those bounds. Can you take your model, swap out a layer, and have it run just as fast on your Coral or NPU? Probably not, at least right now. But with Tensil, you can re-run your architecture search to find the best accelerator, and take advantage of it right away.


Nice work with this. I was wondering, are all computations other than convolution performed on the FPGA as well - such as pooling, padding, inter-layer quantization operations (rescaling & offset additions)? If not, does the FPGA offload unsupported operations to the host before continuing? Does the FPGA need to transfer intermediate layer IO data back and forth between the host during GEMM if the data become too large to fit on the FPGA SRAM? Thanks


Great questions! With Tensil, all computations are performed on the FPGA. In addition to matrix multiplication, Tensil supports a SIMD instruction with various operations. ML activations, average and max pooling, normalization, and image resizing all use the SIMD instruction. Some ML operations, such as padding, are achieved by changing the memory layout. Tensil uses the DRAM0 and DRAM1 memory pools (usually in DDR memory) to interact with the host, reading model inputs and weights and writing outputs. It also uses these pools to offload intermediate results between layers, and between tiles within a layer, when the FPGA does not have sufficient BRAM, which is common on lower-end devices. The Tensil compiler takes care of finding the most efficient memory scheduling for a given on-FPGA memory size.
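To illustrate the tiling idea, here is a simplified NumPy sketch of computing a large matrix multiply block by block, with only a few small tiles "resident" at a time and partial results accumulated in a buffer that stands in for the DRAM pools. The tile size and loop order are illustrative, not the compiler's actual schedule.

    import numpy as np

    def tiled_matmul(a, b, tile=64):
        # Accumulate a @ b in (tile x tile) blocks; only three small blocks
        # need to be on-chip at any time, everything else stays in "DRAM".
        n, k = a.shape
        _, m = b.shape
        c = np.zeros((n, m), dtype=np.float32)   # stand-in for an output pool in DRAM
        for i in range(0, n, tile):
            for j in range(0, m, tile):
                for p in range(0, k, tile):
                    c[i:i+tile, j:j+tile] += a[i:i+tile, p:p+tile] @ b[p:p+tile, j:j+tile]
        return c

    a = np.random.rand(256, 192).astype(np.float32)
    b = np.random.rand(192, 128).astype(np.float32)
    assert np.allclose(tiled_matmul(a, b), a @ b, rtol=1e-4)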


Okay thanks, so are the DRAM0 & DRAM1 memory pools located in the host DDR memory, or are they part of separate DRAM hardware on the FPGA board (kind of like how GPUs have their own separate DDR DRAM)? I definitely want to dive deeper into the source code of this project at some point and see how the compiler and everything works.

Edit: Sorry I think you already clarified that the DRAM0 & DRAM1 memory pools are located on the host


Something like an Alveo PCIe card has onboard HBM/DDR4 memory large enough for the Tensil DRAM pools, so that would be similar to how a GPU operates, but it could also reach into host memory via PCIe if needed. Embedded applications with Zynq-7000 and UltraScale+ have ARM processors on the same chip as the FPGA and (usually) DDR as separate chips on the same PCB. In that case, the Tensil DRAM pools are just contiguous blocks in the memory shared with the CPU. We will be publishing documentation on the compiler design soon, so stay tuned!


Hi, maybe you've addressed this somewhere and I haven't read fully (sorry) but how does it compare to FINN from Xilinx?


FINN is a very cool project, but usually requires big changes to your model in order to work, e.g. quantizing down to 1 or 2 bit weights. It also works best on large FPGAs which are unsuitable for edge deployments. Tensil works out of the box with any model (no need to quantize / compress) and on small edge FPGAs.


Thanks for your answers.

True that, to get crazy perf with FINN one needs to quantize like crazy (at least that's the default strategy, though it might change if/when it can synthesize to use DSP slices or the shiny Versal weird cores). Now I'll have to take a look at Tensil. How would it scale on large FPGAs though? Would you leave the floor planning to a seasoned VHDL person? Does Tensil handle it (generating parallel pipelines, maxing out performance using all resources on chip)? Say, for someone doing 1D CNNs or 1D VAEs with (tens of) millions of inferences/second on a continuous stream (low batch size)? :-)

I'm not sure what Intel proposes nowadays on that front, with the abandonment of OpenVINO for FPGA. No idea how one could use the Stratix 10 NX with its 'AI cores' with actual neural networks. Tensil might be a gateway for all this (I sadly don't have much hope for FINN becoming cross-platform...).


So far we've been focused on edge devices like the Zynq, Artix and Zynq UltraScale+ families. Tensil certainly works on larger devices, but it's not as optimized there as we'd like it to be. If that's interesting to you, I'd love to talk and understand your use case in more depth.

The Intel FPGA side is interesting, as you say there are fewer projects targeting their technologies for ML use cases. We haven't tested support for their boards yet, but there is nothing in our generated RTL that is exclusive to Xilinx. The only thing we'd need to add is new drivers for their platforms.


Would love to take a look at this. We just launched our FPGA-based cloud platform last year and currently we offer all of the Alveo series and some Intel as well. vmaccel.com


VMAccel looks very interesting! Send me an email and we can explore how to collaborate.


(This comment was originally posted at https://news.ycombinator.com/item?id=30615605, where the question made more sense, but I've moved it into the new thread because it's interesting.)


What physical connection is required between the FPGA and host? For example, do they communicate through a PCIe connection?


In our current demos, the Tensil logic talks to the host through a couple of AXI and AXI Stream interfaces. There are AXI adapters for many other protocols, including PCIe, that should be able to support many different kinds of connectivity. Here's a link to our docs explaining the host<->Tensil connection: https://www.tensil.ai/docs/howto/integrate/#2-connect-the-ax...


What sort of latency can you get on edge devices? Are there cases that processing can be done with 3ms or 10ms latency?


Definitely! You can see some of our benchmarks here, and we'll be expanding this list soon https://www.tensil.ai/docs/reference/benchmarks/


What’s the difference? https://hailo.ai/


Generally the comparison between Tensil and any fixed ASIC is going to run along similar lines, which we explain in this comment regarding the Coral accelerator: https://news.ycombinator.com/item?id=30643520#30645318

The big difference is that while those fixed ASICs offer great performance on the set of models they were optimized for, there can be big limitations on their ability to implement other more custom models efficiently. Tensil offers the flexibility to solve that problem.


I'm an ML engineer but I know nothing about the inference part. Are there really so many kinds of devices that optimizing inference for a particular device is a thing? I thought almost everyone serves from GPUs/TPUs, and hence there are only two major device types. What am I missing here?


There are four big categories of ML accelerators. You're already familiar with CPUs and GPUs; then there are FPGAs, which offer better performance and efficiency while remaining flexible. Finally, there are ASICs (of which the TPU is an example), which offer the best performance and efficiency but retain very little flexibility, meaning that if your ML model doesn't work well on an ASIC, your only option is to change your model.

We chose to focus on FPGAs first because with them we can maximize the usefulness of Tensil's flexibility. For example, if you want to change your Tensil architecture, you just re-run the tools and reprogram the FPGA. This wouldn't be possible with an ASIC. That said, we'll be looking for opportunities to offer an ASIC version of our flow so that we can bring that option online for more users.


I saw somewhere that 95% of all ML inference is still done on CPUs.


It's true that inference is still very often done on CPU, or even on microcontrollers. In our view, this is in large part because many applications lack good options for inference accelerator hardware. This is what we aim to change!


So, in your opinion, why would those CPU users want to migrate to an FPGA and your software rather than to Nvidia T4 or Tegra and CUDA?


It depends on the application. For some use cases, moving to a GPU makes total sense. However, if you have power constraints, form factor constraints, performance constraints or simply want to be in control of your own hardware, using an FPGA with Tensil may be a better option.


Is this going to be ad-supported in the future? Like, in an IDE?

Assuming you guys are not a nonprofit. :)

Just curious how money can be made from what seems like an FOSS CLI offline product.. maybe maintenance subscriptions somehow, then?


I replied to your other comment here about commercialization https://news.ycombinator.com/item?id=30652150 I hope it was helpful!


Hah, not really.. I had already read your canned reply by that time. But I guess it was a combo of my lack of imagination and you not wanting to be constrained / not having fleshed out the details, etc. Now, thinking about it, I can imagine a cloud sandboxed IDE interface, like repl.it. The tricky part: how do you interface with an edge/client device? Maybe your compiler emits (special) "wasm" or something.. (you could ship a Docker image, but that's still another moving part; here's where the GitLab-like hosting comes in?) ..pretty sure my wanking here won't help you that much though lol


Great project! How does the performance compare with conventional CPU/GPU based inference? Those devices are usually a lot higher power (and bigger/more expensive), but obviously do not benefit from specialization.


Thanks! The general answer is that it depends on your model and on which FPGA platform we're talking about, but in a head-to-head benchmark test you'll find results in the ballpark of 2-10x CPU and 0.5-2x GPU. As you point out, the power and cost are big differentiators. The other thing to consider is (as another commenter mentioned) that usually inference on CPU or GPU will require you to do some model quantization or compression, which can degrade model accuracy. Tensil can give you a way around that dilemma, so that you can have great performance without sacrificing accuracy.


Hi, I'm curious what you mean about model quantization being necessary on CPU and GPU? It's not necessary by default, as OpenVINO, TVM, and TensorRT can run single-precision inference on most classic models quite fast. If you're reaching for very low power or ultimate perf, yeah, you can downgrade to fp16 (well... mixed precision) with NVIDIA tensor cores or AVX-512 FP16, or to bf16 in some Intel VNNI configs. Going to integer will give you more throughput too, but it's not necessary. Even the Myriad X is supposed to handle some kind of fp16 with the SHAVE cores.

The only time I had to reach for quantized (integer) networks to do anything at all was inferencing on FPGAs. Are you targeting DSP slices by default, or implementing full IEEE 754 floating point by default?

Are you saying that with Tensil you can run single precision non-quantized models with up to 2x gpu perf?

I probably misunderstood your last sentence, sorry.

Genuinely curious!


Sorry if this was unclear - in a datacenter use case you are right, but for an edge deployment, you will usually need to quantize, prune or compress your ML model to get it working as fast as you'd like on a sufficiently small CPU/GPU. Compared with running your ML model unchanged on those platforms, Tensil can run with the performance ranges listed above. You can also quantize and use Tensil too!


It'd be great if you could add benchmark numbers for this comparing CPU/GPU on inference / sec and inference / watt.


Will do - as I mentioned in another comment, it can be a bit subtle to find an apples-to-apples comparison, but we'll soon add some cross-platform comparisons that we think are reasonable.


Please compare against https://NN-512.com


Sure, we'll check it out!


I tried to look for it but didn't find how much better your compiled model is when compared to tensorflow/pytorch natively run on the device. Do you have this somewhere?


If you have a device with known performance in mind, you can compare against our benchmarks listed here https://www.tensil.ai/docs/reference/benchmarks/

We'll be expanding this list and adding more comparisons to other platforms in the near future.


how does this compare to apache TVM?


Great question - TVM / OctoML are a great option if you have an off-the-shelf ML model and off-the-shelf hardware. Tensil is different in that you can actually customize the accelerator hardware itself, allowing you to get the best trade-off of performance / accuracy / power usage / cost given your particular ML workload. This is especially useful if you want to avoid degrading the accuracy of your models (e.g. through quantization) to achieve performance targets.


That makes sense. So is this only for edge compute use cases, or can I use tensil on an FPGA I have running in my data centre?


You absolutely can use it in a data centre. You can even tape out an ASIC using these designs! Currently we've done most of our prototyping with edge FPGA platforms but if you want to try other platforms we'd love to help you get started. You can email me at tom@tensil.ai or use the contact methods on the website.


How do you guys plan to make $$$?

(Sorry in advance for helping me catch the elephant in the room!)


Congrats Tom! Can’t wait to have a use-case for this (soon!)


All: these guys did a Show HN yesterday at https://news.ycombinator.com/item?id=30615605 (there was a scheduling mixup on my part). I mention it here to (a) explain the dupe, for anyone who saw that thread; but also (b) to tell everyone that the discussion there was unusually high-quality, so you might want to check out those comments first.

Actually, maybe we should just merge that thread into this one. I'll double check if that makes sense.

Edit: ok, I've moved the comments in here now. Some of the times are messed up, but I think it makes more sense for the comments to be in one place so readers don't have to go back and forth. Sorry for any confusion!


Sweet!



