I have pretty much the same setup, but with just four 3080s in one machine. If you go to rent something from vast.ai, you can see the hardware specs of the most performant machines. I just copied those and pieced together the rest. Learning about PCIe versions and lane allocation was very interesting, as I thought I could use another motherboard for a similar purpose, but just because a board has four PCIe 3.0 slots, it doesn't mean they're all usable at x16 AT THE SAME TIME!
I bought 21 R12 Aurora Alienwares and turned a side office in my garage into a hot-ass crypto farm with two swamp coolers, four 20A circuits, a 15A circuit, and a bunch of surge protectors. I was always afraid to move anything for fear that I'd overload a circuit, given the effort it took to work out which power supply was feeding which machine(s).
I gave away the 3080s to friends and family and kept the 6 3090s.
Oh yeah, some days I was making $600-800 profit a day! But most days it was ~$100-$200/day. The problem was that "the Merge" to Ethereum 2.0 was fast approaching, but it was luckily delayed about 1.5 years, so I was able to pull ~$80k profit during that time. It was awesome. But now I have a couple of holes in my garage office with big fans installed (one at the bottom for intake and one at the top for exhaust), essentially turning that room into a big computer tower.
Making money in the stock market (or the crypto market, similarly) is probably the only way to justify tens of thousands of dollars' worth of GPUs that become obsolete quickly.
The other thing I can think of is research scientists. But the funded research scientists I know generally don't get hardware to take home. They usually get either tons of cloud credits or time on a supercomputer to do their stuff.
Well, I can say that at $0.0875/kWh, it was a steal. I don't want to draw any conclusions about why they recently raised electricity prices for everyone by 25%... but I'm suspicious that I and others like me may have contributed.
Consumer electronics are designed for consumer living conditions. In many parts of the world, indoor humidity sits above 60% all year long. In a dry climate, you could get significant evaporative cooling and still stay below what you'll find in my house.
Phase-change cooling, especially when combined with a natural source of low-temperature air (such as a large basement), can be remarkably effective. Your typical DC has much too high a power density to utilize this, but for a single large rig in a home it could work very well.
The humidity (in Utah) in that room was like 4%. Datacenters should be kept somewhere between roughly 45% and 60-80% relative humidity, depending on which source you read, to prevent electrostatic discharge. It was way cheaper than installing and running a mini-split. Plus, I NEEDED the added humidity.
You're also taking a risk in assuming it'll keep getting things like microcode updates, which matter given the constant churn of new speculative-execution attacks, or in hitting other problems that cause those impossible-to-track-down occasional instability issues.
> This is because without dropping serious $$$ on mellanox high-speed NICs and switches, inter-server communication bandwidth quickly becomes the bottleneck when training large models. I can’t afford fancy enterprise grade hardware, so I get around it by keeping my compute all on the same machine. This goal drives many of the choices I made in building out my servers, as you will see.
10GbE is very cheap now, but I guess that's not enough?
Yeah, you need 100GbE at a minimum. 10GbE is far too little; PCIe bandwidth itself can already be a bottleneck, and a PCIe 3.0 x16 link is only about 16 GB/s, which is roughly 100GbE territory.
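Rough back-of-envelope, as a quick Python sketch (nominal line rates only, ignoring protocol overhead, just to show the ratio):

    # Nominal figures: PCIe 3.0 x16 usable bandwidth vs Ethernet line rates
    pcie3_x16_gb_s = 15.75                       # GB/s
    for name, gbits in [("10GbE", 10), ("100GbE", 100)]:
        gb_s = gbits / 8                         # convert Gbit/s to GB/s
        print(f"{name}: {gb_s:.1f} GB/s, i.e. {gb_s / pcie3_x16_gb_s:.0%} "
              f"of a PCIe 3.0 x16 link ({pcie3_x16_gb_s} GB/s)")

So 10GbE is under a tenth of what a single x16 slot can move, while 100GbE is in the same ballpark.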
BTW, to echo the author: PSU limits, and 120V circuits here in the U.S., are a major reason why I'm limiting myself to 4 GPUs. Also, the 3090 still has NVLink support, so I'm wondering why the author hasn't set that up. From what I've experienced, NVLink does help if you run data-parallel training.
You can do 100GbE for about $150 a port in switch cost (new); you sometimes see ConnectX-5 cards on eBay for about $100-150 a port (used). I’ve got a fairly good amount of 100GbE in my homelab. Pretty affordable in 2023.
200GbE and 400GbE are still totally unaffordable for anything remotely personal, IMO.
It is model-dependent. I've seen that (NVLink benefits) when comparing against a PCIe 3.0 connection, with a small batch size and no gradient accumulation.
Once you have a larger batch size and gradient accumulation, I believe DDP won't be improved much by NVLink (the all-reduce traffic for the gradients becomes small compared to your computation time).
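For anyone curious what that looks like in practice, here's a minimal vanilla PyTorch DDP sketch (model, optimizer, loader, loss_fn, and local_rank are placeholders, and it assumes torch.distributed is already initialized, e.g. via torchrun): with no_sync() the gradient all-reduce only fires on the last micro-batch, so NVLink/PCIe only carries gradients once per accumulation window.

    import contextlib
    from torch.nn.parallel import DistributedDataParallel as DDP

    # model, optimizer, loader, loss_fn, local_rank are placeholders;
    # assumes torch.distributed is already initialized (e.g. via torchrun).
    ddp_model = DDP(model, device_ids=[local_rank])
    accum_steps = 8  # gradient accumulation factor

    for step, (x, y) in enumerate(loader):
        last = (step + 1) % accum_steps == 0
        # no_sync() skips the gradient all-reduce on all but the last micro-batch,
        # so the interconnect only sees gradient traffic once every accum_steps steps.
        ctx = contextlib.nullcontext() if last else ddp_model.no_sync()
        with ctx:
            loss = loss_fn(ddp_model(x.to(local_rank)), y.to(local_rank)) / accum_steps
            loss.backward()
        if last:
            optimizer.step()
            optimizer.zero_grad()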
Yeah, I'm talking about a ~10% performance difference for a specific model (I believe it was a vanilla seq2seq model from the Attention Is All You Need (AIAYN) paper, benchmarked with a small batch size and no gradient accumulation).
An NVLink bridge is only about $100, so in these cases, I guess you just shouldn't expect "as good as buying another card" levels of improvement from it.
I have seen some benefit on A100s only, but I'm unfamiliar with how to get any NVLink gains on RTX cards. Monitoring vanilla PyTorch DDP in nvtop on RTX, I haven't seen PCIe bus transfer speeds approaching the theoretical max. The OP uses bifurcators in his 8-GPU box, so the OP doesn't seem to be bus-limited either.
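FWIW, you can watch the same thing without nvtop using the NVML bindings (a rough sketch; NVML reports this counter in KB/s over a short sampling window, so treat the numbers as indicative rather than exact):

    import time
    import pynvml  # pip install nvidia-ml-py

    pynvml.nvmlInit()
    n = pynvml.nvmlDeviceGetCount()
    for _ in range(10):  # sample for ~10 seconds while a training job runs
        for i in range(n):
            h = pynvml.nvmlDeviceGetHandleByIndex(i)
            tx = pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_TX_BYTES)
            rx = pynvml.nvmlDeviceGetPcieThroughput(h, pynvml.NVML_PCIE_UTIL_RX_BYTES)
            # counters are KB/s; convert to GB/s for readability
            print(f"GPU{i}: tx {tx / 1e6:.2f} GB/s, rx {rx / 1e6:.2f} GB/s")
        time.sleep(1)
    pynvml.nvmlShutdown()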
I would additionally be skeptical that there's any consumer, or even fire-decal (gamer-brand), hardware that actually delivers 10GbE consistently across several nodes.
Enthusiast / small business / entry level enterprise gear will get you there, but you’re looking at several hundred dollars per port.
10GbE can be pretty cheap on the PC side with used hardware ($30 for an X540), though that's mostly SFP and not multi-gig. Even generic PCIe cards can add an RJ45 10G multi-gig port for ~$50.
The switch/router side is where I find it gets expensive.
True, but when you operate 7 GPUs on one board I reckon keeping a good eye on your thermals is where it starts. Otherwise you can kiss your GPUs goodbye well before you reach the payback moment.
Edit: and you'd have to factor in that cooling power as well into the running costs.
They might not be using NVLink because they sell on Vast, where someone could rent a single GPU out of the four available. No idea if there's some cross-user security implication with NVLink in that scenario.
If these server boards support Thunderbolt AICs (and I believe they might, since my Threadripper Pro board does), daisy-chaining them together could get you 40 Gbps fairly easily, if that's sufficient.
What's your opinion on the coming (or not) GPU crunch?
Looking at cloud GPU availability and current trends (no one except some enthusiasts and big tech is fine-tuning and serving at large scale yet, and results keep getting better and better), I fear we'll run into a situation where GPUs are extremely expensive and hard to come by until supply catches up.
I ordered a high-end PC with a 4090 for the first time in years (normally I'd always prefer cloud, even if it's more expensive) because I want to be on the safe side. What do you think: is this irrational, just a bubble thing?
It's a temporary crunch while TSMC expands capacity. They were reluctant to invest in capacity during the crypto crunch because they (rightly) understood that to be a temporary spike in demand. This time, they (also rightly) recognize AI as a longer-term shift in demand, and they're busy retooling to meet it.
The real question is whether NVIDIA will maintain its lock on the market, or if vendor-agnostic Torch will help commoditize the segment.
CUDA sits at a lower layer than Torch. There isn't much competition to CUDA if you want performance, but if you just want to run things there's the CPU (OpenBLAS etc., and Intel's MKL) and ONNX.
It's pretty astonishing to me that AMD has neither a proper math library (like MKL) nor a GPU compute library (like CUDA).
Without a fully connected NVLink network, the 3090s will be underutilized for models that distribute the layers across multiple GPUs.
If AMD were better supported, the most economical option would be 4x MI60s for 128GB, linked with an Infinity Fabric bridge. However, to get to the end of that journey, you'd have to really know what you're doing.
The bifurcated risers mean that some of the cards are only running at PCIe x8 speed as well, and they mention they're only working with PCIe 3.0, not 4.0.
This would severely limit training using model parallelism.
For data-parallel training, where the full model fits on each card and the batch size is just increased, it wouldn't matter as much, and maybe that's the primary use for this.
I wonder how this is dealt with on vast.ai rentals, because there's a huge difference between needing 7x 3090s where I need all 168GB to load the weights of a single giant LLM, vs. just wanting to run a 4GB Stable Diffusion model for parallel inference with a massive batch size...
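To make the contrast concrete, here's a hedged sketch of the two extremes (the model names, gpu_id, and the use of HF transformers + accelerate's device_map are my assumptions for illustration, not anything vast.ai specifies):

    import torch
    from transformers import AutoModelForCausalLM

    # Case 1: one model too big for any single card -> shard its layers across GPUs.
    # Every forward pass then ships activations between cards over PCIe/NVLink.
    big = AutoModelForCausalLM.from_pretrained(
        "some-70b-checkpoint",      # placeholder name
        device_map="auto",          # accelerate spreads the layers over all GPUs
        torch_dtype=torch.float16,
    )

    # Case 2: a small model (e.g. a ~4GB checkpoint) replicated once per GPU ->
    # each process runs independently, with essentially no inter-GPU traffic.
    gpu_id = 0                      # in practice, one process per GPU
    small = AutoModelForCausalLM.from_pretrained("some-small-checkpoint").to(f"cuda:{gpu_id}")

The first case lives or dies by the interconnect; the second barely notices it.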
It could depend entirely on what size model you're training and on the topology; it may not make a difference for you, but for reference:
See here [0] for the difference between having NVLink between cards or not: a 23% increase in training speed, and the note that the peak bandwidth between 2x 3090s with the link is 112.5 GB/s.
Now look at PCIe 3.0 speeds, which is what any two cards talking to each other through your risers would have to use: only 15.754 GB/s at x16, and only 7.877 GB/s if you're on an x8 riser.
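For reference, those figures fall straight out of the per-lane rate (a tiny sketch with nominal numbers, before protocol overhead):

    gt_per_s = 8.0                            # PCIe 3.0: 8 GT/s per lane
    encoding = 128 / 130                      # 128b/130b line coding
    per_lane_gb_s = gt_per_s * encoding / 8   # ~0.985 GB/s per lane
    for lanes in (16, 8):
        print(f"PCIe 3.0 x{lanes}: {per_lane_gb_s * lanes:.2f} GB/s")
    # vs. the ~112.5 GB/s quoted above for an NVLink-bridged pair of 3090s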
For some non-ML things I use GPUs for (CFD), the interconnect/memory-access bandwidth is the bottleneck, and the simulation time scales almost linearly with the PCIe bandwidth between the CPU and the cards.
Hm, that makes me wonder whether vast.ai does some kind of testing before dropping particular workloads on a machine, so that this can be handled without configuration. But there doesn't seem to be a financial incentive for suppliers of GPU capacity to provide good interconnects.
vast.ai does denote GPUs as either PCIe or SXM (SXM for the A100s and now the H100s up there). SXM bandwidth between GPUs is full n-way NVLink, the best you can get.
They also have a little stat that lists per-GPU bandwidth in GB/s and which PCIe version and speed is in use, so they must run some tests beforehand to gauge this. When I look now, it varies from setup to setup: I see people running quad 4090s on PCIe 4.0 x16 with 24 GB/s of bandwidth between them, some running x8, some on PCIe 3.0 with 11 GB/s, and even someone with quad 3090s all on PCIe 2.0 x1 slots with the bandwidth reading 0.3 GB/s!!! (likely an old mining rig with those x1 slots)
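If you want to sanity-check a rig's number yourself, a crude version is just timing device-to-device copies in PyTorch (a sketch assuming at least two visible GPUs; the CUDA samples' p2pBandwidthLatencyTest is the more rigorous tool):

    import time
    import torch

    assert torch.cuda.device_count() >= 2
    n_bytes = 1 << 30                               # move 1 GiB per copy
    src = torch.empty(n_bytes, dtype=torch.uint8, device="cuda:0")
    dst = torch.empty(n_bytes, dtype=torch.uint8, device="cuda:1")

    for _ in range(3):                              # warm-up copies
        dst.copy_(src)
    torch.cuda.synchronize("cuda:0")
    torch.cuda.synchronize("cuda:1")

    iters = 10
    t0 = time.perf_counter()
    for _ in range(iters):
        dst.copy_(src)
    torch.cuda.synchronize("cuda:0")
    torch.cuda.synchronize("cuda:1")
    dt = time.perf_counter() - t0
    print(f"GPU0 -> GPU1: ~{iters * n_bytes / dt / 1e9:.1f} GB/s")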
I've never rented on there so I'm not sure. It seems like the prices are set by the sellers but with some guidance from their performance tool, likely?
They have an overall "DLPerf" deep-learning performance score, and it seems like if you want to get users, you set your price per hour at something competitive with the market given that metric.
For instance, the guy with the quad 3090s on x1 slots actually has his hourly price set HIGHER than everyone else... even with a horrible DLPerf score. No one is ever going to use that.
Interesting... I should try some workloads to see what it can do in practice. Beats having a whole bunch of fans in my room and it seems to be cheaper than the other cloud GPU providers.
It was primarily being used to train TTS models (see https://github.com/neonbjb/tortoise-tts), which largely fit into a single GPU's memory. So, for data parallelism, x8 PCIe isn't much of a concern.
...Yes, and that's the problem. As a reminder, the A6000 is basically a double-RAM 3090 Ti, a 2020 GPU, going for $4K.
Nvidia is Nvidia and needs to preserve its pro-tier stratification. But AMD has less to lose, and Intel has nothing to lose, by pricing these more like gaming cards.
Probably because they want people to buy the W/MI series cards instead.
Though looking around, you can get an MI60 on eBay for $500, which seems really good for 32GB, and it still appears to be supported by ROCm (as it's essentially an MI50 with more HBM). It looks to be the cheapest way of getting a GPU with that much memory, though FP16 speed and BF16 support suffer compared to later generations. From what I've seen, most "home" ML tasks are hard memory-limited well before ALU limitations kick in. And I have no idea whether a home user would even be able to use the IF links to help with multi-GPU.
Likewise, based on the costs listed on their page, I'd say no more than $0.80/hour or so, assuming a 50% gross margin for vast.ai.
And that includes energy costs so I assume the OP has a cheap source of power. Here in NL I could not do this profitably, even off solar power it would be more efficient to sell that power to the grid than to use it to drive a GPU rig.
There is an actual difference. You can make actual, real-world impact with today's LLMs in the existing business world. They can genuinely make a difference for customers.
Not sure what your context is here, but I delete comments from time to time. It works fine as long as you get to it reasonably quickly and nobody has replied yet.
From the FAQ:
"What does [deleted] mean?
The author deleted the post outright, or asked us to. Unlike dead posts, these remain deleted even when showdead is turned on."