
This looks mostly like an academic or 'for fun' exercise. People run Doom on their fridge for fun so why not squeeze the Moana asset through the bottleneck that sits between your CPU and your GPU? :)

Is it practical/useful? Let's put the timing in perspective.

A commercial CPU production renderer, 3Delight, has timing for the Moana asset rendered at 4k on their website.[1]

Time: ~34 minutes.

I asked them for details about the settings they used before posting this, as the page only lists the resolution.

4k resolution, 64 (shading) samples per pixel (spp), ray depths: diffuse 2, specular 2, refraction 4 (or 3, 3, 5, depending on how you count ray depth). The machine was a contemporary 24-core server at the end of 2018. Mind you, the image is fine with 64 spp. Spp are hard to compare between renderers because optimizing path tracers is a lot about sampling: one renderer will converge to something usable with 1k samples while another just needs 64.

The 3Delight example is rendering all the geometry as subdivision surfaces with displacement (and their own Ptex implementation for texture lookups).

Timing comparisons of a different scene with recent 24-core desktop AMD CPUs suggest that this asset would render much faster in 2020.[2]

The timing shows the issue with GPUs vs. CPUs for this kind of asset: 5 hours for a 1k (!) resolution image with <= 5 bounces and 1,024 spp. That is terrible. And that is without using the real (subdivision) geometry and without displacement.
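For a rough sense of scale, a back-of-the-envelope comparison of raw sample throughput between the two timings (big caveat, as said above: spp are not comparable across renderers; I'm also assuming "1k" means 1024x1024 pixels and "4k" means 3840x2160):

    # Raw sample throughput only, NOT an apples-to-apples quality comparison.
    # Assumptions: "1k" = 1024x1024 pixels, "4k" = 3840x2160, wall-clock times.

    def samples_per_second(width, height, spp, seconds):
        return width * height * spp / seconds

    gpu = samples_per_second(1024, 1024, 1024, 5 * 3600)  # ~60,000 samples/s
    cpu = samples_per_second(3840, 2160, 64, 34 * 60)     # ~260,000 samples/s

    print(f"GPU out-of-core: {gpu:,.0f} samples/s")
    print(f"3Delight on 24-core CPU: {cpu:,.0f} samples/s")

So even ignoring the quality-per-sample question, the out-of-core GPU render pushes roughly a quarter of the samples per second.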

I would love to see a breakdown of how much of these 5 hours is owed to the fact that the data doesn't fit on the device.

Using subdivision surfaces and displacement mapping makes the amount of geometry grow exponentially, i.e. the out-of-core handling would predictably take an exponentially larger part of the render time.
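A minimal sketch of why (the base face count and levels below are made-up numbers, not Moana's actual statistics): each level of Catmull-Clark subdivision roughly quadruples the face count, so the subdivided, displaced geometry blows up fast.

    # Illustrative only: Catmull-Clark subdivision multiplies the face count
    # by ~4 per level, so tessellated geometry grows exponentially in the level.

    def subdivided_faces(base_faces, levels):
        return base_faces * 4 ** levels

    base = 1_000_000  # hypothetical base cage face count
    for level in range(4):
        print(f"level {level}: {subdivided_faces(base, level):,} faces")
    # level 0: 1,000,000 faces
    # level 1: 4,000,000 faces
    # level 2: 16,000,000 faces
    # level 3: 64,000,000 faces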

Looking at the numbers I regularly get to see when counseling VFX companies on their rendering pipelines, I don't see GPU offline rendering going anywhere for complex scenes. And even for simpler scenes, where GPUs still have an advantage, the gap is becoming very tight with the CPUs AMD has been putting out recently; if you do the math, you often pay dearly for having an image a few minutes earlier (not even double-digit minutes or hours earlier). Regardless of what Nvidia's marketing, and some vendors who IMHO wasted years optimizing their renderers for a moving hardware target, may want you to believe.

Regarding the latter: another point to consider is that you need to spend time working with/around the hardware limitations/bottlenecks of GPUs for this very "scene doesn't fit on the device" use case. Someone writing a CPU renderer can spend that time working on the actual renderer itself. This kind of software takes years to develop. Go figure.

Finally, as I expect this to be downvoted because of what I just said: don't take my word for any of the above. Just try it yourself.

The Moana Asset can be downloaded at [3]. A script to convert the entire asset and launch a 3Delight render can be had at [4]. The unlimited core version of the renderer can be downloaded for free, after registering with your email, at [5].

It renders with any number of cores your box has, but it adds a watermark if no license is available.

Or you try their cloud rendering. You get 1,000 free 24-core server minutes, which is plenty to run this test.

[1] https://www.3delight.com/documentation/display/3DLC/Cloud+Re...

[2] https://www.3delight.com/page/features/2020-10-06-CPUbenchma...

[3] https://disneyanimation.com/resources/moana-island-scene/

[4] https://gitlab.com/3Delight/moana-to-nsi

[5] https://www.3delight.com/download




It seems like you've missed the point. Chris isn't making any claim that the run time is fast compared to an in-core production renderer. He rendered Moana in 8GB of GPU memory, which is smaller than the input data. From the advertisement, it's clear you are a 3Delight fan (employee?), but I bet 3Delight cannot do that: render Moana without using more than 8GB of CPU memory.

The algorithm here to do out of core rendering is the important part, and it doesn't make sense for you to try to compare in-core CPU rendering to out-of-core GPU rendering.

> I would love to see a breakdown of how much of these 5 hours is owed to the fact that the data doesn't fit on the device.

I already know it's pretty close to 100% of the time spent handling out-of-core requests. That's not surprising, nor is it a bad thing (though it probably can be improved). If a CPU renderer were doing the same thing, streaming the geometry and BVH, the result would be the same (or much worse if streaming from an SSD instead of some kind of external RAM).
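For readers unfamiliar with the term, here is a toy sketch of what out-of-core access looks like: only a small cache of geometry/BVH chunks stays resident on the device and everything else is fetched on demand, which is exactly where the time goes. This is a generic LRU illustration, not Chris's actual scheme.

    # Toy model of out-of-core geometry access. Generic illustration only,
    # not the algorithm used in the renderer discussed above.
    from collections import OrderedDict

    class ChunkCache:
        def __init__(self, capacity_bytes, load_chunk):
            self.capacity = capacity_bytes
            self.load_chunk = load_chunk  # callback: chunk_id -> bytes (host RAM or disk)
            self.cache = OrderedDict()    # chunk_id -> bytes, in LRU order
            self.used = 0

        def get(self, chunk_id):
            if chunk_id in self.cache:        # hit: cheap
                self.cache.move_to_end(chunk_id)
                return self.cache[chunk_id]
            data = self.load_chunk(chunk_id)  # miss: this transfer is what dominates
            while self.used + len(data) > self.capacity and self.cache:
                _, evicted = self.cache.popitem(last=False)
                self.used -= len(evicted)
            self.cache[chunk_id] = data
            self.used += len(data)
            return data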



