Ask HN: Share your FFmpeg settings for video hosting
72 points by indulona 6 months ago | 45 comments
I am working on a website that has video hosting capability. Users can upload video files and I will generate multiple versions with different qualities, audio-only versions, thumbnails and things like that.

I have chosen the mp4 container because of how widely supported it is. So that users don't have to fetch the whole file before playback starts, I use the faststart option, which writes the container's metadata at the beginning of the file instead of at the end.

Next, I have picked the h264 codec, again because of how widely supported it is. VP8/VP9/AV1/x265/x266 are certainly better, but h264 software encoding often beats hardware encoding thanks to highly optimized, time-proven code and ubiquitous hardware decode support. And the uploaded videos are already compressed; users won't be uploading 8k raw videos where the most advanced codecs would be useful for preserving "quality".

For audio, I have picked the opus codec. Seems like good value over the others. Not much else to add.

I run ffmpeg to convert video with a command like this:

ffmpeg -hide_banner -loglevel error -i input.mp4 -g 52 -c:v h264 -maxrate:v vbr -bufsize vbr -s WxH -c:a libopus -af "aformat=channel_layouts=7.1|5.1|stereo" -maxrate:a abr -ar 48000 -ac 2 -f mp4 -movflags faststart -map 0:v:0 -map 0:a:0 output.mp4

where vbr is the video bitrate, e.g. 1024k (1 Mbps), abr is the audio bitrate, e.g. 190k, and WxH is the video dimensions in case of resizing.
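For reference, here is a sketch of that command with the placeholders filled in (the values and filenames are illustrative, not necessarily what the site uses). Two details worth noting: the aformat filter argument needs quoting, because an unquoted | is a shell pipe, and -b:a is the usual flag for the libopus bitrate. The command is built as a string here so the sketch can be inspected without running ffmpeg:

```shell
# Illustrative values: ~1 Mbps capped video, 190k opus audio (assumptions).
vbr=1024k; abr=190k; size=1920x1080

# Built as a string for inspection; when running for real, keep the
# quotes around the -af argument so the shell doesn't see a pipe.
cmd="ffmpeg -hide_banner -loglevel error -i input.mp4 \
 -g 52 -c:v h264 -maxrate:v $vbr -bufsize $vbr -s $size \
 -c:a libopus -af aformat=channel_layouts=7.1|5.1|stereo -b:a $abr -ar 48000 -ac 2 \
 -f mp4 -movflags faststart -map 0:v:0 -map 0:a:0 output.mp4"
echo "$cmd"
```

With maxrate/bufsize but no -b:v, x264 runs in its default quality mode capped by the VBV buffer, which is a reasonable choice for user uploads of unknown complexity.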

I wonder how the folks who handle video encoding run their process and generate their videos?

How did you pick your settings, what issues have you encountered, and any tips you can share are certainly appreciated.

It's quite a niche segment once you're on the operations side rather than merely a consumer/customer.




Video codec transcoding is very CPU resource expensive. If you do a lot of it, you should be looking into doing hardware-accelerated transcoding. https://trac.ffmpeg.org/wiki/HWAccelIntro

My ffmpeg how-to/examples/scratchfile can be viewed here: https://paste.travisflix.com/?ce12a91f222cc3d7#BQPKtw6sEs9cE...


Hardware video encoders all - even in 2024 - produce significantly worse quality at the same filesize.

They're made to be realtime, but for any kind of delayed playback where there's time to encode, software encoders win without any kind of effort. For web delivery especially, hw encoders have no business being used because quality per expended bandwidth is paramount and costs money.


IIRC both x264 and nvenc have multiple profiles for the tradeoff between quality and computing power.

For your comparison, are you assuming that the objective is best quality, e.g. that you'd accept 10x the computation even if it gave only a 2% quality improvement?

(I can see how this could make sense, if you're encoding a file once and it will be viewed many times. But I could imagine other situations, e.g. where most files are viewed once or never, and only a few files are very popular.)


Having profiles doesn't really change the fact that even an Ampere-generation encoding block at its slowest profile won't come close to the visual quality at the same output bitrate of x264's slow+ presets (and we're not even touching on H.265/AV1 here).

> For your comparison, are you assuming that the objective is best quality, e.g. that you'd accept 10x the computation even if it gave only a 2% quality improvement?

The difference is more like 150% encoding time for half the filesize at the same SSIM - depending on configuration and video type, of course. And that ignores the fact that a server machine with a 64-core Threadripper or equivalent can handle parallel encoding of many more videos at massively lower dollar cost than using nvenc, especially at current GPU prices and GPU power consumption.

There's a reason why all the big online services encode in software (usually with x264 & co.) for their mainstream, most-used profiles (that is, SD/HD; many also for 4K).

It just doesn't make sense from a product quality, user experience or financial perspective. It only makes sense if you never check the results of your production.


I agree with your statement mostly. However, I think if we're dealing with 1080p displays, hardware-accelerated transcoding produces acceptable video quality for non-4K+ watching. That's just my 2 cents.


Can you explain your point there, though? It won't change the fact that you could deliver the same visual quality at significantly reduced bandwidth cost FOR YOU and for the user. You're making the user experience worse (e.g. on less stable 4G/5G links and limited connections) and paying more for your egress bandwidth for what? Using an expensive GPU block (with most of its cores idling) to encode slightly faster than a cheaper CPU core will do?

It just doesn't make sense to save a few minutes of encoding time (a one time operation) when that costs you and users money every single time that file is streamed.


> Using an expensive GPU block (with most of its cores idling) to encode slightly faster than a cheaper CPU core will do?

Slightly faster? Sir, I do think you are mistaken.

There are other variables that go into it, such as choosing between constant or variable bitrate, variable or constant frame rate, and profiles obviously. There's a crapload of other directives that you can tweak as well.

I see what you mean though; you're right that if you have a CPU with a lot of cores/threads/horsepower then you should use that instead of hardware-accelerated transcoding. I don't.


I found this recently too - encoding my video using either AMD or Nvidia hardware encoding resulted in poor quality. But what's the reason for this?


In short - on the CPU the encoders spend more time looking for similarities between frames and have access to more frames of history to find repetitions. Finding similarities is where all the compression comes from.

The hardware encoders are designed to be fast (at least realtime), small (in component count) and power efficient so they usually don't have the ability to do extensive searches for similarities in frames to really squeeze out the best compression. Pretty much all of them are also largely fixed in supported operations during encode so you can't really implement better algorithms with software updates on them.

Making them better would mean growing their silicon size, which would make the graphics cards more expensive to produce and take space away from actual GPU cores. Most customers don't buy a GeForce for its video encoder. And once you'd grow them enough, you'd get something similar to a CPU anyway.

And this is how we come back full circle - even enterprise "encoder-in-a-box" providers have lately gone from boxes with ASICs in them to essentially selling proprietary servers with normal CPUs and a proprietary OS on them. With the current prices of 64-core+ CPUs, it just doesn't make much sense to design ASICs and HW encoding blocks for these types of encodes.

Realtime encoding is, of course, another game.


Probably because they're optimized for low-latency.

Anyway, by hardware encoders people typically mean dedicated hardware.


Nobody working on GPU algorithms to get a bit of the best of both worlds?


The major companies have offerings, such as Nvidia's CUDA stack (nvenc encoder/nvdec decoder) and VAAPI on Intel CPUs. FFmpeg has a lot of good documentation on this subject.


nvenc, VAAPI and pretty much all the others use fixed encoding blocks and don't use programmable cores for video encode.


Generally when you are hosting videos you don’t pick one set of transcoding settings and stick with that. You transcode many many renditions and then serve up the appropriate one depending on the client. It’s quite difficult and expensive in terms of both processing and storage. This is why it is so difficult to unseat the incumbent platforms like YouTube. It’s also why so few people do this on their own.
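A minimal sketch of such a rendition ladder is just a loop over resolution/bitrate pairs (the values below are made-up examples, not any particular service's ladder; drop the echo to actually transcode):

```shell
#!/bin/sh
# Each entry is "WxH:video-bitrate" (illustrative values only).
for spec in "640x360:700k" "1280x720:2500k" "1920x1080:5000k"; do
  size=${spec%%:*}   # part before the colon, e.g. 640x360
  rate=${spec##*:}   # part after the colon, e.g. 700k
  # Remove "echo" to actually run each transcode.
  echo ffmpeg -i input.mp4 -c:v h264 -maxrate:v "$rate" -bufsize "$rate" \
    -s "$size" -c:a libopus -b:a 128k -movflags faststart "out_${size}.mp4"
done
```

In practice each rendition multiplies both CPU time and storage, which is the cost the comment above is pointing at.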


While not at the same level as any large service, I typically spin off a 720p copy of any media as a background process nightly. 720p is pretty small in comparison to 1080p or 4K source files. This way I have an immediate mobile-friendly version I can stream from my phone or load up on an iPad without emptying its storage, even if the quality is markedly lower.


If it's public facing, I would think content moderation and copyright enforcement would be the bigger challenge.


Last 3 years I traveled extensively and had limited and flakey bandwidth.

You should have a low bandwidth setting that also uses new codecs.

A 64 kbit stereo opus stream is, to my ears, almost indistinguishable from CD audio. I think professional listening tests recommend between 64 and 96 kbit for transparent audio.

Anything beyond that is a waste unless we are talking about more than stereo.

Also, if you want, you can use MPEG-DASH to stream video. Here you encode the video into a series of small chunks/files. When the player can't handle the high-bandwidth version, it can switch to a lower bandwidth automatically, and vice versa. This is what YouTube and other professional places do. It will also help prevent users from easily downloading the complete video. The trick is that all renditions need to be split on the same key frames, so either use two-pass encoding or force a new keyframe exactly every ~3 seconds.

https://www.cloudflare.com/en-ca/learning/video/what-is-mpeg...
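As a sketch of the keyframe-alignment trick described above, assuming 2-second segments at 24 fps and ffmpeg's dash muxer (segment length and filenames are my assumptions; drop the echo to run it):

```shell
# Force a keyframe exactly on each segment boundary: gop = fps * segment
# seconds, pin the minimum interval to the same value, and disable
# scene-cut keyframes so no extra keyframes land mid-segment.
fps=24; seg=2
gop=$((fps * seg))   # 48 frames per segment
echo ffmpeg -i input.mp4 \
  -c:v h264 -g "$gop" -keyint_min "$gop" -sc_threshold 0 \
  -c:a aac -b:a 128k \
  -f dash -seg_duration "$seg" manifest.mpd
```

Every rendition of the same source must use the same gop value, or the player can't switch between them cleanly.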


there are audio-only and 360p versions, in addition to 720p and 1080p.


> How did you pick your settings, what issues have you encountered and any tips you can share are certainly appreciated.

The settings are picked based on what format, resolution, bitrate, and codec I'd like. I don't think this is something you need to spend time nitpicking, admittedly. :)

You mention you're working on a site for video hosting. Have you thought about how you're going to deliver video at scale? Sending video over the wire is super expensive and your costs will probably increase faster than your revenue, unless you're charging out the gate. Cloudflare has some plans which let you deliver for nearly free, but your content needs to be fairly static.

Good luck! Don't sweat the small stuff - just keep building.


i have my own cdn.


For synchronized group watching I typically encode to h264 and aac in mp4. Video: render subtitles if available, downscale to 720p if larger, ensure yuv420p, encode with: preset slow, crf 24, and tune animation or film as appropriate. Copy audio if aac, 2 channel, and bitrate is less than 160k otherwise: force 2 channel, encode at 128k. Fast start is needed otherwise I'd use a separate program to put the "header" at the start.
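A rough translation of that recipe into a single command, under my own assumptions (the scale expression and filenames are mine, and the subtitle-rendering and audio copy-vs-encode branching are left out; -tune film shown as one of the two options mentioned):

```shell
# 720p cap (only ever downscales; -2 keeps the width even for yuv420p),
# slow preset, CRF 24, stereo AAC at 128k, faststart for web playback.
# Built as a string for inspection; echo prints it instead of running it.
cmd="ffmpeg -i input.mkv \
 -vf scale=-2:'min(720,ih)',format=yuv420p \
 -c:v libx264 -preset slow -crf 24 -tune film \
 -c:a aac -ac 2 -b:a 128k \
 -movflags faststart output.mp4"
echo "$cmd"
```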

As for your command line, what do you think -g 52 does? Why do you give conflicting audio channel settings?


-g 52 = at the latest every 52nd frame will be a key-frame (so a little over 2 seconds at 24 fps); this influences seeking (how long an image might have artifacts before it renders properly). there are no two conflicting audio channel settings: -ar is the audio sampling rate (data points per second) whereas -maxrate:a is the bitrate (actual bits of data per second). the yuv is a good point, i guess it is a safe-guard for exotic input. also, copying the audio might be worth it, although then i would end up with mixed output files where one might be original aac and another might be opus. so i think i prefer the uniform output over some saving in processing.


Okay there is some thought put into your choice of keyframe interval.

And there is a conflict: you allow the aformat filter to negotiate 5.1, 7.1, or stereo, but then insist there must be 2 audio channels (-ac 2).

I copy the audio when it is suitable just to save another lossy encode.



Ensuring yuv420p is the really important addition of this post. If you don't do that, depending on the user input you will end up with an mp4 that won't play.

AAC is more standard for mp4 video than opus, although I agree opus is the superior codec. Whether the benefits of opus outweigh the downsides of it being non-standard in your use case is not mine to decide, but if you are looking to produce something similar to most other web video out there, I'd go with AAC. The saved bandwidth is probably minuscule in comparison to what you could save on the video side.


Seconding this: adding "-pix_fmt yuv420p" to your conversion will ensure compatibility. For example, if you attach a converted MP4 to a WhatsApp chat, it will not show a preview or play unless you include this flag.
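Put together, a broadly compatible baseline conversion looks something like this (a sketch; filenames are placeholders, and echo prints the command instead of running it):

```shell
# Maximum-compatibility baseline: H.264 + yuv420p + AAC + faststart.
cmd="ffmpeg -i input.mov -c:v libx264 -pix_fmt yuv420p \
 -c:a aac -b:a 128k -movflags faststart output.mp4"
echo "$cmd"
```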


opus has no issues with support, so why go with the older aac then?


If you think opus has no issues you must not have tested many devices.

If your subset handles it, great, but don't be dismissive.


like nokia 3110 from 1998? :D


Like Rokus from 2019.


Many users of YouTube/Vimeo are used to supplying their files like that, and most GUI tools won't output Opus. So if you care about your users delivering already-encoded files, AAC might be the wiser choice.

This depends on your target audience, however.


Your choice of codecs is odd. If you decided you need h.264 for compatibility, that also means you need AAC and burned-in subtitles/captions or you lose all of h.264's broad support to missing audio and text support.

If you can restrict support to just current Android, Chrome, and Firefox, you can use VP9, Opus, and SRT. Willfully-outdated platforms like Apple and Roku have screwed over everyone.


h264 is a video codec and has nothing to do with the audio codec. aac is an old proprietary codec while opus is a modern license-free codec with wide support pretty much anywhere, certainly these days. as for vp9... i have already mentioned my reasoning.


There's one thing I learned from asking questions like this: you never get a satisfying answer, and you ultimately have to just pick something. This is not about the quality of the answers, it's just how the world works. No definitive solutions, just compromises.


i was mostly interested in lived-through experiences of people who also host videos. for example, i encountered a case where an audio file had a video track in it. it baffled me until i found out it was the album cover/image stored as motion-jpeg, so i had to adjust the input validation. or that my own generated mp3 audio file, extracted from a video file, was not passing a 90% ffprobe reliability threshold, so again i had to alter validation... and things like that. like people here mentioning that the yuv420 pixel format is important. i also heard about malicious code being embedded in video files or containers in general, which can cause some issues, though i do not know much about it.


Go on a specialized video forum and not just ask on HN. Example: https://www.video-dev.org


thanks. this will be useful.


If you're building a product, maybe use a CDN during the MVP phase while you validate?

I don't think your end customers will care who's serving the video.


if you mean the ones that do the encoding for you, well, those are not financially viable options. i have my own cdn network anyway.


If you think encoding it all yourself is going to end up cheaper, maybe you haven't considered the full cost of building, maintaining and hosting this piece of work?


If you want browser-friendly video, look into HLS; plain MP4 with faststart is not enough. Plenty of ffmpeg snippets around.

Care to share what the site is?


this is not for live streaming; hls/mpeg-dash makes no sense over a faststarted mp4 container.

the website is not ready to be public yet, but it will be this year.


You are confusing things. It is not about live streaming; it is about supplying a video with smooth playback, adaptive quality, etc., to the browser. The video would already be made and converted to HLS with ffmpeg. Not sure why I am thanklessly wasting time arguing with you in order to give useful advice against your will, of course.


What's the target usage? Is it to allow user to download the converted file? Or for streaming?


i do not need to protect the files by chunking them via mpeg-dash/hls or slapping some drm over the data; users can download them. this is not for live streaming, obviously.



