Ask HN: Share your FFmpeg settings for video hosting
72 points by indulona 6 months ago | 45 comments
I am working on a website that has video hosting capability. Users can upload video files and I will generate multiple versions with different qualities, audio-only versions, thumbnails and things like that.

I have chosen the mp4 container because of how widely supported it is. So that users don't have to fetch the whole file before playback starts, I use the faststart option, which writes the container's metadata at the beginning of the file instead of at the end.

Next, I have picked the h264 codec, again because of how widely supported it is. VP8/VP9/AV1/x265/x266 are certainly better, but h264 software encoding often beats hardware encoding thanks to highly optimized, time-proven code and ubiquitous hardware decode support. And the uploaded videos are already compressed; users won't be uploading 8k raw videos where the most advanced codecs would be useful for preserving "quality".

For audio, I have picked the opus codec. Seems like good value over the others. Not much else to add.

I run ffmpeg to convert video with a command like this:

ffmpeg -hide_banner -loglevel error -i input.mp4 -g 52 -c:v h264 -maxrate:v vbr -bufsize vbr -s WxH -c:a libopus -af "aformat=channel_layouts=7.1|5.1|stereo" -maxrate:a abr -ar 48000 -ac 2 -f mp4 -movflags faststart -map 0:v:0 -map 0:a:0 output.mp4

where vbr is the video bitrate, e.g. 1024k (1 Mbps), abr is the audio bitrate, e.g. 190k, and WxH is the video dimensions in case of resizing.
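For reference, here is a sketch of that command with the placeholders filled in (the values and filenames are illustrative, not necessarily what the site uses). Two details worth noting: the aformat filter argument needs quoting, because an unquoted | is a shell pipe, and -b:a is the usual flag for the libopus bitrate. The command is built as a string here so the sketch can be inspected without running ffmpeg:

```shell
# Illustrative values: ~1 Mbps capped video, 190k opus audio (assumptions).
vbr=1024k; abr=190k; size=1920x1080

# Built as a string for inspection; when running for real, keep the
# quotes around the -af argument so the shell doesn't see a pipe.
cmd="ffmpeg -hide_banner -loglevel error -i input.mp4 \
 -g 52 -c:v h264 -maxrate:v $vbr -bufsize $vbr -s $size \
 -c:a libopus -af aformat=channel_layouts=7.1|5.1|stereo -b:a $abr -ar 48000 -ac 2 \
 -f mp4 -movflags faststart -map 0:v:0 -map 0:a:0 output.mp4"
echo "$cmd"
```

With maxrate/bufsize but no -b:v, x264 runs in its default quality mode capped by the VBV buffer, which is a reasonable choice for user uploads of unknown complexity.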

I wonder how the folks who handle video encoding run their process and generate their videos?

How did you pick your settings, what issues have you encountered, and any tips you can share are certainly appreciated.

It's quite a niche segment once you're on the operations side rather than merely a consumer/customer.




Video codec transcoding is very CPU resource expensive. If you do a lot of it, you should be looking into doing hardware-accelerated transcoding. https://trac.ffmpeg.org/wiki/HWAccelIntro

My ffmpeg how-to/examples/scratchfile can be viewed here: https://paste.travisflix.com/?ce12a91f222cc3d7#BQPKtw6sEs9cE...


Hardware video encoders all - even in 2024 - produce significantly worse quality at the same filesize.

They're made to be realtime, but for any kind of delayed playback where there's time to encode, software encoders win without any kind of effort. For web delivery especially, hw encoders have no business being used because quality per expended bandwidth is paramount and costs money.


IIRC both x264 and nvenc have multiple profiles for the tradeoff between quality and computing power.

For your comparison, are you assuming that the objective is best quality, e.g. that you'd accept 10x the computation even if it gave only a 2% quality improvement?

(I can see how this could make sense, if you're encoding a file once and it will be viewed many times. But I could imagine other situations, e.g. where most files are viewed once or never, and only a few files are very popular.)


Having profiles doesn't really change the fact that even an Ampere-generation encoding block at its slowest profile won't come close to the visual quality at the same output bitrate of x264's slow+ presets (and we're not even touching on H.265/AV1 here).

> For your comparison, are you assuming that the objective is best quality, e.g. that you'd accept 10x the computation even if it gave only a 2% quality improvement?

The difference is more like 150% encoding time for half the filesize at the same SSIM - depending on configuration and video type, of course. And that ignores the fact that a server machine with a 64-core Threadripper or equivalent can handle parallel encoding of many more videos at massively lower dollar cost than using nvenc, especially at current GPU prices and GPU power consumption.

There's a reason why all the big online services encode in software (usually with x264 & co.) for their mainstream, most-used profiles (that is, SD/HD; many also for 4K).

It just doesn't make sense from a product quality, user experience or financial perspective. It only makes sense if you never check the results of your production.


I agree with your statement mostly. However, I think if we're dealing with 1080p displays, hardware-accelerated transcoding produces acceptable video quality for non-4K+ watching. That's just my 2 cents.


Can you explain your point there, though? It won't change the fact that you could deliver the same visual quality at significantly reduced bandwidth cost FOR YOU and for the user. You're making the user experience worse (e.g. on less stable 4G/5G links and limited connections) and paying more for your egress bandwidth for what? Using an expensive GPU block (with most of its cores idling) to encode slightly faster than a cheaper CPU core will do?

It just doesn't make sense to save a few minutes of encoding time (a one time operation) when that costs you and users money every single time that file is streamed.


> Using an expensive GPU block (with most of its cores idling) to encode slightly faster than a cheaper CPU core will do?

Slightly faster? Sir, I do think you are mistaken.

There are other variables that go into it, such as choosing between constant or variable bitrate, variable or constant frame rate, and profiles obviously. There's a crapload of other directives that you can tweak as well.

I see what you mean though; you're right that if you have a CPU with a lot of cores/threads/horsepower then you should use that instead of hardware-accelerated transcoding. I don't.


I found this recently too - encoding my video using either AMD or Nvidia hardware encoding resulted in poor quality. But what's the reason for this?


In short - on the CPU the encoders spend more time looking for similarities between frames and have access to more frames of history to find repetitions. Finding similarities is where all the compression comes from.

The hardware encoders are designed to be fast (at least realtime), small (in component count) and power efficient so they usually don't have the ability to do extensive searches for similarities in frames to really squeeze out the best compression. Pretty much all of them are also largely fixed in supported operations during encode so you can't really implement better algorithms with software updates on them.

Making them better would mean growing their silicon size, which would make the graphics cards more expensive to produce and take space away from actual GPU cores. Most customers don't buy a GeForce for its video encoder. And once you'd grow them enough, you'd get something similar to a CPU anyway.

And this is how we come back full circle - even enterprise "encoder-in-a-box" providers have lately gone from boxes with ASICs in them to essentially selling proprietary servers with normal CPUs and a proprietary OS on them. With the current prices of 64-core+ CPUs, it just doesn't make much sense to design ASICs and HW encoding blocks for these types of encodes.

Realtime encoding is, of course, another game.


Probably because they're optimized for low-latency.

Anyway, by hardware encoders people typically mean dedicated hardware.


Nobody working on GPU algorithms to get a bit of the best of both worlds?


The major companies have offerings, such as Nvidia's CUDA stack (nvenc encoder/nvdec decoder) and VAAPI on Intel CPUs. FFmpeg has a lot of good documentation on this subject.


nvenc, VAAPI and pretty much all the others use fixed encoding blocks and don't use programmable cores for video encode.


Generally when you are hosting videos you don’t pick one set of transcoding settings and stick with that. You transcode many many renditions and then serve up the appropriate one depending on the client. It’s quite difficult and expensive in terms of both processing and storage. This is why it is so difficult to unseat the incumbent platforms like YouTube. It’s also why so few people do this on their own.
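A minimal sketch of such a rendition ladder is just a loop over resolution/bitrate pairs (the values below are made-up examples, not any particular service's ladder; drop the echo to actually transcode):

```shell
#!/bin/sh
# Each entry is "WxH:video-bitrate" (illustrative values only).
for spec in "640x360:700k" "1280x720:2500k" "1920x1080:5000k"; do
  size=${spec%%:*}   # part before the colon, e.g. 640x360
  rate=${spec##*:}   # part after the colon, e.g. 700k
  # Remove "echo" to actually run each transcode.
  echo ffmpeg -i input.mp4 -c:v h264 -maxrate:v "$rate" -bufsize "$rate" \
    -s "$size" -c:a libopus -b:a 128k -movflags faststart "out_${size}.mp4"
done
```

In practice each rendition multiplies both CPU time and storage, which is the cost the comment above is pointing at.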


While not at the same level as any large service, I typically spin off a 720p copy of any media as a background process nightly. 720p is pretty small in comparison to 1080p or 4K source files. This way I have an immediate mobile-friendly version I can stream from my phone or load up on an iPad without emptying its storage, even if the quality is markedly lower.


If it's public facing, I would think content moderation and copyright enforcement would be the bigger challenge.


Last 3 years I traveled extensively and had limited and flakey bandwidth.

You should have a low bandwidth setting that also uses new codecs.

A 64 kbit stereo opus stream is, to my ears, almost indistinguishable from CD audio. I think professional listening tests recommend between 64 and 96 kbit for transparent audio.

Anything beyond that is a waste unless we are talking about more than stereo.

Also, if you want, you can use MPEG-DASH to stream video. Here you encode the video into a series of small chunks/files. When the player can't handle the high-bandwidth version, it can switch to a lower bandwidth automatically, and vice versa. This is what YouTube and other professional places do. It will also help prevent users from easily downloading the complete video. The trick is that all renditions need to be split on the same key frames, so either use two-pass encoding or force a new keyframe exactly every ~3 seconds.

https://www.cloudflare.com/en-ca/learning/video/what-is-mpeg...
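As a sketch of the keyframe-alignment trick described above, assuming 2-second segments at 24 fps and ffmpeg's dash muxer (segment length and filenames are my assumptions; drop the echo to run it):

```shell
# Force a keyframe exactly on each segment boundary: gop = fps * segment
# seconds, pin the minimum interval to the same value, and disable
# scene-cut keyframes so no extra keyframes land mid-segment.
fps=24; seg=2
gop=$((fps * seg))   # 48 frames per segment
echo ffmpeg -i input.mp4 \
  -c:v h264 -g "$gop" -keyint_min "$gop" -sc_threshold 0 \
  -c:a aac -b:a 128k \
  -f dash -seg_duration "$seg" manifest.mpd
```

Every rendition of the same source must use the same gop value, or the player can't switch between them cleanly.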


there are audio-only and 360p versions, in addition to 720p and 1080p.


> How did you pick your settings, what issues have you encountered and any tips you can share are certainly appreciated.

The settings are picked based on what format, resolution, bitrate, and codec I'd like. I don't think this is something you need to spend time nitpicking, admittedly. :)

You mention you're working on a site for video hosting. Have you thought about how you're going to deliver video at scale? Sending video over the wire is super expensive and your costs will probably increase faster than your revenue, unless you're charging out the gate. Cloudflare has some plans which let you deliver for nearly free, but your content needs to be fairly static.

Good luck! Don't sweat the small stuff - just keep building.


i have my own cdn.


For synchronized group watching I typically encode to h264 and aac in mp4. Video: render subtitles if available, downscale to 720p if larger, ensure yuv420p, encode with: preset slow, crf 24, and tune animation or film as appropriate. Copy audio if aac, 2 channel, and bitrate is less than 160k otherwise: force 2 channel, encode at 128k. Fast start is needed otherwise I'd use a separate program to put the "header" at the start.
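A rough translation of that recipe into a single command, under my own assumptions (the scale expression and filenames are mine, and the subtitle-rendering and audio copy-vs-encode branching are left out; -tune film shown as one of the two options mentioned):

```shell
# 720p cap (only ever downscales; -2 keeps the width even for yuv420p),
# slow preset, CRF 24, stereo AAC at 128k, faststart for web playback.
# Built as a string for inspection; echo prints it instead of running it.
cmd="ffmpeg -i input.mkv \
 -vf scale=-2:'min(720,ih)',format=yuv420p \
 -c:v libx264 -preset slow -crf 24 -tune film \
 -c:a aac -ac 2 -b:a 128k \
 -movflags faststart output.mp4"
echo "$cmd"
```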

As for your command line, what do you think -g 52 does? Why do you give conflicting audio channel settings?


-g 52 = at the latest every 52nd frame will be a key-frame (so a little over 2 seconds at 24 fps); this influences seeking (how long an image might have artifacts before it renders properly). there are no two conflicting audio channel settings: -ar is the audio sampling rate (data points per second) whereas -maxrate:a is the bitrate (actual bits of data per second). the yuv is a good point, i guess it is a safe-guard for exotic input. also, copying the audio might be worth it, although then i would end up with mixed output files where one might be original aac and another might be opus. so i think i prefer the uniform output over some saving in processing.


Okay there is some thought put into your choice of keyframe interval.

And there is a conflict: you allow the aformat filter to negotiate 5.1, 7.1, or stereo, but then insist there must be 2 audio channels (-ac 2).

I copy the audio when it is suitable just to save another lossy encode.



Ensuring yuv420p is the really important addition of this post. If you don't do that, depending on the user input you will end up with an mp4 that won't play.

AAC is more standard for mp4 video than opus, although I agree opus is the superior codec. Whether the benefits of opus outweigh the downsides of it being non-standard in your use case is not mine to decide, but if you are looking to produce something similar to most other web video out there, I'd go with AAC. The saved bandwidth is probably minuscule in comparison to what you could save on the video side.


Seconding this: adding "-pix_fmt yuv420p" to your conversion will ensure compatibility. For example, if you attach a converted MP4 to a WhatsApp chat, it will not show a preview or play unless you include this flag.
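Put together, a broadly compatible baseline conversion looks something like this (a sketch; filenames are placeholders, and echo prints the command instead of running it):

```shell
# Maximum-compatibility baseline: H.264 + yuv420p + AAC + faststart.
cmd="ffmpeg -i input.mov -c:v libx264 -pix_fmt yuv420p \
 -c:a aac -b:a 128k -movflags faststart output.mp4"
echo "$cmd"
```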


opus has no issues with support, so why go with the older aac then?


If you think opus has no issues you must not have tested many devices.

If your subset handles it, great, but don't be dismissive.


like nokia 3110 from 1998? :D


Like Rokus from 2019.


Many users of YouTube/Vimeo are used to supplying their files like that, and most GUI tools won't output Opus. So if you care about your users delivering already-encoded files, AAC might be the wiser choice.

This depends on your target audience, however.


Your choice of codecs is odd. If you decided you need h.264 for compatibility, that also means you need AAC and burned-in subtitles/captions or you lose all of h.264's broad support to missing audio and text support.

If you can restrict support to just current Android, Chrome, and Firefox, you can use VP9, Opus, and SRT. Willfully-outdated platforms like Apple and Roku have screwed over everyone.


h264 is a video codec and has nothing to do with the audio codec. aac is an old proprietary codec while opus is a modern license-free codec with wide support pretty much anywhere, certainly these days. as for vp9... i have already mentioned my reasoning.


There's one thing I learned from asking questions like this: you never get a satisfying answer, and you ultimately have to just pick something. This is not about the quality of the answers, it's just how the world works. No definitive solutions, just compromises.


i was mostly interested in lived-through experiences of people who also host videos. for example, i encountered a case where an audio file had a video track in it. it baffled me until i found out it was the album cover/image stored as motion-jpeg, so i had to adjust the input validation. or that my own generated mp3 audio file, extracted from a video file, was not passing a 90% ffprobe reliability threshold, so again i had to alter validation... and things like that. like people here mentioning that the yuv420 pixel format is important. i also heard about malicious code being embedded in video files or containers in general, which can cause some issues, though i do not know much about it.


Go on a specialized video forum and not just ask on HN. Example: https://www.video-dev.org


thanks. this will be useful.


If you're building a product, maybe use a CDN during the MVP phase while you validate?

I don't think your end customers will care who's serving the video.


if you mean the ones that do the encoding for you, well, those are not financially viable options. i have my own cdn network anyway.


If you think encoding it all yourself is going to end up cheaper, maybe you haven't considered the full cost of building, maintaining and hosting this piece of work?


If you want browser-friendly video, look into HLS; plain MP4 with faststart is not enough. Plenty of ffmpeg snippets around.

Care to share what the site is?


this is not for live streaming; hls/mpeg-dash makes no sense over a faststarted mp4 container.

the website is not ready to be public yet, but it will be this year.


You are confusing things. It is not about live streaming; it is about supplying a video with smooth playback, adaptive quality, etc., to the browser. The video would already be made and converted to HLS with ffmpeg. Not sure why I am thanklessly wasting time arguing with you in order to give useful advice against your will, of course.


What's the target usage? Is it to allow user to download the converted file? Or for streaming?


i do not need to protect the files by chunking them via mpeg-dash/hls or slapping some drm over the data; users can download them. this is not for live streaming, obviously.



