Vdx – An intuitive CLI for processing video, powered by FFmpeg (github.com/yuanqing)
152 points by cheeaun on Oct 24, 2020 | 51 comments



I anticipate some much more savvy than myself saying this wrapper is unnecessary as you can do all of this “easily” with ffmpeg. I welcome tools like this that have a more common user in mind, evidenced immediately in the beginning of the README with common use cases.

FFmpeg’s documentation, on the other hand, makes me feel as though it’s normal to assume knowledge about a point in a video and what it means if my encoders are different from input to output.


Even if what the abstraction is doing is not really the best way of doing things?

For example, re-encoding video or audio tracks when it could just copy them, or picking the defaults when there is a better-quality option?

Video and audio processing is one of those things where, if you want good results with wide flexibility, you have to learn it and try things out.


I consider myself pretty savvy and have used ffmpeg many times over the years. I struggle through it like hell every time. I'd gladly use something like this.


> FFmpeg's documentation

    ffplay --help
8826 lines


That's ffplay.

  $ man ffmpeg-all 2> /dev/null | wc -l
  33160


why are you redirecting stderr? does `man ffmpeg` produce error output on your system?

anyway, on my system (macOS, ffmpeg 4.3.1 from Homebrew) the situation is even more dire:

    $ man ffmpeg-all | wc -l
    38738


On my system (Kubuntu 20.04) man/troff produces errors when its output is redirected:

  $ man ffmpeg-all | wc -l
  troff: <standard input>:6148: warning [p 42, 3.5i, div 'an-div', 0.0i]: can't break line
  troff: <standard input>:6150: warning [p 42, 3.7i, div 'an-div', 0.0i]: can't break line
  troff: <standard input>:11396: warning [p 68, 8.7i]: can't break line
  troff: <standard input>:11398: warning [p 68, 9.2i]: can't break line
  troff: <standard input>:13562: warning [p 79, 6.0i]: can't break line
  troff: <standard input>:13564: warning [p 79, 6.5i]: can't break line
  troff: <standard input>:26366: warning [p 132, 17.2i]: can't break line
  troff: <standard input>:38313: warning [p 174, 27.5i]: can't break line
  33160

No idea what it means.


oh sweet summer child


There are a number of issues with this program. First, there are a large number of bugs and general implementation issues. For example, "vdx --help" does not print help output, but instead raises "Need a glob pattern for input files". Invalid options are ignored, and debug output is non-existent: "vdx myfile.mp4 --asdf garbage" results in "error build/myfile.garbage" with no other explanation. It turns out that it is interpreted as "vdx myfile.mp4 -f garbage". The list goes on and on.

More fundamentally, however, FFmpeg supports both simple and complex operations. This program takes simple operations and does them wrong, and takes complex operations and throws them out. For example, the command to halve the audio volume of an mp4 file containing audio and video would be:

  ffmpeg -i infile.mp4 -c:v copy -af volume=.5 outfile.mp4

There are a number of problems with this command. Most importantly, it always encodes audio output with the built-in AAC encoder at 128 kbps. A better command would take into account the input quality and desired output quality and codec, then specify something like "-c:a libopus -b:a 80k" to select the desired result. vdx solves this complexity by simply ignoring it and using the FFmpeg default. A similar problem applies to its video codec settings, which always use libx264 with the default settings. This is a poor choice. Almost always, either a slow setting (for better quality and smaller file size) or a fast setting (for worse quality and larger file size) is desired. Also, in my experience, either a lower CRF (for high-quality film content) or a higher CRF (for low-quality phone/ripped content) is desired. If you're going to punt on doing it properly and just use the defaults, you might as well just use ffmpeg directly.

The issues go far deeper than that, however. The entire design of the program is flawed. It appears to try to process the file twice: once for video, and once for audio. However, the author seems not to know that ffmpeg will re-encode video by default. Therefore, a simple command like "vdx myfile.mp4 --volume 0.5" unnecessarily re-encodes the entire video, twice. It applies operations in what seems to be undefined order, which is a problem when (for example) cropping and also resizing a video.
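
To make the "copy the video, re-encode only the audio, in one invocation" point concrete, here is a rough Python sketch that only builds the argument list for a single ffmpeg run; it does not execute anything. The function name and its defaults are hypothetical, and the codec and bitrate are simply the values quoted above, not a recommendation. The flags themselves (-c:v copy, -af volume=, -c:a, -b:a) are the ones already mentioned in this comment.

```python
from typing import List


def build_volume_command(
    infile: str,
    outfile: str,
    volume: float,
    audio_codec: str = "libopus",  # assumption: value quoted in the comment above
    audio_bitrate: str = "80k",    # assumption: value quoted in the comment above
) -> List[str]:
    """Build one ffmpeg invocation that copies video and re-encodes only audio."""
    return [
        "ffmpeg",
        "-i", infile,
        "-c:v", "copy",             # pass the video stream through untouched
        "-af", f"volume={volume}",  # audio filter: scale the volume
        "-c:a", audio_codec,        # explicit audio encoder instead of the default
        "-b:a", audio_bitrate,      # explicit audio bitrate
        outfile,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(build_volume_command("infile.mp4", "outfile.mp4", 0.5)))
```

Whether a given codec actually fits the output container is still something you would have to check for your own files.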


Could you file this as an issue on GitHub so others can see it? I would hate to see other projects start using this.

That being said, I need to dig into this library some more before I can properly review it.


I have used FFmpeg for many years, and I have often wondered why there isn't a CLI that is simpler to use but still useful for most tasks.

To my knowledge, no project has succeeded in doing that. It's not difficult to code a simple reader and a basic encoder, but then you have many cases that are not supported correctly, as you wrote.


IMO, the main problem with ffmpeg is that the official documentation is overly intimidating to newcomers. While the command syntax isn't great, it's not terrible either. One contributing factor is probably overcomplicated, poorly written internet tutorials copying out-of-date technical memes, like how people complain about tar -zxcvbnwhatever despite tar -xf file covering 99.9% of decompression cases.
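
The same "let the tool detect the compression" idea exists in Python's standard library, for what it's worth. A small sketch, assuming you trust the archive you are extracting (the function name is made up for illustration):

```python
import tarfile
from typing import List


def extract_archive(archive_path: str, dest_dir: str) -> List[str]:
    """Extract a tar archive, letting tarfile detect the compression itself."""
    # Mode "r:*" opens for reading with transparent compression detection
    # (gzip, bzip2, xz), roughly what `tar -xf file` does on the command line.
    with tarfile.open(archive_path, "r:*") as archive:
        names = archive.getnames()
        # Only do this with archives you trust: extractall writes whatever
        # paths the archive contains.
        archive.extractall(dest_dir)
    return names
```

No compression flag is needed on the reading side, just like with tar -xf.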


What do you expect from a programmer who chooses JavaScript as the language of a cli tool?


It's sad that Basic isn't so popular anymore and people are forced to use JavaScript to work out their inner Dijkstra.

on edit: somehow I posted this twice. but deleted one.


This is great! I'd love to see more wrappers like these for ffmpeg. Despite 5 years of using it, I can rarely run any ffmpeg commands without looking at StackOverflow. I wrote a similarly-inspired wrapper because of this as well, with a different subset of features, some of which rely on moviepy: https://github.com/achalddave/vid

Would be great if vdx learned more "common" ffmpeg features (creating a slideshow from a set of images, speeding up videos, simple drawing, etc.) while maintaining its simplicity!


There's the rub isn't it. There's a reason ffmpeg is so hard to use: it does so much. It does so much because efficient processing of media often requires you to do more than one thing in a single pass, or end up taking wayyy more resources than needed.

I'm still all for simpler frontends being available, but there's going to be a limit to how many things can be covered before it is just as hard to use as ffmpeg.
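
As an illustration of "more than one thing in a single pass", here is a hedged Python sketch that builds (but does not run) one ffmpeg command doing a crop, a resize and a volume change together. The helper name and the example numbers are hypothetical; crop, scale and volume are standard ffmpeg filters, and filters in a comma-separated chain are applied left to right, so the order is explicit.

```python
from typing import List, Tuple


def build_single_pass_command(
    infile: str,
    outfile: str,
    crop: Tuple[int, int, int, int],  # (width, height, x, y) of the crop window
    scale: Tuple[int, int],           # (width, height) after resizing
    volume: float,
) -> List[str]:
    """Build one ffmpeg invocation that crops, resizes and adjusts volume."""
    crop_w, crop_h, crop_x, crop_y = crop
    scale_w, scale_h = scale
    # Filters in a chain run left to right: crop first, then scale the result.
    video_filters = (
        f"crop={crop_w}:{crop_h}:{crop_x}:{crop_y},"
        f"scale={scale_w}:{scale_h}"
    )
    return [
        "ffmpeg",
        "-i", infile,
        "-vf", video_filters,       # one video filter chain, one encode
        "-af", f"volume={volume}",  # audio handled in the same run
        outfile,
    ]


if __name__ == "__main__":
    print(" ".join(build_single_pass_command(
        "in.mp4", "out.mp4", (640, 360, 0, 0), (320, 180), 0.5)))
```

Every option a frontend exposes this way is one more thing its users have to understand, which is the trade-off described above.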


There's Handbrake-CLI which is easy to use, but very powerful nonetheless https://handbrake.fr/docs/en/latest/cli/command-line-referen...


Does handbrake use ffmpeg?


Both ffmpeg and handbrake are front-ends for libavcodec.


The FFmpeg project has 8 libraries. libavcodec only deals with decoding/encoding and bitstream parsing & filtering. Demuxing/muxing, frame filtering, scaling, etc. are all handled by other, separate libraries.


Handbrake's own docs say "HandBrake uses FFmpeg under the hood".

https://handbrake.fr/docs/en/1.3.0/technical/source-formats....


Yes, it does.


I'm wondering why ffmpeg is more popular than other media libraries like gstreamer.

I'm currently learning gstreamer and it seems to cover more features than ffmpeg. Its CLI command, gst-launch, is fantastic for prototyping. Is there a specific reason why more people choose ffmpeg over gstreamer as their go-to media processing library?


My experience with gstreamer consists almost entirely of battling with the system's package manager trying to install whichever codec packs I need and then convincing a gstreamer-powered media player to recognize said plugins and actually play my media. This process ended in frustration more often than it should've.

By contrast ffmpeg-powered projects tend to just work out of the box with practically any set of files imaginable… much more pleasant.


Wow this happened to me literally today.


It's what I know and it's never let me down. It powers other software I like, particularly mpv. I have very little experience with gstreamer and don't presently see a need to learn it when ffmpeg already has my needs filled.


Can you give a couple of examples of non-trivial tasks that you can do in gstreamer but not ffmpeg?



When gstreamer first came out it was hilariously buggy. Constant crashes. Maybe it's improved but I bet that put a lot of people off.

Plus, does gstreamer even have a CLI targeted at users rather than developers? If it does, they don't advertise it very well!


Gstreamer is great for multimedia pipelines and more complex tasks. But ffmpeg is better for simpler tasks.


Can gstreamer apply LUT as a filter?


These are, in majority, ffmpeg abstractions. WHY in tarnation would you need Node.js as a basic dependency?


I love the concept. But seriously, Node.js for an FFMPEG wrapper? Really?


Maybe it was made to scratch an itch and the developer is most comfortable in JS. Why judge someone else who made something and shared it without asking for anything in return?


Why judge someone who sh*t in the middle of a park without asking for anything in return?


This is a bs analogy.

You can just ignore this if you don't care for how it's built. This kind of comment is just gatekeeping and makes people less likely to listen to you.

If you think there's a better way and you want people to get there, give constructive feedback. That's the only valid option.


Given that it’s a wrapper to make ffmpeg more user friendly, I don’t see the problem.

npm has a lot of tools that make creating CLI tools a breeze. For example there’s Listr and the more generic Ink (“React for CLI”)


Here are the node dependencies. Am I ignorant to assume these make CLI interaction more developer friendly?

    "execa": "^4.0.3",
    "glob": "^7.1.6",
    "mkdirp": "^1.0.4",
    "moment": "^2.29.1",
    "nopt": "^5.0.0",
    "npmlog": "^4.1.2",
    "nyc": "^15.1.0",
    "p-all": "^3.0.0",
    "rimraf": "^3.0.2",
    "tape": "^5.0.1",
    "unique-slug": "^2.0.2",
    "which": "^2.0.2"

Sorry for any bastardized formatting. OM



hehehe this comment made me chuckle. Thanks for being kind and sharing. It’s just that’s the most obvious one


If you’re okay to run things in the cloud, https://transloadit.com made an abstraction over FFmpeg that chunks files up and encodes them in parallel on many machines to speed things up (“turbo mode”). 5GB free each month. Disclosure: I’m a founder :)


How did you handle chunks with audio? Every time I try that the audio and video desync.


That was a hard part yes, an FFmpeg core dev works for us and figured it out. I don’t know exactly how but I can look it up and blog about it.


I suspect it's a timestamp flag or using something developed for HLS. I'd definitely appreciate the write up!


Slightly off topic, but are there easy-to-use GUI wrappers for FFmpeg you would recommend? I find myself using FFmpeg about five times a year, and each time I have to google what I want to do. I feel a good GUI application would be extremely helpful.



ff-works is pretty fantastic in my experience. A decent GUI with tons of the options exposed plus a place to add additional terminal commands if needed. But as simple as permute if you don’t wanna dig.


Definitely Handbrake!


This is a great idea. FFMPEG is as arcane as it is amazing. This should have been written in a compiled language, however.


I wonder if the author would consider something like pkg[0] to make their node program into a single executable

[0]: https://github.com/vercel/pkg


can i process each frame of the video as a picture (so i can filter the frames by whether they have a particular logo in them, and make a video of just those frames)



