Can you elaborate on what it means to remove the chunks of silence? Aren't there valid cases where there's no sound but you're actually showing/doing something in the video?
Yep, there are! This is a harder problem to solve, but I have plans to handle it down the road.
For now, I’m targeting it at folks who make videos in the “egghead style” [0] - short, tightly-edited code screencasts where there’s very little dead air.
I'd think it wouldn't be too hard to use something like OpenCV to detect frames that show little or no change between them, and then correlate that with the audio detection to figure out what can be safely culled.
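Here's a rough sketch of that idea, not anything the tool actually does. The function names, the thresholds, and the assumption that you already have one loudness value (e.g. RMS) per video frame are all made up for illustration. The OpenCV part just takes the mean absolute difference between consecutive grayscale frames; the second function keeps only the stretches where both the picture and the audio stay quiet.

```python
def frame_change_scores(video_path):
    """Mean absolute grayscale difference between consecutive frames.

    Requires opencv-python; imported lazily so the rest of this sketch
    runs without it. Returns one score per frame pair (frames - 1 values).
    """
    import cv2

    cap = cv2.VideoCapture(video_path)
    scores, prev = [], None
    while True:
        ok, frame = cap.read()
        if not ok:  # end of stream or read failure
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev is not None:
            scores.append(float(cv2.absdiff(gray, prev).mean()))
        prev = gray
    cap.release()
    return scores


def cullable_ranges(motion, loudness, motion_thresh=1.0, loud_thresh=0.01, min_len=15):
    """Find index ranges (start, end), end exclusive, that are both still and silent.

    motion and loudness are assumed to be aligned per-frame sequences.
    The thresholds and min_len are hypothetical and would need tuning.
    """
    ranges, start = [], None
    for i, (m, a) in enumerate(zip(motion, loudness)):
        still_and_silent = m < motion_thresh and a < loud_thresh
        if still_and_silent and start is None:
            start = i  # a candidate range begins here
        elif not still_and_silent and start is not None:
            if i - start >= min_len:
                ranges.append((start, i))
            start = None
    # Close a range that runs to the end of the shorter input.
    n = min(len(motion), len(loudness))
    if start is not None and n - start >= min_len:
        ranges.append((start, n))
    return ranges
```

You'd feed it something like `cullable_ranges(frame_change_scores("talk.mp4"), per_frame_rms)`, where `per_frame_rms` comes from whatever silence detection is already in place. Silent stretches where the screen is still changing (typing, scrolling) fail the motion check and are left alone.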