Basically, you create a video, dump the video's frames with ffmpeg, run each frame through the RNN, and stitch them back together. It took me several hours to produce just ten seconds of video. Unfortunately, unless you have a Titan X GPU, the maximum size of each image is quite small (certainly less than 1080p), which may be why the frames in this video are split into four quadrants.
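For anyone curious, the dump -> process -> stitch loop can be sketched roughly like this. The filenames, the 30 fps rate, and the per-frame `stylize` step are all placeholders here, not anything from the original post; the model itself is whatever network you're running per frame.

```python
import subprocess
from pathlib import Path

def extract_cmd(video, out_dir):
    # ffmpeg command to dump every frame as a numbered PNG
    return ["ffmpeg", "-i", video, str(Path(out_dir) / "%05d.png")]

def assemble_cmd(in_dir, video, fps=30):
    # ffmpeg command to stitch processed frames back into a video
    return ["ffmpeg", "-framerate", str(fps), "-i",
            str(Path(in_dir) / "%05d.png"),
            "-pix_fmt", "yuv420p", video]

if __name__ == "__main__":
    subprocess.run(extract_cmd("input.mp4", "frames"), check=True)
    for frame in sorted(Path("frames").glob("*.png")):
        pass  # run each frame through the network here (model not shown)
    subprocess.run(assemble_cmd("frames", "styled.mp4"), check=True)
```

The `-pix_fmt yuv420p` flag just keeps the output playable in most players; the slow part is the per-frame network pass, not ffmpeg.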
Nice result! About 5 years ago, I used the same frame dump -> process -> join technique, but with a custom algorithm instead of a NN for the processing stage. Tweaking the filter took more manual effort, but it was fun and came out nicely.
I got a new GPU recently and have mostly been doing text RNNs these days, but it's fun to look back on this occasionally:
"My iMac has 2GB of GPU memory, since it's a home-use machine. That isn't enough for larger image output. It can produce a somewhat larger result when the display resolution is set down to 640x480, but it still needed to split images into four parts for 720p high-quality video. A side effect is that each frame has a border at the joints between the parts."
You seem to be correct about why the frames are split into four quadrants.
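The split-into-quadrants workaround, and why it leaves visible seams, can be illustrated with a toy sketch. The tiling/rejoining is the real idea; `fake_stylize` is just a stand-in for the network pass, chosen because any operation that looks at whole-tile statistics will disagree across tile boundaries:

```python
import numpy as np

def split_quadrants(img):
    """Split an H x W image into four quadrants (even dims assumed)."""
    h, w = img.shape[0] // 2, img.shape[1] // 2
    return [img[:h, :w], img[:h, w:], img[h:, :w], img[h:, w:]]

def join_quadrants(tl, tr, bl, br):
    # Rejoin the four tiles into one image
    return np.vstack([np.hstack([tl, tr]), np.hstack([bl, br])])

def fake_stylize(tile):
    # Stand-in for the per-tile network pass: per-tile normalization.
    # Each quadrant is processed independently, so values jump at the
    # joins -- the border artifact described above.
    return (tile - tile.mean()) / (tile.std() + 1e-8)

img = np.arange(16, dtype=float).reshape(4, 4)
out = join_quadrants(*[fake_stylize(t) for t in split_quadrants(img)])
```

Overlapping the tiles and blending the overlap region is the usual way to hide those seams, at the cost of some redundant computation.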
http://i.imgur.com/rb0GJvQ.gifv