Books3 dataset
700M text-image pairs from LAION-2B-en, filtered to keep only images with a resolution of at least 256 (see the resolution-filter sketch after this list)
400M text-image pairs from COYO-700M, filtered to keep only images with a resolution of at least 256
10M text-video pairs from WebVid10M
3M text-video pairs from a subset of InternVid10M
73K text-video chat pairs from Valley-Instruct-73K
100K text-video chat pairs from Video-ChatGPT
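For concreteness, the resolution filter mentioned in the first two items amounts to something like the following. This is a minimal sketch, not the authors' code: the names `MIN_RESOLUTION`, `keep`, and `filter_pairs` are illustrative, and the list doesn't say whether "at least 256 resolution" means the shorter edge or both edges, so the shorter-edge reading is an assumption.

```python
from PIL import Image

MIN_RESOLUTION = 256  # threshold quoted in the dataset list


def keep(image: Image.Image) -> bool:
    # Assumption: "at least 256 resolution" means the shorter edge
    # (min of width and height) is >= 256; the list doesn't specify.
    return min(image.size) >= MIN_RESOLUTION


def filter_pairs(pairs):
    """Yield only the (image, caption) pairs whose image passes the filter."""
    for image, caption in pairs:
        if keep(image):
            yield image, caption
```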
wouldn’t that have been a cleaner explanation than the sentence provided? books and videos, see model card. the redundant language is a smell, whether the emitter wishes to acknowledge it or not. the point still stands: the hot mess of a sentence didn’t need to be that way.
> …so petty and pedantic…
if nothing else, think of the language models that need to digest this. sure, you can send in gobbledygook and get out something plausibly sensible, but why?
llms will push pedantry to the forefront. or suffer from it. who knows. have fun.
You've never decided to rewrite a sentence and forgotten to check the entire sentence again after an incomplete refactoring? I'd say you're in the minority. This is a v1 draft on arXiv; I don't expect the final paper to have that sentence.