Hacker News
Large-Scale Artificial Intelligence Open Network (laion.ai)
74 points by btdmaster on April 24, 2022 | hide | past | favorite | 9 comments



Projects like this are inevitable and necessary; 'OpenAI' makes such a mockery of its name that it's an invitation to others to try to build an alternative that is actually open.


Maybe that's their secret game - tease us to death, to make us follow along the path and contribute. The waiting period for Dall-E 2 is hard to bear, maybe we'll invent something better before they release it.

I mean, diffusion models became the best generative image models in exactly this fashion after Dall-E 1, and now Dall-E 2 has adopted diffusion in place of auto-regressive image generation.

In the last few years OpenAI took the AI crown with GPT-2, GPT-3, CLIP, and Dall-E. They rarely shared, but they kick-started everyone else into replication mode.


Yes, but there's an issue with the name, with the way 'Open' gets bludgeoned.

Affronts against language, and against semantics in the broadest sense, are getting Orwellian, possibly sarcastic.

Yesterday I found myself in front of a website whose splash screen read: "If you want to visit this website anonymously, please register".


There is a recent Yannic Kilcher interview about LAION.

> LAION-5B: 5 billion image-text-pairs dataset (with the authors)

https://www.youtube.com/watch?v=AIOE1l1W0Tw

A nice recent result (from DeepMind) is that you can trade dataset size against model size: make the dataset 4x larger or the network 4x larger and get a comparable result. So a large dataset can yield a smaller, more efficient model, which in turn is easier to distribute and use.
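The tradeoff can be sketched with a Chinchilla-style parametric loss, L(N, D) = E + A/N^α + B/D^β: growing either the parameter count N or the token count D lowers the predicted loss, so extra data can buy back capability you would otherwise need a bigger network for. The constants below are the fitted values reported in the DeepMind paper; treat the exact numbers as illustrative, not authoritative.

```python
# Chinchilla-style parametric pretraining loss: L(N, D) = E + A/N**a + B/D**b.
# Constants are the fitted values from the DeepMind scaling analysis
# (illustrative only; the real fit has uncertainty).
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28

def loss(n_params: float, n_tokens: float) -> float:
    """Predicted loss for a model with n_params parameters trained on n_tokens tokens."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA

# Either 4x more data or 4x more parameters improves a fixed baseline:
baseline = loss(1e9, 1e12)
more_data = loss(1e9, 4e12)
more_params = loss(4e9, 1e12)
```

Both `more_data` and `more_params` come out below `baseline`, which is the point the comment is making: a bigger open dataset is one lever for matching a bigger closed model.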

https://www.deepmind.com/publications/an-empirical-analysis-...


Their marketing is so bad. Terrible website, they present themselves first by opposing OpenAI, and they name their datasets the way established orgs name their models. Their only project is a non-curated filtering of already open-source data using CLIP (they just looped over it and dropped the image-text pairs with cosine similarity below 0.3).
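For the curious, that filtering step is easy to sketch. In LAION's real pipeline the embeddings come from a CLIP model run over Common Crawl image/caption pairs; in this minimal illustration the embeddings and metadata are placeholder vectors, and only the 0.3 cutoff is from the comment above.

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.3  # the cutoff mentioned in the comment

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between an image embedding and a text embedding."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def filter_pairs(pairs):
    """Keep metadata only for pairs whose image and caption embeddings agree."""
    return [meta for img_emb, txt_emb, meta in pairs
            if cosine_similarity(img_emb, txt_emb) >= SIMILARITY_THRESHOLD]

# Placeholder embeddings: the first pair is well aligned, the second is not.
aligned = (np.array([1.0, 0.0]), np.array([0.9, 0.1]), "cat photo / 'a cat'")
mismatched = (np.array([1.0, 0.0]), np.array([0.0, 1.0]), "cat photo / 'a chart'")
kept = filter_pairs([aligned, mismatched])  # only the aligned pair survives
```

Whether a single threshold on CLIP similarity counts as "curation" is exactly what's being debated here; mechanically, though, this is all the filter does.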


Let me start by saying that LAION is a non-profit, open to anyone who wants to contribute.

Agreed about the website CSS. Do you want to contribute?

What's the problem with the dataset name exactly? Seems to work pretty well.

Yes, the dataset is an extract of Common Crawl; this is a method accessible to anyone for producing a valuable dataset. That is unlike supervised datasets, which are reserved for organizations with millions of dollars to spend on annotation, and which do not scale.

Non-annotated datasets are the basis of self-supervised learning, which is the future of machine learning. Image/text pairs with no human labels are a feature, not a bug. We provide safety tags for safety concerns and watermark tags to improve generations.

It also happens that this dataset collection method has been proven: LAION-400M was used to reproduce the CLIP model (and a bunch of other models have been trained on it).


Hmm, some portions of their data sets seem to be under non-commercial licenses.


Does this organization work with the EleutherAI team?


Yeah, we have regular contact with many EAI members. EleutherAI is pretty great at open research.



