Implementing the Goodfellow GANs paper

3abiton · 2024-06-05T21:59:35 1717624775

This is a blast from the past, I still remember the StyleGAN demos and how cool it was for its time. https://www.youtube.com/watch?v=Ps7bmdxy0Xc

Two_hands · 2024-06-06T07:58:32 1717660712

Right, even though the paper is almost 10 years old I still found it fascinating. I hope you enjoyed the post!

toxik · 2024-06-06T09:22:55 1717665775

        # shuffle the combined batch to prevent the model from learning order
        indices = torch.randperm(combined_images.size(0))
        combined_images = combined_images[indices]
        combined_labels = combined_labels[indices]

You don’t need to do this

Two_hands · 2024-06-06T09:29:46 1717666186

Is it better to train without the shuffling or shuffling has negligible effects?

Doxin · 2024-06-06T12:39:35 1717677575

I'd assume there's no real state the network can "remember" between iterations, so shuffling will at best just waste time.

Two_hands · 2024-06-06T12:49:08 1717678148

My thoughts had been related to the ordering, but it makes sense that it doesn’t matter. I have read that it is actually better to train the model in separate batches with generated and real images in their own batches before the gradient step.

countvonbalzac · 2024-06-05T22:47:08 1717627628

Are GANs useful for synthetic data generation for transformer based models?

rgovostes · 2024-06-05T23:46:06 1717631166

Probably. Apple published a paper back in 2017 about improving synthetic data for the purposes of training models (though not transformers).

The examples they give are for eye and hand tracking -- which not coincidentally are used for navigating the Apple Vision Pro user interface.

https://machinelearning.apple.com/research/gan

Two_hands · 2024-06-06T08:08:23 1717661303

It'd be cool to run some tests where you train a model with data and then supplement the training data with generated stuff.

HanClinto · 2024-06-06T18:26:25 1717698385

Yes, the concept is still powerful and in use today.

As I understand the RLHF method of training LLMs, this involves the creation of an internal "reward model" which is a secondary model that is trained to try to predict the score of an arbitrary generation. This feels very analogous to the "discriminator" half of a GAN, because they both critique the generation created by the other half of the network, and this score is fed back in to train the primary network through positive and negative rewards.

I'm sure it's an oversimplification, but RLHF feels like GANs applied to the newest generation of LLMs -- but I rarely hear people talk about it in these terms.

Two_hands · 2024-06-06T07:57:05 1717660625

I think diffusion models are useful too, I’m currently working on a project to use them to generate medical type data. It seems they'd both be useful as they are both targeted towards generation of data, especially in areas where data is hard to come by. Doing this blog made me wonder of the application in finance too.

HanClinto · 2024-06-06T18:27:29 1717698449

I agree -- I would love to see diffusion models applied to more types of data. I would love to see more experiments done with text generation using a diffusion model, because it would have an easier time looking at the "whole text" rather than the myopia that can occur from simple next-token prediction.

GaggiX · 2024-06-06T09:57:16 1717667836

Adversarial loss is used in many cases like when training a VAE, and a VAE can use a transformer architecture.

eru · 2024-06-06T08:46:27 1717663587

Compare https://gwern.net/gan

HanClinto · 2024-06-06T18:28:23 1717698503

Great writeup, thank you! Nicely done!

Two_hands · 2024-06-07T00:42:02 1717720922

Thank you, I appreciate the kind comments!

nothrowaways · 2024-06-05T23:24:15 1717629855

Two_hands · 2024-06-06T07:58:41 1717660721

Thank you