ChatGPT needs a language model and a selection model. The language model is a predictive model that, given a state, produces a probability distribution over the next token; the selection model decides which token to actually emit (greedy, sampling, etc.). For ChatGPT the language model is a decoder model (i.e. an auto-regressive / causal transformer). The state for the language model is a fixed-length window of tokens.
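A minimal sketch of that split, with a toy stand-in for the language model (the real thing is a decoder-only transformer; the uniform toy vocabulary and next_token_distribution here are placeholders, not how ChatGPT computes its distribution):

    import random

    VOCAB = ["the", "cat", "sat", "mat", "."]            # toy vocabulary (placeholder)

    def next_token_distribution(window):
        """Language model: maps a fixed-length window of tokens to a
        probability distribution over the vocabulary. Toy stand-in only."""
        return {tok: 1.0 / len(VOCAB) for tok in VOCAB}

    def select_token(dist, temperature=1.0):
        """Selection model: sample one token from the predicted distribution."""
        tokens = list(dist)
        weights = [p ** (1.0 / temperature) for p in dist.values()]
        return random.choices(tokens, weights=weights, k=1)[0]

    def generate(prompt, steps, window_size=4096):
        tokens = list(prompt)
        for _ in range(steps):
            window = tokens[-window_size:]                # state = fixed-length window
            tokens.append(select_token(next_token_distribution(window)))
        return tokens

    print(" ".join(generate(["the", "cat"], steps=10)))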
For a Markov chain, you need to define what "state" means. In the simplest case you have a unigram where each next token is completely independent of all previously seen tokens. You can have a bi-gram model, where the next state is dependent on the last token, or an n-gram model that uses the last N-1 tokens.
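For concreteness, a toy chain with the state defined as the last N-1 tokens might look like this (a rough sketch assuming a pre-tokenized corpus and raw counts, nothing more):

    from collections import Counter, defaultdict
    import random

    def build_chain(tokens, order=2):
        """Map each state (tuple of `order` consecutive tokens) to counts of the next token."""
        chain = defaultdict(Counter)
        for i in range(len(tokens) - order):
            state = tuple(tokens[i:i + order])
            chain[state][tokens[i + order]] += 1
        return chain

    def step(chain, state):
        """Sample the next token from the empirical distribution stored for `state`."""
        counts = chain[state]
        return random.choices(list(counts), weights=list(counts.values()), k=1)[0]

    corpus = "the cat sat on the mat and the cat ate".split()
    chain = build_chain(corpus, order=2)
    print(step(chain, ("the", "cat")))   # 'sat' or 'ate', proportional to counts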
The problem with building a Markov chain with an n-token state is that it simply doesn't generalize at all.
The chain may also be missing states, in which case it can't produce a probability distribution at all. E.g. since we use a fixed window for the state, the training data might contain a state "AA" that transitions to "B", i.e. the sentence "AAB". The model keeps generating, though, so we need the new state, which is "AB". If "AB" never appears in the dataset, well... tough luck, you have to improvise on how to deal with it. Workarounds exist (e.g. backing off to a shorter context, or smoothing), but they come nowhere near the performance of even a basic RNN, let alone LSTMs and transformers.
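To make that failure mode concrete, here is the "AAB" example as a toy two-token-state chain:

    # Toy illustration of the missing-state problem described above.
    # The training data only ever contained "AAB", so the only known state is ("A", "A").
    chain = {("A", "A"): {"B": 1}}

    state = ("A", "A")
    nxt = "B"                    # the chain happily emits "B"
    state = (state[1], nxt)      # the new state is ("A", "B")
    print(state in chain)        # False: no distribution exists for this state,
                                 # so generation stalls unless we back off or smooth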
As a mathematical model, it's almost completely unhelpful, like saying that all computers are technically state machines because they have a finite amount of memory.
Treating every combination of 4k tokens as a separate state with independent probabilities is useless for making probability estimates.
Better to say that it's a stateless function that computes probabilities for the next token and leave Markov out of it.
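Some back-of-the-envelope numbers for why that framing breaks down: with a vocabulary of, say, 50,000 tokens (a rough GPT-scale figure, assumed here purely for illustration), the number of distinct 4,096-token windows, if each were its own state, is 50,000^4,096:

    import math

    vocab_size, window = 50_000, 4_096
    # log10 of the number of distinct windows if every window were a separate state
    print(window * math.log10(vocab_size))   # ~19,247, i.e. about 10^19,247 states

No corpus visits more than a vanishingly small fraction of those states even once, which is exactly why treating them as independent is hopeless.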
ChatGPT and Markov chains are both text-generating models, but they use different approaches and technologies. A Markov chain generates text based on probabilities of word sequences in a given text corpus, while ChatGPT is a neural-network-based model.
Compared to a Markov chain, ChatGPT is more advanced and capable of producing more coherent and contextually relevant text. It has a better grasp of language structure, grammar, and meaning, and can generate longer and more complex texts.
RLHF does have a Markov model at its backbone, at least in theory: the RL part is formulated as a Markov decision process (though the deep-NN function approximation inside may wash out any practical effect of that Markov assumption).
It's not a Markov chain because by definition a Markov chain only looks at the previous word. ChatGPT looks at a long sequence of previous words. But the general idea is still broadly the same.
That's not correct. In a Markov chain, the current state is a sufficient statistic for the future. For all intents and purposes, you can define the state to include a sufficiently long history, so it does look at a long sequence of words.
Also fair, but then the "current" state would also be a long window/sequence. Maybe that interpretation is valid if you look at the activations inside the network, but I wouldn't know about that.
Yes, the state for both is a long window / sequence. Under this view, the transformer doesn't need to recompute anything for the previous tokens: due to the causal nature of the model, the tokens at [0, ..., N-1] are oblivious to token N. For token N we can reuse the previous computations, since they do not change.
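That reuse is exactly what a KV cache does in practice. A stripped-down, single-head sketch (toy numpy with arbitrary dimensions, not production attention code):

    import numpy as np

    d = 8                                  # toy head dimension
    rng = np.random.default_rng(0)
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

    k_cache, v_cache = [], []              # keys/values of earlier tokens never change

    def attend(x_new):
        """Process one new token embedding, reusing the cached K/V of earlier tokens."""
        q = x_new @ Wq                     # only the new token's query is needed
        k_cache.append(x_new @ Wk)
        v_cache.append(x_new @ Wv)
        K, V = np.stack(k_cache), np.stack(v_cache)
        scores = K @ q / np.sqrt(d)
        weights = np.exp(scores - scores.max())
        weights /= weights.sum()
        return weights @ V                 # attention output for the new token only

    for _ in range(5):                     # feed tokens one at a time
        out = attend(rng.normal(size=d))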
--
Is ChatGPT just an improved Markov Chain?