I don’t see this in the article. Has Anthropic explained the mechanism by which they were able to cost-effectively expand the context window, and whether additional training or a design decision (e.g. an alternative positional embedding approach) helped the model handle a larger window?
Interesting. Does the decision to use ALiBi have to be made before the model weights are first trained, or could these models have incorporated ALiBi, instead of or in addition to an alternative positional encoding method, after they were first trained?
The decision needs to be made before training starts. Maybe there is a clever way to add it after the fact, in the style of LoRA? First, that would be a different method in its own right (just as LoRA is); second, I can't see how to do it easily. But then again, I only thought about it for a minute.
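To see why it's hard to bolt on afterwards: ALiBi is not a learned module you can fine-tune in, it's a fixed bias added to the attention logits, and the pretrained weights have already adapted to whatever positional scheme was used. Here's a minimal sketch of the bias itself (my own illustration of the published formula, not any particular model's code):

```python
import numpy as np

def alibi_slopes(n_heads: int) -> np.ndarray:
    # Head-specific slopes from the ALiBi paper: for n_heads a power of
    # two, a geometric sequence starting at 2^(-8/n_heads).
    start = 2.0 ** (-8.0 / n_heads)
    return np.array([start ** (i + 1) for i in range(n_heads)])

def alibi_bias(n_heads: int, seq_len: int) -> np.ndarray:
    # Static bias added to attention logits before softmax:
    # bias[h, i, j] = -slope[h] * (i - j), so each head penalizes
    # attention to distant past tokens linearly (future positions are
    # handled by the usual causal mask). Nothing here is learned, which
    # is why it is normally baked in before pre-training begins.
    pos = np.arange(seq_len)
    rel = pos[None, :] - pos[:, None]              # rel[i, j] = j - i
    slopes = alibi_slopes(n_heads)
    return slopes[:, None, None] * rel.astype(float)  # (heads, q, k)

# Usage: logits = q @ k.T / sqrt(d) + alibi_bias(n_heads, seq_len)[h],
# then softmax as usual.
bias = alibi_bias(8, 16)
```

Since the bias has no trainable parameters, "adding ALiBi later" really means retraining the attention layers under a different logit landscape, which is the part I don't see a cheap shortcut for.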