A large array of uniquely-set floating point values. (AKA "parameters".)
In a language model, a word goes in one end (as a numerical index into a wordlist), it gets multiplied through the weights, and a new word comes out the other end (again as an index).
Numbers in, numbers out, and a small bit of logic that maps words to numbers and back at either end. ("Encodings".)
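To make the "encodings" bit concrete, here's a toy sketch in Python (the wordlist and function names are my own, not from any real tokenizer): words become indices on the way in, and indices become words on the way out.

    vocab = ["the", "cat", "sat", "on", "mat"]
    word_to_index = {word: i for i, word in enumerate(vocab)}

    def encode(words):
        return [word_to_index[w] for w in words]

    def decode(indices):
        return [vocab[i] for i in indices]

    print(encode(["the", "cat", "sat"]))  # [0, 1, 2] -- numbers go into the model
    print(decode([3, 0, 4]))              # ['on', 'the', 'mat'] -- numbers come back out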
"Training" is the typically expensive process of feeding huge amounts of data into the model, to get it to choose the magic values for its weights that allow it to do useful stuff that looks and feels like that training data.
Something else that can be done with weights is that they can be "fine-tuned": tweaked slightly so the model gives different overall results, tailoring it to some new use case. Often the model gets a new name afterwards.
In this case, what's been released is not actually the weights. It's a set of these tweaks ("deltas"), which are intended to be added to Meta's LLaMA model weights to end up with the final intended LLaMA-based model, called "Vicuna".
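If you're curious what "adding deltas" means in practice, here's a minimal sketch, assuming both files are just dictionaries of same-shaped tensors (the filenames and layout are hypothetical; the actual Vicuna release ships its own conversion tooling):

    import torch

    base_weights = torch.load("llama-13b.pt")    # Meta's original weights (hypothetical filename)
    deltas = torch.load("vicuna-13b-delta.pt")   # the released "tweaks" (hypothetical filename)

    # Element-wise add: base + delta = the fine-tuned model
    vicuna_weights = {name: base_weights[name] + deltas[name] for name in base_weights}

    torch.save(vicuna_weights, "vicuna-13b.pt")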
Essentially a computer neural network is just a lot of multiplication and addition of floating point numbers (organized as matrix multiplications). The parameters are the "strength" or "weights" of the connections between neurons on different layers, plus the "bias" of each neuron. If neuron Alice is connected to neuron Bob, Alice has a value of 0.7, and the weight of Alice's connection to Bob is 0.5, then the value sent from Alice to Bob is 0.35. This value (and the values from all the other incoming connections) is summed and added to Bob's bias, which may be negative.
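Written out as (entirely made-up) Python numbers, the Alice-to-Bob arithmetic looks like this:

    alice_value = 0.7
    alice_to_bob_weight = 0.5
    signal = alice_value * alice_to_bob_weight   # 0.35, the value Alice sends to Bob

    # Bob sums every incoming value, adds his bias, then applies an activation.
    other_incoming = [0.1, -0.2]                 # made-up values from other neurons
    bob_bias = -0.05
    bob_value = max(0.0, signal + sum(other_incoming) + bob_bias)   # ReLU(0.2) = 0.2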
I highly recommend checking out 3blue1brown's series on how neural nets, gradient descent, and the dot product (implemented as a matrix multiplication) all tie together: https://www.youtube.com/watch?v=aircAruvnKk
To add to this excellent reply, I'll also point out that the reason folks want the weights is that they are the result of a massive search operation, akin to finding the right temperature to bake a cake from all possible floats. It takes a lot of wall-clock time, a lot of GPU energy, and a lot of input examples and counter-examples to find the "right" numbers. Thus, it really is better -- all things being equal -- to publish the results of that search so that everyone else doesn't have to repeat it for themselves.
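A toy version of that search (entirely my own illustration, one parameter instead of billions): gradient descent nudging a single weight toward the value that minimises a loss. The published weights are just the end state of this kind of loop, run at enormous scale.

    def loss_gradient(w):
        # Pretend the "right" value is 3.7; the loss is the squared distance from it.
        return 2 * (w - 3.7)

    w = 0.0                  # arbitrary starting guess
    learning_rate = 0.1
    for step in range(100):
        w -= learning_rate * loss_gradient(w)   # nudge w downhill

    print(w)  # ~3.7 -- this "found" number is what gets published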
> a massive search operation, akin to finding the right temperature to bake a cake from all possible floats
...for each of 13 billion (for a model with that many parameters) different cakes, except that they aren't like cakes because the "best" temperature for each depends on the actual temperatures chosen for the others.
My lay-person's understanding is that it's due to the problem one is trying to solve with a deep learning model: drawing a curve through the dimensions that separates "good" from "bad" activation values. The lower the resolution of that curve, the higher the likelihood that it fits sometimes and veers off into erroneous space at other times.
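Here's a small numpy-only illustration of that point (the data and boundaries are my own toy example, not anything from a real model): a "low resolution" straight-line boundary fits some of the points and veers off for the rest, while a curve with the right shape separates them almost perfectly.

    import numpy as np

    rng = np.random.default_rng(0)

    # "Good" points cluster near the origin; "bad" points sit on a ring around them.
    good = rng.normal(0.0, 0.5, size=(200, 2))
    angles = rng.uniform(0, 2 * np.pi, 200)
    bad = np.c_[np.cos(angles), np.sin(angles)] * 2.0 + rng.normal(0.0, 0.2, size=(200, 2))

    points = np.vstack([good, bad])
    labels = np.r_[np.zeros(200), np.ones(200)]   # 0 = good, 1 = bad

    # Low-"resolution" boundary: a straight line (x > 0 means "bad").
    line_pred = (points[:, 0] > 0).astype(float)

    # Higher-"resolution" boundary: a circle of radius 1.25 around the origin.
    circle_pred = (np.linalg.norm(points, axis=1) > 1.25).astype(float)

    print("straight line accuracy:", (line_pred == labels).mean())       # ~0.5
    print("circular boundary accuracy:", (circle_pred == labels).mean())  # ~0.98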
They basically encapsulate what a model has "learned." ML models without their weights are useless because the output is essentially random noise. You then train the model on data, and it changes the weights into numbers that cause the whole thing to work. Training data and processing power are usually very expensive so the resulting weights are valuable.