> Given both the competitive landscape and the safety implications of large-scale models like GPT-4, this report contains no further details about the architecture (including model size), hardware, training compute, dataset construction, training method, or similar.
My guess is they used Chinchilla scaling laws, and GPT-4's parameter count is only slightly larger than GPT-3's, or maybe even smaller. Look at what Meta was able to accomplish with LLaMA using far fewer parameters.
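For a rough sense of what Chinchilla-style scaling implies, here is a minimal sketch using the ~20 tokens-per-parameter rule of thumb from Hoffmann et al. (2022). The GPT-3 and LLaMA figures are public numbers; GPT-4's parameter count and token budget are unknown, so this is only an illustration of the reasoning, not a claim about GPT-4 itself.

```python
# Rough sketch of the Chinchilla rule of thumb (Hoffmann et al., 2022):
# compute-optimal training uses roughly 20 tokens per parameter.
# GPT-3 and LLaMA numbers below are publicly reported; GPT-4's are not.

TOKENS_PER_PARAM = 20  # heuristic, not an exact law


def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal training tokens for a given parameter count."""
    return TOKENS_PER_PARAM * n_params


models = {
    "GPT-3 (175B params, trained on ~0.3T tokens)": 175e9,
    "LLaMA-65B (65B params, trained on ~1.4T tokens)": 65e9,
}

for name, params in models.items():
    optimal = chinchilla_optimal_tokens(params) / 1e12
    print(f"{name}: Chinchilla-optimal would be ~{optimal:.1f}T tokens")
```

By this rule of thumb, GPT-3 would have wanted roughly 3.5T tokens but saw only about 0.3T, i.e. it was heavily undertrained for its size, whereas LLaMA-65B's ~1.4T tokens is close to its ~1.3T optimum. That is the core of the argument: a model no bigger than GPT-3, trained on far more data, can plausibly come out stronger.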