Typically, multilingual capabilities consume 20-30% of model parameters in small LLMs, primarily in token embeddings and early transformer layers. Monolingual variants of similar models often perform better on English benchmarks with the same parameter count.
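A rough back-of-the-envelope sketch of where that overhead comes from, using illustrative numbers (the vocab sizes, hidden dim, and parameter count below are assumptions, not figures from any specific model): a multilingual tokenizer needs a much larger vocabulary, and the embedding table scales linearly with it.

```python
# Back-of-the-envelope: fraction of a small LLM's parameters spent on
# the token embedding table, comparing a small (English-leaning) vocab
# against a large (multilingual) one. All figures are illustrative.

def embedding_share(vocab_size: int, hidden_dim: int, total_params: float) -> float:
    """Fraction of total parameters used by the token embedding table."""
    return vocab_size * hidden_dim / total_params

TOTAL_PARAMS = 2e9   # a hypothetical ~2B-parameter model
HIDDEN_DIM = 2048    # a plausible hidden size at this scale

for label, vocab in [("32k vocab (mostly English)", 32_000),
                     ("256k vocab (multilingual)", 256_000)]:
    share = embedding_share(vocab, HIDDEN_DIM, TOTAL_PARAMS)
    print(f"{label}: {share:.1%} of parameters in embeddings")

# Output:
# 32k vocab (mostly English): 3.3% of parameters in embeddings
# 256k vocab (multilingual): 26.2% of parameters in embeddings
```

With tied input/output embeddings the large-vocab model already spends roughly a quarter of its budget on the embedding table alone, which is consistent with the 20-30% figure once early layers handling multiple scripts are counted too.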

