Hacker News
new
|
past
|
comments
|
ask
|
show
|
jobs
|
submit
login
mmoskal
3 months ago
|
parent
|
context
|
favorite
| on:
Qwen3: Think deeper, act faster
Spec decoding only depends on the tokenizer used. It's transfering either the draft token sequence or at most draft logits to the main model.
jasonjmcghee
3 months ago
|
next
[–]
Could be an lm studio thing, but the qwen3-0.6B model works as a draft model for the qwen3-32B and qwen3-30B-A3B but not the qwen3-235B-A22B model
jasonjmcghee
3 months ago
|
prev
[–]
I suppose that makes sense, for some reason I was under the impression that the models need to be aligned / have the same tuning or they'd have different probability distributions and would reject the draft model really often.
Guidelines
|
FAQ
|
Lists
|
API
|
Security
|
Legal
|
Apply to YC
|
Contact
Search: