Introduction

Today, we are excited to announce the release of Qwen3, the latest addition to the Qwen family of large language models. Our flagship model, Qwen3-235B-A22B, achieves competitive results in benchmark evaluations of coding, math, general capabilities, etc., when compared to other top-tier models such as DeepSeek-R1, o1, o3-mini, Grok-3, and Gemini-2.5-Pro. Additionally, the small MoE model, Qwen3-30B-A3B, outcompetes QwQ-32B, which has 10 times the activated parameters, and even a tiny model like Qwen3-4B can rival the performance of Qwen2.5-72B-Instruct.
Context length is disappointing, but the fact that it trades blows with R1 despite being 30B MoE is insane. I’ll wait and see if real-world performance matches up to benchmarks, but it sounds like a big deal.
Some kind of presentation talks about longer context: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2F1nos591czhxe1.jpeg
Maybe it's a work in progress, with Qwen 2.5 14B 1M (really 256K in that case) being the first test?