GLM-5.1 (Fully Tested): THE BEST OPEN / AGENTIC MODEL IS HERE! This is CRAZY! Summary — AICodeKing

Summary

The video presents an early-access review of GLM-5.1, a post-training update to GLM-5 that keeps the same parameter count but focuses heavily on improving long-running and agentic tasks. The speaker notes that GLM-4.7 struggled with long-running tasks, whereas GLM-5 handles them well and GLM-5.1 performs even better.

However, the model has developed a problematic tendency to reach for code in ordinary conversation. It often produces HTML files or code blocks even for simple questions such as riddles, which makes it less pleasant for general chat. The speaker attributes this behavior to heavier training on code data and more reinforcement learning on coding tasks.

On the positive side, GLM-5.1 shows a marked improvement in agentic applications, with excellent instruction following, strong debugging, and the focus to stay on task without drifting from its objectives. GLM-5 would sometimes over-reason and slow down simple tasks; the new version is more efficient and snappier. It also shows better planning and context understanding.

In benchmark tests, GLM-5.1 performs exceptionally well on coding tasks such as floor plans, SVG generation, 3D graphics, games, and various applications, but struggles with general math and chat questions. On agentic tasks specifically, it ranks second on agentic leaderboards and compares favorably with Opus 4.6 and Codex. Given this performance relative to its low cost, the speaker is considering switching to it.

Key Insights

GLM-5.1 has been trained more heavily on code, which causes it to create HTML files and code blocks unnecessarily, even for simple questions like riddles, making regular chat less pleasant.

Unlike GLM-5, which would over-reason and slow down simple tasks, GLM-5.1 has been tuned to avoid unnecessary reasoning, making it feel much snappier.

GLM-5.1 ranks second on agentic leaderboards despite being an open model, performing comparably to Opus 4.6 and better than Codex while being significantly cheaper.
