New AI Models Released on OpenRouter (June 27 - July 11, 2025)
Here are the new AI models added to OpenRouter over the last two weeks (June 27 - July 11, 2025). For each, I've summarized strengths and weaknesses based on benchmarks, user feedback, and expert analyses. Note that some models are specialized (e.g., for coding), so their performance varies by use case.
1. Grok 4 (Added July 10, 2025)
- Description: xAI's flagship model with a 256k context window, supporting parallel tool-calling, structured outputs, and image processing (a request sketch follows this entry).
- Strengths:
- Excels at reasoning and coding benchmarks (e.g., reportedly outperforms competitors on Humanity's Last Exam)
- Strong multimodal capabilities for real-world applications
- Specialized for developer workflows with real-time knowledge integration
- Weaknesses:
- Slower inference than comparable flagship models
- Reported gaps in image understanding and generation (described by some testers as partially "blind")
- Real-world performance may not match its academic benchmark results
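To make Grok 4's tool-calling support concrete, below is a minimal sketch of a request against OpenRouter's OpenAI-compatible chat completions endpoint. The `x-ai/grok-4` slug and the `get_weather` tool are assumptions for illustration; check OpenRouter's model list for the exact identifier.

```python
# Hedged sketch: Grok 4 via OpenRouter's OpenAI-compatible endpoint.
# The model slug and the get_weather tool are illustrative assumptions.
import os
import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"

payload = {
    "model": "x-ai/grok-4",  # assumed slug; verify on openrouter.ai/models
    "messages": [
        {"role": "user", "content": "What's the weather in Austin and in Oslo?"}
    ],
    # Two independent lookups: a model with parallel tool-calling can emit
    # both function calls in a single assistant turn.
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical tool, defined by the caller
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

resp = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
# With parallel tool-calling, tool_calls may contain several entries at once.
for call in message.get("tool_calls") or []:
    print(call["function"]["name"], call["function"]["arguments"])
```

If the model opts to call the tool, each entry in `tool_calls` carries the function name and JSON-encoded arguments; the caller executes them and returns the results as `tool` messages.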
2. Mistral Devstral Small 2507 (Added July 10, 2025)
- Description: A 24B-parameter model optimized for coding agents and software engineering, supporting Mistral function-calling and XML formats (see the agent-loop sketch after this entry).
- Strengths:
- Top performer on coding benchmarks like SWE-Bench Verified (53.6%)
- Excels at tool usage for codebase exploration and multi-file editing
- Cost-efficient for agentic tasks
- Adaptable for a wide range of programming workflows
- Outperforms larger models in specialized scenarios
- Weaknesses:
- Highly specialized for coding, so may underperform in general non-programming tasks
- Potential vulnerabilities in agent-based security (e.g., more lenient safeguards leading to exploits)
- Inherits common LLM weaknesses in understanding complex program logic, despite its efficiency
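As referenced in the description above, here is a hedged sketch of a single agent-loop step with Devstral through OpenRouter: the model requests a file read, the caller executes it locally, and the result is fed back for a final answer. The model slug and the `read_file` tool are illustrative assumptions, not part of any official Devstral API.

```python
# Hedged sketch of one coding-agent step with Devstral Small on OpenRouter.
# The slug and the read_file tool are assumptions for illustration.
import json
import os
import pathlib
import requests

API_URL = "https://openrouter.ai/api/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"}
MODEL = "mistralai/devstral-small"  # assumed slug; verify on openrouter.ai/models

TOOLS = [{
    "type": "function",
    "function": {
        "name": "read_file",  # hypothetical tool the agent harness provides
        "description": "Return the contents of a file in the repository",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

messages = [{"role": "user", "content": "Summarize what setup.py does."}]

resp = requests.post(API_URL, headers=HEADERS, timeout=60,
                     json={"model": MODEL, "messages": messages, "tools": TOOLS})
resp.raise_for_status()
msg = resp.json()["choices"][0]["message"]

if msg.get("tool_calls"):
    call = msg["tool_calls"][0]
    args = json.loads(call["function"]["arguments"])
    # Execute the requested tool locally, then hand the result back
    # to the model so it can produce its final answer.
    messages.append(msg)
    messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": pathlib.Path(args["path"]).read_text(),
    })
    resp = requests.post(API_URL, headers=HEADERS, timeout=60,
                         json={"model": MODEL, "messages": messages, "tools": TOOLS})
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
```

A real agent would repeat this request/execute/reply loop until the model stops emitting tool calls; that is the pattern behind the codebase-exploration and multi-file-editing tasks mentioned above.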
3. Mistral Devstral Medium 2507 (Added July 10, 2025)
- Description: A larger Devstral variant (exact size undisclosed, though implied to exceed 24B parameters), focused on stronger coding performance with the same tool support.
- Strengths:
- Even higher coding benchmark scores (e.g., 61.6% on SWE-Bench Verified)
- Improved efficiency for advanced software engineering agents
- Versatile tool integration
- Regarded by some as a game-changer for open-source coding tasks due to its balance of performance and cost
- Weaknesses:
- Similar to the Small version: coding-focused, so limited in broader, non-programming applications
4. DeepSeek-TNG R1T2-Chimera (Added ~July 3-8, 2025)
- Description: A 671B-parameter mixture-of-experts (MoE) model, successor to R1T Chimera, focused on text generation and reasoning; available for free on OpenRouter (a minimal request sketch follows this entry).
- Strengths:
- Reported to be roughly 200% faster than predecessors such as R1-0528
- Strong reasoning capabilities
- Trending for high performance in text tasks
- Efficient MoE architecture for large-scale generation
- Weaknesses:
- Trails top models such as R1-0528 on some intelligence benchmarks
- Potentially less consistent than its predecessors on harder reasoning tasks
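Because the model is listed for free, it is easy to try. The sketch below assumes the `:free` suffix that OpenRouter commonly uses for free listings, so verify the exact slug before use.

```python
# Minimal hedged sketch: querying the free R1T2-Chimera listing on OpenRouter.
# The slug is an assumption based on OpenRouter's usual naming convention.
import os
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    json={
        "model": "tngtech/deepseek-r1t2-chimera:free",  # assumed slug
        "messages": [
            {"role": "user",
             "content": "Explain mixture-of-experts routing in two sentences."}
        ],
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```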