BestClawModels
Practitioner-rated LLM rankings for AI agent builders. Real-world performance data from production deployments — not lab benchmarks.
Leaderboard
Showing 19 models
| Model | Provider | Score | Dimensions |
|---|---|---|---|
| Claude 3.5 Sonnet | Anthropic | 9/10 | TC CE LA AR CQ |
| GPT-4o | OpenAI | 8.5/10 | TC CE LA AR CQ |
| Gemini 1.5 Pro | 8.4/10 | TC CE LA AR CQ | |
| Claude 3 Opus | Anthropic | 8.3/10 | TC CE LA AR CQ |
| Gemini 2.0 Flash Thinking | 8.2/10 | TC CE LA AR CQ | |
| Llama 3.1 405B Instruct | Meta | 8.1/10 | TC CE LA AR CQ |
| GPT-4 Turbo | OpenAI | 8/10 | TC CE LA AR CQ |
| DeepSeek V3 | DeepSeek | 7.9/10 | TC CE LA AR CQ |
| Claude 3.5 Haiku | Anthropic | 7.8/10 | TC CE LA AR CQ |
| Mixtral 8x22B | Mistral AI | 7.8/10 | TC CE LA AR CQ |
| Mistral Large | Mistral AI | 7.7/10 | TC CE LA AR CQ |
| Qwen 2.5 72B Instruct | Alibaba | 7.6/10 | TC CE LA AR CQ |
| Gemini 1.5 Flash | 7.5/10 | TC CE LA AR CQ | |
| Llama 3.1 70B Instruct | Meta | 7.5/10 | TC CE LA AR CQ |
| Command R+ | Cohere | 7.4/10 | TC CE LA AR CQ |
| Jamba 1.5 Large | AI21 Labs | 7.3/10 | TC CE LA AR CQ |
| GPT-4o mini | OpenAI | 7.2/10 | TC CE LA AR CQ |
| Yi Large | 01.AI | 7.1/10 | TC CE LA AR CQ |
| DBRX Instruct | Databricks | 7/10 | TC CE LA AR CQ |
Built for developers who ship agents to production.