OpenAI has released GPT-4.5, which it calls its biggest and best chat model yet. The model, part of OpenAI's non-reasoning lineup, scored 62.5% on SimpleQA, a general-knowledge quiz, compared with 38.6% for its predecessor, GPT-4o, and 15% for o3-mini. It also hallucinated less often, making up answers in 37.1% of responses versus 59.8% for GPT-4o and 80.3% for o3-mini. On other benchmarks such as MMLU, however, the margin of improvement was smaller, and GPT-4.5 scored worse than o3-mini on standard science and math tests. Human testers nonetheless preferred GPT-4.5 for its conversational skills, especially in everyday, professional, and creative tasks. The model was trained using techniques similar to those used for GPT-4o, with a focus on scaling up compute, data, and training efficiency.
Source: www.technologyreview.com















