Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124
The final day of OpenAI’s “12 Days of Shipmas” brought the unveiling of o3, a new chain of thought “reasoning” model that the company claims is its most advanced yet. The model is not yet available for general use, but security researchers can sign up for a preview starting today.
OpenAI and others hope that reasoning models will go a long way toward solving the pernicious problem of chatbots frequently producing wrong answers. Chatbots fundamentally do not “think” like humans, so different techniques are needed to try to create the best simulacrum of a human thought process.
When asked a question, reasoning models pause and consider related prompts that might help produce an accurate answer. For example, if you ask the o3 model, “can you grow habaneros in the Pacific Northwest,” the model might set up a series of questions that it will research to arrive at a conclusion, such as “where are habaneros typically grown”, “what are the ideal conditions for growing habaneros”, and “what type of climate does the Pacific Northwest have”. Anyone who has used chatbots knows that sometimes you need to prompt a chatbot with additional follow-ups until you get the right result. Reasoning models are supposed to do this additional work for you.
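The decomposition in the habanero example can be pictured with a toy sketch. It is an illustration only, not how o3 works internally: the function names and the `placeholder_research` stub are hypothetical, and the stub returns labeled placeholders rather than real findings.

```python
from typing import Callable, List, Tuple


def decompose_and_answer(
    question: str,
    subquestions: List[str],
    research: Callable[[str], str],
) -> str:
    """Research each sub-question in order, then list the findings under the question."""
    findings: List[Tuple[str, str]] = [(sq, research(sq)) for sq in subquestions]
    lines = [f"Question: {question}"]
    lines += [f"- {sq} -> {finding}" for sq, finding in findings]
    return "\n".join(lines)


def placeholder_research(subquestion: str) -> str:
    # Hypothetical stand-in for the model's own lookup; it returns no real facts.
    return f"[finding for: {subquestion}]"


report = decompose_and_answer(
    "Can you grow habaneros in the Pacific Northwest?",
    [
        "Where are habaneros typically grown?",
        "What are the ideal conditions for growing habaneros?",
        "What type of climate does the Pacific Northwest have?",
    ],
    placeholder_research,
)
print(report)
```

The point of the sketch is the shape of the process: the follow-up questions a user would otherwise type by hand are generated and worked through before a final answer is given.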
o3 is the successor to o1, OpenAI’s first chain of thought reasoning model. Company representatives said they decided to skip the name “o2” “out of respect” for the British telecommunications company, but it certainly does not hurt that the jump makes the product sound more advanced. The company says the new model comes with the ability to adjust its reasoning time. Users can choose low, medium or high reasoning time; the more compute it uses, the better o3 is supposed to perform. OpenAI says it will spend time “red-teaming” the new model with researchers to prevent it from producing potentially harmful responses (again, it is not human and does not know right from wrong).
Reasoning is the buzzword of the day in the field of generative AI, as industry insiders believe it is the next unlock needed to improve the performance of large language models. Adding more compute eventually stops delivering equivalent performance gains, so new techniques are needed. Google DeepMind recently presented its own reasoning model, called Gemini Deep Research, which can take 5-10 minutes to generate a report that analyzes several sources on the web to reach its conclusions.
OpenAI is confident in o3 and offers impressive benchmarks: it says that on a Codeforces test, which measures coding ability, o3 scored 2727. For context, a score of 2400 would put an engineer in the 99th percentile of programmers. It also scored 96.7% on the 2024 American Invitational Mathematics Exam, missing only one question. We will see how the model holds up in real-world testing; Sora, which OpenAI recently released, still needs work. But optimists are confident that the accuracy problem is solved. Still, be somewhat wary of relying on AI models for important work where precision is required.
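The 96.7% figure and the “one question” detail are consistent with each other, as a quick check shows. The sketch below assumes the benchmark covered both 2024 AIME papers (AIME I and AIME II, 15 questions each, 30 in total); that total is an assumption and is not stated in OpenAI's claim as reported here.

```python
# Assumption: both 2024 AIME papers were used, 15 questions each.
TOTAL_QUESTIONS = 30
MISSED = 1


def percent_correct(total: int, missed: int) -> float:
    """Return the share of questions answered correctly, as a percentage."""
    return 100 * (total - missed) / total


score = percent_correct(TOTAL_QUESTIONS, MISSED)
print(f"{score:.1f}%")  # prints 96.7%
```

Under that assumption, 29 correct answers out of 30 rounds to the reported 96.7%.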
AI model companies like OpenAI and Perplexity are in a race to become the next Google, gathering the world's knowledge and helping users make sense of it all. They also now have search products intended to replicate Google more directly, with access to real-time web results.
However, all these players seem to outdo one another with each passing day. The feeling is somewhat reminiscent of the late 90s, when there were myriad search engines to choose from (Google, Yahoo, AltaVista and Ask Jeeves, to name a few), all vacuuming up data from the internet and presenting it with only a different UX. Most of them disappeared once one arrived that was far better than the rest: Google.
OpenAI clearly has a strong advantage now with hundreds of millions of monthly active users and a partnership with Apple, but Google has received a lot of plaudits recently for advancements in its Gemini models. The Verge reports that the company will soon integrate Gemini deeper into its search interface.