#alphageometry
Text
What the Launch of OpenAI’s o1 Model Tells Us About Their Changing AI Strategy and Vision
New Post has been published on https://thedigitalinsider.com/what-the-launch-of-openais-o1-model-tells-us-about-their-changing-ai-strategy-and-vision/
OpenAI, the pioneer behind the GPT series, has just unveiled a new series of AI models, dubbed o1, that can “think” longer before they respond. The models are designed to handle more complex tasks, particularly in science, coding, and mathematics. Although OpenAI has kept much of the models’ workings under wraps, some clues offer insight into their capabilities and what they may signal about OpenAI’s evolving strategy. In this article, we explore what the launch of o1 might reveal about the company’s direction and the broader implications for AI development.
Unveiling o1: OpenAI’s New Series of Reasoning Models
o1 is OpenAI’s new generation of AI models designed to take a more thoughtful approach to problem-solving. These models are trained to refine their thinking, explore strategies, and learn from mistakes. OpenAI reports that o1 has achieved impressive gains in reasoning, solving 83% of problems on the International Mathematical Olympiad (IMO) qualifying exam, compared to 13% for GPT-4o. The model also excels in coding, reaching the 89th percentile in Codeforces competitions. According to OpenAI, future updates in the series will perform on par with PhD students across subjects like physics, chemistry, and biology.
OpenAI’s Evolving AI Strategy
OpenAI has emphasized scaling models as the key to unlocking advanced AI capabilities since its inception. With GPT-1, which featured 117 million parameters, OpenAI pioneered the transition from smaller, task-specific models to expansive, general-purpose systems. Each subsequent model (GPT-2, GPT-3, and GPT-4, reportedly around 1.7 trillion parameters) demonstrated how increasing model size and data can lead to substantial improvements in performance.
However, recent developments indicate a significant shift in OpenAI’s strategy for developing AI. While the company continues to explore scalability, it is also pivoting towards creating smaller, more versatile models, as exemplified by ChatGPT-4o mini. The introduction of ‘longer thinking’ o1 further suggests a departure from the exclusive reliance on neural networks’ pattern recognition capabilities towards sophisticated cognitive processing.
From Fast Reactions to Deep Thinking
OpenAI states that the o1 model is specifically designed to take more time to think before delivering a response. This feature of o1 seems to align with the principles of dual process theory, a well-established framework in cognitive science that distinguishes between two modes of thinking—fast and slow.
In this theory, System 1 represents fast, intuitive thinking, making decisions automatically and intuitively, much like recognizing a face or reacting to a sudden event. In contrast, System 2 is associated with slow, deliberate thought used for solving complex problems and making thoughtful decisions.
Historically, neural networks—the backbone of most AI models—have excelled at emulating System 1 thinking. They are quick, pattern-based, and excel at tasks that require fast, intuitive responses. However, they often fall short when deeper, logical reasoning is needed, a limitation that has fueled ongoing debate in the AI community: Can machines truly mimic the slower, more methodical processes of System 2?
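The System 1 / System 2 split can be made concrete with a toy sketch: a solver that first tries a fast, memorized pattern lookup and falls back to slow, exhaustive search only when recognition fails. Everything here (the memo table, the equation format) is invented for illustration and bears no relation to how o1 or any production model actually works:

```python
# Toy illustration of dual-process ("fast vs. slow") problem solving.
# Hypothetical sketch only: the memo table and equation format are invented.

MEMOIZED = {("2x", 6): 3}  # patterns recognized "at a glance" (System 1)

def system1(equation, target):
    """Fast, intuitive: answers instantly, but only for known patterns."""
    return MEMOIZED.get((equation, target))

def system2(equation, target):
    """Slow, deliberate: methodically search and verify candidates."""
    coeff = int(equation.rstrip("x"))
    for x in range(-1000, 1001):
        if coeff * x == target:
            return x
    return None

def solve(equation, target):
    answer = system1(equation, target)      # try the quick route first
    if answer is None:
        answer = system2(equation, target)  # "think longer" when needed
    return answer

print(solve("2x", 6))   # 3, recognized instantly
print(solve("7x", 91))  # 13, found by deliberate search
```

The interesting design point is the fallback: the slow route costs orders of magnitude more work, so it only runs when the fast route fails, which mirrors the "take more time to think" framing above.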
Some AI scientists, such as Geoffrey Hinton, suggest that with enough advancement, neural networks could eventually exhibit more thoughtful, intelligent behavior on their own. Other scientists, like Gary Marcus, argue for a hybrid approach, combining neural networks with symbolic reasoning to balance fast, intuitive responses and more deliberate, analytical thought. This approach is already being tested in models like AlphaGeometry and AlphaGo, which utilize neural and symbolic reasoning to tackle complex mathematical problems and successfully play strategic games.
OpenAI’s o1 model reflects this growing interest in developing System 2 models, signaling a shift from purely pattern-based AI to more thoughtful, problem-solving machines capable of mimicking human cognitive depth.
Is OpenAI Adopting Google’s Neurosymbolic Strategy?
For years, Google has pursued this path, creating models like AlphaGeometry and AlphaGo to excel in complex reasoning tasks such as those in the International Mathematical Olympiad (IMO) and the strategy game Go. These models combine the intuitive pattern recognition of neural networks like large language models (LLMs) with the structured logic of symbolic reasoning engines. The result is a powerful combination where LLMs generate rapid, intuitive insights, while symbolic engines provide slower, more deliberate, and rational thought.
Google’s shift towards neurosymbolic systems was motivated by two significant challenges: the limited availability of large datasets for training neural networks in advanced reasoning and the need to blend intuition with rigorous logic to solve highly complex problems. While neural networks are exceptional at identifying patterns and offering possible solutions, they often fail to provide explanations or handle the logical depth required for advanced mathematics. Symbolic reasoning engines address this gap by giving structured, logical solutions—albeit with some trade-offs in speed and flexibility.
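That division of labor can be sketched as a propose-and-verify loop: a fast generator offers plausible candidates, and a strict symbolic checker accepts only the ones that are exactly right. This is a hypothetical miniature (solving a*x + b = c over the integers), not the actual AlphaGeometry architecture:

```python
# Hypothetical propose-and-verify loop in the neurosymbolic style described
# above: a fast "neural" proposer guesses candidates for x in a*x + b = c,
# and a strict symbolic verifier keeps only the exactly correct ones.

def neural_proposer(a, b, c):
    """Stand-in for a neural model: cheap, plausible (but noisy) guesses."""
    estimate = (c - b) // a
    yield from (estimate + d for d in (-1, 0, 1))

def symbolic_verifier(a, b, c, x):
    """Stand-in for a symbolic engine: exact, rule-based checking."""
    return a * x + b == c

def solve(a, b, c):
    for candidate in neural_proposer(a, b, c):
        if symbolic_verifier(a, b, c, candidate):  # rigorous filter
            return candidate
    return None  # no proposal survived verification

print(solve(3, 2, 17))  # 5, since 3 * 5 + 2 == 17
```

The trade-off the article describes shows up even here: the proposer is fast but unreliable, the verifier is trustworthy but can only check, not invent, so the system needs both.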
By combining these approaches, Google has successfully scaled its models, enabling AlphaGeometry and AlphaGo to compete at the highest level without human intervention and achieve remarkable feats, such as the combined AlphaProof and AlphaGeometry 2 systems reaching the silver-medal standard at the IMO and AlphaGo defeating world champions in the game of Go. These successes suggest that OpenAI may adopt a similar neurosymbolic strategy, following Google’s lead in this evolving area of AI development.
o1 and the Next Frontier of AI
Although the exact workings of OpenAI’s o1 model remain undisclosed, one thing is clear: the company is heavily focusing on contextual adaptation. This means developing AI systems that can adjust their responses based on the complexity and specifics of each problem. Instead of being general-purpose solvers, these models could adapt their thinking strategies to better handle various applications, from research to everyday tasks.
One intriguing development could be the rise of self-reflective AI. Unlike traditional models that rely solely on existing data, o1’s emphasis on more thoughtful reasoning suggests that future AI might learn from its own experiences. Over time, this could lead to models that refine their problem-solving approaches, making them more adaptable and resilient.
OpenAI’s progress with o1 also hints at a shift in training methods. The model’s performance on complex tasks like the IMO qualifying exam suggests we may see more specialized, problem-focused training, which could produce more tailored datasets and training strategies that build deeper cognitive abilities in AI systems, allowing them to excel in both general and specialized fields.
The model’s standout performance in areas like mathematics and coding also raises exciting possibilities for education and research. We could see AI tutors that provide answers and help guide students through the reasoning process. AI might assist scientists in research by exploring new hypotheses, designing experiments, or even contributing to discoveries in fields like physics and chemistry.
The Bottom Line
OpenAI’s o1 series introduces a new generation of AI models crafted to address complex and challenging tasks. While many details about these models remain undisclosed, they reflect OpenAI’s shift towards deeper cognitive processing, moving beyond mere scaling of neural networks. As OpenAI continues to refine these models, we may enter a new phase in AI development where AI performs tasks and engages in thoughtful problem-solving, potentially transforming education, research, and beyond.
#ai#AI development#AI models#AI reasoning models#AI strategy#AI systems#alphageometry#applications#approach#Article#Artificial Intelligence#Behavior#Biology#chatGPT#ChatGPT-4o#chemistry#coding#cognitive abilities#Community#Competitions#complexity#data#datasets#details#development#Developments#direction#Discoveries#education#emphasis
Text
AI at the International Mathematical Olympiad: how AlphaProof and AlphaGeometry 2 achieved the silver-medal standard
Mathematical reasoning is a vital aspect of human cognitive abilities, driving progress in scientific discovery and technological development. As we strive to develop artificial general intelligence that matches human cognition, equipping AI with advanced mathematical reasoning capabilities is essential. Although current AI systems can handle…
#AI#AlphaGeometry#AlphaGeometry 2#AlphaProof#AlphaZero#IA Neuro-simbólica#IMO#Olimpíada Internacional de Matemática#Raciocínio Matemático#Resolução de Problemas Matemáticos
Text
AI Masters Olympiad Geometry
AI Masters Olympiad Geometry The team behind Google’s DeepMind has just released details of a new AI system, AlphaGeometry, which has been specifically trained to solve classical geometry problems and already performs at the level of a gold medallist at the International Mathematical Olympiad (considering only geometry problems). This is an incredible achievement, as in order to solve classical geometry…
Text
AlphaGeometry: DeepMind’s AI Masters Geometry Problems at Olympiad Levels
#AlphaGeometry is indeed an impressive feat in the advancement of AI! It represents a significant leap in AI’s capability to tackle complex geometry problems at the level of competitions like the International Mathematical Olympiad (IMO). By AppsLookup
Text
AlphaProof: Google AI Systems To Think Like Mathematicians
AlphaProof and AlphaGeometry 2
Google’s AI systems advance towards thinking like mathematicians by making strides in maths. One question was answered in minutes, according to a blog post by Google, but other questions took up to three days, longer than the competition’s time limit. Nevertheless, the scores are among the highest achieved by an AI system in the competition thus far.
Google, a division of Alphabet, showcased two artificial intelligence systems that demonstrated progress in generative AI development: the ability to solve challenging mathematical problems.
The current breed of AI models has had difficulty with abstract mathematics, since it demands reasoning power closer to human intellect; these models operate by statistically predicting the next word.
The company’s AI division, DeepMind, released data demonstrating that its recently developed AI models, AlphaProof and AlphaGeometry 2, answered four of the six questions in the 2024 International Mathematical Olympiad, a well-known competition for high school students.
AlphaZero
The company said that AlphaProof, a reasoning-focused system, was produced by combining a version of Gemini, the language model underlying its chatbot of the same name, with AlphaZero, another AI system that has previously defeated humans at board games like chess and Go. Only five of the more than 600 human competitors answered the most challenging question, which was one of the three questions that AlphaProof answered correctly.
AlphaGeometry 2
AlphaGeometry 2 solved another math puzzle. It was previously reported in July that OpenAI, backed by Microsoft, was working on reasoning technology under the code name “Strawberry.” As Reuters first revealed, the project, originally known as Q*, was regarded as such a breakthrough that several staff researchers warned OpenAI’s board of directors in a letter written in November that it could endanger humankind.
In the digital age, the demand for accurate and efficient document editing and proofreading is growing, and AlphaProof stands out as a leading option, offering excellent services to guarantee your documents are flawless. To show why AlphaProof is unique in the industry, this article explores its features, advantages, and user experiences.
How does AlphaProof work?
AlphaProof, a feature-rich online tool, handles all editing and proofreading needs. It offers specialized services to increase the quality and readability of documents for professionals, students, and business owners, covering technical documentation, corporate reports, creative writing, and academic essays.
Essential Elements of AlphaProof
Expert Proofreading
AlphaProof has a team of highly skilled proofreaders who carefully review your documents to fix typographical, punctuation, and grammar flaws, guaranteeing that your text looks professional and is free of common mistakes.
Complex Editing
It provides sophisticated editing services in addition to basic proofreading, including streamlining sentence structure, boosting overall readability, and strengthening coherence and flow. The editors also suggest better word choices and stylistic enhancements.
Editors with specific expertise
AlphaProof recognizes that varying documents call for varying levels of competence. It boasts a diverse team of editors with skills in technical writing, business communication, academic writing, and creative writing. This guarantees that an individual possessing pertinent expertise and experience will evaluate your material.
Quick Resolution
AlphaProof provides quick turnaround times to help you meet deadlines; you can choose a 24-hour express service to ensure your document is ready when you need it.
Easy-to-use interface
The AlphaProof platform boasts an intuitive interface that facilitates the uploading of documents, selection of services, and tracking of order status. From beginning to end, the procedure is simplified to offer a hassle-free experience.
Secrecy and Protection
The security and privacy of your documents are very important to AlphaProof. The platform uses cutting-edge encryption technology to safeguard your data, and every file is handled with the utmost care.
The Advantages of AlphaProof Use
Better Document Quality
Utilising its services can greatly improve the quality of your documents, resulting in more professional corporate communication, higher grades, and a more positive impression on your readers.
Reduce Effort and Time
Editing and proofreading can be laborious processes. With AlphaProof, you can focus on your primary responsibilities while professionals optimize your papers, saving you time and effort.
Customized Offerings
It offers customized services to address the unique requirements of various document formats: AlphaProof can provide comprehensive editing for a research paper or expeditious proofreading for an email.
Knowledgeable Perspectives
The editors’ comments and recommendations can give you important insight into your writing style and the areas that need work, which can help you improve as a writer over time.
A Boost in Self-Assurance
You may feel more confident in the calibre of your work knowing it has been expertly edited and proofread. This is especially crucial for high-stakes documents like published articles, commercial proposals, and academic theses.
Customer Experiences
Scholars and Students
AlphaProof has proven to be a useful resource for numerous academics and students. A postgraduate student said, “AlphaProof enabled me to refine my thesis to the ideal level. The final draft was error-free, and the editors’ suggestions were wise.”
Composers and Novelists
The specialized editing services provided by AlphaProof are valued by authors and creative writers. A budding writer said, “AlphaProof’s editors understood my voice and style, providing feedback that improved my manuscript without altering my unique voice.”
In conclusion
With a variety of features and advantages to meet a wide range of demands, AlphaProof stands out as a top option for document editing and proofreading. Through its skilled staff, quick turnaround times, and intuitive interface, it guarantees that your documents are flawless, saving you time and improving the calibre of your work.
Read more on govindhtech.com
#AISystems#AlphaProof#AlphaGeometry 2#generativeAl#artificialintelligence#AImodels#AlphaZero#OpenAI#news#technews#technology#technologynews#technologytrends#govindhtech
Text
I haven't seen anyone talk yet about the fact that an AI solved 4/6 of this year's IMO problems. Is there some way they fudged it so that it's not as big a deal as it seems? (I do not count more time as fudging; you could address that by adding more compute. I also do not count giving the question already formalised as fudging, as AIs can already do that.)
I ask because I really want this not to be a big deal, because the alternative is scary. I thought this would be one of the last milestones for AI surpassing human intelligence, and it seems like the same reasoning skills required for this problem would be able to solve a vast array of other important problems. And potentially it's a hop and a skip away from outperforming humans in research mathematics.
I did not think we were anywhere near this point, and I was already pretty worried about the societal upheaval that neural networks will cause.
Text
There’s no one thing called “AI”
The question of what AI can and can’t do is made very challenging to navigate by a frustrating tendency that I’ve observed among many commentators to blur the lines between hierarchical levels of AI technology. ....
AI is too broad and fuzzy to cleanly decompose into a proper hierarchy, but there are a few ways to impose a messy order on it. ...
Frequently, reporting on new technology will collapse this huge category into a single amorphous entity, ascribing any properties of its individual elements to AI at large. .... All of this really makes it seem like “an AI” is a discrete kind of thing that is manning chat bots, solving unsolved math problems, and beating high schoolers at geometry Olympiads. But this isn’t remotely the case. FunSearch, AlphaGeometry, and ChatGPT are three completely different kinds of technologies which do three completely different kinds of things and are not at all interchangeable or even interoperable. You can’t have a conversation with AlphaGeometry, and ChatGPT can’t solve geometry Olympiad problems.
... I believe that this property, where there are many ways to appear to have done it (by outputting a million random digits, for example), but only a very small number of ways to actually do it (by outputting the correct million digits), is characteristic of things that Generative AI systems will generally be bad at. ChatGPT works by making repeated guesses. At any given point in its attempt to generate the decimal digits of π, there are 10 digits to choose from, only one of which is the right one. The probability that it’s going to make a million correct guesses in a row is infinitesimally small, so small that we might as well call it zero. For this reason, this particular task is not one that’s well suited to this particular type of text generation.
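The arithmetic behind “might as well call it zero” is easy to check: with 10 equally likely choices per digit, a million independent correct guesses has probability 10^-1,000,000, so far below the smallest positive IEEE 754 double (roughly 5 × 10^-324) that computing it directly literally underflows to zero:

```python
import math

# Probability of guessing one million decimal digits correctly in a row,
# with 10 equally likely choices per digit.
digits = 1_000_000

# Work in log space, since the probability itself is unrepresentable.
log10_p = digits * math.log10(1 / 10)
print(log10_p)         # -1000000.0

# Attempting the product directly underflows IEEE 754 doubles to zero.
print(0.1 ** digits)   # 0.0
```

Working in log space is the standard way such joint probabilities are handled in practice, precisely because the raw product is unrepresentable.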
...We can see this same pattern in other generative AI systems as well, where the system seems to perform well if the success criteria are quite general, but increasing specificity causes failures.
Quote
December 25, 2024, 09:45. A mathematician describes the shock of OpenAI’s o3 model scoring 25.2% on the extremely difficult math dataset FrontierMath. Kevin Buzzard, a professor of pure mathematics at Imperial College London, has posted a blog entry discussing the o3 model’s 25.2% score on the FrontierMath problem dataset. Can AI do maths yet? Thoughts from a mathematician. | Xena https://xenaproject.wordpress.com/2024/12/22/can-ai-do-maths-yet-thoughts-from-a-mathematician/ On December 20, 2024, OpenAI announced its new reasoning model series, o3, describing it as having “the most advanced reasoning capabilities we have developed so far” and preparing it for release in 2025. (See also: OpenAI announces the o3 series with greatly strengthened reasoning, introducing a mechanism to “reconsider” OpenAI’s safety policy during reasoning - GIGAZINE.) It has been revealed that o3 scored 25.2% on FrontierMath, a dataset of several hundred difficult mathematics problems. Not only the problems themselves but even the total number of problems in the dataset is kept secret, and it is carefully designed so that AI cannot train on the problems in advance. All FrontierMath problems are computational; no “prove that…”-style problems are included. The five published sample problems all have positive integers as answers, and the remaining problems are said to have “definite, computable answers that can be verified automatically.” The difficulty is very high: even Buzzard could solve only two of the sample problems, felt he “might be able to solve” a third if he worked at it, and judged the remaining two beyond him. The FrontierMath paper includes difficulty assessments from prominent mathematicians, including Fields Medalists, who called the problems “extremely hard” and suggested that only specialists in each problem’s field could answer them; indeed, the two problems Buzzard solved were in his own specialty. Because working mathematicians spend most of their time devising proofs and proof ideas rather than computing, some mathematicians consider this a poor measure of mathematical ability, arguing that producing a numerical answer by computation is completely different from coming up with an original proof. Grading proofs is costly, however, so computational problems were adopted because they can be graded simply by checking whether the model’s answer matches the reference answer. Against this test, Buzzard said he was “shocked” that o3 scored 25.2%. It was already clear that AI is good at “math-olympiad-style” problems of the kind strong high school students solve, and Buzzard did not doubt that AI would come to handle undergraduate-level mathematics, which is similar in featuring many standard exercises. But seeing AI respond with innovative ideas to early-PhD-level problems beyond the standard-exercise level, Buzzard commented that “quite a large leap seems to have happened.” However, Elliot Glazer of Epoch AI, which assembled FrontierMath, has said that 25% of the problems in the dataset are olympiad-style. Since the five published samples are nothing like olympiad problems, Buzzard was very excited on first hearing of the 25.2% score, but his excitement subsided on learning that a quarter of the problems are olympiad-style. He added, “I look forward to AI scoring 50% on FrontierMath.” AI is progressing rapidly, but the road ahead is long and much remains to be done. Buzzard closed his blog by saying he hopes AI will eventually attain the mathematical ability to handle problems of the form “prove this theorem correctly and explain why the proof works in a way humans can understand.”
A mathematician describes the shock of OpenAI’s o3 model scoring 25.2% on the extremely difficult math dataset FrontierMath - GIGAZINE
Text
AI achieves silver-medal standard solving International Mathematical Olympiad problems
See on Scoop.it - Education 2.0 & 3.0
Breakthrough models AlphaProof and AlphaGeometry 2 solve advanced reasoning problems in mathematics
Link
In a groundbreaking achievement, AI systems developed by Google DeepMind have attained a silver medal-level score in the 2024 International Mathematical Olympiad (IMO), a prestigious global competition for young mathematicians. The AI models, named… #AI #ML #Automation
Text
The AI Scientist
New Post has been published on https://thedigitalinsider.com/the-ai-scientist/
A model that can produce novel AI papers plus some really cool papers and tech releases this week.
Next Week in The Sequence:
Edge 423: We explore the fundamentals of state space models, including the famous S4 paper. The tech section provides an overview of NVIDIA’s NIM framework.
Edge 424: We dive into DeepMind’s amazing AlphaProof and AlphaGeometry 2, which achieved the silver-medal standard at the latest International Math Olympiad.
You can subscribe to The Sequence below:
TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
📝 Editorial: The AI Scientist
If you read this newsletter, you know that I firmly believe discovering new science might be the ultimate test for AGI. While we are still far from having AI that can formulate something like the Riemann Hypothesis or the Theory of General Relativity, we have made tremendous progress in proving and validating scientific ideas across disciplines such as mathematics, physics, biology, chemistry, and others.
The reason science presents such a challenging bar for AI is that it involves aspects like long-term planning, creativity, multidisciplinary knowledge, multi-step fact-checking, and many other components that are still in the very early stages of development in generative AI.
However, progress is being made.
This week, the Japanese AI startup Sakana AI, in collaboration with several other AI labs, published a paper detailing The AI Scientist, a framework for open-ended scientific discovery. The AI Scientist is capable of conducting open-ended research, executing experiments, generating code, visualizing results, and even presenting them in full reports. In the initial demonstrations, The AI Scientist made several contributions across different areas of AI research, including diffusion models, transformers, and grokking.
The core ideas behind The AI Scientist resemble models such as DeepMind’s AlphaGeometry and AlphaProof, or the NuminaMath model that recently won first prize in the AI Math Olympiad. These models use an LLM for idea formulation, combined with more symbolic models for experimentation. The biggest challenge with this approach is whether the idea-generation portion will quickly hit its limits. Some of the most groundbreaking scientific discoveries in history seem to involve a component of human ingenuity that doesn’t yet appear to be present in LLMs. However, this path holds great potential for exploring new ideas in scientific research.
For now, The AI Scientist represents an exciting advancement in open-ended scientific research.
🔎 ML Research
The AI Scientist
Researchers from Sakana AI, Oxford, the University of British Columbia, and several other institutions published a paper unveiling The AI Scientist, a pipeline for open-ended scientific research using LLMs. The AI Scientist injects AI into different areas of scientific research, such as ideation, literature search, experiment planning, experiment iteration, manuscript writing, and peer reviewing —> Read more.
Imagen 3
Google published the technical report of Imagen 3, its marquee text-to-image model. The paper covers the training and evaluation details behind Imagen 3 as well as some of the challenges around safety —> Read more.
Mitigating Hallucinations
Google Research published a paper detailing HALVA, a contrastive tuning method that can mitigate hallucinations in language and image assistants. Like other contrastive learning methods, HALVA generates alternative representations of factual tokens with the objective of boosting the probability of the model identifying the correct token —> Read more.
Your Context is Not an Array
Qualcomm Research published a paper that explores the limitations of transformers. The paper suggests that some of the generalization challenges of transformers are related to their inability to perform random memory access within the context window —> Read more.
Mutual Reasoning in LLMs
Microsoft Research published a paper introducing rStar, a self-play mutual reasoning approach that seems to improve reasoning capabilities in small language models. rStar uses a generation-discrimination process to decouple the different steps of the reasoning process —> Read more.
Pretraining vs. Fine Tuning
Researchers from Johns Hopkins University published a paper exploring the relationship between pretraining and fine-tuning in LLMs. The paper explores the diminishing returns of fine-tuning beyond a certain scale —> Read more.
🤖 AI Tech Releases
Grok-2
xAI unveiled a new version of Grok that matches the performance of top open source models —> Read more.
SWE-Bench
OpenAI released a subset of the famous SWE-Bench benchmark with human verification —> Read more.
Claude Prompt Caching
Anthropic unveiled prompt caching capabilities for Claude 3.5 Sonnet and Claude 3 Haiku —> Read more.
Airflow 2.10
Apache Airflow 2.10 arrived with a strong focus on AI workflows —> Read more.
AI Risks Database
MIT open sourced a database of over 700 AI risks across different categories —> Read more.
🛠 Real World AI
Image Animation at Meta
Meta discusses the AI techniques used for image animation at scale —> Read more.
Model Reliability at Salesforce
Salesforce discusses the methods used to ensure AI model reliability and performance in their internal pipelines —> Read more.
📡AI Radar
Fei-Fei Li’s World Labs raised $100 million at a $1 billion valuation.
Decentralized AI startup Sahara AI raised $43 million in new funding.
Snowflake announced its Cortex Analyst solution to power self-service analytics with AI.
AI observability platform Goodfire raised $7 million in new funding.
AI-focused VC Radical Ventures raised a new $800 million fund.
Runway Gen-3 Turbo showcased very impressive capabilities.
AI-based stock evaluator TipRanks was acquired for $200 million.
Real estate AI company EliseAI raised $75 million at a $1 billion valuation.
Encord, an AI data development platform, raised a $30 million Series B.
RAG as a service platform Ragie raised $5.5 million.
CodeRabbit raised $16 million for using AI to automate code reviews.
AI-based scientific research platform Consensus raised an $11.5 million Series A.
#AGI#ai#ai model#AI research#alphageometry#AlphaProof#amazing#Analytics#animation#approach#as a service#benchmark#billion#Biology#challenge#chemistry#claude#claude 3#claude 3.5#Claude 3.5 Sonnet#code#Collaboration#contrastive learning#creativity#data#Database#decentralized AI#DeepMind#details#development
Text
AI at the Mathematical Olympiads: AlphaProof and AlphaGeometry http://dlvr.it/TB7pnx
Text
Google DeepMind: AI achieves silver-medal standard solving International Mathematical Olympiad problems
Text
Google claims math breakthrough with proof-solving AI models
AlphaProof and AlphaGeometry 2 solve problems, with caveats on time and human assistance. Continue reading.
Text
First AI outperforming international math olympiad gold medalist
See on Scoop.it - Design, Science and Technology
Proving geometric theorems constitutes a hallmark of visual reasoning combining both intuitive and logical skills. Therefore, automated theorem proving of Olympiad-level geometry problems is considered a notable milestone in human-level automated reasoning. The introduction of AlphaGeometry, a neuro-symbolic model trained with 100 million synthetic samples, marked a major breakthrough. It solved 25 of 30 International Mathematical Olympiad (IMO) problems whereas the reported baseline based on Wu’s method solved only ten.
In this paper, the IMO-AG-30 Challenge introduced with AlphaGeometry was revisited, and the researchers found that Wu’s method is surprisingly strong. Wu’s method alone can solve 15 problems, some of which are not solved by any of the other methods. This leads to two key findings: (i) Combining Wu’s method with the classic synthetic methods of deductive databases and angle, ratio, and distance chasing solves 21 out of 30 problems using just a CPU-only laptop with a time limit of 5 minutes per problem. This classic combination solves just 4 problems fewer than AlphaGeometry and establishes the first fully symbolic baseline, strong enough to rival the performance of an IMO silver medalist. (ii) Wu’s method even solves 2 of the 5 problems that AlphaGeometry failed to solve. Thus, by combining AlphaGeometry with Wu’s method, a new state of the art for automated theorem proving on IMO-AG-30 is achieved, solving 27 out of 30 problems: the first AI method to outperform an IMO gold medalist.
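The headline numbers above reduce to a set union over a portfolio of solvers: AlphaGeometry's 25 solved problems plus the 2 extra problems that only Wu's method cracks give the reported 27 of 30. A schematic tally with invented problem labels (only the counts 25, 15, 2, and 27 come from the text):

```python
# Schematic tally of the solver portfolio described above.
# Problem labels are invented placeholders; only the counts come from the text.
alphageometry = {f"P{i}" for i in range(1, 26)}          # solves 25 of 30
wu_extras = {"P26", "P27"}                               # 2 of AG's 5 failures
wu_method = {f"P{i}" for i in range(1, 14)} | wu_extras  # 15 problems total

combined = alphageometry | wu_method   # a portfolio solves the union
print(len(alphageometry), len(wu_method), len(combined))  # 25 15 27
```

This is why a weaker solver can still be valuable in a portfolio: what matters is not its total count but how many problems it solves that no other method does.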
Read the full article at: arxiv.org