#AI reasoning models
jcmarchi · 14 days ago
Text
From OpenAI’s O3 to DeepSeek’s R1: How Simulated Thinking Is Making LLMs Think Deeper
New Post has been published on https://thedigitalinsider.com/from-openais-o3-to-deepseeks-r1-how-simulated-thinking-is-making-llms-think-deeper/
From OpenAI’s O3 to DeepSeek’s R1: How Simulated Thinking Is Making LLMs Think Deeper
Large language models (LLMs) have evolved significantly. Tools that started as simple text generators and translators are now used in research, decision-making, and complex problem-solving. A key factor in this shift is the growing ability of LLMs to think more systematically by breaking down problems, evaluating multiple possibilities, and refining their responses dynamically. Rather than merely predicting the next word in a sequence, these models can now perform structured reasoning, making them more effective at handling complex tasks. Leading models like OpenAI’s O3, Google’s Gemini, and DeepSeek’s R1 integrate these capabilities to process and analyze information more effectively.
Understanding Simulated Thinking
Humans naturally analyze different options before making decisions. Whether planning a vacation or solving a problem, we often simulate different plans in our mind to evaluate multiple factors, weigh pros and cons, and adjust our choices accordingly. Researchers are building this ability into LLMs to enhance their reasoning capabilities. Here, simulated thinking refers to an LLM’s ability to perform systematic reasoning before generating an answer, in contrast to simply retrieving a response from stored data. A helpful analogy is solving a math problem:
A basic AI might recognize a pattern and quickly generate an answer without verifying it.
An AI using simulated reasoning would work through the steps, check for mistakes, and confirm its logic before responding.
Chain-of-Thought: Teaching AI to Think in Steps
For LLMs to execute simulated thinking like humans, they must be able to break complex problems into smaller, sequential steps. This is where the Chain-of-Thought (CoT) technique plays a crucial role.
CoT is a prompting approach that guides LLMs to work through problems methodically. Rather than letting the model jump straight to a conclusion, this structured reasoning process has it divide a complex problem into simpler, manageable steps and solve them one at a time.
For example, when solving a word problem in math:
A basic AI might attempt to match the problem to a previously seen example and provide an answer.
An AI using Chain-of-Thought reasoning would outline each step, logically working through calculations before arriving at a final solution.
This approach is effective in areas requiring logical deduction, multi-step problem-solving, and contextual understanding. While earlier models required human-provided reasoning chains, advanced LLMs like OpenAI’s O3 and DeepSeek’s R1 can learn and apply CoT reasoning adaptively.
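As a rough illustration, CoT prompting often amounts to nothing more than changing the prompt to elicit intermediate steps. A minimal sketch (the prompt wording here is illustrative, not any model's actual template):

```python
def build_prompts(question: str) -> dict:
    """Contrast a direct prompt with a Chain-of-Thought prompt."""
    direct = f"Question: {question}\nAnswer:"
    # CoT variant: explicitly ask for intermediate reasoning steps.
    cot = (
        f"Question: {question}\n"
        "Let's think step by step, then state the final answer."
    )
    return {"direct": direct, "cot": cot}

prompts = build_prompts(
    "A train travels 60 km in 45 minutes. What is its speed in km/h?"
)
print(prompts["cot"])
```

The only difference between the two prompts is the instruction to reason step by step, which is what pushes the model to emit a reasoning chain instead of a bare answer.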
How Leading LLMs Implement Simulated Thinking
Different LLMs are employing simulated thinking in different ways. Below is an overview of how OpenAI’s O3, Google DeepMind’s models, and DeepSeek-R1 execute simulated thinking, along with their respective strengths and limitations.
OpenAI O3: Thinking Ahead Like a Chess Player
While exact details about OpenAI’s O3 model remain undisclosed, researchers believe it uses a technique similar to Monte Carlo Tree Search (MCTS), a strategy used in AI-driven games like AlphaGo. Like a chess player analyzing multiple moves before deciding, O3 explores different solutions, evaluates their quality, and selects the most promising one.
Unlike earlier models that rely on pattern recognition, O3 actively generates and refines reasoning paths using CoT techniques. During inference, it performs additional computational steps to construct multiple reasoning chains. These are then assessed by an evaluator model—likely a reward model trained to ensure logical coherence and correctness. The final response is selected based on a scoring mechanism to provide a well-reasoned output.
O3 follows a structured multi-step process. Initially, it is fine-tuned on a vast dataset of human reasoning chains, internalizing logical thinking patterns. At inference time, it generates multiple solutions for a given problem, ranks them based on correctness and coherence, and refines the best one if needed. While this method allows O3 to self-correct before responding and improve accuracy, the tradeoff is computational cost—exploring multiple possibilities requires significant processing power, making it slower and more resource-intensive. Nevertheless, O3 excels in dynamic analysis and problem-solving, positioning it among today’s most advanced AI models.
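Since O3's internals are not public, the generate-rank-select loop described above can only be sketched with stand-in functions; the shape of the search, though, looks roughly like best-of-N sampling under a reward model:

```python
import random

def generate_chains(problem: str, n: int = 4) -> list[str]:
    # Stand-in for sampling n candidate reasoning chains from the model.
    return [f"chain {i} for: {problem}" for i in range(n)]

def reward_model(chain: str) -> float:
    # Stand-in for a learned evaluator scoring coherence and correctness.
    random.seed(chain)  # deterministic pseudo-score per chain
    return random.random()

def best_of_n(problem: str, n: int = 4) -> str:
    # Generate several chains, then keep the highest-scoring one.
    return max(generate_chains(problem, n), key=reward_model)

answer = best_of_n("What is 17 * 24?")
```

The computational cost mentioned above is visible even in this sketch: every extra candidate chain means another full generation pass before anything is returned.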
Google DeepMind: Refining Answers Like an Editor
DeepMind has developed a new approach called “mind evolution,” which treats reasoning as an iterative refinement process. Instead of analyzing multiple future scenarios, this model acts more like an editor refining various drafts of an essay. The model generates several possible answers, evaluates their quality, and refines the best one.
Inspired by genetic algorithms, this process ensures high-quality responses through iteration. It is particularly effective for structured tasks like logic puzzles and programming challenges, where clear criteria determine the best answer.
However, this method has limitations. Since it relies on an external scoring system to assess response quality, it may struggle with abstract reasoning with no clear right or wrong answer. Unlike O3, which dynamically reasons in real-time, DeepMind’s model focuses on refining existing answers, making it less flexible for open-ended questions.
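In sketch form, such a loop mirrors a tiny genetic algorithm: score a population of drafts, keep the fittest, and refine them. The fitness and mutation functions below are toy stand-ins, not DeepMind's actual scoring:

```python
def fitness(answer: str) -> int:
    # Toy scorer: treats longer drafts as "more refined".
    return len(answer)

def mutate(answer: str) -> str:
    # Stand-in for asking the model to revise a draft.
    return answer + " (revised)"

def mind_evolution(drafts: list[str], generations: int = 3) -> str:
    population = list(drafts)
    for _ in range(generations):
        # Keep the top half of the population...
        population.sort(key=fitness, reverse=True)
        survivors = population[: max(1, len(population) // 2)]
        # ...then refine the survivors to refill the pool.
        population = survivors + [mutate(d) for d in survivors]
    return max(population, key=fitness)

best = mind_evolution(["draft A", "a longer draft B"])
```

The dependence on an external scorer is exactly the limitation noted above: if `fitness` cannot rank answers meaningfully, the whole loop has nothing to select on.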
DeepSeek-R1: Learning to Reason Like a Student
DeepSeek-R1 employs a reinforcement learning-based approach that allows it to develop reasoning capabilities over time rather than evaluating multiple responses in real time. Instead of relying on pre-generated reasoning data, DeepSeek-R1 learns by solving problems, receiving feedback, and improving iteratively—similar to how students refine their problem-solving skills through practice.
The model follows a structured reinforcement learning loop. It starts with a base model, such as DeepSeek-V3, and is prompted to solve mathematical problems step by step. Each answer is verified through direct code execution, bypassing the need for an additional model to validate correctness. If the solution is correct, the model is rewarded; if it is incorrect, it is penalized. This process is repeated extensively, allowing DeepSeek-R1 to refine its logical reasoning skills and prioritize more complex problems over time.
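The core of that loop is a reward signal grounded in verification rather than in another model's opinion. A toy sketch, using arithmetic expressions so that "execution" is just evaluation (`eval` is for illustration only):

```python
def verify_by_execution(expression: str, claimed: int) -> bool:
    # Ground truth comes from actually executing (evaluating) the expression.
    # eval() is fine for this toy demo but unsafe on untrusted input.
    return eval(expression) == claimed

def reward(expression: str, claimed: int) -> float:
    # +1 for a verified-correct answer, -1 otherwise.
    return 1.0 if verify_by_execution(expression, claimed) else -1.0

print(reward("3 * (4 + 5)", 27))  # correct: 3 * 9 == 27
print(reward("3 * (4 + 5)", 30))  # incorrect
```

Because the reward comes from checking against a computed ground truth, no separate verification model is needed, which is what makes this style of training cheap to scale for math and code.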
A key advantage of this approach is efficiency. Unlike O3, which performs extensive reasoning at inference time, DeepSeek-R1 embeds reasoning capabilities during training, making it faster and more cost-effective. It is highly scalable since it does not require a massive labeled dataset or an expensive verification model.
However, this reinforcement learning-based approach has tradeoffs. Because it relies on tasks with verifiable outcomes, it excels in mathematics and coding. Still, it may struggle with abstract reasoning in law, ethics, or creative problem-solving. While mathematical reasoning may transfer to other domains, its broader applicability remains uncertain.
Table: Comparison between OpenAI’s O3, DeepMind’s Mind Evolution and DeepSeek’s R1
OpenAI O3: generates and scores multiple reasoning chains at inference time (believed to be MCTS-like); strong at dynamic analysis and self-correction, but slow and computationally expensive.
DeepMind Mind Evolution: iteratively refines candidate answers with a genetic-algorithm-inspired loop; well suited to structured tasks with clear scoring criteria, less flexible for open-ended questions.
DeepSeek-R1: learns reasoning during training via reinforcement learning with verifiable rewards; fast and cost-effective at inference, but strongest in verifiable domains like math and coding.
The Future of AI Reasoning
Simulated reasoning is a significant step toward making AI more reliable and intelligent. As these models evolve, the focus will shift from simply generating text to developing robust problem-solving abilities that closely resemble human thinking. Future advancements will likely focus on making AI models capable of identifying and correcting errors, integrating them with external tools to verify responses, and recognizing uncertainty when faced with ambiguous information. However, a key challenge is balancing reasoning depth with computational efficiency. The ultimate goal is to develop AI systems that thoughtfully consider their responses, ensuring accuracy and reliability, much like a human expert carefully evaluating each decision before taking action.
3 notes · View notes
redactedshapes · 3 months ago
Text
PARTY IN THE OLDEST HOUSE GUUUYYYYYS
There it is, eight months in the making.
Given the size of this file and the amount of details, I've included more close-ups and a download link to a 2k file over here:
big thanks to @wankernumberniiiiiiiiine, she's the reason this painting exists 🥰
805 notes · View notes
creature-wizard · 2 months ago
Note
This article is from October (don’t love the URL but Mo Gawdat is reputable) and the argument for potentiality is still a very real concern raised by Nobel laureates. It’s not even a matter of if but when, and it appears to already be happening. Embrace the new era, friend!
Yeah, this Google dude is just straight-up wrong. Literally go search Qwant or DuckDuckGo for "why AI is not sentient" and you will find plenty of articles explaining why.
16 notes · View notes
sweetened-condensed-rage · 27 days ago
Text
"Oh but I'm not good at drawing so I-"
SHUT UP
You know what I did when I wanted to make art but it didn't quite compute in my brain????
I LEARNED AN ART STYLE THAT WOULD
You don't need a fucking computer to do it for you.
7 notes · View notes
dontmindmejustafangirl · 2 months ago
Text
There is another kind of rage and despair you will feel when you are an artistic person studying and working in technological fields and you keep seeing your peers and people you are supposed to look up to blatantly using artists' hard work for training text-to-speech models.
8 notes · View notes
pigeonphd · 1 year ago
Text
nooooo dude you dont get it when i draw upon my lifetime's worth of observations of other peoples art and then synthesize the patterns i noticed into an original yet derived work of art its beautiful but when a computer does it its copyright infringement
34 notes · View notes
cielosuerte · 2 days ago
Text
one of the most annoying outcomes/wins we've handed techbros about AI is buying in to letting them talk about technology that has existed & been stagnant for 10+ years like they just invented it in their sleep last night.
4 notes · View notes
chewwytwee · 22 days ago
Text
maybe if you find yourself liking AI art all the time and that makes you sad for some reason you just don't have very good taste in art lol
#.txt#idk I just haven't really found it hard to weed out 'ai' or whatever#because I don't actively follow and reblog art I don't like#like idk I'm tired of artists on here preaching about AI as if its crypto#and everyones jumping through hoops to make ai the same as crypto#'its art theft! Its electronically wasteful! Those are the only things that were wrong with crypto so AI is JUST AS BAD AS CRYPTO!!!!!!'#way to show your ass and demonstrate you have no clue what the actual problem with crypto was#like yeah the art theft and energy waste are bad things but the real issue with crypto was its attempt to economize everything#make everything online a possible area for wealth extraction via speculative currencies#AI is just... not that#and even the comparisons with theft and energy use are tenuous at best#but why investigate the things you believe when you could go on an outrage fueled crusade against some random tech you don't like#When are we gonna get over it and talk about the actual issues facing artists online? because its not ai#I have not seen a single case where an artist is actively being... taken advantage of by ai?#except in the vague sense of 'I think my art might maybe be in the training set for this... so its stealing from me >:('#I still have yet to see a compelling reason that AI is 'anti art'#aside from reactionary whining about how AI users have a 'lazy corrupt soul' and are evil crooks who want all artists to starve to death#like cmon guys its actually embarrassing#im actually BEGGING you if youre reading this to rethink what exactly their problem with AI is and if thats legitimate#because theres a lot of noise out there so you can basically just say whatever you want and find someone whos gonna support it#that fucking '6 cups of water per query' thing? Blatantly untrue and unfounded but now its the standard argument people make cuz its scary#the entirety of a supercomputer does in fact use a lot of water to cool it but AI isnt consuming 100% of the bandwidth of those computers#especially not 100% of the time#you can just average the amount of water the computer uses over the average time it takes a query to generate#and then get some random number and claim the query 'used all that water!'#but it didnt and it would be misleading if not an actual lie to say that#additionally training is the only computationally expensive part of AI development#the queries are put through a pre existing model the expensive part is building that model by parsing unfathomable amounts of data#and yeah you can have your problems with super computer water use but its not because of AI#they didnt create these computers just to build AI on theyre fucking supercomputers
3 notes · View notes
frank-olivier · 3 months ago
Text
Rethinking AI Research: The Paradigm Shift of OpenAI’s Model o1
The unveiling of OpenAI's model o1 marks a pivotal moment in the evolution of language models, showcasing unprecedented integration of reinforcement learning and Chain of Thought (CoT). This synergy enables the model to navigate complex problem-solving with human-like reasoning, generating intermediate steps towards solutions.
OpenAI's approach, inferred to leverage either a "guess and check" process or the more sophisticated "process rewards," epitomizes a paradigm shift in language processing. By incorporating a verifier—likely learned—to ensure solution accuracy, the model exemplifies a harmonious convergence of technologies. This integration addresses the longstanding challenge of intractable expectation computations in CoT models, potentially outperforming traditional ancestral sampling through enhanced rejection sampling and rollout techniques.
The evolution of baseline approaches, from ancestral sampling to integrated generator-verifier models, highlights the community's relentless pursuit of efficiency and accuracy. The speculated merge of generators and verifiers in OpenAI's model invites exploration into unified, high-performance architectures. However, elucidating the precise mechanisms behind OpenAI's model and experimental validations remain crucial, underscoring the need for collaborative, open-source endeavors.
A shift in research focus, from architectural innovations to optimizing test-time compute, underscores performance enhancement. Community-driven replication and development of large-scale, RL-based systems will foster a collaborative ecosystem. The evaluative paradigm will also shift, towards benchmarks assessing step-by-step solution provision for complex problems, redefining superhuman AI capabilities.
Speculations on Test-Time Scaling (Sasha Rush, November 2024)
Friday, November 15, 2024
2 notes · View notes
jcmarchi · 19 days ago
Text
DeepSeek-R1: Transforming AI Reasoning with Reinforcement Learning
New Post has been published on https://thedigitalinsider.com/deepseek-r1-transforming-ai-reasoning-with-reinforcement-learning/
DeepSeek-R1: Transforming AI Reasoning with Reinforcement Learning
DeepSeek-R1 is the groundbreaking reasoning model introduced by China-based DeepSeek AI Lab. This model sets a new benchmark in reasoning capabilities for open-source AI. As detailed in the accompanying research paper, DeepSeek-R1 evolves from DeepSeek’s v3 base model and leverages reinforcement learning (RL) to solve complex reasoning tasks, such as advanced mathematics and logic, with unprecedented accuracy. The research paper highlights the innovative approach to training, the benchmarks achieved, and the technical methodologies employed, offering a comprehensive insight into the potential of DeepSeek-R1 in the AI landscape.
What is Reinforcement Learning?
Reinforcement learning is a subset of machine learning where agents learn to make decisions by interacting with their environment and receiving rewards or penalties based on their actions. Unlike supervised learning, which relies on labeled data, RL focuses on trial-and-error exploration to develop optimal policies for complex problems.
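The trial-and-error loop can be made concrete with a toy example: a two-armed bandit agent that learns, purely from rewards, which action pays off more often (all numbers here are illustrative):

```python
import random

random.seed(0)

def environment(action: int) -> float:
    # Action 1 pays off more often than action 0.
    p = 0.2 if action == 0 else 0.8
    return 1.0 if random.random() < p else 0.0

values = [0.0, 0.0]  # running value estimate per action
counts = [0, 0]

for _ in range(500):
    # Epsilon-greedy: mostly exploit the best-looking action, sometimes explore.
    if random.random() < 0.1:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: values[a])
    r = environment(action)
    counts[action] += 1
    # Incremental mean update of this action's value estimate.
    values[action] += (r - values[action]) / counts[action]

print(max(range(2), key=lambda a: values[a]))  # the learned best action
```

No labels are ever provided; the agent discovers the better action only through the rewards its own choices produce, which is the property that distinguishes RL from supervised learning.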
Early applications of RL include notable breakthroughs by DeepMind and OpenAI in the gaming domain. DeepMind’s AlphaGo famously used RL to defeat human champions in the game of Go by learning strategies through self-play, a feat previously thought to be decades away. Similarly, OpenAI leveraged RL in Dota 2 and other competitive games, where AI agents exhibited the ability to plan and execute strategies in high-dimensional environments under uncertainty. These pioneering efforts not only showcased RL’s ability to handle decision-making in dynamic environments but also laid the groundwork for its application in broader fields, including natural language processing and reasoning tasks.
By building on these foundational concepts, DeepSeek-R1 pioneers a training approach inspired by AlphaGo Zero to achieve “emergent” reasoning without relying heavily on human-labeled data, representing a major milestone in AI research.
Key Features of DeepSeek-R1
Reinforcement Learning-Driven Training: DeepSeek-R1 employs a unique multi-stage RL process to refine reasoning capabilities. Unlike its predecessor, DeepSeek-R1-Zero, which faced challenges like language mixing and poor readability, DeepSeek-R1 incorporates supervised fine-tuning (SFT) with carefully curated “cold-start” data to improve coherence and user alignment.
Performance: DeepSeek-R1 demonstrates remarkable performance on leading benchmarks:
MATH-500: Achieved 97.3% pass@1, surpassing most models in handling complex mathematical problems.
Codeforces: Attained a 96.3% ranking percentile in competitive programming, with an Elo rating of 2,029.
MMLU (Massive Multitask Language Understanding): Scored 90.8% pass@1, showcasing its prowess in diverse knowledge domains.
AIME 2024 (American Invitational Mathematics Examination): Surpassed OpenAI-o1 with a pass@1 score of 79.8%.
Distillation for Broader Accessibility: DeepSeek-R1’s capabilities are distilled into smaller models, making advanced reasoning accessible to resource-constrained environments. For instance, the distilled 14B and 32B models outperformed state-of-the-art open-source alternatives like QwQ-32B-Preview, achieving 94.3% on MATH-500.
Open-Source Contributions: DeepSeek-R1-Zero and six distilled models (ranging from 1.5B to 70B parameters) are openly available. This accessibility fosters innovation within the research community and encourages collaborative progress.
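For reference, pass@1 as reported in the benchmarks above is, in its simplest form, just the fraction of problems solved on the first sampled attempt (benchmark suites often use a more careful estimator, but the idea is this):

```python
def pass_at_1(first_attempt_correct: list[bool]) -> float:
    """Fraction of problems whose first sampled answer was correct."""
    return sum(first_attempt_correct) / len(first_attempt_correct)

# Toy results for 5 problems: 4 solved on the first try.
score = pass_at_1([True, True, False, True, True])
print(f"pass@1 = {score:.0%}")  # 80%
```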
DeepSeek-R1’s Training Pipeline
The development of DeepSeek-R1 involves:
Cold Start: Initial training uses thousands of human-curated chain-of-thought (CoT) data points to establish a coherent reasoning framework.
Reasoning-Oriented RL: Fine-tunes the model to handle math, coding, and logic-intensive tasks while ensuring language consistency and coherence.
Reinforcement Learning for Generalization: Incorporates user preferences and aligns with safety guidelines to produce reliable outputs across various domains.
Distillation: Smaller models are fine-tuned using the distilled reasoning patterns of DeepSeek-R1, significantly enhancing their efficiency and performance.
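The distillation step amounts to sampling reasoning traces from the large model and fine-tuning a smaller one on them as ordinary supervised data. A sketch with the teacher call mocked (the trace format is made up for illustration):

```python
def teacher_solve(problem: str) -> str:
    # Stand-in for sampling a full reasoning trace from the large model.
    return f"<think>worked steps for: {problem}</think> final answer"

def build_distillation_set(problems: list[str]) -> list[dict]:
    # Each record pairs a prompt with the teacher's reasoning and answer,
    # ready for supervised fine-tuning of a smaller student model.
    return [{"prompt": p, "completion": teacher_solve(p)} for p in problems]

dataset = build_distillation_set(["2 + 2 = ?", "Integrate x^2 from 0 to 1"])
```

Because the student only needs supervised fine-tuning on these pairs, it inherits reasoning behavior without running the expensive RL loop itself.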
Industry Insights
Prominent industry leaders have shared their thoughts on the impact of DeepSeek-R1:
Ted Miracco, Approov CEO: “DeepSeek’s ability to produce results comparable to Western AI giants using non-premium chips has drawn enormous international interest—with interest possibly further increased by recent news of Chinese apps such as the TikTok ban and REDnote migration. Its affordability and adaptability are clear competitive advantages, while today, OpenAI maintains leadership in innovation and global influence. This cost advantage opens the door to unmetered and pervasive access to AI, which is sure to be both exciting and highly disruptive.”
Lawrence Pingree, VP, Dispersive: “The biggest benefit of the R1 models is that it improves fine-tuning, chain of thought reasoning, and significantly reduces the size of the model—meaning it can benefit more use cases, and with less computation for inferencing—so higher quality and lower computational costs.”
Mali Gorantla, Chief Scientist at AppSOC (expert in AI governance and application security): “Tech breakthroughs rarely occur in a smooth or non-disruptive manner. Just as OpenAI disrupted the industry with ChatGPT two years ago, DeepSeek appears to have achieved a breakthrough in resource efficiency—an area that has quickly become the Achilles’ Heel of the industry.
Companies relying on brute force, pouring unlimited processing power into their solutions, remain vulnerable to scrappier startups and overseas developers who innovate out of necessity. By lowering the cost of entry, these breakthroughs will significantly expand access to massively powerful AI, bringing with it a mix of positive advancements, challenges, and critical security implications.”
Benchmark Achievements
DeepSeek-R1 has proven its superiority across a wide array of tasks:
Educational Benchmarks: Demonstrates outstanding performance on MMLU and GPQA Diamond, with a focus on STEM-related questions.
Coding and Mathematical Tasks: Surpasses leading closed-source models on LiveCodeBench and AIME 2024.
General Question Answering: Excels in open-domain tasks like AlpacaEval2.0 and ArenaHard, achieving a length-controlled win rate of 87.6%.
Impact and Implications
Efficiency Over Scale: DeepSeek-R1’s development highlights the potential of efficient RL techniques over massive computational resources. This approach questions the necessity of scaling data centers for AI training, as exemplified by the $500 billion Stargate initiative led by OpenAI, Oracle, and SoftBank.
Open-Source Disruption: By outperforming some closed-source models and fostering an open ecosystem, DeepSeek-R1 challenges the AI industry’s reliance on proprietary solutions.
Environmental Considerations: DeepSeek’s efficient training methods reduce the carbon footprint associated with AI model development, providing a path toward more sustainable AI research.
Limitations and Future Directions
Despite its achievements, DeepSeek-R1 has areas for improvement:
Language Support: Currently optimized for English and Chinese, DeepSeek-R1 occasionally mixes languages in its outputs. Future updates aim to enhance multilingual consistency.
Prompt Sensitivity: Few-shot prompts degrade performance, emphasizing the need for further prompt engineering refinements.
Software Engineering: While excelling in STEM and logic, DeepSeek-R1 has room for growth in handling software engineering tasks.
DeepSeek AI Lab plans to address these limitations in subsequent iterations, focusing on broader language support, prompt engineering, and expanded datasets for specialized tasks.
Conclusion
DeepSeek-R1 is a game changer for AI reasoning models. Its success highlights how careful optimization, innovative reinforcement learning strategies, and a clear focus on efficiency can enable world-class AI capabilities without the need for massive financial resources or cutting-edge hardware. By demonstrating that a model can rival industry leaders like OpenAI’s GPT series while operating on a fraction of the budget, DeepSeek-R1 opens the door to a new era of resource-efficient AI development.
The model’s development challenges the industry norm of brute-force scaling, which assumes that more computing power always yields better models. This democratization of AI capabilities promises a future where advanced reasoning models are accessible not only to large tech companies but also to smaller organizations, research communities, and global innovators.
As the AI race intensifies, DeepSeek stands as a beacon of innovation, proving that ingenuity and strategic resource allocation can overcome the barriers traditionally associated with advanced AI development. It exemplifies how sustainable, efficient approaches can lead to groundbreaking results, setting a precedent for the future of artificial intelligence.
0 notes
r3dblccd · 5 months ago
Text
Wait, so the voice behind Naevis was created based on different voice actors? And it's not an actual trainee's voice that's debuting?
3 notes · View notes
mellorocket · 1 year ago
Text
Man I feel like I could write a whole essay about Nope rn and I am so tempted!
8 notes · View notes
dihalect · 9 months ago
Text
anti 'ai' people make 1 (one) well-informed, good-faith argument challenge(impossible)
3 notes · View notes
chipped-chimera · 1 year ago
Text
Sometimes I contemplate making smut fanart/content, but then I remember we live in a puritan tech dystopia and where the everloving fuck would I post it these days?
11 notes · View notes
lwoorl · 2 years ago
Text
I'll say it: "Oh all AI artists do is write a stupid description and immediately get an image with no effort, there's no art in that" is the new "Digital painting doesn't count as art because it takes no effort"
#Look I'm aware there're moral reasons to criticize AI art such as how corporations will use it#and the fact lots of models (not all however) use stolen content#But all you have to do is visit a forum dedicated to AI art to quickly realize it actually takes some effort to make quality images#And honestly from what I've seen those guys are often very respectful of traditional artists if not traditional artists themselves#Not a single bit of 'haha those idiots are working hard when they could simply use AI!' that Tumblr likes to strawman them as#Lots of 'So I did the base with AI and then painted over it manually in Photoshop' and 'I trained this model myself with my own drawings'#And I'm not saying there aren't some guys that are being assholes over it on Twitter#But when you go to an actual community dedicated to it. Honestly these guys are rather nice#I've seen some truly astounding projects#like there was this guy that was using people's scars to create maps of forests and mountains to sort of explore the theme of healing#And this one that took videos of his city and overlayed them with some solarpunk kind of thing#And this one that was doing a collection of dreams that was half AI and half traditional painting#Anyway the point is you guys are being way too mean to a group of people that genuinely want to use the technology to create cool art#And while I'm aware there are issues related to its use#it's actually really fucked up you're attacking the individual artists instead of corporations???#It's as if you were attacking the chocolate guy over the systemic problems related to the chocolate industry!#And also tumblrs always like 'Oh AI is disgusting I hate AI art so I'll just hate on it without dealing with the issue'#While AI art forums often have posts with people discussing how to use it ethically when applied to commercial use!!#Honestly these guys are doing way more about tackling the issue than tumblr and you should feel bad!!!
15 notes · View notes
wachi-delectrico · 2 years ago
Text
Tbh i don't know what to think of AI art anymore. I don't find any utility, personally, in centring the discussion on law and copyright; there are far more interesting things to discuss on the topic beyond its use as a replacement for human artists/workforce by the upper class
#rambling#i am not saying i think using AI image generation to replace human artists and leave them jobless is a good thing - i do think that is bad#there are real concerns on the ethics of its use and creation of image generation models#but i think focusing only on things like how ''off'' or ''inhuman'' it looks or how ''soulless'' it is are not only surface level complaints#but also call into question again the age old debate of what is art and what isn't and why some art is and why some isn't#and also the regard of painting and other forms of visual art production as somehow above photography in the general consciousness#i would love to really talk about these things with people but talking about ai art and image generation is a gamble between talking to#an insufferable techbro who only sees profits and an artist who shuts the whole idea off without nuance#i have seen wonderful projects by human artists using ai image generation software in creative ways for example#are those projects not art? if they are are they only art because they were made by someone already regarded as an artist?#there are also cool ai-generated images by random people who don't regard themselves as artists. are they art? why or why not?#the way AI image generation works - using vast arrays of image samples to create a new image with - has been cited#as a reason why ai-generated images aren't ''real art''. but is that not just a computer-generated collage? is it not real because it was#made by an algorithm?#if i - a human artist - get a bunch of old magazines and show them to an algorithm to generate new things from them#or to suggest ways in which new things could be made#and then i took those suggestions and cut the magazines and made the collage by hand. is that still art? did it at some point become art#or cease to be art?#i think these things are far more intriguing and important to get to the root of ethical AI usage in the 21st century than focusing on laws
7 notes · View notes