#benchmarks
Text
Sweet, I can post up to 30 images in one post now! (that is a lot) BEHOLD: The clockwork weapons set, which I believe landed last year during the queens gambit events, as well as the Zephyrite Festival. The prompt was centered around having fancy clockwork surrounded by craftsman-like framing. The gears move on the weapons too! Shown with permission. (c) ArenaNet, all rights reserved. Thank you for the opportunity with Volta to make such a fun set.
62 notes
Text
You guys I love her so much. I was so excited for the graphics update and when I spawned into character creator and saw her (her eyes!!!!! So pretty!!!) I literally cried, from joy! Then I continued to cry as I watched the benchmark and saw her with all the new environments and details and… Just *WOW.* She’s missing her hat (Matoya’s Hat) that she never takes off. But that can be a surprise for launch I suppose!!
I feel for the people who aren’t happy with their characters, I do, but these changes really brought Loran to life and I am *so* excited to see her in her proper attire, come 7.0!
It’s full speed ahead on *this* hype train! I am gonna be trying to make it through a full NG+ playthrough and enjoying the events as they come and go until June 28th!! (And probably popping back into the Benchmark to see my future girl on occasion!)
See you all in Dawntrail everyone!!! 💜✨
#ffxiv#final fantasy 14#ff14#ffxiv dawntrail#dawntrail#benchmarks#FFXIV benchmark#dawntrail benchmark
8 notes
Text
Greetings, everyone! I'd like to introduce the updated Athena! For a while, Athena's author has been treating her as half Miqo'te, half Hrothgar. They have done a lot of work to get the character to be a bit more in line with their mental image of Athena.
Now, that image is crystal clear.
Say hello again to Athena Natlho, daughter of a Hrothgar and Miqo'te, this time taking much more after her father.
Of note, we will not be changing any of the past screenshots, stories, or posts, but we will use this version of the character moving forward.
Stay tuned in Dawntrail for more cool screenshots of all the girls together again, and thanks for staying tuned this far.
<3
#ffxiv#ffxiv roleplay#ffxiv oc#ff14#final fantasy 14#ffxiv rp#gpose#gposers#ff14 gpose#ffxiv gpose#Athena Natlho#original character#final fantasy gpose#final fantasy xiv#final#dawntrail#ffxiv benchmark#benchmarks#hrothgar#hrothgal#hrothgirl#female hrothgar#fem hrothgar
15 notes
Text
Hellsguard women are one of the biggest losers of the graphical update. The nose spot was a charm point, and it looks like they tried to wipe it off with a washcloth
#ffxiv#ffxiv roegadyn#roegadame#roegadyn#final fantasy xiv#final fantasy 14#benchmarks#benchmark#ffxiv benchmark#dawntrail#ffxiv dawntrail
6 notes
Text
Qwen2-Math: A new era for AI maths whizzes
New Post has been published on https://thedigitalinsider.com/qwen2-math-a-new-era-for-ai-maths-whizzes/
Alibaba Cloud’s Qwen team has unveiled Qwen2-Math, a series of large language models specifically designed to tackle complex mathematical problems.
These new models – built upon the existing Qwen2 foundation – demonstrate remarkable proficiency in solving arithmetic and mathematical challenges, and outperform former industry leaders.
The Qwen team built Qwen2-Math using a large, mathematics-specific corpus. The corpus draws on web texts, books, code, exam questions, and synthetic data generated by Qwen2 itself.
Evaluation on both English and Chinese mathematical benchmarks – including GSM8K, MATH, MMLU-STEM, CMATH, and GaoKao Math – revealed the capabilities of Qwen2-Math. Notably, the flagship model, Qwen2-Math-72B-Instruct, surpassed the performance of proprietary models such as GPT-4o and Claude 3.5 in various mathematical tasks.
“Qwen2-Math-Instruct achieves the best performance among models of the same size, with RM@8 outperforming Maj@8, particularly in the 1.5B and 7B models,” the Qwen team noted.
This superior performance is attributed to the effective implementation of a math-specific reward model during the development process.
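The two selection rules named in the quote can be sketched as follows. This is a minimal illustration, not the Qwen team's code: Maj@8 is taken to mean majority voting over 8 sampled answers, RM@8 is taken to mean picking the sample with the highest reward-model score, and the answers and scores below are made-up values.

```python
from collections import Counter

def maj_at_n(answers):
    """Maj@N: return the most common final answer among the N samples."""
    return Counter(answers).most_common(1)[0][0]

def rm_at_n(answers, scores):
    """RM@N: return the answer of the sample with the highest reward score."""
    best_index = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best_index]

# Hypothetical data: 8 sampled final answers and invented reward-model scores.
answers = ["12", "15", "12", "12", "15", "14", "12", "15"]
scores = [0.20, 0.91, 0.35, 0.30, 0.88, 0.10, 0.25, 0.85]

print(maj_at_n(answers))         # 12
print(rm_at_n(answers, scores))  # 15
```

In this toy case the two rules disagree: the most frequent answer is not the one the reward scores favor, which is the situation in which RM@8 and Maj@8 can produce different accuracy.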
Qwen2-Math also posted strong results on challenging competition problems from the American Invitational Mathematics Examination (AIME) 2024 and the American Mathematics Competitions (AMC) 2023.
To ensure the model’s integrity and prevent contamination, the Qwen team implemented robust decontamination methods during both the pre-training and post-training phases. This rigorous approach involved removing duplicate samples and identifying overlaps with test sets to maintain the model’s accuracy and reliability.
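A rough sketch of what duplicate removal plus test-set overlap filtering can look like is shown below. The article does not describe the actual method, so the word-level n-gram matching, the `n=5` window, and the function names are all assumptions chosen for illustration.

```python
def ngrams(text, n):
    """Return the set of lowercase word n-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def decontaminate(train_samples, test_samples, n=5):
    """Drop exact duplicates, then drop samples sharing any n-gram with the test set.

    Assumed rule for illustration only; the real pipeline is not specified.
    """
    test_grams = set()
    for sample in test_samples:
        test_grams |= ngrams(sample, n)

    seen = set()
    kept = []
    for sample in train_samples:
        key = " ".join(sample.lower().split())  # whitespace-normalized duplicate key
        if key in seen:
            continue  # exact duplicate of an earlier sample
        seen.add(key)
        if ngrams(sample, n) & test_grams:
            continue  # overlaps with a test-set n-gram
        kept.append(sample)
    return kept

# Hypothetical samples.
train = [
    "What is the sum of 2 and 3 ?",
    "What is the sum of 2 and 3 ?",
    "Compute the area of a circle with radius 4 .",
    "Find x if 2 x plus 1 equals 9 .",
]
test = ["Please compute the area of a circle with radius 7 ."]

clean = decontaminate(train, test, n=5)
print(clean)
```

Here the second sample is dropped as a duplicate and the third because it shares a 5-gram with the test question, leaving the first and fourth.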
Looking ahead, the Qwen team plans to expand Qwen2-Math’s capabilities beyond English, with bilingual and multilingual models in the pipeline. This commitment to inclusivity aims to make advanced mathematical problem-solving accessible to a global audience.
“We will continue to enhance our models’ ability to solve complex and challenging mathematical problems,” affirmed the Qwen team.
The Qwen2 models are available on Hugging Face.
Tags: ai, alibaba cloud, artificial intelligence, maths, models, qwen, qwen2, qwen2-math
#2023#2024#ai#ai & big data expo#AI models#Alibaba#alibaba cloud#amp#approach#Articles#artificial#Artificial Intelligence#automation#benchmarks#Big Data#Books#Cancer#cancer diagnosis#claude#claude 3#claude 3.5#Cloud#code#Companies#Competitions#comprehensive#conference#contamination#cyber#cyber security
3 notes
Text
So uh
I think SE's min requirements for Dawntrail might have been a tiny bit overzealous
2 notes
Photo
So taking the BJU faculty salary average and comparing it to their benchmarks, you can see that there was a bump for 2020-21, but then things leveled out.
#Bob Jones University#Financial Crisis#Charts and Graphs#Revenue#Expenses#990s#501c3#IPEDS#Salary#Benchmarks#Faculty
1 note
Link
0 notes
Text
Phi-4 has just 14B parameters but is better than Llama 3.1 70B on several tasks.
0 notes
Audio
Listen to: Next Year by Benchmarks
every morning when I wake up it's the same old thing I'm always waiting for the summer, I'm waiting for spring I can't stay awake in daylight and God knows I can't sleep at night I'm tired of this room, it's always just like the last one always less of a home and more like a place to crash so can I stay the night at your place till the weekend or the winter ends I know next year things will be better
1 note
Text
Weekly output: Snapdragon Windows software compatibility, Qualcomm's connected-car ambitions, Snap Spectacles '24, Mark Vena podcast, Bluesky business plans, Qualcomm 8-core Snapdragon X Plus benchmarking
Before I get to my usual list of what got published under my name this week, I need to vent about what did not get published by the Washington Post this week: the endorsement of Kamala Harris that, by multiple accounts, was quashed by imported-from-London publisher Will Lewis at the direction of owner Jeff Bezos. The insultingly vapid explanation by Lewis can only be read as Bezos attempting to…
#benchmarks#Bluesky#connected cars#election security#Hawaii#Mark Vena#Maui#poll worker#Qualcomm#Snap Spectacles#SnapChat#Snapdragon Summit#Snapdragon X#X#Xitter
0 notes
Text
Doubt I'll post many of these, but here's a side-by-side of current settings in current patch, vs current settings in the benchmark.
#ffxiv#ff14#final fantasy 14#gpose#carmen weaver#dawntrail#benchmarks#final fantasy xiv#FFXIV benchmark
7 notes
Link
The arrival of laptops powered by Qualcomm's Snapdragon X Elite processor and Microsoft's Copilot+ system has sparked excitement in the tech world. However, early reviews paint a mixed picture, with impressive performance in some areas overshadowed by disappointing gaming capabilities.
Snapdragon X Elite Initial Impressions: A Promising Chipset with a Gaming Hiccup
Market Debut and Early Reviews: Laptops featuring the Snapdragon X Elite and Copilot+ have hit the market, and reviewers are sharing their initial impressions.
Focus on Productivity: Qualcomm's primary focus for the Snapdragon X Elite appears to be productivity tasks, as evidenced by their marketing approach.
Gaming Performance Concerns: Unfortunately, early reviews reveal underwhelming performance in 3D games. This raises questions about the platform's suitability for serious gamers.
While the Snapdragon X Elite boasts impressive features for productivity and multitasking, its gaming capabilities fall short of expectations.
Navigating Emulation Challenges
A key hurdle for Snapdragon X Elite laptops in the gaming realm is emulation:
x86/x64 Emulation Issues: Some games struggle to run due to compatibility issues when emulating the x86/x64 architecture on the ARM-based Snapdragon X Elite.
Limited configurability: Even for games that do launch, reviewers encountered limitations in adjusting resolution and achieving acceptable frame rates with upscaling technologies.
These emulation hurdles create frustration for gamers accustomed to a wider range of graphical options and smoother gameplay.
Benchmarking the Performance Gap
Let's delve into specific benchmarks showcased in early reviews:
The Witcher 3: Wild Hunt: A popular reviewer struggled to run the game at a playable frame rate. On "low" settings and 720p resolution, TechTablets reported an average of 35 fps with drops as low as 20 fps in Novigrad, a demanding area of the game.
Dirt 5: Another reviewer, Matthew Moniz, tested the racing simulator Dirt 5. The Snapdragon X graphics delivered a meager average of 23 fps on "medium" settings at 1280x1080 resolution.
These benchmark results highlight the current performance limitations of Snapdragon X Elite laptops in running modern 3D video games.
Qualcomm's Priorities: Power Beyond Games
It's important to consider Qualcomm's stated priorities for the Snapdragon X Elite:
Focus on Productivity: Several reviewers emphasize that Qualcomm didn't prioritize gaming during the processor's marketing and development.
AI and Performance: The Snapdragon X excels in other areas, with a powerful NPU (Neural Processing Unit) designed for AI tasks and solid performance for non-gaming workloads, even while emulating x86-based systems.
For users seeking an all-in-one device that prioritizes productivity, battery life, and AI capabilities, the Snapdragon X Elite may still be a compelling option.
A Future of Improved Gaming Performance?
While the current situation might appear bleak for gamers, there's a potential ray of hope:
Adreno Control Panel Anticipation: Reviewers suggest that future improvements could come from a dedicated Adreno control panel from Qualcomm.
Software Optimization: Further software optimization could also enhance gaming performance on Snapdragon X Elite devices.
These potential advancements could significantly alter the overall gaming experience on Snapdragon X Elite laptops.
Frequently Asked Questions:
Q: Are Snapdragon X Elite laptops good for gaming?
A: Early reviews suggest current performance isn't ideal for demanding 3D games. However, future software and driver updates could improve gaming capabilities.
Q: Why do some games not work on Snapdragon X Elite laptops?
A: Emulation issues can prevent certain games designed for x86/x64 architecture from running smoothly on the ARM-based Snapdragon X Elite.
Q: Does the Snapdragon X Elite processor have any strengths?
A: Yes, the Snapdragon X Elite excels in areas like productivity tasks, battery life, and AI applications due to its powerful NPU.
Q: Should I buy a Snapdragon X Elite laptop for gaming?
A: If your primary focus is high-performance gaming, it may be best to consider laptops with traditional x86/x64 processors. However, if portability, battery life, and strong productivity features are your priorities, a Snapdragon X Elite laptop.
#ARMarchitecture#benchmarks#Copilot#Dirt5#Emulation#Laptopgaming#microsoft#Qualcomm#SnapdragonXElite#SnapdragonXEliteGamingPerformance#TheWitcher3#x86x64Architecture
0 notes
Text
Beyond Chain-of-Thought: How Thought Preference Optimization is Advancing LLMs
New Post has been published on https://thedigitalinsider.com/beyond-chain-of-thought-how-thought-preference-optimization-is-advancing-llms/
A groundbreaking new technique, developed by a team of researchers from Meta, UC Berkeley, and NYU, promises to enhance how AI systems approach general tasks. Known as “Thought Preference Optimization” (TPO), this method aims to make large language models (LLMs) more thoughtful and deliberate in their responses.
The collaborative effort behind TPO brings together expertise from some of the leading institutions in AI research.
The Mechanics of Thought Preference Optimization
At its core, TPO works by encouraging AI models to generate “thought steps” before producing a final answer. This process mimics human cognitive processes, where we often think through a problem or question before articulating our response.
The technique involves several key steps:
The model is prompted to generate thought steps before answering a query.
Multiple outputs are created, each with its own set of thought steps and final answer.
An evaluator model assesses only the final answers, not the thought steps themselves.
The model is then trained through preference optimization based on these evaluations.
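The four steps above can be sketched as a small data-preparation loop. Everything here is a hypothetical stand-in: `sample_outputs` returns canned text instead of calling a model, the judge scores answers by word count as a placeholder, and pairing the best-scored output against the worst is an assumed rule. The point it illustrates is that the judge receives only the final answer, while the resulting preference pair keeps thought and answer together. The preference-optimization training step itself is not shown.

```python
def sample_outputs(prompt):
    """Steps 1-2: hypothetical stand-in for sampling several (thought, answer) outputs."""
    return [
        {"thought": "List the facts first, then answer.",
         "answer": "Paris is the capital of France."},
        {"thought": "Guess quickly.",
         "answer": "Maybe Lyon?"},
        {"thought": "Recall geography, check alternatives.",
         "answer": "The capital of France is Paris, on the Seine."},
    ]

def judge_answer(answer):
    """Step 3: toy judge that receives the final answer only.

    Word count is a placeholder score, not a real quality measure.
    """
    return len(answer.split())

def build_preference_pair(prompt):
    """Rank outputs by the judge's score of the answer and pair best against worst."""
    outputs = sample_outputs(prompt)
    ranked = sorted(outputs, key=lambda o: judge_answer(o["answer"]), reverse=True)
    chosen, rejected = ranked[0], ranked[-1]
    # Step 4 input (assumed format): each side carries thought and answer together,
    # so training can reinforce thoughts that led to better-judged answers.
    return {
        "prompt": prompt,
        "chosen": chosen["thought"] + "\n" + chosen["answer"],
        "rejected": rejected["thought"] + "\n" + rejected["answer"],
    }

pair = build_preference_pair("What is the capital of France?")
print(pair["chosen"])
print(pair["rejected"])
```

Because the thought text never reaches `judge_answer`, no labeled examples of good thinking are needed; thoughts are rewarded only indirectly, through the answers they produce.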
This approach differs significantly from previous techniques, such as Chain-of-Thought (CoT) prompting. While CoT has been primarily used for math and logic tasks, TPO is designed to have broader utility across various types of queries and instructions. Furthermore, TPO doesn’t require explicit supervision of the thought process, allowing the model to develop its own effective thinking strategies.
Another key difference is that TPO overcomes the challenge of limited training data containing human thought processes. By focusing the evaluation on the final output rather than the intermediate steps, TPO allows for more flexible and diverse thinking patterns to emerge.
Experimental Setup and Results
To test the effectiveness of TPO, the researchers conducted experiments using two prominent benchmarks in the field of AI language models: AlpacaEval and Arena-Hard. These benchmarks are designed to evaluate the general instruction-following capabilities of AI models across a wide range of tasks.
The experiments used Llama-3-8B-Instruct as a seed model, with different judge models employed for evaluation. This setup allowed the researchers to compare the performance of TPO against baseline models and assess its impact on various types of tasks.
The results of these experiments were promising, showing improvements in several categories:
Reasoning and problem-solving: As expected, TPO showed gains in tasks requiring logical thinking and analysis.
General knowledge: Interestingly, the technique also improved performance on queries related to broad, factual information.
Marketing: Perhaps surprisingly, TPO demonstrated enhanced capabilities in tasks related to marketing and sales.
Creative tasks: The researchers noted potential benefits in areas such as creative writing, suggesting that “thinking” can aid in planning and structuring creative outputs.
These improvements were not limited to traditionally reasoning-heavy tasks, indicating that TPO has the potential to enhance AI performance across a broad spectrum of applications. The win rates on AlpacaEval and Arena-Hard benchmarks showed significant improvements over baseline models, with TPO achieving competitive results even when compared to much larger language models.
However, it’s important to note that the current implementation of TPO showed some limitations, particularly in mathematical tasks. The researchers observed that performance on math problems actually declined compared to the baseline model, suggesting that further refinement may be necessary to address specific domains.
Implications for AI Development
The success of TPO in improving performance across various categories opens up exciting possibilities for AI applications. Beyond traditional reasoning and problem-solving tasks, this technique could enhance AI capabilities in creative writing, language translation, and content generation. By allowing AI to “think” through complex processes before generating output, we could see more nuanced and context-aware results in these fields.
In customer service, TPO could lead to more thoughtful and comprehensive responses from chatbots and virtual assistants, potentially improving user satisfaction and reducing the need for human intervention. Additionally, in the realm of data analysis, this approach might enable AI to consider multiple perspectives and potential correlations before drawing conclusions from complex datasets, leading to more insightful and reliable analyses.
Despite its promising results, TPO faces several challenges in its current form. The observed decline in math-related tasks suggests that the technique may not be universally beneficial across all domains. This limitation highlights the need for domain-specific refinements to the TPO approach.
Another significant challenge is computational overhead. Generating and evaluating multiple thought paths could increase processing time and resource requirements, which may limit TPO’s applicability in scenarios where rapid responses are crucial.
Furthermore, the current study focused on a specific model size, raising questions about how well TPO will scale to larger or smaller language models. There’s also the risk of “overthinking” – excessive “thinking” could lead to convoluted or overly complex responses for simple tasks.
Balancing the depth of thought with the complexity of the task at hand will be a key area for future research and development.
Future Directions
One key area for future research is developing methods to control the length and depth of the AI’s thought processes. This could involve dynamic adjustment, allowing the model to adapt its thinking depth based on the complexity of the task at hand. Researchers might also explore user-defined parameters, enabling users to specify the desired level of thinking for different applications.
Efficiency optimization will be crucial in this area. Developing algorithms to find the sweet spot between thorough consideration and rapid response times could significantly enhance the practical applicability of TPO across various domains and use cases.
As AI models continue to grow in size and capability, exploring how TPO scales with model size will be crucial. Future research directions may include:
Testing TPO on state-of-the-art large language models to assess its impact on more advanced AI systems
Investigating whether larger models require different approaches to thought generation and evaluation
Exploring the potential for TPO to bridge the performance gap between smaller and larger models, potentially making more efficient use of computational resources
This research could lead to more sophisticated AI systems that can handle increasingly complex tasks while maintaining efficiency and accuracy.
The Bottom Line
Thought Preference Optimization represents a significant step forward in enhancing the capabilities of large language models. By encouraging AI systems to “think before they speak,” TPO has demonstrated improvements across a wide range of tasks, potentially revolutionizing how we approach AI development.
As research in this area continues, we can expect to see further refinements to the technique, addressing current limitations and expanding its applications. The future of AI may well involve systems that not only process information but also engage in more human-like cognitive processes, leading to more nuanced, context-aware, and ultimately more useful artificial intelligence.
#ai#AI development#AI models#AI research#AI systems#Algorithms#analyses#Analysis#applications#approach#arena#Art#artificial#Artificial Intelligence#benchmarks#bridge#chain of thought reasoning#challenge#chatbots#collaborative#complexity#comprehensive#content#customer service#data#data analysis#datasets#development#domains#efficiency
3 notes
Text
Ģ̷̧̻̙̮̣̗̺̞̖̟̙͎̫̒ I̴̡̡̝͔̼̗̼̟̥̣͍̙̯͒̒̄̍̒̄͆̚͜͝ V̴̢̛̤̬͔̤͙̖͙͇͔̺̫̬͎͈̱̆̃̈́̐̌̔̐̈̽͜ Ê̷̳͍̗̩̪̯̘̯̣̮̥̤͙̱̟̜͗́̍̊͗̊́̉̈́̈́̓̇̀͘͝͝ ̸̧̜͚̱͔̺̗̠̤̼͍̼̳̽͛̏̀́̾̉̍͛̓̕͝M̵̧̨͇̝͔͔͍̟̫̗̦͒̂͋̇̾̌͐ Ȩ̷̪̯̼͐̈͋̏̈́̐̀͝ ̸̡͙͚̺̤̳́ Ḅ̵̘̲͇͙̳̀̃ À̴̡̨͈̲͔̤̝͚̻̬͕͕̘̪̳̺̽͂ͅ C̷̢̛̟̯̱̩̖̖̝̓̉͛̏̒̈́̉͑̽͂̌͝ Ķ̷̳̠͔̥̟̥͚̭͖͓̻̿̀̐̐̿̂̅͑̃͋͋́̀́͘͘͜͝ͅ ̴̢̨̛͉̪̲͇̮̺̝̼͕̩̬̱M̸̧̥͔͍̘̦̞̻̯̮̼̗̻̫̉͊̋͋̔̆̆̊͛̉̈́̊͆̕͜͜͝ Ỳ̶̢̨̝̭̼͕̹̭̻̮̥͚̳̳̳̑͑͜͠ ̶̢̮͙̬̭̥̫͈͉̥͖̖̰͋̈́̇̾̅̌͂̓̕͘͜Ę̵̰̈̽̇͗͐͐͝ Y̸̧̧̨̲̩̣̗̏͆̏ Ẽ̸̡̨̛̛̜̖͔̿̅̄͋͐̉͒̀͛͝͠͝ B̶̧̨̢̝͍̗̖̱̺͚̝̼̣̬̪͊̈́̋̾̀̀̎̏̊͑̈̋͆͑͋ͅ Ą̵̳͍̥͇̪̀͛̈́̃̍̂̀͗̄̑̀͝ L̷͓̙͖͇̪͇̤͔̓̆̾̾̿̎̌͋̉̾̐͂̕͜͝ L̶̲͇͚̇͑̈́̈́ S̴̨̮̪͔͉̦͖̹̞̯͍̭̎̏́͠ͅ
0 notes