#chain of thought reasoning | Explore Tumblr posts and blogs

jcmarchi · 4 months ago

Text

Beyond Chain-of-Thought: How Thought Preference Optimization is Advancing LLMs

New Post has been published on https://thedigitalinsider.com/beyond-chain-of-thought-how-thought-preference-optimization-is-advancing-llms/

A groundbreaking new technique, developed by a team of researchers from Meta, UC Berkeley, and NYU, promises to enhance how AI systems approach general tasks. Known as “Thought Preference Optimization” (TPO), this method aims to make large language models (LLMs) more thoughtful and deliberate in their responses.

The collaborative effort behind TPO brings together expertise from some of the leading institutions in AI research.

The Mechanics of Thought Preference Optimization

At its core, TPO works by encouraging AI models to generate “thought steps” before producing a final answer. This mirrors how people often think through a problem or question before articulating a response.

The technique involves several key steps:

The model is prompted to generate thought steps before answering a query.

Multiple outputs are created, each with its own set of thought steps and final answer.

An evaluator model assesses only the final answers, not the thought steps themselves.

The model is then trained through preference optimization based on these evaluations.
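The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' implementation: the `ANSWER:` marker, the function names, and the toy sampler and judge are all hypothetical stand-ins for a real model and evaluator. What it shows is the key design point from the article: the judge scores only the final answer, while the full output (thought plus answer) is what ends up in the preference pair.

```python
from itertools import cycle
from typing import Callable, Tuple

ANSWER_MARKER = "ANSWER:"  # hypothetical delimiter between thought and answer


def split_output(output: str) -> Tuple[str, str]:
    """Split one sampled output into (thought, answer)."""
    thought, _, answer = output.partition(ANSWER_MARKER)
    return thought.strip(), answer.strip()


def build_preference_pair(
    prompt: str,
    sample: Callable[[str], str],
    judge: Callable[[str, str], float],
    num_samples: int = 4,
) -> Tuple[str, str]:
    """Return (chosen, rejected) full outputs for one prompt."""
    # Steps 1-2: sample several outputs, each with its own thought and answer.
    outputs = [sample(prompt) for _ in range(num_samples)]
    # Step 3: the judge sees only the final answer, never the thought steps.
    scored = [(judge(prompt, split_output(o)[1]), o) for o in outputs]
    # Step 4 (data side): best and worst outputs form one preference pair.
    chosen = max(scored, key=lambda pair: pair[0])[1]
    rejected = min(scored, key=lambda pair: pair[0])[1]
    return chosen, rejected


# Toy stand-ins so the sketch runs without a real model or judge.
_canned = cycle([
    "Think: guess quickly. ANSWER: 5",
    "Think: 2 + 2, add the units. ANSWER: 4",
])


def toy_sample(prompt: str) -> str:
    return next(_canned)


def toy_judge(prompt: str, answer: str) -> float:
    return 1.0 if answer == "4" else 0.0


chosen, rejected = build_preference_pair("What is 2 + 2?", toy_sample, toy_judge)
print("chosen:  ", chosen)
print("rejected:", rejected)
```

In a real setup, the resulting pairs would be passed to a preference-optimization trainer. The article does not name the specific method, so that step is left out of the sketch.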

This approach differs significantly from previous techniques, such as Chain-of-Thought (CoT) prompting. While CoT has been primarily used for math and logic tasks, TPO is designed to have broader utility across various types of queries and instructions. Furthermore, TPO doesn’t require explicit supervision of the thought process, allowing the model to develop its own effective thinking strategies.

Another key difference is that TPO overcomes the challenge of limited training data containing human thought processes. By focusing the evaluation on the final output rather than the intermediate steps, TPO allows for more flexible and diverse thinking patterns to emerge.

Experimental Setup and Results

To test the effectiveness of TPO, the researchers conducted experiments using two prominent benchmarks in the field of AI language models: AlpacaEval and Arena-Hard. These benchmarks are designed to evaluate the general instruction-following capabilities of AI models across a wide range of tasks.

The experiments used Llama-3-8B-Instruct as a seed model, with different judge models employed for evaluation. This setup allowed the researchers to compare the performance of TPO against baseline models and assess its impact on various types of tasks.

The results of these experiments were promising, showing improvements in several categories:

Reasoning and problem-solving: As expected, TPO showed gains in tasks requiring logical thinking and analysis.

General knowledge: Interestingly, the technique also improved performance on queries related to broad, factual information.

Marketing: Perhaps surprisingly, TPO demonstrated enhanced capabilities in tasks related to marketing and sales.

Creative tasks: The researchers noted potential benefits in areas such as creative writing, suggesting that “thinking” can aid in planning and structuring creative outputs.

These improvements were not limited to traditionally reasoning-heavy tasks, indicating that TPO has the potential to enhance AI performance across a broad spectrum of applications. The win rates on AlpacaEval and Arena-Hard benchmarks showed significant improvements over baseline models, with TPO achieving competitive results even when compared to much larger language models.
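For readers unfamiliar with the metric, a win rate is the share of head-to-head comparisons in which a judge prefers one model's response over a reference model's. The sketch below is illustrative only: the verdict labels are hypothetical, counting a tie as half a win is an assumption, and the sample verdicts are made up rather than taken from the study.

```python
from typing import Iterable


def win_rate(verdicts: Iterable[str]) -> float:
    """Fraction of comparisons won; each verdict is 'win', 'loss', or 'tie'."""
    counts = {"win": 0, "loss": 0, "tie": 0}
    for verdict in verdicts:
        counts[verdict] += 1  # raises KeyError on an unknown label
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no verdicts to score")
    # Assumption: a tie counts as half a win.
    return (counts["win"] + 0.5 * counts["tie"]) / total


# Made-up verdicts for illustration, not results from the study.
example = ["win", "win", "loss", "tie"]
print(win_rate(example))
```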

However, it’s important to note that the current implementation of TPO showed some limitations, particularly in mathematical tasks. The researchers observed that performance on math problems actually declined compared to the baseline model, suggesting that further refinement may be necessary to address specific domains.

Implications for AI Development

The success of TPO in improving performance across various categories opens up exciting possibilities for AI applications. Beyond traditional reasoning and problem-solving tasks, this technique could enhance AI capabilities in creative writing, language translation, and content generation. By allowing AI to “think” through complex processes before generating output, we could see more nuanced and context-aware results in these fields.

In customer service, TPO could lead to more thoughtful and comprehensive responses from chatbots and virtual assistants, potentially improving user satisfaction and reducing the need for human intervention. Additionally, in the realm of data analysis, this approach might enable AI to consider multiple perspectives and potential correlations before drawing conclusions from complex datasets, leading to more insightful and reliable analyses.

Despite its promising results, TPO faces several challenges in its current form. The observed decline in math-related tasks suggests that the technique may not be universally beneficial across all domains. This limitation highlights the need for domain-specific refinements to the TPO approach.

Another significant challenge is computational overhead. Generating and evaluating multiple thought paths could increase processing time and resource requirements, which may limit TPO’s applicability in scenarios where rapid responses are crucial.

Furthermore, the current study focused on a specific model size, raising questions about how well TPO will scale to larger or smaller language models. There’s also the risk of “overthinking” – excessive “thinking” could lead to convoluted or overly complex responses for simple tasks.

Balancing the depth of thought with the complexity of the task at hand will be a key area for future research and development.

Future Directions

One key area for future research is developing methods to control the length and depth of the AI’s thought processes. This could involve dynamic adjustment, allowing the model to adapt its thinking depth based on the complexity of the task at hand. Researchers might also explore user-defined parameters, enabling users to specify the desired level of thinking for different applications.

Efficiency optimization will be crucial in this area. Developing algorithms to find the sweet spot between thorough consideration and rapid response times could significantly enhance the practical applicability of TPO across various domains and use cases.

As AI models continue to grow in size and capability, exploring how TPO scales with model size will be crucial. Future research directions may include:

Testing TPO on state-of-the-art large language models to assess its impact on more advanced AI systems

Investigating whether larger models require different approaches to thought generation and evaluation

Exploring the potential for TPO to bridge the performance gap between smaller and larger models, potentially making more efficient use of computational resources

This research could lead to more sophisticated AI systems that can handle increasingly complex tasks while maintaining efficiency and accuracy.

The Bottom Line

Thought Preference Optimization represents a significant step forward in enhancing the capabilities of large language models. By encouraging AI systems to “think before they speak,” TPO has demonstrated improvements across a wide range of tasks, potentially revolutionizing how we approach AI development.

As research in this area continues, we can expect to see further refinements to the technique, addressing current limitations and expanding its applications. The future of AI may well involve systems that not only process information but also engage in more human-like cognitive processes, leading to more nuanced, context-aware, and ultimately more useful artificial intelligence.

3 notes · View notes

protoslacker · 2 years ago

Quote

It will be possible to create better teachers out of ChatGPT tech. What led me to that is that it certainly could help people write better bug reports. It would have far more patience than a busy developer, whose job is to fix bugs, not decipher a user's (justifiably) imprecise understanding of how software works. But a chatbot could help, patiently asking the user questions. Then, of course, over time -- the user would learn how to do it themselves, and get this -- they would also learn how software works. There might be some great programmers out there who don't know they are. 😄

Dave Winer at Scripting News

#ai #learning #chain of thought reasoning

6 notes · View notes

lotus-pear · 1 year ago

Text

regret

#literally excuse the shitty anatomy and cell shading i was thinking abt chuuyas reaction to what he'd done and i decided to make it skk #bc skk copium :') #the way i've hated dazai so fucking much but i still cried like a bitch when he died #he's not dead the bsd fandom has this phase like the elevator chapter where we're like ''dazai's not gonna make it he's done for!!'' #and then he comes back next chapter like surprise bitches yall thought i was dead lmao #this chapter fucking HURT for skk shippers tho like we rly lost this time around huh #deluding myself into thinking that chuuya used gravity manipulation to slow the bullet #bc we didn't see a bullet hole behind dazais head like when chuuya shot his shoulder even though the bullet to his skull was fired at close #the reason theres a wound is bc the compressed air that was still fired was enough to wound him #and the shock wave that followed caused him to pass out bc of the sudden tension to his head intermingled with the blood loss and poison #we also know dazai can control his heart rate at will so maybe he can drop his pulse to zero for like thirty secs #enough to make fyodor believe he's dead #in the event that all of this is untrue and dazai rly does die the way my entire being will go numb and cold and dead #knowing that fyodor will most likely use dazai's death as a weapon against chuuya effectively chaining him to his side #like bffr chuuya may dislike dazai but that's his partner his reflection the boy that makes him desperately want to be human #dazai is the embodiment of chuuyas humanity and once chuuya loses that tether to his human side he will snap and the facade will shatter #and we will truly see chuuya unhinged with nothing more keeping him bound to his mortal shell #this wasn't the skk reunion we wanted asigiri what the fuck :( #bungou stray dogs #bungo stray dogs #bsd #nakahara chuuya #chuuya nakahara #osamu dazai #dazai osamu #skk #soukoku #lotus draws

2K notes · View notes

waxalas · 12 days ago

Text

What if the reason L got depressed in the Yotsuba arc is because he is one massively emotionally detached doofus in love, and Light losing his memories is actually L losing his excuse not to let himself *be* in love?

Now reinterpret this (ch38 - about L being depressed):

L: I'm human. That's not allowed? Light: No, it's not. The way you talk, it's like you won't be satisfied unless I'm Kira.

Do you think Light is saying: "Can't you just love me for me, and not for the fucking game you twisted coward?!?"

I just like how Kira is what brings them together, but also what keeps them apart. Who doesn't love a good contradiction like that?

#Light telling L he's not allowed to be human is such a boss move #i like how afterwards L says he's realizing Light is right but he does not change his ways #(lowkey the reason they both agreed to the chain is bc they wanted a way to try out the relationship “safely”) #subconsciously ofc #we don't admit our feelings around here #death note #light yagami #l lawliet #lawlight #DN thoughts

65 notes · View notes

quixoticanarchy · 3 months ago

Text

please. “there is no ethical consumption under late stage capitalism” does not mean that all consumption is EQUALLY unethical. it’s a structural critique of systems of production it’s not a get-out-of-guilt-free carte blanche to invoke while carelessly consuming whatever you want

#knowing that there’s no perfect or accessible choice for many things doesn’t mean every choice is equally justified #i don’t think medical supply chains are v ethical but i still consume a lot of plastic for medical reasons #but where i am able to make choices i at least want to think abt the ethics yknow? even if it’s all bad? #i thought y’all loved lesser evils #consumerism

67 notes · View notes

the0maski · 1 year ago

Text

Not necessarily LU based, but in the overall “canon” -verse the Hero of Time doesn’t exist in two timelines, right? He died in the Downfall Timeline and never came to be a hero in the Child Timeline.

Legend and Hyrule probably only heard about the title Fallen Hero, never his true title. Same goes for Twilight, only knowing Time as the Hero’s Shade or Cursed Swordsman. Which means, only Wind knows about Time, and he is the only one with a legend about him.

Funny detail: if Hyrule Warriors fell under the Child Timeline, that would mean Time was present throughout the whole timeline. It gets better if Mask was only dragged into the war because the goddesses pulled a “You broke it, you fix it!” Even more fun, Time becomes history’s biggest mystery and a meme among historians, to the point where there are huge debates about him, because some records say that he lived after the Hyrulean civil war, but at the same time he is mentioned as being at the War of Ages, which was two whole Eras later! Was he ever a hero? Why are there no family records about him? Was he really just a forest spirit? Was he even Hylian?

Flora would absolutely have a field day if the chain ever set even one foot in Wild’s Hyrule, seeing how she is extremely interested in history.

For real: there is too little mention in the fandom that the Hero of Time is only known to one person (or two, if Mask had been in the war). Everyone else had never heard of him, much less knew that there was a hero who came after Four. How had they all found each other? I know there is a fanfic trope of the chain slowly forming while hopping through portals, each ending in a new Link’s Hyrule. But in the comic, the first time they all walked through a portal together was after visiting Malon, meaning they all met in Time’s Hyrule, in the timeline where he is no “hero”. How did they find Time, since asking for a hero would not have worked? This also makes me believe that Time is only the leader of the group because he is the oldest and apparently has a high rank in Hyrule’s military. Maybe he showed the Triforce mark on his hand? But that is less likely, since he hides it most of the time.

My money goes to Wind or Twilight: Wind randomly talking to this soldier about the legend of his time, not knowing he is speaking directly to the person from the legend; the rancher only because he got flashbacks of the Shade and needed to find out more. A bonus would be Warriors at a loss for words because that one deity that sometimes possessed his little brother became the Milkman!

#the only reason Time can be cryptic is literally only because NO ONE knows his story #except Wind but than he just pulls Termina out of his leaves #hyrule historians hate him so much for time traveling #linked universe #lu time #lu thoughts #lu chain #linkeduniverse #lu four #lu wild #lu legend #lu theory #lu wind #lu mask #lu headcanons

172 notes · View notes

losver07 · 1 month ago

Text

was working on my wip and realised this scene is so wolfstar coded so ummm here ya go (sorry in advance for the awful translation lol)

also this is veeery long so i'm putting most of it under the cut

tw: mention of death, harsh(ish) language

★

"Then came the ambulance and the police,” he murmurs, his eyes fixed somewhere in the room, mind showing him once again the image of Sirius' tired smile. "They gave me a blanket. I felt stupid in it."

John, observing him with deep eyes, full of compassion, nods. Remus figures he can't show it, the pity. That it's part of his job not being able to say Oh, you poor thing and that, instead, he must be professional. And it's not that John is bad at it, at hiding what he thinks; it's just the eyes.

It's impossible to lie with one's eyes. Sirius' always shine, even if he insists on wearing the blackest clothes.

Shined. Not anymore. And he doesn't dress in black anymore, it's Remus who has to mourn now, instead of him. And for him.

"How are you feeling?" the psychologist asks, and Remus makes an effort not to cry.

"I don't know," he answers, honest. He doesn't know what words to use. "Bad."

Not enough. John gestures at him to keep talking, to elaborate. He always does that. It's cruel.

Remus looks down at a ring he takes off his finger, and proceeds to watch it turn in his hands as he fidgets with it. It was Sirius'. Everything he owns was either his or reminds him of him in some way. Even the smallest of things, the silliest of details.

If only he could get rid of it all. If only he knew that'd make him forget.

"It's like I don't really believe this is real,” he says, without lifting his gaze from the steel ring. It's carved in a checkered pattern, a chess board that extends and hugs the owner's finger like a ribbon. It's not excessively visible but, if you brush your finger against the metal, you can feel the shapes against your skin, kissing your fingertips like he once did. That feels like so long ago, though. “I... I'm sad, obviously, but also angry. I think it was selfish of him."

Before it had been his, Sirius', the ring had belonged to Regulus. It had been silver then. Sirius turned it into steel when he'd received it from his brother, who got it from their father, whose father had gifted it to him, and so on. It must be hundreds of years old.

"Selfish?"

He'll probably ask to be buried with it. If it's not worn on his left hand, it will be trapped on a chain around his neck.

"Yeah, I dunno," he shrugs. He doesn't know how to explain himself. He knows how he feels, he just finds it difficult to believe that anyone could understand it. He tries anyway. "He's gone and he's left us all here as if we didn't have enough problems of our own," he says. "Like, now I have to be myself, which is already tough enough, and also be him for James and Peter and Harry and... Oh, God, Harry..." He shuts his eyes. He needs to breathe. He closes his hand over the ring, and looks at John. "But I need him too. And I don't have him. I don't have anyone to treat me like he did. So, I don't know."

The therapist nods again. When he started the sessions, Remus thought it was weird that John didn't take any notes, like in the movies. It might sound stupid, but he imagined someone constantly writing on a notepad, making a record of every word that came out of his mouth.

It turns out John only uses his notebook to write dates and appointments down; that he actually listens to what he says, instead of analysing every sentence as if it were a mathematical problem.

He's been lucky, and he knows it. At least in this, he's been lucky.

"Do you feel responsible for what happened?" he asks, and Remus thinks about it for a second. Now that the unease has lessened, he's left with just the cold on the tip of his nose and the metal on his fingers. He misses hugging Sirius on cold days like this one.

"Yes," he answers. No point in lying.

John stares at him. Elaborate, he's probably thinking. He always looks at him like that when he wants to make Remus talk.

"I'm the one who was supposed to take care of him," he says then. "And, instead of that, what I did was use him to make him help me with my shit. And even after he's gone I'm still whining about him not being here to give me cuddles. I don't know. Maybe I'm the one who's being selfish."

The psychologist, whose diploma Remus is now observing, makes a face.

"Wanting love is not selfish, Remus," he says, so soft and kind it almost makes Remus feel small, vulnerable and about to break. Or already broken.

"But taking the love away from someone and keeping it to yourself is," he objects.

"You think that's what you did? Taking the love from him?"

"I don't know," he says, and before John can ask him to explain, he continues: "I think maybe if I'd made things right he'd still be here."

The air is still for a few seconds, both in the room and in the street beyond the window, as well as inside Remus' lungs, as he holds his breath in an attempt to make the ache in his chest go away. It doesn't work.

"It wasn't your fault that he suffered," John tells him, but he's been told so many lies he doesn't need to think to detect this one.

"But it was that he didn’t stop suffering," he tells Mr Too Good For Taking Notes. He should've had that noted. "I should've done something. It's what I ought to have done."

John, wanting to understand but apparently incapable of it, furrows his brows a bit. The expression only lasts a second, and is not even that exaggerated, but Remus sees it anyway. The doubt.

"You think it was your purpose?" he asks. He acts interested. Sometimes he almost even makes Remus forget that he's paid for what he does. That he wouldn't be there if it weren't for the money. That he's got better things to do than...

"Helping him?" Remus asks, trying not to sound too aggressive, but probably failing. "Yes."

"And do you think you were, say, destined to save him?"

"Yes," he agrees. A bit cheesy for his personal taste, but, yes, that's what he believes. Why lie, if he's not even going to write it down.

"But, if it was destiny, how could you have avoided it?"

That feels like a boot to the stomach. He doesn't quite know whether it confuses him or makes him angry but, either way, he doesn't know what to answer. Perhaps not having thought of it earlier is what irritates him and puts him, once again, in front of a mirror in which a disappointment shines.

He thinks for a bit. Then speaks.

"Trying harder," he says. "Being better."

"No, Remus; it's not about trying," his confidant tells him, with a smile that could either indicate complicity or compassion. "You did all that you could, and more. And, still, you couldn't change it, nor can you now."

For some reason, that hurts. Rather, it stings. Both in his open wounds and his sore eyes.

"And what do I do?" he asks. His voice doesn't seem to want to know the answer, as it doesn't cooperate in making itself heard. He swallows and takes a deep breath, letting Sirius' ring slide back onto his finger, where it should always have stayed.

"Think about what you did achieve," John offers, so careful it seems almost meticulous. "You made him happy for a time, you gave him peace. You made him feel safe, too. Confident. You helped each other. That's good."

"But he's dead," Remus says. He's not sure he's used that word since it happened. It's not likely, seeing how much it hurts pronouncing it. He's spent over a month circling around and avoiding one of those damned words, the ones that feel like mines in an already ruined field. He presses his lips and looks at John, cheeks wet with rivers of salty water. "That's bad."

"Yes," the therapist agrees. "That is bad."

#crazy about the therapists name being john #cause the original isnt wolfstar so john is a perfectly normal name (and its actually jon) #but in this context it could be interpreted as a conversation with himself and omg aaaaa #also the “if i cant have his ring on my finger i will have it on a chain around my neck” HELLO? im fangirling to my own writing i know #but like #that translates to “if i cant marry him/live a life with him i will at least hold whats left of him close to my heart” #AND THUS I DIE #sorry i just love this scene so much im going nuts #also um ignore the james & lily & peter & harry mention cause i was lazy & didnt know how to make the context make sense lol #wolfstar #remus lupin #sirius black #marauders era #the marauders #dead gay wizards #dead gay wizards from the 70s #moony wormtail padfoot and prongs #tw death #tho if youre sentive to death maybe rethink being on this fandom (for your own personal good) #uhhh idk what else to tag #enjoy the angst #losver fangirls #losver writes for some reason #losver is sad #btw just thought it would have been better if it was sirius talking about finding james dead??? like aaaaa but anyway whats done is done #wolfstar brainrot #wolfstar supremacy #wolfstar microfic #not so micro lol #weirdly i love writing therapy sessions (might be cause i need one)

26 notes · View notes

dudeshusband · 1 month ago

Text

idk why it's suddenly fine to some people to ship with a canon rapist if he's from some, quite frankly ugly looking, popular-on-tumblr horror game

#typewriter dings #with a dumbass name i might add #yeah i'm just tired of seeing it in general #but folks shipping with this guy yanks my chain more #(this is not a vague post. i had this thought after a server i'm in finally banned it) #(not putting the name as to not end up in the tag. i don't want to be a hater in front of all the fans for no damn reason)

25 notes · View notes

mirensiart · 2 months ago

Note

So, funny story, I just discovered KeyChain and I love them very dearly, saw a bunch of other people drawing cute pictures, wanted to get in on it-

The thing is, I saw that original post about them but not the reblog with bullet points, originally. I read that they were post Spirit Tracks (my beloved I love st so much omg) and went “oh. Trains exist. I need them to argue on top of a train right now.” Now, you might see the problem: train surfing does not fit a character who hates riding trains. Which I did not realize was the case until I’d already started sketching. I cannot finish this drawing.

But I already had Chain so I shopped him onto a surfboard instead. Does it make sense? No. But does he have a shark tooth necklace? Yes.

OK BUT THIS IS THE BEST THING EVER⁉️⁉️⁉️

The great sea is still around in this hyrule along with the trains, so surfer chain COULD BE POSSIBLE

Anyway this made my entire week LMAO THANK YOU SO MUCH !!!!!!!!

#fun fact #the reason why he hates trains is cause he TRIED riding one with his chain failes miserably and got very injured #so u nailed his thought process actually riding trains is something he's done before (but failed) #miry's ask box #key&chain loz ocs

31 notes · View notes

necrotic-nephilim · 6 months ago

Note

this is an invitation to ramble about slade/batboy ships: sladick, sladejay, sladetim, sladedami, and other batfam member/villain ships, especially jayroman and ra'stim :)

AAAAAA this is so delightful oh my god thank you. adding a read more just because this one is going to get Long to cover all the ships and all my opinions. because my god do i love Slade.

firstly, the original Robin/villain ship, SlaDick. Slade Wilson, literally created to be a Teen Titans villains, with the original Robin he cannot be normal about ever. i'm so sad there's not much interest in Slade aside from making him a generic Evil Guy who canonically likes teenagers because i think to just boil down his complex with Dick to 'weird attraction' robs them of SUCH nuance. Slade *trusts* Dick, he trusts Dick enough to ask Dick to train his own daughter Rose. and initially Slade's complex over Dick isn't sexual, it's seeing Dick as a replacement for his dead son, Grant. that's messy as hell and i love them for it. i don't think there's a single villain that has the respect for Dick that Slade has. i'm always of the opinion Dick's attraction to Slade is rooted in daddy issues and Slade's attraction to Dick is rooted in dead son issues. do i think they could end up as an old married couple? yes but only in a world where Dick is completely broken and feels alone. my favorite SlaDick flavor is post-Jason's death. Dick and Bruce are arguably at their worst during that era to begin with so Dick is pretty isolated and emotionally unstable. and Slade would take such advantage of that, swooping in to offer Dick emotional stability and fucked up sex to get out pent up emotions. (i'm big a big fan of Dick fucking out his feelings tbh) and Slade is just. this sort of bad habit Dick will kick for a year or two then come crawling back to. you can directly track how well Bruce and Dick are getting along based on how many times Dick has slept with Slade recently. and that's the prize, for Slade. knowing Dick will come back to him, eventually. it's all about patience. and if something really extreme happened to Dick (like Bruce's fake death) i think they'd even date briefly. 
it's not entirely impossible for Dick to date someone he disagrees with morally (see: his flings with Helena) and i think Dick would keep trying to 'save' Slade, using the upper hand he has of filling in this role of Slade's dead son to try to domesticate him. would it work? who knows but if anyone is going to try over and over, it's going to be Dick. it's practically self-harm for Dick yet the only thing keeping him sane. i love them.

SladeJay is... an interesting one for me. because i like the *potential*. but they have no significant interactions pre-Flashpoint. and while usually i can forgive New-52 and Rebirth for their grievances if it has ship fodder i just... can't do that for Jason. Judd Winick's Jason is the only Jason that exists to me so even Slade and Jason's canon interactions matter little to me because it's not the version of Jason i care for. the upside of that though, is it's more of a sandbox to explore what they could be and there are no limitations. i can just run wild. which is fun bc. you're telling me Slade wouldn't be so drawn in by the idea of a dead Robin who's come back and is now the antithesis of Bruce's morality? i think at some point Slade would want to poke the bear, really see what Red Hood is made of. do i see them working long-term? no but i do think Jason would have zero qualms working with Slade if he got something out of it. and if he could fuck with Bruce or Dick by having a short, fucked up relationship with Slade? that's even better. i don't think Slade could ever truly respect Jason, at the end of the day the Dick Grayson standard is too high and Slade would sneer at the idea of a legacy who fucked it up so bad he got blown up. but, he'd see that as Bruce's failure more than Jason's. and for Jason to have someone look him in the eye and say that Bruce *failed* him? i think that'd just *do* something to Jason. and Slade has lost a son, he knows what that loss feels like, how you feel you failed as a father. would he have interest in being fatherly to Jason? no but i think he'd have fun momentarily manipulating Jason and seeing what reactions he gets out of what jeers. Jason's been calling himself a failure this whole time, so to have someone else say it is no real big deal, but to have someone else say it's Bruce's fault and voice Jason's feelings? they'd have the most fucked up sex with the most unhealthy dirty talk that's both gentle and degrading. 
i don't think Jason would ever let himself get too close, he's far too emotionally guarded. but for a second, i think he'd fantasize about having even *half* the amount of attention that Slade gives Dick. bc what has Jason always been, but in Dick's shadow.

SladeTim. my two blorbos. in one place. somewhere in my drafts i have a half-started longfic about SladeTim that's one half really fucked up porn and one half slowburn feelings. arguably Tim and Slade don't have many canon interactions, but it's fun to me that when they do, Slade always seems sort of startled by how well Tim fights back and Tim's willingness to fight dirty in a way even Dick doesn't. and to me, that's the crux of this ship. as far as Robins go, Tim should sort of slip under the radar for Slade. he's not the dead one turned villain, he's not the grandson of Ra's al Ghul, hell he's not even the child of a second-rate villain like Steph, he's not *the* Dick Grayson, he's just... the other one. grew up pretty rich and normal and fell for all of Bruce's wax poetic nonsense. so when Tim puts himself on the map as a hero, makes himself a worthy opponent against Slade that's interesting. even to Tim, Slade isn't a particularly remarkable villain since Slade cares to stay more on Dick's radar. so when they cross paths there's a lot of unexpected. neither of them have thought about the other too hard. so there's this interest and intrigue about it i love. i'm a big fan of the idea Tim is a massive masochist, both physically and emotionally and Slade is The Sadist Ever so. i like them falling into bed together and having the most fucked up sex. like Tim just being a Weird Little Freak so fucked up even Slade raises an eyebrow. because this isn't what you *expect* of a kid like Tim, who's had a pretty easy life before tangling with vigilantes. he should be like a fish out of water, but instead he's matching Slade's energy in ways even Dick doesn't. and of course, how smart he is, that's an asset. it takes a special kind of kid to have the audacity to poison Lady Shiva with hotel chocolates and pull it *off* no less. it earns a begrudging respect, and it's rare to get Slade to respect someone. 
i really like the idea of Tim seeking Slade out only for fucked up sex and somehow Slade falls for this weird little freak who's cold and clinical outside of sex and keeps him guessing.

i'll be honest i've only considered SladeDami in the context of seeing antis say 'omg Slade has been predatory toward Damian ewww' and going 'no the fuck he hasn't but if you want that so bad i'll ship it just to spite you all' but their canon interactions do fascinate me. a lot of how they interact is predicated on Slade as a father, even more so than SlaDick. like Slade will fight Damian and then be like 'hey be good to your old man fathers need their sons' and fucking dip. and then with the whole Respawn thing and Shadow War? that was extra crunchy. for a brief moment Slade had a son who was a brother to Damian and then he goes and *dies*? talk about the complex that would give him with Damian, the spitting image of Respawn. Make Slade Weird About Batkids That Remind Him of His Son 2024. Damian holds an utter contempt for Slade that is simply unmatched. so Slade not leaving that kid alone because of his weird issues, making sure that Bruce doesn't screw up with Damian the way he screwed up with Respawn is very fun. and Damian slowly building up a tolerance to Slade's annoying antics could be fun. Damian is, at his core, still just a kid who needs the approval of something father-shaped and he will Take What He Can Get. are they ever healthy or long lasting? no but i do think Damian would cling to Slade during his teen years for something incredibly fucked up and codependent until either Slade dumped him or he forced himself to get over it.

JayRoman. i will not lie love these two but i don't think i've read many Black Mask comics when he's not interacting with Jason. which is funny because my entire conception of Roman is him just getting humiliated by Jason and really what more is there to know about the man. Jason is so unserious in how he handles Roman and the best part is you can tell it's truly because he doesn't see Roman as a threat. Roman's just a pawn in the game of getting Bruce's attention and sure, Jason is aiming to kill Roman by the end of it, but he'll always have bigger fish to fry. and that's so *infuriating* for Roman. this new guy who's *clearly* a fucking teenager shows up, owns you so badly it shatters your empire, and then you only live bc he seems to have gotten bored of you. JayRoman is my particular favorite ship for the flavor of 'the sub in bed is in control of every other aspect of their relationship and their submission is a gift that can be revoked at any time' which we don't get enough. fucked up power dynamics always have the sub being the one lacking control. and whilst i enjoy when Roman is able to absolutely control and manipulate Jason through various means, i think in canon, it makes far more sense he's pathetic and begging Jason for even a *chance*. and Jason very specifically picking who he subs for based on someone who he could kill or destroy at the drop of the hat if he needed to is a very Jason thing to do. there will never be trust between these two. they will fuck nasty and Roman will be in love with Jason. but they are both carrying a gun during sex. the gun is probably involved during the sex.

Ra'sTim. my everything. Red Robin (2009) you will always be famous to me. what *don't* they have. forced proximity. enemies to lovers. forced partnership. one-sided obsession. ridiculously large age gap. deep unforgivable betrayal. i will never evacuate these two from my brain dear god. Ra's is another one of those villains who gets painted with one broad stroke of being cartoonishly evil with no exploration of his interesting nuance. making him nothing but a villain is boring. where is the Ra's who loves so deeply and fully and has to lose his loved ones over and over and will not let that happen to Tim. he wants to consume Tim in a 'cannibalism as a metaphor for love but also probably literal cannibalism' way. the amount of trust put in Ra's in order for Tim to be able to betray him as spectacularly as he did? that's glorious. Tim had full unfiltered access to Ra's' computers even when he was advised against trusting Tim so much. and then Tim wins against Ra's and willingly lets Ra's kill him. (obviously Dick saves him, but I'm of the opinion Tim was just committed to dying in that moment and he was Okay With That) 'i will betray you if it's the last thing i do' as an act of love. Tim is to Ra's what Dick is to Slade. you will never convince me Tim and Ra's didn't hatefuck at least once during RR (2009) with a questionable level of consent. i'm so serious i will never shut up about them. the way Tim talks about working with Ra's as if he's making a deal with the devil and Ra's talks about Tim like he's the precious, once in a life time thing, one of the only people worthy to produce an heir for Ra's. how's that not gay. what other ship involved one of them literally trying to have the other's baby to raise as an heir. Ra's would probably carry the baby himself if he could. memes aside they're just so. they're so it. 
i love when Tim is forced into a Situation where he has to work with Ra's and confronts the darker aspects of himself that Ra's wants to bring out but Tim wants to squash. it is The corruption kink. whether Ra's succeeds or not in corrupting Tim doesn't even matter because the real crux of this ship is the chase. it's the way the heart pounds when they reach out for each other and you don't know if it's for a kiss or a killing blow. it's very Hannigram to me, in that i don't even need or want them to kiss to know they're in love. love to them is not true love's kiss, it's the thoughtful place they decide to stab the other in. be the sheath to my dagger type ship. hold all this bloody violence i know you're capable of inside of you. let me cut the violence out of you ship. what more can you ask for from a ship. Ra's would tie Tim down and torture him both as foreplay and as a love language and Tim would be too fucked up and self-sacrificial to stop him. always playing the dangerous game of how far will the other let them go until someone tries to die or kill. listen i think i lost the plot here but my point is they're unwell about each other. Tim will make Ra's regret the day he met Tim Drake not just for the betrayal but because Ra's can never go back to a time Before Tim. before knowing what the chase felt like. they're so. them.

#necrotic answerings #sladick #sladejay #sladetim #sladedami #jayroman #ra'stim #i was going to include timlonnie for my own indulgent reasons but this already got so long.#also i've been having some timulysses thoughts as of recent.#aghhhh #sorry this took me a second to answer #i was writing a fic for omega dick week #it ended up 11k words long god somebody help me.#seriously thank you so much for this ask this just makes me so soft ppl wanna ask my opinions on ships #like oh my god ppl care about my weird thoughts. wtf /pos #i was worried when i started this blog that like. no one would care.#but i'm thriving.#yeah in case you can't tell i'm a big fan of tim.#he's just so.#rastim will be like. the peak of peak for me.#but i love all the others just as much #slade wilson deserves more nuance than ppl just calling him a predator/loser. bc yeah he is duh but he's also complicated as hell.#also i'm so serious i saw someone say damian was a 'victim' of slade's #and their proof was a single cover where damian is chained up upsidedown and happens to stick his tongue out at slade.#like. oh my god read their actual interactions you walnuts.#this is a common sentiment on tiktok. the idea damian and dick are victims of slade on the level terra was #which. like blatantly no. they fucking were not.#also the judas contract is just a complicated ass storyline that deserves more nuance than it gets #btw for sladejay i know there's some interactions in the arkhamverse that seem pretty interesting #but i don't know the arkhamverse all too well so i didn't comment

27 notes · View notes

the-art-of-sanshoku · 1 month ago

Text

My second and last Toku Holiday Special treat for @rosemirmir

I'm secretly a 511 enjoyer soooo I made some silly comics about them :3 Some headcanons and additional context in the ao3 post here (warning i mention some stuff that contain Kuuga ending spoilers somewhat)

Hopefully next year I will have more time to do treats 😔 but I was glad to at least get 2 done (even this one came a day after anonymity was lifted lol)

#kamen rider kuuga #godai yusuke #ichijo kaoru #tsubaki shuichi #511 #the ichi in shuichi is the one kanji just like ichijo hence 511 #art #my post #another toku holiday special 2024 #the last one took me a while lol drawing multiple characters interacting is so difficult for me #it was fun dressing them up; thought about their shoes a lot for some reason #ichijo just brings his normal running sneakers #tsubaki for some reason strikes me as a sneaker collector (maybe its the gold chain energy) so he has nice hightops #godai has proper hiking boots and a fun camo shirt #can you tell i really love drawing godai with :> face #also even more secretly a godai/tsubaki enjoyer hehe

15 notes · View notes

jcmarchi · 5 days ago

Text

DeepSeek-R1: Transforming AI Reasoning with Reinforcement Learning

New Post has been published on https://thedigitalinsider.com/deepseek-r1-transforming-ai-reasoning-with-reinforcement-learning/

DeepSeek-R1 is a reasoning model introduced by China-based DeepSeek AI Lab that sets a new benchmark in reasoning capabilities for open-source AI. As detailed in the accompanying research paper, DeepSeek-R1 builds on the DeepSeek-V3 base model and uses reinforcement learning (RL) to solve complex reasoning tasks, such as advanced mathematics and logic, with accuracy comparable to leading closed-source models. The paper describes the training approach, the benchmarks achieved, and the technical methods employed, giving a detailed view of DeepSeek-R1’s potential in the AI landscape.

What is Reinforcement Learning?

Reinforcement learning is a subset of machine learning where agents learn to make decisions by interacting with their environment and receiving rewards or penalties based on their actions. Unlike supervised learning, which relies on labeled data, RL focuses on trial-and-error exploration to develop optimal policies for complex problems.
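The trial-and-error loop described above can be illustrated with a toy example. This is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit, not anything taken from DeepSeek's training code; the function name `run_bandit`, the reward means, and the hyperparameters are all assumptions chosen for illustration.

```python
import random

def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running estimate of each arm's mean reward
    counts = [0] * n_arms       # how many times each arm has been pulled

    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment only returns a noisy reward; there is no label
        # telling the agent which action was "correct".
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental update of the running mean for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts

# Hypothetical environment: three actions with different average rewards.
estimates, counts = run_bandit([0.1, 0.5, 1.0])
best_arm = max(range(len(counts)), key=lambda a: counts[a])
print("pull counts:", counts, "most-pulled arm:", best_arm)
```

The agent is never told which arm is best. It discovers that from rewards alone, which is the contrast with supervised learning drawn above.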

Early applications of RL include notable breakthroughs by DeepMind and OpenAI in the gaming domain. DeepMind’s AlphaGo famously used RL to defeat human champions in the game of Go by learning strategies through self-play, a feat previously thought to be decades away. Similarly, OpenAI leveraged RL in Dota 2 and other competitive games, where AI agents exhibited the ability to plan and execute strategies in high-dimensional environments under uncertainty. These pioneering efforts not only showcased RL’s ability to handle decision-making in dynamic environments but also laid the groundwork for its application in broader fields, including natural language processing and reasoning tasks.

By building on these foundational concepts, DeepSeek-R1 pioneers a training approach inspired by AlphaGo Zero to achieve “emergent” reasoning without relying heavily on human-labeled data, representing a major milestone in AI research.

Key Features of DeepSeek-R1

Reinforcement Learning-Driven Training: DeepSeek-R1 employs a multi-stage RL process to refine reasoning capabilities. Unlike its predecessor, DeepSeek-R1-Zero, which was trained with RL alone and faced challenges like language mixing and poor readability, DeepSeek-R1 incorporates supervised fine-tuning (SFT) with carefully curated “cold-start” data to improve coherence and user alignment.

Performance: DeepSeek-R1 demonstrates remarkable performance on leading benchmarks:

MATH-500: Achieved 97.3% pass@1, surpassing most models in handling complex mathematical problems.

Codeforces: Attained a 96.3% ranking percentile in competitive programming, with an Elo rating of 2,029.

MMLU (Massive Multitask Language Understanding): Scored 90.8% pass@1, showcasing its prowess in diverse knowledge domains.

AIME 2024 (American Invitational Mathematics Examination): Surpassed OpenAI-o1 with a pass@1 score of 79.8%.

Distillation for Broader Accessibility: DeepSeek-R1’s capabilities are distilled into smaller models, making advanced reasoning accessible to resource-constrained environments. For instance, the distilled 14B and 32B models outperformed state-of-the-art open-source alternatives like QwQ-32B-Preview, with the 32B model achieving 94.3% on MATH-500.

Open-Source Contributions: DeepSeek-R1-Zero, DeepSeek-R1, and six distilled models (ranging from 1.5B to 70B parameters) are openly available. This accessibility fosters innovation within the research community and encourages collaborative progress.
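Several of the benchmark figures above are reported as pass@1. As a rough sketch, assuming pass@1 is estimated in the common way (sample several answers per problem, score each as correct or not, then average), it can be computed as follows. The function name `pass_at_1` and the grading data are made up for illustration and are not DeepSeek's evaluation harness.

```python
from statistics import mean

def pass_at_1(samples_correct):
    """Estimate pass@1 from per-problem lists of booleans.

    Each inner list holds one True/False per sampled answer to that problem.
    The per-problem score is the fraction of correct samples; the benchmark
    score is the mean of the per-problem scores.
    """
    per_problem = [
        mean(1.0 if ok else 0.0 for ok in samples)
        for samples in samples_correct
    ]
    return mean(per_problem)

# Hypothetical grading results: 3 problems, 4 sampled answers each.
results = [
    [True, True, False, True],     # solved in 3 of 4 samples
    [False, False, False, False],  # never solved
    [True, True, True, True],      # always solved
]
score = pass_at_1(results)
print(f"pass@1 = {score:.3f}")
```

Averaging over several samples per problem gives a steadier estimate than grading one greedy answer, which is why a single lucky or unlucky generation does not swing the score.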

DeepSeek-R1’s Training Pipeline

The development of DeepSeek-R1 involves:

Cold Start: Initial training uses thousands of human-curated chain-of-thought (CoT) data points to establish a coherent reasoning framework.

Reasoning-Oriented RL: Fine-tunes the model to handle math, coding, and logic-intensive tasks while ensuring language consistency and coherence.

Reinforcement Learning for Generalization: Incorporates user preferences and aligns with safety guidelines to produce reliable outputs across various domains.

Distillation: Smaller models are fine-tuned using the distilled reasoning patterns of DeepSeek-R1, significantly enhancing their efficiency and performance.
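The "language consistency" goal in the reasoning-oriented RL stage above can be made concrete with a small sketch. The research paper describes a language consistency reward computed as the proportion of target-language words in the chain of thought. The code below is only an approximation of that idea: the function name `language_consistency` and the script-based heuristic (ASCII letter runs count as English, CJK ideographs count as Chinese) are assumptions for illustration, not DeepSeek's implementation.

```python
import re

# Hypothetical heuristic, not real language identification:
# each CJK ideograph counts as one Chinese token,
# each run of ASCII letters counts as one English token.
_CJK = re.compile(r"[\u4e00-\u9fff]")
_LATIN = re.compile(r"[A-Za-z]+")

def language_consistency(text, target="en"):
    """Return the fraction of word-like tokens in `text` matching `target`."""
    if target not in ("en", "zh"):
        raise ValueError("target must be 'en' or 'zh'")
    n_zh = len(_CJK.findall(text))
    n_en = len(_LATIN.findall(text))
    total = n_zh + n_en
    if total == 0:
        return 0.0  # nothing to score (e.g. digits or punctuation only)
    return (n_en if target == "en" else n_zh) / total

# A chain of thought that drifts into Chinese mid-sentence scores below 1.0.
mixed = "First compute 2 + 2, 然后 check the result"
print(language_consistency(mixed, target="en"))
```

In an RL setup, a score like this would be combined with a task reward such as answer correctness, so that outputs which mix languages earn slightly less than equally correct outputs that stay in one language.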

Industry Insights

Prominent industry leaders have shared their thoughts on the impact of DeepSeek-R1:

Ted Miracco, Approov CEO: “DeepSeek’s ability to produce results comparable to Western AI giants using non-premium chips has drawn enormous international interest—with interest possibly further increased by recent news of Chinese apps such as the TikTok ban and REDnote migration. Its affordability and adaptability are clear competitive advantages, while today, OpenAI maintains leadership in innovation and global influence. This cost advantage opens the door to unmetered and pervasive access to AI, which is sure to be both exciting and highly disruptive.”

Lawrence Pingree, VP, Dispersive: “The biggest benefit of the R1 models is that it improves fine-tuning, chain of thought reasoning, and significantly reduces the size of the model—meaning it can benefit more use cases, and with less computation for inferencing—so higher quality and lower computational costs.”

Mali Gorantla, Chief Scientist at AppSOC (expert in AI governance and application security): “Tech breakthroughs rarely occur in a smooth or non-disruptive manner. Just as OpenAI disrupted the industry with ChatGPT two years ago, DeepSeek appears to have achieved a breakthrough in resource efficiency—an area that has quickly become the Achilles’ Heel of the industry.

Companies relying on brute force, pouring unlimited processing power into their solutions, remain vulnerable to scrappier startups and overseas developers who innovate out of necessity. By lowering the cost of entry, these breakthroughs will significantly expand access to massively powerful AI, bringing with it a mix of positive advancements, challenges, and critical security implications.”

Benchmark Achievements

DeepSeek-R1 performs strongly across a wide array of tasks:

Educational Benchmarks: Demonstrates outstanding performance on MMLU and GPQA Diamond, with a focus on STEM-related questions.

Coding and Mathematical Tasks: Surpasses leading closed-source models on LiveCodeBench and AIME 2024.

General Question Answering: Excels in open-domain tasks like AlpacaEval 2.0 and ArenaHard, achieving a length-controlled win rate of 87.6% on AlpacaEval 2.0.

Impact and Implications

Efficiency Over Scale: DeepSeek-R1’s development highlights the potential of efficient RL techniques over massive computational resources. This approach questions the necessity of scaling data centers for AI training, as exemplified by the $500 billion Stargate initiative led by OpenAI, Oracle, and SoftBank.

Open-Source Disruption: By outperforming some closed-source models and fostering an open ecosystem, DeepSeek-R1 challenges the AI industry’s reliance on proprietary solutions.

Environmental Considerations: DeepSeek’s efficient training methods reduce the carbon footprint associated with AI model development, providing a path toward more sustainable AI research.

Limitations and Future Directions

Despite its achievements, DeepSeek-R1 has areas for improvement:

Language Support: Currently optimized for English and Chinese, DeepSeek-R1 occasionally mixes languages in its outputs. Future updates aim to enhance multilingual consistency.

Prompt Sensitivity: Few-shot prompts degrade performance; the paper recommends zero-shot prompts that describe the problem and the desired output format directly.

Software Engineering: While excelling in STEM and logic, DeepSeek-R1 has room for growth in handling software engineering tasks.

DeepSeek AI Lab plans to address these limitations in subsequent iterations, focusing on broader language support, prompt engineering, and expanded datasets for specialized tasks.

Conclusion

DeepSeek-R1 is a game changer for AI reasoning models. Its success highlights how careful optimization, innovative reinforcement learning strategies, and a clear focus on efficiency can enable world-class AI capabilities without the need for massive financial resources or cutting-edge hardware. By demonstrating that a model can rival industry leaders like OpenAI’s GPT series while operating on a fraction of the budget, DeepSeek-R1 opens the door to a new era of resource-efficient AI development.

The model’s development challenges the industry norm of brute-force scaling, which assumes that more compute always yields better models. This democratization of AI capabilities promises a future where advanced reasoning models are accessible not only to large tech companies but also to smaller organizations, research communities, and global innovators.

As the AI race intensifies, DeepSeek stands as a beacon of innovation, proving that ingenuity and strategic resource allocation can overcome the barriers traditionally associated with advanced AI development. It exemplifies how sustainable, efficient approaches can lead to groundbreaking results, setting a precedent for the future of artificial intelligence.

0 notes

holland-vosijk-antari · 5 months ago

Text

massive lack of sleep-induced ramble incoming:

been thinking more about the bonds between antari (from reading the telepathy between holland and ojka while osaron is in them) which makes me think about if the antari could telepathically communicate. firstly, holland would choose to Not Do That.... i can imagine a young kell who has learnt in a book that antari are meant to be able to talk across a bond and so he tries it and gets no response, he wonders if it's him doing it incorrectly or if he's not strong enough yet so he keeps trying until eventually giving up. holland is very glad when he gives up because he would be trying to close his mind to the suffering he is experiencing only to be brought back to the surface by a small eager voice in his head saying "hello :) can you hear me?" and he knows if he responds whether it be kindly or harshly he will never have peace from it

on the other hand though, during agos before kell knows that lila is antari he thinks of her so often that she hears his voice in her head, lila of course has her suspicions that she may be antari but has no idea that such a connection is possible, so she considers these thoughts that sound alarmingly like kell as just a lingering attachment to him that she's desperately tried to sever. when they do figure it out though, people question how those two pirates (ahem, privateers) seem to always know what the other is thinking or what the other is about to do...

#thinking of the cuteness of lila cracking a smile at something but only kell knows what #everyone else would wonder why shes smiling for seemingly no reason #and are probably a bit nervous tbh #also thought about lila being annoyed she didnt know it was a thing until after hollands death #because she wanted to menace him out of pure spite #also had a fic idea of kell giving up on contacting holland until one day he somehow gets caught and chained up #perhaps in a similar way that the danes did holland so he cant even use his powers to escape #the only way out would be to call to holland and hope he can hear #and after years of ignoring kells calls holland just cant ignore this one #that goodness in him that he tried to bury through his years of torment comes rushing to the surface and he cant help but do something #this is not a well structured post at all #i am very very tired #adsom #shades of magic #holland vosijk #lila bard #kell maresh #adsom ramble

16 notes · View notes

sundial-bee-scribbles · 5 months ago

Text

trying to psych myself up to finally do oc refs by doing fandom-related refs instead: volume 1

wanted to update my yuma from whatever tf this au is so he was a bit more unique... takes inspo from a lot of different things while also trying to be its own sorta thing? which is fitting given the au ;)

bonus chibi now that i'm also figuring out how tf to do chibis lol:

#my art lol #synth v yuma #yuma synthv #synth v #synthv fanart #synthesizer v #vocaloid #vocaloid fanart #YES I KNOW ITS DIFFERENT but at this rate its the umbrella tag. all vsynth shit goes under there just like on main 😔#sorry for the annoyign watermarks i just dont want this to get stolennn/traced it'll b my joker arc. is2g #like thats never happened to me before as far as i know but now that my art is getting 'better' i begin to get scared that it will happen #if my fanart got stolen i'd def sting a little yeah but not hurt AS bad as if someone stole my original shit. THAT would hurt #one of many reasons why i post less personal oc stuffs. although as mentioned above i AM in an oc mood so i wanna draw em maybe...#and stuff like this is a step to develop a PROPER FUCKING REF STYLE bc i SUCKKKK AT MAKING REFS LOL 😭 BUT I SHOULD GIT GUD #i have a few other refs planned for vocaloid au (i guess???) related shit but they're not done yet. this one was also a wip that i just??#impulsively decided to redo & finish bc i wanted to draw but nothing else i was trying to draw came out right. advantages of many wips #i have SOOO many things i could say abt some of the things that went into this redesign but i dont wanna come off as pretentious 😔💔#obviously it was primarily inspired by the vimalion yuma design but. there's moreeee that i can't explain here bc tag limits and im shy #i do think i want to try and be more intentional with my character designs now so i'm seeing how that goes as i redesign some old ocs #man though this kind of stuff makes me remember i used to LOVEE doing this stuff. and now its even crazierr given art improvement #uaurhghh my head is buzzing w/. so many thoughts. THIS ALWAYS FUCKING HAPPENS I GET SO MANY IDEAS WHEN IM BUSY GFD #this is actually from today though unlike some other things i might eventually post. that'll make more sense soon #and fuckkk i forgot the chain necklace thing on the chibi yeah but i couldnt get it to look good. whatever

16 notes · View notes

bleue-flora · 10 months ago

Text

Ok, given that the original courthouse had chains in the jail cell, we know they were well aware they existed as items. So, I think the reason they didn’t use them in Pandora’s Vault must be because the mining fatigue made them too hard to break. So, the implication is they used rope (leads) instead…

Like the ccs went through the effort of having a noticeably low-durability netherite axe, shears named: Warden’s Torment, and you’re telling me they didn’t think of just putting some chains in their hot bar to give us brain rot?

#leads Eret found randomly for no freaking reason on a chest mind you…#they are so unhinged thinking about sea water you cannot tell me they didn’t think of hanging some chains from the ceiling…#like come on… don’t get me started with the xp bottles…#pandora’s vault #prison arc #dsmp #lore thoughts #this is fine #dsmp lore #pandora’s vault has a singular purpose #dreblr #c!dream #dream smp #dsmp analysis #dsmpblr #don’t mind me thinking about torture box again…

21 notes · View notes

potionwine · 10 days ago

Text

.

#feeling actually. hmm. sick to the stomach at the conversation happening in the discord #just because i haven’t posted my fic there isn’t concrete proof out there that i came up with the story independently #i’m still working on it because i’m slow #and i can’t find anyone willing to beta because 100k words is a whopping undertaking and it’s not even complete #am i going to be accused of stealing other people’s plots or plagiarising ideas when i finally post #i came up with sending joshua back to childhood before phoenix gate #i came up with dion time loop #but everyone’s spitballing ideas and now i look like a fraud #i came up with it myself!!!#i haven’t spent a whole year painfully chaining word after word after word #completely without support or encouragement or friendship #to find myself in a place where people will say i copied them??? just because i haven’t posted??#or the worst, that my little project is ai-generated based on their prompts?#i didn’t take anyone’s prompt from today to magically start on a project that’s already thousands of words long #a hundred words a day is considered a good day for me that’s how hard and lonely it has been for a whole year #and because no one agreed to beta i don’t even have independent witnesses for the progression of my work #if i am accused of theft or ai it will kill me #it might actually destroy me #aalsjddkhsksjdhfdksk i knew i should have left the discord a long time ago #but if i leave now it’ll look like i stole someone’s ideas and cut and run ffffffff #and i can’t leave the heart of pf community literally everywhere else is indiscriminately t/d #even in waloed ships that i like people randomly bring up that ship for no fucking reason #and the other servers are all so inept and lax at keeping firm control of content that should be limited to the focal ship #it’s not that i think i am the *first* to have any ideas since these are all tropes and well known aus like groundhog #but these specific ideas for this specific ship in this specific pattern was something i thought of independently at least #and now everyone’s brainstormed my whole plot out in a chat and i can’t very well jump in like some absolute asshole #like ‘hey you’re describing my fic actually’#i can’t very well respond to nearly every comment with ‘oh that’s in my story’ ‘this too’ ‘that too’#that would be insufferable even if true #so i can only keep my mouth shut and they’re going to think i ripped off their thoughts and my fic is stillborn

4 notes · View notes