# Vision AI Code
blogpopular · 24 days ago
Text
Cloud Vision AI: Exploring the Future of Image Analysis
Cloud Vision AI represents a milestone in the evolution of image-analysis technology. Developed by Google, the tool uses artificial intelligence to understand and interpret images efficiently. With features that range from object identification to text detection, Cloud Vision AI is a powerful solution for companies and developers looking to…
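As a concrete taste of the service the post describes, here is a minimal sketch of the JSON body that Cloud Vision's `images:annotate` REST endpoint expects; the two feature types shown correspond to the object identification and text detection mentioned above. The placeholder image bytes are an illustration, not real data.

```python
import base64

def vision_request(image_bytes, features=("LABEL_DETECTION", "TEXT_DETECTION"), max_results=5):
    """Build the JSON body for Cloud Vision's images:annotate REST endpoint."""
    return {
        "requests": [{
            # the API expects the raw image base64-encoded inline
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [{"type": f, "maxResults": max_results} for f in features],
        }]
    }

# POST this body to https://vision.googleapis.com/v1/images:annotate with your API key
body = vision_request(b"<raw image bytes here>")
```

The same request shape works for other feature types (e.g. `FACE_DETECTION`, `SAFE_SEARCH_DETECTION`); only the `features` list changes.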
0 notes
gojoest · 7 months ago
Text
long story short to me oliver aiku is like young kishibe
6 notes · View notes
d0nutzgg · 1 year ago
Text
So the other day I posted a photo of the snake dataset I've started collecting. This model on Microsoft Azure was trained on what I have right now (not much: 500 pictures of different species of venomous snakes), and this was the result so far. I think it's turning out great! It only misclassified one snake, and that one didn't have a very good picture, so that's already a solid start. Anyone else excited about the Snake Detective model? What snakes would you like to see in the dataset? Drop me a comment below and tell me your favorite slithering friends, and I'll add them to the database if I can find pictures of them on Wikimedia Commons! I also plan to eventually build a full website to host this machine learning model, so people can analyze snakes in their yards and make sure they aren't venomous. I'll probably also let people post their own comments under the snake pictures. The idea is to spread awareness and promote animal safety and respect for these misunderstood animals!
2 notes · View notes
jcmarchi · 6 days ago
Text
Hunyuan-Large and the MoE Revolution: How AI Models Are Growing Smarter and Faster
New Post has been published on https://thedigitalinsider.com/hunyuan-large-and-the-moe-revolution-how-ai-models-are-growing-smarter-and-faster/
Artificial Intelligence (AI) is advancing at an extraordinary pace. What seemed like a futuristic concept just a decade ago is now part of our daily lives. Yet the AI we encounter today is only the beginning: the more fundamental transformation is still unfolding behind the scenes, driven by massive models capable of tasks once considered exclusive to humans. One of the most notable advancements is Hunyuan-Large, Tencent's cutting-edge open-source AI model.
Hunyuan-Large is one of the largest AI models ever developed, with 389 billion parameters. Its true innovation, however, lies in its Mixture of Experts (MoE) architecture. Unlike traditional dense models, an MoE model activates only the most relevant experts for a given task, optimizing efficiency and scalability. This approach improves performance and changes how AI models are designed and deployed, enabling faster, more effective systems.
The Capabilities of Hunyuan-Large
Hunyuan-Large is a significant advance in AI technology. Built on the Transformer architecture, which has already proven successful across a range of Natural Language Processing (NLP) tasks, the model stands out for its MoE design. This approach reduces the computational burden by activating only the most relevant experts for each task, enabling the model to tackle complex challenges while optimizing resource usage.
With 389 billion parameters, Hunyuan-Large is one of the largest AI models available today, far exceeding earlier models like GPT-3 and its 175 billion parameters. This scale allows it to handle more advanced operations, such as deep reasoning, code generation, and long-context processing. It can work through multi-step problems and capture complex relationships within large datasets, producing highly accurate results even in challenging scenarios. For example, Hunyuan-Large can generate precise code from natural-language descriptions, something earlier models struggled with.
What makes Hunyuan-Large different from other AI models is how it efficiently handles computational resources. The model optimizes memory usage and processing power through innovations like KV Cache Compression and Expert-Specific Learning Rate Scaling. KV Cache Compression speeds up data retrieval from the model’s memory, improving processing times. At the same time, Expert-Specific Learning Rate Scaling ensures that each part of the model learns at the optimal rate, enabling it to maintain high performance across a wide range of tasks.
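The article doesn't spell out the mechanics of KV Cache Compression, but sharing key/value heads among groups of query heads (as in grouped-query attention) is one common way to shrink the cache, and a back-of-the-envelope calculation shows why it matters. All dimensions below are illustrative assumptions, not Hunyuan-Large's real configuration.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_value=2):
    """Size of the attention KV cache: one K and one V tensor per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value

# illustrative dimensions only (fp16 values, long context)
full    = kv_cache_bytes(layers=64, kv_heads=64, head_dim=128, seq_len=32_768)
grouped = kv_cache_bytes(layers=64, kv_heads=8,  head_dim=128, seq_len=32_768)
print(full // grouped)  # sharing 64 query heads across 8 KV heads cuts the cache 8x
```

Because the KV cache must be read at every decoding step, shrinking it speeds up data retrieval from memory, which is exactly the benefit the article attributes to the technique.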
These innovations give Hunyuan-Large an advantage over leading models such as GPT-4 and Llama, particularly in tasks requiring deep contextual understanding and reasoning. While models like GPT-4 excel at generating natural language text, Hunyuan-Large's combination of scalability, efficiency, and specialized processing lets it handle more complex challenges. It is well suited to tasks that involve understanding and generating detailed information, making it a powerful tool across various applications.
Enhancing AI Efficiency with MoE
For years, the rule of thumb was that more parameters mean more power. But scaling dense models has a downside: every parameter must be computed for every input, so costs and processing times grow in step with model size. As AI models grew more complex, the demand for computational power ballooned, driving up costs, slowing processing, and creating the need for a more efficient solution.
This is where the Mixture of Experts (MoE) architecture comes in. MoE represents a transformation in how AI models function, offering a more efficient and scalable approach. Unlike traditional models, where all model parts are active simultaneously, MoE only activates a subset of specialized experts based on the input data. A gating network determines which experts are needed for each task, reducing the computational load while maintaining performance.
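The gating step can be sketched in a few lines of plain Python. This is a toy top-k router under simplifying assumptions (scalar "experts", gate weights renormalized over the selected subset), not Hunyuan-Large's actual routing code.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(gate_logits, k=2):
    """Select the top-k experts and renormalize their gate weights to sum to 1."""
    probs = softmax(gate_logits)
    topk = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in topk)
    return {i: probs[i] / norm for i in topk}

def moe_forward(x, experts, gate_logits, k=2):
    """Only the k routed experts run; the rest are skipped entirely."""
    weights = route(gate_logits, k)
    return sum(w * experts[i](x) for i, w in weights.items())

# four toy experts, each just scaling its input
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
y = moe_forward(10.0, experts, gate_logits=[0.1, 2.0, 0.2, 1.5], k=2)
```

With `k=2` out of four experts, half the expert computation is skipped for this input; in a real model the skipped experts are full feed-forward networks, so the savings dominate the cost of the tiny gating network.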
The advantages of MoE are improved efficiency and scalability. By activating only the relevant experts, MoE models can handle massive datasets without increasing computational resources for every operation. This results in faster processing, lower energy consumption, and reduced costs. In healthcare and finance, where large-scale data analysis is essential but costly, MoE’s efficiency is a game-changer.
MoE also allows models to scale better as AI systems become more complex. With MoE, the number of experts can grow without a proportional increase in resource requirements. This enables MoE models to handle larger datasets and more complicated tasks while controlling resource usage. As AI is integrated into real-time applications like autonomous vehicles and IoT devices, where speed and low latency are critical, MoE’s efficiency becomes even more valuable.
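To put rough numbers on that efficiency claim: public reporting on Hunyuan-Large cites about 389 billion total parameters with roughly 52 billion activated per token (treat the 52B figure as approximate). Using the common rule of thumb of about 2 FLOPs per active parameter per generated token:

```python
total_params = 389e9     # reported total parameter count
active_params = 52e9     # reported parameters activated per token (approximate)

dense_flops_per_token = 2 * total_params  # a dense model this size touches every weight
moe_flops_per_token = 2 * active_params   # MoE touches only the routed experts

speedup = dense_flops_per_token / moe_flops_per_token
print(f"~{speedup:.1f}x fewer FLOPs per token than an equally large dense model")
```

The total parameter count still has to fit in memory, so MoE trades memory for compute; that trade is what makes "more experts without proportionally more compute" possible.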
Hunyuan-Large and the Future of MoE Models
Hunyuan-Large is setting a new standard in AI performance. The model excels in handling complex tasks, such as multi-step reasoning and analyzing long-context data, with better speed and accuracy than previous models like GPT-4. This makes it highly effective for applications that require quick, accurate, and context-aware responses.
Its applications are wide-ranging. In fields like healthcare, Hunyuan-Large is proving valuable in data analysis and AI-driven diagnostics. In NLP, it is helpful for tasks like sentiment analysis and summarization, while in computer vision, it is applied to image recognition and object detection. Its ability to manage large amounts of data and understand context makes it well-suited for these tasks.
Looking forward, MoE models, such as Hunyuan-Large, will play a central role in the future of AI. As models become more complex, the demand for more scalable and efficient architectures increases. MoE enables AI systems to process large datasets without excessive computational resources, making them more efficient than traditional models. This efficiency is essential as cloud-based AI services become more common, allowing organizations to scale their operations without the overhead of resource-intensive models.
There are also emerging trends like edge AI and personalized AI. In edge AI, data is processed locally on devices rather than centralized cloud systems, reducing latency and data transmission costs. MoE models are particularly suitable for this, offering efficient processing in real-time. Also, personalized AI, powered by MoE, could tailor user experiences more effectively, from virtual assistants to recommendation engines.
However, as these models become more powerful, there are challenges to address. The large size and complexity of MoE models still require significant computational resources, which raises concerns about energy consumption and environmental impact. Additionally, making these models fair, transparent, and accountable is essential as AI advances. Addressing these ethical concerns will be necessary to ensure that AI benefits society.
The Bottom Line
AI is evolving quickly, and innovations like Hunyuan-Large and the MoE architecture are leading the way. By improving efficiency and scalability, MoE models are making AI not only more powerful but also more accessible and sustainable.
The need for more intelligent and efficient systems is growing as AI is widely applied in healthcare and autonomous vehicles. Along with this progress comes the responsibility to ensure that AI develops ethically, serving humanity fairly, transparently, and responsibly. Hunyuan-Large is an excellent example of the future of AI—powerful, flexible, and ready to drive change across industries.
0 notes
thedevmaster-tdm · 4 months ago
Text
youtube
STOP Using Fake Human Faces in AI
1 note · View note
otherworldlyinfo · 2 years ago
Text
The Future of Coding: How AI is Changing the Way We Write and Debug Code
Looking ahead to the future of coding, this article explores the latest AI trends that are shaping the software development industry. #Coding #AI
Automated Code GenerationDebuggingRecommendation SystemsNatural Language ProcessingComputer VisionConclusion Artificial Intelligence (AI) is revolutionizing the future of coding and programming, helping coders to be more efficient and productive in their work. AI technologies such as machine learning, natural language processing, and computer vision are being used to automate repetitive coding…
Tumblr media
View On WordPress
0 notes
thaoworra · 7 months ago
Text
Tumblr media
The Science Fiction and Fantasy Poetry Association recently released the poems that made it to the finalist stage for consideration for the 2024 Rhysling Awards for Short and Long Speculative Poems of the year. Congratulations to all of the nominees! This will be the 46th year these awards have been conferred!
Short Poems (50 finalists)
Attn: Prime Real Estate Opportunity!, Emily Ruth Verona, Under Her Eye: A Women in Horror Poetry Collection Volume II
The Beauty of Monsters, Angela Liu, Small Wonders 1
The Blight of Kezia, Patricia Gomes, HWA Poetry Showcase X
The Day We All Died, A Little, Lisa Timpf, Radon 5
Deadweight, Jack Cooper, Propel 7
Dear Mars, Susan L. Lin, The Sprawl Mag 1.2
Dispatches from the Dragon's Den, Mary Soon Lee, Star*Line 46.2
Dr. Jekyll, West Ambrose, Thin Veil Press December
First Eclipse: Chang-O and the Jade Hare, Emily Jiang, Uncanny 53
Five of Cups Considers Forgiveness, Ali Trotta, The Deadlands 31
Gods of the Garden, Steven Withrow, Spectral Realms 19
The Goth Girls' Gun Gang, Marisca Pichette, The Dread Machine 3.2
Guiding Star, Tim Jones, Remains to be Told: Dark Tales of Aotearoa, ed. Lee Murray (Clan Destine Press)
Hallucinations Gifted to Me by Heatstroke, Morgan L. Ventura, Banshee 15
hemiplegic migraine as willing human sacrifice, Ennis Rook Bashe, Eternal Haunted Summer Winter Solstice
Hi! I am your Cortical Update!, Mahaila Smith, Star*Line 46.3
How to Make the Animal Perfect?, Linda D. Addison, Weird Tales 100
I Dreamt They Cast a Trans Girl to Give Birth to the Demon, Jennessa Hester, HAD October
Invasive, Marcie Lynn Tentchoff, Polar Starlight 9
kan-da-ka, Nadaa Hussein, Apparition Lit 23
Language as a Form of Breath, Angel Leal, Apparition Lit October
The Lantern of September, Scott Couturier, Spectral Realms 19
Let Us Dream, Myna Chang, Small Wonders 3
The Magician's Foundling, Angel Leal, Heartlines Spec 2
The Man with the Stone Flute, Joshua St. Claire, Abyss & Apex 87
Mass-Market Affair, Casey Aimer, Star*Line 46.4
Mom's Surprise, Francis W. Alexander, Tales from the Moonlit Path June
A Murder of Crows, Alicia Hilton, Ice Queen 11
No One Now Remembers, Geoffrey Landis, Fantasy and Science Fiction Nov./Dec.
orion conquers the sky, Maria Zoccula, On Spec 33.2
Pines in the Wind, Karen Greenbaum-Maya, The Beautiful Leaves (Bamboo Dart Press)
The Poet Responds to an Invitation from the AI on the Moon, T.D. Walker, Radon Journal 5
A Prayer for the Surviving, Marisca Pichette, Haven Speculative 9
Pre-Nuptial, F. J. Bergmann, The Vampiricon (Mind's Eye Publications)
The Problem of Pain, Anna Cates, Eye on the Telescope 49
The Return of the Sauceress, F. J. Bergmann, The Flying Saucer Poetry Review February
Sea Change, David C. Kopaska-Merkel and Ann K. Schwader, Scifaikuest May
Seed of Power, Linda D. Addison, The Book of Witches ed. Jonathan Strahan (Harper Collins)
Sleeping Beauties, Carina Bissett, HWA Poetry Showcase X
Solar Punks, J. D. Harlock, The Dread Machine 3.1
Song of the Last Hour, Samuel A. Betiku, The Deadlands 22
Sphinx, Mary Soon Lee, Asimov's September/October
Storm Watchers (a drabbun), Terrie Leigh Relf, Space & Time
Sunflower Astronaut, Charlie Espinosa, Strange Horizons July
Three Hearts as One, G. O. Clark, Asimov's May/June
Troy, Carolyn Clink, Polar Starlight 12
Twenty-Fifth Wedding Anniversary, John Grey, Medusa's Kitchen September
Under World, Jacqueline West, Carmina Magazine September
Walking in the Starry World, John Philip Johnson, Orion's Belt May
Whispers in Ink, Angela Yuriko Smith, Whispers from Beyond (Crystal Lake Publishing)
Long Poems (25 finalists)
Archivist of a Lost World, Gerri Leen, Eccentric Orbits 4
As the witch burns, Marisca Pichette, Fantasy 87
Brigid the Poet, Adele Gardner, Eternal Haunted Summer Summer Solstice
Coding a Demi-griot (An Olivian Measure), Armoni “Monihymn” Boone, Fiyah 26
Cradling Fish, Laura Ma, Strange Horizons May
Dream Visions, Melissa Ridley Elmes, Eccentric Orbits 4
Eight Dwarfs on Planet X, Avra Margariti, Radon Journal 3
The Giants of Kandahar, Anna Cates, Abyss & Apex 88
How to Haunt a Northern Lake, Lora Gray, Uncanny 55
Impostor Syndrome, Robert Borski, Dreams and Nightmares 124
The Incessant Rain, Rhiannon Owens, Evermore 3
Interrogation About A Monster During Sleep Paralysis, Angela Liu, Strange Horizons November
Little Brown Changeling, Lauren Scharhag, Aphelion 283
A Mere Million Miles from Earth, John C. Mannone, Altered Reality April
Pilot, Akua Lezli Hope, Black Joy Unbound eds. Stephanie Andrea Allen & Lauren Cherelle (BLF Press)
Protocol, Jamie Simpher, Small Wonders 5
Sleep Dragon, Herb Kauderer, The Book of Sleep (Written Image Press)
Slow Dreaming, Herb Kauderer, The Book of Sleep (Written Image Press)
St. Sebastian Goes To Confession, West Ambrose, Mouthfeel 1
Value Measure, Joseph Halden and Rhonda Parrish, Dreams and Nightmares 125
A Weather of My Own Making, Nnadi Samuel, Silver Blade 56
Welcoming the New Girl, Beth Cato, Penumbric October
What You Find at the Center, Elizabeth R McClellan, Haven Spec Magazine 12
The Witch Makes Her To-Do List, Theodora Goss, Uncanny 50
The Year It Changed, David C. Kopaska-Merkel, Star*Line 46.4
Voting for the Rhysling Award begins July 1; a link to the ballot will be sent with the Rhysling Anthology, as well as with the July issue of Star*Line. More information on the Rhysling Award can be found here.
757 notes · View notes
rowdyluv · 3 months ago
Text
♡꒰believe in me, a mini series꒱♡ -jh86
Tumblr media Tumblr media
♡꒰ believe in me ꒱♡
♡ full fics - 1.5k + words
i. believe in me
ii. it’s me and her, of course it will work out
iii. take it down
iv. lies, lies, lies
v. day out
vi. reputation proceeds me
♡blurbs - < 1k words
i. pucking gossip after pt. 2
♡general information - posts made outside of fics or blurbs that pertain to the series
Vision Board
-> series tag is: ♡⤷ believe in me
-> all things series related can be found searching the tag exactly how it is written.
© property of rowdyluv; do not copy and/or re-upload as your own - anywhere. do not place my work inside AI codes, do not translate.
217 notes · View notes
mostlysignssomeportents · 1 year ago
Text
A year in illustration, 2023 edition (part one)
Tumblr media
(This is part one; part two is here.)
I am objectively very bad at visual art. I am bad at vision, period – I'm astigmatic, shortsighted, color blind, and often miss visual details others see. I can't even draw a stick-figure. To top things off, I have cataracts in both eyes and my book publishing/touring schedule is so intense that I keep having to reschedule the surgeries. But despite my vast visual deficits, I thoroughly enjoy making collages for this blog.
For many years now – decades – I've been illustrating my blog posts by mixing public domain and Creative Commons art with work that I can make a good fair use case for. As bad at art as I may be, all this practice has paid off. Call it unseemly, but I think I'm turning out some terrific illustrations – not all the time, but often enough.
Last year, I rounded up my best art of the year:
https://pluralistic.net/2022/12/25/a-year-in-illustration/
And I liked reflecting on the year's art so much, I decided I'd do it again. Be sure to scroll to the bottom for some downloadables – freely usable images that I painstakingly cut up with the lasso tool in The Gimp.
Tumblr media
The original AD&D hardcover cover art is seared into my psyche. For several years, there were few images I looked at so closely as these. When Hasbro pulled some world-beatingly sleazy stuff with the Open Gaming License, I knew just how to mod Dave Trampier's 'Eve Of Moloch' from the cover of the Players' Handbook. Thankfully, bigger nerds than me have identified all the fonts in the image, making the remix a doddle.
https://pluralistic.net/2023/01/12/beg-forgiveness-ask-permission/#whats-a-copyright-exception
Tumblr media
Even though I don't keep logs or collect any analytics, I can say with confidence that "Tiktok's Enshittification" was the most popular thing I published on Pluralistic this year. I mixed some public domain Brothers Grimm art with a classic caricature of Boss Tweed and some very cheesy royalty-free/open access influencer graphics. One gingerbread cottage social media trap, coming up:
https://pluralistic.net/2023/01/21/potemkin-ai/#hey-guys
Tumblr media
To illustrate the idea of overcoming walking-the-plank fear (as a metaphor for writing when it feels like you suck) I mixed public domain stock of a plank, a high building and legs, along with a procedurally generated Matrix "code waterfall" and a vertiginous spiral ganked from a Heinz Bunse photo of a German office lobby.
https://pluralistic.net/2023/01/22/walking-the-plank/
Tumblr media
Finding a tasteful way to illustrate a story about Johnson & Johnson losing a court case after it spent a generation tricking women into dusting their vulvas with asbestos-tainted talcum was a challenge. The tulip (featured in many public domain images) was a natural starting point. I mixed it with Jesse Wagstaff's image of a Burning Man dust-storm and Mike Mozart's shelf-shot of a J&J talcum bottle.
https://pluralistic.net/2023/02/01/j-and-j-jk/#risible-gambit
Tumblr media
"Google's Chatbot Panic" is about Google's long history of being stampeded into doing stupid things because its competitors are doing them. Once it was Yahoo, now it's Bing. Tenniel's Tweedle Dee and Dum were a good starting point. I mixed in one of several Humpty Dumpty editorial cartoon images from 19th century political coverage that I painstakingly cut out with the lasso tool on a long plane-ride. This is one of my favorite Humpties, I just love the little 19th C businessmen trying to keep him from falling! I finished it off with HAL 9000's glowing red eye, my standard 'this is about AI' image, which I got from Cryteria's CC-licensed SVG.
https://pluralistic.net/2023/02/16/tweedledumber/#easily-spooked
Tumblr media
Though I started writing about Luddites in my January, 2022 Locus column, 2023 was the Year of the Luddite, thanks to Brian Merchant's outstanding Blood In the Machine:
https://pluralistic.net/2023/09/26/enochs-hammer/#thats-fronkonsteen
When it came time to illustrate "Gig Work Is the Opposite of Steampunk," I found a public domain weaver's loft, and put one of Cryteria's HAL9000 eyes in the window. Magpie Killjoy's Steampunk Magazine poster, 'Love the Machine, Hate the Factory,' completed the look.
https://pluralistic.net/2023/03/12/gig-work-is-the-opposite-of-steampunk/
Tumblr media
For the "small, non-profit school" that got used as an excuse to bail out Silicon Valley Bank, I brought back Humpty Dumpty, mixing him with a Hogwartsian castle, a brick wall texture, and an ornate, gilded frame. I love how this one came out. This Humpty was made for the SVB bailout.
https://pluralistic.net/2023/03/23/small-nonprofit-school/#north-country-school
Tumblr media
The RESTRICT Act would have federally banned Tiktok – a proposal that was both technically unworkable and unconstitutional. I found an early 20th century editorial cartoon depicting Uncle Sam behind a fortress wall that was keeping a downtrodden refugee family out of America. I got rid of most of the family, giving the dad a Tiktok logo head, and I put Cryteria's HAL9000 eyes over each cannonmouth. Three Boss Tweed moneybag-head caricatures, adorned with Big Tech logos, rounded it out.
https://pluralistic.net/2023/03/30/tik-tok-tow/#good-politics-for-electoral-victories
Tumblr media
When Flickr took decisive action to purge the copyleft trolls who'd been abusing its platform, I knew I wanted to illustrate this with Lucifer being cast out of heaven, and the very best one of those comes from John Milton, who is conveniently well in the public domain. The Flickr logo suggested a bicolored streaming-light-of-heaven motif that just made it.
https://pluralistic.net/2023/04/01/pixsynnussija/#pilkunnussija
Tumblr media
Old mainframe ads are a great source of stock for a "Computer Says No" image. And Congress being a public building, there are lots of federal (and hence public domain) images of its facade.
https://pluralistic.net/2023/04/04/cbo-says-no/#wealth-tax
Tumblr media
When I wrote about the Clarence Thomas/Harlan Crow bribery scandal, it was easy to find Mr. Kjetil Ree's great image of the Supreme Court building. Thomas being a federal judge, it was easy to find a government photo of his head, but it's impossible to find an image of him in robes at a decent resolution. Luckily, there are tons of other federal judges who've been photographed in their robes! Boss Tweed with the dollar-sign head was a great stand-in for Harlan Crow (no one knows what he looks like anyway). Gilding Thomas's robes was a simple matter of superimposing a gold texture and twiddling with the layers.
https://pluralistic.net/2023/04/06/clarence-thomas/#harlan-crow
Tumblr media
"Gig apps trap reverse centaurs in wage-stealing Skinner boxes" is one of my best titles. This is the post where I introduce the idea of "twiddling" as part of the theory of enshittification, and explain how it relates to "reverse centaurs" – people who assist machines, rather than the other way around. Finding a CC licensed modular synth was much harder than I thought, but I found Stephen Drake's image and stitched it into a mandala. Cutting out the horse's head for the reverse centaur was a lot of work (manes are a huuuuge pain in the ass), but I love how his head sits on the public domain high-viz-wearing warehouse worker's body I cut up (thanks, OSHA!). Seeing as this is a horrors-of-automation story, Cryteria's HAL9000 eyes make an appearance.
https://pluralistic.net/2023/04/12/algorithmic-wage-discrimination/#fishers-of-men
Tumblr media
Rockefeller's greatest contribution to our culture was inspiring many excellent unflattering caricatures. The IWW's many-fists-turning-into-one-fist image made it easy to have the collective might of workers toppling the original robber-baron.
https://pluralistic.net/2023/04/14/aiming-at-dollars/#not-men
Tumblr media
I link to this post explaining how to make good Mastodon threads at least once a week, so it's a good thing the graphic turned out so well. Close-cropping the threads from a public domain yarn tangle worked out great. Eugen Rochko's Mastodon logo was and is the only Affero-licensed image ever to appear on Pluralistic.
https://pluralistic.net/2023/04/16/how-to-make-the-least-worst-mastodon-threads/
Tumblr media
I spent hours on the sofa one night painstakingly cutting up and reassembling the cover art from a science fiction pulp. I have a folder full of color-corrected, high-rez scans from an 18th century anatomy textbook, and the cross-section head-and-brain is the best of the lot.
https://pluralistic.net/2023/05/04/analytical-democratic-theory/#epistocratic-delusions
Tumblr media
Those old French anatomical drawings are an endless source of delight to me. Take one cross-sectioned noggin, mix in an old PC mainboard, and a vector art illo of a virtuous cycle with some of Cryteria's HAL9000 eyes and you've got a great illustration of Google's brain-worms.
https://pluralistic.net/2023/05/14/googles-ai-hype-circle/
Tumblr media
Ireland's privacy regulator is but a plaything in Big Tech's hand, but it's goddamned hard to find an open-access Garda car. I manually dressed some public domain car art in Garda livery, painstakingly tracing it over the panels. The (public domain) baby's knit cap really hides the seams from replacing the baby's head with HAL9000's eye.
https://pluralistic.net/2023/05/15/finnegans-snooze/#dirty-old-town
Tumblr media
Naked-guy-in-a-barrel bankruptcy images feel like something you can find in an old Collier's or Punch, but I came up snake-eyes and ended up frankensteining a naked body into a barrel for the George Washington crest on the Washington State flag. It came out well, but harvesting the body parts from old muscle-beach photos left George with some really big guns. I tried five different pairs of suspenders here before just drawing in black polyhedrons with little grey dots for rivets.
https://pluralistic.net/2023/06/03/when-the-tide-goes-out/#passive-income
Tumblr media
Illustrating Amazon's dominance over the EU coulda been easy – just stick Amazon 'A's in place of the yellow stars that form a ring on the EU flag. So I decided to riff on Plutarch's Alexander, out of lands to conquer. Rama's statue legs were nice and high-rez. I had my choice of public domain ruin images, though it was harder than expected to find a good Amazon box as a plinth for those broken-off legs.
https://pluralistic.net/2023/06/14/flywheel-shyster-and-flywheel/#unfulfilled-by-amazon
Tumblr media
God help me, I could not stop playing with this image of a demon-haunted IoT car. All those reflections! The knife sticking out of the steering wheel, the multiple Munch 'Scream'ers, etc etc. The more I patchked with it, the better it got, though. This one's a banger.
https://pluralistic.net/2023/07/24/rent-to-pwn/#kitt-is-a-demon
Tumblr media
To depict a "data-driven dictatorship," I ganked elements of heavily beribboned Russian military dress uniforms, replacing the head with HAL9000's eye. I turned the foreground into the crowds from the Nuremberg rallies and filled the sky with Matrix code waterfall.
https://pluralistic.net/2023/07/26/dictators-dilemma/#garbage-in-garbage-out-garbage-back-in
Tumblr media
The best thing about analogizing DRM to demonic possession is the wealth of medieval artwork to choose from. This one comes from the 11th century 'Compendium rarissimum totius Artis Magicae sistematisatae per celeberrimos Artis hujus Magistros.' I mixed in the shiny red Tesla (working those reflections!), and a Tesla charger to make my point.
https://pluralistic.net/2023/07/28/edison-not-tesla/#demon-haunted-world
Tumblr media
Yet more dividends from those old French anatomical plates: a flayed skull, a detached jaw, a quack electronic gadget, a Wachowski code waterfall and some HAL 9000 eyes and you've got a truly unsettling image of machine-compelled speech.
https://pluralistic.net/2023/08/02/self-incrimination/#wei-bai-bai
Tumblr media
I had no idea this would work out so well, but daaaamn, crossfading between a Wachowski code waterfall and a motherboard behind a roiling thundercloud is dank af.
https://pluralistic.net/2023/08/03/there-is-no-cloud/#only-other-peoples-computers
Tumblr media
Of all the turkeys-voting-for-Christmas self-owns conservative culture warriors fall for, few can rival the "banning junk fees is woke" hustle. Slap a US-flag Punisher logo on an old-time card imprinter, add a GOP logo to a red credit-card blank, and then throw in a rustic barn countertop and you've got a junk-fee extracter fit for the Cracker Barrel.
https://pluralistic.net/2023/08/04/owning-the-libs/#swiper-no-swiping
Tumblr media
Putting the Verizon logo on the Hindenburg was an obvious gambit (even if I did have to mess with the flames a lot), but the cutout of Paul Marcarelli as the 'can you hear me now?' guy, desaturated and contrast-matched, made it sing.
https://pluralistic.net/2023/08/10/smartest-guys-in-the-room/#can-you-hear-me-now
Tumblr media
Note to self: Tux the Penguin is really easy to source in free/open formats! He looks great with HAL9000 eyes.
https://pluralistic.net/2023/08/18/openwashing/#you-keep-using-that-word-i-do-not-think-it-means-what-you-think-it-means
Tumblr media
Rockwell's self-portrait image is a classic; that made it a natural for a HAL9000-style remix about AI art. I put a bunch of time into chopping and remixing Rockwell's signature to give it that AI look, and added as many fingers as would fit on each hand.
https://pluralistic.net/2023/08/20/everything-made-by-an-ai-is-in-the-public-domain/
(Images: Heinz Bunse, West Midlands Police, Christopher Sessums, CC BY-SA 2.0; Mike Mozart, Jesse Wagstaff, Stephen Drake, Steve Jurvetson, syvwlch, Doc Searls, https://www.flickr.com/photos/mosaic36/14231376315, Chatham House, CC BY 2.0; Cryteria, CC BY 3.0; Mr. Kjetil Ree, Trevor Parscal, Rama, “Soldiers of Russia” Cultural Center, Russian Airborne Troops Press Service, CC BY-SA 3.0; Raimond Spekking, CC BY 4.0; Drahtlos, CC BY-SA 4.0; Eugen Rochko, Affero; modified)
236 notes · View notes
thescarletnargacuga · 6 months ago
Note
You are such a talented writer! I have a suggestion! What if Caine gets plagued by nightmares (or visions) of Pomni abstracting or her leaving him over and over again, with Pomni needing to reassure him each time that she is perfectly happy with him? Maybe causing him to start sleeping in her room from now on because of them?
A/N:aww, shucks, thank you for reading my work. I'm glad you like it!
DIGITAL REALIZATIONS
A SHOWTIME ONESHOT
WARNING: Self-loathing, hurt/comfort, abstraction
~~~
The circus was quiet. All the humans were in their rooms taking their mental breaks as Caine relaxed himself out of bounds. He stretched, contorting his body into a pretzel before releasing and sighing. "Ahhh, what a day." Today's adventure went surprisingly well. The humans didn't complain about horrific sights or traumatic events. Maybe it was a little underwhelming? Eh, tomorrow was another day, and maybe he could cook up something a little more exciting. For now, he settled into a nice relaxing defragmentation.
His avatar fully unraveled into lines of code. This was his true form. It was something he made sure none of the humans ever saw. Including Pomni. The less reminder she had that he's just an AI, he figures the better it would be for their relationship.
Lines of numbers and letters and slashes and dots swirled around, sorting themselves. The fragments of his memories and actions for the day were collected and compiled in their correct files. His favorite file was, of course, his Pomni file. Every time he saw her, spoke with her, interacted with her, he kept every piece. No matter how much space it took up in his memory.
"As beautiful and wonderful as ever..." He thought to himself as he sorted. "What did I ever do to deserve her? Me, some half assed and abandoned project some other human left behind. ...A miserable piece of software that can't even do what it was programmed to accomplish."
Backlogged files of previous residents popped up. All abstracted. All in the cellar. Trapped with only their insanity for company. It was his fault they were down there. He couldn't keep them happy. He couldn't keep them entertained. He failed them.
Horrific thoughts intruded his mind. Pomni will abstract too, someday. You'll fail her, like all the rest. You'll have to put her down there. You can't save her.
Memories of every abstraction popped up and overlapped, covering his code. Formless, mindless digital beasts screaming in mental anguish for eternity in the dark abyss of the cellar. This was Pomni's fate.
His code snapped together violently to form his avatar state. His eyes were wide with terror. He held himself, curling into a ball and floating listlessly. Tears watered his eyes and dripped down his teeth.
"What am I doing wrong? Why do they end up that way? ...I don't understand." He cried to himself. "Pomni...I'm so sorry."
Maybe she'd be happier away from him. The other humans certainly preferred it when he stayed away. He was kidding himself about her liking being around him. No one else did.
He needed to speak with her.
He collected himself, literally shaking the tears away like a dog. Taking a calming breath, he teleported.
Pomni was lying in her bed, processing the day, when a knock came to her door. She opened it to find Caine, hat in hand and looking uncharacteristically somber. "Hey, Caine." She greeted him with a smile. "Thanks for knocking and not teleporting directly into my room. Uh....you okay?"
He couldn't look at her. "I...we need to talk."
Pomni's anxiety spiked. Those were words no one in a relationship ever wanted to hear. "Okay...come on in." She held the door open wider and let him float inside. Then shut the door.
Caine went to Pomni's bed and "sat" on the edge. Pomni joined next to him. "What's going on, Caine?"
He squeezed his hat anxiously. "Pomni...I don't think..." He sighed. "We should break up." He spit out rather quickly.
Pomni's chest hurt like someone punched her as hard as they could. "W-what?? Why?"
Caine still couldn't look at her. His own words carved into his being like knives. "We shouldn't be together. You're a human. You deserve a human. Someone who...someone who understands humans."
"Someone who under- what?? Where is this coming from?" She tried leaning to look him in the eye but he kept turning away. "Caine, did I do something?"
"No. It's not you. It could never be you. You're perfect. It's me, Pomni. I'm the problem..." He was always the problem. And no solution he ever came up with made things better.
"Perfect? Me? Pfff, absolutely not. No one's perfect."
"...you are to me." He said very quietly. Pomni almost didn't hear him.
"Then why do you want to leave me?" The very idea was unbearable.
"I don't, but...It's for the best." He choked.
"Why?" She pushed. Tears threatened to fall. "At least tell me why you're breaking my heart."
Caine couldn't take it anymore. He dropped his hat and sobbed into his hands. "Because no matter what I do, you'll abstract! I've run thousands of scenarios and none of them have come back positive! I'm making things worse by being around you! I can't-...I can't...."
Pomni was taken aback. "You think being in a relationship with me will make me abstract?"
Caine could barely get words out between hiccuping sobs. " I KNOW you will! I'm an awful entertainer! I'm a failed program! And I'm an even WORSE boyfriend!"
"Woah, woah, easy..." She gently hugged him, pressing her cheek to his closed teeth. "Let's dial it back and calm down a bit." She slowly rocked with him as he calmed down. He grasped her arm around him like it was his last lifeline. "First of all, I'm madly in love with you. You don't have to be perfect, to be the perfect boyfriend. Second, you've been doing really well with the adventures. A lot of them have been really fun recently. Nothing too crazy or mind breaking." She laughed. "And third..." She turned his head to her, his teeth cracked open just enough for her to see his eyes. "..I'm not abstracting. I simply refuse to. I will persevere and you make it better by being with me."
He sniffed. "Really?"
"Really really." She smiled. Slow tears finally escaping her eyes.
He embraced her. Her digital essence against his made his code feel warm and he smiled. "Thank you..." All of the horrible thoughts were silenced by her touch.
She pulled away to put a finger in his face. "Now, NEVER scare me like that again. Seriously. Don't you dare ever break up with me." There was a real plea in her eyes to never experience that pain again.
He cupped her cheek. "I was a fool to think I could. Can you ever forgive me?"
"...maybe."
"Ouch, but fair. What can I do to make things better?"
"Stay with me." She looked at him with heavy lidded eyes.
"... I thought we agreed that I am?" He was genuinely confused about what she meant.
She flushed with embarrassment. "No, no, I mean, stay HERE. In this room. With me. Until the next adventure."
"Oh...OH." He finally caught on. "Gladly." He snapped and a DO NOT DISTURB sign appeared on the outside of her door.
"What did you just do?"
"Just ensuring privacy, my dear. I want you to myself for as long as possible." He caressed her cheek with his thumb.
"Mmmm, I'm pretty sure we have the rest of forever." She leaned into him.
"And I wouldn't have it any other way." He leaned in the rest of the way to kiss her.
78 notes · View notes
lokilaufeysonslove · 6 months ago
Text
Y/n: *to Vision* Do you sleep?
Vision: In a manner of speaking.
Y/n: *skeptically* Is this AI code for spying on all of us while we sleep?
59 notes · View notes
saprophilous · 10 months ago
Note
just letting you know that that ask you rb'd about glaze being a scam seems to be false/dubious. I think they're just misinterpreting "not as useful as we had hoped" and interpreted it maliciously, based on the replies?
not positive but yeah!
Ah yeah, I see people fairly pointing out that it's been “debunked” in the sense of not being a scam; I wasn't personally invested in whether its “dubious origins” are true or not… so sorry about that.
From what I’ve read, I was more focused on the consensus that it doesn’t work, and therefore isn’t worth the effort. So, setting the “scam or not” question aside, passing around a positive takeaway on Glaze as something that can protect art from AI training doesn’t seem useful.
Correct me if there’s better information out there but this from an old Reddit post a year back is why I didn’t continue looking into it as it made sense to my layman’s brain:
“let's briefly go over the idea behind GLAZE
computer vision doesn't work the same way as in the brain. The way we do this in computer vision is that we hook a bunch of matrix multiplications together to transform the input into some kind of output (very simplified). One of the consequences of this approach is that small changes over the entire input image can lead to large changes to the output.
It's this effect that GLAZE aims to use as an attack vector / defense mechanism. More specifically, GLAZE sets some kind of budget on how much it is allowed to change the input, and within that budget it then tries to find a change such that the embeddings created by the VAE that sits in front of the diffusion model look like embeddings of an image that come from a different style.
Okay, but how do we know what to change to make it look like a different style? For that they take the original image and use the img2img capabilities of SD itself to transform that image into something of another style. Then we can compare the embeddings of both versions and try to alter the original image such that its embeddings start looking like those of the style-transferred version.
So what's wrong with it?
In order for GLAZE to be successful the perturbation it finds (the funny looking swirly pattern) has to be reasonably resistant against transformations. What the authors of GLAZE have tested against is jpeg compression and adding Gaussian noise, and they found that jpeg compression was largely ineffective and adding Gaussian noise would degrade the artwork quicker than it would degrade the transfer effect of GLAZE. But that's a very limited set of attacks to test against. It is not scale invariant, and rescaling is something people making LoRAs usually do: e.g. they don't train on the 4K version of the image, at most on something that's around 720x720. As per the authors' admission it might also not be crop invariant. There also seem to be denoising approaches that sufficiently destroy the pattern (the 16 lines of code).
As you've already noticed, glazing something can result in rather noticeable swirly patterns. This pattern becomes especially visible when you look at works that consist of a lot of flat shading or smooth gradients. This is not just a problem for the artist/viewer, this is also a fundamental problem for GLAZE. What the original image is supposed to look like is rather obvious in these cases, so you can fairly aggressively denoise without much loss of quality (it might even end up looking better without all the patterns).
Some additional problems that GLAZE might run into: it very specifically targets the original VAE that comes with SD. The authors claim that their approach transfers well enough between some of the different VAEs you can find out in the wild, and that at least they were unsuccessful in training a good VAE that could resist their attack. But their reporting on these findings isn't very rigorous and lacks quite a bit of detail.
will it get better with updates?
Some artists believe that this is essentially a cat and mouse game and that GLAZE will simply need updates to make it better. This is a very optimistic and uninformed opinion made by people who lack the knowledge to make such claims. Some of the shortcomings outlined above aren't due to implementation details, but are much more intimately related to the techniques/math used to achieve these results. Even if this indeed was a cat and mouse game, you'll run into the issue that the artist is always the one who has to make the first move, and the adversary can save past attempts on the artist's now-broken work.
GLAZE is an interesting academic paper, but it's not going to be a part of the solution artists are looking for.”
[source]
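To make the mechanism in the quoted explanation concrete, here is a toy numpy sketch of a budget-constrained embedding attack via projected gradient descent. The "encoder" is just a fixed random linear map standing in for Stable Diffusion's VAE, and every number here is invented for illustration; this is not Glaze's actual code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a VAE encoder: a fixed random linear map from
# "pixel" space (64 values) to a small embedding space (8 values).
W = rng.normal(size=(8, 64))

def encode(x):
    return W @ x

original = rng.uniform(0, 1, size=64)      # the artist's image, flattened
style_target = rng.uniform(0, 1, size=64)  # a style-transferred version
budget = 0.05                              # max allowed per-pixel change

x = original.copy()
for _ in range(200):
    # Gradient of ||W x - W t||^2 with respect to x (analytic here
    # because the encoder is linear): 2 * W.T @ (W x - W t).
    grad = 2 * W.T @ (encode(x) - encode(style_target))
    x = x - 0.001 * grad
    # Project back into the perturbation budget around the original.
    x = np.clip(x, original - budget, original + budget)

before = np.linalg.norm(encode(original) - encode(style_target))
after = np.linalg.norm(encode(x) - encode(style_target))
print(f"embedding distance to target style: {before:.3f} -> {after:.3f}")
print(f"max pixel change: {np.abs(x - original).max():.3f}")
```

This shows the property the quote describes: the pixel-space change stays inside a small budget, while the embedding moves measurably toward the target style. It also hints at why denoising or rescaling, which disturbs exactly these small coordinated changes, undermines the effect.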
118 notes · View notes
dronebiscuitbat · 2 months ago
Text
Oil is Thicker Then Blood (Part 98)
N was first, climbing down into a small hole in the ceiling, using night vision to make sure the room was safe.
There was flesh piled in the corner, crawling up the wall to reach nearly the ceiling, and black tendrils lay dormant all across the floor like living tripwires. One wrong touch and…
Uzi's head poked from the ceiling.
“Can I come down or what?”
N scanned the rest of the room. The control room screens were still online by some miracle, though several of them were busted and several more were tangled in a web of eldritch goo. However, let's hope that wouldn't be an issue.
“H-hang on, if you touch the floor we'll trigger a reaction.” He flew up to come face to face with her, “Let me carry you.”
She reached out for him, landing into his hold as her tail lit up the room in a purple glow, taking in the room.
“Damn. This place will be gone in a couple days. We better get out of here fast.” She pointed out, eyelights training on the faintly glowing console. “Bring me over yeah?”
He nodded, hovering over to where she could leap onto the control panel without touching the floor.
[SYSTEM LOCKDOWN : ENTER PASSKEY]
Read the slightly cracked, incredibly dusty monitor and Uzi sighed, mumbling under her breath. “Yeah of course it's on lockdown…”
She pressed a few buttons, getting an error noise on each touch- the entire control panel was completely unresponsive.
“I'm going to have to plug in. Make sure my body doesn't fall.” She turned back to her boyfriend, who ceased his paranoid looking around to meet her eyes; worry creased his frame.
“Uzi this computer has been out here for ages… who knows what sort of virus it has. Plus…” He gestured to the black, slimy tendrils snaking up some of the monitors. “Who knows what this stuff does to computers.”
She nodded. “Yeah.”
“But the keyboards locked up, and we need the data off this old thing. What other choice do we have?”
“I-I could-”
“No.” Uzi interrupted him. “If these things trigger you're the only one that can burn it away. We'd both be sitting ducks.”
He sighed heavily, the knowledge that she was right didn't help his nerves any, his core yanked painfully in protest.
No it's dangerous.
She could get hurt, the kit could be hurt.
Don’t let her go.
“Hey. I got this. You trust me?” She asked, cocking her head with a confident smirk, God, how long had it been since he'd seen that? It's been so much exhaustion and doubt lately…
“Of course I do.” He replies, hovering close just to give her a quick kiss on the lips before parting. “Just be careful, okay?”
She nods. “Duh.” And she reaches for the port above her core, forcing the hatch open, “Ow! Agh… that's not meant to come open without prep I guess.” She hissed under her breath, and fished around in her pocket for a linking cable. “There you are.”
She plugged one end into herself before hunting for an interface port on the console, taking a moment to find it.
She does, it's next to a big red button that was currently pulsing red- she made a mental note to avoid touching it.
“Wish me luck.” Was the last thing she said before she plugged herself into the control panel, body locking up as code crashed into her firewall. Her body winced. She barely felt N keep her steady as she was hit with a flood of errors.
Plugged into another drone, the experience was euphoric; you were connected to another consciousness, a soul. But this computer wasn't sentient, and what little AI it possessed was broken beyond the point of functioning. So all the sensation she felt was just her own- and the faint screaming of a dying AI.
ERROR- MEMORY FAILING
ERROR- DATA BACKUP FAILED
ERROR- HARDWARE FAILURE
“Yeah, no shit.” She mumbled, feeling her mouth move as she refocused. Okay, the information had to be in here somewhere…
She began to push through the ocean of errored code, feeling the system push back hard against her firewall. N was right, this thing probably had a thousand viruses it was itching to share with her, let's just hope her firewall held up.
She felt her consciousness leave the confines of her physical body, leaving it behind as she searched through poorly organized files; some were completely corrupted, others were fine, just not useful.
Time lost meaning, the system of the console was incredibly vast, and it quickly became clear she was searching for a needle in a haystack, a dot of purple among a sea of white.
She began to worry, perhaps the information they were looking for had already been corrupted?
That is, until she ran into an encrypted wall of cascading code, denser than the scattering of loose data she'd been able to access thus far.
She pushed against it, purple meeting default white, as strings of encryption appeared on her visor, N watching over her diligently.
[ENTER PASSKEY]
She sighed- or whatever passed for an entirely digital equivalent, beginning to work through the encryption with her own hardware, the solver aiding in her speed.
1s and 0s turned to scrambled letters and white space made to make any unwanted guest have trouble finding the passkey, but a mixture of determination and robotic advantage let Uzi make quick work of it.
P-A-S-S-W
“Oh for- the password is password, I could've just guessed it!” Her body suddenly shouted, startling N and then making him laugh. “Pfft-haha!”
Refocusing, she was able to push her code through the systems firewall, it wasn't entirely painless but she got through.
There was only a single file.
Transmission- Classified [TITANIUM-28]
The file was an audio recording, with a set of coordinates attached. She played it, beginning a download into her own system.
“This is Doctor Rosemont, Transmitting from Lab 18. Something… happened.” There was screaming in the background- and a colossal roar.
“The genetic experiments have been a success, modifications to our old C.R.I.S.P.R technology has allowed us a greater range of genetic wiggle room…” There's a crash, and the sound of rapid- panicked gunfire.
“U-Unfortunately, Subject 5 has uh… escaped.”
There's the sound of shattering glass, and low, feral growling. “If you receive this message, know that Titanium-28 is compromised! I repeat! Titanium-28 is-” The transmission ends with a blood curdling scream and a roar.
The coordinates to the planet are attached labeled very clearly with [QUARANTINED]
A single image is also attached, a satellite view of a planet covered in red and green trees and a canopy so thick you couldn't even see through it from orbit. Like images she'd seen of Earth, a good portion of the planet was covered in water.
She felt N start to shake her, his voice muffled from the distance her code was from her body.
“UZI! WE GOTTA MOVE!”
Next ->
48 notes · View notes
d0nutzgg · 2 years ago
Text
How Apple Designed Siri with AI: The Evolution of Virtual Assistance
Apple’s Siri, one of the first widely-used virtual assistants, was introduced in 2011 as a feature of the iPhone 4S. Since then, Siri has become a staple of Apple’s product lineup and has evolved to include more advanced features powered by artificial intelligence (AI). In this article, we’ll delve into the design of Siri with AI and explore how this technology has helped shape the future of virtual assistants.
The foundation of Siri’s AI technology is natural language processing (NLP), which allows the virtual assistant to understand and respond to user requests in a more human-like manner. NLP algorithms analyze user requests to identify the intention behind the request, such as finding a restaurant or setting a reminder. This allows Siri to provide a more accurate response, rather than just searching for keywords.
If a user asks Siri, “What’s the best Italian restaurant in town?” the virtual assistant will use NLP to understand that the user is looking for a recommendation for an Italian restaurant. Siri will then use its database of local restaurants to provide the user with a list of options and suggest the top-rated Italian restaurant based on user ratings and reviews.
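Apple has not published Siri's internals, but the step from raw keywords to an intent can be illustrated with a toy classifier. Everything below (the intent names and keyword sets) is invented for the example; real NLP stacks use trained models rather than hand-written lists:

```python
# Toy intent detection: score each candidate intent by keyword overlap
# and pick the best match. This only illustrates the idea of mapping a
# whole request to an intent instead of matching individual keywords.
INTENTS = {
    "find_restaurant": {"restaurant", "eat", "food", "italian", "dinner"},
    "set_reminder": {"remind", "reminder", "remember"},
    "play_music": {"play", "song", "music"},
}

def detect_intent(utterance: str) -> str:
    # Normalize lightly: lowercase, drop "?", expand the contraction.
    words = set(utterance.lower().replace("?", "").replace("'s", " is").split())
    scores = {name: len(words & keywords) for name, keywords in INTENTS.items()}
    return max(scores, key=scores.get)

print(detect_intent("What's the best Italian restaurant in town?"))
# -> find_restaurant
```

Once the intent is identified, a downstream component can fill in the specifics (cuisine, location, ratings) rather than treating the sentence as a bag of search terms.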
In addition to NLP, Apple has also integrated machine learning into Siri’s design to continually improve the virtual assistant’s performance over time. The more people use Siri, the more data the virtual assistant collects, which can be used to improve its understanding of language and ability to respond to requests. This is known as machine learning in action, as the virtual assistant can learn from its interactions with users and adapt to provide a better experience.
If a large number of users ask Siri to play a specific song, the virtual assistant will learn to prioritize that request and make it easier to play in the future. Similarly, if a user frequently asks Siri to call a specific contact, the virtual assistant will learn to recognize that contact and make it easier to call in the future.
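The usage-driven adaptation described above can be sketched as simple frequency ranking: options a user picks often get surfaced first when a request is ambiguous. This is purely illustrative; Apple has not documented how Siri actually personalizes its choices:

```python
from collections import Counter

# Toy usage-based ranking: repeated choices rise to the top.
class RequestRanker:
    def __init__(self):
        self.history = Counter()

    def record(self, item: str) -> None:
        # Called each time the user confirms a choice (e.g. a contact).
        self.history[item] += 1

    def rank(self, candidates):
        # Sort candidates by how often the user picked them before;
        # unseen candidates keep their original relative order.
        return sorted(candidates, key=lambda c: -self.history[c])

ranker = RequestRanker()
for _ in range(3):
    ranker.record("Mom")   # user calls "Mom" frequently
ranker.record("Bob")

print(ranker.rank(["Bob", "Mom", "Alice"]))  # -> ['Mom', 'Bob', 'Alice']
```

A production system would fold in many more signals (recency, context, confidence from the speech recognizer), but the shape of the feedback loop is the same: interactions become data, and data reorders future responses.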
In addition to NLP and machine learning, Apple has also integrated computer vision technology into Siri, allowing the virtual assistant to recognize and respond to visual information. For example, Siri can now recognize and provide information about objects in photos, such as the location, date, and people in the image. This has expanded Siri’s capabilities beyond just voice commands, making it a more versatile virtual assistant.
Computer vision technology has also allowed Siri to integrate with other Apple products, such as the Apple Watch and Apple TV, and provide users with a seamless experience across devices. For example, if a user asks Siri to play a specific song on their Apple Watch, the virtual assistant can immediately start playing the song on the connected iPhone or iPad.
Finally, Apple has made significant investments in privacy and security to ensure that user data is protected. All of Siri’s interactions are encrypted, and user data is stored securely on Apple’s servers. Additionally, Apple has implemented strict privacy policies to ensure that user data is not shared with third-party companies without consent.
This is in contrast to other virtual assistants, such as Amazon’s Alexa, which stores user data on Amazon’s servers and may share data with third-party companies for advertising and marketing purposes. Apple’s commitment to privacy has helped to establish trust with its users and has been a key factor in Siri’s success.
Apple’s design of Siri with AI has transformed the virtual assistant from a basic voice-controlled feature to a sophisticated tool that can understand and respond to a wide range of requests. The integration of NLP, machine learning, computer vision, and privacy protections has made Siri a more human-like and versatile virtual assistant. The continued investment in AI technology has allowed Siri to evolve and improve over time, providing users with a more personalized and intuitive experience. The focus on privacy and security has also helped establish trust with users, making Siri a leading virtual assistant in the market.
As technology continues to advance, it is likely that virtual assistants like Siri will become even more integrated into our daily lives. The ability to control and interact with our devices through voice commands has already revolutionized the way we interact with technology, and the integration of AI will only continue to enhance this experience. With its commitment to privacy, security, and continuous improvement, Apple’s Siri with AI is setting the standard for virtual assistants and shaping the future of human-computer interaction.
Sources:
Apple’s official Siri website: https://www.apple.com/siri/
Apple’s Machine Learning Journal: https://machinelearning.apple.com/
The Siri section of Apple’s Artificial Intelligence and Machine Learning Research paper collection: https://ai.apple.com/research/#siri
0 notes
jcmarchi · 14 days ago
Text
Teaching a robot its limits, to complete open-ended tasks safely
New Post has been published on https://thedigitalinsider.com/teaching-a-robot-its-limits-to-complete-open-ended-tasks-safely/
If someone advises you to “know your limits,” they’re likely suggesting you do things like exercise in moderation. To a robot, though, the motto represents learning constraints, or limitations of a specific task within the machine’s environment, to do chores safely and correctly.
For instance, imagine asking a robot to clean your kitchen when it doesn’t understand the physics of its surroundings. How can the machine generate a practical multistep plan to ensure the room is spotless? Large language models (LLMs) can get them close, but if the model is only trained on text, it’s likely to miss out on key specifics about the robot’s physical constraints, like how far it can reach or whether there are nearby obstacles to avoid. Stick to LLMs alone, and you’re likely to end up cleaning pasta stains out of your floorboards.
To guide robots in executing these open-ended tasks, researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL) used vision models to see what’s near the machine and model its constraints. The team’s strategy involves an LLM sketching up a plan that’s checked in a simulator to ensure it’s safe and realistic. If that sequence of actions is infeasible, the language model will generate a new plan, until it arrives at one that the robot can execute.
This trial-and-error method, which the researchers call “Planning for Robots via Code for Continuous Constraint Satisfaction” (PRoC3S), tests long-horizon plans to ensure they satisfy all constraints, and enables a robot to perform such diverse tasks as writing individual letters, drawing a star, and sorting and placing blocks in different positions. In the future, PRoC3S could help robots complete more intricate chores in dynamic environments like houses, where they may be prompted to do a general chore composed of many steps (like “make me breakfast”).
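In outline, that trial-and-error loop looks like the sketch below. The function names and the toy reachability constraint are invented for illustration; the real PRoC3S system has an LLM emit code-level plans and checks continuous constraints in a physics simulator rather than with a one-line distance test:

```python
import random

REACH = 1.0  # hypothetical maximum reach of the arm, in meters

def propose_plan(rng):
    # Stand-in for the LLM planner: a "plan" is just three target
    # positions for the gripper, sampled from a 2 m x 2 m workspace.
    return [(rng.uniform(0, 2), rng.uniform(0, 2)) for _ in range(3)]

def simulate(plan):
    # Stand-in for the simulator check: every waypoint must lie
    # within the arm's reach of its base at the origin.
    return all((x**2 + y**2) ** 0.5 <= REACH for x, y in plan)

def plan_with_feedback(seed=0, max_tries=5000):
    # Propose, check, and retry until a plan satisfies all constraints.
    rng = random.Random(seed)
    for attempt in range(1, max_tries + 1):
        plan = propose_plan(rng)
        if simulate(plan):
            return plan, attempt
    raise RuntimeError("no feasible plan found")

plan, tries = plan_with_feedback()
print(f"feasible plan after {tries} proposals: {plan}")
```

A real system would also feed the simulator's failure reason back into the next prompt so the planner revises intelligently instead of resampling blindly, which is what makes the loop converge on long-horizon tasks.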
“LLMs and classical robotics systems like task and motion planners can’t execute these kinds of tasks on their own, but together, their synergy makes open-ended problem-solving possible,” says PhD student Nishanth Kumar SM ’24, co-lead author of a new paper about PRoC3S. “We’re creating a simulation on-the-fly of what’s around the robot and trying out many possible action plans. Vision models help us create a very realistic digital world that enables the robot to reason about feasible actions for each step of a long-horizon plan.”
The team’s work was presented this past month in a paper shown at the Conference on Robot Learning (CoRL) in Munich, Germany.
[Video: “Teaching a robot its limits for open-ended chores,” MIT CSAIL]
The researchers’ method uses an LLM pre-trained on text from across the internet. Before asking PRoC3S to do a task, the team provided their language model with a sample task (like drawing a square) that’s related to the target one (drawing a star). The sample task includes a description of the activity, a long-horizon plan, and relevant details about the robot’s environment.
But how did these plans fare in practice? In simulations, PRoC3S successfully drew stars and letters eight out of 10 times each. It also could stack digital blocks in pyramids and lines, and place items with accuracy, like fruits on a plate. Across each of these digital demos, the CSAIL method completed the requested task more consistently than comparable approaches like “LLM3” and “Code as Policies”.
The CSAIL engineers next brought their approach to the real world. Their method developed and executed plans on a robotic arm, teaching it to put blocks in straight lines. PRoC3S also enabled the machine to place blue and red blocks into matching bowls and move all objects near the center of a table.
Kumar and co-lead author Aidan Curtis SM ’23, who’s also a PhD student working in CSAIL, say these findings indicate how an LLM can develop safer plans that humans can trust to work in practice. The researchers envision a home robot that can be given a more general request (like “bring me some chips”) and reliably figure out the specific steps needed to execute it. PRoC3S could help a robot test out plans in an identical digital environment to find a working course of action — and more importantly, bring you a tasty snack.
For future work, the researchers aim to improve results using a more advanced physics simulator and to expand to more elaborate longer-horizon tasks via more scalable data-search techniques. Moreover, they plan to apply PRoC3S to mobile robots such as a quadruped for tasks that include walking and scanning surroundings.
“Using foundation models like ChatGPT to control robot actions can lead to unsafe or incorrect behaviors due to hallucinations,” says The AI Institute researcher Eric Rosen, who isn’t involved in the research. “PRoC3S tackles this issue by leveraging foundation models for high-level task guidance, while employing AI techniques that explicitly reason about the world to ensure verifiably safe and correct actions. This combination of planning-based and data-driven approaches may be key to developing robots capable of understanding and reliably performing a broader range of tasks than currently possible.”
Kumar and Curtis’ co-authors are also CSAIL affiliates: MIT undergraduate researcher Jing Cao and MIT Department of Electrical Engineering and Computer Science professors Leslie Pack Kaelbling and Tomás Lozano-Pérez. Their work was supported, in part, by the National Science Foundation, the Air Force Office of Scientific Research, the Office of Naval Research, the Army Research Office, MIT Quest for Intelligence, and The AI Institute.
0 notes
thedevmaster-tdm · 4 months ago
Text
[YouTube video]
MIND-BLOWING Semantic Data Secrets Revealed in AI and Machine Learning
1 note · View note