#Data science algorithms and models
What is Data Science? Introduction, Basic Concepts & Process
What is data science? Complete information about data science, from beginner to advanced: what data science is and what it involves, from analyzing data to storing it and working with databases.
MIT researchers introduce Boltz-1, a fully open-source model for predicting biomolecular structures
MIT scientists have released a powerful, open-source AI model, called Boltz-1, that could significantly accelerate biomedical research and drug development.
Developed by a team of researchers in the MIT Jameel Clinic for Machine Learning in Health, Boltz-1 is the first fully open-source model that achieves state-of-the-art performance at the level of AlphaFold3, the model from Google DeepMind that predicts the 3D structures of proteins and other biological molecules.
MIT graduate students Jeremy Wohlwend and Gabriele Corso were the lead developers of Boltz-1, along with MIT Jameel Clinic Research Affiliate Saro Passaro and MIT professors of electrical engineering and computer science Regina Barzilay and Tommi Jaakkola. Wohlwend and Corso presented the model at a Dec. 5 event at MIT's Stata Center, where they said their ultimate goal is to foster global collaboration, accelerate discoveries, and provide a robust platform for advancing biomolecular modeling.
"We hope for this to be a starting point for the community," Corso said. "There is a reason we call it Boltz-1 and not Boltz. This is not the end of the line. We want as much contribution from the community as we can get."
Proteins play an essential role in nearly all biological processes. A protein's shape is closely connected with its function, so understanding a protein's structure is critical for designing new drugs or engineering new proteins with specific functionalities. But because of the extremely complex process by which a protein's long chain of amino acids is folded into a 3D structure, accurately predicting that structure has been a major challenge for decades.
DeepMind's AlphaFold2, which earned Demis Hassabis and John Jumper the 2024 Nobel Prize in Chemistry, uses machine learning to rapidly predict 3D protein structures that are so accurate they are indistinguishable from those experimentally derived by scientists. This open-source model has been used by academic and commercial research teams around the world, spurring many advancements in drug development.
AlphaFold3 improves upon its predecessors by incorporating a generative AI model, known as a diffusion model, which can better handle the amount of uncertainty involved in predicting extremely complex protein structures. Unlike AlphaFold2, however, AlphaFold3 is not fully open source, nor is it available for commercial use, which prompted criticism from the scientific community and kicked off a global race to build a commercially available version of the model.
For their work on Boltz-1, the MIT researchers followed the same initial approach as AlphaFold3, but after studying the underlying diffusion model, they explored potential improvements. They incorporated those that boosted the model's accuracy the most, such as new algorithms that improve prediction efficiency.
Along with the model itself, they open-sourced their entire pipeline for training and fine-tuning so other scientists can build upon Boltz-1.
"I am immensely proud of Jeremy, Gabriele, Saro, and the rest of the Jameel Clinic team for making this release happen. This project took many days and nights of work, with unwavering determination to get to this point. There are many exciting ideas for further improvements and we look forward to sharing them in the coming months," Barzilay says.
It took the MIT team four months of work, and many experiments, to develop Boltz-1. One of their biggest challenges was overcoming the ambiguity and heterogeneity contained in the Protein Data Bank, a collection of all biomolecular structures that thousands of biologists have solved in the past 70 years.
"I had a lot of long nights wrestling with these data. A lot of it is pure domain knowledge that one just has to acquire. There are no shortcuts," Wohlwend says.
In the end, their experiments show that Boltz-1 attains the same level of accuracy as AlphaFold3 on a diverse set of complex biomolecular structure predictions.
"What Jeremy, Gabriele, and Saro have accomplished is nothing short of remarkable. Their hard work and persistence on this project has made biomolecular structure prediction more accessible to the broader community and will revolutionize advancements in molecular sciences," says Jaakkola.
The researchers plan to continue improving the performance of Boltz-1 and reduce the amount of time it takes to make predictions. They also invite researchers to try Boltz-1 on their GitHub repository and connect with fellow users of Boltz-1 on their Slack channel.
"We think there is still many, many years of work to improve these models. We are very eager to collaborate with others and see what the community does with this tool," Wohlwend adds.
Mathai Mammen, CEO and president of Parabilis Medicines, calls Boltz-1 a "breakthrough" model. "By open sourcing this advance, the MIT Jameel Clinic and collaborators are democratizing access to cutting-edge structural biology tools," he says. "This landmark effort will accelerate the creation of life-changing medicines. Thank you to the Boltz-1 team for driving this profound leap forward!"
"Boltz-1 will be enormously enabling, for my lab and the whole community," adds Jonathan Weissman, an MIT professor of biology and member of the Whitehead Institute for Biomedical Research who was not involved in the study. "We will see a whole wave of discoveries made possible by democratizing this powerful tool." Weissman adds that he anticipates that the open-source nature of Boltz-1 will lead to a vast array of creative new applications.
This work was also supported by a U.S. National Science Foundation Expeditions grant; the Jameel Clinic; the U.S. Defense Threat Reduction Agency Discovery of Medical Countermeasures Against New and Emerging (DOMANE) Threats program; and the MATCHMAKERS project supported by the Cancer Grand Challenges partnership financed by Cancer Research UK and the U.S. National Cancer Institute.
Probabilistic model-based clustering is an excellent approach to understanding the trends that may be inferred from data and to making forecasts from it. The relevance of model-based clustering, one of the first subjects taught in data science, cannot be overstated: these models serve as a foundation that machine learning systems build on to identify trends and describe their behavior. If you are interested in the broader career prospects of data science, you can also explore guides on neural networks and Python for data science.
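As a concrete illustration of the idea, here is a minimal sketch of probabilistic model-based clustering: a Gaussian mixture model fit by expectation-maximization (EM) with scikit-learn. The dataset and every parameter here are purely illustrative.

```python
# Model-based clustering sketch: fit a Gaussian mixture model (GMM) by EM.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=500, centers=3, random_state=42)  # synthetic data

gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=42)
gmm.fit(X)  # EM alternates soft assignments (E-step) with parameter updates (M-step)

labels = gmm.predict(X)           # hard cluster assignments
soft = gmm.predict_proba(X[:5])   # per-cluster probabilities: the "probabilistic" part
print(soft.round(3))
```

Unlike k-means, the fitted mixture is a full probability distribution, so it can score new points and support forecasts about where future data is likely to fall.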
Photo: (via All you need to know about Machine Learning | meaning, tool, technique, math, algorithm, AI, accuracy, etc.)
Maximizing the Benefits of Risk Scoring and Classification in Forensic Analytics
Unlock the full potential of your forensic analytics with risk scoring and classification approaches. Learn how they can enhance fraud detection, improve regulatory compliance, optimize ICT systems and operations, and support the analysis of transactions and criminal activity.
Unlock the Power of Forensic Analytics with Risk Scoring and Classification
Forensic analytics plays a crucial role in many areas of business and government operations. From detecting and preventing fraud, to ensuring regulatory compliance and improving operations, to analyzing transactional data and detecting crime, the use of risk scoring and classification approaches can greatly enhance the…
Arvind Narayanan, a computer science professor at Princeton University, is best known for calling out the hype surrounding artificial intelligence in his Substack, AI Snake Oil, written with PhD candidate Sayash Kapoor. The two authors recently released a book based on their popular newsletter about AIâs shortcomings.
But don't get it twisted: they aren't against using new technology. "It's easy to misconstrue our message as saying that all of AI is harmful or dubious," Narayanan says. He makes clear, during a conversation with WIRED, that his rebuke is not aimed at the software per se, but rather at the culprits who continue to spread misleading claims about artificial intelligence.
In AI Snake Oil, those guilty of perpetuating the current hype cycle are divided into three core groups: the companies selling AI, researchers studying AI, and journalists covering AI.
Hype Super-Spreaders
Companies claiming to predict the future using algorithms are positioned as potentially the most fraudulent. "When predictive AI systems are deployed, the first people they harm are often minorities and those already in poverty," Narayanan and Kapoor write in the book. For example, an algorithm previously used in the Netherlands by a local government to predict who may commit welfare fraud wrongly targeted women and immigrants who didn't speak Dutch.
The authors turn a skeptical eye as well toward companies mainly focused on existential risks, like artificial general intelligence, the concept of a super-powerful algorithm better than humans at performing labor. They don't scoff at the idea of AGI itself, though. "When I decided to become a computer scientist, the ability to contribute to AGI was a big part of my own identity and motivation," says Narayanan. The misalignment comes from companies prioritizing long-term risk factors above the impact AI tools have on people right now, a common refrain I've heard from researchers.
Much of the hype and misunderstandings can also be blamed on shoddy, non-reproducible research, the authors claim. "We found that in a large number of fields, the issue of data leakage leads to overoptimistic claims about how well AI works," says Kapoor. Data leakage is essentially when AI is tested using part of the model's training data, similar to handing out the answers to students before conducting an exam.
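As a toy illustration of that point (my own sketch, not an example from the book): evaluating a model on examples it was trained on inflates its apparent accuracy, while a held-out test set gives the honest number.

```python
# Data leakage in miniature: scoring on seen data vs. a proper held-out split.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print("leaky 'accuracy' (training data):", model.score(X_train, y_train))  # ~1.0
print("honest accuracy (held-out data):", model.score(X_test, y_test))     # noticeably lower
```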
While academics are portrayed in AI Snake Oil as making "textbook errors," journalists are more maliciously motivated and knowingly in the wrong, according to the Princeton researchers: "Many articles are just reworded press releases laundered as news." Reporters who sidestep honest reporting in favor of maintaining their relationships with big tech companies and protecting their access to the companies' executives are noted as especially toxic.
I think the criticisms about access journalism are fair. In retrospect, I could have asked tougher or more savvy questions during some interviews with the stakeholders at the most important companies in AI. But the authors might be oversimplifying the matter here. The fact that big AI companies let me in the door doesn't prevent me from writing skeptical articles about their technology, or working on investigative pieces I know will piss them off. (Yes, even if they make business deals, like OpenAI did, with the parent company of WIRED.)
And sensational news stories can be misleading about AI's true capabilities. Narayanan and Kapoor highlight New York Times columnist Kevin Roose's 2023 chatbot transcript interacting with Microsoft's tool, headlined "Bing's A.I. Chat: 'I Want to Be Alive. 😈'", as an example of journalists sowing public confusion about sentient algorithms. "Roose was one of the people who wrote these articles," says Kapoor. "But I think when you see headline after headline that's talking about chatbots wanting to come to life, it can be pretty impactful on the public psyche." Kapoor mentions the ELIZA chatbot from the 1960s, whose users quickly anthropomorphized a crude AI tool, as a prime example of the lasting urge to project human qualities onto mere algorithms.
Roose declined to comment when reached via email and instead pointed me to a passage from his related column, published separately from the extensive chatbot transcript, where he explicitly states that he knows the AI is not sentient. The introduction to his chatbot transcript focuses on "its secret desire to be human" as well as "thoughts about its creators," and the comment section is strewn with readers anxious about the chatbot's power.
Images accompanying news articles are also called into question in AI Snake Oil. Publications often use clichéd visual metaphors, like photos of robots, at the top of a story to represent artificial intelligence features. Another common trope, an illustration of an altered human brain brimming with computer circuitry used to represent the AI's neural network, irritates the authors. "We're not huge fans of circuit brain," says Narayanan. "I think that metaphor is so problematic. It just comes out of this idea that intelligence is all about computation." He suggests images of AI chips or graphics processing units should be used to visually represent reported pieces about artificial intelligence.
Education Is All You Need
The adamant admonishment of the AI hype cycle comes from the authors' belief that large language models will actually continue to have a significant influence on society and should be discussed with more accuracy. "It's hard to overstate the impact LLMs might have in the next few decades," says Kapoor. Even if an AI bubble does eventually pop, I agree that aspects of generative tools will be sticky enough to stay around in some form. And the proliferation of generative AI tools, which developers are currently pushing out to the public through smartphone apps and even designing devices around, just heightens the necessity for better education on what AI even is and what its limitations are.
The first step to understanding AI better is coming to terms with the vagueness of the term, which flattens an array of tools and areas of research, like natural language processing, into a tidy, marketable package. AI Snake Oil divides artificial intelligence into two subcategories: predictive AI, which uses data to assess future outcomes; and generative AI, which crafts probable answers to prompts based on past data.
It's worth it for anyone who encounters AI tools, willingly or not, to spend at least a little time trying to better grasp key concepts, like machine learning and neural networks, to further demystify the technology and inoculate themselves from the bombardment of AI hype.
During my time covering AI for the past two years, I've learned that even if readers grasp a few of the limitations of generative tools, like inaccurate outputs or biased answers, many people are still hazy about all of their weaknesses. For example, in the upcoming season of AI Unlocked, my newsletter designed to help readers experiment with AI and understand it better, we included a whole lesson dedicated to examining whether ChatGPT can be trusted to dispense medical advice based on questions submitted by readers. (And whether it will keep your prompts about that weird toenail fungus private.)
A user may approach the AI's outputs with more skepticism when they have a better understanding of where the model's training data came from (often the depths of the internet or Reddit threads), and it may hamper their misplaced trust in the software.
Narayanan believes so strongly in the importance of quality education that he began teaching his children about the benefits and downsides of AI at a very young age. "I think it should start from elementary school," he says. "As a parent, but also based on my understanding of the research, my approach to this is very tech-forward."
Generative AI may now be able to write half-decent emails and help you communicate sometimes, but only well-informed humans have the power to correct breakdowns in understanding around this technology and craft a more accurate narrative moving forward.
How To Use AI To Fake A Scandal For Fun, Profit, and Clout
Or, I Just Saw People I Know To Be Reasonable Fall For A Fake "Ripoff" And Now I'm Going To Gently Demonstrate What Really Happened
So, we all know what people say about AI. It's just an automatic collage machine, it's stealing your data (as if the rest of the mainstream internet isn't - seriously, we should be using that knee-jerk disgust response to demand better internet privacy laws rather than try to beef up copyright so that compliance has to come at the beginning rather than the end of the process and you can be sued on suspicion of referencing, but I digress...), it can't create anything novel, some people go so far as to claim it's not even synthesizing anything, but just acting as a search engine and returning something run through a filter and "proving" it by "searching" for their own art and "finding" it.
And those are blatant lies.
The thing is, the reason AI is such a breakthrough - and the reason we memed with it so hard when DALL-E Mini and DALL-E 2 first dropped - is because it CAN create novel output. Because it CAN visualize the absurd ideas that no one has ever posted to the internet before. In fact, it would be a bigger breakthrough in computer science if we DID come up with an automatic collage machine - something that knows where to cut out a part of one image and paste it onto another, then smooth out the lighting and colors to make them fairly consistent, to make it look like what we would recognize as an image we're asking for? That would make the denoising algorithm on steroids that a diffusion model is look like child's play.
But, unlike the posts that claim that they're just acting as a collage maker at best and a search engine at worst, I'm not going to ask you to take my word for it (and stick a pin in this point, we'll come back to it later). I'm going to ask you to go to Simple Stable (or Craiyon, or the Karlo demo, if Google Colab feels too complicated for you - or if you like, do all of the above) and throw in a shitpost prompt or two. Ask for a velociraptor carousel pony ridden by a bunny. Ask for Godzilla fighting a wacky waving inflatable arm flailing tube man. Ask for an oil painting of a capybara wearing an ornate princess gown. Shitpost with it like we did before these myths took hold.
Now take your favorite result(s) and reverse image search them. Did you get anything remotely similar to your generated image? Probably not!
So then, how did someone end up getting a near perfect recreation of their work? Was that just some kind of wacky, one-in-a-million coincidence?
Well - oh no, look at that, I asked it for a simplistic character drawing and it happened to me too, it just returned a drawing of mine that I never even uploaded, and it's the worst drawing I've done since the fifth grade even just to embarrass me! Oh no, what happened, did they change things right under my nose, has digital surveillance gotten even WORSE?? Look, see, here's the original on the left, compare it to the output on the right - scary!! They're training on the contents of your computer in real time now, aaaagh!!
Except, of course, for the fact that the entire paragraph above was a lie and I did this on purpose in a way no one could possibly recreate from a text prompt, even with a perfect description.
How?
See, some models have this nifty little function called img2img. It can be used for anything from guiding the composition of your final image with a roughly drawn layout, to turning a building into a dragon...to post-processing of a hand-drawn image, to blatantly fucking lying about how AI works.
I took 5 minutes out of my day to crudely draw a character. I uploaded the image to this post. I saved the post as a draft. I stuck the image URL in the init_image field in Simple Stable, cranked the init strength up to 0.8, cleared all text prompts, and ran it. It did exactly what I told it to and tried to lightly refine the image I gave it.
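For anyone who wants to see how trivially reproducible this is, here's roughly the same trick sketched with Hugging Face's diffusers library. One assumption to flag: diffusers' strength parameter counts the opposite way from Simple Stable's init strength (it's how much noise to add, not how much of the init image to keep), so a light touch-up like the one described corresponds to a low value.

```python
# img2img sketch: lightly re-denoise an input drawing with Stable Diffusion.
# Assumes a CUDA GPU; the file names are made up for illustration.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("my_own_drawing.png").convert("RGB").resize((512, 512))

# Empty prompt + low strength: the model has almost nothing to do except
# hand back a lightly refined near-copy of the image it was given.
result = pipe(prompt="", image=init_image, strength=0.2).images[0]
result.save("suspiciously_perfect_recreation.png")
```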
If you see someone claiming that an AI stole their image with this kind of "proof", and the image they're comparing is not ITSELF a parody of an extremely well-known piece such as the Mona Lisa, or just so extremely generic that the level of similarity could be a coincidence (you/your favorite artist do/es not own the rule of thirds or basic fantasy creatures, just to name one family of examples I've seen), this is what happened.
So from here you must realize that it is deeply insidious that posts that make these claims usually imply or even outright state that you should NOT try to recreate this but instead just take their word for it, stressing ~DON'T FEED THE MACHINE~. It's always some claim about "ohhh, the more you use them, the more they learn, I made a SACRIFICE so you don't have to" - but txt2img functions can't use your interaction to learn jack shit. There's no new information in a text prompt for them TO learn. Most img2img models can't learn from your input either, for that matter! I still recommend being careful about corporate img2img toys - we know that Facebook, for instance, is happy to try and beef up facial recognition for the WORST possible reasons - but if you're worried about your privacy and data harvesting, any given txt2img model is one of the least worrying things on the internet today.
So do be careful with your privacy online, and PLEASE use your very understandable knee-jerk horror response to how much extremely personal content can be found in training databases as a call to DEMAND better privacy laws ("do not track" should not be just for show ffs) and compliance with security protocols in fields that deal with very private information (COMMON CRAWL DOESN'T GO FAR OUT OF ITS WAY, IT SHOULD NEVER HAVE BEEN ABLE TO GET ANY MEDICAL IMAGES THE PATIENTS DIDN'T SHARE THEMSELVES HOLY SHIT, SOME HOSPITAL WORKERS AND/OR MEDICAL COMMUNICATIONS DEVELOPERS BETTER BE GETTING FIRED AND/OR SUED) - but don't just believe a convenient and easy-to-disprove lie because it aligns with that feeling.
Prometheus Gave the Gift of Fire to Mankind. We Can't Give it Back, nor Should We.
AI. Artificial intelligence. Large Language Models. Learning Algorithms. Deep Learning. Generative Algorithms. Neural Networks. This technology has many names, and has been a polarizing topic in numerous communities online. By my observation, a lot of the discussion is either solely focused on A) how to profit off it or B) how to get rid of it and/or protect yourself from it. But to me, I feel both of these perspectives apply a very narrow usage lens on something that's more than a get rich quick scheme or an evil plague to wipe from the earth.
This is going to be long, because as someone whose degree is in psych and computer science, has been a teacher, has been a writing tutor for my younger brother, and whose fiancé works in freelance data model training... I have a lot to say about this.
I'm going to address the profit angle first, because I feel most people in my orbit (and in related orbits) on Tumblr are going to agree with this: flat out, the way AI is being utilized by large corporations and tech startups -- scraping mass amounts of visual and written works without consent and compensation, replacing human professionals in roles from concept art to story boarding to screenwriting to customer service and more -- is unethical and damaging to the wellbeing of people, would-be hires and consumers alike. It's wasting energy having dedicated servers running nonstop generating content that serves no greater purpose, and is even pressing on already overworked educators because plagiarism just got a very new, harder to identify younger brother that's also infinitely more easy to access.
In fact, ChatGPT is such an issue in the education world that plagiarism-detector subscription services that take advantage of how overworked teachers are have begun peddling supposed AI-detectors to schools and universities. Detectors that plainly DO NOT and CANNOT work, because the difference between "A Writer Who Writes Surprisingly Well For Their Age" is indistinguishable from "A Language Replicating Algorithm That Followed A Prompt Correctly", just as "A Writer Who Doesn't Know What They're Talking About Or Even How To Write Properly" is indistinguishable from "A Language Replicating Algorithm That Returned Bad Results". What's hilarious is that the way these "detectors" work is also run by AI.
(to be clear, I say plagiarism detectors like TurnItIn.com and such are predatory because A) they cost money to access advanced features that B) often don't work properly or as intended with several false flags, and C) these companies often are super shady behind the scenes; TurnItIn for instance has been involved in numerous lawsuits over intellectual property violations, as their services scrape (or hopefully scraped now) the papers submitted to the site without user consent (or under coerced consent if being forced to use it by an educator), which they can use in their own databases as they please, such as for training the AI-detecting AI that rarely actually detects AI.)
The prevalence of visual and linguistic generative algorithms is having multiple, overlapping, and complex consequences on many facets of society, from art to music to writing to film and video game production, and even in the classroom before all that, so it's no wonder that many disgruntled artists and industry professionals are online wishing for it all to go away and never come back. The problem is... It can't. I understand that there's likely a large swath of people saying that who understand this, but for those who don't: AI, or as it should more properly be called, generative algorithms, didn't just show up now (they're not even that new), and they certainly weren't developed or invented by any of the tech bros peddling it to megacorps and the general public.
Long before ChatGPT and DALL-E came online, generative algorithms were being used by programmers to simulate natural processes in weather models, shed light on the mechanics of walking for roboticists and paleontologists alike, identify patterns in our DNA related to disease, aid in complex 2D and 3D animation visuals, and so on. Generative algorithms have been a part of the professional world for many years now, and up until recently have been a general force for good, or at the very least a force for the mundane. It's only recently that the technology involved in creating generative algorithms became so advanced AND so readily available, that university grad students were able to make the publicly available projects that began this descent into madness.
Does anyone else remember that? That years ago, somewhere in the late 2010s to the beginning of the 2020s, these novelty sites that allowed you to generate vague images from prompts, or generate short stylistic writings from a short prompt, were popping up with University URLs? Oftentimes the queues on these programs were hours long, sometimes eventually days or weeks or months long, because of how unexpectedly popular this concept was to the general public. Suddenly overnight, all over social media, everyone and their grandma, and not just high level programming and arts students, knew this was possible, and of course, everyone wanted in. Automated art and writing, isn't that neat? And of course, investors saw dollar signs. Simply scale up the process, scrape the entire web for data to train the model without advertising that you're using ALL material, even copyrighted and personal materials, and sell the resulting algorithm for big money. As usual, startup investors ruin every new technology the moment they can access it.
To most people, it seemed like this magic tech popped up overnight, and before it became known that the art assets on later models were stolen, even I had fun with them. I knew how learning algorithms worked: if you're going to have a computer make images and text, it has to be shown what that is and then try and fail to make its own until it's ready. I just, rather naively as I was still in my early 20s, assumed that everything was above board and the assets were either public domain or fairly licensed. But when the news did come out, and when corporations started unethically implementing "AI" in everything from chatbots to search algorithms to asking their tech staff to add AI to sliced bread, those who were impacted and didn't know and/or didn't care where generative algorithms came from wanted them GONE. And like, I can't blame them. But I also quietly acknowledged to myself that getting rid of a whole technology is just neither possible nor advisable. The cat's already out of the bag, the genie has left its bottle, the Pandorica is OPEN. If we tried to blanket ban what people call AI, numerous industries involved in making lives better would be impacted. Because unfortunately the same tool that can edit selfies into revenge porn has also been used to identify cancer cells in patients and aided in decoding dead languages, among other things.
When, in Greek myth, Prometheus gave us the gift of fire, he gave us both a gift and a curse. Fire is so crucial to human society, it cooks our food, it lights our cities, it disposes of waste, and it protects us from unseen threats. But fire also destroys, and the same flame that can light your home can burn it down. Surely, there were people in this mythic past who hated fire and all it stood for, because without fire no forest would ever burn to the ground, and surely they would have called for fire to be given back, to be done away with entirely. Except, there was no going back. The nature of life is that no new element can ever be undone, it cannot be given back.
So what's the way forward, then? Like, surely if I can write a multi-paragraph think piece on Tumblr.com that next to nobody is going to read because it's long as sin, about an unpopular topic, and I rarely post original content anyway, then surely I have an idea of how this cyberpunk dystopia can be a little less.. Dys. Well I do, actually, but it's a long shot. Thankfully, unlike business majors, I actually had to take a cyber ethics course in university, and I actually paid attention. I also passed preschool where I learned taking stuff you weren't given permission to have is stealing, which is bad. So the obvious solution is to make some fucking laws to limit the input on data model training on models used for public products and services. It's that simple. You either use public domain and licensed data only or you get fined into hell and back and liable to lawsuits from any entity you wronged, be they citizen or very wealthy mouse conglomerate (suing AI bros is the only time Mickey isn't the bigger enemy). And I'm going to be honest, tech companies are NOT going to like this, because not only will it make doing business more expensive (boo fucking hoo), they'd very likely need to throw out their current trained datasets because of the illegal components mixed in there. To my memory, you can't simply prune specific content from a completed algorithm, you actually have to redo the training from the ground up because the bad data would be mixed in there like gum in hair. And you know what, those companies deserve that. They deserve to suffer a punishment, and maybe fold if they're young enough, for what they've done to creators everywhere. Actually, laws moving forward isn't enough, this needs to be retroactive. These companies need to be sued into the ground, honestly.
So yeah, that's the mess of it. We can't unlearn and unpublicize any technology, even if it's currently being used as a tool of exploitation. What we can do though is demand ethical use laws and organize around the cause of the exclusive rights of individuals to the content they create. The screenwriter's guild, actor's guild, and so on already have been fighting against this misuse, but given upcoming administration changes to the US, things are going to get a lot worse before they get a little better. Even still, don't give up, have clear and educated goals, and focus on what you can do to effect change, even if right now that's just individual self-care through mental and physical health crises like me.
Researchers reduce bias in AI models while preserving or improving accuracy
Machine-learning models can fail when they try to make predictions for individuals who were underrepresented in the datasets they were trained on.
For instance, a model that predicts the best treatment option for someone with a chronic disease may be trained using a dataset that contains mostly male patients. That model might make incorrect predictions for female patients when deployed in a hospital.
To improve outcomes, engineers can try balancing the training dataset by removing data points until all subgroups are represented equally. While dataset balancing is promising, it often requires removing large amounts of data, hurting the model's overall performance.
MIT researchers developed a new technique that identifies and removes specific points in a training dataset that contribute most to a model's failures on minority subgroups. By removing far fewer datapoints than other approaches, this technique maintains the overall accuracy of the model while improving its performance on underrepresented groups.
In addition, the technique can identify hidden sources of bias in a training dataset that lacks labels. Unlabeled data are far more prevalent than labeled data for many applications.
This method could also be combined with other approaches to improve the fairness of machine-learning models deployed in high-stakes situations. For example, it might someday help ensure underrepresented patients aren't misdiagnosed due to a biased AI model.
"Many other algorithms that try to address this issue assume each datapoint matters as much as every other datapoint. In this paper, we are showing that assumption is not true. There are specific points in our dataset that are contributing to this bias, and we can find those data points, remove them, and get better performance," says Kimia Hamidieh, an electrical engineering and computer science (EECS) graduate student at MIT and co-lead author of a paper on this technique.
She wrote the paper with co-lead authors Saachi Jain PhD '24 and fellow EECS graduate student Kristian Georgiev; Andrew Ilyas MEng '18, PhD '23, a Stein Fellow at Stanford University; and senior authors Marzyeh Ghassemi, an associate professor in EECS and a member of the Institute of Medical Engineering Sciences and the Laboratory for Information and Decision Systems, and Aleksander Madry, the Cadence Design Systems Professor at MIT. The research will be presented at the Conference on Neural Information Processing Systems.
Removing bad examples
Often, machine-learning models are trained using huge datasets gathered from many sources across the internet. These datasets are far too large to be carefully curated by hand, so they may contain bad examples that hurt model performance.
Scientists also know that some data points impact a model's performance on certain downstream tasks more than others.
The MIT researchers combined these two ideas into an approach that identifies and removes these problematic datapoints. They seek to solve a problem known as worst-group error, which occurs when a model underperforms on minority subgroups in a training dataset.
The researchers' new technique is driven by prior work in which they introduced a method, called TRAK, that identifies the most important training examples for a specific model output.
For this new technique, they take incorrect predictions the model made about minority subgroups and use TRAK to identify which training examples contributed the most to that incorrect prediction.
"By aggregating this information across bad test predictions in the right way, we are able to find the specific parts of the training that are driving worst-group accuracy down overall," Ilyas explains.
Then they remove those specific samples and retrain the model on the remaining data.
Since having more data usually yields better overall performance, removing just the samples that drive worst-group failures maintains the model's overall accuracy while boosting its performance on minority subgroups.
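A crude, self-contained sketch of that loop might look like the following. This is a first-order stand-in, not the researchers' actual TRAK implementation: it scores training points by how strongly their loss gradients align with the gradients of the model's minority-group errors, drops the top scorers, and retrains.

```python
# Toy stand-in for the pipeline: find training points implicated in
# minority-subgroup errors, drop them, retrain. Everything here is simplified.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: group 1 is a small, distribution-shifted minority.
X = np.vstack([rng.normal(0.0, 1.0, (900, 5)), rng.normal(0.5, 1.0, (100, 5))])
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
group = np.array([0] * 900 + [1] * 100)

def per_example_grads(model, X, y):
    """Gradient of the logistic loss w.r.t. the weights, one row per example."""
    p = model.predict_proba(X)[:, 1]
    return (p - y)[:, None] * X

model = LogisticRegression(max_iter=1000).fit(X, y)

# Mean gradient direction of the minority-group mistakes.
wrong = (model.predict(X) != y) & (group == 1)
error_direction = per_example_grads(model, X[wrong], y[wrong]).mean(axis=0)

# Drop the ~2% of training points most aligned with the failure direction,
# then retrain on the rest -- far fewer removals than full rebalancing.
scores = per_example_grads(model, X, y) @ error_direction
keep = np.argsort(scores)[:-20]
retrained = LogisticRegression(max_iter=1000).fit(X[keep], y[keep])
```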
A more accessible approach
Across three machine-learning datasets, their method outperformed multiple techniques. In one instance, it boosted worst-group accuracy while removing about 20,000 fewer training samples than a conventional data balancing method. Their technique also achieved higher accuracy than methods that require making changes to the inner workings of a model.
Because the MIT method involves changing a dataset instead, it would be easier for a practitioner to use and can be applied to many types of models.
It can also be utilized when bias is unknown because subgroups in a training dataset are not labeled. By identifying datapoints that contribute most to a feature the model is learning, they can understand the variables it is using to make a prediction.
"This is a tool anyone can use when they are training a machine-learning model. They can look at those datapoints and see whether they are aligned with the capability they are trying to teach the model," says Hamidieh.
Using the technique to detect unknown subgroup bias would require intuition about which groups to look for, so the researchers hope to validate it and explore it more fully through future human studies.
They also want to improve the performance and reliability of their technique and ensure the method is accessible and easy-to-use for practitioners who could someday deploy it in real-world environments.
"When you have tools that let you critically look at the data and figure out which datapoints are going to lead to bias or other undesirable behavior, it gives you a first step toward building models that are going to be more fair and more reliable," Ilyas says.
This work is funded, in part, by the National Science Foundation and the U.S. Defense Advanced Research Projects Agency.
AI helps distinguish dark matter from cosmic noise
Dark matter is the invisible force holding the universe together, or so we think. It makes up around 85% of all matter and around 27% of the universe's contents, but since we can't see it directly, we have to study its gravitational effects on galaxies and other cosmic structures. Despite decades of research, the true nature of dark matter remains one of science's most elusive questions.
According to a leading theory, dark matter might be a type of particle that barely interacts with anything else, except through gravity. But some scientists believe these particles could occasionally interact with each other, a phenomenon known as self-interaction. Detecting such interactions would offer crucial clues about dark matterâs properties.
However, distinguishing the subtle signs of dark matter self-interactions from other cosmic effects, like those caused by active galactic nuclei (AGN), the supermassive black holes at the centers of galaxies, has been a major challenge. AGN feedback can push matter around in ways that are similar to the effects of dark matter, making it difficult to tell the two apart.
In a significant step forward, astronomer David Harvey at EPFL's Laboratory of Astrophysics has developed a deep-learning algorithm that can untangle these complex signals. The AI-based method is designed to differentiate between the effects of dark matter self-interactions and those of AGN feedback by analyzing images of galaxy clusters, vast collections of galaxies bound together by gravity. The innovation promises to greatly enhance the precision of dark matter studies.
Harvey trained a Convolutional Neural Network (CNN), a type of AI that is particularly good at recognizing patterns in images, with images from the BAHAMAS-SIDM project, which models galaxy clusters under different dark matter and AGN feedback scenarios. By being fed thousands of simulated galaxy cluster images, the CNN learned to distinguish between the signals caused by dark matter self-interactions and those caused by AGN feedback.
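For readers curious what such a classifier looks like in code, here is a minimal sketch of the setup. The architecture is an assumed toy network, not Harvey's actual model, and the random tensors stand in for simulated cluster maps.

```python
# Toy CNN sketch: label simulated cluster images as SIDM vs. AGN feedback.
import torch
import torch.nn as nn

class ClusterCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, 2)  # assumes 64x64 inputs

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = ClusterCNN()
images = torch.randn(8, 1, 64, 64)   # stand-in for simulated cluster maps
labels = torch.randint(0, 2, (8,))   # 0 = AGN feedback, 1 = self-interacting DM
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()  # gradients for one training step
```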
Among the various CNN architectures tested, the most complex, dubbed "Inception," proved to also be the most accurate. The AI was trained on two primary dark matter scenarios, featuring different levels of self-interaction, and validated on additional models, including a more complex, velocity-dependent dark matter model.
Inception achieved an impressive accuracy of 80% under ideal conditions, effectively identifying whether galaxy clusters were influenced by self-interacting dark matter or AGN feedback. It maintained its high performance even when the researchers introduced realistic observational noise that mimics the kind of data we expect from future telescopes like Euclid.
What this means is that Inception, and the AI approach more generally, could prove incredibly useful for analyzing the massive amounts of data we collect from space. Moreover, the AI's ability to handle unseen data indicates that it's adaptable and reliable, making it a promising tool for future dark matter research.
AI-based approaches like Inception could significantly impact our understanding of what dark matter actually is. As new telescopes gather unprecedented amounts of data, this method will help scientists sift through it quickly and accurately, potentially revealing the true nature of dark matter.
This day in history
#15yrsago EULAs + Arbitration = endless opportunity for abuse https://archive.org/details/TheUnconcionabilityOfArbitrationAgreementsInEulas
#15yrsago Wikipedia's facts-about-facts make the impossible real https://web.archive.org/web/20091116023225/http://www.make-digital.com/make/vol20/?pg=16
#10yrsago Youtube nukes 7 hours' worth of science symposium audio due to background music during lunch break https://memex.craphound.com/2014/11/25/youtube-nukes-7-hours-worth-of-science-symposium-audio-due-to-background-music-during-lunch-break/
#10yrsago El Deafo: moving, fresh YA comic-book memoir about growing up deaf https://memex.craphound.com/2014/11/25/el-deafo-moving-fresh-ya-comic-book-memoir-about-growing-up-deaf/
#5yrsago Networked authoritarianism may contain the seeds of its own undoing https://crookedtimber.org/2019/11/25/seeing-like-a-finite-state-machine/
#5yrsago After Katrina, neoliberals replaced New Orleans' schools with charters, which are now failing https://www.nola.com/news/education/article_0c5918cc-058d-11ea-aa21-d78ab966b579.html
#5yrsago Talking about Disney's 1964 Carousel of Progress with Bleeding Cool: our lost animatronic future https://bleedingcool.com/pop-culture/castle-talk-cory-doctorow-on-disneys-carousel-of-progress-and-lost-optimism/
#5yrsago Tiny alterations in training data can introduce "backdoors" into machine learning models https://arxiv.org/abs/1903.06638
#5yrsago Leaked documents document China's plan for mass arrests and concentration-camp internment of Uyghurs and other ethnic minorities in Xinjiang https://www.icij.org/investigations/china-cables/exposed-chinas-operating-manuals-for-mass-internment-and-arrest-by-algorithm/
#5yrsago Hong Kong elections: overconfident Beijing loyalist parties suffer a near-total rout https://www.scmp.com/news/hong-kong/politics/article/3039132/results-blog
#5yrsago Library Socialism: a utopian vision of a sustainable, luxuriant future of circulating abundance https://memex.craphound.com/2019/11/25/library-socialism-a-utopian-vision-of-a-sustaniable-luxuriant-future-of-circulating-abundance/
#1yrago The moral injury of having your work enshittified https://pluralistic.net/2023/11/25/moral-injury/#enshittification
Hi! Iâm a student currently learning computer science in college and would love it if you had any advice for a cool personal project to do? Thanks!
Personal Project Ideas
Hiya!!
It's so cool that you're a computer science student, and with that, you have plenty of options for personal projects that can help with learning more from what they teach you at college. I don't have any experience being a university student myself, however!
Someone asked me a very similar question before because I shared my projects list and they asked how I come up with project ideas - maybe this can inspire you too, here's the link to the post [LINK]
However, I'll be happy to share some ideas with you right now. Just a heads up: you can alter the projects to fit your own specific interests or goals. Though a personal project isn't an assignment from school, you can always personalise it to yourself as well! Also, I don't know what level you are, e.g. a beginner or someone pretty confident in programming, so if a project sounds hard, try to simplify it down - no need to go overboard!!
But here is the list I came up with (some are from my own list):
Personal Finance Tracker
A web app that tracks personal finances by integrating with bank APIs. You can use Python with Flask for the backend and React for the frontend. I think this would be great for learning how to work with APIs and how to build web applications.
Online Food Ordering System
A web app that allows users to order food from a restaurant's menu. You can use PHP with Laravel for the backend and Vue.js for the frontend. This helps you learn how to work with databases (a key skill I believe) and how to build interactive user interfaces.
Movie Recommendation System
I see a lot of developers make this on Twitter and YouTube. It's a machine-learning project that recommends movies to users based on their past viewing habits. You can use Python with Pandas, Scikit-learn, and TensorFlow for the machine learning algorithms. Obviously, this helps you learn about how to build machine-learning models, and how to use libraries for data manipulation and analysis.
Image Recognition App
This is more geared towards app development if you're interested! It's an Android app that uses image recognition to identify objects in a photo. You can use Java or Kotlin for the Android development and TensorFlow for the machine learning algorithms. Learning how to work with image recognition and how to build mobile applications - which is super cool!
Social Media Platform
(I really want to attempt this one soon) A web app that allows users to post, share, and interact with each other's content. Come up with a cool name for it! You can use Ruby on Rails for the backend and React for the frontend. This project would be great for learning how to build full-stack web applications (a plus cause that's a trend that companies are looking for in developers) and how to work with user authentication and authorization (another plus)!
Text-Based Adventure Game
If you're interested in game development, you could make a simple game where users make choices and navigate through a story by typing text commands. You can use Python for the game logic and a library like Pygame for the graphics. This project would be great for learning how to build games and how to work with input/output.
Weather App
Pretty simple project - I did this for my apprenticeship and coding night classes! It's a web app that displays weather information for a user's location. You can use Node.js with Express for the backend and React for the frontend. Working with APIs again, how to handle asynchronous programming, and how to build responsive user interfaces!
Online Quiz Game
A web app that allows users to take quizzes and compete with other players. You could personalise it to a module you're studying right now - making a whole quiz application for it will definitely help you study! You can use PHP with Laravel for the backend and Vue.js for the frontend. You get to work with databases, build real-time applications, and maybe work with user authentication.
Chatbot
(My favourite, I'm currently planning for this one!) A chatbot that can answer user questions and provide information. You can use Python with Flask for the backend and a natural language processing library like NLTK for the chatbot logic. If you want to make it more beginner friendly, you could use HTML, CSS and JavaScript and have hard-coded answers set, or maybe use a bunch of APIs for the answers etc. (see the minimal sketch just below)! This project would be great because you get to learn how to build chatbots, and how to work with natural language processing - if you go that far!
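To make that beginner-friendly version concrete, here's a tiny sketch using Python with Flask and hard-coded answers - the endpoint name and responses are placeholders I made up, so swap in your own!

```python
# Minimal hard-coded chatbot: Flask backend, dictionary of canned answers.
from flask import Flask, request, jsonify

app = Flask(__name__)

ANSWERS = {
    "hello": "Hi there! Ask me what my name is.",
    "what is your name": "I'm DemoBot, a fully hard-coded chatbot.",
}

@app.route("/chat", methods=["POST"])
def chat():
    question = request.json.get("message", "").strip().lower()
    reply = ANSWERS.get(question, "Sorry, I don't know that one yet!")
    return jsonify({"reply": reply})

if __name__ == "__main__":
    app.run(debug=True)
```

Run it and POST {"message": "hello"} to /chat; once that works, you can swap the dictionary out for NLTK-powered logic!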
Another place I get inspiration for more web frontend dev projects is on Behance and Pinterest - on Pinterest search for like "Web design" or "[Specific project] web design e.g. shopping web design" and I get inspiration from a bunch of pins I put together! Maybe try that out!
I hope this helps and good luck with your project!
The crazy thing about AI is I spent the summer of 2021 learning basic machine learning i.e. feed forward neural networks and convolutional neural networks to apply to science. People had models they'd trained on cat images to try to create new cat images and they looked awful but it was fun. And it was also beginning to look highly applicable to the field I'm in (astronomy/cosmology) where you will have an immense amount of 2d image data that can't easily be parsed with an algorithm and would take too many hours for a human to sift through. I was genuinely kinda excited about it then. But seemingly in a blink of an eye we have these chat bots and voice AI and thieving art AI and it's all deadset on this rapid acceleration into cyberpunk dystopia capitalist hellscape and I hate it hate hate it
AI & ITS IMPACT
Unleashing the Power: The Impact of AI Across Industries and Future Frontiers
Artificial Intelligence (AI), once confined to the realm of science fiction, has rapidly become a transformative force across diverse industries. Its influence is reshaping the landscape of how businesses operate, innovate, and interact with their stakeholders. As we navigate the current impact of AI and peer into the future, it's evident that the capabilities of this technology are poised to reach unprecedented heights.
1. Healthcare:
In the healthcare sector, AI is a game-changer, revolutionizing diagnostics, treatment plans, and patient care. Machine learning algorithms analyze vast datasets to identify patterns, aiding in early disease detection. AI-driven robotic surgery is enhancing precision, reducing recovery times, and minimizing risks. Personalized medicine, powered by AI, tailors treatments based on an individual's genetic makeup, optimizing therapeutic outcomes.
2. Finance:
AI is reshaping the financial industry by enhancing efficiency, risk management, and customer experiences. Algorithms analyze market trends, enabling quicker and more accurate investment decisions. Chatbots and virtual assistants powered by AI streamline customer interactions, providing real-time assistance. Fraud detection algorithms work tirelessly to identify suspicious activities, bolstering security measures in online transactions.
3. Manufacturing:
In manufacturing, AI is optimizing production processes through predictive maintenance and quality control. Smart factories leverage AI to monitor equipment health, reducing downtime by predicting potential failures. Robots and autonomous systems, guided by AI, enhance precision and efficiency in tasks ranging from assembly lines to logistics. This not only increases productivity but also contributes to safer working environments.
4. Education:
AI is reshaping the educational landscape by personalizing learning experiences. Adaptive learning platforms use AI algorithms to tailor educational content to individual student needs, fostering better comprehension and engagement. AI-driven tools also assist educators in grading, administrative tasks, and provide insights into student performance, allowing for more effective teaching strategies.
5. Retail:
In the retail sector, AI is transforming customer experiences through personalized recommendations and efficient supply chain management. Recommendation engines analyze customer preferences, providing targeted product suggestions. AI-powered chatbots handle customer queries, offering real-time assistance. Inventory management is optimized through predictive analytics, reducing waste and ensuring products are readily available.
6. Future Frontiers:
A. Autonomous Vehicles: The future of transportation lies in AI-driven autonomous vehicles. From self-driving cars to automated drones, AI algorithms navigate and respond to dynamic environments, ensuring safer and more efficient transportation. This technology holds the promise of reducing accidents, alleviating traffic congestion, and redefining mobility.
B. Quantum Computing: As AI algorithms become more complex, the need for advanced computing capabilities grows. Quantum computing, with its ability to process vast amounts of data at unprecedented speeds, holds the potential to revolutionize AI. This synergy could unlock new possibilities in solving complex problems, ranging from drug discovery to climate modeling.
C. AI in Creativity: AI is not limited to data-driven tasks; it's also making inroads into the realm of creativity. AI-generated art, music, and content are gaining recognition. Future developments may see AI collaborating with human creators, pushing the boundaries of what is possible in fields traditionally associated with human ingenuity.
In conclusion, the impact of AI across industries is profound and multifaceted. From enhancing efficiency and precision to revolutionizing how we approach complex challenges, AI is at the forefront of innovation. The future capabilities of AI hold the promise of even greater advancements, ushering in an era where the boundaries of what is achievable continue to expand. As businesses and industries continue to embrace and adapt to these transformative technologies, the synergy between human intelligence and artificial intelligence will undoubtedly shape a future defined by unprecedented possibilities.
Predictive Modelling and Risk Forecasting of Crime Using Linked ID
Using advanced data mining and statistical methodology to identify individuals at risk and prevent crime in your community
Crime fighting is a major concern for governments and law enforcement agencies worldwide. To combat crime, various interventions such as social programs and credit assessments have been implemented. However, these interventions are not always effective in preventing crime…