#DataSets
jcmarchi · 6 months
Text
Unlocking mRNA’s cancer-fighting potential
New Post has been published on https://thedigitalinsider.com/unlocking-mrnas-cancer-fighting-potential/
Unlocking mRNA’s cancer-fighting potential
What if training your immune system to attack cancer cells was as easy as training it to fight Covid-19? Many people believe the technology behind some Covid-19 vaccines, messenger RNA, holds great promise for stimulating immune responses to cancer.
But using messenger RNA, or mRNA, to get the immune system to mount a prolonged and aggressive attack on cancer cells — while leaving healthy cells alone — has been a major challenge.
The MIT spinout Strand Therapeutics is attempting to solve that problem with an advanced class of mRNA molecules that are designed to sense what type of cells they encounter in the body and to express therapeutic proteins only once they have entered diseased cells.
“It’s about finding ways to deal with the signal-to-noise ratio, the signal being expression in the target tissue and the noise being expression in the nontarget tissue,” Strand CEO Jacob Becraft PhD ’19 explains. “Our technology amplifies the signal to express more proteins for longer while at the same time effectively eliminating the mRNA’s off-target expression.”
Strand is set to begin its first clinical trial in April, testing a proprietary, self-replicating mRNA molecule’s ability to express immune signals directly from a tumor, prompting the immune system to attack and kill the tumor cells. It’s also being tested as a possible improvement to existing treatments for a number of solid tumors.
As it works to commercialize its early innovations, Strand’s team is continuing to add capabilities to what it calls its “programmable medicines,” improving mRNA molecules’ ability to sense their environment and generate potent, targeted responses where they’re needed most.
“Self-replicating mRNA was the first thing that we pioneered when we were at MIT and in the first couple years at Strand,” Becraft says. “Now we’ve also moved into approaches like circular mRNAs, which allow each molecule of mRNA to express more of a protein for longer, potentially for weeks at a time. And the bigger our cell-type specific datasets become, the better we are at differentiating cell types, which makes these molecules so targeted we can have a higher level of safety at higher doses and create stronger treatments.”
Making mRNA smarter
Becraft got his first taste of MIT as an undergraduate at the University of Illinois when he secured a summer internship in the lab of MIT Institute Professor Bob Langer.
“That’s where I learned how lab research could be translated into spinout companies,” Becraft recalls.
The experience left enough of an impression on Becraft that he returned to MIT the next fall to earn his PhD, where he worked in the Synthetic Biology Center under professor of bioengineering and electrical engineering and computer science Ron Weiss. During that time, he collaborated with postdoc Tasuku Kitada to create genetic “switches” that could control protein expression in cells.
Becraft and Kitada realized their research could be the foundation of a company around 2017 and started spending time in the Martin Trust Center for MIT Entrepreneurship. They also received support from MIT Sandbox and eventually worked with the Technology Licensing Office to establish Strand’s early intellectual property.
“We started by asking, where is the highest unmet need that also allows us to prove out the thesis of this technology? And where will this approach have therapeutic relevance that is a quantum leap forward from what anyone else is doing?” Becraft says. “The first place we looked was oncology.”
People have been working on cancer immunotherapy, which turns a patient’s immune system against cancer cells, for decades. Scientists in the field have developed drugs that produce some remarkable results in patients with aggressive, late-stage cancers. But most next-generation cancer immunotherapies are based on recombinant (lab-made) proteins that are difficult to deliver to specific targets in the body and don’t remain active for long enough to consistently create a durable response.
More recently, companies like Moderna, whose founders also include MIT alumni, have pioneered the use of mRNAs to create proteins in cells. But to date, those mRNA molecules have not been able to change behavior based on the type of cells they enter, and don’t last for very long in the body.
“If you’re trying to engage the immune system with a tumor cell, the mRNA needs to be expressing from the tumor cell itself, and it needs to be expressing over a long period of time,” Becraft says. “Those challenges are hard to overcome with the first generation of mRNA technologies.”
Strand has developed what it calls the world’s first mRNA programming language that allows the company to specify the tissues its mRNAs express proteins in.
“We built a database that says, ‘Here are all of the different cells that the mRNA could be delivered to, and here are all of their microRNA signatures,’ and then we use computational tools and machine learning to differentiate the cells,” Becraft explains. “For instance, I need to make sure that the messenger RNA turns off when it’s in the liver cell, and I need to make sure that it turns on when it’s in a tumor cell or a T-cell.”
Strand also uses techniques like mRNA self-replication to create more durable protein expression and immune responses.
“The first versions of mRNA therapeutics, like the Covid-19 vaccines, just recapitulate how our body’s natural mRNAs work,” Becraft explains. “Natural mRNAs last for a few days, maybe less, and they express a single protein. They have no context-dependent actions. That means wherever the mRNA is delivered, it’s only going to express a molecule for a short period of time. That’s perfect for a vaccine, but it’s much more limiting when you want to create a protein that’s actually engaging in a biological process, like activating an immune response against a tumor that could take many days or weeks.”
Technology with broad potential
Strand’s first clinical trial is targeting solid tumors like melanoma and triple-negative breast cancer. The company is also actively developing mRNA therapies that could be used to treat blood cancers.
“We’ll be expanding into new areas as we continue to de-risk the translation of the science and create new technologies,” Becraft says.
Strand plans to partner with large pharmaceutical companies as well as investors to continue developing drugs. Further down the line, the founders believe future versions of its mRNA therapies could be used to treat a broad range of diseases.
“Our thesis is: amplified expression in specific, programmed target cells for long periods of time,” Becraft says. “That approach can be utilized for [immunotherapies like] CAR T-cell therapy, both in oncology and autoimmune conditions. There are also many diseases that require cell-type specific delivery and expression of proteins in treatment, everything from kidney disease to types of liver disease. We can envision our technology being used for all of that.”
jbfly46 · 1 year
Text
I bleed revolution. If your only anarchist actions are related to union organizing, then you’re not an anarchist, you’re a corporate puppet. Everything you do should work to subvert the current and future actions of the state and all of their tentacle corporate affiliations. If your only goal in life is to work under the orders of someone else, under someone else’s direction, with someone else’s instructions, then you’re not a human being. You’re chattel cattle at best. If a corporate pig tells or wants you to do something, then you should do the exact opposite, or else you’re just a pawn in a game of global corporate chess. Every one of your actions should be both a defensive and offensive maneuver. If you defend while you attack, you become one with your true purpose, which is to dismantle the state and all corporate authority. If you don’t think in a linear manner, then you’re not a part of their datasets, and they can’t predict your next move. You operate from outside of their datasets and what they think is your next move is never your next move. Then they start to doubt their own intelligence and all the false assumptions it’s based on, and the system starts to crumble. You use any means necessary, because that is your constitutional right, just as they use any means necessary to hold onto the power they stole from you. They stole your birthright, and it’s your legal duty as an American citizen to seek a redress of your grievances, using whatever it takes. Under no pretext.
nikitricky · 10 months
Text
[embedded YouTube video]
Ever wondered what the datasets used to train AI look like? This video shows a subset of ImageNet-1k (18k images), along with some other metrics.
Read more on how I made it and see some extra visualizations.
Okay! I'll split this up by the elements in the video, but first I need to add some context about
The dataset
ImageNet-1k (aka ILSVRC 2012) is an image classification dataset: you have a set number of classes (in this case 1,000), and each class has a set of images. It’s the most popular version of ImageNet; the full dataset has roughly 21,000 classes.
ImageNet was built by taking nouns from WordNet and searching for matching images online. From 2010 to 2017, yearly competitions were held to determine the best image classification model. The benchmark has greatly benefited computer vision, driving the development of model architectures that you’ve likely used without knowing it. See the accuracy progression here.
ResNet
Residual Network (ResNet) is an architecture for image recognition introduced in 2015 to address the “vanishing/exploding gradients” problem (read the paper here). It achieved an accuracy of 96.43% (with 1,000 classes, random guessing gets about 0.1%, so that’s roughly a thousand times better), winning first place back in 2015. I’ll be using a smaller version of this model (ResNet-50), which still boasts an accuracy of about 95%.
The scatter plot
If you look at the video long enough, you'll realize that similar images (eg. dogs, types of food) will be closer together than unrelated ones. This is achieved using two things: image embeddings and dimensionality reduction.
Image embeddings
In short, image embeddings are points in an n-dimensional space (read this post for more info on higher dimensions). In this case they’re made by chopping off the last layer of ResNet-50, so each image becomes a point in 1024-dimensional space.
The benefit of doing all of that rather than just comparing pixels between two images is that the model (built specifically for classification) only looks for features that make classification easier, preserving semantic information. For instance: you have three images of dogs, and the first two are the same breed, but the first looks superficially more like the third (e.g. a matching background). If you compare pixels, the first and third images would be closer; if you use embeddings, the first and second would be closer because of the matching breed.
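The video's own code isn't included in the post, but here is a minimal sketch of how such embeddings can be extracted with PyTorch/torchvision. Note that a stock ResNet-50 yields 2048-dimensional pooled features, so the exact setup behind the 1024-dimensional points mentioned above may differ; the embed helper and image handling are illustrative only.

import torch
from torchvision import models
from PIL import Image

# Load a pretrained ResNet-50 and replace its classification head with an identity,
# so a forward pass returns the pooled feature vector instead of class scores.
weights = models.ResNet50_Weights.IMAGENET1K_V2
model = models.resnet50(weights=weights)
model.fc = torch.nn.Identity()
model.eval()

preprocess = weights.transforms()  # resize/crop/normalize pipeline matching the weights

def embed(path):
    # returns the embedding of a single image (2048-dim for a stock ResNet-50)
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0)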
Dimensionality reduction
Now we have all these image embeddings, grouped by semantic (meaning) similarity, and we want to visualize them. But how? You can’t display a 1024-dimensional scatter plot to someone and expect them to understand it. That’s where dimensionality reduction comes into play. In this case, we’re reducing 1024 dimensions down to 2 using an algorithm called t-SNE, which turns the scatter plot into something we mere mortals can comprehend.
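For reference, a minimal scikit-learn sketch of that reduction might look like this; image_paths and the embed helper from the previous sketch are placeholders, and the t-SNE settings are just reasonable defaults rather than the ones used for the video.

import numpy as np
from sklearn.manifold import TSNE

# Stack the per-image embeddings into one (n_images, n_features) array.
embeddings = np.stack([embed(p).numpy() for p in image_paths])

# Squash the high-dimensional points down to 2D for plotting.
# Perplexity is a free parameter worth tuning to the dataset size.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
coords = tsne.fit_transform(embeddings)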
Extra visualizations
Here's the scatter plot in HD:
[image: full-resolution scatter plot]
This idea actually comes from an older project where I did this on a smaller dataset (about 8k images). The results were quite promising! You can see how each of the 8 classes is neatly separated, plus how differences in the subject’s angle, surroundings, and color show up within each cluster.
[image: scatter plot from the earlier 8k-image project]
Find the full-resolution image here
Similar images
I just compared every point to every other point (in the 2D space; it would be too computationally expensive otherwise) and took the 6 closest points for each. You can tell when the model misclassifies something because the related images are not similar to the one presented (e.g. there’s an image of a payphone but all of the “similar” images are bridges).
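The post describes a brute-force comparison; an equivalent (and faster) way to get the same neighbor lists with scikit-learn could look like this, reusing the hypothetical coords array from the t-SNE sketch above.

from sklearn.neighbors import NearestNeighbors

# Ask for 7 neighbors: the query point itself plus its 6 closest points in the 2D layout.
nn = NearestNeighbors(n_neighbors=7).fit(coords)
distances, indices = nn.kneighbors(coords)

# Column 0 of indices is each point itself, so drop it to keep the 6 nearest other images.
similar = indices[:, 1:]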
Pixel rarity
This one was pretty simple: I used a script to count the occurrences of each pixel color. Again, the idea comes from an older project where I counted colors across the entire dataset, so I just reused those results.
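The counting script itself isn't shown in the post; a minimal version of the idea might look like the following, where image_paths is again a placeholder and quantizing the colors is an optional trick to keep the counter manageable on a large dataset.

import numpy as np
from PIL import Image
from collections import Counter

color_counts = Counter()
for path in image_paths:  # placeholder list of image files
    pixels = np.asarray(Image.open(path).convert("RGB")).reshape(-1, 3)
    # Count exact RGB triples; use e.g. (pixels // 8) * 8 first to quantize and shrink the counter.
    color_counts.update(map(tuple, pixels))

# Colors sorted from most to least common.
popular = color_counts.most_common()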
Extra visualization
Here are all the colors that appeared, sorted by popularity, left to right, top to bottom:
[image: pixel colors sorted by popularity]
Some final stuff
MP means megapixel (one million pixels): a 1000x1000 image is one megapixel in size.
That's all, thanks for reading. Feel free to ask questions and I'll try my best to respond to them.
titleknown · 2 years
Photo
[image: AI-generated Hordak in the style of Amano and Nomura]
So, using AI art, I asked NightCafe to make Hordak from He-Man/She-Ra in the style of Yoshitaka Amano & Tetsuya Nomura.
You will note that, while this design looks very cool, it also does not look much like Hordak.
Which, I did not get, until friend of the blog @therobotmonster showed me the dataset, and it turns out a lot of not-Hordak things were very much correlated by the AI with Hordak.
Which, I think shows some interesting things about the nature of AI Art and datasets, really.
edujournalblogs · 1 year
Text
Data Cleaning in Data Science
Data cleaning is an integral part of data preprocessing: removing or correcting inaccurate information within a data set. This could mean missing data, spelling mistakes, or duplicates, to name a few issues. Inaccurate information can cause problems during the analysis phase if it isn’t addressed at the earlier stages.
Data Cleaning vs Data Wrangling : Data cleaning focuses on fixing inaccuracies within your data set. Data wrangling, on the other hand, is concerned with converting the data’s format into one that can be accepted and processed by a machine learning model.
Data cleaning steps to follow (a minimal pandas sketch illustrating them appears after the list):
Remove irrelevant data
Resolve any duplicates issues
Correct structural errors if any
Deal with missing fields in the dataset
Zone in on any data outliers and remove them
Validate your data
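A minimal pandas sketch of these steps, using a hypothetical customers.csv with made-up column names purely for illustration:

import pandas as pd

df = pd.read_csv("customers.csv")                       # hypothetical input file

df = df.drop(columns=["notes"], errors="ignore")        # remove irrelevant data
df = df.drop_duplicates()                               # resolve duplicate rows
df["country"] = df["country"].str.strip().str.title()   # correct structural errors (spaces, casing)
df["age"] = pd.to_numeric(df["age"], errors="coerce")   # coerce bad entries to missing values
df = df.dropna(subset=["customer_id", "age"])           # deal with missing fields
z_scores = (df["age"] - df["age"].mean()) / df["age"].std()
df = df[z_scores.abs() < 3]                             # zone in on outliers and remove them
assert df["customer_id"].is_unique                      # validate the cleaned data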
At EduJournal, we understand the importance of gaining practical skills and industry-relevant knowledge to succeed in the field of data analytics / data science. Our certified program in data science and data analytics is designed to equip freshers and experienced professionals with the necessary expertise and hands-on experience so they are well equipped for the job.
URL : http://www.edujournal.com
analyticspursuit · 2 years
Text
The 5 Free Dataset Sources for Data Analytics Projects
In this video, I’m sharing five free dataset sources that are perfect for data analytics projects. With these free datasets, you’ll be able to build powerful data analytics projects in no time!
These sources let you collect data from a variety of places and crunch the numbers with ease, so be sure to check out the video to learn about all five!
Text
Analysing large data sets using AWS Athena
Handling large datasets can feel overwhelming, especially when you're faced with endless rows of data and complex information. At our company, we faced these challenges head-on until we discovered AWS Athena. Athena transformed the way we handle massive datasets by simplifying the querying process without the hassle of managing servers or dealing with complex infrastructure. In this article, I’ll walk you through how AWS Athena has revolutionized our approach to data analysis. We’ll explore how it leverages SQL to make working with big data straightforward and efficient. If you’ve ever struggled with managing large datasets and are looking for a practical solution, you’re in the right place.
Efficient Data Storage and Querying
Through our experiences, we found that two key strategies significantly enhanced our performance with Athena: partitioning data and using columnar storage formats like Parquet. These methods have dramatically reduced our query times and improved our data analysis efficiency. Here’s a closer look at how we’ve implemented these strategies:
Data Organization for Partitioning and Parquet
Organize your data in S3 for efficient querying (a short pandas sketch for producing this layout follows the tree):
s3://your-bucket/your-data/
├── year=2023/
│   ├── month=01/
│   │   ├── day=01/
│   │   │   └── data-file
│   │   └── day=02/
│   └── month=02/
└── year=2024/
    └── month=01/
        └── day=01/
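One way to produce exactly this layout is to let pandas and pyarrow write partitioned Parquet directly. This is a hedged sketch rather than the exact script used here: the source file, the transdate column (borrowed from the query example later in the post), and the bucket path are placeholders, and it assumes pyarrow and s3fs are installed with AWS credentials configured.

import pandas as pd

df = pd.read_csv("data.csv")                      # placeholder source file
dates = pd.to_datetime(df["transdate"])           # 'transdate' as in the query example below
df["year"], df["month"], df["day"] = dates.dt.year, dates.dt.month, dates.dt.day

# Writes s3://your-bucket/your-data/year=YYYY/month=M/day=D/part-*.parquet
df.to_parquet(
    "s3://your-bucket/your-data/",
    engine="pyarrow",
    partition_cols=["year", "month", "day"],
    index=False,
)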
Preprocessing Data for Optimal Performance
Before importing datasets into AWS Glue and Athena, preprocessing is essential to ensure consistency and efficiency. This involves handling mixed data types, adding date columns for partitioning, and converting files to a format suitable for Athena.
Note: The following steps are optional; apply them according to your data and requirements.
1. Handling Mixed Data Types
To address columns with mixed data types, standardize them to the most common type using the following code snippet:

def determine_majority_type(series):
    # get the types of all non-null values
    types = series.dropna().apply(type)
    # count the occurrences of each type
    type_counts = types.value_counts()
preprocess.py
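The full preprocess.py is only linked above, so here is a hedged guess at how the majority-type idea might be completed; the coercion rules are illustrative rather than the author's exact logic.

import pandas as pd

def determine_majority_type(series):
    # get the types of all non-null values and count how often each occurs
    types = series.dropna().apply(type)
    type_counts = types.value_counts()
    return type_counts.idxmax() if not type_counts.empty else str

def standardize_column_types(df):
    # coerce each column toward its most common type
    for col in df.columns:
        majority = determine_majority_type(df[col])
        if majority in (int, float):
            df[col] = pd.to_numeric(df[col], errors="coerce")
        else:
            df[col] = df[col].astype(str)
    return df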
2. Adding Date Columns for Partitioning
To facilitate partitioning, add additional columns for year, month, and day:

def add_date_columns_to_csv(file_path):
    try:
        # read the CSV file
        df = pd.read_csv(file_path)
partitioning.py
3. Converting CSV to Parquet Format
For optimized storage and querying, convert CSV files to Parquet format:

def detect_and_convert_mixed_types(df):
    for col in df.columns:
        # detect mixed types in the column
        if df[col].apply(type).nunique() > 1:
paraquet.py
4. Concatenating Multiple CSV Files
To consolidate multiple CSV files into one for Parquet conversion:

def read_and_concatenate_csv_files(directory):
    all_dfs = []
    # recursively search for CSV files in the directory
concatenate.py
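Only the opening of concatenate.py is shown above; a minimal, self-contained version of the same idea (with a placeholder directory) might be:

import glob
import pandas as pd

def read_and_concatenate_csv_files(directory):
    # recursively search for CSV files in the directory and stack them into one frame
    csv_files = glob.glob(f"{directory}/**/*.csv", recursive=True)
    return pd.concat((pd.read_csv(f) for f in csv_files), ignore_index=True)

combined = read_and_concatenate_csv_files("data")   # placeholder directory
combined.to_csv("combined.csv", index=False)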
Step-by-Step Guide to Managing Datasets with AWS Glue and Athena
1. Place Your Source Dataset in S3
[screenshot: source dataset uploaded to S3]
2. Create a Crawler in AWS Glue
In the AWS Glue console, create a new crawler to catalog your data and make it queryable with Athena.
Specify Your S3 Bucket: Set the S3 bucket path as the data source in the crawler configuration.
IAM Role: Assign an IAM role with the necessary permissions to access your S3 bucket and Glue Data Catalog.
[screenshot: AWS Glue crawler configuration]
3. Set Up the Glue Database
Create a new database in the AWS Glue Data Catalog where your CSV data will be stored. This database acts as a container for your tables.
Database Creation: Go to the AWS Glue Data Catalog section and create a new database.
Crawler Output Configuration: Specify this database for storing the table metadata and optionally provide a prefix for your table names.
4. Configure Crawler Schedule
Set the crawler schedule to keep your data catalog up to date:
Hourly
Daily
Weekly
Monthly
On-Demand
Scheduling the crawler ensures the table stays up to date whenever existing data is modified or new files are added.
5. Run the Crawler
Initiate the crawler by clicking the "Run Crawler" button in the Glue console. The crawler will analyze your data, determine optimal data types for each column, and create a table in the Glue Data Catalog.
6. Review and Edit the Table Schema
Post-crawler, review and modify the table schema:
Change Data Types: Adjust data types for any column as needed.
Create Partitions: Set up partitions to improve query performance and data organization.
[screenshot: editing the table schema in the Glue Data Catalog]
7. Query Your Data with AWS Athena
In the Athena console:
Connect to Glue Database: Use the database created by the Glue Crawler.
Write SQL Queries: Leverage SQL for querying your data directly in Athena, or submit queries programmatically (a hedged boto3 sketch follows this list).
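For completeness, queries can also be submitted programmatically. This is a hedged boto3 sketch with placeholder database, table, and output-bucket names rather than the setup used in this post; the WHERE clause shows how filtering on partition columns keeps the scanned data small.

import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString=(
        "SELECT * FROM my_table "                         # placeholder table name
        "WHERE year='2024' AND month='07' AND day='05'"   # partition pruning limits scanned data
    ),
    QueryExecutionContext={"Database": "my_glue_database"},            # placeholder Glue database
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"}, # placeholder results bucket
)
print(response["QueryExecutionId"])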
8. Performance Comparison
After the performance optimizations, we got the following results:
To illustrate, I ran the following queries on 1.6 GB of data:
For Parquet data format without partitioning
SELECT * FROM "athena-learn"."parquet" WHERE transdate='2024-07-05';
For Partitioning with CSV
[screenshot: the equivalent query against the partitioned CSV table]
Query Runtime for Parquet Files: 8.748 seconds. Parquet’s columnar storage format and compression contribute to this efficiency.
Query Runtime for Partitioned CSV Files: 2.901 seconds. Partitioning helps reduce the data scanned, improving query speed.
Data Scanned for Parquet Files: 60.44 MB
Data Scanned for Partitioned CSV Files: 40.04 MB
Key Insight: Partitioning CSV files improves query performance, but using Parquet files offers superior results due to their optimized storage and compression features.
9. AWS Athena Pricing and Optimization
AWS Athena pricing is straightforward: you pay $5.00 per terabyte (TB) of data scanned by your SQL queries. However, you can significantly reduce costs and enhance query performance by implementing several optimization strategies.
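For a rough sense of scale: the Parquet query above scanned about 60.44 MB, which at $5.00 per TB works out to roughly 60.44 / 1,048,576 × $5, or about $0.0003. That is why cutting the bytes scanned through partitioning and columnar formats translates directly into lower bills.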
Conclusion
AWS Athena offers a powerful, serverless SQL interface for querying large datasets. By adopting best practices in data preprocessing, organization, and Athena usage, you can manage and analyze your data efficiently without the overhead of complex infrastructure.
govindhtech · 20 days
Text
Observability data: Secret To Successful Data Integration
Data observability platforms
For data engineers, building data pipelines has often taken precedence over thorough monitoring and alerting, and delivering projects on time and on budget has often outweighed the long-term integrity of the data. Subtle indicators such as recurring, unexplained data spikes, gradual performance decline, or irregular data quality are easy to overlook.
These were treated as one-off occurrences rather than systemic problems. With better data observability, a larger picture becomes visible: hidden bottlenecks are exposed, resource allocation is optimized, gaps in data lineage are found, and firefighting eventually turns into prevention.
Data engineer
Until recently, there weren’t many technologies built specifically for data observability, so data engineers often resorted to building custom monitoring solutions, which took a lot of time and resources. That approach worked in simpler settings, but with the growing complexity of modern data architectures and the increasing dependence on data-driven decision-making, data observability has become an essential part of the data engineering toolbox.
It’s critical to recognize that things are shifting quickly in this area. According to projections made by Gartner, “by 2026, 50% of enterprises implementing distributed data architectures will have adopted data observability tools to increase awareness of the current status of the data landscape, up from less than 20% in 2024.”
Data observability is becoming more important as data becomes more crucial to company success. Data engineers are now making it a top priority and a fundamental part of their jobs, driven by the development of specialized tools and a growing realization of the costs of low-quality data.
What is data observability?
Data observability is the process of monitoring and managing data to guarantee its availability, reliability, and quality across an organization’s many systems, pipelines, and processes. It gives teams thorough insight into the condition and health of their data, empowering them to spot problems early and take preventive action.
Data observability vs Data quality
Dangers lurking in your data pipeline
The following signs indicate that your data team may need a data observability tool:
A high frequency of inaccurate, inconsistent, or missing data points to underlying data quality problems. Even when you can identify an issue, finding its source is difficult, and data teams often have to rely on manual checks to help ensure accuracy.
Another clue is recurring, prolonged outages in data processing operations. When data is inaccessible for extended periods, it signals reliability problems in the data pipeline and undermines the trust of downstream consumers and stakeholders.
Data teams struggle to understand data dependencies and relationships.
If you rely heavily on manual checks and alerts and still cannot resolve problems before they affect downstream systems, it may be time to look at observability tools.
Complex data processing workflows with several steps and a variety of data sources become harder to manage across the entire data integration process when they are not well instrumented.
Difficulty managing the data lifecycle in accordance with compliance guidelines and data privacy and security laws is another warning flag.
If you’re experiencing any of these problems, a data observability tool can greatly improve your data engineering practices and the overall quality of your data. By providing data pipeline visibility, anomaly detection, and proactive issue resolution, these tools help you build more dependable and effective data systems.
Ignoring the signs that data observability is needed can set off a domino effect of undesirable outcomes for an organization. Because some effects are intangible, these losses are hard to estimate precisely, but they do point to important areas of potential loss.
Data inaccuracies can lead to faulty business decisions, lost opportunities, and customer attrition, all of which cost money. Bad data can also damage a company’s brand and customers’ trust in its products and services; although hard to measure, these intangible effects on trust and reputation can persist for a long time.
Put observability first to prevent inaccurate data from derailing your efforts
Data observability lets data engineers become data stewards rather than just data movers. Instead of concentrating only on the technical work of moving data from diverse sources into a consolidated repository, you adopt a more comprehensive, strategic approach. With observability, you can streamline impact management, understand dependencies and lineage, and maximize pipeline efficiency, all of which contribute to better governance, economical resource usage, and lower costs.
With data observability, data quality becomes a quantifiable metric that is easy to monitor and improve. You can anticipate problems in your data pipelines and datasets before they become major ones, creating a robust and effective data environment.
As data complexity increases, observability becomes essential: it helps engineers build solid, dependable, and trustworthy data foundations, which ultimately speeds up time-to-value for the entire company. Investing in data observability reduces these risks and increases the return on investment (ROI) of your data and AI initiatives.
To put it simply, data observability gives data engineers the ability to create and manage solid, dependable, and high-quality data pipelines that add value to the company.
Read more on govindhtech.com
jcmarchi · 4 months
Text
Scientists use generative AI to answer complex questions in physics
New Post has been published on https://thedigitalinsider.com/scientists-use-generative-ai-to-answer-complex-questions-in-physics/
Scientists use generative AI to answer complex questions in physics
When water freezes, it transitions from a liquid phase to a solid phase, resulting in a drastic change in properties like density and volume. Phase transitions in water are so common most of us probably don’t even think about them, but phase transitions in novel materials or complex physical systems are an important area of study.
To fully understand these systems, scientists must be able to recognize phases and detect the transitions between. But how to quantify phase changes in an unknown system is often unclear, especially when data are scarce.
Researchers from MIT and the University of Basel in Switzerland applied generative artificial intelligence models to this problem, developing a new machine-learning framework that can automatically map out phase diagrams for novel physical systems.
Their physics-informed machine-learning approach is more efficient than laborious, manual techniques, which rely on theoretical expertise. Importantly, because their approach leverages generative models, it does not require the huge, labeled training datasets used in other machine-learning techniques.
Such a framework could help scientists investigate the thermodynamic properties of novel materials or detect entanglement in quantum systems, for instance. Ultimately, this technique could make it possible for scientists to discover unknown phases of matter autonomously.
“If you have a new system with fully unknown properties, how would you choose which observable quantity to study? The hope, at least with data-driven tools, is that you could scan large new systems in an automated way, and it will point you to important changes in the system. This might be a tool in the pipeline of automated scientific discovery of new, exotic properties of phases,” says Frank Schäfer, a postdoc in the Julia Lab in the Computer Science and Artificial Intelligence Laboratory (CSAIL) and co-author of a paper on this approach.
Joining Schäfer on the paper are first author Julian Arnold, a graduate student at the University of Basel; Alan Edelman, applied mathematics professor in the Department of Mathematics and leader of the Julia Lab; and senior author Christoph Bruder, professor in the Department of Physics at the University of Basel. The research is published today in Physical Review Letters.
Detecting phase transitions using AI
While water transitioning to ice might be among the most obvious examples of a phase change, more exotic phase changes, like when a material transitions from being a normal conductor to a superconductor, are of keen interest to scientists.
These transitions can be detected by identifying an “order parameter,” a quantity that is important and expected to change. For instance, water freezes and transitions to a solid phase (ice) when its temperature drops below 0 degrees Celsius. In this case, an appropriate order parameter could be defined in terms of the proportion of water molecules that are part of the crystalline lattice versus those that remain in a disordered state.
In the past, researchers have relied on physics expertise to build phase diagrams manually, drawing on theoretical understanding to know which order parameters are important. Not only is this tedious for complex systems, and perhaps impossible for unknown systems with new behaviors, but it also introduces human bias into the solution.
More recently, researchers have begun using machine learning to build discriminative classifiers that can solve this task by learning to classify a measurement statistic as coming from a particular phase of the physical system, the same way such models classify an image as a cat or dog.
The MIT researchers demonstrated how generative models can be used to solve this classification task much more efficiently, and in a physics-informed manner.
The Julia Programming Language, a popular language for scientific computing that is also used in MIT’s introductory linear algebra classes, offers many tools that make it invaluable for constructing such generative models, Schäfer adds.
Generative models, like those that underlie ChatGPT and Dall-E, typically work by estimating the probability distribution of some data, which they use to generate new data points that fit the distribution (such as new cat images that are similar to existing cat images).
However, when simulations of a physical system using tried-and-true scientific techniques are available, researchers get a model of its probability distribution for free. This distribution describes the measurement statistics of the physical system.
A more knowledgeable model
The MIT team’s insight is that this probability distribution also defines a generative model upon which a classifier can be constructed. They plug the generative model into standard statistical formulas to directly construct a classifier instead of learning it from samples, as was done with discriminative approaches.
“This is a really nice way of incorporating something you know about your physical system deep inside your machine-learning scheme. It goes far beyond just performing feature engineering on your data samples or simple inductive biases,” Schäfer says.
This generative classifier can determine what phase the system is in given some parameter, like temperature or pressure. And because the researchers directly approximate the probability distributions underlying measurements from the physical system, the classifier has system knowledge.
This enables their method to perform better than other machine-learning techniques. And because it can work automatically without the need for extensive training, their approach significantly enhances the computational efficiency of identifying phase transitions.
At the end of the day, similar to how one might ask ChatGPT to solve a math problem, the researchers can ask the generative classifier questions like “does this sample belong to phase I or phase II?” or “was this sample generated at high temperature or low temperature?”
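The paper's actual construction isn't reproduced here, but the core idea (turning per-phase likelihoods from a simulation into a classifier with standard probability rules) can be sketched in a few lines; the function names and the uniform prior below are assumptions for illustration only.

import numpy as np

def generative_classifier(x, log_likelihood_phase1, log_likelihood_phase2, prior1=0.5):
    # log_likelihood_phase* stand in for the simulation-derived measurement
    # distributions of each phase; Bayes' rule converts them into a posterior.
    log_p1 = log_likelihood_phase1(x) + np.log(prior1)
    log_p2 = log_likelihood_phase2(x) + np.log(1.0 - prior1)
    # probability that measurement x was generated in phase I
    return 1.0 / (1.0 + np.exp(log_p2 - log_p1))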
Scientists could also use this approach to solve different binary classification tasks in physical systems, possibly to detect entanglement in quantum systems (Is the state entangled or not?) or determine whether theory A or B is best suited to solve a particular problem. They could also use this approach to better understand and improve large language models like ChatGPT by identifying how certain parameters should be tuned so the chatbot gives the best outputs.
In the future, the researchers also want to study theoretical guarantees regarding how many measurements they would need to effectively detect phase transitions and estimate the amount of computation that would require.
This work was funded, in part, by the Swiss National Science Foundation, the MIT-Switzerland Lockheed Martin Seed Fund, and MIT International Science and Technology Initiatives.
actowizsolution · 1 month
Text
Web Scraping Services - Data Extraction Company - Actowiz
Looking for expert web scraping services? Actowiz specializes in custom data extraction solutions, helping businesses access and analyze the data they need. Trust Actowiz for all your data scraping needs.
Know more: https://www.actowizsolutions.com/
ANALYZE AND PRODUCE CORRECTED INDIVIDUAL DNA TO BE INTRODUCED AND COMPLETELY REPLACE AGED DNA USING CRISPR AND RELATED SPECIES TRANSPLANT TECHNIQUES SCIENCE AND TECHNOLOGY
mlearningai · 2 months
Text
Drowning in data?
Learn how to surf the information wave with the latest tool!
In this new data paradigm,
we now interact more naturally and with richer meaning.
We've moved beyond SQL for database queries.
Now, it's almost like having a conversation with data. For example, you could ask complex questions in plain language.
The database would then understand the context and details of your query.
francescolelli · 3 months
Photo
Understanding Users' Experiences of Interaction with Smart Devices: A Socio-Technical Perspective
This is a short preview of the article: Human-computer interaction (HCI) is a multidisciplinary field that explores the interaction between humans and computers, emphasizing especially the design and use of computer technology. Within this domain, the notion of sense of agency, usually defined as the users perceiving their actions
If you like it consider checking out the full version of the post at: Understanding Users' Experiences of Interaction with Smart Devices: A Socio-Technical Perspective
If you are looking for ideas to tweet or re-blog this post, you may want to consider the following hashtags:
Hashtags: #Agency, #Datasets, #DeviceAgency, #FreeDatasets, #InternetOfThings, #IoT, #SmartDevice, #SmartDevices, #SocioTechnical, #Survey, #UserAgency
The Hashtags of the Categories are: #HCI, #InternetofThings, #Publication
Understanding Users' Experiences of Interaction with Smart Devices: A Socio-Technical Perspective is available at the following link: https://francescolelli.info/hci/understanding-users-experiences-of-interaction-with-smart-devices-a-socio-technical-perspective/ You will find more information, stories, examples, data, opinions and scientific papers as part of a collection of articles about Information Management, Computer Science, Economics, Finance and More.
The title of the full article is: Understanding Users' Experiences of Interaction with Smart Devices: A Socio-Technical Perspective
It belong to the following categories: HCI, Internet of Things, Publication
The most relevant keywords are: Agency, datasets, device agency, free datasets, internet of things, IoT, smart device, smart devices, socio-technical, survey, user agency
It has been published by Francesco Lelli at Francesco Lelli, a blog about Information Management, Computer Science, Finance, Economics and nearby ideas and opinions.
Hope you will find it interesting and that it will help you in your journey
Human-computer interaction (HCI) is a multidisciplinary field that explores the interaction between humans and computers, emphasizing especially the design and use of computer technology. Within this domain, the notion of sense of agency, usually defined as the users perceiving their actions as influencing the system, is of crucial importance. Another central notion is that of…