#DataSets
Explore tagged Tumblr posts
jcmarchi · 13 days ago
Text
Impact and innovation of AI in energy use with James Chalmers
New Post has been published on https://thedigitalinsider.com/impact-and-innovation-of-ai-in-energy-use-with-james-chalmers/
Impact and innovation of AI in energy use with James Chalmers
In the very first episode of our monthly Explainable AI podcast, hosts Paul Anthony Claxton and Rohan Hall sat down with James Chalmers, Chief Revenue Officer of Novo Power, to discuss one of the most pressing issues in AI today: energy consumption and its environmental impact.
Together, they explored how AI's rapid expansion is placing significant demands on global power infrastructures and what leaders in the tech industry are doing to address this.
The conversation covered various important topics, from the unique power demands of generative AI models to potential solutions like neuromorphic computing and waste heat recapture. If you’re interested in how AI shapes business and global energy policies, this episode is a must-listen.
Why this conversation matters for the future of AI
The rise of AI, especially generative models, isn’t just advancing technology; it’s consuming power at an unprecedented rate. Understanding these impacts is crucial for AI enthusiasts who want to see AI development continue sustainably and ethically.
As James explains, AI’s current reliance on massive datasets and intensive computational power has given it the fastest-growing energy footprint of any technology in history. For those working in AI, understanding how to manage these demands can be a significant asset in building future-forward solutions.
Main takeaways
AI’s power consumption problem: Generative AI models, which require vast amounts of energy for training and generation, consume ten times more power than traditional search engines.
Waste heat utilization: Nearly all power in data centers is lost as waste heat. Solutions like those at Novo Power are exploring how to recycle this energy.
Neuromorphic computing: This emerging technology, inspired by human neural networks, promises more energy-efficient AI processing.
Shift to responsible use: AI can help businesses address inefficiencies, but organizations need to integrate AI where it truly supports business goals rather than simply following trends.
Educational imperative: For AI to reach its potential without causing environmental strain, a broader understanding of its capabilities, impacts, and sustainable use is essential.
Meet James Chalmers
James Chalmers is a seasoned executive and strategist with extensive international experience guiding ventures through fundraising, product development, commercialization, and growth.
As the Founder and Managing Partner at BaseCamp, he has reshaped traditional engagement models between startups, service providers, and investors, emphasizing a unique approach to creating long-term value through differentiation.
Rather than merely enhancing existing processes, James champions transformative strategies that set companies apart, strongly emphasizing sustainable development.
Numerous accolades validate his work, including recognition from Forbes and Inc. Magazine as a leader of one of the Fastest-Growing and Most Innovative Companies, as well as B Corporation’s Best for The World and MedTech World’s Best Consultancy Services.
He’s also a LinkedIn ‘Top Voice’ on Product Development, Entrepreneurship, and Sustainable Development, reflecting his ability to drive substantial and sustainable growth through innovation and sound business fundamentals.
At BaseCamp, James applies his executive expertise to provide hands-on advisory services in fundraising, product development, commercialization, and executive strategy.
His commitment extends beyond addressing immediate business challenges; he prioritizes building competency and capacity within each startup he advises. Focused on sustainability, his work is dedicated to supporting companies that address one or more of the United Nations’ 17 Sustainable Development Goals through AI, DeepTech, or Platform Technologies.
About the hosts:
Paul Anthony Claxton – Q1 Velocity Venture Capital | LinkedIn
www.paulclaxton.io – Managing General Partner at Q1 Velocity Venture Capital · Experience: Q1 Velocity Venture Capital · Education: Harvard Extension School · Location: Beverly Hills
Tumblr media
Rohan Hall – Code Genie AI | LinkedIn
Are you ready to transform your business using the power of AI? With over 30 years of… · Experience: Code Genie AI · Location: Los Angeles Metropolitan Area
Tumblr media Tumblr media
Like what you see? Then check out tonnes more.
From exclusive content by industry experts and an ever-increasing bank of real world use cases, to 80+ deep-dive summit presentations, our membership plans are packed with awesome AI resources.
Subscribe now
3 notes · View notes
debtservicecoverageratio · 8 days ago
Text
DATASETS IN FINTECH STARTUP WORLD
Here are some real-world examples of fintech companies using datasets to improve their services:
1. Personalized Financial Planning:
Mint: Mint aggregates financial data from various sources like bank accounts, credit cards, and investments to provide users with a holistic view of their finances. It then uses this data to offer personalized budgets, track spending habits, and suggest ways to save money.
Personal Capital: Similar to Mint, Personal Capital analyzes user data to provide personalized financial advice, including investment recommendations and retirement planning.
2. Credit Scoring and Lending:
Upstart: Upstart uses alternative data sources like education and employment history, in addition to traditional credit scores, to assess creditworthiness and provide loans to individuals who may be overlooked by traditional lenders. This expands access to credit and often results in fairer lending practices.
Kiva: Kiva uses a dataset of loan applications and repayment history to assess the risk of lending to individuals in developing countries. This data-driven approach allows them to provide microloans to entrepreneurs who lack access to traditional banking systems.
3. Fraud Detection:
Stripe: Stripe uses machine learning algorithms to analyze transaction data and identify potentially fraudulent activity. This helps protect businesses from losses and ensures secure online payments (a minimal sketch of this kind of anomaly detection appears at the end of this post).
Paypal: Paypal employs sophisticated fraud detection systems that analyze vast amounts of data to identify and prevent unauthorized transactions, protecting both buyers and sellers.
4. Investment Platforms:
Robinhood: Robinhood uses data to provide users with insights into stock performance, market trends, and personalized investment recommendations. This makes investing more accessible and helps users make informed decisions.
Betterment: Betterment uses algorithms and data analysis to create diversified investment portfolios tailored to individual risk tolerance and financial goals. This automated approach simplifies investing and helps users achieve their long-term financial objectives.
These are just a few examples of how fintech companies leverage datasets to improve their services and provide better value to their customers.
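To make the fraud-detection example above a bit more concrete, here is a minimal sketch of the general idea, unsupervised anomaly detection over transaction features. It is not Stripe's or PayPal's actual system; the file name, feature columns, and contamination rate are illustrative assumptions.

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical transaction table: one row per payment (file and columns are assumed).
transactions = pd.read_csv("transactions.csv")
features = transactions[["amount", "hour_of_day", "merchant_risk_score"]]

# Fit an unsupervised anomaly detector; roughly 1% of traffic is assumed to be anomalous.
model = IsolationForest(contamination=0.01, random_state=0).fit(features)

# predict() returns -1 for outliers, i.e. candidates for manual fraud review.
transactions["flagged"] = model.predict(features) == -1
print(transactions["flagged"].sum(), "transactions flagged for review")
```

Production systems layer far more signals (device fingerprints, network graphs, labeled chargeback history) on top of a baseline like this.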
2 notes · View notes
jbfly46 · 1 year ago
Text
I bleed revolution. If your only anarchist actions are related to union organizing, then you're not an anarchist, you're a corporate puppet. Everything you do should work to subvert the current and future actions of the state and all of their tentacle corporate affiliations. If your only goal in life is to work under the orders of someone else, under someone else's direction, with someone else's instructions, then you're not a human being. You're chattel cattle at best. If a corporate pig tells or wants you to do something, then you should do the exact opposite, or else you're just a pawn in a game of global corporate chess. Every one of your actions should be both a defensive and offensive maneuver. If you defend while you attack, you become one with your true purpose, which is to dismantle the state and all corporate authority. If you don't think in a linear manner, then you're not a part of their datasets, and they can't predict your next move. You operate from outside of their datasets and what they think is your next move is never your next move. Then they start to doubt their own intelligence and all the false assumptions it's based on, and the system starts to crumble. You use any means necessary, because that is your constitutional right, just as they use any means necessary to hold onto the power they stole from you. They stole your birthright, and it's your legal duty as an American citizen to seek a redress of your grievances, using whatever it takes. Under no pretext.
9 notes · View notes
neuralnetworkdatasets · 1 year ago
Text
3 notes · View notes
nikitricky · 1 year ago
Text
youtube
Ever wondered what the datasets used to train AI look like? This video shows a subset of ImageNet-1k (18k images) along with some other metrics.
Read more on how I made it and see some extra visualizations.
Okay! I'll split this up by the elements in the video, but first I need to add some context about
The dataset
ImageNet-1k (aka ILSVRC 2012) is an image classification dataset: you have a set number of classes (in this case 1000) and each class has a set of images. This is the most popular version of ImageNet; the full dataset has around 21,000 classes.
ImageNet was built by taking nouns from WordNet and searching for matching images online. From 2010 to 2017, yearly competitions were held to determine the best image classification model. The dataset has greatly benefited computer vision, driving the development of model architectures you've likely used without knowing it. See the accuracy progression here.
ResNet
Residual Network (or ResNet) is an architecture for image recognition introduced in 2015 to address the "vanishing/exploding gradients" problem (read the paper here). It achieved an accuracy of 96.43%, far better than random guessing, winning first place back in 2015. I'll be using a smaller version of this model (ResNet-50), which boasts an accuracy of about 95%.
The scatter plot
If you look at the video long enough, you'll notice that similar images (e.g. dogs, types of food) sit closer together than unrelated ones. This is achieved using two things: image embeddings and dimensionality reduction.
Image embeddings
In short, image embeddings are points in an n-dimensional space (read this post for more info on higher dimensions), in this case made by chopping off the last layer of ResNet-50, producing a point in 1024-dimensional space.
The benefit of doing all of that rather than just comparing pixels between two images is that the model (built specifically for classification) only looks for features that make the classification easier (preserving semantic information). For instance: you have three images of dogs, and the first two are the same breed, but the first looks superficially more like the third (e.g. a matching background). If you compare raw pixels, the first and third images come out closer; if you use embeddings, the first and second are closer because of the matching breed.
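If you want to try this yourself, here is a minimal sketch of the "chop off the last layer" step using torchvision's pretrained ResNet-50. This is an assumption about the setup, not the exact script behind the video; note that torchvision's pooled ResNet-50 features are 2048-dimensional, so the 1024 figure above may come from a different cut point or model variant.

```python
import torch
from torchvision import models, transforms
from PIL import Image

# Pretrained ResNet-50 with its classification head replaced by an identity,
# so the forward pass returns the pooled feature vector instead of class scores.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = torch.nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed(path):
    # One embedding per image (2048-dimensional for torchvision's ResNet-50).
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0)
```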
Dimensionality reduction
Now we have all these image embeddings that are grouped by semantic (meaning) similarity, and we want to visualize them. But how? You can't display a 1024-dimensional scatter plot and expect anyone to understand it. That's where dimensionality reduction comes into play. In this case, we're reducing 1024 dimensions to 2 using an algorithm called t-SNE. Now the scatter plot will be something we mere mortals can comprehend.
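As a rough sketch (assuming the embeddings have been stacked into a NumPy array; the perplexity value is just a common default), the reduction step with scikit-learn's t-SNE looks something like this:

```python
import numpy as np
from sklearn.manifold import TSNE

# embeddings: one row per image, e.g. shape (18000, 2048), stacked from the sketch above.
embeddings = np.load("embeddings.npy")  # hypothetical file

# Squash the high-dimensional points down to 2D for the scatter plot.
coords_2d = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(embeddings)
print(coords_2d.shape)  # (18000, 2)
```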
Extra visualizations
Here's the scatter plot in HD:
Tumblr media
This idea actually comes from an older project where I did this on a smaller dataset (about 8k images). The results were quite promising! You can see how each of the 8 classes is neatly separated, and how differences in the subject's angle, surroundings, and color show up within each cluster.
Tumblr media
Find the full-resolution image here
Similar images
I just compared every point to every other point (in the 2D space; it would be too computationally expensive otherwise) and took the 6 closest points. You can tell when the model misclassifies something because the related images are not similar to the one presented (e.g. there's an image of a payphone but all of the similar images are bridges).
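The post describes a brute-force comparison; a minimal sketch of the same idea using scikit-learn's NearestNeighbors on the 2D t-SNE coordinates (an alternative to hand-rolled distance loops, not necessarily what the video used) could look like this:

```python
from sklearn.neighbors import NearestNeighbors

# coords_2d: the (n_images, 2) array produced by the t-SNE sketch above.
# Ask for 7 neighbours, because the closest point to every image is the image itself.
nn = NearestNeighbors(n_neighbors=7).fit(coords_2d)
_, indices = nn.kneighbors(coords_2d)
similar = indices[:, 1:]  # drop the self-match, keep the 6 closest images
```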
Pixel rarity
This one was pretty simple: I used a script to count the occurrences of pixel colors. Again, this idea comes from an older project where I counted over the entire dataset, so I just reused that data.
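A minimal version of such a counting script (paths and file handling are assumed) might look like this:

```python
from collections import Counter
from PIL import Image

def count_pixel_colors(paths):
    # Tally how often each exact RGB colour appears across a set of images.
    counts = Counter()
    for path in paths:
        counts.update(Image.open(path).convert("RGB").getdata())
    return counts

# count_pixel_colors(paths).most_common(10) -> the ten most frequent (r, g, b) tuples
```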
Extra visualization
Here are all the colors that appeared in the image, sorted by popularity, left to right, top to bottom
Tumblr media
Some final stuff
MP means Megapixel (one million pixels) - a 1000x1000 image is one megapixel big (it has one million pixels)
That's all, thanks for reading. Feel free to ask questions and I'll try my best to respond to them.
3 notes · View notes
edujournalblogs · 1 year ago
Text
Data Cleaning in Data Science
Tumblr media
Data cleaning is an integral part of data preprocessing: removing or correcting inaccurate information within a data set. This could mean missing data, spelling mistakes, or duplicates, to name a few issues. Inaccurate information can cause problems during the analysis phase if not properly addressed at the earlier stages.
Data Cleaning vs Data Wrangling : Data cleaning focuses on fixing inaccuracies within your data set. Data wrangling, on the other hand, is concerned with converting the data’s format into one that can be accepted and processed by a machine learning model.
Data cleaning steps to follow (a short pandas sketch illustrating these steps appears after the list):
Remove irrelevant data
Resolve any duplicates issues
Correct structural errors if any
Deal with missing fields in the dataset
Zone in on any data outliers and remove them
Validate your data
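Here is the promised pandas sketch of those steps. The file and column names are purely illustrative assumptions; adapt them to your own data set.

```python
import pandas as pd

df = pd.read_csv("raw_data.csv")                      # hypothetical input file

df = df.drop(columns=["notes"], errors="ignore")      # remove irrelevant data (assumed column)
df = df.drop_duplicates()                             # resolve duplicate rows
df["city"] = df["city"].str.strip().str.title()       # correct structural errors (assumed column)
df["age"] = df["age"].fillna(df["age"].median())      # deal with missing fields
low, high = df["age"].quantile([0.01, 0.99])
df = df[df["age"].between(low, high)]                 # zone in on outliers and remove them
assert df["age"].notna().all()                        # validate the cleaned data

df.to_csv("clean_data.csv", index=False)
```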
At EduJournal, we understand the importance of gaining practical skills and industry-relevant knowledge to succeed in the field of data analytics and data science. Our certified program in data science and data analytics is designed to equip freshers and experienced professionals with the necessary expertise and hands-on experience so they are well equipped for the job.
URL : http://www.edujournal.com
2 notes · View notes
joe-england · 2 years ago
Link
2 notes · View notes
analyticspursuit · 2 years ago
Text
The 5 Free Dataset Sources for Data Analytics Projects
In this video, I'm sharing five free dataset sources that are perfect for data analytics projects. By using these free datasets, you'll be able to create powerful projects in no time!
These sources let you collect data from a variety of places and crunch the numbers with ease, so be sure to check out the video to learn about all five!
2 notes · View notes
bluefeatheredvelociraptor · 23 days ago
Text
This reminded me of the time I was doing social service for my bachelor's degree.
I'm a biologist. Back then (2007-2008ish, I guess? Don't remember, it's been a while lol) I joined the Ornithology Lab hoping to start my bachelor's thesis early (I did NOT but that's another story lmao). Part of my social service job involved transcribing lots (and I mean LOTS, there were journals dating back to the 80s) of field journals from past students into Excel spreadsheets and then entering the curated info into a special database designed by the Mexican environmental commission (CONABIO) for it to be accessible to other researchers and to add to the national biodiversity repository.
Oh, boy.
The spelling in plenty of those journals was TERRIBLE. And I'm not referring to the questionable spelling of scientific names (which can be truly difficult to memorize and write). I'm talking about the spelling of things like the alpha codes we ornithologists use to abbreviate either the scientific names or the standardized common names in English (HOW DO YOU MISSPELL FOUR / SIX LETTERS???), site identifiers, descriptions, field observations, etc. Heck, there were times when even the names of the observers were spelled differently ON THE SAME PAGE written BY THE SAME PERSON. Had at least one instance where a student regularly spelled his own name wrong and the head of the Laboratory didn't remember which spelling was the correct one, so we had to settle with the most common spelling of that student's name.
Considering all this information was gathered by fellow biology students during field practices (who in all likelihood were making these identifications with the aid of guidebooks and the professors' guidance), one would expect them to be able to write with certain grammatical consistency, as was to be expected of their academic level. But nope.
And yes, I know people can be dyslexic (or have other undiagnosed learning disabilities) and struggle with reading and writing, but some of those journals were written by people who were somewhat bordering on functional illiteracy, which I find truly baffling of people studying for a higher education degree.
Curating all that info was tortuous but I managed. And in the end I completed the mandatory 480 hours (and more!) of the social service necessary for graduation. Good grief, though. Reading OPs post gave me serious war flashbacks 😂
Working on a dataset of roadkill reports. state agency personnel CANNOT spell
Tumblr media
42K notes · View notes
aimarketingexpertemea · 9 days ago
Text
DATASETS in GOOGLE SEARCH
Value of datasets in Google Search:
Imagine you're a detective investigating a complex case. You have witness testimonies (like knowledge graphs), photos from the crime scene (images), and maybe even security camera footage (videos). These give you clues and a general idea of what happened. But what if you could access the raw forensic data – fingerprints, DNA analysis, ballistic reports? That's what datasets offer in the world of search.
Datasets are like the underlying evidence that allows you to go beyond surface-level understanding and conduct your own in-depth analysis. They empower you to connect the dots, uncover hidden patterns, and draw your own conclusions.
Think of a student researching the impact of social media on teenagers. They might find articles discussing the topic (knowledge graphs) and see illustrative images or videos. But with a dataset containing survey results, social media usage statistics, and mental health indicators, they can dive deeper, explore correlations, and potentially uncover new insights about this complex relationship.
Datasets are not just for academics or detectives, though. They can be incredibly useful for everyday life. Imagine a family planning a vacation. They might look at beautiful pictures of destinations (images) and watch travel vlogs (videos). But with access to datasets on weather patterns, flight prices, and local attractions, they can make informed decisions, optimize their itinerary, and ultimately have a more fulfilling experience.
The beauty of datasets lies in their versatility. They can be visualized, analyzed, and combined with other data sources to create new knowledge and solve real-world problems. Google Search, with its vast reach and powerful algorithms, has the potential to make these datasets accessible to everyone, democratizing information and empowering individuals to make data-driven decisions in all aspects of their lives.
By integrating datasets seamlessly into search results, Google can transform the way we interact with information, moving beyond passive consumption to active exploration and discovery. This will not only enhance our understanding of the world around us but also foster a more informed and data-literate society.
0 notes
jcmarchi · 8 months ago
Text
Unlocking mRNA’s cancer-fighting potential
New Post has been published on https://thedigitalinsider.com/unlocking-mrnas-cancer-fighting-potential/
Unlocking mRNA’s cancer-fighting potential
Tumblr media Tumblr media
What if training your immune system to attack cancer cells was as easy as training it to fight Covid-19? Many people believe the technology behind some Covid-19 vaccines, messenger RNA, holds great promise for stimulating immune responses to cancer.
But using messenger RNA, or mRNA, to get the immune system to mount a prolonged and aggressive attack on cancer cells — while leaving healthy cells alone — has been a major challenge.
The MIT spinout Strand Therapeutics is attempting to solve that problem with an advanced class of mRNA molecules that are designed to sense what type of cells they encounter in the body and to express therapeutic proteins only once they have entered diseased cells.
“It’s about finding ways to deal with the signal-to-noise ratio, the signal being expression in the target tissue and the noise being expression in the nontarget tissue,” Strand CEO Jacob Becraft PhD ’19 explains. “Our technology amplifies the signal to express more proteins for longer while at the same time effectively eliminating the mRNA’s off-target expression.”
Strand is set to begin its first clinical trial in April, which is testing a proprietary, self-replicating mRNA molecule’s ability to express immune signals directly from a tumor, eliciting the immune system to attack and kill the tumor cells directly. It’s also being tested as a possible improvement for existing treatments to a number of solid tumors.
As they work to commercialize its early innovations, Strand’s team is continuing to add capabilities to what it calls its “programmable medicines,” improving mRNA molecules’ ability to sense their environment and generate potent, targeted responses where they’re needed most.
“Self-replicating mRNA was the first thing that we pioneered when we were at MIT and in the first couple years at Strand,” Becraft says. “Now we’ve also moved into approaches like circular mRNAs, which allow each molecule of mRNA to express more of a protein for longer, potentially for weeks at a time. And the bigger our cell-type specific datasets become, the better we are at differentiating cell types, which makes these molecules so targeted we can have a higher level of safety at higher doses and create stronger treatments.”
Making mRNA smarter
Becraft got his first taste of MIT as an undergraduate at the University of Illinois when he secured a summer internship in the lab of MIT Institute Professor Bob Langer.
“That’s where I learned how lab research could be translated into spinout companies,” Becraft recalls.
The experience left enough of an impression on Becraft that he returned to MIT the next fall to earn his PhD, where he worked in the Synthetic Biology Center under professor of bioengineering and electrical engineering and computer science Ron Weiss. During that time, he collaborated with postdoc Tasuku Kitada to create genetic “switches” that could control protein expression in cells.
Becraft and Kitada realized their research could be the foundation of a company around 2017 and started spending time in the Martin Trust Center for MIT Entrepreneurship. They also received support from MIT Sandbox and eventually worked with the Technology Licensing Office to establish Strand’s early intellectual property.
“We started by asking, where is the highest unmet need that also allows us to prove out the thesis of this technology? And where will this approach have therapeutic relevance that is a quantum leap forward from what anyone else is doing?” Becraft says. “The first place we looked was oncology.”
People have been working on cancer immunotherapy, which turns a patient’s immune system against cancer cells, for decades. Scientists in the field have developed drugs that produce some remarkable results in patients with aggressive, late-stage cancers. But most next-generation cancer immunotherapies are based on recombinant (lab-made) proteins that are difficult to deliver to specific targets in the body and don’t remain active for long enough to consistently create a durable response.
More recently, companies like Moderna, whose founders also include MIT alumni, have pioneered the use of mRNAs to create proteins in cells. But to date, those mRNA molecules have not been able to change behavior based on the type of cells they enter, and don’t last for very long in the body.
“If you’re trying to engage the immune system with a tumor cell, the mRNA needs to be expressing from the tumor cell itself, and it needs to be expressing over a long period of time,” Becraft says. “Those challenges are hard to overcome with the first generation of mRNA technologies.”
Strand has developed what it calls the world’s first mRNA programming language that allows the company to specify the tissues its mRNAs express proteins in.
“We built a database that says, ‘Here are all of the different cells that the mRNA could be delivered to, and here are all of their microRNA signatures,’ and then we use computational tools and machine learning to differentiate the cells,” Becraft explains. “For instance, I need to make sure that the messenger RNA turns off when it’s in the liver cell, and I need to make sure that it turns on when it’s in a tumor cell or a T-cell.”
Strand also uses techniques like mRNA self-replication to create more durable protein expression and immune responses.
“The first versions of mRNA therapeutics, like the Covid-19 vaccines, just recapitulate how our body’s natural mRNAs work,” Becraft explains. “Natural mRNAs last for a few days, maybe less, and they express a single protein. They have no context-dependent actions. That means wherever the mRNA is delivered, it’s only going to express a molecule for a short period of time. That’s perfect for a vaccine, but it’s much more limiting when you want to create a protein that’s actually engaging in a biological process, like activating an immune response against a tumor that could take many days or weeks.”
Technology with broad potential
Strand’s first clinical trial is targeting solid tumors like melanoma and triple-negative breast cancer. The company is also actively developing mRNA therapies that could be used to treat blood cancers.
“We’ll be expanding into new areas as we continue to de-risk the translation of the science and create new technologies,” Becraft says.
Strand plans to partner with large pharmaceutical companies as well as investors to continue developing drugs. Further down the line, the founders believe future versions of its mRNA therapies could be used to treat a broad range of diseases.
“Our thesis is: amplified expression in specific, programmed target cells for long periods of time,” Becraft says. “That approach can be utilized for [immunotherapies like] CAR T-cell therapy, both in oncology and autoimmune conditions. There are also many diseases that require cell-type specific delivery and expression of proteins in treatment, everything from kidney disease to types of liver disease. We can envision our technology being used for all of that.”
7 notes · View notes
wastebaskettaxon · 10 days ago
Text
0 notes
titleknown · 2 years ago
Text
...I will say @mortalityplays, even as someone who's generally positive towards AI art/image synthesis, thank you for approaching it from this data privacy view instead of the copyright argument which, as I've talked about before, is a very bad framework.
Like... legit, it disturbs me how much people are moving into copyright maximalism when a much more helpful way to think of it would be from the data privacy angle you're describing, and I wish more people would rally around trying to take action with that instead of trying to make our bad copyright system even worse.
Because, as friend of the blog @tangibletechnomancy said, when you look into it AI art is one of the least concerning things they're doing with that data, and if this motivated folks to push back against that, well, that's probably a good thing.
ngl it's driving me a little bit fucking insane that the whole conversation about image scraping for AI has settled on copyright and legality as a primary concern, and not consent. my shit should not be used without my consent. I will give it away for free, but I want to be asked.
I don't want to be included in studies without my knowledge or consent. I don't want my face captured for the training of facial recognition models without my knowledge or consent. I don't want my voice captured for the training of speech recognition models without my consent. I don't want my demographic or interest profile captured without my consent. I don't want my art harvested for visual model training without my consent. It's not about 'theft' (fake idea) or 'ownership' (fake idea) or 'inherent value' (fake idea). It's about my ability to opt out from being used as a data point. I object to being a commodity by default.
31K notes · View notes
govindhtech · 20 days ago
Text
Use AWS Supply Chain Analytics To Gain Useful Knowledge
Tumblr media
Use AWS Supply Chain Analytics to unleash the power of your supply chain data and obtain useful insights.
AWS Supply Chain
Reduce expenses and minimize risks with a supply chain solution driven by machine learning.
Demand forecasting and inventory visibility, actionable insights, integrated contextual collaboration, demand and supply planning, n-tier supplier visibility, and sustainability information management are all enhanced by AWS Supply Chain, a cloud-based supply chain management application that aggregates data and offers ML-powered forecasting techniques. In addition to utilizing ML and generative AI to transform and combine fragmented data into the supply chain data lake (SCDL), AWS Supply Chain can interact with your current solutions for enterprise resource planning (ERP) and supply chain management. Without requiring replatforming, upfront license costs, or long-term commitments, AWS Supply Chain may enhance supply chain risk management.
Advantages
Reduce the risk of overstock and stock-outs
Reduce extra inventory expenditures and enhance consumer experiences by reducing the risk of overstock and stock-outs.
Increase visibility quickly
Obtain supply chain visibility quickly without having to make long-term commitments, pay upfront license fees, or replatform.
Actionable insights driven by ML
Use actionable insights driven by machine learning (ML) to make better supply chain decisions.
Simplify the process of gathering sustainability data and collaborating on supply plans
Work with partners on order commitments and supply plans more safely and conveniently. Determine and address shortages of materials or components and gather sustainability data effectively.
AWS is announcing that AWS Supply Chain Analytics, which is powered by Amazon QuickSight, is now generally available. Using your data in AWS Supply Chain, this new functionality enables you to create personalized report dashboards. Your supply chain managers or business analysts can use this functionality to visualize data, conduct bespoke analysis, and obtain useful insights for your supply chain management operations.
Amazon QuickSight embedded authoring tools are integrated into the AWS Supply Chain user interface, and AWS Supply Chain Analytics makes use of the AWS Supply Chain data lake. You may create unique insights, measurements, and key performance indicators (KPIs) for your operational analytics using this integration’s unified and customizable interface.
Furthermore, AWS Supply Chain Analytics offers pre-made dashboards that you may use exactly as is or alter to suit your requirements. The following prebuilt dashboards will be available to you at launch:
Plan-Over-Plan Variance: Compares two demand plans, showing differences in units and values across key dimensions such as product, site, and time period.
Seasonality Analytics: Provides a year-over-year view of demand, showing trends in average demand quantities and highlighting seasonality patterns with monthly and weekly heatmaps.
Let’s begin
Let me walk you through AWS Supply Chain Analytics' features.
Turning on AWS Supply Chain Analytics is the first step. Go to Settings, pick Organizations, and then pick Analytics to accomplish this. You can enable analytics data access here.
Now you can add new roles with analytics access or edit roles that already exist.
After this feature is activated, you may choose the Connecting to Analytics card or Analytics from the left navigation menu to access the AWS Supply Chain Analytics feature when you log in to AWS Supply Chain.
The Supply Chain Function dropdown list then allows you to choose the prebuilt dashboards you require:
The best thing about these prebuilt dashboards is how simple they are to set up. All of the data, the analysis, and even the dashboard itself are prepared for you by AWS Supply Chain Analytics. Click Add to get started.
Then navigate to the dashboard page to view the results. You can also share this dashboard with your colleagues, which enhances teamwork.
You can go to Datasets and choose New Datasets if you need to add more datasets in order to create a custom dashboard.
You can leverage an existing dataset in this case, which is the AWS Supply Chain data lake.
After that, you may decide which table to use in your analysis. You can view every field that is provided in the Data section. AWS Supply Chain creates all data sets that begin with asc_, including supply planning, demand planning, insights, and other data sets.
Additionally, you can locate every dataset you have added to AWS Supply Chain. One thing to keep in mind: if you haven't already ingested data into the AWS Supply Chain data lake, you must do so before using AWS Supply Chain Analytics.
You can begin your analysis at this point.
Currently accessible
AWS Supply Chain Analytics is now generally available in every country where AWS Supply Chain is offered. Try using AWS Supply Chain Analytics to see how it can transform your operations.
Read more on Govindhtech.com
1 note · View note
anarchytecture · 2 months ago
Text
Tumblr media
0 notes
pleasantinternetfest · 2 months ago
Text
Analysing large data sets using AWS Athena
Handling large datasets can feel overwhelming, especially when you're faced with endless rows of data and complex information. At our company, we faced these challenges head-on until we discovered AWS Athena. Athena transformed the way we handle massive datasets by simplifying the querying process without the hassle of managing servers or dealing with complex infrastructure. In this article, I'll walk you through how AWS Athena has revolutionized our approach to data analysis. We'll explore how it leverages SQL to make working with big data straightforward and efficient. If you've ever struggled with managing large datasets and are looking for a practical solution, you're in the right place.
Efficient Data Storage and Querying
Through our experiences, we found that two key strategies significantly enhanced our performance with Athena: partitioning data and using columnar storage formats like Parquet. These methods have dramatically reduced our query times and improved our data analysis efficiency. Here’s a closer look at how we’ve implemented these strategies:
Data Organization for Partitioning and Parquet
Organize your data in S3 for efficient querying:
s3://your-bucket/your-data/
├── year=2023/
│   ├── month=01/
│   │   ├── day=01/
│   │   │   └── data-file
│   │   └── day=02/
│   └── month=02/
└── year=2024/
    └── month=01/
        └── day=01/
Preprocessing Data for Optimal Performance
Before importing datasets into AWS Glue and Athena, preprocessing is essential to ensure consistency and efficiency. This involves handling mixed data types, adding date columns for partitioning, and converting files to a format suitable for Athena.
Note: The following steps are optional; apply them according to your data and requirements.
1. Handling Mixed Data Types
To address columns with mixed data types, standardize them to the most common type using the following code snippet:

def determine_majority_type(series):
    # get the types of all non-null values
    types = series.dropna().apply(type)
    # count the occurrences of each type
    type_counts = types.value_counts()
    # return the most common type (fall back to object for an all-null column)
    return type_counts.idxmax() if not type_counts.empty else object

preprocess.py
2. Adding Date Columns for Partitioning
To facilitate partitioning, add additional columns for year, month, and day:

def add_date_columns_to_csv(file_path):
    try:
        # read the CSV file
        df = pd.read_csv(file_path)
        # derive the partition columns from the date column (assumed here to be "transdate")
        dates = pd.to_datetime(df["transdate"])
        df["year"], df["month"], df["day"] = dates.dt.year, dates.dt.month, dates.dt.day
        df.to_csv(file_path, index=False)
    except Exception as exc:
        print(f"failed to add date columns to {file_path}: {exc}")

partitioning.py
3. Converting CSV to Parquet Format
For optimized storage and querying, convert CSV files to Parquet format:

def detect_and_convert_mixed_types(df):
    for col in df.columns:
        # detect mixed types in the column
        if df[col].apply(type).nunique() > 1:
            # fall back to strings so the column has one consistent type
            df[col] = df[col].astype(str)
    return df

paraquet.py
4. Concatenating Multiple CSV Files
To consolidate multiple CSV files into one for Parquet conversion:

def read_and_concatenate_csv_files(directory):
    all_dfs = []
    # recursively search for CSV files in the directory
    for path in Path(directory).rglob("*.csv"):
        all_dfs.append(pd.read_csv(path))
    return pd.concat(all_dfs, ignore_index=True)

concatenate.py
Step-by-Step Guide to Managing Datasets with AWS Glue and Athena
1. Place Your Source Dataset in S3
Tumblr media
2. Create a Crawler in AWS Glue
In the AWS Glue console, create a new crawler to catalog your data and make it queryable with Athena.
Specify Your S3 Bucket: Set the S3 bucket path as the data source in the crawler configuration.
IAM Role: Assign an IAM role with the necessary permissions to access your S3 bucket and Glue Data Catalog.
Tumblr media
3. Set Up the Glue Database
Create a new database in the AWS Glue Data Catalog where your CSV data will be stored. This database acts as a container for your tables.
Database Creation: Go to the AWS Glue Data Catalog section and create a new database.
Crawler Output Configuration: Specify this database for storing the table metadata and optionally provide a prefix for your table names.
4. Configure Crawler Schedule
Set the crawler schedule to keep your data catalog up to date:
Hourly
Daily
Weekly
Monthly
On-Demand
Scheduling the crawler ensures the table stays up to date when existing data changes or new files are added.
5. Run the Crawler
Initiate the crawler by clicking the "Run Crawler" button in the Glue console. The crawler will analyze your data, determine optimal data types for each column, and create a table in the Glue Data Catalog.
6. Review and Edit the Table Schema
Post-crawler, review and modify the table schema:
Change Data Types: Adjust data types for any column as needed.
Create Partitions: Set up partitions to improve query performance and data organization.
Tumblr media
7. Query Your Data with AWS Athena
In the Athena console:
Connect to Glue Database: Use the database created by the Glue Crawler.
Write SQL Queries: Leverage SQL for querying your data directly in Athena.
8. Performance Comparison
After the performance optimizations, we got the following results:
To illustrate this, I ran the following queries on 1.6 GB of data:
For Parquet data format without partitioning
SELECT * FROM "athena-learn"."parquet" WHERE transdate='2024-07-05';
For Partitioning with CSV
Tumblr media
Query Runtime for Parquet Files: 8.748 seconds. Parquet’s columnar storage format and compression contribute to this efficiency.
Query Runtime for Partitioned CSV Files: 2.901 seconds. Partitioning helps reduce the data scanned, improving query speed.
Data Scanned for Parquet Files: 60.44 MB
Data Scanned for Partitioned CSV Files: 40.04 MB
Key Insight: Partitioning CSV files improves query performance, but using Parquet files offers superior results due to their optimized storage and compression features.
9. AWS Athena Pricing and Optimization
AWS Athena pricing is straightforward: you pay $5.00 per terabyte (TB) of data scanned by your SQL queries. However, you can significantly reduce costs and enhance query performance by implementing several optimization strategies.
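As a back-of-the-envelope illustration of that pricing (a minimal sketch, assuming the advertised $5/TB rate, decimal units, and Athena's 10 MB per-query minimum), the two example queries above would each cost well under a cent:

```python
def athena_query_cost(bytes_scanned, price_per_tb=5.00):
    # Athena bills on data scanned, with a 10 MB minimum charge per query.
    mb_scanned = max(bytes_scanned / 1e6, 10.0)
    return (mb_scanned / 1e6) * price_per_tb

print(f"${athena_query_cost(60.44e6):.6f}")  # Parquet query, ~60.44 MB scanned: about $0.000302
print(f"${athena_query_cost(40.04e6):.6f}")  # partitioned CSV, ~40.04 MB scanned: about $0.000200
```

This is why reducing the bytes scanned (partitioning, columnar formats, compression) is the main cost lever with Athena.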
Conclusion
AWS Athena offers a powerful, serverless SQL interface for querying large datasets. By adopting best practices in data preprocessing, organization, and Athena usage, you can manage and analyze your data efficiently without the overhead of complex infrastructure.
0 notes