#contrastive learning
The AI Scientist
New Post has been published on https://thedigitalinsider.com/the-ai-scientist/
A model that can produce novel AI papers, plus some really cool papers and tech releases this week.
Next Week in The Sequence:
Edge 423: We explore the fundamentals of state space models, including the famous S4 paper. The tech section provides an overview of NVIDIA’s NIM framework.
Edge 424: We dive into DeepMind’s amazing AlphaProof and AlphaGeometry 2, which achieved a silver medal at the latest International Mathematical Olympiad.
You can subscribe to The Sequence below:
TheSequence is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
📝 Editorial: The AI Scientist
If you read this newsletter, you know that I firmly believe discovering new science might be the ultimate test for AGI. While we are still far from having AI that can formulate something like the Riemann Hypothesis or the Theory of General Relativity, we have made tremendous progress in proving and validating scientific ideas across disciplines such as mathematics, physics, biology, chemistry, and others.
The reason science presents such a challenging bar for AI is that it involves aspects like long-term planning, creativity, multidisciplinary knowledge, multi-step fact-checking, and many other components that are still in the very early stages of development in generative AI.
However, progress is being made.
This week, the Japanese AI startup Sakana AI, in collaboration with several other AI labs, published a paper detailing The AI Scientist, a framework for open-ended scientific discovery. The AI Scientist is capable of conducting open-ended research, executing experiments, generating code, visualizing results, and even presenting them in full reports. In the initial demonstrations, The AI Scientist made several contributions across different areas of AI research, including diffusion models, transformers, and grokking.
The core ideas behind The AI Scientist resemble models such as DeepMind’s AlphaGeometry, AlphaProof, or the NuminaMath model that recently won first prize in the AI Math Olympiad. These models use an LLM for idea formulation, combined with more symbolic models for experimentation. The biggest challenge with this approach is whether the idea-generation portion will quickly hit its limits. Some of the most groundbreaking scientific discoveries in history seem to involve a component of human ingenuity that doesn’t yet appear to be present in LLMs. However, this path holds great potential for exploring new ideas in scientific research.
For now, The AI Scientist represents an exciting advancement in open-ended scientific research.
🔎 ML Research
The AI Scientist
Researchers from Sakana AI, Oxford, the University of British Columbia, and several other institutions published a paper unveiling The AI Scientist, a pipeline for open-ended scientific research using LLMs. The AI Scientist injects AI into different areas of scientific research, such as ideation, literature search, experiment planning, experiment iteration, manuscript writing, and peer review —> Read more.
Imagen 3
Google published the technical report for Imagen 3, its marquee text-to-image model. The report covers the training and evaluation details behind Imagen 3, as well as some of the challenges around safety —> Read more.
Mitigating Hallucinations
Google Research published a paper detailing HALVA, a contrastive tuning method that can mitigate hallucinations in language and image assistants. Like other contrastive learning methods, HALVA generates alternative representations of factual tokens with the objective of boosting the probability that the model identifies the correct token —> Read more.
Your Context is Not an Array
Qualcomm Research published a paper that explores the limitations of transformers. The paper suggests that some of the generalization challenges of transformers are related to their inability to perform random memory access within the context window —> Read more.
Mutual Reasoning in LLMs
Microsoft Research published a paper introducing rStar, a self-play mutual reasoning approach that seems to improve reasoning capabilities in small language models. rStar uses a generation-discrimination process to decouple the different steps of the reasoning process —> Read more.
Pretraining vs. Fine Tuning
Researchers from Johns Hopkins University published a paper exploring the relationship between pretraining and fine-tuning in LLMs. The paper explores the diminishing returns of fine-tuning after a certain scale —> Read more.
🤖 AI Tech Releases
Grok-2
xAI unveiled a new version of Grok that matches the performance of top open source models —> Read more.
SWE-Bench
OpenAI released a subset of the famous SWE-Bench benchmark with human verification —> Read more.
Claude Prompt Caching
Anthropic unveiled prompt caching capabilities for Claude 3.5 Sonnet and Claude 3 Haiku —> Read more.
Airflow 2.10
Apache Airflow 2.10 arrived with a strong focus on AI workflows —> Read more.
AI Risks Database
MIT open sourced a database of over 700 AI risks across different categories —> Read more.
🛠 Real World AI
Image Animation at Meta
Meta discusses the AI techniques used for image animation at scale —> Read more.
Model Reliability at Salesforce
Salesforce discusses the methods used to ensure AI model reliability and performance in their internal pipelines —> Read more.
📡AI Radar
Fei-Fei Li’s World Labs raised $100 million at a $1 billion valuation.
Decentralized AI startup Sahara AI raised $43 million in new funding.
Snowflake announced its Cortex Analyst solution to power self-service analytics with AI.
AI observability platform Goodfire raised $7 million in new funding.
AI-focused VC Radical Ventures raised a new $800 million fund.
Runway Gen-3 Turbo showcased very impressive capabilities.
AI-based stock evaluator TipRanks was acquired for $200 million.
Real estate AI company EliseAI raised $75 million at a $1 billion valuation.
Encord, an AI data development platform, raised a $30 million Series B.
RAG as a service platform Ragie raised $5.5 million.
CodeRabbit raised $16 million for using AI to automate code reviews.
AI-based scientific research platform Consensus raised an $11.5 million Series A.
#AGI#ai#ai model#AI research#alphageometry#AlphaProof#amazing#Analytics#animation#approach#as a service#benchmark#billion#Biology#challenge#chemistry#claude#claude 3#claude 3.5#Claude 3.5 Sonnet#code#Collaboration#contrastive learning#creativity#data#Database#decentralized AI#DeepMind#details#development
AI discovers that not every fingerprint is unique
drew some book!husbands. they feel like they've taken more traits from each other than the show.
#decided book!az has hints of elton john. its the old queen energy#book!crowley is nice looking average white man who learns how to wear a suit but half the reason he's interesting is bc he's trailing aroun#an old bookseller who looks maybe 30 years his senior and giving him babygirl eyes#its the stark contrast in the looks between the “bland” (cr) and “interesting” (az) that im going for with these guys#god im trying to make them not look like butch lesbians purely from an art skill pov but i cant beat the allegations#good omens#ineffable husbands#crowley#aziraphale
50+ deaths at 5 am got me yelling absolute nonsense to the bosses kicking my whole entire ass
#ultrakill#v1#gabriel#doodles#art#ultrakill fanart#fastest death was like. 6 seconds. maybe less. i was playing on normal#and then my brother told me the secrets of Spamming Slide Like Your Life Depends On It and i got him in like. 15 tries#max0r wasnt kidding this guys entire strat is Teleporting Behind You#so the contrast of nearly killing him first try and then beating his stupid gay ass real fast in his second fight was REALLY funny#i learned bitch#i dont think i've ever yapped and screamed and yelled so much during a videogame before. dont ask me about the noise i made when#the mannequins started moving like coked up little spiders#''i dont believe people are genuinely this loud when playing lethal company they're making this up'' me playing ultrakill:
Your Ancient History, Written In Wax
-
Danny knew he should have put better security around the Sarcophagus of Eternal Sleep. It wasn’t even Vlad who opened it this time! The fruitloop was too busy doing his actual mayor duties because for some godforsaken reason, the man got re-elected.
No, it wasn’t Vlad. And it wasn’t Fright Knight, either. Nor the Observants. Who opened the Sarcophagus, then? Danny didn’t have time to find out as Pariah Dark promptly tore open a hole in reality and started hunting Danny down.
The battle was longer this time. He didn’t have the Ecto-Skeleton, as that was the first thing Pariah had destroyed. The halfa had grown a lot over the past few years, and learned some new tricks, but apparently sleeping in a magic ghost box meant that Pariah had absorbed a lot of power. The bigger ghost acted like a one-man army!
Amity Park was caught in the middle of the battle, but the residents made sure it went no further than that. Vlad and the Fentons made a barrier around the town to keep the destruction from leaking. Sam, Tucker, and Dani did crowd control while Danny faced the king head-on.
Their battle shook the Zone and pulled them wildly between the mortal plane and the afterlife. Sometimes, residents noticed a blow from Pariah transported them to the age of the dinosaurs, and Phantom’s Wail brought them to an unknown future. Then they were in a desert. Then a blazing forest. Then underwater. It went on like that, but no one dared step foot outside of Amity. They couldn’t risk being left behind.
It took ages to beat him, but eventually, Danny stood above the old ghost king, encasing his symbols of power in ice so they couldn’t be used again. He refused to claim the title for himself. Tired as he was, Danny handed the objects off to Clockwork for safe keeping and started repairing the damage Pariah had done to the town. The tear he’d made was too big to fix, for now, so no one bothered. They just welcomed their new ghostly neighbors with open arms and worked together to restore Amity Park.
Finally, the day came to bring down the barrier. People were gathered around the giant device the Fentons had built to sustain it. Danny had brought Clockwork to Amity, to double check that they had returned to the right time and dimension.
Clockwork assured everyone that they were in the right spot, and only a small amount of time had passed, so the Fentons gave the signal to drop the shield.
Very quickly did they discover that something was wrong. The air smelled different. The noise of the nearby city, Elmerton, was louder and more chaotic. Something was there that wasn’t before, and it put everyone on edge.
Clockwork smiled, made a remark about the town fitting in better than before, and disappeared before Danny could catch him.
Frantic, Danny had a few of his ghost buds stay behind to protect the town while he investigated.
He flew far and wide, steadily growing horrified at the changes the world had undergone. Heroes, villains, rampant crime and alien invasions. The Earth was unrecognizable. There were people moving around the stars like it was second nature and others raising dead gods like the apocalypse was coming. Magic and ectoplasm were everywhere, rather than following the ley lines like they were supposed to.
Danny returned to Amity.
The fight with Pariah had taken them through space and time. Somewhere along the way, they had changed the course of history so badly that this now felt like an alien world.
How was he supposed to fix this?
-
In the Watchtower, The Flash was wrapping up monitor duty while Impulse buzzed around him, a little more jittery than usual. The boy was talking a mile a minute, when alarms started blaring an alarming green. Flash had never seen this alarm before, and its crackling whine was grating on his ears.
Flash returned to the monitor, frantically clicking around to find the issue, but nothing was popping up. No major disasters, no invasions, no declarations of war. Nothing! What was causing the alarm?
Impulse swore and zipped to a window, pressing his face against it and staring down at Earth. “Fuck! It’s today isn’t it? I forgot!”
“What’s today?” Flash asked. He shot off a text to Batman, asking if it was an error. The big Bat said it wasn’t, and that he would be there soon.
“The arrival of Amity Park. I learned about this in school; the alarm always gives me headaches.”
Flash turned to his grandson, getting his attention. “Bart,” he stressed. “What are you talking about?”
Impulse barely glanced over his shoulder. Now that Flash was facing him, he could see a strong glow coming from Earth. “The first villain, first anti-villain, and the first hero,” he said anxiously. “They all protect the town of the original metas. They’re all here.”
“Here? Now??”
“Yeah? They weren’t before, but they are now. The first hero said there was time stuff involved, which was what inspired me to start practicing time travel in the first place.”
“I’m not following.”
“It’s okay. We should probably go welcome them before they tear apart Illinois, though. The history I remember says that some of them freaked and destroyed a chunk of the Midwest during a fight with each other.”
“WHAT?”
#dpxdc#pondhead blurbs#liminal amity park#I’ve seen stuff like this in the mhaxdp fandom and I eat it up every time#basically the fight with Pariah caused the town to jump through time a little#and while they THOUGHT they were keeping everything in#shit leaked out and tainted those points in time#so technically#historically and genetically speaking#Amity Park is the origin point for the meta gene and Danny made history as the first hero#because Clockwork is a little shit#everyone embodies a basic ability and it has grown from there#the flash family are direct descendants of Dani (speed force Dani for the win)#Dash is the reason super strength exists#so on and so forth#go buck wild#bart learned about it briefly in history class in the 30th century#practically hero worships them#booster gold knows about them too but in contrast to Bart’s excitement#booster is fucking terrified because there was a period where Amity Park rebelled against the US government#and he’s from that specific time#he learned to fear phantom because he lived during that part while Bart is from farther in the future when those issues got resolved#guess who’s chosen to welcome the town? >:)#if you’re wondering what happened to the GIW#they turned into the branch Amanda Waller runs#Danny is the first hero#Vlad the first villain#and Dani the first anti hero#there’s an arc where Danny is trying to fix things but clockwork won’t let him into the timestream and all the heroes are horrified#because yeah Danny is the OG but if he goes back in time to fix his ‘mistake’ what will happen to them?
Thinking about the parallels between Ed’s and Mary’s violence. Stede hardly even questions Mary’s attempt to kill him, he’s mostly outraged at the method. The story punishes violence stemming from racism or enforcement of harmful social norms, but Mary’s violence to ensure she can continue to live a life where she can be herself is completely unpunished. Instead it’s an open action of self protection and expression that leads to understanding and love and a better outcome for everyone. She’s rewarded! She gets to end the marriage while also reconciling with Stede and knowing that they will both be free and happy. Also she gets Doug and the money (free real estate!) What more do we need to reinforce the message that it’s not violence that’s the problem, it’s why you use it
Ed’s violence against Izzy was for freedom too, but even more justified because it wasn’t just freedom from an unsuitable and stifling marriage, it was freedom from targeted abuse and control
Mary’s skewer more than validates Ed’s leg shot and I will die on this hill
#ed did nothing wrong#if anything part of his journey is to learn that sometimes violence is justified and worth it#and the contrast between mary’s violence for freedom and izzy’s violence of control#aaargh
Shadow
#ok this is all i have for this week unless i get super art cracked tonight and managed to draw some shit#shadow the hedgehog#shadow#sonic fanart#sth fanart#mono art#illustration#i need to learn how to do more contrast damn...
visit
#kylux#kylo ren#armitage hux#still learning their faces... augh...#didnt draw their gloves simply bc i missed shading hands i apologise orz#one thing i am struggling with but find very interesting#is how drawing kylo it seems he is made of all these harsh dark lines#with very high contrast#vs hux who has much less contrast and looks weird w heavy dark lines#so it's hard to bridge the gap stylistically between the two of them#hopefully i will figure it out...#BUT it is a very enticing difference#my art#also since im late to the actual game of making kylux art i will probably be retreading sooo much ground#🫡 nothing i can do about it... bc i want to retread classics....
Uhm
Uhm
I would like some uhm
Helmet party
Por favor?
Not only I am very crazy abt them but I need something happy (not rn but I like happy things and they make me feel happy)
So, uh
I would like some Helmet party, por favor.
They're flying high
#gopher art#tf2 soldier#tf2 engineer#helmet party#team fortress 2#ok sorry this took so long school's kicking my ass (with help from random chance and mental issues) but i will complete every request i have#and i will still take new requests going forward#anyway i think a really cute idea is that Solly thinks rocket jumping is the peak of freedom so he brings engie along one day just to#do some jumps. and that inspires engie to learn to wrangler jump or something#helmet party isnt my favorite engie or solly ship but its really stinkin cute#also they're blu purely to contrast the rocket jumper
#trying to figure out how i wanna draw myself when i start my new sketchbook#+ i kinda do wanna learn lineless art bc theres a certain style of it that reminds me of gouache! and i love flat matte gouache#its so hard tho as someone who struggles with contrast. and LOVES lines.#i wish it was as easy as changing the lineart color at the very end of a lined and colored piece ykwim#but the line color i choose when im painting in gouache is different than the colors i choose when coloring already existing lineart#if that makes sense#anyway this was directly inspired by manaohu on twitter ^_^ they are one of my fave artists#a doodley#ok unlocked
Idek what this is but here you go 🤲
#tfw when you have a doodle idea and it turns into a whole drawing#just a silly little thing#about her being much smarter than people expect her to be and how knowledge (?) isnt about knowing the most out of a book#its about the experiences you’ve had and what you’ve learned from them#and she’s had so many unspoken experiences but people don’t get it because she can’t put it all into words#so there’s the contrast between her messing up saying “don’t patronize me’’ and then her iconic quote ‘’you can change’’#and then some silly symbolism to go with it#hi guys I’m very normal about her I promise#cassandra cain#dc comics#batgirl#dc#batman#batfam#batman fanart#batgirl (2000)#art#my art#Bing’s doodles#using a normal pallet < using either black and white or the most blinding colors possible
Code Embedding: A Comprehensive Guide
New Post has been published on https://thedigitalinsider.com/code-embedding-a-comprehensive-guide/
Code embeddings are a transformative way to represent code snippets as dense vectors in a continuous space. These embeddings capture the semantic and functional relationships between code snippets, enabling powerful applications in AI-assisted programming. Similar to word embeddings in natural language processing (NLP), code embeddings position similar code snippets close together in the vector space, allowing machines to understand and manipulate code more effectively.
What are Code Embeddings?
Code embeddings convert complex code structures into numerical vectors that capture the meaning and functionality of the code. Unlike traditional methods that treat code as sequences of characters, embeddings capture the semantic relationships between parts of the code. This is crucial for various AI-driven software engineering tasks, such as code search, completion, bug detection, and more.
For example, consider these two Python functions:
def add_numbers(a, b):
    return a + b

def sum_two_values(x, y):
    result = x + y
    return result
While these functions look different syntactically, they perform the same operation. A good code embedding would represent these two functions with similar vectors, capturing their functional similarity despite their textual differences.
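As a quick illustration of that idea, here is a minimal sketch (not from the original article) that embeds the two functions with an off-the-shelf sentence-embedding model and compares them with cosine similarity. The specific model name and the sentence-transformers API are assumptions made for this example; a model trained specifically on source code would capture functional similarity more reliably.

# A minimal sketch, not from the article: embed the two functions above and
# compare them with cosine similarity. The model name and the
# sentence-transformers API are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

snippet_a = "def add_numbers(a, b):\n    return a + b"
snippet_b = "def sum_two_values(x, y):\n    result = x + y\n    return result"

model = SentenceTransformer("all-MiniLM-L6-v2")  # general-purpose stand-in model
emb_a, emb_b = model.encode([snippet_a, snippet_b])

# Functionally similar snippets should land close together in vector space,
# i.e. score a high cosine similarity.
print(float(util.cos_sim(emb_a, emb_b)))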
Vector Embedding
How are Code Embeddings Created?
There are different techniques for creating code embeddings. One common approach involves using neural networks to learn these representations from a large dataset of code. The network analyzes the code structure, including tokens (keywords, identifiers), syntax (how the code is structured), and potentially comments to learn the relationships between different code snippets.
Let’s break down the process:
Code as a Sequence: First, code snippets are treated as sequences of tokens (variables, keywords, operators).
Neural Network Training: A neural network processes these sequences and learns to map them to fixed-size vector representations. The network considers factors like syntax, semantics, and relationships between code elements.
Capturing Similarities: The training aims to position similar code snippets (with similar functionality) close together in the vector space. This allows for tasks like finding similar code or comparing functionality.
Here’s a simplified Python example of how you might preprocess code for embedding:
import ast

def tokenize_code(code_string):
    tree = ast.parse(code_string)
    tokens = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            tokens.append(node.name)        # function identifiers
        elif isinstance(node, ast.arg):
            tokens.append(node.arg)         # argument identifiers
        elif isinstance(node, ast.Name):
            tokens.append(node.id)          # variable and call identifiers
        elif isinstance(node, ast.Constant) and isinstance(node.value, str):
            tokens.append('STRING')         # abstract away string literals
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            tokens.append('NUMBER')         # abstract away numeric literals
        # Add more node types as needed
    return tokens

# Example usage
code = """
def greet(name):
    print("Hello, " + name + "!")
"""

tokens = tokenize_code(code)
print(tokens)
# Output: ['greet', 'name', 'print', 'STRING', 'STRING', 'name']
This tokenized representation can then be fed into a neural network for embedding.
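To make that last step concrete, here is a small illustrative sketch, continuing from the tokenizer above, of mapping a token sequence to a fixed-size vector with a learnable embedding layer and mean pooling in PyTorch. The toy vocabulary, embedding size, and pooling choice are assumptions for the example; a real system would train a much larger encoder on a big code corpus.

# Illustrative sketch only: map a token sequence to a fixed-size vector.
# Vocabulary construction, dimensions, and pooling are assumptions for the example.
import torch
import torch.nn as nn

class TinyCodeEncoder(nn.Module):
    def __init__(self, vocab_size: int, embed_dim: int = 64):
        super().__init__()
        self.token_embed = nn.Embedding(vocab_size, embed_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> (batch, embed_dim) via mean pooling
        return self.token_embed(token_ids).mean(dim=1)

# Build a toy vocabulary from the `tokens` list produced by tokenize_code above.
vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
ids = torch.tensor([[vocab[t] for t in tokens]])  # shape (1, seq_len)

encoder = TinyCodeEncoder(vocab_size=len(vocab))
vector = encoder(ids)
print(vector.shape)  # torch.Size([1, 64]), one fixed-size code embedding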
Existing Approaches to Code Embedding
Existing methods for code embedding can be classified into three main categories:
Token-Based Methods
Token-based methods treat code as a sequence of lexical tokens. Techniques like Term Frequency-Inverse Document Frequency (TF-IDF) and deep learning models like CodeBERT fall into this category.
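As a rough, assumption-laden illustration of the token-based family (a bag-of-tokens baseline, not CodeBERT itself), TF-IDF over code tokens already produces vectors you can compare:

# Minimal token-based sketch: TF-IDF vectors over lexical tokens of code.
# This is a bag-of-tokens baseline shown only to illustrate the family.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

snippets = [
    "def add_numbers(a, b): return a + b",
    "def sum_two_values(x, y): result = x + y; return result",
    "for i in range(10): print(i)",
]

vectorizer = TfidfVectorizer(token_pattern=r"[A-Za-z_]+")  # crude code tokenizer
vectors = vectorizer.fit_transform(snippets)

print(cosine_similarity(vectors[0], vectors[1])[0, 0])  # addition vs. addition
print(cosine_similarity(vectors[0], vectors[2])[0, 0])  # addition vs. loop

Because it only sees surface tokens, a purely lexical method like this can miss the functional similarity between the two addition functions, which is exactly the limitation that learned models such as CodeBERT aim to overcome.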
Tree-Based Methods
Tree-based methods parse code into abstract syntax trees (ASTs) or other tree structures, capturing the syntactic and semantic rules of the code. Examples include tree-based neural networks and models like code2vec and ASTNN.
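A drastically simplified tree-based sketch (an illustration only, not code2vec or ASTNN): represent a snippet by the counts of its AST node types, which already captures structural information that a flat token stream misses.

# Illustrative tree-based sketch: a bag of AST node types as a structural vector.
# Real tree-based models (code2vec, ASTNN) learn far richer path and subtree features.
import ast
from collections import Counter

def ast_node_type_counts(code_string: str) -> Counter:
    tree = ast.parse(code_string)
    return Counter(type(node).__name__ for node in ast.walk(tree))

print(ast_node_type_counts("def add_numbers(a, b):\n    return a + b"))
# e.g. Counter({'arg': 2, 'Name': 2, 'Load': 2, 'Module': 1, 'FunctionDef': 1, ...})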
Graph-Based Methods
Graph-based methods construct graphs from code, such as control flow graphs (CFGs) and data flow graphs (DFGs), to represent the dynamic behavior and dependencies of the code. GraphCodeBERT is a notable example.
TransformCode: A Framework for Code Embedding
TransformCode: Unsupervised learning of code embedding
TransformCode is a framework that addresses the limitations of existing methods by learning code embeddings in a contrastive learning manner. It is encoder-agnostic and language-agnostic, meaning it can leverage any encoder model and handle any programming language.
The diagram above illustrates the framework of TransformCode for unsupervised learning of code embedding using contrastive learning. It consists of two main phases: Before Training and Contrastive Learning for Training. Here’s a detailed explanation of each component:
Before Training
1. Data Preprocessing:
Dataset: The initial input is a dataset containing code snippets.
Normalized Code: The code snippets undergo normalization to remove comments and rename variables to a standard format. This helps in reducing the influence of variable naming on the learning process and improves the generalizability of the model.
Code Transformation: The normalized code is then transformed using various syntactic and semantic transformations to generate positive samples. These transformations ensure that the semantic meaning of the code remains unchanged, providing diverse and robust samples for contrastive learning.
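To make the preprocessing phase concrete, here is a minimal, assumption-laden sketch of the normalization step: renaming variables to a canonical form so that the learned embeddings do not depend on identifier choices. TransformCode's actual normalization and AST transformations are more extensive than this.

# Minimal sketch of the normalization idea: rename variables to a canonical form.
# Illustration only; TransformCode applies a richer set of AST transformations.
import ast

class RenameVariables(ast.NodeTransformer):
    def __init__(self):
        self.mapping = {}

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id not in self.mapping:
            self.mapping[node.id] = f"var_{len(self.mapping)}"
        node.id = self.mapping[node.id]
        return node

def normalize(code_string: str) -> str:
    tree = ast.parse(code_string)
    tree = RenameVariables().visit(tree)
    return ast.unparse(tree)  # requires Python 3.9+

print(normalize("result = x + y"))  # var_0 = var_1 + var_2
print(normalize("total = a + b"))   # var_0 = var_1 + var_2 (same normalized form)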
2. Tokenization:
Train Tokenizer: A tokenizer is trained on the code dataset to convert code text into embeddings. This involves breaking down the code into smaller units, such as tokens, that can be processed by the model.
Embedding Dataset: The trained tokenizer is used to convert the entire code dataset into embeddings, which serve as the input for the contrastive learning phase.
Contrastive Learning for Training
3. Training Process:
Train Sample: A sample from the training dataset is selected as the query code representation.
Positive Sample: The corresponding positive sample is the transformed version of the query code, obtained during the data preprocessing phase.
Negative Samples in Batch: Negative samples are all other code samples in the current mini-batch that are different from the positive sample.
4. Encoder and Momentum Encoder:
Transformer Encoder with Relative Position and MLP Projection Head: Both the query and positive samples are fed into a Transformer encoder. The encoder incorporates relative position encoding to capture the syntactic structure and relationships between tokens in the code. An MLP (Multi-Layer Perceptron) projection head is used to map the encoded representations to a lower-dimensional space where the contrastive learning objective is applied.
Momentum Encoder: A momentum encoder is also used, which is updated by a moving average of the query encoder’s parameters. This helps maintain the consistency and diversity of the representations, preventing the collapse of the contrastive loss. The negative samples are encoded using this momentum encoder and enqueued for the contrastive learning process.
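A minimal sketch of the momentum (moving-average) update described above, assuming two PyTorch encoders with identical architectures; the momentum value is a common default, not a number taken from the paper.

# Illustrative momentum-encoder update: the key encoder trails the query encoder
# as an exponential moving average of its parameters. The momentum value is a
# common default, not a number from the TransformCode paper.
import torch

@torch.no_grad()
def momentum_update(query_encoder: torch.nn.Module,
                    key_encoder: torch.nn.Module,
                    momentum: float = 0.999) -> None:
    for q_param, k_param in zip(query_encoder.parameters(), key_encoder.parameters()):
        k_param.data.mul_(momentum).add_(q_param.data, alpha=1.0 - momentum)

# Typical usage: call momentum_update(query_encoder, key_encoder) after each
# optimizer step on the query encoder.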
5. Contrastive Learning Objective:
Compute InfoNCE Loss (Similarity): The InfoNCE (Noise Contrastive Estimation) loss is computed to maximize the similarity between the query and positive samples while minimizing the similarity between the query and negative samples. This objective ensures that the learned embeddings are discriminative and robust, capturing the semantic similarity of the code snippets.
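Here is a minimal sketch of an in-batch InfoNCE objective of the kind described above; the temperature value and the use of in-batch negatives only are illustrative assumptions rather than the paper's exact setup.

# Minimal in-batch InfoNCE sketch: each query's positive is the matching row of
# `positives`; every other row in the batch serves as a negative. The temperature
# is an illustrative choice, not the paper's setting.
import torch
import torch.nn.functional as F

def info_nce_loss(queries: torch.Tensor,
                  positives: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    # queries, positives: (batch, dim), e.g. outputs of the two encoders
    queries = F.normalize(queries, dim=1)
    positives = F.normalize(positives, dim=1)
    logits = queries @ positives.T / temperature   # (batch, batch) similarity matrix
    labels = torch.arange(queries.size(0))         # the diagonal entries are positives
    return F.cross_entropy(logits, labels)

loss = info_nce_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss)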
The entire framework leverages the strengths of contrastive learning to learn meaningful and robust code embeddings from unlabeled data. The use of AST transformations and a momentum encoder further enhances the quality and efficiency of the learned representations, making TransformCode a powerful tool for various software engineering tasks.
Key Features of TransformCode
Flexibility and Adaptability: Can be extended to various downstream tasks requiring code representation.
Efficiency and Scalability: Does not require a large model or extensive training data, supporting any programming language.
Unsupervised and Supervised Learning: Can be applied to both learning scenarios by incorporating task-specific labels or objectives.
Adjustable Parameters: The number of encoder parameters can be adjusted based on available computing resources.
TransformCode introduces a data-augmentation technique called AST transformation, applying syntactic and semantic transformations to the original code snippets. This generates diverse and robust samples for contrastive learning.
Applications of Code Embeddings
Code embeddings have revolutionized various aspects of software engineering by transforming code from a textual format to a numerical representation usable by machine learning models. Here are some key applications:
Improved Code Search
Traditionally, code search relied on keyword matching, which often led to irrelevant results. Code embeddings enable semantic search, where code snippets are ranked based on their similarity in functionality, even if they use different keywords. This significantly improves the accuracy and efficiency of finding relevant code within large codebases.
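As an illustrative sketch (reusing the general-purpose stand-in model assumed earlier, not a production code-search system), semantic search reduces to embedding the corpus once and ranking snippets by cosine similarity to the query embedding:

# Illustrative semantic code search: rank a small corpus of snippets against a
# natural-language query by cosine similarity of their embeddings.
from sentence_transformers import SentenceTransformer, util

corpus = [
    "def add_numbers(a, b): return a + b",
    "def read_file(path): return open(path).read()",
    "def reverse_string(s): return s[::-1]",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode("sum two values", convert_to_tensor=True)

scores = util.cos_sim(query_emb, corpus_emb)[0]  # one similarity score per snippet
ranked = sorted(zip(corpus, scores.tolist()), key=lambda pair: pair[1], reverse=True)
for snippet, score in ranked:
    print(f"{score:.3f}  {snippet}")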
Smarter Code Completion
Code completion tools suggest relevant code snippets based on the current context. By leveraging code embeddings, these tools can provide more accurate and helpful suggestions by understanding the semantic meaning of the code being written. This translates to faster and more productive coding experiences.
Automated Code Correction and Bug Detection
Code embeddings can be used to identify patterns that often indicate bugs or inefficiencies in code. By analyzing the similarity between code snippets and known bug patterns, these systems can automatically suggest fixes or highlight areas that might require further inspection.
Enhanced Code Summarization and Documentation Generation
Large codebases often lack proper documentation, making it difficult for new developers to understand their workings. Code embeddings can create concise summaries that capture the essence of the code’s functionality. This not only improves code maintainability but also facilitates knowledge transfer within development teams.
Improved Code Reviews
Code reviews are crucial for maintaining code quality. Code embeddings can assist reviewers by highlighting potential issues and suggesting improvements. Additionally, they can facilitate comparisons between different code versions, making the review process more efficient.
Cross-Lingual Code Processing
The world of software development is not limited to a single programming language. Code embeddings hold promise for facilitating cross-lingual code processing tasks. By capturing the semantic relationships between code written in different languages, these techniques could enable tasks like code search and analysis across programming languages.
Choosing the Right Code Embedding Model
There’s no one-size-fits-all solution for choosing a code embedding model. The best model depends on various factors, including the specific objective, the programming language, and available resources.
Key Considerations:
Specific Objective: For code completion, a model adept at local semantics (like word2vec-based) might be sufficient. For code search requiring understanding broader context, graph-based models might be better.
Programming Language: Some models are tailored for specific languages (e.g., Java, Python), while others are more general-purpose.
Available Resources: Consider the computational power required to train and use the model. Complex models might not be feasible for resource-constrained environments.
Additional Tips:
Experimentation is Key: Don’t be afraid to experiment with a few different models to see which one performs best for your specific dataset and use case.
Stay Updated: The field of code embeddings is constantly evolving. Keep an eye on new models and research to ensure you’re using the latest advancements.
Community Resources: Utilize online communities and forums dedicated to code embeddings. These can be valuable sources of information and insights from other developers.
The Future of Code Embeddings
As research in this area continues, code embeddings are poised to play an increasingly central role in software engineering. By enabling machines to understand code on a deeper level, they can revolutionize the way we develop, maintain, and interact with software.
References and Further Reading
CodeBERT: A Pre-Trained Model for Programming and Natural Languages
GraphCodeBERT: Pre-trained Code Representation Learning with Data Flow
InferCode: Self-Supervised Learning of Code Representations by Predicting Subtrees
Transformers: Attention Is All You Need
Contrastive Learning for Unsupervised Code Embedding
#Abstract Syntax Tree#ai#Analysis#applications#approach#Artificial Intelligence#attention#Behavior#bug#bugs#Capture#code#Code Embedding#Code embeddings#Code Review#Code Search#Code2vec#CodeBERT#coding#Community#comprehensive#computing#continuous#contrastive learning#data#Deep Learning#detection#developers#development#diversity
Wip for a lil Bildad/UZ mini comic that's making me laugh bc it is indeed gonna be a really silly time.
As much as I live laugh love trying out new things stylistically, sometimes I just wanna do what is the most intuitive for me (aka pencil scribbles to make up for my piss poor anatomy skills, messy gouache coloring, and fucking with the colour gradients until I look like I know colour theory (I do not))
#ignore his ghostly hand i was zoomed in and forgot to colour it#i think im learning i like strangely coloured highlights#and contrasting colours#and the colour RED#good omens#good omens fanart#wip#bildad the shuhite#crowley
There's something at the tip of my tongue about the parallels between Jackie and Wilson and Who We Are
How the narrator of Jackie and Wilson wants to run away with a woman that he's carved out of his imagination based on a brief interaction. How they would try the world, but good god it wasn't for them. So they run away from it into a fantasy world where they live by their own rules.
And then comes the narrator of Who We Are, who dreamt his whole life of finding someone who would hold him like water or like a knife, only to find that running away from the world will only get them so far, since "the hardest part is who we are". And only to find out that the "phantom life" he's fantasized about is actually just that: a phantom. And its absence sharpens like a knife
#hozier#sahar stfu about hozier#I'm aware that on the surface these songs are not really about the same thing#but there's something about the way that j&w narrator craves the fantasy he speaks of#that reminds me so much of the voice of Who We Are who thinks he realizes the fantasy only to watch it become a nightmare#and the contrast between the dream-fuelled lyrics of J&W that evoke daring adventures on the wide opened desert#and the sharp concise biting lyrics of who we are sung in tired resignation#and followed by near-wailing at the end#that just sticks out to me so much#it's like the whole of Unreal Unearth is just a sequence of the various voices of the self-titled album coming back#sometimes disillusioned#sometimes exhausted#sometimes resigned#sometimes wise#sometimes at peeace#sometime enraged#to sing about their most bitter disappointments and the harshest lessons they've learned over the years#and I can't stop thinking about it
🕒🐯Mit dir an meiner Seite🐯🕒
My submission for the lawsanlaw anthology of 2024 If the text is too hard to read, I put it in the alt text as well :)
This was so much fun to do. My initial submission grew so much in size that I wouldn't have been able to finish on time. I WILL finish it though, just on my own time without the stress, haha. Thank you for reading, I hope you liked it :)
#lawsan#sanlaw#opfanart#one piece fanart#trafalgar law#blackleg sanji#lawsanlaw#fancomic#fairy tale ish#woodenelafanart#mit dir an meiner seite#lawsanlaw anthology 2024#the web event was fun! did you check it out?#the monsters and mice were so much fun to draw. love the texture and the cute little lumps#page number 8 is my favourite out of all of them. just love the colors and the contrast#I for sure learned a lot lmao. so much I would do differently now#only means I'm more prepared for next time B)#also once again got punched in the gut by how slow I'm drawing :') but it's fine I love taking my time#it's only a problem if there is a deadline#I'm also so bad at screentones holy shit. how do others do that? chose them? mix them up? witchcraft honestly
I understand now, the "C." in his name stands for Cu-- Coming to another spooky game as a guest character.
It's been a while but I still remember how to draw him. Thank goodness.
#it's always wild for me to hop back between smol dainty shonen-animu megas to beeg scary stronge belmondos#the sudden contrast in their proportions makes it a fun challenge ( o vo)b#anywho doodling those locks again is always so noice :)#and that particular close-up shot of trevor's profile is absolutely perfect it's everything I've always wanted to convey in him#(I ended up giving his eyes not quite the same look as the shot but it's good enough they're all for practice again anyway)#doodle-daas#akumajou dracula#trevor c belmont#ralph c belmont#really the day we truly learn trevor's full middle name is the day we learn simon's natural hair color :Y#tw// blood#tw: blood#anti netflixvania