#AI21Labs
Text
AI21 Labs’ Jamba 1.5 Models Available On Amazon Bedrock
Amazon Bedrock now offers the AI21 Labs Jamba 1.5 models.
Jamba 1.5 family
Amazon Bedrock now offers the new Jamba 1.5 family of large language models (LLMs) from AI21 Labs. These models represent a significant advance in long-context language capabilities, delivering improved speed, quality, and efficiency across a variety of applications. The family includes two variants, Jamba 1.5 Mini and Jamba 1.5 Large. Both models support a 256K-token context window, structured JSON output, function calling, and ingestion of document objects.
AI21 Labs specializes in building foundation models and artificial intelligence (AI) systems for the enterprise. Through their strategic collaboration, AI21 Labs and AWS are enabling customers across industries to build, deploy, and scale generative AI applications that solve pressing problems and spur innovation. By using AI21 Labs' production-ready models together with Amazon's purpose-built services and robust infrastructure, customers can deploy LLMs in a secure environment to shape how people process information, communicate, and learn.
What is Jamba 1.5?
Jamba 1.5 models use a novel hybrid architecture that combines the transformer architecture with Structured State Space model (SSM) technology. This approach allows Jamba 1.5 models to handle long context windows of up to 256K tokens while retaining the high-performance characteristics of conventional transformer models. The hybrid SSM/transformer architecture is described in greater detail in the whitepaper Jamba: A Hybrid Transformer-Mamba Language Model.
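To make the interleaving idea concrete, here is a highly simplified conceptual sketch in PyTorch of mixing a few attention layers among a majority of SSM-style layers. This is not AI21's implementation: the MambaLayer below is only a placeholder (it does not implement a real state space model), mixture-of-experts and normalization layers are omitted, and the layer ratio is illustrative rather than a statement of Jamba's exact configuration.

```python
import torch
import torch.nn as nn

class MambaLayer(nn.Module):
    """Placeholder for a selective state space (Mamba-style) layer.
    A real implementation keeps a recurrent state, giving roughly
    linear-time scaling in sequence length."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mixer = nn.Linear(d_model, d_model)  # stand-in for the SSM mixer

    def forward(self, x):
        return x + self.mixer(x)

class AttentionLayer(nn.Module):
    """Standard self-attention layer: quadratic in sequence length,
    but strong at precise token-to-token interactions."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        out, _ = self.attn(x, x, x)
        return x + out

class HybridBlockStack(nn.Module):
    """Interleaves one attention layer within each group of mostly SSM layers,
    in the spirit of the hybrid design; the exact ratio is illustrative."""
    def __init__(self, d_model: int, n_layers: int = 32, attn_every: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            AttentionLayer(d_model) if i % attn_every == attn_every - 1
            else MambaLayer(d_model)
            for i in range(n_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

# Quick shape check on dummy activations.
stack = HybridBlockStack(d_model=512)
print(stack(torch.randn(2, 1024, 512)).shape)  # torch.Size([2, 1024, 512])
```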
Amazon Bedrock now supports two new Jamba 1.5 models from AI21:
Jamba 1.5 Large: Ideal for applications that demand high-quality output on both long and short inputs, Jamba 1.5 Large performs exceptionally well on complex reasoning tasks across all prompt lengths.
Jamba 1.5 Mini: Optimized for low-latency processing of long prompts, Jamba 1.5 Mini enables fast analysis of lengthy documents and data.
The Jamba 1.5 models’ main advantages are as follows:
Extended context handling – With a 256K-token context length, Jamba 1.5 models can improve the performance of enterprise applications such as long-document summarization and analysis, as well as agentic and retrieval-augmented generation (RAG) workflows.
Multilingual: Hebrew, Arabic, German, Dutch, Spanish, French, Portuguese, Italian, and English are all supported.
Developer friendly: Native support for structured JSON output, function calling, and document object processing (see the sketch after this list).
Speed and efficiency: In AI21's evaluations, the Jamba 1.5 models deliver up to 2.5X faster inference on long contexts than other models of comparable size.
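To illustrate the developer-facing features, here is a minimal sketch of function calling against a Jamba 1.5 model through the Amazon Bedrock Converse API using the AWS SDK for Python (Boto3). The model ID, Region, and the get_weather tool are illustrative assumptions; confirm supported features and identifiers in the Bedrock and AI21 documentation.

```python
import boto3

# Bedrock Runtime client; us-east-1 is assumed since that is where Jamba 1.5 launched.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Hypothetical tool definition: a simple weather lookup the model may choose to call.
tool_config = {
    "tools": [
        {
            "toolSpec": {
                "name": "get_weather",
                "description": "Get the current weather for a city.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    }
                },
            }
        }
    ]
}

response = client.converse(
    modelId="ai21.jamba-1-5-mini-v1:0",  # assumed model ID; check the Bedrock console
    messages=[{"role": "user", "content": [{"text": "What's the weather in Tel Aviv?"}]}],
    toolConfig=tool_config,
)

# If the model decides to call the tool, the request shows up as a toolUse content block.
for block in response["output"]["message"]["content"]:
    if "toolUse" in block:
        print("Tool requested:", block["toolUse"]["name"], block["toolUse"]["input"])
    elif "text" in block:
        print(block["text"])
```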
Start using Jamba 1.5 models on Amazon Bedrock now
To get started with the new Jamba 1.5 models, go to the Amazon Bedrock console, choose Model access from the bottom-left navigation pane, and request access to Jamba 1.5 Mini or Jamba 1.5 Large.
(Image credit: AWS)
To test the Jamba 1.5 models, choose the Text or Chat playground from the left menu pane in the Amazon Bedrock console. Then choose Select model, select AI21 as the category, and choose Jamba 1.5 Mini or Jamba 1.5 Large as the model.
You can also access the models through the AWS SDKs and build your applications in a variety of programming languages.
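For example, a minimal text-generation call with the AWS SDK for Python (Boto3) and the Bedrock Converse API might look like the following sketch; the model ID and inference parameters are assumptions to verify against the Bedrock model catalog.

```python
import boto3

# Create a Bedrock Runtime client in the Region where the model is available.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="ai21.jamba-1-5-large-v1:0",  # assumed model ID; confirm in the Bedrock console
    messages=[
        {
            "role": "user",
            "content": [{"text": "Summarize the key obligations in the following agreement: ..."}],
        }
    ],
    inferenceConfig={"maxTokens": 1024, "temperature": 0.4},
)

# The generated text is returned as the first content block of the output message.
print(response["output"]["message"]["content"][0]["text"])
```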
The Jamba 1.5 models are ideal for use cases such as paired document analysis, compliance analysis, and question answering over long documents. They can readily compare information across multiple sources, check whether passages comply with specific guidelines, and process lengthy or complex documents. Sample code is available in the AI21-on-AWS GitHub repository. See AI21's documentation to learn more about prompting Jamba models effectively.
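As a sketch of the long-document question-answering pattern, the example below attaches a local PDF as a Converse API document block. The file name, question, and model ID are placeholders, and whether a given Jamba model accepts document blocks this way should be verified in the Bedrock documentation.

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

# Read a local PDF to analyze; "contract.pdf" is a placeholder file name.
with open("contract.pdf", "rb") as f:
    pdf_bytes = f.read()

response = client.converse(
    modelId="ai21.jamba-1-5-large-v1:0",  # assumed model ID
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "Which clauses in this contract cover data retention, and do they "
                         "comply with a 90-day retention requirement?"},
                {
                    "document": {
                        "name": "contract",
                        "format": "pdf",
                        "source": {"bytes": pdf_bytes},
                    }
                },
            ],
        }
    ],
    inferenceConfig={"maxTokens": 2048},
)

print(response["output"]["message"]["content"][0]["text"])
```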
Now available
The AI21 Labs Jamba 1.5 model family is generally available today in Amazon Bedrock in the US East (N. Virginia) AWS Region. Visit the AI21 Labs in Amazon Bedrock product and pricing pages to learn more.
Read more on govindhtech.com
Text
Dive into the world of Jamba, the first production-grade Mamba-based model. Discover how it’s redefining AI with its unique architecture and superior throughput. From large context windows to resource efficiency, Jamba is setting new standards.
Link
AI21 Studio is a platform for developers that provides access to the Jurassic-1 natural language models.
Video
Wordtune Spices - Directors cut from Kobi Vogman on Vimeo.
Client: AI21Labs
Copy: Amos Merton
Director and Creative: Kobi Vogman
Producer: Omer Ben-David
Cinematographer: Omri Barzilai
Editor: Aviv Cohen
Art director: Liran Koren
Lead Animator: Liron Narunsky
Concept artist and Character design: Tom Apfel
Puppet maker: Moria Koren
Art assistants: Liron Narunsky, Shany Dahan
Sound design: Noam Ofir
Voice-over: Damon Webb
Cat voices: Liron Narunsky
Sound recording: Lenny Cohen
Motion design: Ran Daskal, Gur Margalit
Compositing: Ofeq Shemer
Color grading: Kobi Vogman
Production assistants: Alon Avidar
Runner: Ofer Holan
Text
AI Links
On the Opportunities and Risks of Foundation Models
Abstract: AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We call these models foundation models to underscore their critically central yet incomplete character. This report provides a thorough account of the opportunities and risks of foundation models, ranging from their capabilities (e.g., language, vision, robotics, reasoning, human interaction) and technical principles (e.g., model architectures, training procedures, data, systems, security, evaluation, theory) to their applications (e.g., law, healthcare, education) and societal impact (e.g., inequity, misuse, economic and environmental impact, legal and ethical considerations). Though foundation models are based on standard deep learning and transfer learning, their scale results in new emergent capabilities, and their effectiveness across so many tasks incentivizes homogenization. Homogenization provides powerful leverage but demands caution, as the defects of the foundation model are inherited by all the adapted models downstream. Despite the impending widespread deployment of foundation models, we currently lack a clear understanding of how they work, when they fail, and what they are even capable of due to their emergent properties. To tackle these questions, we believe much of the critical research on foundation models will require deep interdisciplinary collaboration commensurate with their fundamentally sociotechnical nature.
or in other words
Stanford's ~entire AI department has just released a 200-page, 100-author Neural Scaling Laws manifesto. They're pivoting to position themselves as #1 in academic ML scaling (e.g. GPT-4) research.
Program Synthesis with Large Language Models
Abstract: This paper explores the limits of the current generation of large language models for program synthesis in general purpose programming languages. We evaluate a collection of such models (with between 244M and 137B parameters) on two new benchmarks, MBPP and MathQA-Python, in both the few-shot and fine-tuning regimes. Our benchmarks are designed to measure the ability of these models to synthesize short Python programs from natural language descriptions. The Mostly Basic Programming Problems (MBPP) dataset contains 974 programming tasks, designed to be solvable by entry-level programmers. The MathQA-Python dataset, a Python version of the MathQA benchmark, contains 23914 problems that evaluate the ability of the models to synthesize code from more complex text. On both datasets, we find that synthesis performance scales log-linearly with model size. Our largest models, even without finetuning on a code dataset, can synthesize solutions to 59.6 percent of the problems from MBPP using few-shot learning with a well-designed prompt. Fine-tuning on a held-out portion of the dataset improves performance by about 10 percentage points across most model sizes. On the MathQA-Python dataset, the largest fine-tuned model achieves 83.8 percent accuracy. Going further, we study the model's ability to engage in dialog about code, incorporating human feedback to improve its solutions. We find that natural language feedback from a human halves the error rate compared to the model's initial prediction. Additionally, we conduct an error analysis to shed light on where these models fall short and what types of programs are most difficult to generate. Finally, we explore the semantic grounding of these models by fine-tuning them to predict the results of program execution. We find that even our best models are generally unable to predict the output of a program given a specific input.
Twitter thread
Jurassic-1: Technical Details and Evaluation
Abstract: Jurassic-1 is a pair of auto-regressive language models recently released by AI21 Labs, consisting of J1-Jumbo, a 178B-parameter model, and J1-Large, a 7B-parameter model. We describe their architecture and training, and evaluate their performance relative to GPT-3. The evaluation is in terms of perplexity, as well as zero-shot and few-shot learning. To that end, we developed a zero-shot and few-shot test suite, which we made publicly available (https://github.com/ai21labs/lm-evaluation) as a shared resource for the evaluation of mega language models.
Announcement and use cases
Evaluating Large Language Models Trained on Code
Abstract: We introduce Codex, a GPT language model fine-tuned on publicly available code from GitHub, and study its Python code-writing capabilities. A distinct production version of Codex powers GitHub Copilot. On HumanEval, a new evaluation set we release to measure functional correctness for synthesizing programs from docstrings, our model solves 28.8% of the problems, while GPT-3 solves 0% and GPT-J solves 11.4%. Furthermore, we find that repeated sampling from the model is a surprisingly effective strategy for producing working solutions to difficult prompts. Using this method, we solve 70.2% of our problems with 100 samples per problem. Careful investigation of our model reveals its limitations, including difficulty with docstrings describing long chains of operations and with binding operations to variables. Finally, we discuss the potential broader impacts of deploying powerful code generation technologies, covering safety, security, and economics.
Blog post
I Beta Tested OpenAI’s Codex, and the Results Are Spooky Good.
It won’t replace human programmers, but it will make them far more powerful
Measuring Coding Challenge Competence With APPS
Abstract: While programming is one of the most broadly applicable skills in modern society, modern machine learning models still cannot code solutions to basic problems. Despite its importance, there has been surprisingly little work on evaluating code generation, and it can be difficult to accurately assess code generation performance rigorously. To meet this challenge, we introduce APPS, a benchmark for code generation. Unlike prior work in more restricted settings, our benchmark measures the ability of models to take an arbitrary natural language specification and generate satisfactory Python code. Similar to how companies assess candidate software developers, we then evaluate models by checking their generated code on test cases. Our benchmark includes 10,000 problems, which range from having simple one-line solutions to being substantial algorithmic challenges. We fine-tune large language models on both GitHub and our training set, and we find that the prevalence of syntax errors is decreasing exponentially as models improve. Recent models such as GPT-Neo can pass approximately 20% of the test cases of introductory problems, so we find that machine learning models are now beginning to learn how to code. As the social significance of automatic code generation increases over the coming years, our benchmark can provide an important measure for tracking advancements.
Twitter thread
Gwern’s comment about AI-based code generation
Accelerating progress in brain recording tech
When will programs write programs for us?
It has been clear to me since GPT-3 that reality is way ahead of the community prediction (lower 25%: Apr 2023; median: 2026; upper 75%: 2032).