#IntelGPU | Explore Tumblr posts and blogs

govindhtech · 2 months ago

Text

Roofline AI: Unlocking The Potential Of Variable Hardware

What is Roofline AI?

Edge AI is implemented with the help of a software development kit (SDK) called Roofline AI. It was developed by Roofline AI GmbH, a spin-off from RWTH Aachen University.

The following is made easier with RooflineAI’s SDK:

Flexibility: Models from any AI framework, including ONNX, PyTorch, and TensorFlow, may be imported.

Roofline AI provides excellent performance.

Usability: RooflineAI is simple to use.

RooflineAI makes it possible to deploy on a variety of hardware, such as CPUs, MPUs, MCUs, GPUs, and specialized AI hardware accelerators.

RooflineAI’s retargetable AI compiler technology fosters collaborations with chip suppliers and the open-source community.

A computer science technique called the Roofline model aids programmers in figuring out a computation’s compute-memory ratio. It is employed to evaluate AI architectures’ memory bandwidth and computational efficiency.

To redefine edge AI deployment

Edge AI is developing quickly. Rapidly emerging novel models, like LLMs, make it difficult to foresee technological advancements. Simultaneously, hardware solutions are becoming more complicated and diverse.

Conventional deployment techniques are unable to keep up with this rate and have turned into significant obstacles to edge AI adoption. They are uncomfortable to use, have limited performance, and are not very adaptable.

With a software solution that provides unparalleled flexibility, superior performance, and user-friendliness, Roofline transforms this procedure. With a single Python line, import models from any framework and distribute them across various devices.

Benefits

Flexible

Install any model from any framework on various target devices. Innovative applications may be deployed on the most efficient hardware with to the retargetable compiler.

Efficient

Unlock your system’s full potential. Without sacrificing accuracy, it provide definite performance benefits, including up to 4x reduced memory consumption and 1.5x lower latency.

EASY

Deployment is as simple as a Python call with us. All of the necessary tools are included in to SDK. Unfold them if you’d want, or let us handle the magic from quantization to debugging.

How RooflineAI works

Roofline AI showed how their compiler converts machine learning models from well-known frameworks like PyTorch and TensorFlow into SPIR-V code, a specific language for carrying out parallel computation operations, during the presentation.

As a consequence, developers may more easily get optimal performance without requiring unique setups for every kind of hardware with to a simplified procedure that permits quick, optimized AI model deployment across several platforms.

OneAPI’s ability to enable next-generation AI is demonstrated by Roofline AI’s dedication to improving compiler technology. Roofline AI is not only enhancing AI deployment but also establishing a new benchmark for AI scalability and efficiency with to its unified support for many devices and seamless connectivity with the UXL ecosystem.

Roofline AI is establishing itself as a major force in the development of scalable, high-performance AI applications by pushing the limits of AI compiler technology.

The Contribution of Roofline AI to the Development of Compiler Technology with oneAPI

The oneAPI DevSummit is an event centered around the oneAPI specification, an open programming paradigm that spans industries and was created by Intel to accommodate a variety of hardware architectures.

The DevSummit series, which are held all around the world and are frequently organized by groups like the UXL Foundation, bring together developers, researchers, and business executives to discuss the real-world uses of oneAPI in fields including artificial intelligence (AI), high-performance computing (HPC), edge computing, and more.

Roofline AI took center stage at the recent oneAPI DevSummit, which was organized by the UXL Foundation and Intel Liftoff member, to showcase its creative strategy for improving AI and high-performance HPC performance.

Through RooflineAI’s integration with the UXL framework, they were able to fulfill a key demand in the AI and HPC ecosystem: effective and flexible AI compiler support that can blend in with a variety of devices.

In order to connect AI models and the hardware that runs them, AI compilers are essential. The team from Roofline AI stressed in their pitch that they have developed a strong compiler that facilitates end-to-end model execution for the UXL ecosystem by utilizing the open-source Multi-Level Intermediate Representation (MLIR). With this architecture, developers can map and run AI models on many devices with unmatched flexibility and efficiency.

It’s a clear advancement in device-agnostic AI processing, especially for sectors with a range of hardware requirements. A lightweight runtime based on the Level Zero API, which makes kernel calls and efficiently manages memory, is the foundation of their approach.

In addition to optimizing performance, Roofline AI‘s runtime guarantees compatibility with a variety of Level Zero-compatible hardware, such as Intel GPUs. Because of this interoperability, developers may use their software to control devices outside of the box, reducing the requirement for configuration and increasing the range of hardware alternatives.

Read more on govindhtech.com

#RooflineAI #VariableHardware #TensorFlow #EdgeAI #AImodel #oneAPI #IntelLiftoff #IntelGPU #news #CompilerTechnology #technology #technews #govindhtech

0 notes

mymobilemag · 2 years ago

Photo

Intel #Graphics Processing Unit: #ARC Platform Unveiled. What do you know about Intel? What is your take on Graphics Processing Unit? What do you mean by ARC #Platforms? Find out. Link Mentioned In Bio!!!! @intel #mnbile #technology #tech #mymobileindia #arcgpus #arcgpustaff #GPUs #gpuserver #gpushortage #intel #intelligence #intellectualproperty #intelgpu #intelgpu2022 #intelgpugaming #intelgpuforcreators #intelgpusoftwareengineeringinternship https://www.instagram.com/p/CjILRxsvANx/?igshid=NGJjMDIxMWI=

0 notes

telepathlc · 3 years ago

Text

@elonmusk

@elonmuskfans intel gpus are already cool and its the only stock not taking a dip lets fucking gooooo meme

good song blunt one march of the elves bandcamp

#elon elonmusk tesla spacex stockmanipulation stock stocks intel intelgpu intelalchemist alchemy elongatedsyrupwasan@ checkmypag

0 notes

gozealouscloudcollection · 5 years ago

Text

Intel：GPU是我們第二重要的產品 2020發布首款獨顯

CPU在未來毫無疑問依然是Intel最重要的產品，但哪些產品是Intel第二重要的呢？ Intel官方的回答有些出人意料。

日前在出席花旗銀行全球技術大會時，Intel雲計算業務副總裁傑森·格雷貝回答了Xeon之後哪款產品會是Intel第二重要的問題，他的答案是“GPU”，也就是說GPU顯卡在地位在Intel心目中是僅次於CPU的。

至於為什麼，原因也很簡單，傑森·格雷貝認為GPU的應用範圍比專業加速器產品更為廣泛。

Intel現在已經是全球最大的GPU廠商，不過這個是說核顯的，而傑森·格雷貝的回答顯然也不是指核顯GPU，而是高性能獨顯GPU，也就是Xe架構為代表的新一代GPU產品。

根據Intel的計劃，該公司將在2020年發布首款獨顯GPU，基於10nm工藝，首發的主要有遊戲GPU廠商，而2021年��會有7nm工藝的高性能GPU，主要用於數據中心，從首發7nm工藝上也可以看出Intel的重視了。

訪問購買頁面:

英特爾旗艦店

from Intel：GPU是我們第二重要的產品 2020發布首款獨顯 via KKNEWS

#業內資訊 #2020發布首款獨顯 #cnBeta #Intel 英特爾 #Intel：GPU是我們第二重要的產品 2020發布首款獨顯 #IntelGPU是我們第二重要的產品

0 notes

hacknews · 5 years ago

Photo

First Intel-Made Discrete GPU “DG1” Shown Off At CES 2020 #ces2020 #dg1 #intel #inteldiscretegraphics #intelgpu #intelxe #hacking #hacker #cybersecurity #hack #ethicalhacking #hacknews

0 notes

govindhtech · 3 months ago

Text

PyTorch 2.5: Leveraging Intel AMX For Faster FP16 Inference

Intel Advances AI Development through PyTorch 2.5 Contributions

New features broaden support for Intel GPUs and improve the development experience for AI developers across client and data center hardware.

PyTorch 2.5 supports new Intel data center CPUs. Inference capabilities on Intel Xeon 6 processors are improved by Intel Advanced Matrix Extensions(Intel AMX) for eager mode and TorchInductor, which enable and optimize the FP16 datatype. Windows AI developers can use the TorchInductor C++ backend for a better experience.

Intel Advanced Matrix Extensions(Intel AMX)

Overview of Intel Advanced Matrix Extensions (Intel AMX) to fulfill the computational needs of deep learning workloads, Intel Corporation AMX extends and speeds up AI capabilities. The Intel Xeon Scalable CPUs come with this inbuilt accelerator.

Use Intel AMX to Speed Up AI Workloads

A new built-in accelerator called Intel AMX enhances deep learning training and inference performance on the CPU, making it perfect for tasks like image recognition, recommendation systems, and natural language processing.

What is Intel AMX?

Your AI performance is improved and made simpler using Intel AMX. Designed to meet the computational demands of deep learning applications, it is an integrated accelerator on Intel Xeon Scalable CPUs.

AI Inference Performance Enhancement

Improvement of AI Inference Performance Fourth-generation Intel Xeon Scalable processors with Intel AMX and optimization tools were used by Alibaba Cloud‘s machine learning platform (PAI). When compared to the prior generation, this enhanced end-to-end inferencing.

Optimizing Machine Learning (ML) Models

Improving Models for Machine Learning (ML)Throughput increases using the BERT paradigm over the previous generation were shown by Intel and Tencent using Intel AMX. Tencent lowers total cost of ownership (TCO) and provides better services because to the streamlined BERT model.

Accelerate AI with Intel Advanced Matrix Extensions

Use Intel Advanced Matrix Extensions to Speed Up AI. AI applications benefit from Intel AMX’s performance and power efficiency. It is an integrated accelerator specifically designed for Intel Xeon Scalable CPUs.

PyTorch 2.5

PyTorch 2.5, which was recently published with contributions from Intel, offers artificial intelligence (AI) developers enhanced support for Intel GPUs. Supported GPUs include the Intel Data Center GPU Max Series, Intel Arc discrete graphics, and Intel Core Ultra CPUs with integrated Intel Arc graphics.

These new capabilities provide a uniform developer experience and support, and they aid in accelerating machine learning processes inside the PyTorch community. PyTorch with preview and nightly binary releases for Windows, Linux, and Windows Subsystem for Linux 2 may now be installed directly on Intel Core Ultra AI PCs for researchers and application developers looking to refine, infer, and test PyTorch models.

What is PyTorch 2.5?

A version of the well-known PyTorch open-source machine learning framework is called PyTorch 2.5.

New Featuers of PyTorch 2.5

CuDNN Backend for SDPA: SDPA users with H100s or more recent GPUs may benefit from speedups by default with to the CuDNN Backend for SDPA.

Increased GPU Support: PyTorch 2.5 now supports Intel GPUs and has additional tools to enhance AI programming on client and data center hardware.

Torch Compile Improvements: For a variety of deep learning tasks, Torch.compile has been improved to better inference and training performance.

FP16 Datatype Optimization: Intel Advanced Matrix Extensions for TorchInductor and eager mode enable and optimize the FP16 datatype, improving inference capabilities on the newest Intel data center CPU architectures.

TorchInductor C++ Backend: Now accessible on Windows, the TorchInductor C++ backend improves the user experience for AI developers working in Windows settings.

SYCL Kernels: By improving Aten operator coverage and execution on Intel GPUs, SYCL kernels improve PyTorch eager mode performance.

Binary Releases: PyTorch 2.5 makes it simpler for developers to get started by offering preview and nightly binary releases for Windows, Linux, and Windows Subsystem for Linux 2.Python >= 3.9 and C++ <= 14 are supported by PyTorch 2.5.

Read more on govindhtech.com

#PyTorch25 #LeveragingIntelAMX #intel #IntelXeon #FasterFP16Inference #AlibabaCloud #PyTorch #MachineLearning #ML #IntelGPU #ai #gpu #MachineLearningModels #technology #technews #news #govindhtech

0 notes

govindhtech · 3 months ago

Text

TorchDynamo Method For Improving PyTorch Code Performance

Introduction of Using TorchDynamo to Write PyTorch Programs Faster. Presenters Yuning Qiu and Zaili Wang discuss the new computational graph capture capabilities in PyTorch 2.0 in their webinar, Introduction to Getting Faster PyTorch Programs with TorchDynamo.

TorchDynamo is designed to keep flexibility and usability while speeding up PyTorch scripts with little to no code modifications. It’s important to note that while TorchDynamo was originally used to describe the whole functionality, it is now known by its API name “torch.compile” in the most recent PyTorch documentation. This nomenclature is also used in this lesson.

Principles of Design and Motivation

PyTorch functions mostly in a “imperative mode” (sometimes called eager mode), which is why data scientists and academics have embraced it so enthusiastically due to its Pythonic philosophy and simplicity of use. This mode makes debugging simple and flexible by executing user code step-by-step. For large-scale model deployment, however, imperative execution may not be the best option.

In these cases, performance improvements are often obtained by assembling the model into an efficient computational network. Although they provide graph compilation, traditional PyTorch techniques like FX and TorchScript (JIT) have a number of drawbacks, especially when it comes to managing control flow and backward graph optimization. TorchDynamo was created to solve these issues by offering a more smooth graph capture procedure while maintaining PyTorch’s natural flexibility.

Torch Dynamo: Synopsis and Essential Elements

TorchDynamo works by tying into the frame evaluation process of Python, which is made possible by PEP 523, and examining Python bytecode while it is running. This enables it to execute in eager mode and dynamically capture computational graphs. PyTorch code must be converted by TorchDynamo into an intermediate representation (IR) so that a backend compiler like TorchInductor may optimize it. It functions with a number of important technologies:

AOTAutograd: Enhances training and inference performance by concurrently tracing forward and backward computational graphs in advance. These graphs are divided into manageable chunks by AOTAutograd so that they may be assembled into effective machine code.

PrimTorch: Reduces the original PyTorch operations to a set of around 250 primitive operators, hence simplifying and reducing the number of operators that backend compilers must implement. Thus, PrimTorch improves the built PyTorch models’ extensibility and portability on many hardware platforms.

TorchInductor: The backend compiler that converts the computational graphs that are recorded into machine code that is optimized. Both CPU and GPU optimizations are supported by TorchInductor, including Intel’s contributions to CPU inductor and Triton-based GPU backend optimizations.

Contributions of Intel to TorchInductor

An important factor in improving PyTorch model performance on CPUs and GPUs has been Intel:

CPU Optimizations: For more than 94% of inference and training kernels in PyTorch models, Intel has provided vectorization utilizing the AVX2 and AVX512 instruction sets. Significant gains in performance have resulted from this; depending on the precision utilized (FP32, BF16, or INT8), speedups have ranged from 1.21x to 3.25x.

GPU Support via Triton: OpenAI’s Triton is a domain-specific language (DSL) for Python that is used to write GPU-accelerated machine learning kernels. By using SPIR-V IR to bridge the gap between Triton’s GPU dialect and Intel’s SYCL implementations, Intel has expanded Triton to accommodate their GPU architectures. Triton may be used to optimize PyTorch models on Intel GPUs because to its extensibility.

Guard Systems and Caching

In order to manage dynamic control flow and reduce the need for recompilation, TorchDynamo provides a guard mechanism. Guards monitor the objects that are referred to in every frame and make sure that the graphs that are cached are only utilized again when the calculation has not changed. A guard will recompile the graph, dividing it into subgraphs if needed, if it notices a change. In doing so, the performance overhead is reduced and the accuracy of the compiled graph is guaranteed.

Adaptable Forms and Scalability

Support for dynamic forms is one of TorchDynamo’s primary features. TorchDynamo is capable of handling dynamic input shapes without the need for recompilation, in contrast to earlier graph-compiling techniques that often had trouble with input-dependent control flow or shape fluctuations. This greatly increases PyTorch models’ scalability and adaptability, enabling them to better adjust to changing workloads.

Examples and Use Cases

During the webinar, a number of real-world use cases were shown to show how useful TorchDynamo and TorchInductor are. For example, when optimized with TorchDynamo and TorchInductor, ResNet50 models trained on Intel CPUs using the Intel Extension for PyTorch (IPEX) demonstrated significant increases in performance. Furthermore, comparable performance advantages for models deployed on Intel GPU architectures are promised by Intel’s current efforts to expand Triton for Intel GPUs.

In summary

TorchDynamo and related technologies provide a major step forward in PyTorch’s capacity to effectively aggregate and optimize machine learning models. Compared to older methods like TorchScript and FX, TorchDynamo provides a more adaptable and scalable solution by integrating with Python’s runtime with ease and enabling dynamic shapes.

The contributions from Intel, especially in terms of maximizing performance for both CPUs and GPUs, greatly expand this new framework’s possibilities. As they continue to be developed, researchers and engineers who want to implement high-performance PyTorch models in real-world settings will find that TorchDynamo and TorchInductor are indispensable resources.

Read more on govindhtech.com

#SynxFlowProject #CUDA #SYCL #scienceworkflows #riskassessment #IntelGPU #IntelDPC #IntelMPIlibrary #oneAPI #InteloneAPIMathKernelLibrary #IntelMPI #IntelVTuneProfiler #intel #gpu #technology #technews #news #govindhtech

0 notes

govindhtech · 4 months ago

Text

Intel Core Ultra 200V Series CPUs Improve AI PC Performance

For the AI PC Age, New Core Ultra Processors Offer Groundbreaking Performance and Efficiency.

Intel Core Ultra

Leading laptop makers may benefit from the exceptional AI performance, interoperability, and power efficiency of Intel Core Ultra 200V series CPUs due to their large size. The Intel Core Ultra 200V series processors are the most efficient x86 CPU family that Intel has ever released. Their performance is outstanding, they provide revolutionary x86 power efficiency, a significant improvement in graphics performance, uncompromised application compatibility, heightened security, and unparalleled AI compute.Image Credit To Intel

With more than 80 consumer designs from more than 20 of the biggest manufacturing partners in the world, including Acer, ASUS, Dell Technologies, HP, Lenovo, LG, MSI, and Samsung, the technology will power the most comprehensive and powerful AI PCs on the market.

Preorders open today, and beginning on September 24, systems will be sold both online and in-store at more than 30 international shops. Beginning in November, all designs with Intel Core Ultra 200V series CPUs and the most recent version of Windows are eligible for a free upgrade that includes Copilot+ PC capabilities.

“Intel’s most recent Core Ultra processors dispel myths about x86 efficiency and set the industry standard for mobile AI and graphics performance. With our relationships with OEMs, ISVs, and the larger tech community, only Intel has the reach to provide customers an AI PC experience that doesn’t compromise.

Customers of today are more and more producing, interacting, playing, and learning while on the road. They need a system with outstanding performance, extended battery life, uncompromised application compatibility, and improved security. It should also be able to use AI hardware via widespread software enablement.

Intel Core Ultra Platform

With up to 50% lower package power and up to 120 total platform TOPS (tera operations per second) across central processing unit (CPU), graphic processing unit (GPU), and neural processing unit (NPU) to deliver the most performant and compatible AI experiences across models and engines, Intel Core Ultra 200V series processors were designed with all of that in mind. With up to four times the power of its predecessor, the fourth-generation NPU is perfect for energy-efficiently performing AI tasks over an extended period of time.

As part of its AI PC Acceleration Program, Intel works with over 100 integrated software suppliers (ISVs) and developers to activate industry-leading platform TOPS in more than 300 AI-accelerated features.

Through carefully calibrated power management and entirely redesigned Performance-cores (P-core) that are optimized for performance per power per area, the new processors provide efficient and remarkable core performance. Additionally, Intel’s most potent Efficient-cores (E-cores) can now handle a greater workload, guaranteeing silent and cool operation.

With a 30% average performance boost, Intel’s new X 2 graphics microarchitecture, which is included in the Intel Core Ultra 200V line of CPUs, represents a considerable improvement in mobile graphics performance. Support for three 4K displays, eight new 2nd Gen Xe-cores, eight upgraded ray tracing units, and new integrated Intel XMX AI. engines with up to 67 TOPS are all included in the integrated Intel Arc GPU. Enhanced XSS kernels allow the AI engines to power creative applications and improve gaming performance.

Intel Core Ultra 200V series processors

A great PC must be a great PC before it can be a great AI PC. With up to three times the performance per thread, an 80% peak performance boost, and up to 20 hours of battery life in productivity use scenarios, Intel Core Ultra 200V series processors are productivity powerhouses. These fantastic PCs are the next step in the AI PC’s progression. With over 500 optimized AI models, extensive ecosystem support, and collaborations with top ISVs, PCs equipped with the newest Intel Core Ultra CPUs enable customers to fully benefit from AI. The new CPUs, with their several powerful AI engines, deliver:

Content Generation: To make video editing simpler and quicker, work more quickly by automatically recognizing changes in the video scene. Use word prompts to unleash your imagination and create beautiful vector and raster art.

Safety: Check whether videos on the internet have been manipulated by using local AI deep-fake detection. AI screening, identification, and safeguarding of important files against dangerous programs and users may protect your PC’s personal data.

Efficiency: One-time video presentation recordings save time, and fresh audio and video including fresh conversation minimize the need for retakes.

Video games: Enhance gaming experiences and increase frames-per-second performance by using AI to provide upscaled, high-quality pictures.

Concerning Intel Evo Edition Utilizing the most recent Intel Core Ultra Processors: The majority of laptop designs with Intel Core Ultra 200V series CPUs will be Intel Evo Edition models, which are rigorously tested and co-engineered with Intel’s partners to provide the best possible AI PC experience.

These laptops are designed to help eliminate latency, limit distractions, and lessen reliance on battery charges by integrating essential platform technologies with system improvements. This ensures amazing experiences from any location. Intel Evo designs, which are new this year, have to achieve improved metrics for quieter and cooler operation.

Features consist of:

Performance and responsiveness in ultra-thin designs that are cooler and quieter.

Extended battery life in practice.

integrated security that reduces vulnerabilities and aids in stopping malware assaults.

Integrated Intel Arc graphics provide faster game development and more fluid gameplay, even while playing on the fly.

Connectivity that is lightning fast thanks to Intel Wi-Fi 7 (5 Gig).

The ability to use Thunderbolt Share to charge a PC, transmit data, and connect it to numerous displays.

Wake up instantly and charge quickly.

The highest accreditation for sustainability, EPEAT Gold.

What’s Next: Starting today, consumers may pre-order consumer devices equipped with Intel Core Ultra 200V series processors. Commercial products based on the Intel vPro platform will be released in the Next year.

IFA 2024 conference

Image Credit To Intel

The next generation of Intel Core Ultra processors, code-named Lunar Lake, was introduced ahead of the IFA 2024 conference by Jim Johnson, senior vice president and general manager of the Client Business Group, and Michelle Johnston Holthaus, executive vice president and general manager of Intel’s Client Computing Group. Partners from Intel joined them in launching a line of processors that redefines mobile AI performance.

The executives of Intel demonstrated how the new processors’ remarkable core performance, remarkable x86 power efficiency, revolutionary advances in graphics performance, and AI processing capacity provide users everything they need to create, connect, play, or study on the move.

Read more on govindhtech.com

#mobileai #wifi7 #IntelCoreUltra #intelvpro #intelevo #iocalai #ai #gpu #intelgpu #neuralprocessingunit #CopilotPC #CoreUltraProcessors #86cpu #cpu #pc #technology #technews #news #govindhtech

0 notes

govindhtech · 5 months ago

Text

Utilizing llama.cpp, LLMs can be executed on Intel GPUs

The open-source project known as llama.cpp is a lightweight LLM framework that is gaining greater and greater popularity. Given its performance and customisability, developers, scholars, and fans have formed a strong community around the project. Since its launch, GitHub has over 600 contributors, 52,000 stars, 1,500 releases, and 7,400 forks. More hardware, including Intel GPUs seen in server and consumer products, is now supported by llama.cpp as a result of recent code merges. Hardware support for GPUs from other vendors and CPUs (x86 and ARM) is now combined with Intel’s GPUs.

Georgi Gerganov designed the first implementation. The project is mostly instructional in nature and acts as the primary testing ground for new features being developed for the machine learning tensor library known as ggml library. Intel is making AI more accessible to a wider range of customers by enabling inference on a greater number of devices with its latest releases. Because Llama.cpp is built in C and has a number of other appealing qualities, it is quick.

16-bit float compatibility

Support for integer quantisation (four-, five-, eight-, etc.)

Absence of reliance on outside parties

There are no runtime memory allocations.

Intel GPU SYCL Backend

GGM offers a number of backends to accommodate and adjust for various hardware. Since oneAPI supports GPUs from multiple vendors, Intel decided to construct the SYCL backend using their direct programming language, SYCL, and high-performance BLAS library, oneMKL. A programming model called SYCL is designed to increase hardware accelerator productivity. It is an embedded, single-source language with a domain focus that is built entirely on C++17.

All Intel GPUs can be used with the SYCL backend. Intel has confirmed with:

Flex Series and Data Centre GPU Max from Intel

Discrete GPU Intel Arc

Intel Arc GPU integrated with the Intel Core Ultra CPU

In Intel Core CPUs from Generations 11 through 13: iGPU

Millions of consumer devices can now conduct inference on Llama since llama.cpp now supports Intel GPUs. The SYCL backend performs noticeably better on Intel GPUs than the OpenCL (CLBlast) backend. Additionally, it supports an increasing number of devices, including CPUs and future processors with AI accelerators. For information on using the SYCL backend, please refer to the llama.cpp tutorial.

Utilise the SYCL Backend to Run LLM on an Intel GPU

For SYCL, llama.cpp contains a comprehensive manual. Any Intel GPU that supports SYCL and oneAPI can run it. GPUs from the Flex Series and Intel Data Centre GPU Max can be used by server and cloud users. On their Intel Arc GPU or iGPU on Intel Core CPUs, client users can test it out. The 11th generation Core and later iGPUs have been tested by Intel. While it functions, the older iGPU performs poorly.

The memory is the only restriction. Shared memory on the host is used by the iGPU. Its own memory is used by the dGPU. For llama2-7b-Q4 models, Intel advise utilising an iGPU with 80+ EUs (11th Gen Core and above) and shared memory that is greater than 4.5 GB (total host memory is 16 GB and higher, and half memory could be assigned to iGPU).

Put in place the Intel GPU driver

There is support for Windows (WLS2) and Linux. Intel suggests Ubuntu 22.04 for Linux, and this version was utilised for testing and development.

Linux:sudo usermod -aG render username sudo usermod -aG video username sudo apt install clinfo sudo clinfo -l

Output (example):Platform #0: Intel(R) OpenCL Graphics -- Device #0: Intel(R) Arc(TM) A770 Graphics

orPlatform #0: Intel(R) OpenCL HD Graphics -- Device #0: Intel(R) Iris(R) Xe Graphics \[0x9a49\]

Set the oneAPI Runtime to ON

Install the Intel oneAPI Base Toolkit first in order to obtain oneMKL and the SYCL compiler. Turn on the oneAPI runtime next:

First, install the Intel oneAPI Base Toolkit to get the SYCL compiler and oneMKL. Next, enable the oneAPI runtime:

Linux: source /opt/intel/oneapi/setvars.sh

Windows: “C:\Program Files (x86)\Intel\oneAPI\setvars.bat\” intel64

Run sycl-ls to confirm that there are one or more Level Zero devices. Please confirm that at least one GPU is present, like [ext_oneapi_level_zero:gpu:0].

Build by one-click:

Linux: ./examples/sycl/build.sh

Windows: examples\sycl\win-build-sycl.bat

Note, the scripts above include the command to enable the oneAPI runtime.

Run an Example by One-Click

Download llama-2–7b.Q4_0.gguf and save to the models folder:

Linux: ./examples/sycl/run-llama2.sh

Windows: examples\sycl\win-run-llama2.bat

Note that the scripts above include the command to enable the oneAPI runtime. If the ID of your Level Zero GPU is not 0, please change the device ID in the script. To list the device ID:

Linux: ./build/bin/ls-sycl-device or ./build/bin/main

Windows: build\bin\ls-sycl-device.exe or build\bin\main.exe

Synopsis

All Intel GPUs are available to LLM developers and users via the SYCL backend included in llama.cpp. Kindly verify whether the Intel laptop, your gaming PC, or your cloud virtual machine have an iGPU, an Intel Arc GPU, or an Intel Data Centre GPU Max and Flex Series GPU. If so, llama.cpp’s wonderful LLM features on Intel GPUs are yours to enjoy. To add new features and optimise SYCL for Intel GPUs, Intel want developers to experiment and contribute to the backend. The oneAPI programming approach is a useful project to learn for cross-platform development.

Read more on govindhtech.com

#PyTorch2.4 #AI #IntelGPUs #GPU #OneAPI #NeuralNetwork #AIPC #GenerativeAI #LLM #GitHub #Pytorch #news #technews #technology #technologynews #technologytrends #govindhtech

0 notes