go on then, show me how that has anything to do with Multiply-Accumulate units
Intel's Neural Processing Unit (NPU) is a specialized hardware component designed to accelerate AI workloads. Its main components are:
Global Control: This is a crossbar that works alongside a memory mapping unit (MMU) and Direct Memory Access (DMA).
Scratchpad RAM: A small, fast memory used for temporary storage of calculations.
Two Neural Compute Engines (NCEs): Each NCE contains a programmable Digital Signal Processor (DSP) and an inference pipeline.
The inference pipeline is the main resource for number crunching. It contains a MAC (Multiply-Accumulate) array, which is a set of logic components that accelerate matrix multiplication and convolution operations. These operations are fundamental to many AI and machine learning algorithms. The MAC array supports INT8 and FP16 data types and can perform up to 2,048 MAC operations per cycle.
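To make the link to matrix multiplication concrete, here is a minimal sketch in plain Python. It is not Intel's implementation or any real NPU API; the function names (mac, matmul_int8, ideal_cycles) are made up for illustration. It shows that a matrix multiply is nothing more than a long run of multiply-accumulate steps, and it uses the 2,048 MACs per cycle figure quoted above to get an idealised lower bound on cycle count, ignoring memory traffic and scheduling entirely.

```python
import math

# Peak figure quoted in the post; everything else here is illustrative.
MACS_PER_CYCLE = 2048


def mac(acc, a, b):
    """One multiply-accumulate step: returns acc + a * b."""
    return acc + a * b


def matmul_int8(A, B):
    """Multiply two matrices of INT8 values using only MAC steps.

    Returns (C, mac_count). The accumulators are ordinary Python ints,
    standing in for the wider accumulators hardware would use so that
    sums of INT8 products do not overflow.
    """
    rows, inner, cols = len(A), len(B), len(B[0])
    for row in A:
        if len(row) != inner:
            raise ValueError("inner dimensions do not match")
    for matrix in (A, B):
        for row in matrix:
            for value in row:
                if not -128 <= value <= 127:
                    raise ValueError("value outside INT8 range")

    C = [[0] * cols for _ in range(rows)]
    mac_count = 0
    for i in range(rows):
        for j in range(cols):
            acc = 0
            for k in range(inner):
                acc = mac(acc, A[i][k], B[k][j])
                mac_count += 1
            C[i][j] = acc
    return C, mac_count


def ideal_cycles(mac_count, macs_per_cycle=MACS_PER_CYCLE):
    """Best-case cycles if every MAC slot were busy on every cycle."""
    return math.ceil(mac_count / macs_per_cycle)


if __name__ == "__main__":
    C, n = matmul_int8([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(C, n)                          # [[19, 22], [43, 50]] 8
    print(ideal_cycles(64 * 64 * 64))    # 128
```

A 64x64 by 64x64 multiply needs 262,144 MACs, which a CPU core doing one MAC at a time would grind through serially; an array that retires 2,048 of them per cycle could in the best case finish in 128 cycles. That parallelism is the whole point of a MAC array.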
In addition to the MAC array, the inference pipeline includes fixed-function hardware for data conversion and array activation. Each NCE also contains two Very Long Instruction Word (VLIW) programmable DSPs that support a wide range of data types.
The NPU is designed to be up to 8 times more power-efficient at performing an AI workload than if the same task were handled by the integrated GPU or CPU cores. This efficiency is one of the reasons why Intel developed the NPU instead of simply deploying XMX accelerators on the integrated GPU coupled with GFNI and DLBoost on the Compute tile[1].
Citations:
[1] https://www.techpowerup.com/review/intel-meteor-lake-technical-deep-dive/4.html
[2] https://www.theverge.com/2023/12/14/23998215/intel-core-ultra-cpu-specs-availability
Apple's M1, M2 and M3 have been very successful, partly because they include similar functions. There is an advantage in doing this work on the CPU package because it can share main memory. NVIDIA are using VRAM as market segmentation: 24 GB of VRAM is the most you'll find on desktop GPUs.
Intel tried before with Larrabee, which was the project of the current CEO. He sees it as a misstep that it was not followed through, as it might have competed with NVIDIA in that space.
Your chip already has lots of specialist functions you don't use. BCD functions for example.
Do you do a lot of AES encryption? Your CPU has dedicated instructions for it.
Your x87/MMX registers are there but are barely used by x64 code. The same goes for a list of legacy features that are not available in 64-bit mode at all: saving/restoring of segment registers on the stack, saving/restoring of all registers (PUSHA/POPA), decimal arithmetic, the BOUND and INTO instructions, and "far" jumps and calls with immediate operands.
Do you run in 32-bit protected mode much? How about 16-bit protected mode?