HiAI Mobile Computing Platform Overview
HiAI is one of the world's first mobile AI computing platforms with a dedicated Neural-network Processing Unit (NPU).
The HiAI API library, released as a unified binary file, enables fast neural network-based computing through the HiAI Heterogeneous Computing Platform, which is integrated within the Kirin SoC.
With the HiAI APIs, developers can focus on building new AI applications rather than on performance tuning.
HiAI Mobile Computing Platform Technologies
The HiAI mobile computing platform supports a dedicated set of AI instructions for neural network model operations, enabling more neural network operators to execute in parallel within a minimal number of clock cycles.
The HiAI mobile computing platform can compile a variety of neural network operators, such as convolution, pooling, activation, and full connection, into dedicated AI instruction sequences for the NPU in an offline setting, with data and weight rearrangement for optimized performance. The instructions and data are then combined to generate an offline execution model. Furthermore, during offline compilation, cross-layer operators (such as convolution, ReLU, and pooling) can be fused together to reduce the read-write bandwidth of the DDR and thus improve performance.
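The benefit of cross-layer fusion can be illustrated with a toy example (this is a conceptual sketch in plain Python, not the HiAI compiler): convolution, ReLU, and max-pooling are computed in a single loop, so intermediate results never need to be written back to memory between layers.

```python
# Illustrative sketch (not the HiAI toolchain): fusing conv -> ReLU -> pool
# into one pass, so intermediate activations stay local instead of being
# written to and re-read from DDR between layers.

def conv_relu_pool_fused(x, kernel, pool=2):
    """1-D convolution, ReLU, and max-pooling fused into one loop."""
    k = len(kernel)
    conv_len = len(x) - k + 1
    out = []
    for p in range(0, conv_len - pool + 1, pool):
        best = float("-inf")
        for i in range(p, p + pool):
            # convolution at position i
            acc = sum(x[i + j] * kernel[j] for j in range(k))
            # ReLU applied immediately, no intermediate buffer
            acc = max(acc, 0.0)
            # max-pool over the window, again with no intermediate buffer
            best = max(best, acc)
        out.append(best)
    return out

x = [1.0, -2.0, 3.0, 0.5, -1.0, 2.0, 4.0, -0.5]
kernel = [0.5, -0.25]
print(conv_relu_pool_fused(x, kernel))  # three fused conv+ReLU+pool outputs
```

An unfused version would materialize the full convolution output and the full ReLU output as separate arrays before pooling; fusion removes those round trips, which is exactly the DDR-bandwidth saving described above.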
The HiAI mobile computing platform supports sparse model acceleration. The NPU can skip multiply-add operations whose coefficients are zero, which greatly improves computing efficiency and reduces bandwidth while maintaining computing precision.
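The zero-skipping idea can be sketched in a few lines (a software analogy, not the NPU hardware): the weights are stored as (index, value) pairs for nonzero entries only, so zero coefficients cost no multiply-adds at all, yet the result is identical.

```python
# Illustrative sketch (not NPU hardware): skipping multiply-adds whose
# weight coefficient is zero. Sparse weights are stored as (index, value)
# pairs, so zero terms are never multiplied or accumulated.

def dense_dot(weights, x):
    # baseline: one multiply-add per weight, including zeros
    return sum(w * xi for w, xi in zip(weights, x))

def to_sparse(weights):
    # keep only nonzero coefficients
    return [(i, w) for i, w in enumerate(weights) if w != 0.0]

def sparse_dot(sparse_weights, x):
    # only the nonzero terms are multiplied and accumulated
    return sum(w * x[i] for i, w in sparse_weights)

weights = [0.0, 2.0, 0.0, 0.0, -1.0, 0.0]   # mostly-zero (sparse) weights
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
sw = to_sparse(weights)
assert sparse_dot(sw, x) == dense_dot(weights, x)   # same result, exact
print(len(sw), "of", len(weights), "multiply-adds performed")
```

Here only 2 of the 6 multiply-adds are executed, with no loss of precision, mirroring the efficiency and bandwidth gains described above.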
The HiAI mobile computing platform supports 8-bit and 1-bit quantization, effectively reducing the computing bandwidth and storage consumption and improving energy efficiency.
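To make the storage saving concrete, here is a minimal sketch of symmetric per-tensor 8-bit quantization. This particular scheme is an assumption chosen for illustration; the document does not specify which quantization method the platform uses.

```python
# Illustrative sketch of symmetric 8-bit weight quantization: float32
# weights are mapped to int8 with a single per-tensor scale, cutting
# storage to a quarter while keeping values close to the originals.

def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]   # int8 values in [-127, 127]
    return q, scale

def dequantize(q, scale):
    # recover approximate float weights from the int8 representation
    return [v * scale for v in q]

weights = [0.8, -0.45, 0.12, -1.27, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
assert all(-127 <= v <= 127 for v in q)
assert max_err <= scale / 2   # error bounded by half a quantization step
```

Each weight now fits in one byte instead of four, which is the source of the reduced computing bandwidth and storage consumption mentioned above; 1-bit (binary) quantization pushes the same trade-off further.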
HiAI Mobile Computing Platform Execution
As shown in Figure 1, compilation tools convert a trained neural network model into an offline model that can be executed efficiently on the HiAI mobile computing platform, output as a binary file: the offline model.
The main purpose of compiling a standard neural network model (such as a Caffe model) into an offline model is to optimize the network configuration. After compilation, an optimized target file, called the offline model, is generated, serialized, and stored on disk. When inference is later performed, this pre-optimized target file is used directly, which makes execution faster.
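The compile-once, run-many-times flow can be sketched as follows. This is a hypothetical toy pipeline: the names (`compile_offline`, `save_offline_model`, the `.om` extension) and the JSON serialization are illustrative assumptions, not the HiAI toolchain or its file format.

```python
# Hypothetical sketch of the offline-compilation idea: optimize a model
# graph once, serialize the result to disk, and reload it at inference
# time without repeating the optimization work.
import json
import os
import tempfile

def compile_offline(graph):
    """Toy optimization pass: fold adjacent conv+relu nodes into one fused node."""
    optimized, i = [], 0
    while i < len(graph):
        if i + 1 < len(graph) and graph[i]["op"] == "conv" and graph[i + 1]["op"] == "relu":
            optimized.append({"op": "conv_relu"})   # fused operator
            i += 2
        else:
            optimized.append(graph[i])
            i += 1
    return optimized

def save_offline_model(graph, path):
    with open(path, "w") as f:
        json.dump(compile_offline(graph), f)   # serialized "offline model"

def load_offline_model(path):
    with open(path) as f:
        return json.load(f)   # ready to execute, no re-optimization needed

graph = [{"op": "conv"}, {"op": "relu"}, {"op": "pool"}]
path = os.path.join(tempfile.mkdtemp(), "model.om")
save_offline_model(graph, path)
offline = load_offline_model(path)   # what inference would start from
```

The point is the split of work: the expensive analysis happens once at compile time, and every subsequent inference simply deserializes the already-optimized model.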
As shown in Figure 2, during offline model computing, the offline model is loaded from the file, and user input data is copied to HiAI's NPU for computing. User data needs to be imported from the DDR to the NPU only once per inference.
The HiAI mobile computing platform supports a variety of frameworks, including Caffe and TensorFlow. The developer needs to specify the framework being used.