*Summary: Running Computer Vision Models on NPUs*

What is an NPU? (0:37)
- NPUs are specialized silicon chips optimized for running neural network computations, especially matrix multiplications.
- Unlike CPUs and GPUs, they can't run general-purpose programs, focusing purely on neural network inference.
- Many different names exist for these chips, including LPU, TPU, VPU, etc., but they share the core idea of accelerating neural network calculations.

Why Use NPUs? (2:29)
- Main advantages: Reduced power consumption, lower device cost, potential for significant speedups compared to CPU/GPU for specific tasks.
- Main disadvantages: Increased development complexity, limited choice of neural network architectures, more intricate deployment and testing processes.

Challenges of working with NPUs:
- Diverse Ecosystem: (7:42) A vast landscape of vendors, frameworks, and boards makes finding a perfect solution difficult. Each vendor typically offers its own custom framework.
- Model Export and Compatibility: (10:09)
  - Requires careful preparation, including specific patches and quantization, to adapt your model to the target NPU architecture.
  - Non-maximum suppression (NMS) (18:59) often needs to be handled outside the NPU, requiring separate code or fallback mechanisms.
- Memory Limitations: (20:54)
  - Limited memory size on NPUs restricts model size and complexity.
  - Memory access speed and structure significantly impact performance.
- Preprocessing: (22:46) May need to be performed separately on the CPU, GPU, or dedicated accelerator depending on the NPU and its capabilities.
- Transformer Support: (23:58) Limited or non-existent on many NPUs, often requiring model adjustments or alternative convolutional architectures.
- Layer Support: (25:23)
  - Advertised layer support can be misleading due to merged layers or limited functionalities.
  - Always verify compatibility and performance for your specific model layers.
- Quantization: (27:33)
  - Essential for many NPUs to reduce model size and accelerate inference.
  - Can be complex and lead to accuracy degradation, requiring careful fine-tuning and evaluation.
- Benchmarks: (30:30)
  - Often don't reflect real-world performance.
  - Always test on your target hardware and specific model for accurate results.

Additional considerations:
- CPUs play a vital role in data transfer, image decoding, preprocessing, and fallback mechanisms, impacting overall performance (36:43).
- C++ is the dominant language for inference on most NPUs, while Python prevails in model training and export (38:45).
- Training on NPUs is possible but involves a separate class of processors and different considerations (39:51).

i used gemini 1.5 pro
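To make the quantization point in the summary more concrete, here is a minimal sketch of per-tensor affine int8 quantization in NumPy, with a quick check of how much error the round trip introduces. It is only an illustration: the helper names `quantize_int8` and `dequantize` and the random test tensor are assumptions, not anything from the video, and real NPU toolchains use their own calibration schemes that may differ.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine (asymmetric) per-tensor quantization of float values to int8.

    Hypothetical helper for illustration only.
    """
    # Make sure 0.0 is inside the representable range.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    # Map the float range onto the 256 int8 levels; fall back to 1.0
    # for an all-zero tensor so we never divide by zero.
    scale = (x_max - x_min) / 255.0 or 1.0
    # Integer that represents the float value 0.0.
    zero_point = int(round(-128 - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map int8 values back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

# Stand-in for a layer's weights (random data, not a real model).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=(64, 64)).astype(np.float32)

q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
max_err = float(np.abs(weights - restored).max())

print(f"scale={scale:.6f} zero_point={zp} max abs error={max_err:.6f}")
```

The round-trip error per value stays around half a quantization step, but a single outlier in the tensor widens the range and makes every step coarser, which is one reason accuracy should be re-evaluated on the real model after quantization.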
@zorqis 3 months ago
Good summary and useful for passers by. However, the video contains some small remarks that contain a lot of useful information, so I still recommend watching the whole video.
@boltvalley3076 1 month ago
Thank you.
@shakhizatnurgaliyev9355 7 months ago
good one!
@diegosantos9757 7 months ago
Dear, thanks for the content. Which SBC would you recommend for someone just starting with computer vision?
@AntonMaltsev 7 months ago
Depends on your budget. The smoothest experience is with Jetsons or Intel-based boards. On a low budget, I recommend Rockchip-based solutions.
@diegosantos9757 7 months ago
Tks mate, I will check the rockchip!
@andreyl2705 7 months ago
awesome)
@עינהרע 7 months ago
You gonna test the new Hailo GenAI m.2 board?
@AntonMaltsev 7 months ago
It's difficult to buy a single unit for home use, and none of my friends or colleagues are using it right now, so I have no chance to borrow one. So it's not in the plans. But if there is a chance, I will try.
@AntonMaltsev 7 months ago
But the next video will probably be about my experience of using Hailo in production (more about the framework and Hailo-8).
@ДенисСлепцов-ь6п 7 months ago
Hello, I have been following your work for a long time. Please keep it up! It's very interesting. Could you tell me whether you have ever deployed a neural network on an FPGA? If so, could you please share your experience?
@AntonMaltsev 7 months ago
Good afternoon, and thank you! A couple of times I wanted to test the Xilinx Kria, but each time people talked me out of it, saying it was complete junk. In general, a stock FPGA doesn't map all that well onto neural network architectures. So it's not even very clear what the point would be...