FPGAs are (not) Good at Deep Learning [Invited]

21,115 views

Crossroads 3D-FPGA Academic Research Center

1 day ago

Comments: 17
@eafindme · 7 months ago
Imagine you have 3 binary files, each representing an FPGA bitstream for a different DNN model, and one FPGA. Instead of making the hardware architecture universal enough to support all 3 DNN models, like a GPU or ASIC, you can optimize each DNN model for the FPGA via hardware/software co-design and reprogram the FPGA on the fly, so that each of the 3 models gets its own distinctive hardware optimization. Now the FPGA has ASIC-like specialization at a fraction of the silicon area and cost. This is where the fun begins.
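The idea above can be sketched in a few lines of Python. Note this is purely illustrative: `program_fpga`, the model names, and the `.bit` filenames are all hypothetical placeholders, standing in for a vendor-specific reconfiguration flow (e.g. a PYNQ Overlay or JTAG programmer):

```python
# Sketch: one specialized bitstream per DNN model, swapped at runtime.
# All names/paths here are made up for illustration.

BITSTREAMS = {
    "resnet50":  "resnet50_opt.bit",
    "mobilenet": "mobilenet_opt.bit",
    "bert":      "bert_opt.bit",
}

def program_fpga(bitstream_path: str) -> str:
    # Placeholder: a real flow would hand this file to the FPGA's
    # configuration interface instead of returning a string.
    return f"FPGA configured with {bitstream_path}"

def run_model(model_name: str) -> str:
    # Pick the bitstream co-designed for this specific model and
    # reconfigure before inference, instead of one generic design.
    return program_fpga(BITSTREAMS[model_name])
```

The point is that "switching models" becomes "switching bitstreams", so each model runs on hardware shaped exactly for it.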
@prat1024 · 1 year ago
The presentation was extraordinary!! I am a student at the University of Stuttgart and this post randomly came across my feed.
@enkidughom2508 · 7 months ago
Excellent!! Is there a technical report following this? Would love to dive into the details and try to reproduce some results.
@MrTweetyhack · 2 years ago
"If you can build it in an ASIC, it won't be competitive on an FPGA." So what can't be built in an ASIC? Actually, this has been known for a long, long time.
@gm7361 · 1 year ago
It means: if you have the resources and the budget.
@vicktorioalhakim3666 · 9 months ago
The problem is that ML engineering is a dynamic discipline: models change all the time and are constantly updated. So if one wants to map a model to hardware efficiently with respect to power usage, resource usage, throughput, latency, etc., then the hardware must also be flexible and dynamic.

If you design an ASIC-based accelerator, you kinda have to make it as general as possible to support various changes to the topology and parameters of the model. Because the architecture of this accelerator is fixed, you will often see underutilization (resource waste, higher power usage, etc.) or overutilization (lower throughput, higher latency, etc.). And if you have to tape out many ASICs for different types of models, this becomes costly quite quickly, and quite frankly a waste, since newer models will come along and quickly deprecate the design.

This is where the power of FPGAs comes in handy: you can customize your HW architecture on the fly so that it suits the given model best. The biggest difficulty is coming up with a good HW "compiler", so that you minimize the amount of manual labor involved in mapping a model to the HW, including the pre- and post-processing stages.
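The under-utilization point can be made concrete with a toy calculation (the dimensions below are illustrative, not from the talk): a fixed-size systolic array wastes processing elements whenever a layer's dimensions don't divide its tile size, whereas an FPGA design could be resized per model:

```python
import math

def pe_utilization(rows: int, cols: int, array_dim: int) -> float:
    """Fraction of processing elements doing useful work when a
    (rows x cols) matmul is tiled onto a fixed (array_dim x array_dim)
    systolic array: the operands get padded up to full tiles."""
    padded_r = math.ceil(rows / array_dim) * array_dim
    padded_c = math.ceil(cols / array_dim) * array_dim
    return (rows * cols) / (padded_r * padded_c)

# A fixed 32x32 ASIC-style array on a 40x40 layer: padded to 64x64,
# so utilization is 1600/4096, roughly 39%.
fixed = pe_utilization(40, 40, 32)

# An FPGA design re-sized to an 8x8 array for the same layer:
# 8 divides 40 evenly, so utilization is 100%.
flexible = pe_utilization(40, 40, 8)
```

The same arithmetic scaled across a whole network is one way to quantify the "resource waste" a fixed accelerator pays for its generality.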
@shaikon5617 · 2 years ago
Great presentation, thanks a lot for sharing. Is the Intel project publicly available?
@harishabibullah1286 · 2 years ago
Thanks for the talk, Mr. Abdelfattah. Is there any course/training to learn these stages of custom h/w kernel development for deep learning? I am also in a similar field, and my approach is simply to import the hardware from the synthesis tool, like Vitis HLS. I am interested in defining or tweaking some parameters to make a more customized hardware design.
@mabdelfattah88 · 1 year ago
My course on ML HW & SYS (www.youtube.com/@mabdelfattah88) could help give you an overview, but we don't really go deep into the hardware design part of it. I am preparing a new FPGA-focused course now which should cover the detailed design of HW accelerators - I hope to also post parts of it online. Stay tuned!
@vatsan2483 · 1 year ago
@@mabdelfattah88 Looking forward to this course! But based on the above presentation, a quick question, sir: on the topic of co-design for DNNs, you suggested that FPGA-X can achieve 100 imgs/s on ImageNet classification, whereas the DLA achieves 80 imgs/s on ResNet-50 - essentially a more generic design for a larger class of models rather than one specialized/tuned for a specific case. But isn't the underlying purpose of a DNN itself rather specific than broad? Tuning of parameters is, by nature, a function of the input data, isn't it?
@jacoblin0820 · 1 year ago
@@mabdelfattah88 Looking forward to the new course!
@aqf0786 · 8 months ago
If you knew the fundamental differences in area, speed, and power between an FPGA and an ASIC, why not just focus on the key architectural improvements and make an ASIC? Surely Intel would be able to do so?
@shashwatkhandelwal367 · 2 years ago
Loved the talk!👏 Some very cool ideas!
@chriswysocki8816 · 4 months ago
Did I hear that right, Mr. Presenter? You did this project while working at Intel, and you were not using Intel/Altera FPGAs but Xilinx. Why???? As a former Altera/Intel manager in the FPGA group, I feel disappointed :)
@rulekop · 1 year ago
Very interesting and clearly presented!
@BharatIndiaHindustan628 · 10 months ago
Hi Mohamed, I'm a beginner at AI and deep learning and have just started learning these things, in order to build some deep learning hardware applications/IPs for practice and hands-on experience. I'm really fascinated by what AI can do in the field of health monitoring and medical diagnostics. I'd be really grateful if you could provide your email ID; I would like to keep in touch with you for guidance and mentorship. Thanks