Comments
@chriswysocki8816 4 months ago
Did I hear that right, Mr. Presenter? You did this project while working at Intel, and you were not using Intel/Altera FPGAs but Xilinx? Why? As a former Altera/Intel manager in the FPGA group, I feel disappointed :)
@enkidughom2508 7 months ago
Excellent!! Is there a technical report following this? Would love to dive into the details and try to reproduce some results.
@eafindme 7 months ago
Imagine that you have 3 binary files, each one an FPGA binary for a different DNN model. Then you have an FPGA. Instead of making the hardware architecture universal so that it supports all 3 DNN models, like a GPU or ASIC, you could optimize each DNN model for the FPGA via software codesign and reprogram the FPGA on the fly, so that each of the 3 DNN models gets its own distinct hardware optimization. Now the FPGA has the same flex as an ASIC yet costs way less space and money. This is where the fun begins.
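The per-model reprogramming idea in this comment could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the model names, the bitstream file names, the `FpgaSlot` class, and its `load` method are all hypothetical stand-ins, not a real vendor API, and a real reconfiguration call would depend on the board and toolchain.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical mapping from DNN model name to its pre-compiled FPGA binary.
# Model names and file names are made up for illustration.
BITSTREAMS = {
    "model_a": "model_a.bit",
    "model_b": "model_b.bit",
    "model_c": "model_c.bit",
}


@dataclass
class FpgaSlot:
    """Stand-in for one FPGA; it only tracks which bitstream is loaded."""
    loaded: Optional[str] = None
    reconfig_count: int = 0
    log: List[str] = field(default_factory=list)

    def load(self, bitstream: str) -> None:
        # Placeholder for a real, board-specific reconfiguration call.
        self.loaded = bitstream
        self.reconfig_count += 1
        self.log.append(bitstream)


def run(fpga: FpgaSlot, model: str) -> str:
    """Reprogram only if the requested model's bitstream is not loaded yet."""
    bitstream = BITSTREAMS[model]
    if fpga.loaded != bitstream:
        fpga.load(bitstream)
    return f"{model} running on {bitstream}"


fpga = FpgaSlot()
for name in ["model_a", "model_a", "model_b", "model_c", "model_c"]:
    run(fpga, name)
```

Skipping the reload when the right bitstream is already in place matters in practice, because reconfiguration takes time during which the device does no useful work.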
@aqf0786 8 months ago
If you knew the fundamental difference in area, speed and power of an FPGA vs. an ASIC, why not just focus on the key architectural improvements and make an ASIC? Surely Intel would be able to do so?
@BenjaminGatti 8 months ago
Sing song la de da.
@BharatIndiaHindustan628 10 months ago
Hi Mohamed, I'm a beginner at AI and deep learning and have just started to learn these things, in order to build some deep learning hardware applications/IPs for practice and hands-on purposes. I'm really fascinated by the things that AI can do in the field of health monitoring and medical diagnostics. I'd be really grateful and happy if you could provide me your email ID. I would like to keep in touch with you for guidance and mentorship. Thanks
@User_1795 11 months ago
The '90s called; they want their mic back.
@rossoneill5576 11 months ago
🎯 Key Takeaways for quick navigation:

00:00 📅 *Introduction to FPGA and Deep Learning*
- FPGA's initial attempt in deep learning in 2016.
- Comparison to GPUs and the development of deep learning accelerators.
- Overview of FPGA's early optimization strategies.

10:00 🧠 *Challenges and Competition from GPUs*
- The rapid evolution of GPUs in deep learning with low precision arithmetic.
- FPGAs falling behind ASICs and GPUs in performance.
- The limitations of FPGA customization and adaptability.

20:00 🔄 *Exploring Future Possibilities for FPGAs in Deep Learning*
- The concept of co-design in deep learning hardware and software.
- Advocating for a flexible approach to optimizing both hardware and neural network architecture.
- The potential of automated machine learning for FPGA-based deep learning.

20:46 🧠 *Automated Machine Learning and Neural Architecture Search*
- Different deep neural networks (DNNs) can perform the same task.
- Automated machine learning and neural architecture search are common practices in industry.
- FPGAs offer the ability to customize both neural network architecture and hardware parameters simultaneously, leading to better performance.

28:09 ⚙️ *Logic Neural Networks and Logic Shrinkage*
- Logic neural networks involve transforming DNNs into circuits using look-up tables (LUTs).
- Logic shrinkage optimizes circuit netlists for FPGA implementation, resulting in higher efficiency.
- Fine-grained pruning and training in the LUT domain can lead to significant FPGA area efficiency improvements.

36:00 🖥️ *FPGA DLA Devices and End-to-End Deep Learning Workloads*
- Deep learning workloads are heterogeneous and consist of more than just DNNs.
- Accelerating the entire end-to-end deep learning workload, including pre-processing and post-processing, is essential.
- Optimizing system architecture for hardware acceleration beyond DNNs is necessary for overall performance improvements.

42:33 🚀 *FPGA's role in deep learning acceleration*
- FPGA's suitability for accelerating deep neural networks (DNNs).
- The importance of reconfigurable hardware in data centers.

45:32 🔄 *Hybrid FPGA-DLA devices for heterogeneous workloads*
- The need for hybrid FPGA-DLA devices for heterogeneous workloads.
- Implementing custom pre-processing and post-processing on FPGAs.
- Research questions and challenges in developing these hybrid devices.

50:18 🌐 *Embedded Networks-on-Chip (NoC) for FPGAs*
- Introduction to Embedded Networks-on-Chip (NoC) for FPGAs.
- Solving FPGA designer challenges in system-level interconnects.
- Benefits and efficiency of using an embedded NoC in FPGAs.

53:21 💡 *Using NoC-enabled FPGAs for pre and post-processing in deep learning*
- Leveraging NoC-enabled FPGAs for pre and post-processing in deep learning workloads.
- Connecting efficiently to deep learning accelerators.
- The potential of FPGA devices in accelerating deep learning applications.

Made with HARPA AI
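As a rough illustration of the 28:09 item (turning a DNN into look-up tables), the Python sketch below enumerates a single binarized neuron into a truth table. It is a toy under stated assumptions, not the logic shrinkage method from the talk: the ±1 weights, the threshold, and the function names are made up for illustration, and it only shows why a neuron with K binary inputs can be stored as a 2^K-entry LUT.

```python
from itertools import product


def neuron(bits, weights, threshold):
    """Binarized neuron: input bits {0,1} are read as -1/+1;
    the output is 1 if the weighted sum reaches the threshold."""
    total = sum(w * (1 if b else -1) for b, w in zip(bits, weights))
    return 1 if total >= threshold else 0


def neuron_to_lut(weights, threshold):
    """Enumerate all 2^K input patterns into a truth table (the LUT contents)."""
    k = len(weights)
    return [neuron(bits, weights, threshold)
            for bits in product((0, 1), repeat=k)]


def lut_eval(lut, bits):
    """Evaluate the LUT: the input bits form the address (first bit is the MSB)."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | b
    return lut[addr]


weights = [1, -1, 1, 1]   # illustrative values only
threshold = 0
lut = neuron_to_lut(weights, threshold)
```

Once the table exists, the arithmetic disappears: inference is a single table read, which is what an FPGA LUT does natively. The table grows exponentially with the number of inputs, which is one reason small fan-in and pruning matter in this setting.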
@sergiosimonis 1 year ago
*promo sm* 🎶
@rulekop 1 year ago
Very interesting and clearly presented!
@cmporeddy 1 year ago
Where can I download this presentation PPT?
@jasenq6986 1 year ago
Software defined data movement is a great way to put it
@moeinmaleki7859 1 year ago
If I could clap virtually, I would stand up and clap at the end. What an amazing presentation and interesting topic. Thank you for sharing this video!
@prat1024 1 year ago
The presentation was extraordinary!! I am a student at the University of Stuttgart as well, and this post randomly came across my feed.
@wayne1950 1 year ago
Most Important for me 🙏😙😙💘!!! Grow your online following - *promo sm* !!!
@littlecandle328 2 years ago
Please, can you help me with XSG in MATLAB for FPGAs?
@MrTweetyhack 2 years ago
"If you can build it in ASIC, it won't be competitive on an FPGA." So what can't be built in an ASIC? Actually, this has been known for a long, long time.
@gm7361 1 year ago
It means if you have the resources and the budget.
@vicktorioalhakim3666 9 months ago
The problem is that ML engineering is a dynamic discipline: models change all the time and are updated. So, if one wants to map a model to hardware efficiently with respect to power usage, resource usage, throughput, latency, etc., then the hardware must also be flexible and dynamic. If you design an ASIC-based accelerator, you kinda have to make it as general as possible to support various changes to the topology and parameters of the model. Because the architecture of this accelerator is fixed, you will often have underutilization (resource waste, higher power usage, etc.) or overutilization (lower throughput, higher latency, etc.). Now, if you have to tape out many ASICs for different types of models, this will become costly quite quickly, and quite frankly a waste, since newer models will come up and quickly deprecate the design. This is where FPGAs can come in handy: you can customize your HW architecture on the fly so that it suits the given model best. The biggest difficulty is coming up with a good HW "compiler", so that you minimize the amount of manual labor involved in mapping a model to the HW, including the pre- and post-processing stages.
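The underutilization point can be made concrete with a toy calculation in Python. Everything here is assumed for illustration: the 64-lane fixed width, the layer sizes, and the simple tiling model are made up, and the metric ignores memory bandwidth, latency, and reconfiguration cost.

```python
import math


def utilization(channels: int, lanes: int) -> float:
    """Fraction of lanes doing useful work when `channels` outputs
    are tiled over `lanes` parallel lanes."""
    passes = math.ceil(channels / lanes)
    return channels / (passes * lanes)


def best_lanes(channels: int, max_lanes: int) -> int:
    """Largest lane count within budget that divides the layer evenly."""
    return max(d for d in range(1, max_lanes + 1) if channels % d == 0)


FIXED_LANES = 64                      # assumed width of a fixed accelerator
layer_channels = [64, 96, 100, 160]   # made-up layer widths

# Fixed architecture: every layer is tiled over the same 64 lanes.
fixed = [utilization(c, FIXED_LANES) for c in layer_channels]

# Reconfigurable architecture: lane count is chosen per layer.
tailored = [utilization(c, best_lanes(c, FIXED_LANES)) for c in layer_channels]

for c, f, t in zip(layer_channels, fixed, tailored):
    print(f"channels={c:3d}  fixed={f:.2f}  tailored={t:.2f}")
```

In this toy model the fixed 64-lane design leaves lanes idle whenever a layer width is not a multiple of 64, while a per-layer width avoids that waste. A narrower tailored design may need more passes, so full utilization is not the same as higher throughput.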
@jasoncheung1388 2 years ago
Awesome work! Thanks for the talk.
@harishabibullah1286 2 years ago
Thanks for the talk, Mr. Abdelfattah. Is there any course/training to learn these stages of custom hardware kernel development for deep learning? I am in a similar field, and my approach is simply to import the hardware from a synthesis tool like Vitis HLS. I am interested in defining or tweaking some parameters to make more customized hardware.
@mabdelfattah88 1 year ago
My course on ML HW & SYS (www.youtube.com/@mabdelfattah88) could help give you an overview but we don't really go deep into the hardware design part of it. I am preparing a new FPGA-focused course now which should cover the detailed design of HW accelerators - I hope to also post parts of it online. Stay tuned!
@vatsan2483 1 year ago
@mabdelfattah88 Looking forward to this course. A quick question based on the presentation above, sir: on the topic of co-design for DNNs, you suggested saying that FPGA-X can achieve 100 imgs/s for ImageNet classification rather than that the DLA can achieve 80 imgs/s for ResNet-50, i.e., a more generic claim for a larger class of models rather than one specialised/tuned for a specific test case. But isn't the underlying purpose of a DNN itself rather specific than broad? Isn't the tuning of parameters by nature subject to the input data?
@jacoblin0820 1 year ago
@mabdelfattah88 Looking forward to the new course!
@shaikon5617 2 years ago
Great presentation. Thanks a lot for sharing. Is the Intel project publicly available?
@shashwatkhandelwal367 2 years ago
Loved the talk!👏 Some very cool ideas!