Lecture 12 - GPU Acceleration

  5,008 views

Deep Learning Systems Course

1 day ago

Comments: 4
@lianghui4353 2 years ago
34:00 - on line 15, the array sA should be sB.
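For context, the comment refers to the shared-memory tiled matmul on that slide, which stages operand tiles in arrays sA and sB. Below is a minimal sketch of that pattern, not the slide's exact code: the tile size TILE, the flat row-major layout, and the requirement that N be a multiple of TILE are assumptions, and the slide's per-thread register tiling is omitted.

#define TILE 16  // tile side length, chosen here for illustration

// C = A * B for N x N row-major matrices, assuming N is a multiple of TILE.
// Launch with dim3 block(TILE, TILE) and dim3 grid(N / TILE, N / TILE).
__global__ void matmul_tiled(const float *A, const float *B, float *C, int N) {
  __shared__ float sA[TILE][TILE];   // tile of A staged in shared memory
  __shared__ float sB[TILE][TILE];   // tile of B staged in shared memory

  int row = blockIdx.y * TILE + threadIdx.y;
  int col = blockIdx.x * TILE + threadIdx.x;
  float acc = 0.0f;

  for (int k0 = 0; k0 < N; k0 += TILE) {
    // Cooperative fetch: each thread loads one element of each tile.
    sA[threadIdx.y][threadIdx.x] = A[row * N + k0 + threadIdx.x];
    sB[threadIdx.y][threadIdx.x] = B[(k0 + threadIdx.y) * N + col];
    __syncthreads();

    // Accumulate the partial dot product from the two tiles.
    for (int k = 0; k < TILE; ++k)
      acc += sA[threadIdx.y][k] * sB[k][threadIdx.x];
    __syncthreads();
  }
  C[row * N + col] = acc;
}

With two visually similar arrays like these, writing sA where sB belongs still compiles and silently computes the wrong product, which is the kind of slip the comment above points out.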
@usefcompf571 2 years ago
28:18 - each thread uses temp[threadIdx.x : threadIdx.x + 2*RADIUS]. That means each thread should load A[base + threadIdx.x - RADIUS] into temp[threadIdx.x] and A[base + threadIdx.x + RADIUS] into temp[threadIdx.x + 2*RADIUS]. The code on the slide works in a different way, so it seems it does not work correctly.
@frankj6650 1 year ago
I think there is nothing wrong: the array A also begins at index 0. You can see this by aligning the two sequences in the PPT.
@weizhang5424 6 months ago
Yes, the PPT illustration is a little misleading in that Input (A) and Output (B) are supposed to be left-aligned. The invariants are: len(Output) = n, len(Input) = n + 2*RADIUS, and Output[k] = \sum_{i=k}^{k+2*RADIUS} Input[i]. The "cooperative fetching" logic simply asks each thread i to load Input[i] (offset by base), and asks the first 2*RADIUS threads in each thread block to each load one extra input element to account for the len(Input) = n + 2*RADIUS invariant. Personally, I think it would be more natural to have the last 2*RADIUS threads each load the one extra input element, but the code as it is is fine.
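To make the thread concrete, here is a sketch of the window-sum kernel being discussed, written for the left-aligned convention above (Output[k] = sum of Input[k .. k + 2*RADIUS]). The kernel name, macro values, and bounds checks are assumptions for illustration rather than the slide's exact code.

#define RADIUS 2               // window half-width, illustrative value
#define THREADS_PER_BLOCK 128  // blockDim.x at launch must match this

// B[k] = sum of A[k .. k + 2*RADIUS]; len(A) = n + 2*RADIUS, len(B) = n.
__global__ void window_sum_shared(const float *A, float *B, int n) {
  __shared__ float temp[THREADS_PER_BLOCK + 2 * RADIUS];
  int base = blockIdx.x * blockDim.x;
  int out_idx = base + threadIdx.x;

  // Cooperative fetch: thread i loads A[base + i] into temp[i] ...
  if (base + threadIdx.x < n + 2 * RADIUS)
    temp[threadIdx.x] = A[base + threadIdx.x];
  // ... and the first 2*RADIUS threads load the block's trailing extras,
  // so temp holds the blockDim.x + 2*RADIUS inputs this block needs.
  if (threadIdx.x < 2 * RADIUS && base + blockDim.x + threadIdx.x < n + 2 * RADIUS)
    temp[blockDim.x + threadIdx.x] = A[base + blockDim.x + threadIdx.x];
  __syncthreads();

  if (out_idx < n) {
    float sum = 0.0f;
    // Thread i reads temp[i .. i + 2*RADIUS], matching the description above.
    for (int i = 0; i <= 2 * RADIUS; ++i)
      sum += temp[threadIdx.x + i];
    B[out_idx] = sum;
  }
}

Because Input is left-aligned with Output, no -RADIUS offset appears anywhere in the indexing, which is the point the replies above are making.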
GPUs: Explained
7:29
IBM Technology
341K views
Lecture 20 - Transformers and Attention
1:10:16
Deep Learning Systems Course
9K views
Lecture 13 - Hardware Acceleration Implementation
50:09
Deep Learning Systems Course
3.7K views
Lecture 15 - Training Large Models
46:37
Deep Learning Systems Course
2.7K views
Fundamentals of GPU Architecture: Programming Model Part 1
38:37
CoffeeBeforeArch
12K views
CPU vs GPU vs TPU vs DPU vs QPU
8:25
Fireship
1.8M views
Lecture 14 - Implementing Convolutions
1:19:42
Deep Learning Systems Course
4.7K views
Writing Code That Runs FAST on a GPU
15:32
Low Level
567K views
Lecture 11 - Hardware Acceleration
45:22
Deep Learning Systems Course
4.8K views
Lecture 16 - Generative Adversarial Networks
38:05
Deep Learning Systems Course
1.7K views
CPU vs GPU (What's the Difference?) - Computerphile
6:39
Computerphile
896K views
Lecture 5 - Automatic Differentiation Implementation
1:05:57
Deep Learning Systems Course
9K views