Trends in Deep Learning Hardware: Bill Dally (NVIDIA)

23,274 views

Paul G. Allen School

A day ago

Comments: 25
@FeintMotion · A year ago
Let's see Paul G. Allen's computer science lecture
@DestinyAdvisor · 6 months ago
Look at that clear, engaging explanation. The depth of knowledge presented. Oh, my god. It even includes practical examples.
@lance31415 · A year ago
Great to have this posted - not going to catch it all the first time through
@alfinal5787 · A year ago
Great talk.
@goldnutter412 · A year ago
Still barely halfway through; this is epic
@EnginAtik · A year ago
If we can figure out the optimum topology of the NN by pruning, it could be possible to replace the whole NN with an analog circuit, which would be a MIMO analog controller. If that is ever possible, we might gain some theoretical understanding of what the NN is modeling by inspecting the transfer functions. Practical economic considerations like power consumption, speed, and industrial chip manufacturing at scale are important, but we also want some theoretical understanding of why NNs are good at what they do: a table of values of the trigonometric functions can be unwieldy, but they are the result of the simple Pythagorean formula. Similarly, with AI we are basically generating huge tables of values, and the race to be the leader is driving us to deal with bigger and bigger tables. The tiniest insects can do real-time image processing, trajectory planning, and flight control at minuscule energy cost with hardly any digital computation involved. We need an Occam's razor to start simplifying things, and pruning and sparsity techniques could be a first step toward simplification.
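The pruning idea in the comment above can be sketched as one-shot magnitude pruning: rank weights by absolute value and zero out the smallest ones. This is a minimal illustrative sketch (the function name and sparsity level are chosen here for illustration, not taken from the talk):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights, keeping a (1 - sparsity) fraction."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))          # a toy dense layer
pruned, mask = magnitude_prune(w, 0.9)   # keep roughly the largest 10%
print(f"kept {mask.mean():.1%} of weights")
```

Real pruning pipelines typically iterate prune/fine-tune cycles rather than pruning once, but the ranking step is the same.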
@virushk · 10 months ago
Legend!
@binjianxin7830 · 11 months ago
There is a ton of engineering know-how here, while others (Google) are trying to catch up by designing with machine learning, but they still seem to fall behind.
@stefanogrillo6040 · A year ago
FYI, DNNs are multipurpose trained processors, but yeah, you can call them what you want.
@nauy · 9 months ago
If this is what NVIDIA thinks is the trend, they had better watch their back. They're still stuck in the von Neumann box and not addressing the big elephant in the room. Someone will leapfrog them. It's a matter of time.
@MCStairsOfDeath · 8 months ago
yep
@The_Uncertainity_Principal · 6 months ago
Could you explain this statement? Not disagreeing at all, just curious about your perspective.
@The_Uncertainity_Principal · 6 months ago
Is the 'elephant in the room' a fundamentally different architecture (e.g., Groq) vs. all this optimization at the edges?
@nauy · 6 months ago
@The_Uncertainity_Principal The computational principle of neural networks is fundamentally very different from digital computers. The von Neumann architecture in digital computers segregates the processor from memory. This is optimized for running a long stream of instructions on the same data: compute is heavy on each piece of data and centralized, and information is localized in each piece of data and stored in a large memory connected by a bus to the processor. The capacity for this type of compute is a function of processor speed.

In neural networks, compute is light on each piece of data and decentralized. Information is distributed over a large set of processors (nodes) and memory (weights). The capacity for this type of compute is a function of the connectivity, i.e., I/O bandwidth. Currently the weighted connectivity between nodes in most artificial neural networks (other than neuromorphic processors, e.g. from Rain) is simulated via matrix multiplication, which has O(n^3) time complexity. GPU or SIMD only improves it to O(n^2). In real neural networks (like the one in your head) or neuromorphic processors, the weighted connectivity is implemented via physical connections, which has O(1) complexity.

Now, neuromorphic computers are better at solving this I/O bandwidth problem, but at the cost of being inflexible: the network architecture cannot be modified at will. Also, current implementations are purely electrical, which puts a limit on the scale of the connectivity (impedance problem). I believe Rain processors have to use sparse connections (random subset and star) to get around this problem even at very modest network sizes. There are optical systems in research stages that can do a matrix multiply in O(1). Those are the kind of leapfrog technologies I was referring to.
@nauy · 6 months ago
@The_Uncertainity_Principal The elephant in the room is I/O bandwidth. To scale up neural networks, a new architecture is needed, because the current processor/memory architectures, even the SIMD versions, are ill suited to the extremely distributed computations in neural networks.
@pandoorapirat8644 · 11 months ago
lol 26:04.7
@goldnutter412 · A year ago
Big numbers are awesome; can't get over how cool this video is and how he explains it. See "Group theory, abstraction, and the 196,883-dimensional monster" by @3Blue1Brown.