Graph Representation Learning (Stanford University)

96,368 views

Machine Learning TV

A day ago

Comments: 46
@deterministicalgorithmslab1744 4 years ago
NOTES / Important Points:
1.) Graphlets are undirected motifs.
2.) The purpose of graph representation learning is to allow automatic feature engineering by embedding each node into d-dimensional space such that similarities in the network (*) correspond to closeness (cosine/Euclidean distance) in the embedding space.
3.) Images == grid graphs; text, audio == chain graphs. The adjacency-matrix representation of a graph is not invariant to permutations of rows/columns, but the graph itself is.
4.) (*) We define this similarity in the network on the basis of random walks. Similarity(u, v) = probability of visiting node v on a random walk starting at node u. Similarity depends on the random-walk strategy R that you choose to use.
5.) You take fixed-size random walks from each node to build its neighborhood sets.
6.) You try to maximize the probability of the neighborhood sets given the node, where the probability of node v occurring in the neighborhood set of node u (which depends on the similarity between the embeddings of u and v) is defined as a softmax over the similarities between the embedding of node u and the embeddings of all other nodes. [26:20]
7.) We use negative sampling to approximate the softmax, whose denominator is difficult to compute.
8.) In negative sampling, nodes are sampled randomly in proportion to their degree.
9.) But random walks from a node only encode similarities that correspond to "nearness" of two nodes in the network, so how do we encode other types of similarities?
10.) Node2Vec:
11.) We can have 2 different types of exploration: a DFS-like walk (macro view) or a BFS-like walk (micro view).
12.) More precisely, 2nd-order random walks (i.e., the walk has one unit of memory, which tells you which node you came from to reach the current node). You also need the distances, from the node you came from, of all nodes in its vicinity. Then you choose to move back towards it, away from it, or stay at the same distance from it, in the ratio 1/p : 1/q : 1 respectively. [45:00] (A small illustrative code sketch of this biased walk, and of the anonymous walks in point 22, follows after these notes.)
13.) Depending on the values of p and q, we can get embeddings where communities of nodes have nearby embeddings in embedding space, or where nodes with similar structural roles are closer together in embedding space. [48:02]
14.) These embeddings are very consistent: removing/adding ~50% of the edges leads to only a small drop in accuracy on the node-classification problem.
15.) The node embeddings can be aggregated over various nodes to classify a set of nodes.
16.) It has been observed that Node2Vec performs better on node classification than on link prediction.
17.) You must choose a definition of node similarity (i.e., random-walk neighborhood (possibly with p, q), ordinary neighborhood, etc.) that matches your application.
18.) EMBEDDING ENTIRE GRAPHS:
19.) Useful for molecules, or wherever lots of small networks are available.
20.) Just calculate normal node embeddings and sum all of them.
21.) Make a super-node that connects to a sub-graph; the embedding of the super-node is the embedding of the sub-graph.
22.) Anonymous walks: these are transformations of random walks that are independent of which nodes actually occur on the path, but try to capture the local structure around a node through the repetitions that occur on the walk. So A->B->C is transformed to 1->2->3, A->B->A is transformed to 1->2->1, and so on. The number of possible anonymous walks grows exponentially with the number of steps in the random walk and the size of the set of nodes reachable from a particular node.
23.) Approach 1: graph feature vector = vector of the frequencies of the different possible anonymous walks.
24.) How many walks are needed to get a good estimate? [1:08:10]
25.) Approach 2: learn embeddings of all possible anonymous walks (of fixed length) and concatenate/average them to get the embedding of the graph. To learn these anonymous-walk embeddings, try to predict the probability distribution over the next random walk from a node, using the anonymous-walk embeddings corresponding to the anonymous walks of the previous [delta] random walks, as in [1:14:02].
26.) Anonymous walks are based on the assumption that all important properties/structures of the graph can be recovered from the statistics of the random anonymous walks that can be undertaken from various nodes.
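A minimal Python sketch of what points 12 and 22 above describe, assuming the graph is a plain adjacency dict. This is illustrative only, not code from the lecture, and the function names (biased_walk, anonymize) are made up:

import random

def biased_walk(adj, start, length, p=1.0, q=1.0):
    # Second-order (node2vec-style) walk: the next step is biased by the node
    # we came from. Weight 1/p to go back to it, 1 to move to one of its
    # neighbours (same distance from it), 1/q to move further away.
    walk = [start]
    prev = None
    for _ in range(length - 1):
        cur = walk[-1]
        neighbors = list(adj[cur])
        if not neighbors:
            break
        if prev is None:
            nxt = random.choice(neighbors)  # first step is unbiased
        else:
            weights = [(1.0 / p) if n == prev
                       else 1.0 if n in adj[prev]
                       else (1.0 / q)
                       for n in neighbors]
            nxt = random.choices(neighbors, weights=weights, k=1)[0]
        prev = cur
        walk.append(nxt)
    return walk

def anonymize(walk):
    # Anonymous walk: relabel nodes by order of first appearance,
    # so A->B->A becomes 1->2->1.
    ids = {}
    return [ids.setdefault(node, len(ids) + 1) for node in walk]

# Toy graph (hypothetical): a triangle A-B-C with a tail C-D.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
w = biased_walk(adj, "A", length=6, p=0.25, q=4.0)
print(w, anonymize(w))

Varying p and q trades off between the BFS-like and DFS-like behaviour mentioned in points 11-13.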
@凡贾 5 years ago
Graphs are so general, explainable, convenient and elegant. They're also pretty difficult. :)
@veselindion2 4 years ago
Pretty difficult indeed :D
@dharshanak4118 4 years ago
I never knew Slavoj Zizek used to teach machine learning when he was young.
@MrSouravmondal 4 years ago
:D :D :D
@TheMateyl 4 years ago
Thicc Slovenian accent
@dogaarmangil 2 years ago
Great lecture. 37:50 It is a misnomer to call this process "the finding of similarity between nodes" when in fact what you are doing is finding node groups. In a group, nodes are rarely similar to each other: you won't find 2 CEOs in a company, atom nuclei are surrounded by electrons rather than other nuclei, and so on.
@sm_xiii 4 years ago
@Machine Learning TV Can you please give some numbering to the lecture videos, so that we can figure out in which order we should watch them?
@Alkis05 4 years ago
That is assignment number 3: create a program that goes through the lectures and sorts them based on the downloaded transcripts.
@vikankshnath8068 4 years ago
He is such a great explainer.
@MahmoudOuf 5 years ago
Great lecture, thanks a lot. The article is also very interesting and informative.
@gunarto90 5 years ago
Great lecture and Q&A sessions
@josephzhu5129 a year ago
Great lecture, he knows how to explain complicated ideas, thanks a lot!
@linfengyang8834 4 years ago
Great lecture, very informative and clear.
@hossein.amirkhani 5 years ago
Great lecture. But just one point: in the DeepWalk part, he frequently says that he wants to maximize the cost function, but really he wants to minimize it. In addition, his explanation of negative sampling is not completely correct. In fact, he should explain it by mentioning that the multi-class problem is changed to a two-class problem.
@LeenaBora 4 years ago
Hi Hossein, yes, it was indeed a great lecture. I also didn't understand the negative-sampling part. Can you please elaborate on it more?
@ghensao4027 2 years ago
Exactly. log(probability) ranges from negative infinity to 0; taking the negative of this gives a range of 0 to infinity. You want to minimize instead.
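To make the "two-class" point above concrete, here is a small, hypothetical NumPy sketch of standard word2vec-style negative sampling (the exact form on the lecture slide may differ slightly): instead of a softmax over all nodes, each true (node, walk-neighbour) pair is scored against k randomly sampled negative nodes, and the resulting negative log-likelihood is minimized.

import numpy as np

def neg_sampling_loss(z_u, z_v, z_negs):
    # z_u, z_v: embeddings of node u and one of its random-walk neighbours v.
    # z_negs: embeddings of k negative nodes sampled in proportion to degree.
    # Returns a loss to minimize (minimize, matching the comment above).
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = -np.log(sigmoid(z_u @ z_v))                 # push the true pair together
    neg = -np.sum(np.log(sigmoid(-(z_negs @ z_u))))   # push the negatives apart
    return pos + neg

# Toy usage: 8-dimensional embeddings, k = 5 negative samples.
rng = np.random.default_rng(0)
print(neg_sampling_loss(rng.normal(size=8), rng.normal(size=8), rng.normal(size=(5, 8))))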
@feishi4932 4 years ago
Great lecture. It helps me a lot.
@deepbayes6808 5 months ago
32:40 shouldn't this be: log of sum_k instead of sum_k of log?
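For reference (not a transcript of the slide): the usual negative-sampling objective from word2vec, which node2vec also uses, is indeed a sum of logs over the k negative samples, maximize log(sigma(z_u . z_v)) + sum_{i=1..k} log(sigma(-z_u . z_{n_i})) with the n_i sampled randomly over nodes. It is an approximation of the full log-softmax, not an algebraic rewriting of it, which is why it is not "log of a sum".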
@tomlarey 4 years ago
Great lecture, props to Jure Leskovec :)
@soumyadbanik 2 years ago
What kinds of data can have graph representations, and what types of data cannot be represented as a graph?
@jimmyshenmusic 4 years ago
This is a great talk.
@muhammadaneeqasif572 6 months ago
Amazing, great to see some good content again. Thanks, YT algorithm; keep it up.
@ClosiusBeg 3 years ago
What if we make the number of random walks also random? And simulate n random walks with a constant number of steps, over m random constant numbers of walks?
@yandajiang1744 10 months ago
Awesome explanation
@ShikhaMallick 4 years ago
Hi, I don't get the intuition behind softmax. Can someone please help me understand?
@muratcan__22 4 years ago
You can read the Wikipedia article; I think it is well explained there.
@vik24oct1991 4 years ago
Just as the sigmoid is a smooth approximation to the sign/step function, the softmax is a smooth approximation to the argmax function. With argmax, the output is 1 at the index of the maximum input and 0 everywhere else (a one-hot output). With softmax, the output at the index of the maximum value is closest to 1, but the second- and third-largest indices also get non-zero values based on their inputs; softmax produces a smooth distribution in which every index has some value greater than zero. Because exponentials are used, if the maximum value is far greater than the others it essentially behaves like argmax. It is used mainly so that the loss function is differentiable, which argmax is not.
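A tiny sketch to make this concrete (illustrative only, not from the lecture):

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))       # ~[0.66, 0.24, 0.10]: smooth, every entry non-zero
print(softmax(scores * 10))  # ~[1.00, 0.00, 0.00]: approaches the one-hot argmax

Scaling up the scores shows the "smooth approximation to argmax" behaviour described above.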
@zhenyusha7123 5 years ago
Where are the rest of the class videos?
@MachineLearningTV 5 years ago
Watch the new video we uploaded yesterday.
@georgeigwegbe7258 5 years ago
@MachineLearningTV Hello, can we get all of the CS224W videos?
@0xkellan 5 years ago
@georgeigwegbe7258 snap.stanford.edu/class/cs224w-2018/ : find the "resource" part on that page; there you can find the lecture notes and videos.
@MrSajjadathar 5 years ago
Which video did you upload yesterday?
@ZephyrineFreiberg 3 years ago
Is this guy from Switzerland?
@lloydacquayethompson6789 4 years ago
This guy is awesome
@ufuoma833 3 years ago
Surprisingly, I can follow his discussion. Or maybe I'm just tripping.
@arefy100 2 years ago
Great lecture, thanks a lot
@muhammadibrahimnejadian9265 2 years ago
Thanks a lot for sharing
@christophchristoph9909 3 years ago
awesome! thanks for sharing!
@distrologic2925 2 years ago
This is about learning *from* graphs, but how do we learn the actual graph structures just from raw data? That is the question AI should be answering.
@MCRuCr 3 years ago
Random walking is somehow like the random firing of biological neurons
@o_lakisgavalas 2 years ago
Misleading title, as no (generative) representation is discussed. E.g. "Supervised learning on graphs" or "Learning features of graphs" would have been more suitable, IMHO.
@dark_all_day9311 4 years ago
Ah nu Cheeki Breeki iv Damke
@thomasfranzstockhammer7846 3 years ago
Best regards
@charlottjakob2948 4 years ago
This is a great talk.