NOTES / Important Points :-
1.) Graphlets are undirected motifs.
2.) The purpose of graph representation learning is to allow automatic feature engineering by embedding each node into d-dimensional space such that similarity in the network(*) corresponds to closeness (cosine/Euclidean distance) in the embedding space.
3.) Images == grid graphs; text, audio == chain graphs. The adjacency-matrix representation of a graph is not invariant to permutations of its rows/columns, but the graph itself is.
4.) *We define this similarity in the network on the basis of random walks: Similarity(u,v) = probability of visiting node v on a random walk starting at node u. The similarity depends on the random walk strategy (R) that you choose to use.
5.) You take fixed-length random walks from each node to build its neighborhood sets.
6.) You try to maximize the probability of the neighborhood sets given the node, where the probability of node v occurring in the neighborhood set of node u (it depends on the similarity between the embeddings of u and v) is defined as a softmax over the dot-product similarities between the embedding of node u and the embeddings of all other nodes. [ 26:20 ]
7.) We use negative sampling to approximate this softmax, whose denominator is expensive to compute.
8.) In negative sampling, the negative nodes are sampled randomly in proportion to their degree (a toy sketch follows this list).
9.) But random walks from a node only encode similarities that correspond to "nearness" of two nodes in the network.. so how do we encode other types of similarity?
10.) Node2Vec :-
11.) We can have 2 different types of exploration: a DFS-like walk (macro view) or a BFS-like walk (micro view).
12.) More precisely, 2nd-order random walks (i.e., the walk has one unit of memory, which tells you which node you came from to reach the current node). You look at the distances of the current node's neighbors from that previous node, and then you choose to move back towards it, away from it, or stay at the same distance from it, in the ratio 1/p : 1/q : 1 respectively (also sketched below). [ 45:00 ]
13.) Depending on the values of p and q we can get embeddings where communities of nodes have nearby embeddings in embedding space, or where nodes with similar structural roles are closer together in embedding space. [ 48:02 ]
14.) These embeddings are very consistent: removing/adding ~50% of the edges leads to only a small drop in accuracy on the node classification problem.
15.) The node embeddings can be aggregated over various nodes to classify a set of nodes.
16.) It has been observed that Node2Vec performs better on the node classification task than on link prediction.
17.) You must choose a definition of node similarity (random-walk neighborhood (possibly with p, q), ordinary neighborhood, etc.) that matches your application.
18.) EMBEDDING ENTIRE GRAPHS :-
19.) Useful for molecules, or wherever lots of small networks are available.
20.) Approach: just calculate the normal node embeddings and sum all of them.
21.) Or make a super-node that connects to a sub-graph; the embedding of the super-node is the embedding of the sub-graph.
22.) Anonymous walks :- these are transformations of random walks that are independent of which nodes actually occur on the path, but try to capture the local structure around a node through the repetitions that occur on the walk. So A->B->C is transformed into 1->2->3, A->B->A into 1->2->1, and so on. The number of possible anonymous walks grows exponentially with the number of steps in the random walk.
23.) Approach 1 :- graph feature vector = vector of frequencies of the different possible anonymous walks (also sketched below).
24.) How many walks are needed to get a good estimate? [ 1:08:10 ]
25.) Approach 2 :- learn embeddings of all possible anonymous walks (of fixed length) and concatenate/average them to get the embedding of the graph. To learn these anonymous-walk embeddings, try to predict the probability distribution over the next random walk from a node, using the anonymous-walk embeddings corresponding to the previous Δ random walks. As in [ 1:14:02 ].
26.) Anonymous walks are based on the assumption that all important properties/structures of the graph can be recovered from the statistics of the random anonymous walks that can be undertaken from various nodes.
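Toy sketch for points 6)-8): a minimal illustration of the skip-gram-style objective with negative sampling (my own code, not the lecture's implementation; the function names, matrix Z and the toy numbers are made up). Embeddings are rows of a NumPy matrix, and negatives are drawn in proportion to node degree.
```python
import numpy as np

rng = np.random.default_rng(0)

def sample_negatives(degrees, k):
    """Draw k negative nodes with probability proportional to their degree."""
    p = degrees / degrees.sum()
    return rng.choice(len(degrees), size=k, p=p)

def negative_sampling_loss(Z, u, v, neg_nodes):
    """Skip-gram style loss for one (node, walk-neighbor) pair (u, v).

    Instead of the full softmax over all nodes (expensive denominator),
    we only contrast v against a handful of sampled negative nodes.
    """
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    pos = np.log(sigmoid(Z[u] @ Z[v]))                     # pull the co-occurring node closer
    neg = np.log(sigmoid(-(Z[u] @ Z[neg_nodes].T))).sum()  # push the sampled negatives away
    return -(pos + neg)                                    # this is what we minimize

# toy usage: 5 nodes, 3-dimensional embeddings, node 1 was seen near node 0 on a walk
Z = rng.normal(size=(5, 3))
degrees = np.array([1.0, 3.0, 2.0, 2.0, 4.0])
print(negative_sampling_loss(Z, u=0, v=1, neg_nodes=sample_negatives(degrees, k=2)))
```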
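Toy sketch for point 12): one possible reading of the biased 2nd-order walk, using networkx (again my own illustrative code and names, not the reference node2vec implementation, which also handles edge weights and precomputes the transition probabilities).
```python
import random
import networkx as nx

def node2vec_step(G, prev, curr, p, q):
    """One step of a 2nd-order (one unit of memory) biased random walk.

    Unnormalized transition weights, measured relative to `prev`,
    the node we just came from:
      1/p -> go back to prev
      1   -> neighbor of curr that is also adjacent to prev (same distance from prev)
      1/q -> neighbor of curr that is farther from prev (move away)
    """
    neighbors = list(G.neighbors(curr))
    weights = []
    for x in neighbors:
        if x == prev:
            weights.append(1.0 / p)
        elif G.has_edge(x, prev):
            weights.append(1.0)
        else:
            weights.append(1.0 / q)
    return random.choices(neighbors, weights=weights, k=1)[0]

def node2vec_walk(G, start, length, p=1.0, q=1.0):
    """Simulate one fixed-length biased walk starting at `start`."""
    walk = [start, random.choice(list(G.neighbors(start)))]  # the very first step is uniform
    while len(walk) < length:
        walk.append(node2vec_step(G, prev=walk[-2], curr=walk[-1], p=p, q=q))
    return walk

# toy usage: low q pushes the walk outward (DFS-like), high q keeps it local (BFS-like)
G = nx.karate_club_graph()
print(node2vec_walk(G, start=0, length=10, p=1.0, q=0.5))
```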
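Toy sketch for points 22)-23): anonymizing a walk and building the frequency feature vector of Approach 1 (illustrative code; in practice you would fix the key set to an enumeration of all anonymous walks of the chosen length rather than use a dict of observed patterns).
```python
from collections import Counter

def anonymize(walk):
    """Relabel nodes by order of first appearance: A->B->A becomes (1, 2, 1)."""
    first_seen = {}
    out = []
    for node in walk:
        if node not in first_seen:
            first_seen[node] = len(first_seen) + 1
        out.append(first_seen[node])
    return tuple(out)

def graph_feature_vector(walks):
    """Approach 1: describe the graph by the empirical frequencies of its anonymous walks."""
    patterns = Counter(anonymize(w) for w in walks)
    total = sum(patterns.values())
    return {pattern: count / total for pattern, count in patterns.items()}

# toy usage: two sampled walks over nodes labelled by letters
print(anonymize(["A", "B", "A", "C"]))                       # (1, 2, 1, 3)
print(graph_feature_vector([["A", "B", "A"], ["X", "Y", "Z"]]))
```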
@凡贾 5 years ago
Graphs are so general, explainable, convenient and elegant. Also pretty difficult. : )
@veselindion2 4 years ago
Pretty difficult indeed :D
@dharshanak4118 4 years ago
I never knew Slavoj Zizek used to teach machine learning when he was young.
@MrSouravmondal 4 years ago
:D :D :D
@TheMateyl 4 years ago
Thicc Slovenian accent
@dogaarmangil 2 years ago
Great lecture. 37:50 It is a misnomer to call this process "the finding of similarity between nodes" when in fact what you are doing is finding node groups. In a group, nodes are rarely similar to each other: you won't find 2 CEOs in a company, atom nuclei are surrounded by electrons rather than other nuclei, and so on.
@sm_xiii 4 years ago
@Machine Learning TV Can you please give some numbering to the lecture videos, so that we can figure out in which order we should watch them?
@Alkis05 4 years ago
That is assignment number 3: create a program that goes through the lectures and sorts them based on the downloaded transcripts.
@vikankshnath8068 4 years ago
He is such a great explainer.
@MahmoudOuf 5 years ago
Great lecture, thanks a lot. The article is also very interesting and informative.
@gunarto90 5 years ago
Great lecture and Q&A sessions
@josephzhu5129 a year ago
Great lecture, he knows how to explain complicated ideas, thanks a lot!
@linfengyang8834 4 years ago
Great lecture, very informative and clear.
@hossein.amirkhani 5 years ago
Great lecture. But just one point: in the DeepWalk part, he frequently says that he wants to maximize the cost function, but he really wants to minimize it. In addition, his explanation of negative sampling is not completely correct; in fact, he should explain it by mentioning that the multi-class problem is changed into a two-class problem.
@LeenaBora 4 years ago
Hi Hossein, yes, it was indeed a great lecture. I also didn't understand the negative sampling part. Can you please elaborate on it a bit more?
@ghensao4027 2 years ago
Exactly. log(probability) ranges from negative infinity to 0, so taking the negative of it gives a range of 0 to infinity. You want to minimize instead.
@feishi4932 4 years ago
Great lecture. It helps me a lot.
@deepbayes6808 5 months ago
32:40 shouldn't this be: log of sum_k instead of sum_k of log?
@tomlarey 4 years ago
Great lecture, props to Jure Leskovec :)
@soumyadbanik 2 years ago
What kinds of data can have graph representations, and what types of data cannot be represented as a graph?
@jimmyshenmusic 4 years ago
This is a great talk.
@muhammadaneeqasif572 6 months ago
Amazing, great to see some good content again. Thank you, YT algorithm. Keep it up!
@ClosiusBeg 3 years ago
What if we make the number of random walks also random? And simulate n random walks with a constant number of steps over m random constant numbers of walks?
@yandajiang1744 10 months ago
Awesome explanation
@ShikhaMallick 4 years ago
Hi, I don't get the intuition behind softmax. Can someone please help me understand?
@muratcan__22 4 years ago
You can read the Wikipedia article; I think it is well explained there.
@vik24oct1991 4 years ago
Just as sigmoid is a smooth approximation of the sign/step function, softmax is a smooth approximation of the argmax function. With argmax, the output is 1 at the index of the largest input and 0 otherwise (a one-hot output). With softmax, the output at the index of the largest value is closest to one, but the second- and third-largest indices also get non-zero values depending on their inputs; the output is a smooth distribution where every index gets some value greater than zero. Because exponents are used, if the max value is far greater than the others, it basically behaves like argmax. This is used mainly so that the loss function is differentiable, which argmax is not.
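A tiny numeric illustration of the comment above (my own toy example, not from the lecture):
```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))   # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))       # ~[0.66, 0.24, 0.10]: smooth, every index gets some mass
print(softmax(10 * scores))  # ~[1.00, 0.00, 0.00]: with a dominant input it behaves like argmax
```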
@zhenyusha7123 5 years ago
Where are the rest of the class videos?
@MachineLearningTV 5 years ago
Watch the new video we uploaded yesterday.
@georgeigwegbe7258 5 years ago
@MachineLearningTV Hello, can we get all of the videos of cs224w?
@0xkellan 5 years ago
@georgeigwegbe7258 snap.stanford.edu/class/cs224w-2018/ — find the "resource" part on this page; you can find the lecture notes and videos there.
@MrSajjadathar 5 years ago
Which video did you upload yesterday?
@ZephyrineFreiberg 3 years ago
Is this guy from Switzerland?
@lloydacquayethompson6789 4 years ago
This guy is awesome
@ufuoma833 3 years ago
Surprisingly, I can follow his discussion. Or maybe I'm just tripping.
@arefy100 2 years ago
Great lecture, thanks a lot
@muhammadibrahimnejadian9265 2 years ago
Thanks a lot for sharing
@christophchristoph9909 3 years ago
awesome! thanks for sharing!
@distrologic2925 2 years ago
This is about learning *from* graphs, but how do we learn the actual graph structures just from raw data? This is the question AI should be answering.
@MCRuCr 3 years ago
Random walking is somehow like the random firing of biological neurons
@o_lakisgavalas 2 years ago
Misleading title, as no (generative) representation is discussed. E.g., "Supervised learning on graphs" or "Learning features of graphs" would have been more suitable imho.