Hello Dr. Raissi, great videos, by the way!

You said that to get the nearest neighbors of the "play" vector in the biLM, we would push the sentence through the LSTMs and produce the hidden-state activations h_{k,j} for each layer, giving us the h_{k,j} for that particular occurrence of "play" in its sentence. But wouldn't the model still need to be trained on a specific task to get the ELMo_k word representations, and hence the nearest neighbors among the other sentences in the SemCor dataset used in the paper? Since ELMo_k^task depends on the trained gamma^task and s^task parameters, I would think you would need to fine-tune on a specific task before you could find these nearest-neighbor sentences for the "play" word vector.
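To make the dependence on gamma^task and s^task concrete, here is a minimal sketch of the ELMo combination ELMo_k^task = gamma^task * sum_j s_j^task * h_{k,j}. The function name, array shapes, and the use of NumPy are my own illustrative assumptions, not code from the paper:

```python
import numpy as np

def elmo_representation(h_k, raw_weights, gamma):
    """Collapse the biLM layer activations for token k into one ELMo vector.

    h_k:         array of shape (L+1, d), one row per biLM layer
                 (token layer plus L LSTM layers)
    raw_weights: length L+1; softmax-normalized to give s^task
    gamma:       scalar gamma^task scaling the whole vector
    """
    # Softmax over the raw layer weights yields s^task
    s = np.exp(raw_weights - np.max(raw_weights))
    s = s / s.sum()
    # Weighted sum over layers, scaled by gamma^task
    return gamma * (s[:, None] * h_k).sum(axis=0)

# Example: two layers, equal raw weights, gamma = 2
h = np.array([[1.0, 1.0],
              [3.0, 3.0]])
print(elmo_representation(h, np.array([0.0, 0.0]), 2.0))  # -> [4. 4.]
```

Without task training, raw_weights and gamma are unset, which is exactly why I'd expect the task-specific fine-tuning step to be needed before computing ELMo_k^task.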