Regarding the pairwise ranking loss: it is based on the softmax probability

P(y_w is preferred over y_l) = exp(r(x, y_w; theta) - r(x, y_l; theta)) / ( exp(r(x, y_w; theta) - r(x, y_l; theta)) + exp(r(x, y_l; theta) - r(x, y_w; theta)) )

so it is a standard formulation for ranking models.
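
To make the formula concrete, here is a minimal Python sketch of the probability exactly as written above. It assumes the loss is the negative log of that probability, which is the usual construction but is an assumption here. The function names and the plain float arguments (standing in for r(x, y_w; theta) and r(x, y_l; theta)) are hypothetical and chosen only for illustration.

```python
import math


def preference_probability(r_w: float, r_l: float) -> float:
    """P(y_w is preferred over y_l) as a softmax over (+d, -d), with d = r_w - r_l.

    exp(d) / (exp(d) + exp(-d)) simplifies to 1 / (1 + exp(-2d)).
    The two branches avoid overflow in math.exp for large |d|.
    """
    z = 2.0 * (r_w - r_l)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pairwise_ranking_loss(r_w: float, r_l: float) -> float:
    """Negative log of the probability above (assumed form of the loss).

    -log P = log(1 + exp(-2d)), computed in a numerically stable way.
    """
    z = 2.0 * (r_w - r_l)
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
```

Written this way, the probability is a logistic sigmoid of twice the reward difference, so it depends only on r(x, y_w; theta) - r(x, y_l; theta) and the two orderings of a pair always sum to one.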