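As presented in this video, BM25 can return negative scores for very frequent terms, or for a doc that contains only very frequent terms. The IDF term log( (N-n_i+0.5)/(n_i+0.5) ) drops below zero once a term appears in more than half of the docs (n_i > N/2), where N is the number of docs and n_i is the number of docs containing term i. One solution is to add 1 to the ratio before taking the log: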
log( (N-n_i+0.5)/(n_i+0.5) + 1)
This is what Lucene does: opensourceconn...
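To see the effect of the +1, here is a minimal Python sketch comparing the two IDF formulas. The function names idf_classic and idf_plus_one and the collection size N = 1000 are made up for illustration; this is not Lucene's actual code.

```python
import math

def idf_classic(N, n_i):
    # IDF without the +1: negative when n_i > N / 2
    return math.log((N - n_i + 0.5) / (n_i + 0.5))

def idf_plus_one(N, n_i):
    # IDF with 1 added to the ratio before the log: always positive
    return math.log((N - n_i + 0.5) / (n_i + 0.5) + 1)

N = 1000  # hypothetical number of docs in the collection
for n_i in (10, 500, 900):
    # n_i = number of docs containing the term
    print(n_i, round(idf_classic(N, n_i), 3), round(idf_plus_one(N, n_i), 3))
```

For a term that appears in 900 of the 1000 docs, the first formula gives a negative value and the second stays positive, because the argument of the log is then always greater than 1.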
You can see other approaches and formulations of BM25 here:
cs.uwaterloo.c...