Elasticsearch for Bender

Requirements for querying reviews

  1. match title OR text; a title match scores higher,
  2. filter by ratings,
  3. consider positive factors (comments_count, useful_count) and negative factors (useless_count),
  4. (further) time decay

Below are rough notes from reading the docs with these requirements in mind.

Match title and text, giving title a boost of 2:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "吴京",
              "boost": 2
            }
          }
        },
        {
          "match": {
              "text": "吴京"
          }
        }
      ]
    }
  }
}

Use field_value_factor to let n_comments influence the score:

{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "吴京",
          "fields": ["title", "content"]
        }
      },
      "field_value_factor": {
        "field": "n_comments"
      }
    }
  }
}

Questions:

  1. When using function_score, how do I combine it with the title boost?
  2. What if I want to apply field_value_factor to several fields together?
  3. What if I want n_useless to have a negative effect on the score?

1. Combining with the title boost

Just replace the query inside function_score with a bool.should of sibling match clauses:

{
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "title": {
                  "query": "吴京",
                  "boost": 2
                }
              }
            },
            {
              "match": {
                "content": "吴京"
              }
            }
          ]
        }
      },
      "field_value_factor": {
        "field": "n_comments"
      }
    }
  }
}

2. Combining multiple factors with functions

{
  "query": {
    "function_score": {
      "query": {
          ...
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "n_comments",
            "modifier": "log1p"
          }
        },
        {
          "field_value_factor": {
            "field": "n_useful",
            "modifier": "log1p"
          }
        }
      ]
    }
  }
}
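The same request body can be assembled programmatically. Below is a minimal Python sketch, not the project's actual code: build_review_query and its parameters are hypothetical names, and the body field is a parameter because these notes use both text and content. score_mode and boost_mode are spelled out with their default value (multiply) only to make explicit how the functions and the query score are combined.

```python
import json


def build_review_query(keyword, body_field="text", title_boost=2,
                       factor_fields=("n_comments", "n_useful")):
    """Build a function_score request body (hypothetical helper for illustration)."""
    text_query = {
        "bool": {
            "should": [
                {"match": {"title": {"query": keyword, "boost": title_boost}}},
                {"match": {body_field: keyword}},
            ]
        }
    }
    # One field_value_factor function per numeric field, all using log1p.
    functions = [
        {"field_value_factor": {"field": field, "modifier": "log1p"}}
        for field in factor_fields
    ]
    return {
        "query": {
            "function_score": {
                "query": text_query,
                "functions": functions,
                "score_mode": "multiply",  # how the functions combine with each other
                "boost_mode": "multiply",  # how that result combines with the query score
            }
        }
    }


body = build_review_query("吴京")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted into a _search request as-is.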

3. Giving n_useless a negative effect via function modifier / weight

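Since the positive modifier is \(\log(1 + \text{factor} \cdot \text{n\_useful})\), the negative one can crudely be \(\frac{1}{\log(1 + \text{factor} \cdot \text{n\_useless})}\), so that when the useful and useless vote counts are equal the two cancel out to 1.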

If the two should later be weighted differently, their factors can be tuned separately.
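A quick way to sanity-check this idea outside Elasticsearch is to compute the product of the two modifiers directly. The sketch below is only an illustration: vote_factor is a hypothetical helper, and returning None for zero useless votes is an arbitrary choice made here to expose the edge case, since the reciprocal is undefined when n_useless is 0. Because the result is a ratio of two logarithms, the base does not matter as long as both sides use the same one.

```python
import math


def vote_factor(n_useful, n_useless, factor=1.0):
    """log(1 + factor*n_useful) * 1/log(1 + factor*n_useless); hypothetical helper."""
    positive = math.log10(1 + factor * n_useful)
    negative = math.log10(1 + factor * n_useless)
    if negative == 0:
        # log(1 + 0) == 0, so the reciprocal is undefined for zero useless votes.
        return None
    return positive / negative


print(vote_factor(100, 100))   # equal votes cancel out to 1.0
print(vote_factor(392, 1280))  # more useless than useful votes: below 1
print(vote_factor(5, 0))       # None: the edge case still needs a decision
```

A real query would have to handle the zero case somehow, for example by shifting the argument of the logarithm.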

A pitfall I ran into:

  1. In es 2.3, field_value_factor's log1p is log10(1+x), while script_score's log1p is ln(1+x); the two are inconsistent.
  2. In Python's math lib, log is ln, and log10 is the base-10 one.
  3. In Google's plotting, log is log10, and the natural logarithm is ln.
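The difference between the two readings of log1p is easy to see in Python. This is a small sketch under the behaviour described in the list above; fvf_log1p and script_log1p are hypothetical names that only mimic the two definitions, they are not Elasticsearch APIs.

```python
import math


def fvf_log1p(x):
    """log1p as field_value_factor computes it per the notes: log10(1 + x)."""
    return math.log10(1 + x)


def script_log1p(x):
    """log1p as script_score computes it per the notes: ln(1 + x)."""
    return math.log1p(x)


n_comments = 828
print(fvf_log1p(n_comments))     # base-10 result
print(script_log1p(n_comments))  # natural-log result
# The two always differ by the constant factor ln(10).
print(script_log1p(n_comments) / fvf_log1p(n_comments), math.log(10))
```

So mixing the two in one functions list silently rescales one factor by about 2.3.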

functions can hold one field_value_factor that handles n_comments, alongside a script_score that handles n_useful and n_useless together.

The rest is just parameter tuning.

A rough example:

"functions": [
  {
    "field_value_factor": {
      "field": "n_comments",
      "modifier": "ln2p"
    }
  },
  {
    "script_score": {
      "script": "n_useful = doc['n_useful'].value; n_useless = doc['n_useless'].value;\nif(n_useful > n_useless) return log(E+n_useful-n_useless);\nreturn 1/log(E+n_useless-n_useful);"
    }
  }
]
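To see how the script behaves before tuning, its logic can be ported to Python. This is a sketch under one assumption: log and E in the script are the natural logarithm and Euler's number. script_vote_score is a hypothetical name. Adding E keeps the argument of the logarithm at or above e, so the result never divides by zero and equals 1 when the two counts are equal.

```python
import math


def script_vote_score(n_useful, n_useless):
    """Python port of the script_score body above (hypothetical helper)."""
    if n_useful > n_useless:
        # More useful votes: factor grows logarithmically above 1.
        return math.log(math.e + n_useful - n_useless)
    # More (or equally many) useless votes: factor shrinks from 1 towards 0.
    return 1 / math.log(math.e + n_useless - n_useful)


for useful, useless in [(10, 10), (392, 1280), (1280, 392)]:
    print(useful, useless, script_vote_score(useful, useless))
```

The factor is symmetric: swapping the two counts gives the reciprocal.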

Explain output for query 3 above

explain really is a weapon of mass destruction

Query:

{
  "explain": true,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "should": [
            {
              "match": {
                "title": {
                  "query": "吴京",
                  "boost": 2
                }
              }
            },
            {
              "match": {
                "content": "吴京"
              }
            }
          ]
        }
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "n_comments",
            "modifier": "log1p"
          }
        },
        {
          "field_value_factor": {
            "field": "n_useful",
            "modifier": "log1p"
          }
        },
        {
          "field_value_factor": {
            "field": "n_useless",
            "modifier": "log1p"
          }
        }
      ]
    }
  }
}

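Result (note coord(1/2): only the title clause matched; the document stores its body under text rather than content, so the content clause presumably never hits):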

{
    "_shard": 3,
    "_node": "O-3ZUGLTTwKtNlsTYDj17Q",
    "_index": "dae_bender",
    "_type": "movie_review",
    "_id": "AV27Z9ykkc_UHefLmOTb",
    "_score": 28.713114,
    "_source": {
        "parent_id": 26363254,
        "n_comments": 828,
        "text": "很差,很失望。感觉看到了终结者里斯瓦辛格拿着机关枪突突突,突突突,突突突。。。。没有我们想看到那种特种兵应该有的感觉。剧情像是几十年前那种美国电影。抱歉我不专业不能说出具体是哪种电影。译制片时代的那种吧。装备也奇怪。黑人加ak一般在电影里都是被秒的。剧情没有什么逻辑性,导弹发射那段硬加持了爱国情怀buff。总之是差。。还有个事,这电影不应该评到7.4分。水军什么的真的没必要,每个人心中都有数。7.4有多少水分?爱国要给一点分,吴京要给一点分,宣传要给一点分,那电影呢?真正给我们看的电影是几分?希望豆瓣真的能扛起影评大旗,毕竟我们这些并不专业,只是普通的观众信任豆瓣。不要这么多套路。我也愿意相信,这些水分和水军和吴京都没有关系。他不知道。所以我还是一如既往的喜欢他。但是就电影,我给一颗星。",
        "title": "我喜欢吴京,但是这电影我要给1星。",
        "rating": 1,
        "timestamp": "2017-07-29T09:05:07+08",
        "n_useful": 392,
        "n_useless": 1280,
        "id": 8703258
    },
    "_explanation": {
        "value": 28.713116,
        "description": "sum of:",
        "details": [
            {
                "value": 28.713116,
                "description": "function score, product of:",
                "details": [
                    {
                        "value": 1.220278,
                        "description": "product of:",
                        "details": [
                            {
                                "value": 2.440556,
                                "description": "sum of:",
                                "details": [
                                    {
                                        "value": 2.440556,
                                        "description": "sum of:",
                                        "details": [
                                            {
                                                "value": 1.6378679,
                                                "description": "sum of:",
                                                "details": [
                                                    {
                                                        "value": 0.81893396,
                                                        "description": "weight(title:吴 in 49) [PerFieldSimilarity], result of:",
                                                        "details": [
                                                            {
                                                                "value": 0.81893396,
                                                                "description": "score(doc=49,freq=1.0), product of:",
                                                                "details": [
                                                                    {
                                                                        "value": 0.4536995,
                                                                        "description": "queryWeight, product of:",
                                                                        "details": [
                                                                            {
                                                                                "value": 2,
                                                                                "description": "boost",
                                                                                "details": []
                                                                            },
                                                                            {
                                                                                "value": 7.220056,
                                                                                "description": "idf(docFreq=66, maxDocs=33683)",
                                                                                "details": []
                                                                            },
                                                                            {
                                                                                "value": 0.03141939,
                                                                                "description": "queryNorm",
                                                                                "details": []
                                                                            }
                                                                        ]
                                                                    },
                                                                    {
                                                                        "value": 1.805014,
                                                                        "description": "fieldWeight in 49, product of:",
                                                                        "details": [
                                                                            {
                                                                                "value": 1,
                                                                                "description": "tf(freq=1.0), with freq of:",
                                                                                "details": [
                                                                                    {
                                                                                        "value": 1,
                                                                                        "description": "termFreq=1.0",
                                                                                        "details": []
                                                                                    }
                                                                                ]
                                                                            },
                                                                            {
                                                                                "value": 7.220056,
                                                                                "description": "idf(docFreq=66, maxDocs=33683)",
                                                                                "details": []
                                                                            },
                                                                            {
                                                                                "value": 0.25,
                                                                                "description": "fieldNorm(doc=49)",
                                                                                "details": []
                                                                            }
                                                                        ]
                                                                    }
                                                                ]
                                                            }
                                                        ]
                                                    },
                                                    {
                                                        "value": 0.81893396,
                                                        "description": "weight(title:吴 京 in 49) [PerFieldSimilarity], result of:",
                                                        "details": [
                                                            {
                                                                "value": 0.81893396,
                                                                "description": "score(doc=49,freq=1.0), product of:",
                                                                "details": [
                                                                    {
                                                                        "value": 0.4536995,
                                                                        "description": "queryWeight, product of:",
                                                                        "details": [
                                                                            {
                                                                                "value": 2,
                                                                                "description": "boost",
                                                                                "details": []
                                                                            },
                                                                            {
                                                                                "value": 7.220056,
                                                                                "description": "idf(docFreq=66, maxDocs=33683)",
                                                                                "details": []
                                                                            },
                                                                            {
                                                                                "value": 0.03141939,
                                                                                "description": "queryNorm",
                                                                                "details": []
                                                                            }
                                                                        ]
                                                                    },
                                                                    {
                                                                        "value": 1.805014,
                                                                        "description": "fieldWeight in 49, product of:",
                                                                        "details": [
                                                                            {
                                                                                "value": 1,
                                                                                "description": "tf(freq=1.0), with freq of:",
                                                                                "details": [
                                                                                    {
                                                                                        "value": 1,
                                                                                        "description": "termFreq=1.0",
                                                                                        "details": []
                                                                                    }
                                                                                ]
                                                                            },
                                                                            {
                                                                                "value": 7.220056,
                                                                                "description": "idf(docFreq=66, maxDocs=33683)",
                                                                                "details": []
                                                                            },
                                                                            {
                                                                                "value": 0.25,
                                                                                "description": "fieldNorm(doc=49)",
                                                                                "details": []
                                                                            }
                                                                        ]
                                                                    }
                                                                ]
                                                            }
                                                        ]
                                                    }
                                                ]
                                            },
                                            {
                                                "value": 0.8026881,
                                                "description": "weight(title:京 in 49) [PerFieldSimilarity], result of:",
                                                "details": [
                                                    {
                                                        "value": 0.8026881,
                                                        "description": "score(doc=49,freq=1.0), product of:",
                                                        "details": [
                                                            {
                                                                "value": 0.44917676,
                                                                "description": "queryWeight, product of:",
                                                                "details": [
                                                                    {
                                                                        "value": 2,
                                                                        "description": "boost",
                                                                        "details": []
                                                                    },
                                                                    {
                                                                        "value": 7.1480823,
                                                                        "description": "idf(docFreq=71, maxDocs=33683)",
                                                                        "details": []
                                                                    },
                                                                    {
                                                                        "value": 0.03141939,
                                                                        "description": "queryNorm",
                                                                        "details": []
                                                                    }
                                                                ]
                                                            },
                                                            {
                                                                "value": 1.7870206,
                                                                "description": "fieldWeight in 49, product of:",
                                                                "details": [
                                                                    {
                                                                        "value": 1,
                                                                        "description": "tf(freq=1.0), with freq of:",
                                                                        "details": [
                                                                            {
                                                                                "value": 1,
                                                                                "description": "termFreq=1.0",
                                                                                "details": []
                                                                            }
                                                                        ]
                                                                    },
                                                                    {
                                                                        "value": 7.1480823,
                                                                        "description": "idf(docFreq=71, maxDocs=33683)",
                                                                        "details": []
                                                                    },
                                                                    {
                                                                        "value": 0.25,
                                                                        "description": "fieldNorm(doc=49)",
                                                                        "details": []
                                                                    }
                                                                ]
                                                            }
                                                        ]
                                                    }
                                                ]
                                            }
                                        ]
                                    }
                                ]
                            },
                            {
                                "value": 0.5,
                                "description": "coord(1/2)",
                                "details": []
                            }
                        ]
                    },
                    {
                        "value": 23.529978,
                        "description": "min of:",
                        "details": [
                            {
                                "value": 23.529978,
                                "description": "function score, score mode [multiply]",
                                "details": [
                                    {
                                        "value": 2.9185545,
                                        "description": "function score, product of:",
                                        "details": [
                                            {
                                                "value": 1,
                                                "description": "match filter: *:*",
                                                "details": []
                                            },
                                            {
                                                "value": 2.9185545,
                                                "description": "field value function: log1p(doc['n_comments'].value * factor=1.0)",
                                                "details": []
                                            }
                                        ]
                                    },
                                    {
                                        "value": 2.5943925,
                                        "description": "function score, product of:",
                                        "details": [
                                            {
                                                "value": 1,
                                                "description": "match filter: *:*",
                                                "details": []
                                            },
                                            {
                                                "value": 2.5943925,
                                                "description": "field value function: log1p(doc['n_useful'].value * factor=1.0)",
                                                "details": []
                                            }
                                        ]
                                    },
                                    {
                                        "value": 3.1075492,
                                        "description": "function score, product of:",
                                        "details": [
                                            {
                                                "value": 1,
                                                "description": "match filter: *:*",
                                                "details": []
                                            },
                                            {
                                                "value": 3.1075492,
                                                "description": "field value function: log1p(doc['n_useless'].value * factor=1.0)",
                                                "details": []
                                            }
                                        ]
                                    }
                                ]
                            },
                            {
                                "value": 3.4028235e+38,
                                "description": "maxBoost",
                                "details": []
                            }
                        ]
                    }
                ]
            },
            {  // This clause matches _type:movie_review. When querying a single _type it has no effect on the final score.
                "value": 0,
                "description": "match on required clause, product of:",
                "details": [
                    {
                        "value": 0,
                        "description": "# clause",
                        "details": []
                    },
                    {
                        "value": 0.03141939,
                        "description": "_type:movie_review, product of:",
                        "details": [
                            {
                                "value": 1,
                                "description": "boost",
                                "details": []
                            },
                            {
                                "value": 0.03141939,
                                "description": "queryNorm",
                                "details": []
                            }
                        ]
                    }
                ]
            }
        ]
    }
},
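The trace can be checked by hand. The sketch below recomputes the final score from the numbers in the explain output; term_score and the constant names are hypothetical, and every value is copied from the trace rather than fetched from Elasticsearch. Small differences in the last digits are expected because Elasticsearch works in 32-bit floats.

```python
import math

# Values copied from the explain output above.
QUERY_NORM = 0.03141939
FIELD_NORM = 0.25
TITLE_BOOST = 2
COORD = 0.5  # coord(1/2): one of the two should clauses matched


def term_score(idf, tf=1.0):
    """queryWeight * fieldWeight for one title clause (hypothetical helper)."""
    query_weight = TITLE_BOOST * idf * QUERY_NORM
    field_weight = tf * idf * FIELD_NORM
    return query_weight * field_weight


# The three title clauses in the trace: 吴, the phrase "吴 京", and 京.
title_score = term_score(7.220056) + term_score(7.220056) + term_score(7.1480823)
query_score = title_score * COORD

# field_value_factor with log1p is log10(1 + value) in this trace.
function_product = (math.log10(1 + 828)      # n_comments
                    * math.log10(1 + 392)    # n_useful
                    * math.log10(1 + 1280))  # n_useless

final_score = query_score * function_product
print(query_score, function_product, final_score)
```

The three printed values line up with 1.220278, 23.529978 and the _score of 28.713114. They also show that n_useless currently raises the score, since it goes through the same log1p as the positive factors.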