Post Processing of Ranking in Search
Jun Xu, Chinese Academy of Sciences
In search there are many situations in which one wants to adjust the results produced by the basic ranking model, i.e., to conduct post processing of ranking, which we call post ranking. For example, when a query concerns a hot topic, one may want to boost a news-channel webpage about the topic into the top three positions, regardless of how the ranking model scores it (note that it is usually hard to build such control into a learning-to-rank model). In another example, a web page is reported as a likely spam page, and immediate action is required to demote the page without changing the ranking model. In practice, post ranking is carried out not only to enhance search quality, but also for operational, commercial, and even political reasons.
Post ranking is normally conducted at web search engines in an ad hoc manner. The key challenge lies in formalizing the problem in a theoretically sound, effective, and efficient way. The original search result may conflict strongly with the post-processing rules, and the rules may also contradict each other. In this talk, I will introduce our recent work, which formalizes post ranking as a constrained optimization problem. Given the ranking result of a query produced by the ranking model, the result is re-ranked by minimizing an objective function under a number of constraints, where the constraints represent the post-processing rules and the objective function represents the trade-off between agreement with the original ranking and satisfaction of the constraints. As a first study, the Bradley-Terry model is used to calculate the probability of a ranking list. The optimization problem then becomes minimizing the sum of the negative log conditional probability of the original ranking list and the negative log conditional probability of the constraints, both given a Bradley-Terry model. Experiments on the LETOR benchmark datasets and on a dataset from an enterprise search engine show that the proposed method consistently and significantly outperforms the baseline methods, which suggests that it is well suited to post ranking.
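To make the formulation concrete, the following is a minimal sketch, not the method presented in the talk. It assumes that both the original ranking and the rules can be written as pairwise preferences ("document x should rank above document y"), that each document has a Bradley-Terry log-strength, and that the two negative log-likelihood terms are combined with a trade-off weight. The names post_rank, lam, l2, lr, and steps are hypothetical, and the small L2 term and plain gradient descent are choices made only for this illustration.

```python
import math


def add_bt_gradient(scores, pairs, weight, grad):
    """Add the weighted gradient of -log P(winner beats loser) for each pair."""
    for winner, loser in pairs:
        # Bradley-Terry probability that `winner` beats `loser`,
        # with `scores` holding log-strengths.
        p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
        grad[winner] -= weight * (1.0 - p)
        grad[loser] += weight * (1.0 - p)


def post_rank(original, constraints, lam=5.0, l2=0.01, lr=0.1, steps=2000):
    """Re-rank `original` (best first) under pairwise `constraints`.

    `constraints` is a list of (winner, loser) pairs encoding the rules.
    `lam` (assumed) weights the constraint term against the original ranking;
    `l2` (assumed) keeps the scores bounded.
    """
    docs = list(original)
    n = len(docs)
    # Every pair implied by the original list: earlier beats later.
    original_pairs = [(docs[i], docs[j]) for i in range(n) for j in range(i + 1, n)]
    scores = {d: 0.0 for d in docs}
    for _ in range(steps):
        grad = {d: l2 * scores[d] for d in docs}
        add_bt_gradient(scores, original_pairs, 1.0, grad)  # agreement term
        add_bt_gradient(scores, constraints, lam, grad)     # constraint term
        for d in docs:
            scores[d] -= lr * grad[d]
    return sorted(docs, key=lambda d: -scores[d])


# Hypothetical usage: boost "d" above every other document.
boosted = post_rank(["a", "b", "c", "d"], [("d", "a"), ("d", "b"), ("d", "c")])
# Hypothetical usage: demote "a" (e.g., a suspected spam page) below the rest.
demoted = post_rank(["a", "b", "c", "d"], [("b", "a"), ("c", "a"), ("d", "a")])
```

In this sketch the rules are soft: a larger lam pushes the result toward satisfying the constraint pairs, while a smaller lam keeps it closer to the original order, and the relative order of documents not mentioned in any rule is left unchanged in these two examples.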