Average Precision, Recall, and Precision are the principal metrics of Information Retrieval (IR) system performance. In this paper, we examine the properties of these metrics through mathematical and empirical analysis. Mathematically, we demonstrate that all of these measures are highly sensitive to relevance judgments, which are usually not very reliable. We show that shifting a relevant document downwards within the ranked list decreases Average Precision. The variation in Average Precision is most pronounced at positions 1 to 10, while from the 10th position onwards it becomes negligible. In addition, we estimate the regularity of the changes in Average Precision when an arbitrary number of relevance judgments within the existing ranked list are switched from non-relevant to relevant. Empirically, we show that 6 relevant documents at the end of a 20-document list yield approximately the same Average Precision as a single relevant document at the beginning of the list, whereas Recall and Precision increase linearly regardless of document position. Finally, we show that when a query is translated from Serbian to English by a human and then back from English to Serbian by machine translation, the relevance judgments change significantly, and therefore all the parameters measuring IR system performance are also subject to change.
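To make the abstract's quantitative claim concrete, the following minimal Python sketch (ours, not the paper's code) computes Average Precision for the two rankings described above. It assumes the collection contains 6 relevant documents in total, so AP is normalized by R = 6 in both cases; under that assumption the two values do come out close.

```python
# Minimal sketch, assuming a collection with R = 6 relevant documents in total.
# It compares AP for (a) one relevant document at rank 1 of a 20-document list
# and (b) six relevant documents at ranks 15-20 of the same-length list.

def average_precision(relevance, total_relevant):
    """AP over a ranked list of 0/1 relevance judgments, normalized by the
    total number of relevant documents in the collection."""
    hits = 0
    precision_sum = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / total_relevant

R = 6                           # assumed total relevant documents in collection
top = [1] + [0] * 19            # one relevant document at position 1
tail = [0] * 14 + [1] * 6       # six relevant documents at positions 15-20

print(average_precision(top, R))   # ~0.167
print(average_precision(tail, R))  # ~0.192
```

Under this normalization the two rankings give AP of roughly 0.167 and 0.192, which is consistent with the "approximately the same" claim; with a different assumed R the gap would change.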
Keywords
Information Retrieval (IR) systems, query, ranking, Precision, Average Precision.