8.1 Evaluating the Retrieval Mechanisms

Retrieval mechanisms play an important role in locating reusable components that match reuse queries. This section presents the evaluation and comparison of two retrieval mechanisms: the Okapi mechanism (Section 6.1.1) based on probabilistic models, and the LSA mechanism (Section 6.1.1), used by the Fetcher agent of CodeBroker for the retrieval of conceptually similar components.

8.1.1 The Concept of Recall and Precision

Conventionally, information retrieval systems are measured by recall and precision. Recall indicates the ability of the system to present all relevant documents, and precision indicates the ability of the system to present only the relevant documents. They can be computed by the following equations:

$\displaystyle \text{Recall} = \frac{\text{Number of documents retrieved and relevant}}{\text{Number of total relevant documents in collection}}$ (8.1)

$\displaystyle \text{Precision} = \frac{\text{Number of documents retrieved and relevant}}{\text{Number of documents retrieved}}$ (8.2)

Recall and precision are not absolute, objective measurements of an information retrieval system because

  1. the definition of relevance between documents and queries is subjective, and
  2. even if the relevance of a document is unanimously agreed upon, the document may not be of interest to a particular user who already knows it.
Nonetheless, when the relevance of documents is agreed, these two measures can be used to compare the performance of two retrieval systems.
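As a concrete illustration of Equations 8.1 and 8.2, the following minimal sketch computes both measures for a single query. The function name `recall_precision` and the component names in the example are hypothetical and are not taken from CodeBroker.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    retrieved: identifiers of the documents returned by the system
    relevant:  identifiers of the documents judged relevant to the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # documents retrieved and relevant
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical query: 4 components retrieved, 3 relevant in the collection,
# 2 of the retrieved components are relevant.
retrieved = ["Random.nextInt", "Math.random", "List.add", "Set.add"]
relevant = ["Random.nextInt", "Math.random", "Random.nextDouble"]
recall, precision = recall_precision(retrieved, relevant)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this example, two of the three relevant components are retrieved (recall 2/3), and two of the four retrieved components are relevant (precision 1/2).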

8.1.2 Recall and Precision in CodeBroker

The purpose of computing recall and precision for the CodeBroker system is to compare the two retrieval mechanisms and find the better one. The data should not be taken as an absolutely objective measurement of the effectiveness of the retrieval mechanisms implemented in CodeBroker, because the sample queries were not selected randomly enough.

8.1.2.1 Reuse Queries and Relevant Components

In total, 19 reuse queries were selected. Among them, 10 queries were created by me; 4 queries were chosen from questions asked in newsgroups related to Java programming, with lead-in phrases such as "How do I" and "Can someone tell me how to" removed; and 5 queries were extracted from the evaluation experiments (Section 8.2) without any words changed.

The relevance of components was determined as follows:

  1. For queries created by me, I chose those components that I thought could be used to implement the task.
  2. For queries from newsgroups, the components suggested by responders were considered relevant, in addition to components that I chose from the JGL library, which not all Java programmers use.
  3. For queries from experiments, only those components that were used by the programmers were considered relevant.
The sets of relevant components determined by the above criteria are by no means exhaustive. However, for the purpose of comparison, they provide sufficient evidence.

8.1.2.2 Computed Recall and Precision

CodeBroker supports two retrieval mechanisms, Okapi and LSA, for locating components that are conceptually similar to queries extracted from doc comments. The two mechanisms can also be combined by assigning a weight to the similarity value computed by each. I ran the system in three configurations: LSA only, Okapi only, and the average of both (Mixed). The average precision values at different recall levels are shown in Table 8.1. Figure 8.1 shows the recall-precision curves, which are constructed by plotting the precision values against the recall values. Superimposing the recall-precision curves of different retrieval mechanisms on the same graph makes it possible to determine which mechanism is superior. In general, the curve closest to the upper right-hand corner indicates the best performance [Salton and McGill, 1983].
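The weighted combination of the two mechanisms can be sketched as follows. This is an illustration only, not CodeBroker's implementation: the names `combine_scores` and `rank` are hypothetical, the scores are invented, and the sketch assumes that both similarity values have already been scaled to a comparable range, which the text does not specify.

```python
def combine_scores(lsa_scores, okapi_scores, lsa_weight=0.5):
    """Weighted sum of per-component similarity values.

    lsa_weight=1.0 gives LSA only, 0.0 gives Okapi only, and
    0.5 gives the average of both (the Mixed configuration).
    Assumes both score sets are on a comparable scale.
    """
    okapi_weight = 1.0 - lsa_weight
    components = set(lsa_scores) | set(okapi_scores)
    return {
        c: lsa_weight * lsa_scores.get(c, 0.0)
        + okapi_weight * okapi_scores.get(c, 0.0)
        for c in components
    }


def rank(scores):
    """Order components by descending score; ties broken by name."""
    return sorted(scores, key=lambda c: (-scores[c], c))


# Invented similarity values for three hypothetical components.
lsa = {"compA": 0.9, "compB": 0.2, "compC": 0.4}
okapi = {"compA": 0.3, "compB": 0.8, "compC": 0.5}

lsa_only = rank(combine_scores(lsa, okapi, lsa_weight=1.0))
okapi_only = rank(combine_scores(lsa, okapi, lsa_weight=0.0))
mixed = rank(combine_scores(lsa, okapi, lsa_weight=0.5))
```

With these invented scores, the three configurations produce three different rankings, which is why each configuration has to be evaluated separately.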


Table 8.1: Average precision and recall values for LSA, Mixed (average of LSA and Okapi), and Okapi

            Average Precision
Recall      LSA       Mixed     Okapi
  0%        35.77%    32.80%    45.82%
 10%        31.86%    32.80%    45.82%
 20%        30.89%    32.75%    45.82%
 30%        25.62%    28.63%    41.20%
 40%        20.62%    24.66%    41.01%
 50%        20.44%    24.66%    40.74%
 60%        13.86%    21.95%    37.46%
 70%        13.82%    20.70%    37.46%
 80%        13.82%    20.63%    32.71%
 90%        12.32%    19.90%    32.19%
100%        12.32%    17.86%    29.43%


Figure 8.1: Recall-precision curves

\includegraphics[width=0.75\linewidth]{figs/trec-eval.eps}
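Precision values at fixed recall levels, such as those in Table 8.1, are commonly obtained by interpolation: the precision reported at recall level r is the highest precision observed at any recall of r or more, and the per-query curves are then averaged. The sketch below assumes this standard 11-point scheme. The text does not state the exact procedure used for Table 8.1, and the function names and example data are hypothetical.

```python
def interpolated_precision(ranked, relevant, levels=11):
    """Interpolated precision at evenly spaced recall levels (0%..100%).

    ranked:   component identifiers in the order returned by the system
    relevant: identifiers judged relevant to the query
    """
    relevant = set(relevant)
    points = []  # (recall, precision) observed at each relevant hit
    hits = 0
    for position, component in enumerate(ranked, start=1):
        if component in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / position))

    curve = []
    for k in range(levels):
        level = k / (levels - 1)
        # Highest precision at any recall >= level; 0.0 if never reached.
        reachable = [p for r, p in points if r >= level - 1e-12]
        curve.append(max(reachable, default=0.0))
    return curve


def average_curves(curves):
    """Average several per-query curves level by level."""
    return [sum(column) / len(column) for column in zip(*curves)]


# Hypothetical query: relevant components appear at ranks 1 and 4.
curve = interpolated_precision(["c1", "c2", "c3", "c4", "c5"], {"c1", "c4"})
```

For this example, the curve is 1.0 for recall levels 0% through 50% and 0.5 for levels 60% through 100%.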

8.1.2.3 Conclusions

Table 8.1 and Figure 8.1 show that Okapi has better retrieval performance than both LSA and the mixed configuration (the average of both): its average precision is the highest at every recall level. This result is somewhat unexpected because other researchers have reported that LSA performs better than other retrieval methods [Deerwester et al., 1990]. The unexpectedly low performance of LSA might be caused by an insufficient number of training documents in CodeBroker, because LSA performance depends largely on the quality and volume of training documents. Because the evaluation shows that Okapi has the best performance, CodeBroker uses Okapi only by default. Okapi is also favored over LSA because it allows the system to find the terms that contribute most to the relevance between components and queries. Those terms can be shown to programmers when they move the mouse cursor over the descriptions of components in the RCI-display, which helps them refine their queries (Figure 7.4, Section 7.4.1). In LSA, in contrast, the reason a component is judged relevant is obscured by the semantic space.
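The comparison can be checked directly against the values in Table 8.1. The following sketch transcribes the table and tests whether one curve lies above another at every recall level. The helper names `dominates` and `mean` are hypothetical, and the mean over the 11 recall levels is used only as a convenient summary, not as a measure reported in this section.

```python
# Average precision (%) at recall levels 0%, 10%, ..., 100% (Table 8.1).
TABLE_8_1 = {
    "LSA":   [35.77, 31.86, 30.89, 25.62, 20.62, 20.44,
              13.86, 13.82, 13.82, 12.32, 12.32],
    "Mixed": [32.80, 32.80, 32.75, 28.63, 24.66, 24.66,
              21.95, 20.70, 20.63, 19.90, 17.86],
    "Okapi": [45.82, 45.82, 45.82, 41.20, 41.01, 40.74,
              37.46, 37.46, 32.71, 32.19, 29.43],
}


def dominates(curve_a, curve_b):
    """True if curve_a has higher precision at every recall level."""
    return all(a > b for a, b in zip(curve_a, curve_b))


def mean(values):
    return sum(values) / len(values)


okapi_beats_lsa = dominates(TABLE_8_1["Okapi"], TABLE_8_1["LSA"])
okapi_beats_mixed = dominates(TABLE_8_1["Okapi"], TABLE_8_1["Mixed"])
mixed_beats_lsa = dominates(TABLE_8_1["Mixed"], TABLE_8_1["LSA"])

# Mechanisms ordered from best to worst by mean precision over all levels.
ordering = sorted(TABLE_8_1, key=lambda name: mean(TABLE_8_1[name]), reverse=True)
```

Okapi lies above both other curves at every recall level. The Mixed curve lies above the LSA curve everywhere except at 0% recall, where LSA is higher (35.77% versus 32.80%).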


Ph.D. Dissertation by Yunwen Ye, April 20, 2001, Department of Computer Science, University of Colorado