8.1 Evaluating the Retrieval Mechanisms
Retrieval mechanisms play an important role in locating reusable
components that match reuse queries. This section evaluates and compares
the two retrieval mechanisms that the Fetcher agent of CodeBroker uses to
retrieve conceptually similar components: the Okapi mechanism (Section 6.1.1),
which is based on probabilistic models, and the LSA mechanism (Section 6.1.1).
Conventionally, information retrieval systems are evaluated by
recall and precision. Recall indicates the ability of the
system to present all relevant documents, and precision indicates its
ability to present only relevant documents. They are computed by the
following equations:
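$$\text{RECALL} = \frac{\text{number of relevant documents retrieved}}{\text{total number of relevant documents}} \tag{8.1}$$

$$\text{PRECISION} = \frac{\text{number of relevant documents retrieved}}{\text{total number of documents retrieved}} \tag{8.2}$$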
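As a minimal illustration of Equations 8.1 and 8.2, the Python sketch below computes recall and precision for a single query from the components a system presented and the components judged relevant. The function name recall_precision and the component names in the usage example are hypothetical, and returning 0.0 when a set is empty is an assumed convention, since both ratios are undefined in that case.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query, following Equations 8.1 and 8.2.

    retrieved: component identifiers the system presented
    relevant:  component identifiers judged relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant components that were retrieved
    # Assumed convention: report 0.0 when a denominator would be zero.
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical query: 2 of the 4 presented components are relevant,
# and 2 of the 3 relevant components were presented.
print(recall_precision(["Vector", "ArrayList", "Hashtable", "Stack"],
                       ["ArrayList", "Stack", "LinkedList"]))
```

Here recall is 2/3 and precision is 1/2, which shows how a system can score differently on the two measures for the same query.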
Recall and precision are not absolute, objective measurements of an
information retrieval system because
- the definition of relevance between documents and queries is
subjective, and
- even if everyone agrees that a document is relevant, it may not be of
interest to a particular user who already knows that document.
Nonetheless, when the relevance of documents is agreed upon, these two
measures can be used to compare the performance of two retrieval systems.
The purpose of computing recall and precision for CodeBroker is to
compare the two retrieval mechanisms and identify the better one.
The data should not be taken as an absolute, objective measurement of
the effectiveness of the retrieval mechanisms implemented in CodeBroker,
because the sample queries were not selected randomly enough.
In total, 19 reuse queries were selected. Of these, 10 queries were
created by me; 4 queries were taken from questions asked in
newsgroups related to Java programming, with phrases such as "How do
I" and "Can someone tell me how to" removed; and 5 queries were
extracted without any change from the evaluation experiments (Section 8.2).
The relevance of components was determined as follows:
- For queries created by me, I chose the components that I thought
could be used to implement the task.
- For queries from newsgroups, the components suggested by responders
were considered relevant, in addition to components that I chose from
the JGL library, which not all Java programmers use.
- For queries from experiments, only the components that were used by
the programmers were considered relevant.
The sets of relevant components determined by the above criteria are by no
means exhaustive. However, for the purpose of comparison, they
provide sufficient evidence.
CodeBroker supports two retrieval mechanisms, Okapi and LSA, to
locate components that are conceptually similar to queries extracted
from doc comments. The two mechanisms can be combined by assigning
different weights to the similarity values computed by each. I ran the
system using LSA only, Okapi only, and the average of both (Mixed). The
average precision values at different recall values are shown in
Table 8.1. Figure 8.1 shows the recall-precision curves, which are
constructed by plotting precision values against recall values.
Superimposing the recall-precision curves of different retrieval
mechanisms in the same graph shows which mechanism is superior: in
general, the curve closest to the upper right-hand corner indicates the
best performance [Salton and McGill, 1983].
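The weighted combination can be sketched as follows. This section does not say how the two similarity values are scaled before they are combined; the sketch assumes a per-query min-max normalization to [0, 1], because raw Okapi scores are unbounded whereas LSA similarities are typically cosine values. The functions min_max and combine_scores are hypothetical names for illustration, not CodeBroker's code.

```python
def min_max(scores):
    """Rescale one mechanism's scores for a query to [0, 1] (assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {c: (s - lo) / span if span else 0.0 for c, s in scores.items()}


def combine_scores(okapi, lsa, w_lsa=0.5):
    """Rank components by a weighted sum of normalized Okapi and LSA scores.

    okapi, lsa: dicts mapping a component id to that mechanism's similarity value.
    w_lsa: weight given to LSA; 0.0 is Okapi only, 1.0 is LSA only,
           and 0.5 is the plain average (the "Mixed" setting).
    """
    okapi_n, lsa_n = min_max(okapi), min_max(lsa)
    components = set(okapi_n) | set(lsa_n)
    # Assumption: a component missing from one mechanism's list scores 0.0 there.
    combined = {c: w_lsa * lsa_n.get(c, 0.0) + (1.0 - w_lsa) * okapi_n.get(c, 0.0)
                for c in components}
    # Highest combined score first; ties broken by id so the order is deterministic.
    return sorted(components, key=lambda c: (-combined[c], c))


okapi = {"A": 12.0, "B": 6.0, "C": 0.0}  # hypothetical raw Okapi scores
lsa = {"A": 0.1, "B": 0.9, "C": 0.5}     # hypothetical LSA similarities
print(combine_scores(okapi, lsa, w_lsa=0.5))
```

With these made-up scores, Okapi alone ranks A first and LSA alone ranks B first, while the 0.5 average ranks B, then A, then C, so the Mixed ranking can differ from both of its inputs.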
Table 8.1:
Average precision at different recall values for LSA, Mixed (average of
LSA and Okapi), and Okapi

| Recall | LSA    | Mixed  | Okapi  |
|--------|--------|--------|--------|
| 0%     | 35.77% | 32.80% | 45.82% |
| 10%    | 31.86% | 32.80% | 45.82% |
| 20%    | 30.89% | 32.75% | 45.82% |
| 30%    | 25.62% | 28.63% | 41.20% |
| 40%    | 20.62% | 24.66% | 41.01% |
| 50%    | 20.44% | 24.66% | 40.74% |
| 60%    | 13.86% | 21.95% | 37.46% |
| 70%    | 13.82% | 20.70% | 37.46% |
| 80%    | 13.82% | 20.63% | 32.71% |
| 90%    | 12.32% | 19.90% | 32.19% |
| 100%   | 12.32% | 17.86% | 29.43% |
Figure 8.1: Recall-precision curves
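Table 8.1 reports average precision at eleven recall levels, from 0% to 100%. The text does not state how precision was obtained at each exact recall level. A common procedure, sketched below in Python as an assumption rather than as the method used for Table 8.1, is interpolation: the precision at recall level r is the highest precision observed at any rank where recall is at least r, and the per-query values are then averaged level by level. The non-increasing columns of Table 8.1 are consistent with, but do not confirm, this procedure. The function names are hypothetical.

```python
RECALL_LEVELS = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0


def interpolated_precision(ranking, relevant):
    """11-point interpolated precision for one query.

    ranking:  component ids in the order the system returned them
    relevant: ids judged relevant for the query
    Precision at recall level r is the maximum precision at any rank whose
    recall is >= r, or 0.0 if that recall is never reached.
    """
    relevant = set(relevant)
    points = []  # (recall, precision) measured at each relevant hit
    hits = 0
    for rank, comp in enumerate(ranking, start=1):
        if comp in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    # Precision only falls between hits, so checking the hit points suffices.
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in RECALL_LEVELS]


def average_precision_at_levels(runs):
    """Average interpolated precision over queries; runs is a list of (ranking, relevant)."""
    per_query = [interpolated_precision(ranking, rel) for ranking, rel in runs]
    return [sum(column) / len(per_query) for column in zip(*per_query)]


# Hypothetical query: relevant components A, B, C found at ranks 1, 3, and 5.
print(interpolated_precision(["A", "X", "B", "Y", "C"], {"A", "B", "C"}))
```

For this made-up ranking the sketch gives 1.0 at recall levels 0% to 30%, 2/3 from 40% to 60%, and 0.6 from 70% to 100%. Plotting such averaged values against the recall levels produces a recall-precision curve of the kind shown in Figure 8.1.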
Table 8.1 and Figure 8.1 show that Okapi performs better than both
LSA and the Mixed combination (the average of both) at every recall level.
This result is somewhat unexpected because other researchers have
reported that LSA performs better than other retrieval
methods [Deerwester et al., 1990]. The unexpectedly low performance of
LSA might be caused by the insufficient training documents used in
CodeBroker, because LSA performance depends largely on the quality and
volume of its training documents.
Because the evaluation shows that Okapi has the best performance,
CodeBroker uses Okapi only by default. Okapi is also favored over LSA
because with Okapi the system can find the terms that contribute most to
the relevance between a component and a query. Those terms can be shown
to programmers when they move the mouse cursor over the description of a
component in the RCI-display, helping them refine their queries
(Figure 7.4, Section 7.4.1). In LSA, by contrast, the reason a component
is judged relevant is obscure, because similarity is computed in the
reduced semantic space, whose dimensions do not correspond to individual
terms.
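Okapi-style scoring lends itself to this kind of explanation because a component's score is a sum of per-term weights, so each matching query term's share can be reported separately. The Python sketch below uses the widely used BM25 form of the Okapi weighting with the common defaults k1 = 1.2 and b = 0.75 and a non-negative idf variant, log(1 + (N - n + 0.5) / (n + 0.5)); these choices, the omission of query-term frequency, and the names bm25_term_contributions and top_terms are assumptions, since this section does not give CodeBroker's exact formula or parameters.

```python
import math
from collections import Counter


def bm25_term_contributions(query_terms, doc_terms, doc_freq, n_docs, avg_len,
                            k1=1.2, b=0.75):
    """Per-term BM25 contributions of one component's description to a query.

    doc_freq: dict mapping a term to the number of component descriptions containing it
    Returns a dict term -> contribution; the component's score is their sum.
    k1 and b are assumed defaults, not CodeBroker's documented values.
    """
    tf = Counter(doc_terms)
    # Length normalization: longer descriptions need more occurrences to score as high.
    length_norm = k1 * (1 - b + b * len(doc_terms) / avg_len)
    contributions = {}
    for term in set(query_terms):  # query-term frequency ignored (assumption)
        f = tf.get(term, 0)
        if f == 0 or term not in doc_freq:
            continue
        n_t = doc_freq[term]
        idf = math.log(1 + (n_docs - n_t + 0.5) / (n_t + 0.5))  # non-negative variant
        contributions[term] = idf * f * (k1 + 1) / (f + length_norm)
    return contributions


def top_terms(contributions, k=3):
    """The k terms that contribute most, e.g. to display as a hint to the programmer."""
    return sorted(contributions, key=lambda t: (-contributions[t], t))[:k]


# Hypothetical query, component description, and collection statistics.
c = bm25_term_contributions(["convert", "string", "integer"],
                            ["parse", "string", "integer", "integer", "value"],
                            {"convert": 40, "string": 30, "integer": 5, "parse": 10, "value": 50},
                            n_docs=100, avg_len=5.0)
print(top_terms(c))
```

In this made-up example "integer" outranks "string" because it is rarer in the collection and occurs twice in the description, while "convert" contributes nothing because the description does not contain it.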
Ph.D. Dissertation by Yunwen Ye, April 20, 2001, Department of Computer Science, University of Colorado