

7.5 The Retrieval-by-Reformulation Mechanism

To compensate for the incompleteness of reuse queries, CodeBroker supports two forms of retrieval-by-reformulation (Section 3.4.3): direct manipulation and query refinement. After examining the components initially delivered by CodeBroker, programmers can either refine the query to make it more complete and precise or directly manipulate the delivered components by removing those that are apparently irrelevant.


7.5.1 Direct Manipulation

Direct manipulation of the delivered components serves two purposes: to make choosing components easier and to augment the discourse model or the user model. Each component in the RCI-display is associated with a floating menu, the Skip Components Menu (Figure 7.8), which pops up when the component name is right-clicked.

Figure 7.8: The Skip Components Menu


The Skip Components Menu allows programmers to remove components that are apparently unrelated to their current development task so that they can find the needed information more easily. The first item of the menu is the method component itself; the second, its class; and the third, its package. To remove the method, or all of the components in the class or the package, from the RCI-display, programmers choose the corresponding item. Each item offers three commands: This Buffer Only, This Session Only, and All Sessions.

When the command This Buffer Only is chosen, the corresponding components are removed from the RCI-display. When the command This Session Only is chosen, the components are not only removed from the RCI-display but also added to the discourse model, and they will not be delivered again in this development session. The discourse model is empty when a development session starts, and it grows incrementally as programmers interact with the system. When the command All Sessions is chosen, the components are removed from the current RCI-display and added to the user model. Components added to user models through the Skip Components Menu do not have the use time field (see the last line in Figure 7.6).
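The three commands can be sketched as follows. This is a minimal illustration under stated assumptions, not CodeBroker's actual implementation: the names `SkipScope`, `SkipState`, and `apply_skip` are hypothetical, components are represented simply as fully qualified names, and a class or package entry is assumed to cover every component whose name it prefixes.

```python
from dataclasses import dataclass, field
from enum import Enum


class SkipScope(Enum):
    """The three commands offered by each item of the Skip Components Menu."""
    THIS_BUFFER_ONLY = "This Buffer Only"
    THIS_SESSION_ONLY = "This Session Only"
    ALL_SESSIONS = "All Sessions"


@dataclass
class SkipState:
    # Hypothetical container: fully qualified names currently in the RCI-display.
    display: list[str]
    # Empty when a development session starts; grows through skip commands.
    discourse_model: set[str] = field(default_factory=set)
    # Assumed to persist across sessions (persistence itself is not modeled here).
    user_model: set[str] = field(default_factory=set)


def covers(entry: str, name: str) -> bool:
    """True if `entry` (method, class, or package) covers component `name`."""
    return name == entry or name.startswith(entry + ".")


def apply_skip(state: SkipState, target: str, scope: SkipScope) -> None:
    """Remove `target` from the display and record it according to `scope`."""
    # Every command removes the covered components from the current display.
    state.display = [n for n in state.display if not covers(target, n)]
    if scope is SkipScope.THIS_SESSION_ONLY:
        state.discourse_model.add(target)   # suppressed for the rest of the session
    elif scope is SkipScope.ALL_SESSIONS:
        state.user_model.add(target)        # suppressed in later sessions as well
```

In this sketch, This Buffer Only changes nothing but the display, so the same components may be delivered again later.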

With this design, the system can obtain information to evolve discourse models and user models without imposing much extra work on programmers, who also benefit immediately: once the apparently irrelevant components are removed, choosing the needed ones becomes easier. For example, in Figure 7.9(a), in response to the doc comment, CodeBroker delivers some components (No. 1 through No. 4) belonging to the class java.awt.CardLayout (a GUI class) because of the term ``card.'' However, the current task is not related to the class java.awt.CardLayout, so the programmer can remove it through the direct manipulation interface. This manipulation brings the needed component randomShuffle, previously obscured, to the salient fourth place (Figure 7.9(b)). If the programmer chooses the This Session Only command, the fact that the programmer is not interested in the class java.awt.CardLayout is added to the discourse model at the same time. No components from that class will then be delivered later in this development session, even if the programmer uses the word ``card'' in doc comments again, which is quite likely because the programmer is developing programs about card shuffling.
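The effect on later deliveries can be sketched as a filter over a ranked list. This is an illustration under assumptions, not the system's code: `deliver` and `is_skipped` are hypothetical names, the entries called `example.*` are placeholders, and the package of randomShuffle is invented because the text does not give it.

```python
def is_skipped(name: str, skipped: set[str]) -> bool:
    """True if `name` equals, or lies inside, any skipped method, class, or package."""
    return any(name == s or name.startswith(s + ".") for s in skipped)


def deliver(ranked: list[str], discourse_model: set[str], user_model: set[str]) -> list[str]:
    """Drop skipped components from a ranked list, preserving the ranking order."""
    skipped = discourse_model | user_model
    return [name for name in ranked if not is_skipped(name, skipped)]


# Hypothetical ranked list: four CardLayout methods outrank the needed component.
ranked = [
    "java.awt.CardLayout.first",
    "java.awt.CardLayout.next",
    "java.awt.CardLayout.previous",
    "java.awt.CardLayout.last",
    "example.pkg.ClassA.methodA",        # placeholder
    "example.pkg.ClassB.methodB",        # placeholder
    "example.pkg.ClassC.methodC",        # placeholder
    "example.cards.Deck.randomShuffle",  # package and class are assumed
]

# The programmer skipped the class with This Session Only.
shown = deliver(ranked, discourse_model={"java.awt.CardLayout"}, user_model=set())
position = shown.index("example.cards.Deck.randomShuffle") + 1  # 1-based rank
```

With these placeholder data, randomShuffle moves from eighth to fourth place, and any later delivery filtered through the same discourse model omits the CardLayout methods.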

Figure 7.9: The Direct Manipulation interface

Parts (a) and (b) show the delivered components before and after direct manipulation, respectively.


7.5.2 Query Refinement

Query refinement is invoked by choosing the Query Refinement command in the same pop-up menu or by typing it directly as an Emacs command. A buffer (Figure 7.10) then appears in which programmers can refine the automatically extracted reuse queries and start another round of component location.

Figure 7.10: The Query Refinement interface

The components in the RCI-displ... ..., depending on how the programmer wants to restructure his or her data types.

Programmers can refine the concept query by choosing more appropriate terms, or they can modify the constraint query to make it more or less restrictive, depending on the situation. To narrow the search range for relevant components, the query refinement interface also provides two additional fields:
Filtered Components: for specifying classes or packages that are not of interest, and
Interested Components: for instructing the system to return components from the specified classes or packages only.

Component repository systems could provide a mechanism that lets programmers specify either of these fields before they first use the system. However, programmers who do not know the structure of the repository well enough may not be able to specify these two fields. Even a system-guided dialog mechanism for soliciting user specifications, as explored in the KID system [Nakakoji, 1993], is not suitable for repository systems because component repositories are often very large and obtaining a meaningful specification would take a long time. The CodeBroker system does not assume that programmers know the repository structure well enough, and it solicits user input only after its delivered components have acquainted programmers with the structure of the component repository, especially the structure of the part of the repository that might be relevant to the task at hand.

7.5.3 Comparing Retrieval-by-Reformulation and Relevance Feedback

The retrieval-by-reformulation mechanism in CodeBroker is a more comprehensive approach to improving retrieval performance than the relevance feedback mechanism used in many information retrieval systems [Buckley et al., 1994]. By adjusting the terms used in a query through query expansion or other techniques, relevance feedback in information retrieval systems focuses mainly on improving the retrieval process itself. The focus of retrieval-by-reformulation, in contrast, is to improve the relevance of information to the working context of programmers, not to the query per se. Direct manipulation tries to establish a shared understanding of the context between the component repository system and the programmer. It uses programmers' previous interactions with the system as filters for later deliveries. Although it does not affect what the Fetcher agent returns, it does modify what gets shown. The system also takes advantage of the fact that software components are organized into a hierarchy (packages, classes, and methods) according to their application domains, which lets programmers limit the retrieval range to their interests.


Ph.D. Dissertation by Yunwen Ye, April 20, 2001, Department of Computer Science, University of Colorado