
7.4 Presenter

Retrieved components are shown to programmers by the Presenter agent in the RCI-display in decreasing order of similarity value.


7.4.1 Layered Information Presentation

Information about components is presented to programmers in different layers of abstraction, for two reasons. First, because the RCI-display has only a limited size (otherwise, it would take up too much of programmers' working space), presented information should be condensed so that the display accommodates as many components as possible for programmers to choose from. Second, programmers evaluate the usefulness of information in two stages: information discernment, whereby they roughly determine whether the component is relevant, and detailed evaluation, whereby they study the component thoroughly (see Section 3.4.3). Therefore, the RCI-display should contain only the essential information on components, and more detailed information should be displayed when programmers show interest in a particular component [Ye, 2001b].

The CodeBroker system presents information on components to programmers in three layers. The first layer is the RCI-display, in which each component is accompanied by its rank of similarity, its similarity value, its name, and a short description (Figures 7.2 and 7.3). The presentation of the second layer of information is triggered by the mouse movements of programmers. Component names and short descriptions in the RCI-display are mouse-sensitive. When the mouse cursor is moved over the component name, the signature of the component is shown in the mini-buffer (see the last lines in Figures 7.2 and 7.3). When the mouse cursor is over the short description, the terms contributing to the concept similarity between the component and the concept query are shown in the mini-buffer (Figure 7.4) to reveal why this component was retrieved and to help programmers refine their queries if necessary. The third layer of information, the most complete description of a component, is shown in an external HTML browser, such as Netscape Navigator. When the programmer left-clicks on the component, the full Javadoc documentation for the component is displayed in the browser. The HTML tag extracted at the time of indexing is used so that the browser opens at the exact location of the component description.

Figure 7.4: Presenting more information triggered by mouse movement

[Image: figs/mouseSensitiveKeywords.eps]
The mini-buffer shows the keywords (ter... ...ity between the first component and the reuse query, which is not shown here.

7.4.2 Larger Context-Sensitive Presentation

The task model, created by the Listener agent from doc comments and signatures, describes the immediate programming task, namely, the module that the programmer is going to develop. However, programmers often do not give a complete description of their tasks in doc comments. Furthermore, a module is only one part of the whole development task, and its functionality is deeply connected with other modules that have been developed so far. As mentioned in Section 5.2.3, if the component repository system knows what the whole development task is and the larger context in which the current module is being developed, it can provide more appropriate information on reusable components.

CodeBroker captures this larger context in a discourse model (Section 5.2.3) that represents the previous interactions between the programmer and the system in one development session. The discourse model is used by Presenter as a filter to remove components in which the programmer is not interested in the current development session, although they are retrieved by Fetcher based on incomplete task models.

Java component repositories are organized hierarchically according to packages and classes, and packages and classes are often designed for particular application domains. For most programming tasks, only a part of the repository is involved. CodeBroker uses negative discourse models, which capture the part of the repository that is not of interest to programmers, for two reasons: discourse models are evolved incrementally by programmers during their interactions with the CodeBroker system, and in many cases it takes less effort for programmers to identify apparently irrelevant components. Section 7.5 explains in detail how the discourse model is incrementally augmented by programmers.

A discourse model in CodeBroker is in the format of a Lisp association list (Figure 7.5). It specifies packages or classes in which the programmer has no interest for the current development session. Before components retrieved by Fetcher are delivered to programmers, Presenter compares each component against the discourse model, and if the component belongs to a class or a package in the discourse model, it is removed.

Figure 7.5: An example discourse model

[Image: figs/discourseModel.eps]
A discourse model is a Lisp list of ite... ...the whole package or the whole class should not be delivered in this session.

Discourse models also reduce the delivery of irrelevant components caused by polysemy, a difficult problem for any information retrieval system. They do so by limiting the search domain, because polysemous words often have different meanings in totally different domains. For example, if the programming task is to shuffle a deck of cards, the programmer may use the word "card" in doc comments. That would make the system deliver components from the class java.awt.CardLayout, a GUI (Graphical User Interface) class in which "card" means a graphical element. If the current development project does not involve interface building, this whole class is irrelevant. The programmer can add the class (java.awt.CardLayout) or even the whole package (java.awt) to the discourse model to prevent components belonging to it from being delivered in this development session.
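The filtering step described in this subsection can be illustrated with a minimal Java sketch. It is not CodeBroker's implementation (the actual discourse model is a Lisp association list). The class DiscourseFilter, its fields, and the sample components are hypothetical names chosen for illustration, and the sketch assumes that an excluded package matches only components directly in that package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of a discourse-model filter; not CodeBroker's actual code. */
final class DiscourseFilter {
    /** A retrieved component (illustrative fields only). */
    static final class Component {
        final String pkg, cls, method;
        Component(String pkg, String cls, String method) {
            this.pkg = pkg; this.cls = cls; this.method = method;
        }
        @Override public String toString() { return pkg + "." + cls + "." + method; }
    }

    // Packages and fully qualified classes excluded for the current session.
    private final Set<String> excludedPackages;
    private final Set<String> excludedClasses;

    DiscourseFilter(Set<String> excludedPackages, Set<String> excludedClasses) {
        this.excludedPackages = excludedPackages;
        this.excludedClasses = excludedClasses;
    }

    /** True if the component belongs to an excluded package or class (exact match assumed). */
    boolean isExcluded(Component c) {
        return excludedPackages.contains(c.pkg)
            || excludedClasses.contains(c.pkg + "." + c.cls);
    }

    /** Drops excluded components while preserving the retrieval (similarity) order. */
    List<Component> filter(List<Component> retrieved) {
        List<Component> kept = new ArrayList<>();
        for (Component c : retrieved) {
            if (!isExcluded(c)) kept.add(c);
        }
        return kept;
    }

    public static void main(String[] args) {
        // The programmer has excluded java.awt.CardLayout for this session.
        DiscourseFilter filter = new DiscourseFilter(Set.of(), Set.of("java.awt.CardLayout"));
        List<Component> retrieved = List.of(
            new Component("java.awt", "CardLayout", "show"),
            new Component("java.util", "Collections", "shuffle"));
        System.out.println(filter.filter(retrieved)); // only the java.util component remains
    }
}
```

Keeping packages and classes in separate sets makes each lookup a constant-time check, so the filter adds little cost per retrieved component.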


7.4.3 Personalized Component Presentation

The goal of active delivery in CodeBroker is to inform programmers of those components that fall into L3 (reuse-by-anticipation) and the area of (L4 - L3) (information islands) in Figure 4.1 (Section 4.1.2). Delivery of components from L2 (reuse-by-recall) and especially from L1 (reuse-by-memory) might be of little use, and it risks making the unknown but really needed components less salient. Therefore, the system needs to know what components the programmer already knows. CodeBroker uses user models (Section 5.3.1) to represent programmers' knowledge about the component repository to ensure user-specific delivery of components. User models in CodeBroker are both adaptable and adaptive [Thomas, 1996; Fischer and Ye, 2001].

User models in CodeBroker contain a list of components known to the programmer, namely, those components from L1 and L2. An example user model is shown in Figure 7.6. Each item in the list is a package, a class, or a method. Each component retrieved from the component repository is looked up in the user model before it is delivered. If a method component matches a method in the user model, and the user model indicates that the programmer has used it more than three times (this number is adjustable by the programmer), the system assumes the programmer knows it already and removes it from the delivery. If the method has no use time, the method was added by the programmer, who has declared that he or she knows it very well and does not want it delivered. If the class of the method (with no method list in the user model) or the package of the method (with no class list) is included in the user model, the method is removed as well.
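The lookup rules above can be sketched in Java as follows. This is an illustration under stated assumptions, not CodeBroker's data structure (the actual user model is a Lisp list). The class UserModel and its method names are hypothetical, and the sketch keeps explicitly declared methods in a separate set rather than as entries with no use time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a user model lookup; names are illustrative. */
final class UserModel {
    private final Set<String> knownPackages = new HashSet<>();   // whole package known
    private final Set<String> knownClasses = new HashSet<>();    // whole class known
    private final Set<String> declaredMethods = new HashSet<>(); // added by the programmer
    private final Map<String, List<Long>> useTimes = new HashMap<>();
    private final int useThreshold; // three in the text; adjustable by the programmer

    UserModel(int useThreshold) { this.useThreshold = useThreshold; }

    private static String key(String pkg, String cls, String method) {
        return pkg + "." + cls + "#" + method;
    }

    void declareKnownPackage(String pkg) { knownPackages.add(pkg); }

    void declareKnownClass(String pkg, String cls) { knownClasses.add(pkg + "." + cls); }

    void declareKnownMethod(String pkg, String cls, String method) {
        declaredMethods.add(key(pkg, cls, method));
    }

    /** Records one observed use of a method at the given time. */
    void recordUse(String pkg, String cls, String method, long time) {
        useTimes.computeIfAbsent(key(pkg, cls, method), k -> new ArrayList<>()).add(time);
    }

    /** True if the method should be removed from the delivery. */
    boolean isKnown(String pkg, String cls, String method) {
        String k = key(pkg, cls, method);
        List<Long> times = useTimes.get(k);
        return knownPackages.contains(pkg)
            || knownClasses.contains(pkg + "." + cls)
            || declaredMethods.contains(k)
            || (times != null && times.size() > useThreshold);
    }

    public static void main(String[] args) {
        UserModel model = new UserModel(3);
        for (long t = 1; t <= 4; t++) {
            model.recordUse("java.util", "Vector", "addElement", t);
        }
        System.out.println(model.isKnown("java.util", "Vector", "addElement")); // four uses
        System.out.println(model.isKnown("java.util", "Vector", "removeElement")); // never used
    }
}
```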

Figure 7.6: An example user model

[Image: figs/userModel.eps]
A user model is a Lisp list of items wi... ...e} or method-name areas mean the whole package or class is included.

7.4.3.1 Adaptable User Models

Programmers can explicitly update their user models through interactions with CodeBroker. If they find a known component is delivered, they can invoke the retrieval-by-reformulation interface (see Section 7.5 for more details) to tell the system that they know the component already.


7.4.3.2 Adaptive User Models

Due to the large volume of components and the constantly evolving nature of repositories, it is a time-consuming task for programmers to maintain their user models. To reduce the difficulty of maintaining user models, user models in CodeBroker are also adaptive. As mentioned in Section 7.2.3, the Listener agent continuously monitors programmers' input in the programming editor. In addition to modeling the programming task, Listener detects what components are used by a programmer and updates user models through the following three heuristic steps.

Figure 7.7: An illustrative program for adaptive user modeling

[Image: figs/adaptiveModeling.eps]
This program is excerpted from a user e... ...s slightly modified. The line number is added to make the explanation easier.

Step 1: Extracting Method Names

A method invocation in a Java program is followed by a pair of parentheses between which parameters are passed. When a left parenthesis ( is entered in the editor, Listener scans backward to extract the identifier preceding the left parenthesis. For example, in Figure 7.7, when the left parenthesis (where the cursor is placed) is entered, Listener extracts the identifier addElement. After that, Listener scans back further to determine whether this identifier is a legal method name using the following rules.
  1. If the identifier is a Java keyword, such as the for in line 8, it is not a method name.
  2. If the identifier follows another word or a right square bracket, it is not a method invocation either; instead, it is the name of a new method developed by the programmer, such as the findDuplicates in line 5.
  3. If the identifier follows a dot (.), it is a method invocation, and the identifier is a legal method name. Listener scans further back to extract all characters preceding the dot until a white space is met. If these characters constitute a legal class name in Java, the method is a class method instead of an object method; otherwise, these characters are recognized by Listener as a variable name which will be used to find what class the method belongs to (described in Step 2).
  4. If the identifier follows termination characters of a Java statement, such as a semicolon (;), a left brace ({), or a right parenthesis, it is a class method name.
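The four rules above can be sketched as a small backward scanner. This is a hedged illustration rather than Listener's actual code: the class MethodNameExtractor, the Kind labels, the short keyword list, and the classNames parameter (standing in for whatever Listener uses to decide that a qualifier is a legal class name) are all assumptions.

```java
import java.util.Set;

/** Hypothetical sketch of Step 1; applies the four rules literally. */
final class MethodNameExtractor {
    enum Kind { NOT_A_METHOD, DECLARATION, OBJECT_CALL, CLASS_CALL, UNQUALIFIED_CALL, UNKNOWN }

    static final class Result {
        final Kind kind;
        final String name;      // identifier preceding the left parenthesis
        final String qualifier; // text preceding the dot, or null
        Result(Kind kind, String name, String qualifier) {
            this.kind = kind; this.name = name; this.qualifier = qualifier;
        }
    }

    // A few keywords that can directly precede "("; an illustrative subset only.
    private static final Set<String> KEYWORDS =
        Set.of("for", "if", "while", "switch", "catch", "synchronized");

    /** Classifies the identifier before the '(' at index paren in text. */
    static Result classify(String text, int paren, Set<String> classNames) {
        int end = paren;
        while (end > 0 && Character.isWhitespace(text.charAt(end - 1))) end--;
        int start = end;
        while (start > 0 && Character.isJavaIdentifierPart(text.charAt(start - 1))) start--;
        String name = text.substring(start, end);
        if (name.isEmpty() || KEYWORDS.contains(name)) {
            return new Result(Kind.NOT_A_METHOD, name, null);            // rule 1
        }
        int j = start - 1;
        while (j >= 0 && Character.isWhitespace(text.charAt(j))) j--;
        if (j < 0) return new Result(Kind.UNKNOWN, name, null);
        char prev = text.charAt(j);
        if (prev == '.') {                                               // rule 3
            int q = j;
            while (q > 0 && !Character.isWhitespace(text.charAt(q - 1))) q--;
            String qualifier = text.substring(q, j);
            Kind kind = classNames.contains(qualifier) ? Kind.CLASS_CALL : Kind.OBJECT_CALL;
            return new Result(kind, name, qualifier);
        }
        if (Character.isJavaIdentifierPart(prev) || prev == ']') {
            return new Result(Kind.DECLARATION, name, null);             // rule 2
        }
        if (prev == ';' || prev == '{' || prev == ')') {
            return new Result(Kind.UNQUALIFIED_CALL, name, null);        // rule 4
        }
        return new Result(Kind.UNKNOWN, name, null);                     // not covered by the rules
    }

    public static void main(String[] args) {
        String line = "    tmpVec.addElement(";
        Result r = classify(line, line.length() - 1, Set.of("Math"));
        System.out.println(r.kind + " " + r.name + " via " + r.qualifier);
    }
}
```

Because the sketch follows the rules as written, an identifier that follows any other word, as in `new Vector(`, is treated like a declaration. Cases that none of the four rules covers, such as an identifier following `=`, are reported as UNKNOWN.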

Step 2: Finding the Class of an Object Method

The class name of a variable can be extracted from the variable declaration statement. A variable declaration statement is recognized by Listener based on the following BNF syntax of Java [Gosling et al., 1996]:
VariableDeclarationStatement := LocalVariableDeclaration; 
LocalVariableDeclaration := final_opt Type VariableDeclarators
VariableDeclarators := VariableDeclarator |                        
                       VariableDeclarators, VariableDeclarator
VariableDeclarator := VariableDeclaratorId |
                      VariableDeclaratorId = VariableInitializer
VariableDeclaratorId := Identifier | VariableDeclaratorId [ ]
VariableInitializer := Expression | ArrayInitializer
Type := PrimitiveType | ReferenceType
ReferenceType := TypeName | Type []
TypeName := Identifier | PackageOrTypeName.Identifier
PackageOrTypeName := Identifier | PackageOrTypeName.Identifier
Each time a new variable is declared by a programmer, Listener stores the variable name and its class name in an association list. When an object method name and its variable name are extracted, Listener looks up the variable-class association list to find the class to which the method belongs. For example, in Figure 7.7, the variable name for the method addElement (line 9) is tmpVec, which is declared as a Vector (line 6).
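A simplified version of this variable-class association list can be sketched in Java. The sketch approximates the grammar above with regular expressions instead of a parser, handles one statement at a time, and skips primitive types because they have no class. The class VariableClassTable, its methods, and the sample declaration in main are hypothetical; the exact declaration in Figure 7.7 is not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of Step 2: a regex approximation, not a full parser. */
final class VariableClassTable {
    // final_opt Type VariableDeclarators ;
    private static final Pattern DECL = Pattern.compile(
        "\\s*(?:final\\s+)?([A-Za-z_$][\\w$.]*)((?:\\s*\\[\\s*\\])*)\\s+(\\S.*?)\\s*;\\s*");
    // VariableDeclaratorId, optionally followed by "= VariableInitializer"
    private static final Pattern DECLARATOR = Pattern.compile(
        "([A-Za-z_$][\\w$]*)((?:\\s*\\[\\s*\\])*)\\s*(?:=.*)?");
    // Leading words that cannot be a class name (illustrative list).
    private static final Set<String> NOT_A_CLASS = Set.of(
        "return", "throw", "new", "import", "package", "else",
        "boolean", "byte", "char", "short", "int", "long", "float", "double");

    private final Map<String, String> classOf = new HashMap<>();

    /** Records variable-class pairs if the statement looks like a declaration. */
    void observe(String statement) {
        Matcher m = DECL.matcher(statement);
        if (!m.matches() || NOT_A_CLASS.contains(m.group(1))) return;
        for (String part : splitTopLevel(m.group(3))) {
            Matcher d = DECLARATOR.matcher(part.trim());
            if (!d.matches()) return; // not a declaration after all
            String dims = (m.group(2) + d.group(2)).replaceAll("\\s+", "");
            classOf.put(d.group(1), m.group(1) + dims);
        }
    }

    /** Returns the recorded class name of a variable, or null if unknown. */
    String lookup(String variable) { return classOf.get(variable); }

    /** Splits on commas that are not nested inside (), {}, or []. */
    private static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        int depth = 0, last = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(' || c == '{' || c == '[') depth++;
            else if (c == ')' || c == '}' || c == ']') depth--;
            else if (c == ',' && depth == 0) { parts.add(s.substring(last, i)); last = i + 1; }
        }
        parts.add(s.substring(last));
        return parts;
    }

    public static void main(String[] args) {
        VariableClassTable table = new VariableClassTable();
        table.observe("    Vector tmpVec = new Vector();"); // assumed form of the declaration
        System.out.println(table.lookup("tmpVec"));
    }
}
```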

Step 3: Finding the Package of a Class

Because not all class names in a Java program include their package names, and class names are not unique, Listener needs to find the package to which the class belongs. Listener first finds all packages that include the class from the list of indexed components, created by CodeIndexer at the time of indexing. If only one package is found, it is assumed to be the package of the class. If several packages are found, Listener picks the package imported by the programmer in the package import statements (lines 1 and 2 in Figure 7.7). Whenever a package import statement is entered, Listener recognizes it based on its BNF syntax shown below:
  ImportDeclaration := import TypeName; | 
                       import PackageOrTypeName.*;
and creates a list of imported packages and classes. If the package of a class is unique in the imported package list, then the imported package becomes the package of the class; otherwise, the programmer has probably made a mistake, and the extracted method is ignored.
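The package lookup in Step 3 can be sketched as follows. The class PackageResolver and the packagesByClass index (standing in for the list created by CodeIndexer) are hypothetical, and the sample index in main is illustrative data rather than the contents of the real repository.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of Step 3; names and index layout are assumptions. */
final class PackageResolver {
    // Matches "import a.b.C;" and "import a.b.*;".
    private static final Pattern IMPORT =
        Pattern.compile("\\s*import\\s+([\\w$.]+?)(\\.\\*)?\\s*;\\s*");

    private final Map<String, Set<String>> packagesByClass; // class name -> packages containing it
    private final Set<String> importedPackages = new HashSet<>();
    private final Set<String> importedClasses = new HashSet<>();

    PackageResolver(Map<String, Set<String>> packagesByClass) {
        this.packagesByClass = packagesByClass;
    }

    /** Records an import statement; other statements are ignored. */
    void observeImport(String statement) {
        Matcher m = IMPORT.matcher(statement);
        if (!m.matches()) return;
        if (m.group(2) != null) importedPackages.add(m.group(1)); // import pkg.*;
        else importedClasses.add(m.group(1));                     // import pkg.Class;
    }

    /** Returns the package of the class, or empty if it cannot be determined. */
    Optional<String> resolve(String cls) {
        Set<String> candidates = packagesByClass.getOrDefault(cls, Set.of());
        if (candidates.size() == 1) return Optional.of(candidates.iterator().next());
        List<String> imported = new ArrayList<>();
        for (String pkg : candidates) {
            if (importedPackages.contains(pkg) || importedClasses.contains(pkg + "." + cls)) {
                imported.add(pkg);
            }
        }
        // Zero or several imported candidates: the extracted method is ignored.
        return imported.size() == 1 ? Optional.of(imported.get(0)) : Optional.empty();
    }

    public static void main(String[] args) {
        PackageResolver resolver = new PackageResolver(Map.of(
            "Vector", Set.of("java.util"),
            "List", Set.of("java.util", "java.awt")));
        resolver.observeImport("import java.util.*;");
        System.out.println(resolver.resolve("Vector")); // single candidate package
        System.out.println(resolver.resolve("List"));   // disambiguated by the import
    }
}
```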

For ease of explanation, the three steps were described in the reverse order of their execution. Listener creates the list of imported packages first, then creates the variable-class list, and finally extracts method names. When Listener successfully extracts the method name and determines its class name and package name, it adds the component, including its class and package, with the current time as use time, to the user model. Listener adds only methods to the user model; it does not add a class or a package because the use of a class or a package does not mean that the developer knows the whole class or package.

7.4.3.3 Initializing User Models

Initial user models are created by analyzing the Java programs that programmers have written so far. CodeBroker analyzes Java programs to extract each method used in the same way as adaptive user modeling, except that it is a batch process.


Ph.D. Dissertation by Yunwen Ye, April 20, 2001, Department of Computer Science, University of Colorado