8.2 Empirical Evaluations of the CodeBroker System

To understand the effectiveness of the CodeBroker system in supporting reuse-within-development, formal evaluation experiments have been conducted. The structure of the experiments is described in this section, and findings and conclusions are presented in the next four sections.

8.2.1 Subjects of Experiments

Subjects were recruited from among the undergraduate and graduate students of the Computer Science Department. As mentioned in Chapter 2, programming involves a wide range of knowledge. Because the design goal of CodeBroker is to provide programmers with knowledge about reusable components, only students who already had extensive programming knowledge and experience were recruited; this minimized the influence of other factors that make programming difficult in general. Because CodeBroker is developed as an add-on to an existing programming environment, Emacs on Unix, a basic working knowledge of Emacs and Unix was also required so that subjects could easily learn to operate the system and the experiments could focus on the support the system provides.

Five subjects voluntarily participated in the evaluation experiments. All but one had extensive knowledge of other programming languages, such as C and C++. Two had worked as professional programmers. Three were regular contributors to several Open Source projects. Their expertise in Java programming ranged from medium to expert level. All of them knew the syntax of Java very well; the differences in their expertise came from the range of reusable components (classes and methods in API libraries) they knew. Table 8.2 summarizes their background knowledge about programming in general and Java in particular. In that table, small projects (abbreviated as S) are projects similar to semester projects, requiring 1 or 2 man-months; medium projects (abbreviated as M) are projects requiring 3 to 5 man-months; large projects (abbreviated as L) are projects requiring more than 6 man-months.


Table 8.2: Programming knowledge and expertise of subjects

Subject                                                            | S1                      | S2                      | S3         | S4         | S5
Years of general programming                                       | 3 or 4                  | 5 or 6                  | 8          | 10+        | 10+
Programming experience in general (measured in number of projects) | 3S, 1L                  | 10S                     | 7M, 1L     | 10+SM, 2L  | 10+L
Current major programming language                                 | C++                     | Java                    | Java       | Java       | Java
Years of Java programming                                          | 10 months               | 4                       | 4          | 7          | 5
Self-evaluation of Java expertise (1: Beginner - 10: Expert)       | 4                       | 7                       | 7 or 8     | 10         | 7
Recent frequency of programming in Java                            | Not active for 3 months | Not active for 3 months | Every week | Every day  | Not active for months


8.2.2 Structure of Experiments

Subjects were asked to implement two or three programming tasks with the CodeBroker system. Days before the experiments, CodeBroker created an initial user model (see Section 7.4.3 for the method, and Figure 7.6 for an example) for each subject by scanning programs the subject had written recently. Because many of the programs the subjects had written were for companies and thus were not available, no user model was complete. Nonetheless, the number and range of components included in the user models were consistent with the subjects' self-evaluations of Java expertise.

After their user models had been analyzed, the subjects were assigned tasks whose implementation involved components they probably did not know well. At the beginning of the experiments, after the subjects had signed the Informed Consent Form for participating, the main functionality of the CodeBroker system was briefly introduced with a running example. This took about 5 minutes. Before implementing each task, programmers were asked to describe briefly how they would implement it, and after each task had been finished, simple questions such as "Did you know this component before?" and "Why did you choose this component?" were asked regarding their programming activities. At the end of the experiments, a post-experiment interview (see footnote 8.3) was conducted to capture the subjects' background knowledge of programming and their subjective evaluation of the CodeBroker system based on their use.

Programmers were told to program in their normal way but to take advantage of the support provided by CodeBroker. They could use books, the Java API Documentation Browser, and any other support they usually relied on. Two subjects actually brought and consulted their favorite "Java in a Nutshell" [Flanagan, 1997]. As an observer, I occasionally answered their questions about the operation of the CodeBroker system.

The CodeBroker system used the following default settings in the experiments:

  1. It adopted the Okapi retrieval mechanism and the signature-matching mechanism, with each assigned the weight of 0.5.
  2. A component was considered known to the programmer if the user model indicated that the programmer had used it three times.
  3. In the first four experiments, with the first two subjects, the system delivered 14 components in the RCI-display because the experiments were conducted on a laptop with a small monitor. In all other experiments, which were conducted on a desktop with a large monitor, the system delivered 20 components.
  4. The component repository contained 673 classes and 7,338 methods from both the Core API library of Java 1.1.8 and JGL 3.0.
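
The first three settings can be read as a small ranking-and-filtering step. The Java sketch below is only an illustration under stated assumptions, not CodeBroker's actual implementation: it assumes that the scores of the two mechanisms are combined as a weighted sum, that both scores lie between 0 and 1, that components considered known are withheld from delivery, and that "used three times" means three or more recorded uses. All class, method, and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the default settings; names and the weighted-sum
// combination are assumptions, not taken from the CodeBroker source.
class DeliverySketch {
    static final double OKAPI_WEIGHT = 0.5;      // weight of the Okapi retrieval score
    static final double SIGNATURE_WEIGHT = 0.5;  // weight of the signature-matching score
    static final int KNOWN_THRESHOLD = 3;        // recorded uses after which a component counts as known

    // A retrieved component with its two similarity scores (assumed to lie in [0, 1]).
    static final class Candidate {
        final String name;
        final double okapiScore;
        final double signatureScore;

        Candidate(String name, double okapiScore, double signatureScore) {
            this.name = name;
            this.okapiScore = okapiScore;
            this.signatureScore = signatureScore;
        }

        // Assumed combination rule: weighted sum of the two scores.
        double combined() {
            return OKAPI_WEIGHT * okapiScore + SIGNATURE_WEIGHT * signatureScore;
        }
    }

    // Returns the names of at most displaySize components, best combined score first,
    // skipping components the user model marks as known.
    static List<String> deliver(List<Candidate> retrieved,
                                Map<String, Integer> useCounts,
                                int displaySize) {
        List<Candidate> unknown = new ArrayList<>();
        for (Candidate c : retrieved) {
            Integer uses = useCounts.get(c.name);
            if (uses == null || uses < KNOWN_THRESHOLD) {
                unknown.add(c);
            }
        }
        // Sort in descending order of combined score.
        unknown.sort((a, b) -> Double.compare(b.combined(), a.combined()));

        List<String> delivered = new ArrayList<>();
        int limit = Math.min(displaySize, unknown.size());
        for (int i = 0; i < limit; i++) {
            delivered.add(unknown.get(i).name);
        }
        return delivered;
    }
}
```

Under these assumptions, the laptop sessions would correspond to calling deliver with a displaySize of 14 and the desktop sessions to a displaySize of 20.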


8.2.3 Programming Tasks

Because subjects were volunteers, large and time-consuming tasks were not suitable. The experiments used programming tasks similar to the typical assignments of a programming language course, each of which could be implemented with several methods in about 20 to 60 minutes. The following tasks were used in the experiments.

Task 1

You are asked to implement a program that selectively backs up files based on a list that holds all files needed to be backed up. The list looks like:
/usr/java/private/important/letter1
/usr/joe/project/backup/getAllFiles.java
and the file name is passed as a parameter in the command line.

It requires that the back-up program retain the same hierarchical structure when the files are backed up in another directory $BACKUPDIR, which is passed as the second parameter in the command line; for example, getAllFiles.java should also be found under the directory $BACKUPDIR/usr/joe/project/backup/.

Task 2

You are asked to write a program to simulate the process of card dealing. Each card is represented by a number from 0 to 51. The program should produce a list of 52 cards, as it results from a human card dealer. Let us assume that if a person cuts a deck of cards and shuffles it 7 times, the result is satisfactory.

Task 3

Traditionally, Chinese write numbers with a comma inserted at each fourth number from the right. For example, 1,000,000 is written as 100,0000. Please implement a program that transforms the Chinese writing format (100,0000) to the western format (1,000,000).

To simplify the programming task, you don't need to read the input from the keyboard. You can assume you can get the input anywhere you like, such as a static class variable, a parameter of a method, or input from the command line.

Task 4

Jack has a long list of MP3 songs he has compiled. However, many of the songs are repeated in the list. He wants to create a new list in which each song appears only once. Assume each list has the following format TITLEa, TITLEb, TITLEc, ... where TITLEi is a string including letters only. Implement a method to create a new list with no repetitions.

Assume the list is stored somewhere; for example, you can put it into a class variable.

Task 5

Please write a program that can calculate the day of the week. We know that today is Jan. 19, 2001, Friday. Your program should be able to compute the day of the week M years from today, or N months from today. Both M and N could be negative, which means M years or N months before. Assume the convention to pass the data to your program is:
Y 10 means 10 years from today, and
M -5 means 5 months ago.

Task 6

A processor needs to respond to a series of events. Each event is assigned a distinct number. When the processor is busy, newly arrived events will be put into a waiting list. When the processor finishes processing the previous event, it picks an event in the waiting list. However, it picks the event with the largest number in the waiting list. You are asked to implement a pair of operations: one to put a new event into the waiting list, and the other to help the processor pick up the next event to be processed. (You don't need to be concerned with concurrency.)

All tasks could be implemented with different combinations of reusable components from the repository. If the subjects knew or found the right components, the implementation would be fairly easy; if they did not, they would have to use lower-level components or even basic statements. These tasks therefore allow us to observe how the system's deliveries change the programming process of the subjects.
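
The contrast between the two routes can be illustrated with Task 4. The sketch below is a hypothetical example and not a solution produced by any subject. It also uses LinkedHashSet from the current Java collections library as a stand-in for a reusable set component; that class is not part of the Java 1.1.8 and JGL 3.0 repository used in the experiments. The class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration of Task 4 (removing repeated song titles).
class DedupSketch {
    // Route 1: basic statements only. Each title is compared against
    // every title already kept before it is added.
    static List<String> dedupManually(String songList) {
        List<String> result = new ArrayList<>();
        for (String raw : songList.split(",")) {
            String title = raw.trim();
            if (title.isEmpty()) {
                continue;
            }
            boolean seen = false;
            for (String kept : result) {
                if (kept.equals(title)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                result.add(title);
            }
        }
        return result;
    }

    // Route 2: reuse a set component. The set rejects duplicates itself,
    // and LinkedHashSet keeps titles in order of first appearance.
    static List<String> dedupWithSet(String songList) {
        LinkedHashSet<String> unique = new LinkedHashSet<>();
        for (String raw : songList.split(",")) {
            String title = raw.trim();
            if (!title.isEmpty()) {
                unique.add(title);
            }
        }
        return new ArrayList<>(unique);
    }
}
```

Both methods return the same list; the second delegates the duplicate check to the reused component, which is the kind of shortcut a programmer gains only by knowing or finding that component.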

8.2.4 Methods of Observation and Analysis

The CodeBroker system has an automatic logging mechanism that records the reuse queries extracted by Listener, the components retrieved by Fetcher, the components removed by Presenter, and both system-initiated and user-initiated changes to discourse models and user models.

All experiments, including interviews, were videotaped. Subjects were asked to think aloud during the experiments. However, because thinking aloud may interfere with normal programming practice, this was not stressed. Analysis of the system was based on the log data, the videotapes, and the transcribed interviews. The analysis was not concerned with the quality or productivity of programming; instead, it examined how CodeBroker affects the process of programming by encouraging programmers to reuse. Quantitative assessment was based on log data, and qualitative evaluation was based on interviews and think-aloud protocols.


Ph.D. Dissertation by Yunwen Ye, April 20, 2001, Department of Computer Science, University of Colorado