

3.4 Understanding the Cognitive Difficulties of Component Reuse

Advances in the creation of reusable components do not lead to actual reuse unless programmers are able to use those components in their own system development. To create appropriate tools that assist programmers in reusing components, we need to identify the tasks programmers face when they try to reuse and the cognitive skills required to perform those tasks.

3.4.1 Cognitive Engineering

Component reuse is a cognitive activity, which is a goal-directed problem-solving effort. To understand the complexity of cognitive activities, Norman [Norman, 1986] has developed a method called cognitive engineering that applies what is known from cognitive science to the design and construction of tools that assist the cognitive activities of human beings. Such cognitive tools, including reusable component repository systems, must address the discrepancy between a user's goal, expressed in terms relevant to the user and his or her task, and the tool's mechanism, expressed in terms relative to it. This discrepancy creates two gulfs: the gulf of execution and the gulf of evaluation. The gulf of execution is the gap from goals to tools, and it must be bridged by three consecutive efforts:

The gulf of evaluation is the gap from the tool output to the intended goal, and it must be bridged by another three consecutive efforts:

3.4.2 A Cognitive Model of Component Reuse

Like other cognitive activities, component reuse starts with a goal in the mind of a programmer. To achieve such goals, programmers have to translate their internal intentions into a series of physical actions constrained by a component repository system. Figure 3.1 illustrates the actions needed for the reuse process to succeed based on the method of cognitive engineering [Ye et al., 2000]:

Figure 3.1: A cognitive model of the component reuse process

[Figure image: figs/cognitiveModel.eps]

The cognitive model in Figure 3.1 is a refinement of the location-comprehension-modification cycle (LCM-cycle) of a reuse process in Figure 1.1 [Fischer et al., 1991], with special emphasis on the location process, which is the focus of this research. Although the LCM-cycle acknowledges and stresses the difficulty of formulating appropriate reuse queries in the location process, it does not point out that formulating reuse queries must be preceded by the forming of reuse intentions. Furthermore, the comprehension step in the LCM-cycle does not differentiate two different levels of comprehension: comprehension for choosing components and comprehension for integrating components. Comprehension for choosing is still a part of the location effort because it may result in the reformulation of queries.


3.4.3 Cognitive Challenges of Component Reuse

Each action in this cognitive model of component reuse poses challenges to programmers and, without appropriate tool support, may prevent reuse from succeeding.


3.4.3.1 Vocabulary Learning: A Prerequisite for Forming Reuse Intentions

Reusable components help programmers think at higher levels of abstraction and increase the "vocabulary" programmers can use to create and interpret program designs [Krueger, 1992]. However, programmers must learn the syntax and the semantics of the new vocabulary to take advantage of reusable components. At the least, programmers should know that the components exist; otherwise, they cannot form reuse intentions in the first place, and reuse fails at this very first step. Vocabulary learning is a major part of the cognitive barrier to reuse [Brooks, 1995, Fichman and Kemerer, 1997].

Unlike the syntax of programming languages, which can be learned through schooling or tutoring before programmers start working, the mastery of reusable components cannot be completed in classrooms or merely by reading books [Ye, 1998]. Due to the large volume of reusable components and their constantly evolving nature, total coverage is impossible and obsolescence is unavoidable. Moreover, learning components is less effective when the components are separated from their use context; components are better learned when they are needed for a programming task. Therefore, component learning needs to be integrated with the work in which the components are reused, and programmers should learn components on demand, that is, learn a component when it is needed [Fischer, 1991]. However, the learning-on-demand model has a pitfall: programmers need to learn a component precisely because they do not know it, but if they do not know that it exists, they will not look for it and may settle on a suboptimal solution by writing their own program instead of reusing the existing component. To support learning on demand, component repository systems should be able to identify learning (and reusing) opportunities by connecting programmers to the components that can be reused in their current task.


3.4.3.2 Conceptual Gap in Formulating Reuse Queries

Formulating reuse queries refers to transforming internal reuse intentions into explicit, external reuse queries. Reuse intentions, derived from development activities, are conceptualized in the situation model [Kintsch, 1998] that is related to the application task to be solved and to the concerns of the programmer. A situation model is the mental model programmers have of their environment. A system model is the "actual" model of a computer system. For a component to be retrieved, these intentions need to be mapped from the user's situation model onto the system model, namely the repository system [Fischer et al., 1991]. Without enough knowledge about the system model of component repository systems, programmers cannot formulate reuse queries appropriately. This conceptual gap between situation model and system model is another cognitive barrier to reuse.

There are two types of conceptual gap between situation model and system model: vocabulary mismatch and abstraction mismatch. The vocabulary mismatch refers to the inherent ambiguity in most natural languages. Thanks to the richness of natural languages, people use a variety of words to refer to the same concept. Based on their systematic study of the word use of ordinary people in different domains, Furnas et al. found that the probability that two people choose the same word to describe a concept is less than 20% [Furnas et al., 1987]. Even well-trained indexing experts have a 20% disparity on average in choosing terms to describe the same document [Harman, 1995].

The abstraction mismatch refers to the difference in abstraction levels between requirements and component descriptions. Programmers deal with concrete problems and thus tend to describe their requirements concretely, whereas reusable components are often described in abstract concepts because they are designed to be generic so they can be reused in many different situations. For example, in one experiment to evaluate the CodeBroker system, one subject described his task as follows:

/** This class contains methods for converting between 
    western-style numbers (three numbers in a set with a 
    comma) and Chinese style numbers (four numbers followed 
    by a comma). For example, 1,000,000--> 100,0000. */

Another subject initially described the same task in a similar way:

/** Takes a string with a Chinese formatted number and 
    outputs a western formatted number. */

This task can be easily implemented by setting the grouping size to 4 with the method setGroupingSize of the class java.text.DecimalFormat. However, the description of this method (as follows) is abstract: it describes the grouping of digits without mentioning western style or Chinese style in particular.

public void setGroupingSize(int newValue)
   Set the grouping size. Grouping size is the number of 
   digits between grouping separators in the integer 
   portion of a number. For example, in the number 
   "123,456.78", the grouping size is 3.
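
The use of setGroupingSize for this task can be sketched as follows. This is a minimal illustration and not the code written by the experiment's subjects; the class name GroupingDemo and its helper methods are hypothetical, and the US locale is assumed so that the grouping separator is a comma.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demo class; only DecimalFormat and its methods are real API.
class GroupingDemo {
    // Builds a formatter that puts a comma after every groupSize digits.
    static DecimalFormat formatter(int groupSize) {
        DecimalFormat df = new DecimalFormat(
                "#,##0", DecimalFormatSymbols.getInstance(Locale.US));
        df.setGroupingSize(groupSize);
        return df;
    }

    // Formats a whole number with the given grouping size.
    static String format(long value, int groupSize) {
        return formatter(groupSize).format(value);
    }

    // Reads a number grouped in fours and rewrites it grouped in threes.
    static String chineseToWestern(String text) {
        try {
            long value = formatter(4).parse(text).longValue();
            return format(value, 3);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format(1000000L, 3));          // 1,000,000
        System.out.println(format(1000000L, 4));          // 100,0000
        System.out.println(chineseToWestern("100,0000")); // 1,000,000
    }
}
```

Nothing in the code refers to "Chinese" or "western" except the names chosen for this sketch, which mirrors the abstraction mismatch described above: the component is expressed purely in terms of grouping sizes.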

3.4.3.3 Effective Retrieval Mechanisms in Retrieving

The retrieval process finds the components that match given reuse queries. An effective retrieval mechanism--including a representation schema for indexing and a matching criterion between a query and a component--is essential.

Many retrieval mechanisms have been proposed in the past (for more detailed descriptions, see related work in Section 9.2). There are three major approaches: text-based, descriptor-based, and formal specification-based. In text-based approaches, components are represented by their textual documents, and information retrieval technology is used to match components to queries [Maarek et al., 1991]. In descriptor-based approaches, components are represented by a set of selected descriptors. The semantic relationships among those descriptors are captured in a predetermined structure that can be specified by a semantic network [Henninger, 1997], an AI frame [Ostertag et al., 1992], a taxonomic category system [Devanbu et al., 1991], or fuzzy set theory [Damiani et al., 1997]. In specification-based approaches, components are represented with formal specification languages, and automatic theorem-proving systems [Zaremski and Wing, 1997] or specification refinement systems [Mili et al., 1997a] are used to determine whether a component matches a query, which must also be written in a formal specification language.
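
To make the text-based approach concrete, the following sketch scores a query against a component's documentation by word overlap (Jaccard similarity). It is a deliberately naive stand-in for the information retrieval techniques cited above, not the matching algorithm of any of those systems; the class TextMatchSketch and its methods are hypothetical names introduced for illustration. The query is the second subject's task description from Section 3.4.3.2, and the document is the first two sentences of the setGroupingSize description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of text-based matching; not a real retrieval system.
class TextMatchSketch {
    // Splits free text into a set of distinct lowercase words (letters only).
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z]+")));
        result.remove(""); // split may yield an empty leading token
        return result;
    }

    // Jaccard similarity: number of shared words over number of distinct words.
    static double similarity(String query, String document) {
        Set<String> q = terms(query);
        Set<String> d = terms(document);
        Set<String> shared = new HashSet<>(q);
        shared.retainAll(d);
        Set<String> all = new HashSet<>(q);
        all.addAll(d);
        return all.isEmpty() ? 0.0 : (double) shared.size() / all.size();
    }

    public static void main(String[] args) {
        String query = "Takes a string with a Chinese formatted number "
                + "and outputs a western formatted number";
        String document = "Set the grouping size. Grouping size is the number "
                + "of digits between grouping separators in the integer "
                + "portion of a number.";
        // The two texts share only the words "a" and "number".
        System.out.println(similarity(query, document));
    }
}
```

Under this simple criterion the concrete task description and the abstract component description barely match, which is why the representation schema and the matching criterion both matter.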

In terms of complexity of representation schemata, the text-based approach is the simplest and the specification-based approach the most complicated. In general, a complete and precise representation can make the matching more precise and retrieval more effective. However, because the same representation is also used by programmers to specify their reuse queries, the schema of representation is greatly limited by programmers' willingness to formulate long and precise queries. There is no point in representing every bit of relevant information about a component if a programmer barely has the patience for typing string search regular expressions [Mili et al., 1995].


3.4.3.4 Retrieval by Reformulation in Retrieving and Choosing

Because effective use of any information retrieval system requires users to be fairly familiar with the structure of the information system and its representation schema, it is difficult for most users to create a well-defined query on their first attempt [Jones, 1997]. Component repository systems can, at best, retrieve those components that match the queries submitted by programmers, but these do not necessarily match the programmers' intentions, many of which are not articulated. Retrieval by reformulation is a mechanism that allows users to incrementally improve their queries to match their intentions after they have interpreted and evaluated the retrieved results and have explored the underlying structure of the information system [Williams et al., 1982, Fischer and Nieper-Lemke, 1989]. If programmers cannot find the needed component in the first retrieval result, they can reformulate their query by using more appropriate terms that they learn from retrieved components, or they can narrow the search range by taking advantage of the structure of the repository, which they may not have known before exploring the retrieved results.
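
The reformulation cycle can be sketched with a toy repository. Everything here is an assumption made for illustration: the class ReformulationSketch, the three-entry repository, and the one-line descriptions, which are loose paraphrases and not the official documentation of those Java methods. The score simply counts the query words that occur in a description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of retrieval by reformulation over a toy repository.
class ReformulationSketch {
    // Component name -> paraphrased one-line description (illustrative only).
    static final Map<String, String> REPOSITORY = new LinkedHashMap<>();
    static {
        REPOSITORY.put("DecimalFormat.setGroupingSize",
                "set the number of digits between grouping separators");
        REPOSITORY.put("Integer.parseInt",
                "parse a string as a signed decimal integer number");
        REPOSITORY.put("String.format",
                "return a formatted string using a format string and arguments");
    }

    // Splits free text into a set of distinct lowercase words.
    static Set<String> terms(String text) {
        Set<String> result = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^a-z]+")));
        result.remove("");
        return result;
    }

    // Counts the query words that also occur in the description.
    static int score(String query, String description) {
        Set<String> shared = terms(query);
        shared.retainAll(terms(description));
        return shared.size();
    }

    // Returns the best-scoring component name; the first entry wins ties.
    static String retrieve(String query) {
        String best = null;
        int bestScore = -1;
        for (Map.Entry<String, String> entry : REPOSITORY.entrySet()) {
            int s = score(query, entry.getValue());
            if (s > bestScore) {
                bestScore = s;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // First attempt, phrased in the programmer's own task vocabulary.
        System.out.println(retrieve("convert chinese number string to western number"));
        // Reformulated with terms learned while browsing the first results.
        System.out.println(retrieve("number digits grouping"));
    }
}
```

In this sketch the first query retrieves Integer.parseInt, and the reformulated query, which borrows the repository's own terms "digits" and "grouping", retrieves DecimalFormat.setGroupingSize. The improvement comes from the programmer learning the repository's vocabulary, not from any change in the matching mechanism.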


3.4.3.5 Component Comprehension in Choosing and Integrating

Being able to comprehend components is necessary both for choosing the right component and for integrating the chosen component.

Comprehension for choosing focuses on what the component does, and it is conducted in two stages: information discernment and detailed evaluation [Carey and Rusli, 1995]. In the information discernment stage, programmers quickly scan a component and its description to decide whether it is related to their current task; they avoid spending much time or seeking any deep understanding at this point [Lange and Moher, 1989]. Information discernment may result in the reformulation of queries if programmers find the retrieval results unsatisfactory. Only when a promising component is found do programmers start to evaluate it extensively.

To integrate a component into their programs, programmers need to understand the component's functionality, its usage, and even its implementation details, especially in cases of white-box reuse and glass-box reuse (Section 3.1.2). Executable examples that use the component have proved very useful in helping programmers quickly understand how to reuse the component in their own programming task [Redmiles, 1992, Aoki et al., 2001].


Ph.D. Dissertation by Yunwen Ye, April 20, 2001, Department of Computer Science, University of Colorado