As part of the knowledge-intensive programming process,
reuse is the process of applying knowledge of reusable
components to programs. Because few programmers
know every reusable component, component repository
systems are introduced to make it easier to apply reusable
components during programming. Based on the source of the knowledge of
reusable components, three modes of reuse exist:
reuse-by-memory, reuse-by-recall, and reuse-by-anticipation.
- Reuse-by-Memory. In the reuse-by-memory mode, while designing
a new program, programmers notice similarities between the
new program and reusable components that they have learned in the past and know
very well. Therefore, they can reuse these known components easily
during programming,
even without the support of a component repository system, because their
memory assumes the role of the repository system.
- Reuse-by-Recall. In the reuse-by-recall mode, while developing a
new program,
programmers vaguely recall that the repository contains
some reusable components with similar functionality, but they do not
remember exactly which components they are. They need to search the
repository to find what they need. In this mode, programmers are
often determined to find the needed components. An effective
retrieval mechanism is the main concern for component repository systems
supporting this mode. Successful reuse in this mode requires knowledge
from both programmers and the repository.
- Reuse-by-Anticipation. In the reuse-by-anticipation mode,
programmers formulate reuse intentions based on their anticipation
of the existence of certain reusable components. Even though they are
not certain that relevant components exist, their knowledge
of the domain, the programming environment, and the repository
is enough to motivate them to search in hopes of finding relevant
components. In this mode, if programmers cannot find what they want
in the repository quickly enough, they soon give up on reuse [Mili et al., 1995].
The repository is the main source of knowledge for successful
reuse in this mode.
Programmers have little resistance to the first two modes of
reuse.
As Isoda reports, programmers reuse components
repeatedly once they have learned them [Isoda, 1995]. Lange and Moher,
in their empirical study of programming and reuse strategies, found that
programmers search extensively
for components they know exist, even if they cannot
name them a priori [Lange and Moher, 1989].
This explains why individual ad hoc reuse
has been taking place while organization-wide systematic
reuse has not achieved the same success:
programmers have individual reuse repositories
in their memories, so they can reuse by memory or
reuse by recall [Mili et al., 1995].
For those components that have not yet been internalized into
their memories, programmers have to resort to the mode
of reuse-by-anticipation. The activation of the reuse-by-anticipation
mode relies on two enabling factors:
- Programmers anticipate the existence of reusable components.
- Programmers perceive that the cost of the reuse process is
lower than that of programming from scratch.
4.1.2 Information Islands in Component Repositories
Unfortunately, programmers' anticipation of available
reusable components does not always match what real repository systems contain.
Empirical studies on the use of high-functionality computer
systems (of which component repository systems are typical examples) have found
that users' knowledge about a computing system falls into
four levels (Figure 4.1) [Fischer, 2001].
Figure 4.1: Different levels of programmers' knowledge about a component repository
In Figure 4.1, ovals represent the collection of components
that are in a particular knowledge level of programmers,
and the rectangle
represents the actual information space (namely, the whole collection
of items in an information system), labeled L4. L1 includes
those reusable components that are well known, easily
employed, and regularly reused by a programmer. L1 corresponds
to the reuse-by-memory mode. L2 contains components known vaguely
and reused only occasionally by a programmer; they often
require further confirmation before being reused. L2 corresponds
to the reuse-by-recall mode. L3 represents what programmers believe
about the repository. L3 corresponds to the reuse-by-anticipation
mode.
Many components exist in the area (L4 - L3), and programmers do not
know that they exist. Consequently, programmers cannot
reuse them, simply because people do not ask for what
they do not know [Fischer and Reeves, 1995]. Components in (L4
- L3) thus become information islands [Engelbart, 1990; Ye and Fischer, 2000],
inaccessible to programmers without appropriate tools.
Repositories are not static: they are
expected to evolve over time, and this will increase
the size of (L4 - L3).
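
The relationship among the knowledge levels can be read as plain set arithmetic. The following is a minimal sketch, not part of the original study: the component names are hypothetical, and the strict nesting of L1 within L2 within L3 is a simplifying assumption made only for illustration.

```python
# Hypothetical component names; only the set relationships matter here.
repository_l4 = {"sort", "hash_map", "regex_match", "json_parse", "date_diff"}

# Assumed nesting L1 <= L2 <= L3 (a simplification for this sketch).
well_known_l1 = {"sort"}                             # reuse-by-memory
vaguely_known_l2 = well_known_l1 | {"hash_map"}      # reuse-by-recall
anticipated_l3 = vaguely_known_l2 | {"regex_match"}  # reuse-by-anticipation


def information_islands(repository, anticipated):
    """Return the components in (L4 - L3): present but never sought."""
    return repository - anticipated


islands_before = information_islands(repository_l4, anticipated_l3)

# The repository evolves while the programmer's beliefs stay unchanged,
# so the set of information islands grows.
evolved_l4 = repository_l4 | {"lru_cache", "csv_reader"}
islands_after = information_islands(evolved_l4, anticipated_l3)
```

Under these assumptions, every component added to the repository without the programmer's knowledge lands directly in (L4 - L3).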
Many reports about reuse experiences of industrial software
companies illustrate this inhibiting factor of reuse. Devanbu et al. have
reported that because developers are unaware
of reusable components, they repeatedly re-implement the same
function; in one case, this occurred ten
times [Devanbu et al., 1991].
This kind of behavior is also observed as typical among the four companies
investigated by Fichman and Kemerer [Fichman and Kemerer, 1997]. From the experience of
promoting reuse, Rosenbaum and DuCastel have
concluded that making components known to developers is a
key factor for successful reuse [Rosenbaum and DuCastel, 1995].
Human beings often try to be utility-maximizers in the decision-making
process [Reisberg, 1997], and programmers are no exception. When programmers
perceive that reuse utility, defined as the ratio of reuse value to reuse cost,
is too low, they do not attempt to
reuse [Sen, 1997].
Because there is no easy way for programmers to estimate
reuse value and reuse cost objectively, the estimation made by
programmers during programming is quite subjective and
suffers from cognitive biases against reuse; they tend to
underestimate reuse value and overestimate reuse cost.
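
The effect of these biases on the reuse decision can be illustrated with a small sketch. It is an illustration under stated assumptions rather than a model taken from the cited studies: the bias multipliers, the threshold of 1.0 that stands in for "too low," and all function names are hypothetical.

```python
def reuse_utility(value, cost):
    """Reuse utility as defined in the text: reuse value divided by reuse cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return value / cost


def attempts_reuse(value, cost, value_bias=1.0, cost_bias=1.0, threshold=1.0):
    """Decide on reuse from the *perceived* utility.

    value_bias < 1 models underestimated value, cost_bias > 1 models
    overestimated cost. Both multipliers and the threshold are assumptions
    of this sketch.
    """
    perceived = reuse_utility(value * value_bias, cost * cost_bias)
    return perceived >= threshold


# Objective estimate: utility 10 / 5 = 2.0, so reuse looks worthwhile.
unbiased = attempts_reuse(value=10.0, cost=5.0)

# Biased estimate: value halved, cost doubled, utility 5 / 10 = 0.5.
biased = attempts_reuse(value=10.0, cost=5.0, value_bias=0.5, cost_bias=2.0)
```

With the same underlying value and cost, the biased estimate falls below the threshold and no reuse attempt is made.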
The value of reuse is multifold. As stated in
Section 2.4, reuse value includes:
- (1) reduced development time
- (2) improved quality
- (3) easy maintenance
- (4) improved evolvability
- (5) increased problem-framing ability
However, not all programmers recognize reuse value when they are under
a tight schedule to finish their current program.
Most of the reuse value is long-term and shows its benefit only after
the program has been developed, whereas what interests programmers
most is the short-term benefit.
In his investigation of reuse at NTT (Nippon
Telegraph and Telephone Corporation), Isoda concludes that unless programmers
find immediate benefits in applying reusable components, they
will not, of their own free will, perform reuse [Isoda, 1995].
It is human
nature to pay attention to the immediate benefits only and ignore
long-term benefits [Grudin, 1994] because human
beings are unable to think coherently about the remote future and
particularly about the distant consequences of their actions [Simon, 1996].
To encourage programmers
to recognize the full benefits of reuse, many researchers have called
for reuse education. Despite its importance, reuse education alone
has not brought reuse to fruition [Joos, 1994] because being told
that ``it is for your own good'' seldom provides adequate motivation
for programmers to change their behavior [Simon, 1996].
Some organizations have also tried to provide monetary
rewards to programmers who reuse, but this has not been successful
either [Frakes and Fox, 1995].
As analyzed in Section 3.4.3, the cost incurred
at reuse time includes:
- (1) the cost of forming reuse intentions
- (2) the cost of formulating reuse queries
- (3) the cost of operating the repository system to retrieve components
- (4) the cost of choosing components
- (5) the cost of understanding and modifying components
- (6) the cost of integrating components
In addition, when reuse repository systems are separated from the current
programming environment, reuse cost includes
the cost of switching back and forth between the programming
environment and the reuse repository system, which causes loss of
working memory and disruption of workflow.
Depending on the reuse mode, only some of these costs may be involved. In
the reuse-by-memory mode, the cost of reuse is reduced to cost (6) only.
In the reuse-by-recall mode, costs (1), (2), and (4) are quite small
because programmers know what to look for and where to find the components.
In the reuse-by-anticipation mode, all of these costs are involved, and
because of the following two cognitive biases against reuse, Einstellung and
loss aversion, those costs are often
overestimated.
- Einstellung.
Human beings often display Einstellung in problem solving.
Einstellung,
the German word for ``attitude,'' refers to the mechanization of
problem-solving strategy. Once problem solvers discover a strategy
that ``gets the job done,'' they are less likely to discover new
strategies until they are completely stuck [Reisberg, 1997]. Due to
Einstellung,
human beings often stick with what they know best. As the
term production paradox [Carroll and Rosson, 1987] suggests, even when
a more effective strategy for solving a problem exists, most people are not
motivated to learn the new strategy and will ``play it safe'' by using a
suboptimal solution that they personally consider to be safe.
Even today, for most programmers, building programs from scratch is still
the proven strategy. This partially explains the observed phenomenon
of ``programmer machoism''--programmers have a tendency to chronically
underestimate how difficult a programming task is and overestimate
the cost of reuse [Graham, 1995].
- Loss Aversion.
Another known phenomenon in the decision-making process of human
beings is loss aversion--the tendency to be far more sensitive
to potential loss than to potential gain [Reisberg, 1997]. Starting
a reuse process requires a mental switch. The demand on working
memory and time is immediate, and the potential gain is unclear because
programmers are not sure whether the needed component exists, whether
they are able to find it even if it does exist, and whether they are
able to understand and modify it even if they find it.
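
The dependence of reuse cost on reuse mode, described before the two biases, can be summarized in a short sketch. It assumes a simple additive cost model; the weight of 0.1 standing in for "quite small," the item names, and the example cost units are hypothetical and are not taken from the cited studies.

```python
from enum import Enum


class Mode(Enum):
    MEMORY = "reuse-by-memory"
    RECALL = "reuse-by-recall"
    ANTICIPATION = "reuse-by-anticipation"


# Cost items (1) through (6) from the list of reuse-time costs.
COST_ITEMS = (
    "form_intention",      # (1)
    "formulate_query",     # (2)
    "operate_repository",  # (3)
    "choose",              # (4)
    "understand_modify",   # (5)
    "integrate",           # (6)
)

SMALL = 0.1  # Hypothetical weight standing in for "quite small".

# Fraction of each cost item that applies in each mode; missing items are 0.
MODE_WEIGHTS = {
    Mode.MEMORY: {"integrate": 1.0},
    Mode.RECALL: {
        "form_intention": SMALL,
        "formulate_query": SMALL,
        "operate_repository": 1.0,
        "choose": SMALL,
        "understand_modify": 1.0,
        "integrate": 1.0,
    },
    Mode.ANTICIPATION: {item: 1.0 for item in COST_ITEMS},
}


def reuse_cost(mode, base_costs):
    """Sum the cost items that apply in the given mode (additive model assumed)."""
    weights = MODE_WEIGHTS[mode]
    return sum(base_costs[item] * weights.get(item, 0.0) for item in COST_ITEMS)


# Example: every cost item is worth 10 units (arbitrary illustrative value).
base = {item: 10 for item in COST_ITEMS}
cost_by_mode = {mode: reuse_cost(mode, base) for mode in Mode}
```

In this sketch the cost rises from reuse-by-memory to reuse-by-recall to reuse-by-anticipation, which is the ordering the text describes before any overestimation due to Einstellung or loss aversion is applied.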
Ph.D. Dissertation by Yunwen Ye, April 20, 2001, Department of Computer Science, University of Colorado