In everyday communication among people, understanding draws on a background and a context shared by speakers and listeners. The listener interprets each speech act performed by the speaker against this shared background and context, which are implicitly accessible to both [Winograd and Flores, 1986]. An implicit communication channel can be established when users share their workspace with information systems, because the shared workspace can serve to create a shared background and context between users and systems. Recent user actions in the workspace partially reveal the current task: the actions can be understood, and the goal of the task inferred, from the underlying domain knowledge and from the relationship between the actions and the elements already existing in the workspace. Systems can use this inferred understanding of user actions and goals to predict what kind of information would interest the user. For example, when a kitchen designer places a stove in front of a window with a curtain in a kitchen design environment, a knowledgeable observer (either a human being or a computer system) can infer that the designer is not aware of the fire hazard of the design, even if the designer does not say anything about it; a piece of information about fire safety rules in kitchen design would probably interest the designer [Nakakoji, 1993].
The remainder of this section explains how an information system can utilize information existing in the shared workspace to locate task-relevant information. Relevance of information to a user's task can be assessed at two levels: the immediate task level and the larger context level. Most endeavors of users cannot be accomplished in one action; they often need to be divided into smaller tasks. The immediate task, or task at hand, is the portion of the whole endeavor to which a user is currently attending, and every immediate task is conducted within a context defined by previously accomplished tasks and the overall goal of the whole endeavor. For example, in a programming environment, an immediate task for a programmer is a procedure or method that he or she is currently developing, and the larger context includes those functions or methods that have been developed so far and with which the current procedure or method will interact to make a whole system.
Two kinds of approaches exist for the recognition of task plans: plan libraries and generic plans.
Some recommendation systems such as Siteseer [Rucker and Polanco, 1997] use the first assumption to recommend new information to users. Siteseer helps users discover interesting web pages. The situation of a user is defined by his or her Bookmarks (or Favorites). Users are thought to be in a similar situation if their Bookmarks have enough overlap. Within such a group of users, new web pages that have the highest overlap among the Bookmarks of other users but do not appear in one particular user's own Bookmarks are recommended to that user. Other systems, such as Grouplens [Konstan et al., 1997] and PHOAKS [Terveen et al., 1997], are also designed based on this assumption. This assumption is also widely used in e-commerce websites. For example, the Amazon.com website recommends to a book-buying customer new books that have been purchased by other customers who have bought books similar to those the customer has bought.
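The overlap-based recommendation described above can be illustrated with a minimal sketch. It is not Siteseer's actual algorithm: the class name BookmarkRecommender, the minOverlap threshold, and the vote-counting ranking are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of overlap-based recommendation; not Siteseer's actual algorithm. */
final class BookmarkRecommender {

    /** Number of bookmarks that two users have in common. */
    static int overlap(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    /**
     * Recommends pages bookmarked by sufficiently similar users but not by the
     * target user, ordered by how many similar users bookmarked them.
     * The minOverlap threshold is an assumed parameter.
     */
    static List<String> recommend(Set<String> target,
                                  List<Set<String>> others,
                                  int minOverlap) {
        Map<String, Integer> votes = new HashMap<>();
        for (Set<String> other : others) {
            if (overlap(target, other) < minOverlap) {
                continue; // this user is not in a similar situation
            }
            for (String page : other) {
                if (!target.contains(page)) {
                    votes.merge(page, 1, Integer::sum);
                }
            }
        }
        List<String> pages = new ArrayList<>(votes.keySet());
        // Most frequently bookmarked first; ties are broken alphabetically.
        pages.sort((p, q) -> {
            int byVotes = Integer.compare(votes.get(q), votes.get(p));
            return byVotes != 0 ? byVotes : p.compareTo(q);
        });
        return pages;
    }
}
```

In this sketch, users whose bookmarks share fewer than minOverlap entries with the target user contribute nothing, which mirrors the idea that only users in a similar situation are a useful source of recommendations.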
The second assumption--similar information--underlies many information agents, such as Remembrance Agent [Rhodes and Starner, 1996] and WebWatcher [Armstrong et al., 1995]. These information agents deliver to users new information, such as emails or web pages, that is determined to be similar to what the user is currently focusing on. For example, when a user is writing a new email in an email editor, Remembrance Agent autonomously searches the email folders and personal notes of the user and delivers messages and notes that are similar in content to what the user is currently writing.
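One common way to measure content similarity is the cosine of the angle between word-count vectors. The sketch below uses that measure purely as an illustration; it is an assumption of this example and is not claimed to be the measure used by Remembrance Agent or WebWatcher. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of content similarity based on word counts. */
final class TextSimilarity {

    /** Lower-cases the text and counts each alphanumeric word. */
    static Map<String, Integer> termCounts(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Cosine similarity of two word-count vectors; 0.0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> vector) {
        double sum = 0.0;
        for (int count : vector.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    /** Index of the document most similar to the draft, or -1 if there are none. */
    static int mostSimilar(String draft, List<String> documents) {
        Map<String, Integer> draftTerms = termCounts(draft);
        int best = -1;
        double bestScore = -1.0;
        for (int i = 0; i < documents.size(); i++) {
            double score = cosine(draftTerms, termCounts(documents.get(i)));
            if (score > bestScore) {
                bestScore = score;
                best = i;
            }
        }
        return best;
    }
}
```

A real agent would typically also weight rare words more heavily and ignore very common ones; those refinements are left out to keep the sketch short.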
These two approaches correspond, respectively, to the two models of long-term memory retrieval of human beings: retrieval by recognition and retrieval by association. In the process of retrieving information by recognition, information from memory is directly retrieved at the recognition of distinctive features of a fixed pattern [Simon, 1996]. Plan recognition systems simulate the same process: from features to goal recognition and then to relevant information. In the process of retrieving information by association, information strongly linked with the existing perceptual elements is activated from the long-term memory. In this process, along with information useful for the current situation, some irrelevant information may also be activated. Therefore, humans have to select, based on their existing knowledge, the information that can be correctly integrated with the current situation [Kintsch, 1998]. Following this process, similarity analysis-based systems retrieve and deliver information based on the link (the similarity or association), and it is up to the users to decide which information they need to incorporate into their task.
Table 5.1 summarizes the differences between plan recognition and similarity analysis in terms of their underlying memory retrieval models, their technical approaches, and their major shortcomings.
The primary goal of active component repository systems is to identify reuse opportunities by delivering reusable components to programmers before they have fully implemented the program. In this setting, plan recognition is more difficult because the system has to recognize program plans from partially constructed programs. Therefore, the plan recognition approach is not suitable for active component repository systems.
This research adopts the similarity analysis approach to locate reusable components that are relevant to the module (a piece of program under development) by making use of the descriptive elements existing in both modules and components.
Concept
Important concepts of a program are often contained in its informal
information structure. Software development is essentially a
cooperative process among many programmers. Programs include
both formal information for their executability and informal information
for their readability by peer programmers [Fischer and Schneider, 1984].
Informal information includes structural indentation, comments,
and identifier names [Soloway and Ehrlich, 1984]. Comments and
identifier names are important beacons for the understanding
of programs because they reveal the important concepts of
programs [Biggerstaff et al., 1994; Etzkorn and Davis, 1997a; Michail and Notkin, 1999].
The embedding of informal information in programs improves long-term
indirect communication among programmers because, unlike a separate
document for a program, information about the program is stored in the
place where it is most useful--in the program itself [Reeves, 1993].
Modern programming languages such as Java further encourage this embedding of informal information by introducing the concept of the document comment (doc comment for short). A doc comment begins with /** and continues until the next */, and it immediately precedes the declaration of a module, which is either a class or a method. The Javadoc program uses doc comments to create online documentation from Java programs. The contents of a doc comment are meant to describe the functionality of the module that follows.
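As an illustration, the sketch below shows a doc comment on a method declaration and one simple way to pull candidate concept words out of a doc comment and an identifier name. The interface RandomSource, the wording of its doc comment, and the helper class ConceptTerms are hypothetical and were written only for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical interface, used only to show the shape of a doc comment. */
interface RandomSource {
    /**
     * Returns a random integer between the two given bounds.
     * (Example wording; not taken from any real library.)
     */
    int getRandomNumber(int from, int to);
}

/** Hypothetical helpers that extract words from informal information. */
final class ConceptTerms {

    /** Splits an identifier such as "getRandomNumber" into lower-case words. */
    static List<String> splitIdentifier(String identifier) {
        List<String> words = new ArrayList<>();
        // Break before an upper-case letter that follows a lower-case letter
        // or digit, and at underscores.
        for (String part : identifier.split("(?<=[a-z0-9])(?=[A-Z])|_+")) {
            if (!part.isEmpty()) {
                words.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    /** Returns the lower-case words of a doc comment. */
    static List<String> docCommentWords(String docComment) {
        List<String> words = new ArrayList<>();
        // Splitting on non-alphanumeric characters also discards the
        // /**, * and */ markers.
        for (String word : docComment.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }
}
```

With these helpers, the identifier getRandomNumber yields the words get, random, and number, which can then be compared with the words found in doc comments.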
Constraint
Constraints of a program exist at four levels: syntactical level,
semantic level, architectural level, and practical level.
The syntactical constraint of a program is captured by its signature.
As the type expression of a program, a signature defines the program's
syntactical interface.
The basic form of a signature of a method or a function is:
Signature: OutputTypeExp <- InputTypeExp
where OutputTypeExp and InputTypeExp are type
expressions that
result from applying a Cartesian product constructor to all their parameter
types. For example, for the method,
int getRandomNumber (int from, int to)
the signature is
getRandomNumber: int <- int x int
A signature of a class contains all the type definitions of its attributes
and all the signatures of its methods.
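The signature notation above can be reproduced mechanically with Java reflection. The following is a minimal sketch: the class name SignatureSketch, the placeholder method body, and the use of "()" for a method without parameters are assumptions of this example.

```java
import java.lang.reflect.Method;
import java.util.StringJoiner;

/** Hypothetical sketch that prints signatures in the notation used above. */
final class SignatureSketch {

    /** Sample method from the text; the body is a placeholder. */
    static int getRandomNumber(int from, int to) {
        return from; // placeholder: only the types matter here
    }

    /** Renders "name: OutputType <- InputType x InputType ...". */
    static String signatureOf(Method method) {
        StringJoiner inputs = new StringJoiner(" x ");
        for (Class<?> parameterType : method.getParameterTypes()) {
            inputs.add(parameterType.getSimpleName());
        }
        // "()" for an empty parameter list is this sketch's own notation.
        String inputExp = method.getParameterCount() == 0 ? "()" : inputs.toString();
        return method.getName() + ": "
                + method.getReturnType().getSimpleName() + " <- " + inputExp;
    }

    /** Looks up a declared method by name and parameter types, then renders it. */
    static String signatureOf(Class<?> owner, String name, Class<?>... parameterTypes) {
        try {
            return signatureOf(owner.getDeclaredMethod(name, parameterTypes));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such method: " + name, e);
        }
    }

    public static void main(String[] args) {
        // Prints: getRandomNumber: int <- int x int
        System.out.println(
                signatureOf(SignatureSketch.class, "getRandomNumber", int.class, int.class));
    }
}
```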
The semantic constraint of a program involves the conditions with which the input and output data have to agree. In formal specification languages, these conditions are described as pre-conditions and post-conditions [Wing, 1990]. Some programming languages, such as C, provide an assertion facility with which programmers can express the intended semantic constraints of a program. For object-oriented programming languages, a contract model can be used to specify the semantic constraints of a class [Meyer, 1997]. A contract of a class specifies the invariant of the class and the legal order in which the methods of the class should be called. However, pre- and post-conditions and contracts are still not widely adopted by programmers. One reason is that these constraints are difficult to write because doing so requires deep mathematical knowledge; another is that it is very difficult to develop reliable yet efficient tools to check them.
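To make the idea concrete, the sketch below checks a pre-condition and a post-condition at run time for the getRandomNumber example. The contract itself (the bounds are inclusive and from must not exceed to) is an assumption of this example, since the text does not state the method's intended semantics; the class name CheckedRandom is hypothetical.

```java
import java.util.Random;

/** Hypothetical sketch of run-time pre- and post-condition checks. */
final class CheckedRandom {
    private static final Random RANDOM = new Random();

    /**
     * Assumed contract (not stated in the text):
     * pre-condition:  from <= to
     * post-condition: from <= result <= to
     */
    static int getRandomNumber(int from, int to) {
        if (from > to) {
            throw new IllegalArgumentException("pre-condition violated: from > to");
        }
        // Long arithmetic keeps (to - from + 1) from overflowing an int.
        long span = (long) to - from + 1;
        int result = (int) (from + (long) (RANDOM.nextDouble() * span));
        if (result < from || result > to) {
            throw new IllegalStateException("post-condition violated: " + result);
        }
        return result;
    }
}
```

Explicit exceptions are used here instead of the Java assert statement because assert checks are disabled unless the program is run with the -ea option.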
The architectural constraint of a program is introduced when a programmer wants to develop a system in conformance with a particular architectural style. At the class level, a programmer may want to confine a class to a chosen design pattern or to fit the class into an existing framework. Because design patterns and frameworks prescribe how their constituent classes interact with each other, they impose extra constraints on the interface and implementation of classes.
The practical constraint of a program includes the required performance criteria. In some critical situations, programs need to be time-efficient so that they respond in a timely fashion, memory-efficient so that they run within limited memory resources, thread-safe so that they behave correctly under concurrent execution, and so on.
Code
Code is meant to be executed by computers. It is the machine-executable
representation of concepts, and it must conform to all the
required constraints.
The KID system, a kitchen design environment embedded with active critiquing, includes such a specification mechanism [Nakakoji, 1993]. KID supports two types of critiquing: generic and specific. Generic critics deliver design knowledge applicable to all kitchen designs, such as accepted standards or regulations. Specific critics deliver design knowledge applicable only to the design situation currently under consideration. Prior to design actions, a user can characterize the kitchen to be designed by answering some questions, such as ``Size of the family?'' and ``Is the primary cook right-handed or left-handed?'' This specification is used later by the system to fire relevant specific critics regarding design actions as well as to exclude critics that are relevant in general but not consistent with the specification in particular.
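The role of the specification in selecting critics can be sketched as a simple filter. This is not KID's implementation: the class SpecificCritics, the representation of a specification as a map of answers, and the idea of attaching a consistency test to each critic are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of specification-driven critic selection; not KID's code. */
final class SpecificCritics {

    /** A critic: a message plus a test of consistency with the specification. */
    static final class Critic {
        final String message;
        final Predicate<Map<String, String>> consistentWith;

        Critic(String message, Predicate<Map<String, String>> consistentWith) {
            this.message = message;
            this.consistentWith = consistentWith;
        }
    }

    /** Returns the messages of all critics consistent with the specification. */
    static List<String> fire(List<Critic> critics, Map<String, String> specification) {
        List<String> messages = new ArrayList<>();
        for (Critic critic : critics) {
            if (critic.consistentWith.test(specification)) {
                messages.add(critic.message);
            }
        }
        return messages;
    }
}
```

In this sketch, a critic that applies to every design would use a test that always returns true, whereas a critic meant for left-handed cooks would test the corresponding answer in the specification and would be excluded for a right-handed cook.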
Despite the value of a partial specification of high-level goals in delivering relevant information, some users may have difficulty articulating their requirements before they start to perform their task, especially when they are not clear about what kind of information is provided by the system. Moreover, if the structure of the information system is too complicated and too many questions need to be answered by users to collect a meaningful partial specification, users may get annoyed and become impatient with the system. Generally speaking, users are more eager to do ``real work'' than to answer a long list of questions.
Incremental discourse modeling amortizes the effort needed from users to specify the global goal. For each piece of information delivered by active information systems throughout the work process, users can choose to agree with it, disagree with it, ignore it, or declare it irrelevant. If information systems have means to capture and represent these responses in a discourse model, more appropriate information can be delivered later.
A discourse model can be either positive or negative. A positive discourse model contains the type of information that has interested users and is likely to be useful. In later deliveries, information systems try to retrieve and deliver information of the same or a similar type. A negative discourse model contains the type of information in which users have not been interested. It can be used as a filter to remove the same type of information. Negative discourse modeling is particularly powerful in situations where misfits are much easier for users to identify than fits [Alexander, 1964]. Information filtering and information retrieval are two sides of the same coin; both aim to improve the relevance of information. An information system can create a discourse model with both positive and negative representations of the larger context. In systems that have this kind of mixed discourse model, information similar to positive descriptions is considered to have higher relevance, whereas information similar to negative descriptions is considered to have lower relevance.
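A mixed discourse model can be sketched as a pair of sets that raise or lower a base relevance score. The sketch is deliberately simplified: it matches information types exactly rather than by similarity, and the class name MixedDiscourseModel and the BOOST and PENALTY factors are arbitrary assumptions of this example.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of a discourse model with positive and negative parts. */
final class MixedDiscourseModel {
    /** Assumed adjustment factors; the values are arbitrary. */
    static final double BOOST = 1.5;
    static final double PENALTY = 0.5;

    private final Set<String> positive = new HashSet<>();
    private final Set<String> negative = new HashSet<>();

    /** Records a type of information that has interested the user. */
    void addPositive(String type) {
        positive.add(type);
        negative.remove(type);
    }

    /** Records a type of information in which the user has not been interested. */
    void addNegative(String type) {
        negative.add(type);
        positive.remove(type);
    }

    /** Raises the base relevance for positive types and lowers it for negative ones. */
    double adjust(String type, double baseRelevance) {
        if (positive.contains(type)) {
            return baseRelevance * BOOST;
        }
        if (negative.contains(type)) {
            return baseRelevance * PENALTY;
        }
        return baseRelevance;
    }
}
```

Setting PENALTY to zero would turn the negative part into a pure filter, which corresponds to the purely negative discourse model described above.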
A discourse model can be created and augmented explicitly or implicitly. Explicit discourse modeling requires users to respond to the delivery of information with an explicit answer. For example, a piece of information can be delivered with a mechanism that allows the user to specify whether the information is useful or useless, relevant or irrelevant, and the system integrates the user responses into the discourse model. Instead of asking users for direct input, implicit discourse modeling observes the users' reaction to the delivered information--whether the user uses or ignores the information--and augments discourse models based on acquisition rules that make inferences from the observation.
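An acquisition rule for implicit discourse modeling might look like the following sketch. The rule itself (one use marks a type as interesting, three consecutive ignores mark it as uninteresting) is an assumption invented for this example, as are the class name ImplicitDiscourseModel and the observe method that a delivery mechanism would call.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of implicit discourse modeling with one acquisition rule. */
final class ImplicitDiscourseModel {
    /** Assumed rule: a type ignored this many times in a row is uninteresting. */
    static final int IGNORE_THRESHOLD = 3;

    enum Attitude { POSITIVE, NEGATIVE, UNKNOWN }

    private final Map<String, Integer> consecutiveIgnores = new HashMap<>();
    private final Map<String, Attitude> attitudes = new HashMap<>();

    /** Called after each delivery with the observed reaction of the user. */
    void observe(String type, boolean used) {
        if (used) {
            consecutiveIgnores.put(type, 0);
            attitudes.put(type, Attitude.POSITIVE);
        } else {
            int ignores = consecutiveIgnores.merge(type, 1, Integer::sum);
            if (ignores >= IGNORE_THRESHOLD) {
                attitudes.put(type, Attitude.NEGATIVE);
            }
        }
    }

    /** The attitude inferred so far, or UNKNOWN if there is too little evidence. */
    Attitude attitudeToward(String type) {
        return attitudes.getOrDefault(type, Attitude.UNKNOWN);
    }
}
```

An explicit variant would expose the same model but update it directly from the answer given by the user, with no threshold involved.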
As an example, a discourse model can be incorporated into the ``Tip of the Day'' of Microsoft Office to improve the context relevance of delivered tips. For instance, if a user continues to use tables in his or her recent use of the Office system, an appropriate discourse modeling mechanism would be able to implicitly capture this and instruct the system to deliver tips about table operations to the user.