5.4 Dealing with Partial, Imprecise Queries

Locating useful information from an information system to support complex design processes in wicked problem domains in general and locating reusable components from a component repository in particular are not the same as searching for data in a database system where the query is well defined and completely articulated. Instead, users are looking for useful information and the usefulness can only be determined by them, depending on how they intend to make use of it.

Users' queries, as well as the items in information systems, are usually not directly represented. For example, in a multimedia design environment, a user wants to find an ideal image from a large image library for a design task. There is plainly no way to express the query directly: after all, if the user knew how to express it directly, he or she would already have the image and would not need to locate it at all! The user must rely on an abstract representation schema to describe certain attributes of the needed image. Images in the image library are also abstractly represented. But even with this abstract representation schema, users can begin with only a very vague query [Nakakoji et al., 1998]. In a reusable component repository system, reusable components are not directly represented either: a direct representation would be the program code itself; instead, components are represented by surrogates such as textual descriptions and signatures. Those abstract representations are partial, biased, and imprecise. Even in the domain of document retrieval, where the representation schema is the same as the information itself, users often ask the wrong questions [Jones, 1997].

5.4.1 Context-Aware Browsing

Because complete requirements for information are not available at first, information systems cannot locate the exact information needed by users. The problem of incomplete information requirements is more severe in active information systems because the requirements are inferred rather than stated by users. However, active information systems can heuristically reduce the search space to the extent that users can easily browse it and choose the items they need. This approach to the acquisition of information is defined as context-aware browsing.

Querying and browsing are the two major information access mechanisms for most users. Querying is direct: users formulate a query and the system returns information matching the query. However, formulating queries is a cognitively challenging task because users have to bridge the gap between the situational model and the system model (see Section 3.4.3.2). In browsing, users determine the usefulness or relevance of the information currently being displayed in terms of their task and traverse its associated links. People tend to find browsing more fun than querying because they do not need to commit resources at first and can incrementally develop their requirements after evaluating the information along the way [Lieberman, 1997]. Mili et al. claim that browsing is the predominant pattern of component repository usage: most programmers cannot formulate clearly defined requirements for reusable components, so they rely on browsing to get acquainted with the reusable components available in the repository [Mili et al., 1999].

However, browsing is not scalable, for the following reasons. First, there is an inherent dilemma in the design of the browsing structure of an information system: if there are too many links, users will be puzzled by the complexity; if there are too few, information is not well connected. Second, no single structure can suit all users and all user tasks, and the structure defined at design time may not be the structure needed at use time. For example, the Smalltalk class library is structured according to the inheritance relationship. This structure is perfectly suitable for the execution of programs, and for locating components whose super nodes (classes or super-classes) are known. However, it is not suitable for programmers who want to find a method based on functionality, because methods with similar functionality are scattered across different deep nodes of the inheritance tree [Helm and Maarek, 1991]. It is therefore very difficult for programmers to find and compare all of them in order to choose the most appropriate one to reuse. Third, in a large information system, following the right link requires users to have a very good understanding of the structure of the whole system. Most users, especially less experienced ones, may easily get lost in a complex network of nodes while tracing dozens of links [Halasz, 1988].

Context-aware browsing supported by active information systems combines the strengths of both querying and browsing: the directness of querying and the lower cognitive threshold of browsing. Active information systems automatically collect and present information to users based on the task context in which the information is consumed. Even though the delivered information may not be precise enough because task models are incomplete, users can immediately start to browse a significantly reduced information space that is organized in accordance with their task structure.
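The idea of reducing the information space from the task context can be illustrated with a minimal sketch. It is not the mechanism of any particular system: the names Component, REPOSITORY, STOPWORDS, and browse_candidates are hypothetical, and plain word overlap between the task context and each component's textual surrogate stands in for whatever heuristic an active information system actually uses.

```python
from dataclasses import dataclass

# Hypothetical stopword list; a real system would use proper text indexing.
STOPWORDS = {"a", "an", "the", "of", "to", "into", "all"}


@dataclass(frozen=True)
class Component:
    name: str
    description: str  # textual surrogate, not the program code itself


def tokens(text: str) -> set[str]:
    """Lowercased content words of a text, with stopwords removed."""
    words = {word.strip(".,;:()").lower() for word in text.split()}
    return words - STOPWORDS - {""}


def browse_candidates(task_context: str, repository: list[Component],
                      limit: int = 3) -> list[Component]:
    """Return a short, ranked list for browsing instead of one exact match."""
    context = tokens(task_context)
    scored = []
    for component in repository:
        overlap = len(context & tokens(component.description))
        if overlap > 0:  # drop components unrelated to the task context
            scored.append((overlap, component))
    # Stable sort: ties keep their repository order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [component for _, component in scored[:limit]]


# Hypothetical repository contents, for illustration only.
REPOSITORY = [
    Component("shuffle", "randomly reorder the elements of a list"),
    Component("random_int", "generate a random number between two integers"),
    Component("read_lines", "read all lines of a text file into a list"),
    Component("sqrt", "compute the square root of a number"),
]

# The task context is inferred (here: a comment the programmer just wrote),
# so it is incomplete; the user browses the reduced list to pick an item.
for candidate in browse_candidates("create a random number between two limits",
                                   REPOSITORY):
    print(candidate.name, "-", candidate.description)
```

The sketch returns several candidates rather than a single answer: the inferred context is imprecise, so the final choice is left to the user, who browses a list much smaller than the whole repository.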

5.4.2 Supporting Retrieval-by-Reformulation

Another approach to complement the incompleteness of information requirements is to support retrieval-by-reformulation (see Section 3.4.3). Active information systems first present users with the initial retrieval results. This initial delivery can serve the following two purposes:
  1. Users can learn how the information system stores and organizes its information by examining the retrieval results.
  2. Users can discover some requirements that were not present in the initial query by comparing the retrieval results with their intentions of use in context.
Based on this newly acquired knowledge of the information system and the discovery of new aspects of their requirements, users can either reframe or refine the initial query to improve its completeness and precision, or directly manipulate the retrieval results by filtering out apparently irrelevant information so that the items they need receive more focused attention.
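The two reformulation moves described above, refining the query and filtering the results, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the design of an actual system: RetrievalSession, refine, reject, and the DOCUMENTS data are hypothetical, and matching is reduced to counting shared words.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalSession:
    """One user's evolving query over a hypothetical name -> description store."""
    documents: dict[str, str]
    query: set[str] = field(default_factory=set)
    rejected: set[str] = field(default_factory=set)

    def results(self) -> list[str]:
        """Names ranked by number of shared query terms; rejected items omitted."""
        scored = []
        for name, description in self.documents.items():
            if name in self.rejected:
                continue
            score = len(self.query & set(description.lower().split()))
            if score:
                scored.append((score, name))
        # Highest score first; ties are broken alphabetically by name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [name for _, name in scored]

    def refine(self, *terms: str) -> None:
        """Add terms the user discovered while examining earlier results."""
        self.query |= {term.lower() for term in terms}

    def reject(self, name: str) -> None:
        """Filter an apparently irrelevant item out of later results."""
        self.rejected.add(name)


# Hypothetical data, for illustration only.
DOCUMENTS = {
    "draw_circle": "draw a circle on the canvas",
    "draw_text": "draw a text string on the canvas",
    "fill_circle": "draw a filled circle on the canvas",
}

session = RetrievalSession(DOCUMENTS)
session.refine("draw")         # vague initial query
print(session.results())
session.refine("circle")       # vocabulary learned from the first results
session.reject("draw_text")    # direct manipulation of the result list
print(session.results())
```

Keeping the query terms and the rejected items in one session object reflects the incremental nature of reformulation: each step builds on what the user learned from the previous results rather than starting over.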

When the retrieval-by-reformulation mechanism is integrated with active information systems, users' reformulation actions can also be used, as a side effect, to augment discourse models and user models. Details will be explained in Section 3.4.3 in the context of introducing CodeBroker.


Ph.D. Dissertation by Yunwen Ye, April 20, 2001, Department of Computer Science, University of Colorado