Users' queries, as well as the items stored in information systems, are usually not represented directly. For example, in a multimedia design environment, a user wants to find an ideal image in a large image library for a design task. There is plainly no way to express the query directly: after all, if the user knew how to express it directly, he or she would already have the image and would not need to locate it at all! The user must rely on an abstract representation schema to describe certain attributes of the needed image. Images in the image library are also represented abstractly. But even with this abstract representation schema, users can begin with only a very vague query [Nakakoji et al., 1998]. In a reusable component repository system, reusable components are not directly represented either: a direct representation would be the program code itself, but components are instead represented by surrogates such as textual descriptions and signatures. Those abstract representations are partial, biased, and imprecise. Even in the domain of document retrieval, where the representation schema is the same as the information itself, users often ask the wrong questions [Jones, 1997].
Querying and browsing are the two major information access mechanisms for most users. Querying is direct: users formulate a query and the system returns information matching the query. However, formulating queries is a cognitively challenging task because users have to bridge the gap between the situational model and the system model (see Section 3.4.3.2). In browsing, users judge the usefulness or relevance of the information currently displayed in terms of their task and traverse its associated links. People tend to find browsing more fun than querying because they do not need to commit resources at the outset and can incrementally develop their requirements after evaluating the information along the way [Lieberman, 1997]. Mili et al. claim that browsing is the predominant pattern of component repository usage: programmers often cannot formulate clearly defined requirements for reusable components, so they rely on browsing to get acquainted with the reusable components available in the repository [Mili et al., 1999].
However, browsing does not scale, for the following reasons. First, there is an inherent dilemma in the design of the browsing structure of an information system: if there are too many links, users will be puzzled by the complexity; if there are too few, the information is not well connected. Second, no single structure can be suitable for all users and all user tasks, and the structure defined at design time may not be the structure needed at use time. For example, the Smalltalk class library is structured according to the inheritance relationship. This structure is perfectly suitable for the execution of programs, and for locating components whose super nodes (classes or superclasses) are known. However, it is not suitable for programmers who want to find a method based on its functionality. Methods with similar functionality can be scattered across different deep nodes of the inheritance tree [Helm and Maarek, 1991]. It is therefore very difficult for programmers to find and compare all of them in order to choose the most appropriate one to reuse. Third, in a large information system, following the right link requires users to have a very good understanding of the structure of the whole system. Most users, especially less experienced ones, may easily get lost in a complex network of nodes while tracing dozens of links [Halasz, 1988].
Context-aware browsing supported by active information systems combines the strengths of both querying and browsing: the directness of querying and the lower cognitive threshold of browsing. Active information systems automatically collect and present information to users based on the task context in which the information is consumed. Even though the delivered information may not be precise, because task models are incomplete, users can immediately start to browse a significantly reduced information space that is organized in accordance with their task structure.
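The idea of delivering a reduced information space from the task context can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Component` record, the `deliver` function, the sample repository, and the word-overlap score are hypothetical and are not the mechanism of any particular system. The task context plays the role of an implicit query, so the user never has to formulate one.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    description: str  # textual surrogate, not the program code itself


def tokens(text: str) -> set[str]:
    """Lowercase word set; a deliberately crude stand-in for real text analysis."""
    return set(re.findall(r"[a-z]+", text.lower()))


def deliver(task_context: str, repository: list[Component], limit: int = 3) -> list[Component]:
    """Return up to `limit` components whose descriptions share words with the context."""
    context = tokens(task_context)
    scored = []
    for component in repository:
        overlap = len(context & tokens(component.description))
        if overlap > 0:
            scored.append((overlap, component))
    # Highest overlap first; ties keep repository order because the sort is stable.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [component for _, component in scored[:limit]]


# Hypothetical repository and task context (e.g. text the user is currently writing).
repository = [
    Component("shuffle", "randomly reorder the elements of a list"),
    Component("sort", "order the elements of a list ascending"),
    Component("read_file", "read the contents of a text file"),
    Component("random_int", "generate a random integer in a range"),
]

context = "shuffle a deck of cards: randomly reorder the list of cards"
delivered = deliver(context, repository, limit=2)
print([c.name for c in delivered])  # ['shuffle', 'sort']
```

The second result, `sort`, is only loosely relevant, which mirrors the point that delivered information may be imprecise: the user still starts browsing from two candidates rather than from the whole repository.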
When the retrieval-by-reformulation mechanism is integrated with active information systems, the users' reformulation process can also be used, as a welcome side effect, to augment discourse models and user models. Details are explained in Section 3.4.3, where CodeBroker is introduced.