Relevance feedback is a process where users identify relevant documents in an initial list of retrieved documents, and the system then creates a new query based on those sample relevant documents. Algorithms for automatic relevance feedback have been studied in IR for more than thirty years, and the research community considers them to be thoroughly tested and effective. Companies and government agencies that use IR systems also view relevance feedback as a desirable feature, but there are some practical difficulties that have delayed the general adoption of this technique.
Most of the relevance feedback experiments reported in the IR literature were based on small test collections of abstract-length documents. The central problems in relevance feedback are selecting "features" (words, phrases) from relevant documents and calculating weights for these features in the context of a new query. These problems are substantially more difficult in environments with large databases of full-text documents. In addition, people searching databases in real applications often use relevance feedback in different ways than anticipated by IR researchers. Feedback techniques were developed to improve an initial query and assumed that a few relevant documents (all those in the top ten, for example) would be provided. In many real interactions, however, users specify only a single relevant document. Sometimes that relevant document may not even be strongly related to the initial query, and the user is, in effect, browsing using feedback.
These factors mean that traditional feedback techniques can be unpredictable in operational settings. Research aimed at correcting this problem is underway, and more operational systems using relevance feedback can be expected in the near future. Relevance feedback techniques are also an important part of building profiles in a routing system (issue 6), with the main difference being the number of example relevant documents available.
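The two central steps described above, selecting features from relevant documents and weighting them in a new query, can be illustrated with a minimal sketch. It is a simplified, positive-only variant of Rocchio-style feedback, not a method taken from any particular system. The function name `feedback_query`, the parameters `alpha`, `beta`, and `num_new_terms`, and the tiny stopword list are all assumptions made for illustration.

```python
from collections import Counter

STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}  # tiny illustrative list


def tokenize(text):
    """Lowercase, strip surrounding punctuation, and drop stopwords."""
    words = (w.strip('.,;:!?()"\'').lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]


def feedback_query(query, relevant_docs, num_new_terms=2, alpha=1.0, beta=0.75):
    """Build a weighted query from the original query and the relevant documents.

    alpha, beta, and num_new_terms are hypothetical tuning parameters.
    """
    weights = {term: alpha for term in tokenize(query)}
    if not relevant_docs:
        return weights

    # Mean term frequency over the relevant documents (the feedback centroid).
    totals = Counter()
    for doc in relevant_docs:
        totals.update(tokenize(doc))
    centroid = {t: c / len(relevant_docs) for t, c in totals.items()}

    # Feature selection: keep the highest-weighted terms not already in the query.
    new_terms = sorted(
        (t for t in centroid if t not in weights),
        key=lambda t: (-centroid[t], t),
    )[:num_new_terms]

    # Weighting: original terms gain beta * centroid weight; new terms start there.
    for term in list(weights) + new_terms:
        weights[term] = weights.get(term, 0.0) + beta * centroid.get(term, 0.0)
    return weights


docs = [
    "The joint venture will build engines in Ohio",
    "Engines from the venture ship to Ohio plants",
]
expanded = feedback_query("joint venture", docs)
```

With only one relevant document, the centroid is simply that document's term counts, so the expanded query is dominated by whatever that single document contains. This is one way to see why feedback from a single, loosely related document can behave unpredictably.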
Information extraction techniques, primarily developed in the context of the Advanced Research Projects Agency (ARPA) Message Understanding Conferences (MUCs), are designed to identify database entities, attributes and relationships in full text. For example, for people interested in new joint ventures, an information extraction system could identify the names of the companies involved, the new company, the products, and the location, all from articles coming over a news feed. Companies and government agencies have considerable interest in these techniques, and see them as contributing significant added value to the text databases they and others generate. Potential users also see these techniques as tools to help with data analysis, browsing, and mining using text databases. With the current state of information extraction tools, building a new extraction application requires a considerable investment, and certain types of information are very difficult to identify. Research in this area is focused on reducing the effort required for new applications.
Extraction of simple categories of information is, on the other hand, practical and can be an important part of a text-based information system. Examples of this type of information include company and other organization names, people's names, locations, and dates.
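As a rough illustration of why these simple categories are tractable, the sketch below pulls dates and organization names out of text using surface patterns alone. The patterns, the suffix list, and the function name `extract_entities` are assumptions chosen for the example; a practical extractor would need far broader coverage (other date formats, names without corporate suffixes, and so on).

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Dates written as "Month D, YYYY" only; other formats are not handled.
DATE_RE = re.compile(rf"\b(?:{MONTHS}) \d{{1,2}}, \d{{4}}\b")

# One or more capitalized words followed by a corporate suffix.
# The suffix list is a small illustrative sample.
ORG_RE = re.compile(r"\b(?:[A-Z][A-Za-z]*\s)+(?:Inc|Corp|Ltd|Co)\b\.?")


def extract_entities(text):
    """Return organization names and dates found by simple surface patterns."""
    return {
        "organizations": ORG_RE.findall(text),
        "dates": DATE_RE.findall(text),
    }


sample = "Acme Widgets Inc. and Bolt Corp. announced a joint venture on March 3, 1995."
found = extract_entities(sample)
```

Patterns like these rely on capitalization and fixed suffixes, so they will also pick up a capitalized word that merely precedes a company name at the start of a sentence. Relationships such as which company is the new joint venture are beyond this kind of matching.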
Multimedia indexing and retrieval refers to techniques being developed to access image, video and sound databases without text descriptions. The perceived value of multimedia information systems is very high and, consequently, industry has a considerable interest in the development of these techniques. General solutions to multimedia indexing are very difficult and, where they currently exist, tend to be of limited utility. An example of this is indexing images by their color distribution. This technique can be effectively used in some applications, such as retrieving pictures of fabric in specified color shades, but in many other applications simply cannot be used. Some progress has been made in multimedia indexing for specific applications (for example, retrieval of photographs of faces), and in processing language-related multimedia. Examples of language-related multimedia include text in images, scanned document images, and speech. Given the number of industrial and academic research groups working in this area, steady improvement of the techniques available can be expected.
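Indexing images by color distribution, the example mentioned above, can be sketched in a few lines. This version quantizes RGB pixels into coarse bins and compares images with histogram intersection; the function names, the bin count, and the representation of an image as a plain list of `(r, g, b)` tuples are assumptions made to keep the example self-contained.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized histogram over quantized (r, g, b) values in 0..255.

    Assumes bins_per_channel divides 256 evenly.
    """
    step = 256 // bins_per_channel
    counts = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        counts[key] = counts.get(key, 0) + 1
    total = len(pixels)
    return {key: n / total for key, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]: the sum of the per-bin minimum of two histograms."""
    return sum(min(v, h2.get(key, 0.0)) for key, v in h1.items())


def rank_by_color(query_pixels, images):
    """Return image names ordered from most to least similar in color."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query_hist, color_histogram(pixels)), name)
        for name, pixels in images.items()
    ]
    return [name for score, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


# Hypothetical "images": flat lists of RGB pixels.
images = {
    "navy_fabric": [(10, 10, 120)] * 4,
    "red_fabric": [(200, 20, 20)] * 4,
    "mixed": [(10, 10, 120)] * 2 + [(200, 20, 20)] * 2,
}
ranking = rank_by_color([(0, 0, 100)] * 3, images)
```

The histogram discards all spatial information, which is why this works for a query such as fabric in a given shade but says nothing about what an image depicts.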