How Advanced Search Works

How Extended Search Works




Darwin-FIRST implements its Extended II facility through a pool of Search Engines. A special Procurement Agent, procurebot_ks, merges the top references returned for the user query into a single list for a given pair (k, s). This list is represented by the left rectangle of the figure above. The list then goes through a two-step optimization process: first by "Bad Words" and then by "Priorities". Bad Words (yellow column) are words considered discrediting when they appear in a Search Engine reference for a given context (subject).
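The merging done by procurebot_ks is not specified in detail here, so the following is only a minimal sketch under stated assumptions: each Search Engine returns an ordered list of URLs, the lists are interleaved rank by rank, and duplicate URLs are dropped. The function name merge_top_references, the engine names, and the interleaving strategy are all hypothetical; the default limit of 50 simply mirrors the size of the list in the example.

```python
from itertools import zip_longest


def merge_top_references(results_by_engine, limit=50):
    """Interleave per-engine result lists rank by rank, dropping duplicate URLs.

    results_by_engine maps a (hypothetical) engine name to its ordered URLs.
    """
    merged, seen = [], set()
    # zip_longest pads shorter lists with None so that every rank is visited.
    for rank_group in zip_longest(*results_by_engine.values()):
        for url in rank_group:
            if url is None or url in seen:
                continue
            seen.add(url)
            merged.append(url)
            if len(merged) == limit:
                return merged
    return merged


# Illustrative data only; these are not real Darwin-FIRST results.
example_results = {
    "engine_a": ["http://a.example/1", "http://shared.example/x", "http://a.example/2"],
    "engine_b": ["http://shared.example/x", "http://b.example/1"],
}
top_list = merge_top_references(example_results)
```

Interleaving by rank keeps the best hits of every engine near the top of the merged list, which matters because the later steps only reorder and prune what this list contains.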

First Step

Words like "courses" make a reference suspect of being a course announcement for a given curriculum. Such a reference should be considered objectionable when we are looking for specific content: for instance, when querying for "Gray Algorithms" we want a paper dealing with this topic, not a course that covers Gray Algorithms among many other topics. Bad Words are preselected by human experts for each specific subject. Bad Words for one subject could be Good Words for others; for instance, "course" could be a desirable word for Learning and Teaching queries.

Bad Words are used to clean the Top list of suspicious references, "killing" those tainted by them, as shown in the figure. Of the 50 references in the example, 21 were accepted and 29 rejected for the pair {"ciphers", "Algorithms and Complexity"}. The Bad Word responsible for each killing is depicted below: book cs book book
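The cleaning step can be sketched as a simple filter. This is an illustration, not the actual Darwin-FIRST code: the reference fields (title, url), the whole-word matching rule, and the contents of BAD_WORDS are assumptions; only the words "book", "cs", and "courses" come from the text above.

```python
import re

# Hypothetical Bad Words per subject; the real lists are chosen by human experts.
BAD_WORDS = {
    "Algorithms and Complexity": {"book", "cs", "courses"},
}


def find_bad_word(text, bad_words):
    """Return the first Bad Word that occurs in text as a whole token, else None."""
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        if token in bad_words:
            return token
    return None


def clean_top_list(references, subject):
    """Split references into accepted ones and (reference, bad_word) rejections."""
    bad_words = BAD_WORDS.get(subject, set())
    accepted, rejected = [], []
    for ref in references:
        hit = find_bad_word(ref["title"] + " " + ref["url"], bad_words)
        if hit is None:
            accepted.append(ref)
        else:
            rejected.append((ref, hit))  # keep the word responsible for the "killing"
    return accepted, rejected


# Illustrative references only.
sample = [
    {"title": "Block ciphers and their complexity", "url": "http://example.org/ciphers.pdf"},
    {"title": "Applied Cryptography (book)", "url": "http://example.com/shop"},
    {"title": "Lecture notes", "url": "http://cs.example.edu/crypto"},
]
accepted, rejected = clean_top_list(sample, "Algorithms and Complexity")
```

Recording the responsible Bad Word alongside each rejected reference makes it possible to display the reason for each rejection, as the figure does.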

Second Step

The second step reorders the cleaned list. Because only the Top 10 are finally selected, the reordering is very important: some "hidden" references can move up and, conversely, some privileged ones can be downgraded, according to the user preferences. In Version 1.0, Darwin-FIRST uses a default set of preferences for each subject. In future versions, a facility for user settings will be enabled.

In the example we use three priority criteria: by Domain, by Extension, and by Good Words. For instance, [edu, pdf, 2003], meaning a paper published as a PDF document on a higher-education site in 2003, could be a nice "hand" for moving upwards. Darwin-FIRST agents are trained to perform this task and learn as much as possible from experience, because the successes and failures of each criterion and each combination of criteria are precisely tracked.
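A minimal sketch of the reordering, assuming each matched criterion (domain, extension, Good Word) adds one point and ties keep their original order. The scoring rule, the PREFERENCES structure, and the function names are hypothetical; only the [edu, pdf, 2003] combination and the Top 10 cutoff come from the text.

```python
import posixpath
from urllib.parse import urlparse

# Hypothetical default preferences for one subject, mirroring [edu, pdf, 2003].
PREFERENCES = {"domains": {"edu"}, "extensions": {"pdf"}, "good_words": {"2003"}}


def priority_score(ref, prefs):
    """Count how many of the three criteria a reference satisfies (0 to 3)."""
    parsed = urlparse(ref["url"])
    host = parsed.hostname or ""
    domain = host.rsplit(".", 1)[-1]  # top-level domain, e.g. "edu"
    extension = posixpath.splitext(parsed.path)[1].lstrip(".").lower()
    text = (ref["title"] + " " + ref["url"]).lower()
    score = 0
    score += domain in prefs["domains"]
    score += extension in prefs["extensions"]
    score += any(word in text for word in prefs["good_words"])
    return score


def reorder(references, prefs, top_n=10):
    """Sort by descending score and keep the first top_n references."""
    # sorted() is stable, so equally scored references keep their original order.
    ranked = sorted(references, key=lambda r: priority_score(r, prefs), reverse=True)
    return ranked[:top_n]


# Illustrative references only.
cleaned = [
    {"title": "Cipher shop", "url": "http://example.com/ciphers.html"},
    {"title": "Stream ciphers (2003)", "url": "http://www.example.edu/papers/ciphers.pdf"},
    {"title": "Cipher notes", "url": "http://example.org/notes.pdf"},
]
top = reorder(cleaned, PREFERENCES)
```

An equal-weight count is the simplest choice; a system that tracks the successes and failures of each criterion, as described above, could replace the fixed one-point increments with learned weights.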