Darwin - Distributed Agents for Retrieval the Web Network
Distributed Agents to Retrieve Web Intelligence

Search Engines and i-Mapping

By Web authorities we mean Websites that deal authoritatively with specific topics. These authorities resemble e-books, with an index, a prologue, a body of chapters or sections, sometimes a bibliography or credits, and an epilogue. Among authorities we may find Portals, B2C, B2B, Vortex, Virtual Communities, Essays, Academic papers, Directories, and Institutional Websites. Generally speaking, these units of information and knowledge are collections of files interconnected by hyperlinks. The trend on the Web is to have a Home from which the website's Logical Tree starts.

However, Search Engines register content on a page-by-page basis. Any conceptual unit, for instance a Portal, is split into as many entries as it has pages, each one indexed by its path, its title, and its "popularity" or some conventional measure of it. This is a severe limitation, even though it is justified by the state of the art in robotics. It would be almost impossible for current "spiders", jargon for the robots that continuously browse the Web to feed their Search Engines, to detect conceptual units.

For this reason the "Thesaurus" of each conceptual unit is split across its pages, as depicted in the figure above. This limitation plays a crucial role when looking for similar conceptual units.
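The split can be illustrated with a minimal Python sketch. The `PageRecord` type, its field names, and the three-page sample portal are assumptions made up for illustration; they are not the data model of any real Search Engine. The sketch shows that the unit's full keyword set exists only as the union of its pages, so no single indexed page carries it.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class PageRecord:
    """Hypothetical index entry: one record per page, not per conceptual unit."""
    path: str
    title: str
    popularity: float
    keywords: Set[str] = field(default_factory=set)


def unit_thesaurus(pages: List[PageRecord]) -> Set[str]:
    """Union of the keywords of every page belonging to one conceptual unit."""
    return set().union(*(p.keywords for p in pages))


def pages_covering(pages: List[PageRecord], wanted: Set[str]) -> List[str]:
    """Paths of the pages that contain every wanted keyword."""
    return [p.path for p in pages if wanted <= p.keywords]


# Invented sample: one portal indexed as three separate pages.
portal = [
    PageRecord("/", "Home", 0.9, {"k1", "k2", "k3"}),
    PageRecord("/a", "Section A", 0.4, {"k1", "k4", "k5"}),
    PageRecord("/b", "Section B", 0.3, {"k2", "k6", "k7"}),
]

full_set = unit_thesaurus(portal)
matching = pages_covering(portal, full_set)  # empty: no page holds the whole set
```

In this toy case, a query for the whole keyword set matches no page, while a query for `k1` alone matches both the Home and Section A.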

Each conceptual unit (Cognitive Object) of an i-Map is mapped with a summary, or i-URL, that has a Header, a Body, and a Footer. The Editor, a human being, writes the Header and Body, extracting the whole set of keywords within the conceptual unit, shown as the collection [k1 k2 k3 … k15 k16 k17]. We may suppose that Editors, when writing Bodies, use writing criteria similar to those of the conceptual units' authors when writing Homes, which means a similar use of keywords. In the figure we may argue that keywords k1, k2, and k3 are global, at the upper level of the topic hierarchy, while k4 to k17 sit at lower, more specific levels of the Logical Tree. Editors who map authorities are trained to identify and distinguish these subtle differences.
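As a rough illustration of this structure, the following Python sketch models an i-URL as a record. The class name `IUrl`, its field names, the placeholder texts, and the example Home address are all hypothetical; the source only states that an i-URL has a Header, a Body, and a Footer, and that keywords fall into global and more specific levels.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IUrl:
    """Hypothetical i-URL record: the summary of one conceptual unit."""
    home: str                     # gateway (Home) of the conceptual unit
    header: str                   # written by the Editor
    body: str                     # written by the Editor
    footer: str
    global_keywords: List[str]    # upper level of the topic hierarchy
    specific_keywords: List[str]  # lower levels of the Logical Tree

    def keywords(self) -> List[str]:
        """Whole keyword collection, global keywords first."""
        return self.global_keywords + self.specific_keywords

    def level_of(self, keyword: str) -> str:
        """Return 'global' or 'specific' for a keyword of this unit."""
        if keyword in self.global_keywords:
            return "global"
        if keyword in self.specific_keywords:
            return "specific"
        raise KeyError(keyword)


# Mirrors the collection [k1 ... k17] with k1..k3 global, k4..k17 specific.
example = IUrl(
    home="http://example.org/",
    header="(header text written by the Editor)",
    body="(body text written by the Editor)",
    footer="(footer text)",
    global_keywords=[f"k{i}" for i in range(1, 4)],
    specific_keywords=[f"k{i}" for i in range(4, 18)],
)
```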

Having defined this important difference between the Search Engine and i-Map architectures, let us try to create a methodology to find similar conceptual units, essentially "Homes" from which i-Map users may navigate similar Logical Trees. A first, trivial approach, which proves to be wrong, is to look for a similar keyword pattern. This procedure is intrinsically wrong because of the unavoidable split mentioned above. It would be almost impossible to find a page that matches so many meaningful keywords, unless it is a meaningless sampling of them. Search Engines only provide pages, whereas we have to detect similar Homes, or gateways to similar conceptual units.

A less trivial procedure would be combinatorial analysis: building Markov chains of as many keywords as possible that still give a non-null query outcome, then merging the outcomes and selecting the similar ones among them. This procedure would be as complex as judging the best move in the middlegame of a chess match, even with the aid of the most powerful computers.
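To give a sense of scale, the short Python sketch below counts how many keyword chains an exhaustive analysis would have to consider. Whether a chain is treated as an unordered set or as an ordered sequence of keywords is an assumption on our part; both counts are shown. For the 17 keywords of the example, the unordered count alone is 2^17 − 1 = 131,071 candidate queries for a single conceptual unit.

```python
from math import comb, perm


def unordered_chains(n: int) -> int:
    """Number of non-empty keyword subsets that can be drawn from n keywords."""
    return sum(comb(n, k) for k in range(1, n + 1))


def ordered_chains(n: int) -> int:
    """Number of non-empty ordered keyword sequences without repetition."""
    return sum(perm(n, k) for k in range(1, n + 1))


subsets_for_17 = unordered_chains(17)   # equals 2**17 - 1
sequences_for_17 = ordered_chains(17)   # far larger still
```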

A shortcut would be to build Markov chains limited to Body keywords. Convergence proved to be high; usually chains of four or fewer links already yield null outcomes. Our experience as content experts tells us that a Home coincidence of three keywords is evidence of sound similarity. However, one problem remains: what about the rest of the keywords? Can we be sure that Home coincidences guarantee coincidence of the rest? Of course not. To be sure, there is no way other than human inspection. Nevertheless, Home similarities suggest probable Logical Tree similarities, and Logical Tree similarities suggest probable keyword similarities as well. Sometimes keywords appear within the i-URL Bodies outside their correct hierarchy level, as shown in the figure, but they are treated as belonging to the upper level, which adds some noise to the derived similarity.
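The shortcut might be sketched as follows in Python. This is a simplified illustration under stated assumptions, not the retrieval agent itself: the toy `index` stands in for a Search Engine, chains are grown in sorted keyword order to avoid duplicates, and the names `build_chains` and `candidate_homes` are hypothetical. A chain is extended only while its query outcome is non-null, and pages reached by chains of at least three keywords are kept as candidate Homes.

```python
from typing import Dict, Iterable, List, Set, Tuple

PageIndex = Dict[str, Set[str]]  # hypothetical: page path -> keywords on it
Chain = Tuple[str, ...]


def build_chains(index: PageIndex, body_keywords: Iterable[str],
                 max_links: int = 4) -> Dict[Chain, Set[str]]:
    """Grow keyword chains one link at a time, dropping null outcomes."""
    ordered: List[str] = sorted(set(body_keywords))
    frontier = [((), -1, set(index))]  # (chain, last keyword position, pages)
    outcomes: Dict[Chain, Set[str]] = {}
    for _ in range(max_links):
        next_frontier = []
        for chain, last, pages in frontier:
            for pos in range(last + 1, len(ordered)):
                hits = {p for p in pages if ordered[pos] in index[p]}
                if hits:  # keep only chains with a non-null outcome
                    new_chain = chain + (ordered[pos],)
                    outcomes[new_chain] = hits
                    next_frontier.append((new_chain, pos, hits))
        if not next_frontier:  # every chain converged to a null outcome
            break
        frontier = next_frontier
    return outcomes


def candidate_homes(outcomes: Dict[Chain, Set[str]],
                    min_links: int = 3) -> Set[str]:
    """Merge the outcomes of all chains with at least min_links keywords."""
    found: Set[str] = set()
    for chain, hits in outcomes.items():
        if len(chain) >= min_links:
            found |= hits
    return found


# Invented sample index and Body keywords.
index = {
    "a/home": {"k1", "k2", "k3", "k4"},
    "b/home": {"k1", "k2", "k3"},
    "c/page": {"k1"},
}
chains = build_chains(index, ["k1", "k2", "k3"])
homes = candidate_homes(chains)
```

Because each link can only narrow the set of matching pages, most chains die out after a few links, which keeps the search far smaller than the exhaustive combinatorial analysis. The candidates still need human inspection, since a coincidence of Home keywords does not guarantee coincidence of the rest.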

Finally, a human check could be carried out over the whole i-Map. Human judgment based on visual inspection is very efficient: for i-Maps of nearly 3,000 conceptual units, the check could be performed in one working day by a single person. We advise this check when i-Website owners decide to provide a Similar Database to their users. When a one-at-a-time feature is offered, users could be allowed to rerun the retrieval agent as many times as they want, obtaining a Markovian outcome each time.
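As a quick check of what this estimate implies, the Python snippet below computes the time available per conceptual unit. The eight-hour working day is our assumption; the source only says "a working day". Under that assumption, 3,000 units leave about 9.6 seconds of visual inspection per unit.

```python
def seconds_per_unit(units: int, working_hours: float = 8.0) -> float:
    """Average inspection time per conceptual unit, in seconds.

    working_hours defaults to 8.0, an assumed length for a working day.
    """
    return working_hours * 3600 / units


budget = seconds_per_unit(3000)  # about 9.6 seconds per unit
```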

Copyright © 2003-2013 Darwin! Inc. All rights reserved.