21. Darwin Technology

By Juan Chamero




Darwin Technology
Intag, Intelligent Agents Internet Corp
By Dr. Juan Chamero, Principal Architect
April 2008

 


ABSTRACT


    Darwin Technology is a Knowledge Management multi-agent computing system oriented to “Knowledge Discovery”. It is especially suited to huge, unstructured Data Reservoirs such as the Web. It is based on a new knowledge ontology, named Darwin Ontology, that models the process of documenting knowledge. The ontology states that, as language evolved, humans learned to document according to a set of rules and principles that are not formally explicit but culturally agreed, and that these rules could be unveiled and formally described, where possible, by performing data analysis over a large enough text “corpus”. Darwin Ontology rests on a set of Conjectures that are to be tested over this corpus.
    As the whole Web can be considered a large enough corpus, we designed a technology to test these conjectures, with the idea of unveiling the Web’s hidden intelligence in the form of a probabilistic order. Our initial idea was that the whole Web, like a structured database, has an order, but one that is hard to see and, of course, probabilistic in nature. To check this assumption we mapped two major disciplines, Computing and Art, through a multi-agent Darwin algorithm. All our conjectures proved to work, unveiling the hidden intelligence of these two disciplines at a high level of resolution. In the art prototype, from more than a billion documents (in English) dealing with or mentioning art, we unveiled the hidden art structure: 7,570 thematic nodes along a tree of 13 levels encompassing about 300,000 concepts. An important feature is that this tree was built starting from “zero ground”, with a very basic seed of fewer than 40 nodes. The next step will be to map the whole “knowledge wood” behind the Web corpus: 200 Major Subjects or Disciplines, 350,000 nodes (themes) and about 10 million concepts.
    With these maps it is possible to build YGWYN (You Get What You Need in Only One Click) Search Engines, where retrieval works as in a perfectly ordered World Virtual Library in which everything is semantically indexed. In terms of a generalized networked man-machine interaction, Darwin Ontology covers both sides: it first unveils and orders the machine side, and then tries to unveil and infer what people think.

Note: This work started in the year 2000. Its research, development and implementation involved 150 people, centered in an American R&D company, Intag, Intelligent Agents Internet Corp, and the CAECE University of Argentina.

Web References:
Institutionally, Intag, Intelligent Agents Internet Corp
AI Website, Intag.org
Darwin Blog
Darwin Demo: for a more complete demo, please ask Juan Chamero for a Telecom presentation
Art Mapping Prototype Sample, see the three sheets!
Art Timeline Series, a PowerPoint presentation of a timeline series retrieved by an agent and adjusted by a human.

Darwin Justification

What is it? It is a Knowledge Management multi-agent technique based on a new Knowledge Management Ontology.
What for? To unveil the hidden intelligence of Data Reservoirs.
Why? Because Information Retrieval and associated techniques such as Data Mining are insufficient to satisfy the information needs of current users.

How to cope with the Semantic Web

    IR, Information Retrieval, has up to now been the science of searching, but it has generally been restricted to more or less structured data realms. By structured we mean n-dimensional sets of arrays, vectors and tables in which the pieces of content are located. All databases are somehow ordered, and all Search Engines have their own Web database, so why do we affirm that the Web is not ordered? Conventional Search Engines (SEs) provide at least what we define as a “zero ground” semantic order, in which all Web pages are indexed by the words of their content but all share the same semantic level of uncertainty. This indexing, however, says almost nothing about the specific subjects these pages deal with. If we suppose that Human Knowledge has at least 10 levels of ordering, from general matters to very specific ones, then this semantic “order” is what is still missing before we can speak of the “Semantic Web”, the Tim Berners-Lee utopia, as a reality.
    Darwin provides this order by unveiling the inherent order hidden in the Web text corpus.

Note: In fact the Web is a multilingual text corpus.
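
The “zero ground” order described in this section, where pages are indexed only by the words of their content, can be illustrated with a small inverted index. This is only a sketch under our own assumptions: the function names `build_word_index` and `search`, the tokenization rule and the three sample pages are hypothetical and are not taken from any Search Engine or from Darwin itself.

```python
import re
from collections import defaultdict


def build_word_index(pages):
    """Map each lowercase word to the set of page ids that contain it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def search(index, query):
    """Return the ids of pages that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical sample pages: two unrelated subjects share the word "painting".
pages = {
    "p1": "Cubism is a movement in modern painting",
    "p2": "The painting of a house requires a good brush",
    "p3": "Movement of data between database tables",
}
index = build_word_index(pages)

print(search(index, "painting"))           # pages p1 and p2
print(search(index, "movement painting"))  # page p1 only
```

The query “painting” returns both the art page and the house-maintenance page: the index knows which words a page contains, but nothing about the subject the page deals with, which is the missing semantic order discussed above.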

Towards “meaning” discovery

    In the Web space, millions of virtually unformatted documents and messages are hosted, queried, edited, downloaded and interchanged daily, openly and freely. A data set may hold anything from no knowledge at all to a great deal of it, depending on the “meaning” of its content. We may also argue that this meaning was created by an intelligent being, individual or collective, and that it could eventually be “explained and documented”. What if we define this meaning, a little more than information, as the inherent intelligence of the data set? And what would happen if, across huge data sets, the universe of meanings were somehow related to specific content patterns? Darwin technology, like Data Mining, is a Knowledge Discovery tool, but with its “Choice Modeling” approach Darwin goes a little further, pointing to the best possible meanings, namely “authoritative meanings”.
    Two-way network communication is possible when all connected members know each other’s languages and codes. Data Mining is insufficient to point precisely to specific meanings in the Web because it only unveils main tracks and patterns, and is unable to answer questions like:
Why are these patterns produced?
Who generated these patterns?
And, especially, how were these patterns generated?

Note: Data Mining cannot solve the problem of collinearity: its intrinsic weakness is that critical data that might explain the patterns is never observed.

Towards the best query
    IR usually provides the best outcome for a given query but says nothing about “the best query”. This is logical, because doing so requires a step that is still missing: knowing as much as possible about the user’s needs, let’s say the “other” end of the man-machine communication. We may imagine a facilitator link between the user and the IR tool that helps the user build the best query in order to obtain what he or she is really looking for. This is not trivial, mainly because users may express what they need in a wide variety of forms, languages and codes. If this link were a human wise enough to master the cognitive offer hosted on the “machine” side, and at the same time endowed with a vast culture about people’s information needs, we could imagine that person carrying out the task via a smart dialog. Darwin Ontology takes both “realms” of the man-machine pair into account, trying to replace the hypothetical human link with a smart “e-membrane” managed by agents.


Index of Darwin Blog
Recommended Sequence of Reading

1. What’s in a document? – Part I
How documents are “seen” and indexed by conventional Search Engines.
2. What’s in a document? – Part II
Trying to “see” better across the Web text corpus: how human “reading” discriminates between two types of semantic particles, Common Words and “keywords”.
3. What’s in a keyword?
We analyze in depth the different types of keywords and introduce the more elaborate terms “semantic chain” and “concept”.
4. What a concept is
We provisionally accept a milestone Darwin Conjecture: that knowledge is structured as a tree, one tree per “discipline”, where each node corresponds to a “subject” identified by the semantic chain of n links that runs through n levels from the root of the tree to the subject node. Any subject is identified by a set of specific concepts (the Specificity Rule), and each concept is in turn identified by a semantic chain of n+1 links, the last link being the conventional “keyword”.
5. Let’s play a little with Google
We play with Google, one of the best and most complete current Search Engines, to learn about the new concepts. By playing we become able to appreciate the Web as_it_is.
6. How users search and how they “discover” their own keywords
We now enter the “users’ realm” to see how humans look for what they “need”. We introduce here the symmetric “user keyword” and the ideas of “People Knowledge” and “People’s concepts”. From man-machine matchmaking we arrive at the idea of “e-membranes”.
7. Toward e-Libraries
We describe here how Darwin imagines the Semantic Web, structured with knowledge trees, subjects and concepts. New terms are introduced, such as Web Thesauruses, “semantic fingerprint” and “authorities”. It is an introduction to Knowledge Mapping.
8. The Web as a semantic hypercube
We are now ready to imagine how to either restructure the whole Web by semantically indexing all documents properly, or keep everything as it is now and use “semantic glasses” to see everything as ordered. Semantic glasses behave like intelligent e-membranes that match users against the Web. We introduce here a strong Conjecture, that “Authorities” tend to write well, and with it the idea of WDD’s, Well Written Documents.
9. How do e-membranes work to build Thesauruses?
We explain Darwin as an industrial process similar to oil distillation in refineries: how we start from semantic “zero ground” (the way conventional Search Engines are currently structured, with all themes scrambled) and proceed to build the semantic virtual hypercube, level by level, from a “seed” at root level. We explain how agents explore node neighborhoods looking for semantic tree consistency. This process computes for each node its “semantic fingerprint”, in effect a coded meaning.
10. A Reflection stop
At this stage of our long-lasting semantic trip, under way since 2001, we have been encouraged by the outcomes of two prototypes that prima facie confirmed all our suppositions (see Darwin Conjectures). So I consider this a good time to document some basic reflections: how humans build concepts and why our knowledge structures itself as a tree.
11. Darwin Conjectures
We are now in a position to understand the meaning and scope of our Ten Conjectures.
1. About Generalized man-machine dialog.
2. About the two types of semantic particles in that dialog
3. About the two types of semantic particles in any document
4. The communication as virtually performed thru e-membranes
5. Knowledge structures itself as a set of trees (a wood)
6. About this tree structure
7. About “authorities” and authoritativeness
8. About semantic fingerprints and Web Thesaurus
9. About People’s Thesaurus
10. About a generalized semantic man-machine dialog
12. Art Map Prototype – I
We describe here how we make the art tree grow from a basic semantic seed, how we go deeper, and how the retrieved tree is harmonized as much as possible in order to facilitate its later use for information retrieval. In numbers: 7,570 nodes and 350,000 concepts distributed along a tree of 13 levels.
13. Art Map Prototype – II
It details how fingerprints and specific Virtual Libraries of Authorities are built. It envisages the possibility of implementing a system of semantic indexing by building document fingerprints.
14. Keywords detection
It details the process of detecting keywords in the Web text corpus. This step is similar to some data mining techniques used in linguistics.
15. Darwin Ontology, The Web as_it_is
Darwin Ontology imagines the Web man-machine interaction as one performed between the “Established Order” and the Multitude. It presents the Web as a social model at planetary scale that makes full use of the powerful Internet technology, enabling the open, free and spontaneous flow of information and intelligence between those two realms.
16. K-side dissection
A micro-sample dissection of the Web content as_it_is. This dissection was performed by humans, members of our staff, but it could be performed by agents in the near future. For many readers the result may look very different from what they expected.
17. K’- side dissection
The same dissection applied to the people’s side as_it_is, seen as a Multitude by our staff. Take a look at the New, Libertarian, Popular and Bizarre activities!
18. A little more about Darwin Ontology I
It is another reflection on the work performed by our staff over the last five years, how our two prototypes evolved and the near future of applications. Initially we unveiled the Computing discipline starting from a given semantic skeleton of 5 levels. In our second prototype we unveiled the Art discipline from a very primitive seed, but a seed nonetheless. Our third prototype will probably unveil an important discipline without any seed, which is equivalent to starting with zero information.
19. A little more about Darwin Ontology II
We present here the main differences between Darwin technology and Data Mining and envisage the next step: to model people’s reasoning as revealed by their free, open and spontaneous interaction with the Established side!
20. Darwin Applications scope
The scope of Darwin technology is presented here. In our view Darwin will make digital knowledge fully manageable, at least in textual form. Knowledge expressed in images, sounds, smells and tactile forms of information is a different matter. We are studying how Darwin Ontology could be extended to deal with these forms.
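
As an illustration of the tree conjecture summarized in entry 4 of the index above, the following sketch represents a subject as a node whose identity is the chain of links from the root, and a concept as that chain plus one final keyword link. It is a minimal sketch under stated assumptions: the class `SubjectNode`, its methods, the choice to count the root as the first link, and the sample subjects and keyword are all hypothetical and do not come from the Darwin prototypes.

```python
class SubjectNode:
    """A subject in a discipline tree (hypothetical illustration)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        self.keywords = set()  # keywords specific to this subject

    def add_child(self, name):
        """Create a more specific subject one level below this one."""
        child = SubjectNode(name, parent=self)
        self.children[name] = child
        return child

    def chain(self):
        """Semantic chain of the subject: links from the root to this node.

        Assumption: the root itself counts as the first link.
        """
        links = []
        node = self
        while node is not None:
            links.append(node.name)
            node = node.parent
        return tuple(reversed(links))

    def concept_chain(self, keyword):
        """Chain of a concept: the subject chain plus the keyword link."""
        if keyword not in self.keywords:
            raise KeyError(keyword)
        return self.chain() + (keyword,)


# Hypothetical three-level fragment of an art tree.
art = SubjectNode("art")
painting = art.add_child("painting")
cubism = painting.add_child("cubism")
cubism.keywords.add("analytic cubism")

subject = cubism.chain()
concept = cubism.concept_chain("analytic cubism")
print(subject)
print(concept)
```

Here the subject has a chain of n = 3 links and the concept a chain of n + 1 = 4 links, the last being the keyword, which matches the relationship stated in the conjecture.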