Copyright © 2003, by Synapse, the Knowledge Link Corporation
Few quotes so efficiently describe both the achievement and yet the failing of the information age at the beginning of the twenty-first century. We have terra-bytes of digitized information and global networks connecting millions of computers, but finding specific information is more than ever like looking for a needle in a haystack.
One of the major reasons why information retrieval is problematic is because language is elastic: one word can have many meanings, and one concept can be expressed by many different words. Human beings can understand words better than machines because our brains store a vast and complex web of interrelationships in which no word is an island.
Text-based searching has failed to provide a satisfactory method for interrogating large data sets and will fail increasingly as the sets of data we need to access continue their exponential growth.
The problems of text-based searching have been known in information science for decades as the precision-recall tradeoff.
Empirical studies have shown that precision will decline as recall increases and vice-versa; this tradeoff is inherent in the very nature of text-based searching.

In other words, if you need to ensure you see all the relevant information then you will inevitably have to wade through a greater mass of irrelevant data – whatever you do to improve the comprehensiveness of your search will diminish the proportional purity of the results.
Shakespeare: Hamlet II,2
The human brain has approximately 100 billion neurons, but it is not this capacity that gives us our noble reason. Our infinite faculties derive from the way the brain is wired through a complex network of interconnections called synapses. The number of ways this network can interconnect is not measured in billions; it is a number greater than all the fundamental particles - electrons, protons, neutrons, etc. - in the entire universe. Nature revealing the infinite within the finite.
Interconnectedness is the key to the way the human brain works and also the key to solving the precision-recall dilemma in information systems.
In the human mind words are not isolated islands like they are in machines. Every thought, word and image is intricately connected to other related words and concepts through many subtle relationships. If we want machines to be able to understand our requests for information, and to respond with comprehensive and relevant results, then we need to give them a knowledge-base that is structured the way our own brains work.
The word ‘syndetic’ means to bind together or to connect; in information science it is applied to the cross-referencing of vocabulary terms to represent a variety of conceptual links. Standard methodologies have evolved for building relationships between concepts in information systems using controlled vocabularies. Broadly speaking these include hierarchical, equivalency and associative relationships.
Ultimately, the goal of a controlled vocabulary is to represent each discrete real-world object or unique abstract concept by one and only one unambiguous indexing term, and then to cross-reference these indexing terms to represent the rich interconnections inherent in their conceptual or real-world correlative. homographs (words with more than one meaning) need to be disambiguated, and synonyms (the same meaning represented by more than one word) need to be mapped to one hub term known as the preferred term. Concepts are arranged in one or more hierarchies to represent the various categorical organizations by which the domain may be viewed. Concepts are also linked together by free association with other related concepts to further enrich the syndetic network. These types of interconnections are discussed in more detail in the following three sections.
Dr. Peter Roget, author of the first thesaurus of the English language, described a phenomenon he called ‘the elasticity of language’. Put simply: one word can refer to several different things (homographs), and one thing can be referred to by several different words (synonyms) .
For example, the word ‘mercury’ is a homograph. It could be used to refer to the Roman god of communication, a planet in our solar system, a chemical element, or the trade name of a line of automobiles.

If these variances seem innocuous it is because our brains instantly and subconsciously translate synonyms on-the-fly and differentiate homographs by an internal knowledge-base of subtle contextual associations. But the variances are far from innocuous when it comes to information retrieval systems.
Imagine going into a hardware store run by a computer and asking to buy a bucket…

Less insidious but just as frustrating is the homograph problem where, for example, an online search for astrological articles on Mercury may result in the user wading through thousands (or on the web, millions) of irrelevant articles on gods, cars and chemicals. Refining the search by building a Boolean construct such as ‘mercury’ and ‘planet’ will improve the precision of the search, but comprehensiveness will necessarily suffer – some relevant results will now be excluded because they do not contain the word ‘planet’. In a controlled vocabulary, homographs are disambiguated by the use of parenthetical modifiers, e.g., ‘Mercury (Planet)’; in a taxonomic structure disambiguation might be achieved by the context of the term’s placement in a particular hierarchy.
Equivalency relationships may be used for other kinds of links besides connecting synonyms. They can be used for mapping spelling variants, regional and multilingual equivalents, organization-specific preferred terminology, acronyms and abbreviations.
Back to Table of Contents
They are often presented as taxonomies, ontologies or classifications. In a hierarchy, tree-structures are presented such that each lower level branch has some differentiated attributes while possessing the qualities of its parent branch. Differentiation can be of three types: generic, whole-part or instance.
GENERIC RELATIONSHIPS
Plants
Succulent plants
Cacti
WHOLE-PART RELATIONSHIPS indicate that each narrower term is a part of (component of) its parent term, e.g., spark plugs are a part of the ignition system of an automobile.
Automotive electrical system
Ignition system
Spark plugs
INSTANCE RELATIONSHIPS indicate that the narrower term is an instance of its parent term. The parent term is usually a common noun category whereas the narrower term may be a proper noun, e.g., Beringer Founders’ Estate is an instance of the Merlot subcategory of red wines.
Red wines
Merlot
Beringer Founders’ Estate
Often terms may logically belong in more than one hierarchy, e.g. musical instruments might branch down paths for stringed instruments and percussion instruments; since pianos are a kind of both stringed and percussion instruments, the term pianos would therefore require placement in both structures. This is called polyhierarchy.
One of the problems of hierarchies is that there can be as many different ways of ordering a hierarchy as there are users of the information. The structure depends on the perspective of each viewer.
For example, when building a medical ontology one could identify several major views, each of which lends itself to hierarchical organization:
Diseases
abdominal diseases
cardiovascular diseases
endocrine diseases
…
Symptoms
fever
nausea
rash
…
Therapies
acupuncture
detoxification
radiotherapy
…
It would not be possible to build a single hierarchy that could structure all the permutations of these concepts in a way that would satisfy the variety of user-perspectives. In such instances it may be desirable to create a set of independent hierarchies representing each major facet of the domain of knowledge.
Facets present new challenges in user interface design as multiple hierarchies need to be navigated and dynamically recombined. One possible solution may be described using a metaphor of parallel universes and wormholes. In this model a graphical process allows the user to jump through a succession of separate ontologies or facets with a single query. The wormhole method is unique in that it presents the results as if they are a continuous hierarchical pathway, even though the actual pathway does not exist in the knowledge-base; it is dynamically constructed by the user. In the following two examples the > sign indicate where the user has wormholed from one facet to another; in both cases the results display like a continuous hierarchy.
E.g.1: Type of wine > body > wine region
Wines
Red Wines
Merlot
> Full Bodied
> Napa Region
Beringer
E.g.2: Wine region > body > type of wine
US Wineries
Napa Region
Beringer
> Full Bodied
> Red Wines
Merlot

The links in all of the above examples represent very important associations, but none of them could be expressed through the hierarchical structures of a taxonomy nor through equivalency relationships. These associative relationships are the hardest to define, but they provide the richest and most subtle connections between concepts bringing machines closer to the kind of knowledge-base that human beings take for granted.
Unlike taxonomies, which only store hierarchical associations, an electronic thesaurus can combine the categorical order found in taxonomies, equivalency and mapping relationships as well as more intuitive associative links. National and international standards exist to guide the construction of such thesauri, e.g., ANSI/NISO Z39.19 and ISO 2788 and 5984.
Named-entity authority control is a branch of information science that utilizes the principles of conceptual vocabulary control. The goals of named entity authority control are:
Pseudonyms, aliases, nicknames, married / maiden names, AKAs, working names, short names, long names, code names, legalized names, pen names (noms des plumes), stage-names and sobriquets all serve to confuse and confound the proper identification of people or organizational entities in information systems. All of the above may be examples of proper and legitimate variant forms of names. The problem can be even further compounded when individuals or organizations willfully attempt to conceal their identity.
Formal and informal relationships and associations between individuals, such as marriage, parentage, friendships and business relationships, can also be expressed using customized associative relationships.
Disambiguation of individuals with similar or the same names may need to invoke rules-based validations that reference controlled metadata elements such as birthplace, residences, birth and death dates, state and other official identification numbers, and many other data.
Like personal names, organizational names also suffer from the confusion of aliases, pseudonyms and a plethora of other variant forms. Organizations frequently change their name through the process of mergers, acquisitions and partnerships. Many organizations can be arranged with some hierarchical structure such as the corporate division / subdivision structure.
Important associations between organizations can be identified such as Company A merged with Company B, Company C customer of Company D, etc.
Personal named entities may also be associated with organizational named entities to show the membership of specific individuals in particular organizations.
Most geographic entities lend themselves to hierarchical structuring as well as other associative links.
Following is a diagram of a model enterprise-wide knowledge-base. Multilingual meta-thesauri, private and third-party vocabularies and named-entity vocabularies are all mapped through a single syndetic network. Note: all conceptual vocabulary, including multilingual versions, map to a common hub (the master thesaurus); the master thesaurus and the named entity files are all mapped to each other.

What we are observing in the six degrees of separation phenomenon is the close proximity between nodes in complex networks, and how fast and easy it is to get from one place to another in remarkably few leaps (phone calls, mouse clicks, etc.).
Scale is important - small and simple taxonomies are not sufficient for accessing the rapidly growing information repositories managed by large corporations and government organizations.
Syndetic networks, however, can disambiguate similar concepts and names, pull together variant forms, capture equivalencies between multilingual vocabularies, arrange concepts and named entities into multiple hierarchical groupings, and capture subtle associative relationships between concepts and named entities.
The richer the network’s cross-referencing within and between these vocabularies, the more successful will be the syndetic network at answering user queries accurately, comprehensively and quickly.
MURRAY GELL-MANN