Glossary

Attributes
       Aspects of information about a content object. Attributes may be fields, tags, or meta-tags. For example, a document can have a “language” attribute and a “date” attribute.

Authority File
       A form of vocabulary control comprising a list of preferred terms or proper names to be used when indexing and searching a set of entities within a limited domain. It includes references from non-preferred terms to the preferred ones. See also Controlled vocabulary, Synonym ring, Classification scheme, Thesaurus.

Automatic Categorization
       Application of software that uses human-defined rules or pattern-matching algorithms to automatically assign controlled vocabulary metadata to documents. Also denotes the assignment of documents to categories within a taxonomy.
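
The rule-based variant described above can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the `RULES` table, its category names, and its keywords are all hypothetical.

```python
import re

# Hypothetical human-defined rules: category -> keywords that trigger it.
RULES = {
    "Human Resources": {"payroll", "vacation", "benefits"},
    "Finance": {"invoice", "budget", "payroll"},
}

def categorize(text: str) -> list[str]:
    """Return every category whose keyword set overlaps the document's words."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, keywords in RULES.items() if words & keywords)

print(categorize("Submit your vacation request by Friday"))
```

A document may match several categories at once, which is why the function returns a list rather than a single label.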

Bayesian Analysis
       A widely used statistical technique for analyzing text. It infers topicality from patterns of words and phrases present in documents. It is a “probabilistic” method because it returns the likelihood that a document belongs to a topic.
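
One common form of this technique is the naive Bayes classifier. The sketch below is a minimal version with add-one smoothing; the training sentences, topic names, and function names are invented for illustration, and real tools use far larger training sets and more careful text processing.

```python
import math
from collections import Counter

# Hypothetical training data: (topic, text) pairs.
TRAINING = [
    ("sports", "the team won the match"),
    ("sports", "the player scored a goal"),
    ("finance", "the bank raised interest rates"),
    ("finance", "the market and the bank fell"),
]

def train(examples):
    """Count words per topic and documents per topic."""
    word_counts, doc_counts = {}, Counter()
    for topic, text in examples:
        doc_counts[topic] += 1
        word_counts.setdefault(topic, Counter()).update(text.split())
    return word_counts, doc_counts

def topic_probabilities(text, word_counts, doc_counts):
    """Return P(topic | text) for each topic, using add-one smoothing."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for topic, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(doc_counts[topic] / total_docs)  # prior
        for word in text.split():
            score += math.log((counts[word] + 1) / (total_words + len(vocab)))
        log_scores[topic] = score
    # Normalize the log scores into probabilities that sum to 1.
    peak = max(log_scores.values())
    weights = {t: math.exp(s - peak) for t, s in log_scores.items()}
    norm = sum(weights.values())
    return {t: w / norm for t, w in weights.items()}

word_counts, doc_counts = train(TRAINING)
probs = topic_probabilities("bank interest", word_counts, doc_counts)
print(probs)
```

The result is a likelihood per topic rather than a yes/no answer, which is what makes the method probabilistic.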

Candidate term
       A term considered for admission into a controlled vocabulary because of its potential usefulness. See also provisional term.

Card Sorting
       A technique to understand how users group information within a particular domain by having users organize cards representing specific types of information from their perspective.

Categories
       Categories are terms in a subject-based classification (taxonomy) or controlled vocabulary (thesaurus) into which information objects are grouped. Categories can be a plain list or they can be arranged in a hierarchy. The process of assigning information objects to categories is referred to as “categorization”.

Categorization
       Process of assigning documents to pre-defined categories in a taxonomy.

Category
       The ultimate classes of phenomena (what we perceive in the world around us).

Class
       A number of individuals (persons or things) possessing common attributes that are grouped together under a general or “class” name.

Class-Concept
       A concept that is the same as the name of a class.

Classification
       Classification refers to the systematic grouping of like things or objects into classes or categories according to some shared quality or characteristic.

The term “classification” can refer either to the process of defining the categories and structure of a classification scheme or to the process of assigning documents to their appropriate categories. For example, developing the categories and structure for an intranet taxonomy is one kind of classification. Assigning a Dewey Decimal code to a book, based on the mapping of its content into the Dewey Decimal system, is another.

The term “categorization” is often used by vendors of automatic categorization tools to describe the classification of an item of content according to a taxonomy.

Classification scheme
       A type of controlled vocabulary designed for classifying or categorizing resources. The terms in a classification scheme are arranged in hierarchies, which are often well adapted to assist browsing.

Clustering
       A technique of automatic categorization which segments a document collection into subsets of documents/words with the members of each subset being similar with respect to certain features.

Compound term
       A term consisting of more than one word that represents a single concept. Compound terms must be constructed according to the guidelines contained in the ANSI/NISO Z39.19 standard.

Concept
       Defined as an aspect of thought, a concept is a kind of unit in terms of which one thinks. Concepts are not to be confused with the terms that refer to them. Different terms can refer to one concept and more than one concept can be referred to by one term.

Content
       Information that has a tangible aspect because it has been collected and contained in a content object. Content can be unstructured (usually text) or structured (in a database). Content can be collected at different levels of granularity.

Content Management
       The rules (policies, procedures, standards), roles (people who perform the management) and resources (money, time, software) used to author, evaluate, organize, publish, maintain, and store content objects for a site.

Content map
       A visual representation of an information environment used by information architects to visualize relationships between content categories and to explore navigation pathways within content areas. Typically high-level and conceptual in nature, they help to bridge the top-level business view of content and the concepts contained in the content itself.

Controlled vocabulary
       A predefined subset of natural language for tagging and indexing a collection of information objects. A controlled vocabulary lists the authorized terms to be used and their relationships. The use of a controlled vocabulary in an information system increases indexing consistency and helps match the searchers’ natural language query with the index terms.

There are four basic types of vocabulary control: synonym rings, authority files, taxonomies, and thesauri.

Data Mining
       Basic process employed to analyze patterns in data and uncover hidden information stored in alphanumeric databases.

Descriptor
       A type of heading that is a term chosen as the preferred expression of a concept in a thesaurus.

Directory
       A directory is a listing of associations between pieces of information. Typically, directories are fairly flat classifications that allow users to look at lists of related objects, such as an employee directory or a list of annual reports by sector. These classifications can have taxonomy-like relationships but are not as deep and consistent as a taxonomy classification system. A typical example is the directory structure on a desktop computer.

Domain
       A specific subject area or area of knowledge like medicine, real estate, financial management, automobile repair, etc.

Facet
       A fundamental category by which an object or concept may be described according to its characteristics. For example, a child’s ball may be described using the facets of size, weight, shape, colour, texture, material and price; wine might be by price, type, country and region.
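
As a rough illustration of how facets support filtering, the sketch below describes each wine by the same facets and selects on any combination of them. The catalogue entries and the `filter_by_facets` helper are hypothetical.

```python
# Hypothetical catalogue: each wine is described by the same set of facets.
WINES = [
    {"name": "Wine A", "type": "red", "country": "France", "price": 25},
    {"name": "Wine B", "type": "white", "country": "France", "price": 15},
    {"name": "Wine C", "type": "red", "country": "Italy", "price": 18},
]

def filter_by_facets(items, **selected):
    """Keep the items whose facet values match every selected facet."""
    return [
        item for item in items
        if all(item.get(facet) == value for facet, value in selected.items())
    ]

red_french = filter_by_facets(WINES, type="red", country="France")
print([wine["name"] for wine in red_french])
```

Because the facets are independent of one another, a user can narrow a result set in any order, for example by country first and type second.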

Facet analysis
       The process of analyzing content to determine appropriate facets and vocabulary term relationships, using one characteristic of division at a time, to produce homogeneous, mutually exclusive groups.

Fields
       See “Attributes”.

Folksonomies
       The tagging of content with metadata or information by users and community members based on their personal preferences. Folksonomies allow any user to add comments or information that other users can take advantage of when looking for or organizing their own information. This is also called “social tagging”. There are several examples on the Web, including del.icio.us, a social bookmarks manager, and flickr.com, for storing and sharing photos.

Granularity
       The level of specificity with which content is described. The more granular, the more specific.

Hierarchy
       A hierarchy is a series of classes or groups in which each class falls into a subgroup of a larger group, which in turn forms part of an even larger group.

A hierarchy begins with a word or group of words expressive of a broad subject domain. This domain is further subdivided again and again into progressively more specific classes or categories according to specific criteria. The guiding principle of a hierarchy is that each category is contained within the one above.

Information Architect
       Information architect is the job designation for a person who practices information architecture, a person dedicated to organizing and structuring content for access on an intranet web site. A relatively new discipline with as yet no official certification process, an information architect takes into account issues concerning content, users and business context as they impact the information design. Developing a taxonomy is one of the functions an information architect would undertake. An information architect may specialize in thesaurus design, metadata and classification. Specialists in classification are sometimes referred to as taxonomists.

Information Architecture
       Information architecture comprises the art and science of organizing information on an intranet so that it is findable, manageable and useful. As a discipline IA delivers a coherent set of strategies and plans for information access and delivery inside an organization. It includes designing the organization, labeling and navigation schemes to create an information space on an intranet to help people find and manage information.

Information architecture does not deal directly with implementation techniques and tools except as they affect strategy.

A person who undertakes information architecture is designated an information architect.

Information Object
       A digital item or group of items referred to as a unit, regardless of type or format, that a computer can address or manipulate as a single object. The object may consist of a single item or may be an aggregate of many items.

Interoperability
       Interoperability is the ability of multiple systems or components having different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality.

Intranet
       Intranet is the implementation of Internet technologies on a private network within a corporate organization. The intranet may link to the Internet but access from the Internet to the intranet is blocked. Intranets help employees collaborate on business processes by providing ready access to the information and people necessary to carry out these processes. Sometimes referred to as a collaborative workspace, a company portal, an information management system, or a knowledge base.

Lead-in entry
       In a controlled vocabulary an entry provided to guide a user from a non-preferred term to the corresponding preferred term. See also variant term.

Markup Language
       Markup language is a set of codes or tags that surrounds content and tells a person or program what that content is (its structure) and/or what it should look like (its format). Markup tags have a distinct syntax that sets them apart from the content that they surround.

Three of the best-known markup languages are:

  1. SGML (Standard Generalized Markup Language)
  2. HTML (Hyper Text Markup Language)
  3. XML (eXtensible Markup Language)

Metadata
       Metadata is data or information that identifies and describes an information object and makes it possible to understand, manage and use the object. Metadata can also document how that object behaves, its function and use, its relationship to other information objects, and how it should be managed. Metadata is primarily used in database applications to identify, locate and retrieve information objects. For example, information objects might be web pages, documents, or audio and video files. Metadata can include such elements as author, title, language, and copyright.

Metadata schema
       Metadata schema is a set of metadata elements where the schema identifies the element names and formatting rules for each element. A schema is usually created to describe information objects in a standard way so that they can be accessed by other users and applications. For example, a mailing address has a name, address, city, state, and postal code. The metadata schema is this description of the mailing address. The schema provides a structural model that identifies how information objects are represented in a database.
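
The mailing-address example can be written down as a small schema in Python. This is a sketch under assumptions: the class name, the `validate` helper, and the rule that every element is a string are illustrative choices, not part of any standard.

```python
from dataclasses import dataclass, fields

@dataclass
class MailingAddress:
    """Schema: the element names a mailing-address record must carry."""
    name: str
    address: str
    city: str
    state: str
    postal_code: str

def validate(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits the schema."""
    problems = []
    for element in fields(MailingAddress):
        if element.name not in record:
            problems.append(f"missing element: {element.name}")
        elif not isinstance(record[element.name], str):
            # Assumption: every element in this schema is formatted as text.
            problems.append(f"wrong format for element: {element.name}")
    return problems

print(validate({"name": "A. Reader", "city": "Toronto"}))
```

Any application that knows the schema can check or read a record in the same way, which is the point of describing objects in a standard form.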

Metasearching
       The simultaneous searching across multiple databases, sources, platforms and protocols. Also known as broadcast searching, federated searching or parallel searching.

Non-preferred term
       The synonym or quasi-synonym of a preferred term. Non-preferred terms are not themselves used for classifying or indexing resources, but are provided as entry points to help people find the most appropriate preferred terms. For example, Older people USE Senior citizens. Also known as “lead-in” or “entry” term.
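
A USE reference behaves like a lookup from entry term to preferred term. In the minimal sketch below, the “Older people” entry comes from the example above; the “elderly” entry and the `preferred` function are hypothetical.

```python
# USE references: non-preferred (entry) term -> preferred term.
USE = {
    "older people": "Senior citizens",
    "elderly": "Senior citizens",  # hypothetical extra entry term
}

def preferred(term: str) -> str:
    """Follow a USE reference if one exists; otherwise return the term unchanged."""
    return USE.get(term.strip().lower(), term)

print(preferred("Older people"))
```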

Ontology
       A common vocabulary for describing the concepts that exist in an area of knowledge and the relationships that exist between them. An ontology allows for a more detailed specification of the relationships in a domain than is the case with a thesaurus or taxonomy. The resulting vocabulary can be used by computers as well as understood by humans.

Polyhierarchy
       A hierarchy in which some vocabulary terms have more than one broader term. For example, in a geographic vocabulary “Rome” might be a narrower term under both “European capitals” and “Italian cities”.
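
A polyhierarchy can be stored by letting each term list all of its broader terms. The sketch below uses the “Rome” example; the other terms, including the top term “Cities”, are invented for illustration.

```python
# Each term maps to its broader terms; "Rome" has two, so this is a polyhierarchy.
BROADER = {
    "Rome": ["European capitals", "Italian cities"],
    "Paris": ["European capitals"],
    "Milan": ["Italian cities"],
    "European capitals": ["Cities"],
    "Italian cities": ["Cities"],
}

def ancestors(term: str) -> set[str]:
    """Collect every broader term reachable from the given term."""
    found = set()
    stack = list(BROADER.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(BROADER.get(current, []))
    return found

def narrower(term: str) -> list[str]:
    """List the terms that name the given term as a broader term."""
    return sorted(t for t, parents in BROADER.items() if term in parents)

print(narrower("Italian cities"), narrower("European capitals"))
```

“Rome” appears in both listings, so a user browsing either branch will find it.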

Precision
       A ratio that measures the success of a search. Precision is defined mathematically as the number of relevant items returned by a search divided by the total number of items returned by the search. Precision usually has an inverse relationship to recall; that is, increasing the precision of a search usually decreases the recall. Precision can be increased by increasing the specificity of vocabulary terms.

Preferred term
       The vocabulary term used consistently in a controlled vocabulary to tag content. See also Relationships.

Provisional term
       A term with temporary status in a controlled vocabulary. It often represents a new concept in a field in which the terminology has not yet been standardized. See also Candidate term.

Recall
       A ratio that measures the success of a search. Recall is defined mathematically as the number of relevant items returned by a search divided by the total number of relevant items in the collection. Recall can be increased by the use of synonym rings and variant terms. Recall usually has an inverse relationship to precision: the higher the recall (i.e. the more documents returned), the lower the precision (the relevance of the returned documents to the query).
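
Both ratios, recall here and precision as defined in its own entry, can be computed directly from two sets: the items a search returned and the items that are actually relevant. The document identifiers below are made up for the example.

```python
def precision_and_recall(returned: set, relevant: set) -> tuple[float, float]:
    """Compute (precision, recall) for one search."""
    hits = len(returned & relevant)  # relevant items that were returned
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

returned = {"doc1", "doc2", "doc3", "doc4"}  # what the search gave back
relevant = {"doc1", "doc2", "doc5"}          # what it should have found
precision, recall = precision_and_recall(returned, relevant)
print(precision, recall)
```

Here two of the four returned documents are relevant (precision 2/4), and two of the three relevant documents were found (recall 2/3).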

Relationships
       The controlled vocabulary (or thesaurus) details the relationship of terms with one another. There are three kinds of relationships.

  1. Equivalency: where terms refer to the same concept; for example, cat and feline can be considered as equivalent. Equivalency is the kind of connection between terms in a synonym ring or between preferred terms and variant terms.
  2. Hierarchical: where terms are hierarchically arranged from Broader Term (BT) to Narrower Term (NT). The term at the top is called the Top Term (TT).
  3. Associative: where terms are associated with each other but are not hierarchical or equivalent. They are said to be Related (RT).
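
The three kinds of relationship can be held together in one record per term. The sketch below is one possible layout, not a standard format; the field names and all entries other than the cat/feline pair are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    term: str
    used_for: list[str] = field(default_factory=list)  # equivalency
    broader: list[str] = field(default_factory=list)   # BT
    narrower: list[str] = field(default_factory=list)  # NT
    related: list[str] = field(default_factory=list)   # RT

THESAURUS = {
    "animal": Entry("animal", narrower=["cat"]),
    "cat": Entry("cat", used_for=["feline"], broader=["animal"], related=["pet care"]),
    "pet care": Entry("pet care", related=["cat"]),
}

def top_term(term: str) -> str:
    """Climb the first broader term repeatedly until none is left (the TT)."""
    while THESAURUS[term].broader:
        term = THESAURUS[term].broader[0]
    return term

print(top_term("cat"), THESAURUS["cat"].related)
```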

Schema
       Schema defines the structure and contents of any information resource. As a data catalog for a database, a schema identifies the entities and the types of attributes for those entities. A schema for an enterprise may also define rules of use and legal values.

Specificity
       The exactness with which a vocabulary term covers a concept. In considering the concept “dog”, the term “canine” is more specific than “animal”. Increasing specificity of vocabulary terms increases precision and granularity, but may decrease recall.

Synonym ring
       One of the simplest of controlled vocabularies. Includes only a list of equivalent terms. When one of the terms is searched, the synonym ring returns results as if the complete set of terms was searched.
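
In search terms, a synonym ring expands a query to every term in the ring before matching. The rings, documents, and function names in this sketch are invented for illustration.

```python
# Hypothetical synonym rings: each set lists terms treated as equivalent.
RINGS = [
    {"cat", "feline", "kitty"},
    {"car", "automobile"},
]

# Hypothetical document collection: id -> text.
DOCS = {1: "feline health guide", 2: "automobile repair", 3: "kitty toys"}

def expand(term: str) -> set[str]:
    """Return the whole ring containing the term, or just the term itself."""
    term = term.lower()
    for ring in RINGS:
        if term in ring:
            return set(ring)
    return {term}

def search(term: str) -> list[int]:
    """Return ids of documents containing any term from the expanded query."""
    terms = expand(term)
    return sorted(doc_id for doc_id, text in DOCS.items() if terms & set(text.split()))

print(search("cat"))
```

Neither matching document contains the word “cat”; both are found only because the ring was searched as a whole.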

Taxonomist
       An Intranet Taxonomist is an information professional who specializes in the classification of digital content into systematic categories for the purpose of navigating through the content on an intranet. Preferred term is Information Architect.

Taxonomy
       A system for naming and organizing things into groups that share similar characteristics. Most typically, a taxonomy places topics or subject categories in a hierarchical relationship to one another, from the broadest to narrowest. Taxonomies can also organize topics in flat, networked and faceted structures.

See Structures for taxonomies

Term
       A word or phrase used to designate a concept. Where more than one term in a controlled vocabulary can designate the same concept, one is chosen as a preferred term and the others are treated as non-preferred terms.

Thesaurus
       Thesaurus is a form of controlled vocabulary consisting of a glossary of words and their relationships used to index and search information objects belonging to a particular subject domain. Thesauri are usually considered the most complex of controlled vocabularies as they include information on the equivalence, hierarchical and associative relationships between the words. A thesaurus differs from a taxonomy in that a taxonomy deals only with the hierarchical relationship between terms.

The main goal of a thesaurus is to improve retrieval through the management of synonyms and the indication of related terms.

Topic Map
       Topic maps organize sets of information as a structured semantic link network. The nodes are topics, each topic representing a subject - person, place, concept. These are connected through associations, and will have occurrences.

Variant term
       A vocabulary term that means nearly the same thing as a preferred term. Variant terms are used in a controlled vocabulary to provide entry terms that lead to preferred terms. Variant terms may include synonyms, lexical variants, quasi-synonyms, and abbreviations.

Vocabulary term
       A word or phrase in a controlled vocabulary. It may be a preferred term or variant term. Vocabulary terms may exhibit several types of term relationships.

XML
       XML stands for eXtensible Markup Language. It is a standard for describing data elements on Web pages and in business documents. The tag structure is similar to HTML, but XML allows the developer to define particular fields for a page, such as author, date, subject, and amount. XML is used to add metadata to documents. With XML the developer can create structured pages of content.

Definition of XML at Answers.com. Also refer to Taxonomy Basics > XML Markup Language.
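
To show how developer-defined fields carry metadata, the sketch below parses a small XML document with Python's standard `xml.etree.ElementTree` module. The `report` document, its values, and the `currency` attribute are made up; only the field names author, date, subject, and amount come from the definition above.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the developer chose these element names.
DOCUMENT = """
<report>
  <author>J. Smith</author>
  <date>2004-05-01</date>
  <subject>Intranet taxonomy</subject>
  <amount currency="CAD">1200</amount>
  <body>Quarterly summary of taxonomy work.</body>
</report>
"""

root = ET.fromstring(DOCUMENT)

# Treat every element except the body as metadata about the document.
metadata = {child.tag: child.text for child in root if child.tag != "body"}
print(metadata)
print(root.find("amount").get("currency"))
```

Because the tags name what each value is, a program can pull out the author or date without guessing at the layout of the page.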

The iSchool Institute, Faculty of Information, University of Toronto  © Copyright 2004