Recent developments in Indexing, Searching and Information Retrieval


 
Search Engines
SE news
  • New largest Search Engine Alltheweb.com launched by Fast Search & Transfer
  • NREN Search and Index Services
    Special purposes Search Engines
    SE Special Services
    SE Technologies
  • Report on the 1999 Search Engines Meeting by Avi Rappoport
  • Search Engines Tools
  • Free Indexing and Searching Software
  • Commercial SW
  • SE tips and links
    Search Engine Projects
    Search Engines Papers
  • Research Papers related to Google!
  • Research Papers related to IBM CLEVER Searching Project
  • Other SE papers
  • SE Legal issues
    Standardisation
    W3C Work: HTML/XML
    IETF Work: Common Indexing Protocol
    IETF work: other standards
    Metadata and XML/RDF
  • Standardisation
  • Metadata/RDF Resources and Publications
  • XML Searching
  • Subject Gateways
    Subject Gateway Projects
    Papers on Subject gateways
  • i18n in Subject gateways
  • Automatic Classification
    Other SE related technologies
    Expert Systems
    SE Business News
    Directories Business news

    This page is updated regularly. Please send your suggestions to: demchenko@terena.nl.


    Standardisation

    W3C Work: HTML/XML

    W3C Web Content accessibility initiative (WAI)
    Web Content accessibility Guidelines
    http://www.w3.org/TR/WAI-WEBCONTENT

    Web Architecture: Describing and Exchanging Data
    W3C Note 7 June 1999
    http://www.w3.org/1999/04/WebData
    Describes a space where automated agents can contribute, as the first steps toward building the Semantic Web. The RDF Schema and XML Schema designs began independently; the Note proposes a common model in which they fit together as interlocking pieces of Semantic Web technology.

    Composite Capability/Preference Profiles (CC/PP): A user side framework for content negotiation
    W3C Note 27 July 1999
    http://www.w3.org/TR/NOTE-CCPP/
    In this note we describe a method for using RDF, the Resource Description Framework of the W3C, to create a general, yet extensible framework for describing user preferences and device capabilities. This information can be provided by the user to servers and content providers. The servers can use this information describing the user's preferences to customize the service or content provided. The ability of RDF to reference profile information via URLs assists in minimizing the number of network transactions required to adapt content to a device, while the framework fits well into the current and future protocols being developed at the W3C and the WAP Forum.

    International Layout
    W3C Working Draft 26-July-1999
    http://www.w3.org/TR/WD-i18n-format/
    The following specification extends CSS to support East Asian and Bi-directional text formatting.

    Platform for Privacy Preferences (P3P) Specification
    W3C Working Draft 7 April 1999
    http://www.w3.org/TR/WD-P3P/
    This document describes the Platform for Privacy Preferences (P3P). P3P enables Web sites to express their privacy practices and enables users to exercise preferences over those practices.

    POIX: Point Of Interest eXchange Language Specification
    W3C Note - 24 June 1999
    http://www.w3.org/TR/poix/
    The "POIX" proposed here defines a general-purpose specification language for describing location information, which is an application of XML (Extensible Markup Language). POIX is a common baseline for exchanging location data via e-mail and embedding location data in HTML and XML documents. This specification can be used by mobile device developers, location-related service providers, and server software developers.

    Annotation of Web Content for Transcoding
    W3C Note 10 July 1999
    http://www.w3.org/TR/annot/
    This proposal presents annotations that can be attached to HTML/XML documents to guide their adaptation to the characteristics of diverse information appliances. It also provides a vocabulary for transcoding, and syntax of the language for annotating Web content. Used in conjunction with device capability information, style sheets, and other mechanisms, these annotations enable a high quality user experience for users who are accessing Web content from information appliances.

    XML Schema Part 1: Structures
    W3C Working Draft 6-May-1999
    http://www.w3.org/TR/xmlschema-1/
    XML Schema: Structures is part one of a two part draft of the specification for the XML Schema definition language. This document proposes facilities for describing the structure and constraining the contents of XML 1.0 documents. The schema language, which is itself represented in XML 1.0, provides a superset of the capabilities found in XML 1.0 document type definitions (DTDs).

    XML Schema Part 2: Datatypes
    World Wide Web Consortium Working Draft 06-May-1999
    http://www.w3.org/TR/xmlschema-2/
    This document specifies a language for defining datatypes to be used in XML Schemas and, possibly, elsewhere.

    XHTML™ 1.0: The Extensible HyperText Markup Language
    A Reformulation of HTML 4.0 in XML 1.0
    W3C Working Draft 5th May 1999
    http://www.w3.org/TR/xhtml1/
    This specification defines XHTML 1.0, a reformulation of HTML 4.0 as an XML 1.0 application, and three DTDs corresponding to the ones defined by HTML 4.0. The semantics of the elements and their attributes are defined in the W3C Recommendation for HTML 4.0. These semantics provide the foundation for future extensibility of XHTML. Compatibility with existing HTML user agents is possible by following a small set of guidelines.

    Document Object Model (DOM) Level 2 Specification
    Version 1.0
    W3C Working Draft 19 July, 1999
    This specification defines the Document Object Model Level 2, a platform- and language-neutral interface that allows programs and scripts to dynamically access and update the content, structure and style of documents. The Document Object Model Level 2 builds on the Document Object Model Level 1 (http://www.w3.org/TR/REC-DOM-Level-1 ).
    This release of the Document Object Model Level 2 has all of the interfaces that the final version is expected to have. It contains interfaces for creating a document, importing a node from one document to another, supporting XML namespaces, associating stylesheets with a document, the Cascading Style Sheets object model, the Range object model, filters and iterators, and the Events object model. The DOM WG wants to get feedback on these, and especially on the two options presented for XML namespaces, so that final decisions can be made for the DOM Level 2 specification.

    IBM online XML education courses
    http://www2.software.ibm.com/developer/education.nsf/xml-onlinecourse-bytitle

    IETF Work: Common Indexing Protocol

    RFC 2651: The Architecture of the Common Indexing Protocol (CIP)
    J. Allen, M. Mealling
    ftp://ftp.isi.edu/in-notes/rfc2651.txt
    This document describes the CIP framework, including its architecture and the protocol specifics of exchanging indices.

    RFC 2652: MIME Object Definitions for the Common Indexing Protocol (CIP)
    J. Allen, M. Mealling
    ftp://ftp.isi.edu/in-notes/rfc2652.txt
    This document describes the definitions of those objects as well as the methods and requirements needed to define a new index type.

    RFC 2653: CIP Transport Protocols
    J. Allen, P. Leach, R. Hedberg
    ftp://ftp.isi.edu/in-notes/rfc2653.txt
    This document specifies three protocols for transporting CIP requests, responses and index objects, utilizing TCP, mail, and HTTP.

    RFC 2654: A Tagged Index Object for use in the Common Indexing Protocol
    R. Hedberg, B. Greenblatt, R. Moats, M. Wahl
    ftp://ftp.isi.edu/in-notes/rfc2654.txt
    This document defines a mechanism by which information servers can exchange indices of information from their databases by making use of the Common Indexing Protocol (CIP). This document defines the structure of the index information being exchanged, as well as the appropriate meanings for the headers that are defined in the Common Indexing Protocol.

    RFC 2655: CIP Index Object Format for SOIF Objects
    T. Hardie, M. Bowman, D. Hardy, M. Schwartz, D. Wessels
    ftp://ftp.isi.edu/in-notes/rfc2655.txt
    This document describes SOIF, the Summary Object Interchange Format, as an index object type in the context of the CIP framework.
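
    As an illustration of what a SOIF record looks like, the following minimal sketch writes one object in the "@TYPE { URL ... }" layout, with each attribute given as name, byte count in braces, a colon, a tab, and the value. The function name soif_object, the example URL, and the choice of UTF-8 for the byte count are assumptions made for this example; RFC 2655 remains the authoritative definition.

```python
def soif_object(template_type, url, attributes):
    """Serialize one SOIF-style object (illustrative sketch, not a full implementation)."""
    lines = ["@%s { %s" % (template_type, url)]
    for name, value in attributes.items():
        # The size in braces is the length of the value in bytes
        # (UTF-8 is an assumption made for this sketch).
        size = len(value.encode("utf-8"))
        lines.append("%s{%d}:\t%s" % (name, size, value))
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(soif_object("FILE", "http://www.example.org/", {"Title": "Hello"}))
```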

    RFC 2656: Registration Procedures for SOIF Template Types
    T. Hardie
    ftp://ftp.isi.edu/in-notes/rfc2656.txt
    The registration procedure described in this document is specific to SOIF template types.

    RFC 2657: LDAPv2 Client vs. the Index Mesh
    R. Hedberg
    ftp://ftp.isi.edu/in-notes/rfc2657.txt
    LDAPv2 clients as implemented according to RFC 1777 have no notion of referrals. The integration between such a client and an Index Mesh, as defined by the Common Indexing Protocol, heavily depends on referrals and therefore needs to be handled in a special way. This document defines one possible way of doing this.

    IETF work: other standards

    Uniform Object Locator - UOL
    J. Boynton
    http://www.ietf.org/internet-drafts/draft-boynton-uol-00.txt
    A Uniform Object Locator (UOL) provides a hierarchical "human-readable" format for describing the location of any single attribute within any data object. A UOL emulates the internal structure of a data object by dividing a partial URL into two re-usable components: an object constructor and an object name.
    The UOL format is particularly suited for retrieval and storage of parameter values through multiple object layers. Its basic construction allows it to be combined with a URL without modification. Possible uses include distributed object management, XML, and e-business development.

    Context and Goals for Common Name Resolution
    Larry Masinter, Michael Mealling, Nicolas Popp, Karen Sollins
    http://www.ietf.org/internet-drafts/draft-popp-cnrp-goals-00.txt
    This document establishes the context and goals for a Common Name Resolution Protocol.

    Internationalized Uniform Resource Identifiers (IURI)
    Larry Masinter, Martin Duerst
    http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-04.txt

    Tags for the Identification of Languages
    H. Alvestrand
    http://www.ietf.org/internet-drafts/draft-alvestrand-lang-tags-v2-00.txt
    This document describes a language tag for use in cases where it is desired to indicate the language used in an information object. It also defines a Content-language: header, for use in the case where one desires to indicate the language of a document.

    RFC 2611: URN Namespace Definition Mechanisms
    L. Daigle, D. van Gulik, R. Iannella, P. Faltstrom
    ftp://ftp.isi.edu/in-notes/rfc2611.txt

    i18n and Multilingual support in Internet mail. Standards Overview. Yuri Demchenko
    http://www.terena.nl/libr/tech/mldoc-review.html

    Other Standardisation

    Search Engine Standards Project
    http://www.searchenginewatch.com/standards/

    Domain Restriction Proposal
    http://www.searchenginewatch.com/standards/proposals.html

    Standard for Robot Exclusion
    http://info.webcrawler.com/mak/projects/robots/norobots.html

    Robots META Tag
    http://www.searchtools.com/info/robots/robots-meta.html
     

    Metadata and XML/RDF

    Standardisation

    RFC-2413 Dublin Core Metadata for Resource Discovery
    http://www.ietf.org/rfc/rfc2413.txt

    Encoding Dublin Core Metadata in HTML
    Internet Draft
    http://www.ietf.org/internet-drafts/draft-kunze-dchtml-01.txt

    Guidance on expressing the Dublin Core within the Resource Description Framework (RDF)
    http://www.ukoln.ac.uk/metadata/resources/dc/datamodel/WD-dc-rdf/

    Resource Description Framework - RDF
    http://www.ukoln.ac.uk/metadata/resources/rdf/

    W3C Resource Description Framework (RDF) Model and Syntax - recommendation
    http://www.w3.org/TR/REC-rdf-syntax/

    W3C Resource Description Framework (RDF) Schemas - proposed recommendation
    http://www.w3.org/TR/PR-rdf-schema/

    Resource Description Framework (RDF)
    http://www.w3.org/RDF/

    Metadata and Resource Description
    http://www.w3.org/Metadata/

    Dublin Core
    http://purl.org/metadata/dublin_core/

    Dublin Core Metadata Element Set: Reference Description
    http://purl.oclc.org/DC/about/element_set.htm

    User Guide Working Draft 1998-07-31
    http://purl.oclc.org/DC/documents/working_drafts/wd-guide-current.htm

    1999-07-02: Dublin Core Elements, Version 1.1 moves to Proposed Recommendation
    The Dublin Core Directorate is pleased to announce that a set of revised element definitions (Dublin Core Elements, Version 1.1) has been completed and is available for public review and comment as a Proposed Recommendation of the Dublin Core Metadata Initiative.
    http://purl.org/dc/documents/proposed_recommendations/pr-dces-19990702.htm
     

    CEN/ISSS Workshop on MMI (Metadata for Multimedia Information)
    http://www.cenorm.be/isss/Workshop/MMI/Default.htm

    CEN/ISSS Metadata Framework, edited by Stewart Granger
    http://dialspace.dial.pipex.com/town/way/gkh12/frame/main.html

    CEN/ISSS' The European XML/EDI Pilot Project
    http://www.cenorm.be/isss/workshop/ec/xmledi/isss-xml.html

    The Role of the XML/EDI Guidelines
    http://www.cenorm.be/isss/workshop/ec/xmledi/xmlbook.htm

    Guidelines for using XML for Electronic Data Interchange, Version 0.05, 25th January 1998
    http://www.xmledi.net/guide.htm

    The Global Repository Initiative
    http://www.xmledi.com/repository/

    White Paper on XML Repositories for XML/EDI
    http://www.xmledi.com/repository/xml-repWP.htm

    Dublin Core/MARC/GILS Crosswalk
    Network Development and MARC Standards Office
    http://www.loc.gov/marc/dccross.html

    Character Set and Language Negotiation (2) in Z39.50
    http://lcweb.loc.gov/z3950/agency/defns/charsets.html

    Registry of Z39.50 Object Identifiers
    http://lcweb.loc.gov/z3950/agency/defns/oids.html

    Metadata.Net - Metadata Tools and Services
    http://metadata.net/

    Meta Data Coalition
    http://www.mdcinfo.com/

    An Introduction to the Meta Data Coalition's Initiatives
    http://www.MDCinfo.com/papers/intro.html

    Open Information Model
    MDC OIM Version 1.0 review draft, April 1999
    http://www.mdcinfo.com/OIM/OIM10.html

    OIM proposed models
    Knowledge Description Model
    http://www.mdcinfo.com/OIM/models/KDM.html

    Meta Data Interchange Specification MDIS Version 1.1
    http://www.mdcinfo.com/MDIS/MDIS11.html

    Metadata/RDF Resources and Publications

    Metadata Resources at UKOLN
    http://www.ukoln.ac.uk/metadata/resources/

    Prototype Metadata Registry for DESIRE project
    http://homes.ukoln.ac.uk/~lisrmh/reginfo-v1.htm

    RDF Tools - Briefing document
    http://www.ukoln.ac.uk/web-focus/events/seminars/what-is-rdf-may1998/rdf-briefing.html

    DC News, 1999-08-18
    CIMI Announces the release of the Guide to Best Practice: Dublin Core. The document is one important result of the Dublin Core Testbed, an on-going effort to explore the usability, simplicity, and technical feasibility of Dublin Core for museum information. The Guide addresses Dublin Core 1.0 as documented in RFC 2413.
    http://www.cimi.org/documents/meta_bestprac_final_ann.html

    New Metadata Handbook from European Schoolnet
    1st December 1998
    http://www.en.eun.org/eng/metadatabook-en.html
    Describes the EUN metadata element set, which has been extended with a range of additional local (sub)elements from other metadata initiatives including the IMS (http://www.imsproject.org/ - Instructional Management System) and the ARIADNE set (http://ariadne.unil.ch/ - Alliance of Remote Instructional Authoring and Distribution Network for Europe).
    The EUN metadata harmonisation is happening in close co-operation with EUC (European Universal Classroom), which has been studying DBS/GER (http://dbs.schule.de/indexe.html - Deutscher Bildungs-Server / German Educational Resources), GEM (http://gem.syr.edu - The Gateway to Educational Materials) and EdNA (http://www.edna.edu.au/ - Education Network Australia). The handbook contains a guideline for creating and publishing metadata, a presentation of the syntax and a thorough description of each of the EUN elements.

    Dave Beckett's Resource Description Framework (RDF) Resources
    http://www.cs.ukc.ac.uk/people/staff/djb1/research/metadata/rdf.shtml

    Automatic RDF Metadata Generation for Resource Discovery
    Charlotte Jenkins, Mike Jackson, Peter Burden, Jon Wallis
    http://www.scit.wlv.ac.uk/~ex1253/rdf_paper/

    Classifier/metadata generator Demo
    http://www.scit.wlv.ac.uk/~ex1253/metadata.html

    Mapping Entry Vocabulary to Unfamiliar Metadata Vocabularies
    Michael Buckland, with Aitao Chen, Hui-Min Chen, Youngin Kim, Byron Lam, Ray Larson, Barbara Norgard, and Jacek Purat
    http://www.dlib.org/dlib/january99/buckland/01buckland.html

    XML Searching

    Building a XML-based Metasearch Engine on the Server
    http://xml.com/pub/1999/07/metasearch/metasearch2.html

    GoXML Search Engine
    http://www.goxml.com/
    GoXML.com v1.0 - BETA is an XML Context-based Search Processor. Online documentation (http://www.goxml.com/about/supported.xsp ) and Demonstration (http://www.goxml.com/help_srch.xsp ). The Goxml Project was launched to create a new breed of Search Vehicle which can index, store and allow accurate searching of XML data. The primary focus is to allow XML developers a tool to locate XML documents on the internet.

    Search Engines

    SE news

    Search Engines News
    http://searchenginewatch.com/news.html

    Current Search Engine Report
    http://searchenginewatch.com/sereport/current.html

    Search Engine Size
    http://www.searchenginewatch.com/reports/sizes.html

    News at Web Site Search Tools
    http://www.searchtools.com/info/news.html

    Results from our Site Search Tools Survey!
    http://www.searchtools.com/surveys/survey-results-01.html
    First results from our search tools survey are in, and they're interesting! Most web administrators who haven't installed a site search say it's because they don't have time or the applications are too complex. Those who have cite improved navigation as their number one reason, by far. More surprising results come from sites aimed towards information professionals (many don't have search), and sites with three or more languages (they have search).

    Websearch.miningco.com weekly
    http://websearch.miningco.com/library/weekly/topicmenu.htm?pid=2825&cob=home

    New largest Search Engine Alltheweb.com launched by FAST Search & Transfer

    August 2, 1999: FAST (Fast Search & Transfer) has launched a new site called Alltheweb ("FAST Search: All the Web, All the Time") http://www.alltheweb.com/. The announced size of its index is more than 200 million pages, which is estimated to be 25% of the whole web.
    The FAST Search Server has the following benefits:

    FAST Server uses hardware filtering using FAST Pattern Matching Chip - PMC.
    The FAST Query language allows query definition in three ways: individual character strings, sets of strings, order requirements on sets of strings.
    The FAST software runs four processes: dispatch, search, spider and indexing. Indexed data is arranged in data structures that guarantee rapid lookup when searching for strings and combinations of strings. Unlike other search engines, the FAST software does not discard stop words such as "to", "be", "or" and "not", so it is possible to search for "to be or not to be".

    For moderate data volumes (up to approx. 1 million pages), the server supports approximate pattern matching according to a patented metric.
    Implementations: the biggest search engine, with approx. 200 million indexed pages, and Lycos FTP search (http://ftpsearch.lycos.com/).

    FAST has a special agreement with Dell: its search engine Alltheweb is powered by Dell PowerEdge servers.

    How to test.
    Try the phrase "to be or not to be", with and without quotation marks, in Alltheweb, AltaVista and Google. You will see a big difference.
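
    The effect of stop words on such a phrase query can be sketched in a few lines. This is only an illustration of the general idea, not FAST's implementation; the stop-word list and the function names index_terms and phrase_matches are assumptions made for this example.

```python
# Hypothetical stop-word list, chosen to cover the test phrase.
STOP_WORDS = {"to", "be", "or", "not"}


def index_terms(text, drop_stop_words):
    """Split text into lowercase terms, optionally discarding stop words."""
    terms = text.lower().split()
    if drop_stop_words:
        terms = [t for t in terms if t not in STOP_WORDS]
    return terms


def phrase_matches(query, document, drop_stop_words):
    """Return True if the query terms occur as a consecutive run in the document."""
    q = index_terms(query, drop_stop_words)
    d = index_terms(document, drop_stop_words)
    if not q:
        # Every query term was a stop word: nothing is left to search for.
        return False
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))


if __name__ == "__main__":
    doc = "to be or not to be that is the question"
    print(phrase_matches("to be or not to be", doc, drop_stop_words=True))
    print(phrase_matches("to be or not to be", doc, drop_stop_words=False))
```

    An engine that drops stop words has nothing left of the query, while an engine that indexes every word can match the phrase exactly.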

    Another high-end technology provided by FAST is FAST Image Transfer, which offers better compression than the JPEG format at the same quality. It is aimed specifically at web applications and has embedded thumbnail functionality and progressive multi-resolution image display. Plug-ins are available for Adobe Photoshop, MS IE and Netscape Navigator. The file extension is .fst.

    FAST Aims For Largest Index
    http://searchenginewatch.com/sereport/99/05-fast.html
    All The Web http://www.alltheweb.com/
    FAST http://www.fast.no/
    http://www-new.fast.no/company.html
    http://www.fastweb.no/
    FAST FTP Search - http://www.fastftp.lycos.com/
    FAST Search Server
    http://www.fast.no/product/fsserver.html
    FAST SW Search
    http://www.fast.no/product/fastsearch.html
    Search Engine Size
    http://www.searchenginewatch.com/reports/sizes.html

    NREN Search and Index Services

    German Web Index
    http://www.fireball.de/
    Metagenerator - http://www.fireball.de/metagenerator.html
    Metadata scheme - http://www.fireball.de/meta_daten.html
    Fireball was developed by FLP/KIT - http://flp.cs.tu-berlin.de/
    KIT - http://flp.cs.tu-berlin.de/kit/kit.html

    Swiss search service
    http://www.search.ch/
    Allows metadata search - http://www.search.ch/help.html.en

    Nordic Web index
    http://nwi.ub2.lu.se/?lang=en
     

    Special purposes Search Engines

    US Government Search Engine Launched
    A new search engine that focuses on information from US government sources was opened in May. Called Gov.Search, the service is jointly produced by search engine Northern Light and the U.S. Commerce Department's National Technical Information Service through a five-year agreement.
    The service is unusual for the web in that searching is not free. Those wishing to use it must pay for access, which ranges from US $15 for a day pass and $30 for a monthly pass to $250 for a year. Special pricing is also available to companies and organizations that require multiple accounts.
    Northern Light has now indexed about 4 million web pages located on more than 20,000 US government servers, which also include military and some educational sites. In addition to this information, it has also indexed about 2 million specialty records from the NTIS.
    http://searchenginewatch.com/sereport/99/06-govsearch.html
    Gov.Search
    http://www.usgovsearch.com

    Google US Government Search
    http://www.google.com/unclesam
    Google has its own US government search service. Test queries show it to be much smaller than Northern Light's index, yielding only 10 to 50 percent of Northern Light's counts. But the relevancy of some of the matches was impressive. Definitely worth a visit.

    Cora Search Engine
    http://www.cora.justresearch.com/about.html
    Cora is a special-purpose search engine covering computer science research papers.

    SE Special Services

    Northern Light Adds Research Options
    Northern Light now also operates a "research" version of its service, where the default is to search within its Special Collection index. This index has information from over 5,400 publications, much of which is not available on the web. Searching is free; documents can then be purchased for between $1 and $4.
    Titles can be downloaded from http://www.northernlight.com/docs/specoll_help_download.html
    http://searchenginewatch.com/sereport/99/06-northernlight.html
    Northern Light Research Version
    http://www.nlresearch.com/ (http://www.northernlight.com/research.html)
    Northern Light Special Editions
    http://special.northernlight.com/

    Research Service at HotBot
    http://r.hotbot.com/r/hb_also_rsrch/http://www.elibrary.com/s/hotbot/

    "Invisible Web" Revealed
    Lycos and IntelliSeek have teamed up to produce an index of search databases to help users find information that is invisible to search engines. The "Invisible Web Catalog" provides links to more than 7,000 specialty search resources. Users can browse the listings or search the Lycos index.
    http://searchenginewatch.com/sereport/99/07-invisible.html
    Lycos Invisible Web Catalog
    http://dir.lycos.com/Reference/Searchable_Databases/

    IntelliSeek
    http://www.intelliseek.com/

    Direct Search
    http://gwis2.circ.gwu.edu/~gprice/direct.htm
    Catalog of specialty databases. Allows searching inside a particular database.

    WebData
    http://www.webdata.com/
    Guide to searchable databases. Browse or search through listings.

    Northern Light Adds clustering
    Clustering prevents results from one site from dominating the result list.
    In addition to its page index, Northern Light provides a list of Custom Search Folders™ generated from clustered search data, grouped by server or by type of page.
    http://www.northernlight.com/docs/search_help_folders.html
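
    The idea of grouping results by server, so that one site cannot fill the whole result list, can be sketched as follows. This is a minimal illustration and not Northern Light's algorithm; the function name group_by_host and the example URLs are assumptions made for this example.

```python
from urllib.parse import urlsplit


def group_by_host(ranked_urls):
    """Group a ranked list of result URLs into per-host 'folders', keeping rank order."""
    folders = {}
    for url in ranked_urls:
        host = urlsplit(url).hostname
        folders.setdefault(host, []).append(url)
    return folders


if __name__ == "__main__":
    results = [
        "http://a.example/1",
        "http://b.example/x",
        "http://a.example/2",
    ]
    for host, urls in group_by_host(results).items():
        print(host, urls)
```

    A result page can then show one line per folder, with the remaining hits from that host available on request.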

    Navigate web smarter and easier with Alexa
    http://www.alexa.com/

    Netscape's keywords service
    http://home.netscape.com/escapes/keywords/
     

    SE Technologies

    Report on the 1999 Search Engines Meeting
    by Avi Rappoport, Search Tools Consulting
    http://www.searchtools.com/info/meetings/searchenginesmtg/index.html

    Portalization and Other Search Trends (by Danny Sullivan of SearchEngineWatch).
    Main trends underlined: search engines turning into portals; increasing relevance of results for common searches like "travel" or "microsoft"; clustering and directories, etc.

    Quantifiable Results: Testing at TREC
    Valuable testing has been done at TREC (the Text REtrieval Conference), sponsored by NIST. TREC provides a set of realistic test collections, uniform scoring, unbiased evaluators and a chance to see the changes and improvements of search engines over time.
    The TREC test collection consists of about 2 GB of combined newspaper articles and government reports.
    Testing includes a few tracks: Adhoc, Cross-Language, Filtering, High Precision, Interactive, Query, Spoken Document Retrieval (SDR).
    Results are published in the proceedings of the annual conferences at http://trec.nist.gov/pubs.html
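
    Uniform scoring means that every system is measured in the same way against the same relevance judgments. The sketch below computes precision and recall, two standard retrieval measures; that these are the measures meant here is an assumption of this example, as are the function name and the document identifiers.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: document ids returned by the system
    relevant:  document ids judged relevant by the evaluators
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


if __name__ == "__main__":
    p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
    print("precision=%.2f recall=%.2f" % (p, r))
```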

    Summarization
    Summarization attempts to reduce document text to its most relevant content based on the task and user requirements.
    Results indicated that many documents can be summarized successfully, and that variable-length summaries give better results. The Information Retrieval methods applied to this task work well for query-focused summarization, because the topic focuses the summarization effort.
    Valuable information on this issue can be found at the Natural Language Processing & Information Retrieval (NLPIR) group of ITL NIST (http://www.itl.nist.gov/iaui/894.02/). In May 1998, the U.S. government completed the TIPSTER Text Summarization Evaluation (SUMMAC), which was the first large-scale, developer-independent evaluation of automatic text summarization systems. Results are available to TREC subscribers; the final report can be downloaded from http://www.itl.nist.gov/iaui/894.02/related_projects/tipster_summac/final_rpt.html
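
    Query-focused summarization can be sketched as sentence extraction: score each sentence by how many query terms it contains and keep the best ones in their original order. This is a deliberately simple illustration, not one of the systems evaluated in SUMMAC; the function name summarize and the sample text are assumptions made for this example.

```python
import re


def summarize(text, query, max_sentences=2):
    """Pick the sentences that share the most terms with the query."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    query_terms = set(re.findall(r"\w+", query.lower()))
    scored = []
    for position, sentence in enumerate(sentences):
        words = set(re.findall(r"\w+", sentence.lower()))
        scored.append((len(words & query_terms), position, sentence))
    # Best score first; ties go to the earlier sentence.
    top = sorted(scored, key=lambda item: (-item[0], item[1]))[:max_sentences]
    # Present the chosen sentences in document order.
    return [sentence for _, _, sentence in sorted(top, key=lambda item: item[1])]


if __name__ == "__main__":
    sample = ("TREC is sponsored by NIST. The weather was fine. "
              "Search engines are evaluated at TREC every year.")
    print(summarize(sample, "TREC search engines", max_sentences=1))
```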

    Results Clustering and Topic Categorization
    Clustering of the found documents into useful groups is a fruitful approach to improving results presentation.
    Some search engines perform automatic clustering and categorization on result sets, so that they are divided into groups by topic. The Northern Light search engine, for example, clusters its results into Custom Folders that have partly predefined categories.
    The academic case made by James Callen of the University of Massachusetts showed that full text search with modern relevance rankings is the best approach for information retrieval.
    The consensus of the panel, and of the meeting, is that automation can help humans, and that automated categorization works best when humans can provide a reality check on the systems.

    Cross-Language Information Retrieval (CLIR)
    CLIR means querying in one language for documents in many languages. It is becoming more important due to the internationalisation of the web. Approaches include machine-readable dictionaries, parallel and comparable corpora, a generalized vector space model, latent semantic indexing, similarity thesauri and interlinguas.
    Presentation by TextWise (http://www.textwise.com/) described their Conceptual Interlingua approach, which uses a concept space where terms from multiple languages are mapped into a language-independent schema. This technique is used for both indexing and querying, and does not require pairwise translation.
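
    The interlingua idea, in which terms from several languages map to shared language-independent concepts at both indexing and query time, can be sketched as follows. This is a toy illustration and not TextWise's system; the concept table, the concept ids and the function names are assumptions made for this example.

```python
# Hypothetical concept table: terms from several languages share one concept id.
CONCEPTS = {
    "dog": "C1", "chien": "C1", "hund": "C1",
    "house": "C2", "maison": "C2", "haus": "C2",
}


def to_concepts(text):
    """Map the known terms of a text to language-independent concept ids."""
    return {CONCEPTS[t] for t in text.lower().split() if t in CONCEPTS}


def search(query, documents):
    """Return ids of documents that contain every concept found in the query."""
    wanted = to_concepts(query)
    return [doc_id for doc_id, text in documents.items()
            if wanted and wanted <= to_concepts(text)]


if __name__ == "__main__":
    docs = {"fr": "le chien dort", "de": "der hund bellt", "en": "a house"}
    print(search("dog", docs))
```

    Because both documents and queries are reduced to concept ids, no pairwise translation between languages is needed.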

    Improvements to Relevance Ranking of Results
    Two presentations were given, by Byron Dom from IBM's CLEVER project (http://www.almaden.ibm.com/cs/k53/clever.html) and Gary Cullis, the chairman of Direct Hit (http://www.directhit.com/).

    Directories and Question-Answering
    This section dealt with the current move of search engines to provide a directory and a Subject Gateway together with ordinary or advanced searches. Presentations were given by LookSmart (http://www.looksmart.com/) and AskJeeves (http://www.askjeeves.com/).

    Knowledge Management
    Both Daniel Hoogterp of Retrieval Technologies and Rick Kenny of PCDocs / Fulcrum described how search fits into corporate knowledge management.

    Text Mining
    Data mining means evaluating large amounts of stored data and looking for useful patterns, like the relation between a product and the age of its customers. Text mining uses techniques from information retrieval and other fields to analyze internal structure, parse the content, provide results, clustering, summarization, and so on. With automatic event identification, conditional responses, reuse of analysis, and graphic presentation of results, the user can skim the best of the information easily.

    Filtering and Routing and Intelligent Agents
    Filtering and Routing allow individuals to set up criteria for incoming data (news feeds, email, press releases, etc.), and only be notified of or sent those items that match their interests. Such tasks are performed by Intelligent Agents that travel a network or the Internet to locate data or track web site changes, evaluating the items using relevance judgments like those of search engines.
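
    Routing against standing interest profiles can be sketched as a keyword match between each incoming item and each user's criteria. Real agents use richer relevance judgments; the profile format, the user names and the function name route are assumptions made for this example.

```python
def route(item_text, profiles):
    """Return the users whose profile shares at least one keyword with the item.

    profiles maps a user name to a set of lowercase keywords.
    """
    words = set(item_text.lower().split())
    return sorted(user for user, keywords in profiles.items() if keywords & words)


if __name__ == "__main__":
    profiles = {"anna": {"xml", "rdf"}, "ben": {"linux"}}
    print(route("New RDF schema draft published", profiles))
```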

    Searching Multimedia
    The main discussion was about spoken documents and video retrieval.

    Search Realities faced by end users and professional searchers
    Carol Tenopir gave a presentation on the history of user-centered research on searching, and current work in testing user experiences.

    Visualization
    There are some attempts to visualise search results based on document similarity. It was suggested that the success of this approach depends very strongly on the needs and experience of the searcher.
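
    Document similarity, the quantity such visualisations are built on, is often computed as the cosine of the angle between term-frequency vectors. The sketch below shows that one common measure; whether the systems presented at the meeting used it is not stated in the report, and the function name is an assumption made for this example.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine similarity of two texts, using raw term frequencies as vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(cosine_similarity("web search engines", "search engines on the web"))
    print(cosine_similarity("web search", "music video"))
```

    Documents with a similarity close to 1 would be drawn near each other; documents with no terms in common score 0.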
     

    Natural Language Processing & Information Retrieval (NLPIR) group of ITL NIST (http://www.itl.nist.gov/iaui/894.02/)
    Valuable information. Publications http://www.itl.nist.gov/iaui/894.02/works.html

    Information on DARPA TIPSTER Text Program http://www.itl.nist.gov/iaui/894.02/related_projects/tipster/
    http://www.itl.nist.gov/iaui/894.02/related_projects/tipster_summac/final_rpt.html
     

    IBM Patents Network
    http://www.patents.ibm.com/

    Lycos holds patent 5,748,954
    (http://www.patents.ibm.com/details?pn=US05748954__&s_clms=1#clms ), which covers roughly any kind of web spider that heuristically downloads "better" documents before "worse" documents, and explicitly includes a reference to looking at how often a document is linked as a goodness heuristic.
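
    The heuristic mentioned above, fetching more heavily linked documents first, can be illustrated with a priority queue. This is only a sketch of the general idea and makes no claim to reproduce the patented method; the function name crawl_order and the link counts are assumptions made for this example.

```python
import heapq


def crawl_order(inlink_counts):
    """Return URLs ordered so that the most frequently linked ones are fetched first.

    inlink_counts maps a URL to the number of known links pointing at it.
    """
    # heapq is a min-heap, so counts are negated to pop the largest first.
    heap = [(-count, url) for url, count in inlink_counts.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, url = heapq.heappop(heap)
        order.append(url)
    return order


if __name__ == "__main__":
    print(crawl_order({"http://a.example/": 1,
                       "http://b.example/": 5,
                       "http://c.example/": 3}))
```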

    TUSTEP (TUebingen System of Text Processing Programs)
    Multilingual Text Data Processing and Fuzzy Searching
    http://www.uni-tuebingen.de/zdv/tustep/tdv_eng.html
     

    Search Engines Tools

    Web Site Search Tools
    http://www.searchtools.com/

    Web Site Search Tools - Related Topics


    Search Tools Product Listings
    http://www.searchtools.com/tools/tools.html

    Free Indexing and Searching Software

    Harvest-NG
    Harvest, an open-source project, has been re-implemented in Perl and can summarize documents in SOIF (Summary Object Interchange Format). This version saves the data in a database file and does not include a Broker or search engine, but it is entirely extensible.
    http://www.tardis.ed.ac.uk/harvest/ng/
    http://www.tardis.ed.ac.uk/harvest/ng/develop.shtml

    The Combine System for distributed indexing
    http://www.lub.lu.se/combine/
    http://www.ub.lu.se/~tsao/combine/

    Zebra Information Server
    Powerful free-text indexing and retrieval system, combined with a Z39.50 server. The Zebra server is freely available for noncommercial applications.
    http://www.indexdata.dk/zebra/

    Framework for Advanced Search (ASF)
    http://asf.gils.net/framework.html
    ASF Freeware
    http://asf.gils.net/freeware/index.html

    OCLC Z39.50 freely reusable code (C and Java)
    http://www.oclc.org/z39.50/#api

    Perlfect Search 3.01
    http://perlfect.com/freescripts/search/

    PLWeb Turbo has released a new version, 3.0 for Windows NT with improved performance, customization, web-crawling capability, and a browser-based interface.
    PLWeb and all PLS products are now freeware from AOL.
    http://www.pls.com/plweb.htm
    http://www.searchtools.com/tools/plweb.html

    AltaVista (Windows NT and Unix search tool) has just introduced a free version of AltaVista Search Intranet, Entry Level, which will index up to 3,000 pages.
    http://k2.altavista-software.com/intranet/3000_version/3000_overview.htm

    Commercial SW

    Ultraseek on Linux
    The Ultraseek search engine and the Content Classification Engine now run on Linux: Red Hat Linux 5.1 on a PC, kernel 2.0.34 or better, or glibc 2.0.7-19 or better. Commercial.
    http://software.infoseek.com/products/ultraseek/ultratop.htm
    Download free trial version
    http://software.infoseek.com/download/download.htm
    http://www.searchtools.com/tools/ultraseek.html

    Ultraseek Content Classification Engine Product Information
    Commercial.
    http://software.infoseek.com/products/cce/ccetop.htm
    http://software.infoseek.com/products/cce/ccekey.htm

    Super Site Searcher Perl CGI works with other modules to create searchable site directory. Commercial.
    http://www.hassan.com/site_searcher/
    http://www.searchtools.com/tools/supersitesearcher.html

    Extense - a powerful search engine developed in France which takes account of the inflection of French words (masculine/feminine and singular/plural). Commercial.
    http://www.searchtools.com/tools/extense.html

    Inxight LinguistX code library - provides language identification, stemming and tokenization, among other features.
    http://www.searchtools.com/tools/inxight.html
    http://www.inxight.com/
    A collection of components for many languages that provide word and phrase analysis, stemming, tokenization, part-of-speech analysis, noun phrase extraction, language identification, summarization, etc.
    Platform: Windows 95 and NT, Solaris Sparc (will port to other Unix systems). Commercial.

    Verity products
    http://www.verity.com/products/index.html

    Knowledge Retrieval products
    http://www.verity.com/products/knowret1.html
     

    SE tips and links

    Search Engines links
    http://searchenginewatch.com/links/
    Contains such sections:


    Search Tips and Tricks Advanced Searching
    http://websearch.tqn.com/msub21.htm?pid=2825&cob=home
    http://websearch.miningco.com/msub21.htm?pid=2825&cob=home

    Information Retrieval systems
    http://www.mri.mq.edu.au/%7Eeinat/web_ir/software.html

    Top search words and terms
    http://www.searchenginewatch.com/facts/searches.html

    Ask Jeeves Peek Through The Keyhole http://www.askjeeves.com/docs/peek/

    Weekly Search Engine Keyword Statistics For Web and Internet Marketing
    http://www.mall-net.com/se_report/

    Dogpile Top 200 Search Words
    http://www.eyescream.com/dogpiletop200.htm
    Top words from the meta-search engine Dogpile from January to July 1997. Unfortunately, the actual keyword phrases are not shown.

    Search Spy
    http://www.searchspy.com/
    This is a database of search terms available for desktop use. You enter a term, and the program scans to find matches. You can sort results by count or by keyword. Data is gathered from various live search displays.

    Life on the Internet, Finding Things
    http://www.screen.com/start/guide/searchengines.html

    useit.com: Jakob Nielsen's Website
    http://www.useit.com/
    He formulated a new approach to SE - LSD: Logo, Search, Directory.
     

    Search Engine Projects

    IBM's CLEVER Searching
    http://www.almaden.ibm.com/cs/k53/clever.html


    Web Archeology Project at Digital Research
    http://www.research.digital.com/SRC/personal/Krishna_Bharat/WebArcheology/
    Contains sections:


    The MetaWeb Project
    The aim of the Metadata Tools and Services project - known as MetaWeb - is to develop indexing services, tools, and metadata element sets in order to promote the use and exploitation of metadata on the Internet.
    http://www.dstc.edu.au/Research/Projects/metaweb/

    DFN Indexing and Searching projects - http://www.dfn.de/links/suchen.html
    MetaGer (subject meta search), MESA (email address meta search), Level3 (search service for the DFN-Expo project), Search.de and Entry.de

    X.500 Directory E-mail Addresses Search (AMBIX-D) - http://ambix.uni-tuebingen.de:8889
     

    Search Engines Papers

    Research Papers related to Google!
    http://google.stanford.edu/google_papers.html


    Research Papers related to IBM CLEVER Searching Project
    http://www.almaden.ibm.com/cs/k53/clever.html
     

    Jon Kleinberg Homepage
    http://www.cs.cornell.edu/home/kleinber/
    Research and publications related to IBM's CLEVER Searching project.

    Other SE papers

    TREC Publications
    TREC (the Text REtrieval Conference) sponsored by NIST provides a set of realistic test collections, uniform scoring, unbiased evaluators and a chance to see the changes and improvements of search engines over time.
    Results are in materials of Annual Conferences at http://trec.nist.gov/pubs.html

    Retrieval Performance in FERRET: A Conceptual Information Retrieval System
    Michael L. Mauldin
    Appeared at The 14th International Conference on Research and Development in Information Retrieval, Chicago, October 1991, ACM SIGIR.
    http://www.fuzine.com/mlm/sigir91.html

    Enhancing the World Wide Web
    Social Software for the Evolution of Knowledge
    http://www.islandone.org/Foresight/WebEnhance/index.html

    Learning Webs by J. Bollen & F. Heylighen
    http://pespmc1.vub.ac.be/LEARNWEB.html
    Hebbian learning can be implemented on the web by changing the strength of links depending on how often they are used. The paper explores the "brain" metaphor for making the web more intelligent. The basic idea is that web links are similar to associations in the brain, as supported by synapses connecting neurons. The strength of the links, like the connection strength of synapses, can change depending on the frequency of use of the link. This allows the network to "learn" automatically from the way it is used.
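
    A rough sketch of usage-based link strengthening as summarised above. It is not the authors' algorithm: the LearningWeb class, the learning rate and the decay step are assumptions chosen only to show how link strength could follow frequency of use.

```python
from collections import defaultdict


class LearningWeb:
    """Keeps a strength per (source, target) link and adapts it with use."""

    def __init__(self, learning_rate=0.1, decay=0.01):
        self.strength = defaultdict(float)
        self.learning_rate = learning_rate  # assumed reward per traversal
        self.decay = decay                  # assumed fraction lost per decay step

    def traverse(self, source, target):
        """A user followed the link, so reinforce it."""
        self.strength[(source, target)] += self.learning_rate

    def decay_all(self):
        """Weaken every link slightly so that unused links fade over time."""
        for link in self.strength:
            self.strength[link] *= 1.0 - self.decay

    def ranked_links(self, source):
        """Targets linked from source, strongest link first."""
        links = [(t, s) for (src, t), s in self.strength.items() if src == source]
        links.sort(key=lambda pair: pair[1], reverse=True)
        return [t for t, _ in links]


web = LearningWeb()
for _ in range(3):
    web.traverse("home", "papers")
web.traverse("home", "news")
ranking = web.ranked_links("home")
```

    The frequently followed link ("papers") ends up ranked above the rarely followed one ("news"); calling decay_all() periodically lets links that are no longer used lose strength.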

    Identification, location and versioning of web-resources. URI Discussion paper. Version 1.0. 12 March 1999
    Titia van der Werf-Davelaar
    http://www.konbib.nl/donor/rapporten/URI.html
    This document is a discussion document for use in developing a consensus on practical approaches to be pursued for better information management techniques and methods on the Web.
    This work is done in the context of the following projects: DONOR, DESIRE, NEDLIB.

    Report on the WWW8 conference by Nicky Ferguson
    http://www.ilrt.bris.ac.uk/~ecnf/www8.html

    Semantic Web vision paper
    Alexander Chislenko. - Version 0.28 - 29 June, 1997
    http://www.lucifer.com/~sasha/articles/SemanticWeb.html

    SE Legal issues

    Lycos GENERAL TERMS AND CONDITIONS -
    http://www.lycos.com/lycosinc/legal.html
     

    Subject Gateways

    Subject Gateway Projects

    DESIRE 2 - Development of a European Service for Information on Research and Education
    http://www.desire.org/

    ROADS Project
    ROADS is a set of software tools to enable the set up and maintenance of Web based subject gateways. Subject gateways are services which provide searchable and browsable catalogues of Internet based resources. Subject gateways will typically focus on a related set of academic subject areas.
    http://www.ilrt.bris.ac.uk/roads/
    ROADS Software Downloads (Perl code for WHOIS++, Centroids/CIP etc.)
    http://www.roads.lut.ac.uk/
    The ROADS project exit strategy - Ensuring the future of ROADS for its users
    http://www.ilrt.bris.ac.uk/roads/news/latest/futures/

    IMesh at Desire.org
    International Collaboration on Internet Subject Gateways
    http://www.desire.org/html/subjectgateways/community/imesh/

    Project Isaac - A Distributed Architecture for Resource Discovery Using Metadata
    http://scout.cs.wisc.edu/research/index.html

    Joint Information Systems Committee (JISC)
    Established to stimulate and enable the cost-effective exploitation of information systems and to provide a high-quality national network infrastructure for the UK higher education and research councils communities.
    http://www.jisc.ac.uk/
    Publications related to JISC
    http://www.jisc.ac.uk/pub/index.html

    OCLC - Co-operative Online Resource Catalog (CORC)
    http://www.oclc.org/oclc/research/projects/corc/index.htm

    CoBRA+ - Computerised Bibliographic Record Actions
    http://www.bl.uk/information/cobra.html

    CoBRA+ working group on multilingual subject access
    http://www.bl.uk/information/finrap3.html

    EEVL (Engineering Gateway) Evaluation Reports
    http://www.eevl.ac.uk/evaluation/

    The Gateway to Educational Materials
    The Gateway currently contains 6661 education resources and includes resources from more than 40 collections, including the AskERIC Virtual Library, Math Forum, Microsoft Encarta, North Carolina Department of Public Instruction, and U.S. Department of Education.
    http://www.thegateway.org

    Networked Digital Library of Theses and Dissertations
    http://www.ndltd.org/

    German Digital library project Global Info
    http://www.global-info.org/index.html.en

    Papers on Subject gateways

    D-lib Magazine
    D-Lib Magazine is a monthly magazine about digital libraries for researchers, developers, and the intellectually curious. New issues are published on the 15th of each month.
    http://www.dlib.org/

    Modeling Users' Successive Searches in Digital Environments: A National Science Foundation/British Library Funded Study
    Amanda Spink, Tom Wilson, David Ellis , Nigel Ford
    http://www.dlib.org/dlib/april98/04spink.html

    Legal Issues on the Internet: Hyperlinking and Framing
    Maureen A. O'Rourke
    http://www.dlib.org/dlib/april98/04orourke.html

    Cross-Searching Subject Gateways: The Query Routing and Forward Knowledge Approach
    John Kirriemuir, Dan Brickley, Susan Welsh, Jon Knight, Martin Hamilton
    http://www.dlib.org/dlib/january98/01kirriemuir.html

    Using Automated Classification for Summarizing and Selecting Heterogeneous Information Sources
    R. Dolin, D. Agrawal, A. El Abbadi, J. Pearlman
    http://www.dlib.org/dlib/january98/dolin/01dolin.html

    Networked Digital Library of Theses and Dissertations: An International Effort Unlocking University Resources
    Edward A. Fox, John L. Eaton, Gail McMillan, Neill A. Kipp, Paul Mather, Tim McGonigle, William Schweiker, and Brian DeVane
    http://www.dlib.org/dlib/september97/theses/09fox.html

    The Internet Knowledge Manager, Dynamic Digital Libraries, and Agents You Can Understand
    Adrian Walker, IBM Research Division
    http://www.dlib.org/dlib/march98/walker/03walker.html

    An Introduction to the Resource Description Framework
    Eric Miller, OCLC
    http://www.dlib.org/dlib/may98/miller/05miller.html

    A Distributed Architecture for Resource Discovery Using Metadata
    Michael Roszkowski and Christopher Lukas, Scout project, University of Wisconsin-Madison
    http://www.dlib.org/dlib/june98/scout/06roszkowski.html

    Multilingual Federated Searching Across Heterogeneous Collections
    James Powell and Edward A. Fox
    http://www.dlib.org/dlib/september98/powell/09powell.html

    The Joint NSF/JISC International Digital Libraries Initiative
    Norman Wiseman, Joint Information Systems Committee; Chris Rusbridge, Electronic Libraries Programme; and Stephen M. Griffin, National Science Foundation
    http://www.dlib.org/dlib/june99/06wiseman.html

    D-Lib Ready Reference: Subject Area Gateways
    http://www.dlib.org/reference.html#subject

    A Common Model to Support Interoperable Metadata. Progress report on reconciling metadata requirements from the Dublin Core and INDECS/DOI Communities
    David Bearman, Eric Miller, Godfrey Rust, Jennifer Trant, Stuart Weibel
    http://www.dlib.org/dlib/january99/bearman/01bearman.html

    i18n in Subject gateways

    A Multilingual Electronic Text Collection of Folk Tales for Casual Users Using Off-the-Shelf Browsers
    Myriam Dartois, Akira Maeda, Tetsuo Sakaguchi, Takehisa Fujita, Shigeo Sugimoto, Koichi Tabata
    D-Lib Magazine, October 1997
    http://www.dlib.org/dlib/october97/sugimoto/10sugimoto.html

    Multi-Media, Multi-Cultural, and Multi-Lingual Digital Libraries, Or How Do We Exchange Data In 400 Languages?
    Christine L. Borgman
    University of California, Los Angeles
    D-Lib Magazine, June 1997
    http://www.dlib.org/dlib/june97/06borgman.html
     

    Automatic Classification

    IKEM Toolkit
    http://bikit.rug.ac.be:80/ikem/
    IKEM Toolkit is a hybrid knowledge-based platform for thesaurus-oriented electronic document management. The project was sponsored by IWT. IKEM Toolkit contains various tools to manage your hybrid documents in an intelligent and user-oriented way.

    Willpower Information. Information Management Consultants
    www.willpower.demon.co.uk
    Thesauri and vocabulary control: Principles and practice
    http://www.willpower.demon.co.uk/thesprin.htm
    Software for building and editing thesauri
    http://www.willpower.demon.co.uk/thessoft.htm

    CMU Text Learning Group
    http://www.cs.cmu.edu/afs/cs/project/theo-4/text-learning/www/index.html
    Goal is to develop new machine learning algorithms for text and hypertext data. Applications of these algorithms include information filtering systems for the Internet, and software agents that make decisions based on text information.

    CMU World Wide Knowledge Base (WebKB) project
    http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
    Goal is to develop a probabilistic, symbolic knowledge base that mirrors the content of the world wide web. If successful, this will make text information on the web available in computer-understandable form, enabling much more sophisticated information retrieval and problem solving.

    Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering
    Bow (or libbow) is a library of C code useful for writing statistical text analysis, language modeling and information retrieval programs. The current distribution includes the library, as well as front-ends for document classification (rainbow), document retrieval (arrow) and document clustering (crossbow).
    The library and its front-ends were designed and written by Andrew McCallum.
    http://www.cs.cmu.edu/~mccallum/bow/rainbow/

    Homepage of Andrew McCallum
    http://www.cs.cmu.edu/~mccallum/

    Contains a lot of information on Learning Classification algorithms for text recognition.


    Reinforcement Learning with Selective Perception and Hidden State. PhD Thesis, by Andrew Kachites McCallum
    http://www.cs.rochester.edu/u/mccallum/phd-thesis/
    Method uses memory-based learning and a robust statistical test on reward in order to learn a structured policy representation that makes perceptual and memory distinctions only where needed for the task at hand. It can also be understood as a method of Value Function Approximation. The model learned is an order-n partially observable Markov decision process. It handles noisy observation, action and reward.

    WWW -- Wealth, Weariness or Waste: Controlled vocabulary and thesauri in support of online information access
    David Batty
    http://www.dlib.org/dlib/november98/11contents.html

    Using Automated Classification for Summarizing and Selecting Heterogeneous Information Sources
    R. Dolin, D. Agrawal, A. El Abbadi, J. Pearlman
    http://www.dlib.org/dlib/january98/dolin/01dolin.html

    Other SE related technologies

    XANADU(R) ZIGZAG(TM) Hyperstructure Kit
    http://www.xanadu.net/zigzag/

    TRANSPUBLISHING: A SIMPLE CONCEPT
    http://www.sfc.keio.ac.jp/~ted/TPUB/TPUBsum

    OSMIC. THEORY: MODELS OF TIME, VERSIONS AND BACKTRACK
    http://www.sfc.keio.ac.jp/~ted/OSMIC/osmicTime.html

    Ted Nelson Home page
    http://www.sfc.keio.ac.jp/~ted/
     

    Expert Systems

    ROG-O-MATIC: A Belligerent Expert System
    MICHAEL L. MAULDIN, GUY JACOBSON, ANDREW APPEL and LEONARD HAMEY
    http://www.fuzine.com/mlm/rgm84.html
     
     

    SE Business News

    AltaVista sold to CMGI (http://www.cmgi.com/), an Internet venture holding company. As part of the deal, Compaq and CMGI established a strategic partnership.

    AltaVista's free Internet access offers integrated search, news, quotes, and much more.
    AltaVista FreeAccess http://microav.com/

    Inktomi Launches European Search Center
    Inktomi has opened an index of European web sites that will serve its partners who are based in Europe. The 50 million page index is based in the United Kingdom and mostly populated by content from European web servers.
    Inktomi partners such as UKMax (http://ukmax.com) and Dagens Nyheter (http://dn.se/) are expected to begin using the index soon.
    Test results: UKMax performs badly; dn.se has not implemented search yet.
    http://searchenginewatch.com/sereport/99/06-inktomi.html

    Infoseek adds new search features
    Infoseek has introduced search term highlighting in its results, a related searches prompter, and increased its index size to about 70 million web pages.
    Infoseek added a "Similar Searches" feature, which displays popular queries that are related to your original search. For example, if you are looking for "gardening", you will also be offered "water gardening", "flower gardening", etc.
    http://www.infoseek.com/
    http://searchenginewatch.com/sereport/99/06-infoseek.html
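
    How Infoseek computes its related queries is not described in the sources above. One plausible way to get the behaviour in the "gardening" example is to suggest the most popular logged queries that contain the user's term. The function name, the query log and the ranking rule below are all assumptions made for illustration.

```python
from collections import Counter


def similar_searches(query, query_log, limit=3):
    """Suggest popular logged queries that contain the query as a word."""
    term = query.lower().strip()
    counts = Counter(q.lower().strip() for q in query_log)
    related = [
        (q, n) for q, n in counts.items()
        if q != term and term in q.split()
    ]
    # Most frequent first; ties broken alphabetically for a stable order.
    related.sort(key=lambda pair: (-pair[1], pair[0]))
    return [q for q, _ in related[:limit]]


# Made-up query log for illustration.
log = [
    "water gardening", "flower gardening", "water gardening",
    "gardening", "used cars", "flower gardening", "water gardening",
]
suggestions = similar_searches("gardening", log)
```

    This sketch only handles single-word queries; matching multi-word queries would need phrase or term-overlap matching.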

    Infoseek is to be completely acquired by Disney and merged into a new company called Go.com.
    http://www.internetnews.com/bus-news/article/0,1087,3_159481,00.html

    Dell started its own portal, Dellnet.com, which actually resides at dellnet.snap.com.
    It also includes DellAuction.com (http://www.dellauction.com/), Gigabuys.com (http://gigabuys.us.dell.com/store/index.asp )
    FAST ASA Announces Signing of MOU and Conditional Share Placing/Option Agreement with Dell Computer Corporation
    http://www-new.fast.no/company/press/dell02081999.html

    NBC's Snap.com and GlobalBrain.net unveil sophisticated new technology and services to harness the brain power of Internet users
    NBC and CNET's Snap.com Internet portal and GlobalBrain.net today unveiled an exclusive multi-year technology licensing and development agreement. Snap.com will integrate GlobalBrain's revolutionary new Internet popularity ranking technology that improves the relevancy of search results by learning user preferences and prioritizing search results accordingly.
    http://www.globalbrain.net/html/release.html
    http://searchenginewatch.com/sereport/9811-globalbrain.html

    Snap Picture Finder
    http://home.snap.com/search/picture/form/0,584,-0,00.html
    Snap is now featuring an image search capability, powered by Ditto.com. Previously known as ArribaVista, Ditto.com also offers image searching directly via its web site. The company is embarking on a new strategy of powering image search for other sites.
    http://www.ditto.com/

    Netscape Search Service
    Netscape has launched a revamped Netscape Search service that uses information from the Open Directory and technology from Google.
    http://search.netscape.com/

    Direct Hit Debuts at MSN Search, Lycos
    Both MSN Search and Lycos are now featuring Direct Hit results, and the company itself has just received $26 million in financing from a variety of venture firms.
    http://www.directhit.com/

    LookSmart Live Looks-Up Answers
    Looking for an answer? Look no further than LookSmart, which is providing custom research to frustrated searchers through its new LookSmart Live program. The request gets passed on to one of 80 editors involved in the project, and within 24 hours, you get an email back with your answer.
    http://www.looksmart.com/
    LookSmart Live
    http://www.looksmart.com/live/

    America Online, Excite@Home, Yahoo! and others are working on adapting their portal services to handheld computers.
     

    Directories Business news

    IBM, Novell, Oracle, DCL, Lotus Development and ISOCOR rally industry to advance market for Open Directory applications
    Members of the Directory Interoperability Forum plan to:

    As an initial step, members of the Forum have verified that current LDAP-enabled applications interoperate with IBM SecureWay* Directory, Novell Directory Services* (NDS), Lotus Domino* Directory and Netscape Directory*. Applications that have been tested include: IBM WebSphere*, IBM Blue Pages, Lotus Domino, Lotus Notes*, Tivoli Management products, Novell Groupwise* and Novell Net Publisher*. More information about the Directory Interoperability Forum is available at
    http://www.directoryforum.org

    Go Beta Tests User-Assisted Directory
    http://www.go.com/

    Go Guides Beta
    http://beta.guides.go.com/


