From a Set of Technical Documents to a Hypertext System on the Web



Yannick Marchand, Jean-Luc Guérin, Jean-Paul Barthès
Université de Technologie de Compiègne
U.R.A.  C.N.R.S.  Nº 817 HeuDiaSyC
B.P. 529 - 60205 Compiègne Cedex - France
E-mail : {marchand,jlguerin,barthes}@hds.utc.fr



Introduction

Initially conceived in the U.S.A. in 1969, during the Cold War, for a military concerned about a possible breakdown of communications, the Internet was quickly adopted by research bodies and universities to exchange ideas. It was in this spirit that the World Wide Web was created, as a hypermedia system for building information servers on the Internet. Created in March 1989 on the initiative of Tim Berners-Lee [Berners-Lee 94] for the community of physicists at CERN, the European Laboratory for Particle Physics in Geneva, its aim was to centralise scientific results, publications and documentation. The Web was destined to be a great success, and it became a cultural and social phenomenon when, in February 1993, Marc Andreessen of the National Center for Supercomputing Applications (NCSA), University of Illinois, U.S.A., released the first version of NCSA Mosaic [NCSA 93]. This graphical interface, followed by Netscape, greatly accelerated the development of the Internet. The Internet already has millions of users and will soon be affordable to the average household; as a direct consequence, it is attracting the interest of companies throughout the world.

Because the Web has been so successful at retrieving information in a quick, powerful and intuitive manner, its approach and technology, especially hypertext, have become widely known. Unfortunately, though, the concept of hypertext has been simplified in the process. To understand how this concept could lose some of its original meaning, it is necessary to take a fresh look at the precursors of hypertext and at what motivated them. This will also set the stage for the presentation of 'Nestor', a prototype hypertext for the Web that we have developed for CNET[1] at Lannion, one of the main research centers of France Télécom.

The Founding Principles of Hypertext

The names of Vannevar Bush, Douglas Engelbart and Theodor Nelson invariably come up when the recent history of hypertext is discussed. Indeed, their projects Memex [Bush 45], Augment [Engelbart 68] and Xanadu [Nelson 88], whether expressions of new ideas or concrete realisations, have been crucial to the development of this research domain.

Paul Otlet, however, deserves to be added to this group: eleven years before Bush, this Belgian author showed in his work [Otlet 34] an exceptional clairvoyance bordering on prophecy. This section therefore draws on the ideas of these four pioneers of hypertext to bring out three essential elements that characterise it and justify the fact that the word 'hypertext' etymologically means 'more than' text.

The Gift of Ubiquity

Noting that the number of books and documents increases every day, [Otlet 34] proposes, in order to confront this deluge of information, the creation of 'bibliology', a general science and technique of documentation. This science would require "a set of interlinked machines" performing seven operations, among them "the establishment of documents in such a manner that each piece of data has its own individuality and in its relations with other data, it must be called anywhere that it is required" (operation 3) and "automatic access to consulted documents" (operation 6). These four authors clearly shared the same preoccupation: organising literature on a large scale, as a support for the knowledge accumulated over the centuries, so that access to what is being manipulated is easy and quick. In the Memex and Augment projects, the aim is to help researchers with their research documents. The aim of the Xanadu project is slightly different: the construction of an immense network encompassing all the documentation ever published.

These ambitious human aspirations are arguably embodied in the Web. Indeed, the Web plays the role of a global library, giving its users flexible and immediate access to a set of documents spread worldwide. The Web thus gives the impression that the user is consulting a single document although, in reality, the user is visiting several separate servers throughout the world. By dematerialising documents and abolishing the notions of distance and time, the Web offers the amazing possibility of being everywhere at once with a simple click. It should be noted that Engelbart, through the invention of the mouse and his experiments with multi-window screens, contributed greatly to instantaneous and associative navigation within the jungle of information. In other words, this navigation mirrors, as [Bush 45] wished, our natural way of thinking ("As We May Think").

The Omnipotence

[Otlet 34] mentions a second essential principle of the concept of hypertext. It concerns the "presentation of documents, either by viewing directly or through the intermediary of a machine that has to make additional inscriptions" (operation 6) and the "mechanical manipulation, at will, of all the recorded data, in order to obtain new combinations of facts, and new relationships between ideas" (operation 7). In this perspective, the user is no longer merely passive, content to consult elements of information connected by active links, but active as well, in the sense that the user can take these elements and add annotations and personal links to them. This is why the boundary between author and reader tends to disappear: the reader enjoys a freedom comparable to that of a sculptor who may model, at will, the material initially given.

On the Web, unfortunately, a user who is not computer-literate cannot exercise this freedom of action on documents. Creating even a single link, for example, is neither natural nor user-friendly, because it requires at least minimal knowledge of (i) directories and files, (ii) text editors and, above all, (iii) HTML (HyperText Markup Language) [Morris 95], the language used to describe the documents. Yet creating such a relation can be considered an important intellectual act, since it constitutes, for its author, an argumentative and rhetorical element.

The Omniscience

"The machine that would perform these seven operations would be a veritable mechanical and collective brain" [Otlet 34]. "An active community will be constantly involved in discussion concerning the contents of its manual" (Engelbart). These two quotations emphasise the last distinctive characteristic of the concept of hypertext, namely the cooperative work that places the creation of personalised links and commentaries within the social construction of knowledge. Because a hypertext is adaptable and shareable, it is never a final product but remains, for its users, a constantly evolving space of expression and memory. Hypertext therefore takes the form of a flexible tool of social communication, serving processes of collective intelligence [Lévy 90], through which each user can access all the knowledge acquired by the community. At the time of writing, Bush could already imagine a new profession of trail blazers: experts capable of discovering and building useful routes through these documents.

This characteristic is certainly the one that best illustrates the difference between the Web and the original aspirations of hypertext. Because the Web is organised according to a client-server architecture, each author is in charge of only a limited number of documents, and only that author can define the links from them to other documents. In other words, documents created by others can be consulted, but no real communication takes place, since the user cannot adjust or transform them. It is rather an interconnection of distributed knowledge: each user puts his knowledge at the disposal of the collective, knowing that in return he can access all the information he requires but does not possess [Nanard 95]. "I offer to others my microcosm of documents" has replaced the original idea of "Let's share the universe of documents that we transform together".

Presentation of 'Nestor'

Technical specifications can be defined as documents that describe the characteristics of a product or service. These specifications have to comply with certain recommendations (or standards), namely sets of rules normally produced by international standardisation organisations. This section describes how such recommendations have been handled in 'Nestor', a hypertext prototype developed for CNET at Lannion.

Characteristics of Corpus and Objectives

For the staff who write specifications, the corpus of recommendations can look like an encyclopedia. Indeed, these reference documents give, as English text, information on definitions, concepts and examples, among other things. The recommendations form a "microcosm" of interdependent documents and are structured as traditional linear texts, with a contents page and a set of successive paragraphs grouped in chapters. Because these 'spaghetti documents' contain many internal and external references, in particular to other documents, consulting them relies as much on the association of ideas as on sequential, chronological reading.

The aim of 'Nestor' is to transform the set of recommendations into a hypertext for the Web. It is worth asking whether this transformation is appropriate since, as stated by [Nielsen 90], "just as the best films are not made by putting a camera in the front row of a theater, the best hypertexts are not made from text that was originally written for the linear medium". In response to this objection, the following two arguments can be put forward in our case:

Since 'Nestor' is a hypertext prototype, it should logically display the characteristics stated above: the gift of ubiquity, omnipotence and omniscience. These terms were deliberately emphatic, to stress the quasi-divine character of the concept of hypertext as it was initially defined. In our case, this amounts to the last two founding principles, put in more commonplace terms: the easy creation of personalised links, and teamwork.

Nodes and Typology of Links

The first task of a hypertext is to articulate and organise entities of information (nodes) by means of the relations (links) that exist between these grains of knowledge. The user activates these links to move elsewhere, according to his interests.

In 'Nestor', the nodes correspond to the formal, logical divisions found in the reference documents: the contents page, the chapters and, optionally, the appendices. From this 'natural' division into units of meaning, the following links can be defined:

These links, which can be called structural links since they derive directly from the logical organisation of linear texts, can therefore be identified and generated automatically by a compiler. In addition to these objective links, there are subjective links [Kahn 89]: reference links created by users to enrich the initial connectivity of the hypertext. This category of personalised links matters for hypertexts because it lets readers add links, which in a certain way turns each reader into a potential author. In concrete terms, the user can thus structure and build his own knowledge. Indeed, over time, by interacting with the hypertext, the user acquires an understanding of the domain being explored [Yankelovich 87]. The hypertext therefore becomes, for the user, a repository of expertise, a way to organise his knowledge by correcting and completing, through personalised links, the inconsistencies and gaps of the initial texts. Note that, in our case, only links can be added: the number of nodes is constant by definition, since it is determined by a finalised set of documents.
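Because structural links follow from the order of the units alone, their generation can be sketched in a few lines. The exact list of structural link kinds is not reproduced here, so the sketch assumes three kinds for illustration: contents page to each unit, each unit back to the contents page, and previous/next between consecutive units. The `Node` type, the node numbering and the link labels are hypothetical, not the 'Nestor' compiler's actual data.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Node:
    number: int  # node identifier (hypothetical numbering: 0 = contents page)
    kind: str    # "contents", "chapter" or "appendix"
    title: str

def structural_links(nodes: List[Node]) -> List[Tuple[int, int, str]]:
    """Derive (source, target, label) triples from the logical order of the units.

    Assumes nodes[0] is the contents page and the remaining nodes follow
    in reading order. The three link kinds are illustrative assumptions.
    """
    contents, units = nodes[0], nodes[1:]
    links = []
    for unit in units:
        links.append((contents.number, unit.number, "contents-entry"))
        links.append((unit.number, contents.number, "back-to-contents"))
    for prev, nxt in zip(units, units[1:]):
        links.append((prev.number, nxt.number, "next"))
        links.append((nxt.number, prev.number, "previous"))
    return links

doc = [Node(0, "contents", "Contents"), Node(1, "chapter", "Scope"),
       Node(2, "chapter", "Definitions"), Node(3, "appendix", "Examples")]
for link in structural_links(doc):
    print(link)
```

With three units this yields ten links: three contents entries, three returns to the contents page and two next/previous pairs. Personalised links would be stored separately, since they are added later by users rather than derived from the text.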

It can therefore be said that the user first appropriates the knowledge of the texts through structural links, and only later reappropriates it through personalised links. To make explicit the semantics of a personalised link, the user can attach a 'commentary' to its anchor, in addition to or instead of a destination node. Since both the 'commentary' and the 'destination' are optional, the personalised links in 'Nestor' fall into one of the following three types:

Type                      Destination   Commentary
Direct links (DL)         Yes           No
Commented links (CL)      Yes           Yes
Simple commentary (SC)    No            Yes

In the case of a simple commentary, the term 'link' is used even though it is not a real link: it does not relate two nodes of the hypertext but connects a node to a commentary page (which is why the acronym 'SC' stands for 'Simple commentary'). A personalised link is either public or private: unlike a private link, a public link makes its author's work accessible to other users.
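The three types and the public/private distinction map directly onto a small record with two optional fields and a flag. A minimal sketch, assuming a hypothetical `PersonalLink` record (not the actual 'Nestor' schema): a destination alone gives a DL, a destination plus a commentary a CL, a commentary alone an SC, and a user sees every public link plus his own private ones.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class PersonalLink:
    author: str
    anchor: str                        # anchor text in the source node
    destination: Optional[int] = None  # number of the target node, if any
    commentary: Optional[str] = None   # free-text commentary, if any
    public: bool = False               # a private link is shown to its author only

def link_type(link: PersonalLink) -> str:
    """Classify a personalised link following the DL / CL / SC table."""
    if link.destination is not None:
        return "CL" if link.commentary is not None else "DL"
    if link.commentary is not None:
        return "SC"
    raise ValueError("a personalised link needs a destination or a commentary")

def visible_links(links: List[PersonalLink], user: str) -> List[PersonalLink]:
    """All public links, plus the user's own private ones."""
    return [link for link in links if link.public or link.author == user]

# Hypothetical links on the same anchor
links = [
    PersonalLink("alice", "section 4.2", destination=12),
    PersonalLink("alice", "section 4.2", destination=40, commentary="same timer, other context"),
    PersonalLink("bob", "section 4.2", commentary="definition unclear", public=True),
]
print([link_type(link) for link in visible_links(links, "alice")])  # ['DL', 'CL', 'SC']
print([link_type(link) for link in visible_links(links, "bob")])    # ['SC']
```

Raising an error for a link with neither field reflects the table above: such a record would belong to none of the three types.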

The General Architecture

The Web uses the CGI (Common Gateway Interface) [Mc Cool 94] to build gateways between HTTP (HyperText Transfer Protocol) information servers and external programs. The role of these programs, commonly known as scripts, consists of (i) capturing the parameters entered by the user, (ii) processing them and (iii) returning a result to the client program that made the request. The architecture chosen for 'Nestor' couples Matisse[2], an object-oriented database, with Netscape, a Web client. The two applications are interfaced by scripts written in Python, an object-oriented programming language developed at the 'Centrum voor Wiskunde en Informatica' (CWI) in Amsterdam [Van Rossum 93].

Matisse is an object-oriented database management system. Its basic concept is the PDM (Property Driven Model), developed at the University of Technology of Compiègne [Barthès et al. 86]. This model is based on both semantic networks [Quillian 68] and frames [Minsky 74]. An object is characterised by two kinds of properties: attributes and relations. Minimal and maximal cardinalities are associated with relations, as in the Entity-Relationship data model [Chen 76]. From a practical point of view, the following points can be noted:
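The attribute/relation distinction and the cardinality constraints of the PDM can be illustrated with a few lines of Python. This is not the Matisse API, which is not described here: `RelationSpec`, `PdmObject`, `violations` and the relation names are hypothetical, and the sketch only checks constraints rather than enforcing them on update.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass(frozen=True)
class RelationSpec:
    name: str
    min_card: int            # minimal cardinality
    max_card: Optional[int]  # maximal cardinality, None for "unbounded"

@dataclass
class PdmObject:
    # Attributes carry values; relations point to other objects.
    attributes: Dict[str, object] = field(default_factory=dict)
    relations: Dict[str, List["PdmObject"]] = field(default_factory=dict)

    def relate(self, name: str, other: "PdmObject") -> None:
        self.relations.setdefault(name, []).append(other)

def violations(obj: PdmObject, specs: List[RelationSpec]) -> List[str]:
    """Names of the relations whose cardinality constraints obj violates."""
    bad = []
    for spec in specs:
        n = len(obj.relations.get(spec.name, []))
        if n < spec.min_card or (spec.max_card is not None and n > spec.max_card):
            bad.append(spec.name)
    return bad

# Hypothetical schema: a node belongs to exactly one document
# and may have any number of outgoing links.
node_specs = [RelationSpec("belongs-to", 1, 1), RelationSpec("has-link", 0, None)]
node = PdmObject(attributes={"number": 3, "title": "Definitions"})
print(violations(node, node_specs))   # ['belongs-to']
node.relate("belongs-to", PdmObject(attributes={"name": "Recommendation A"}))
print(violations(node, node_specs))   # []
```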

Python is an interpreted programming language that provides objects and high-level operations with a simple, indentation-based syntax. It also has the advantage of a standard 'cgi' module that makes it easy to capture parameters from an HTML page. The following modules have been written for 'Nestor':

Matisse

This module contains the main functions (written in C) of the Matisse Application Programming Interface.

FormatHtml

This module implements a subset of HTML, the language of formatting commands used by Web servers to distribute documents.
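A formatting module of this kind can be as small as a few functions that escape text and wrap it in tags. The helpers `anchor`, `paragraph` and `page` below are hypothetical and do not reproduce the actual 'FormatHtml' interface; node files are assumed to be named `node<N>.html`.

```python
import html

def anchor(target_node, text):
    """Hyperlink to another node; assumes node files are named node<N>.html."""
    return '<a href="node%d.html">%s</a>' % (target_node, html.escape(text))

def paragraph(text):
    """Escape plain ASCII text and wrap it in a paragraph."""
    return "<p>%s</p>" % html.escape(text)

def page(title, *parts):
    """Assemble a complete HTML page from already formatted fragments."""
    t = html.escape(title)
    return ("<html><head><title>%s</title></head>\n<body><h1>%s</h1>\n%s\n</body></html>\n"
            % (t, t, "\n".join(parts)))

print(page("Chapter 2 Definitions",
           paragraph("Terms & abbreviations used in this recommendation."),
           anchor(0, "Back to contents")))
```

Escaping in every helper matters because recommendation texts may contain characters such as `&` or `<` that would otherwise be read as markup.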

Compiler

The compiler transforms the set of ASCII recommendations into HTML files, in two stages. First, the compiler identifies the units (or nodes) of the hypertext, namely the contents page, the chapters and the appendices, and gives each of them an identifier (node number) that is stored in Matisse through the 'Matisse' module. Second, the compiler segments these units in order to physically generate HTML files through the 'FormatHtml' module. This generation builds the structural links and prepares, for each node, the options that allow it to be enriched (cf. 'Personnalisation' module). For security reasons, the 'Nestor' option that runs the compiler is protected by a password.
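The first stage, identifying the units and numbering them, can be sketched with a regular expression over the ASCII text. The layout of the real recommendations is not described here, so the heading convention below (`CHAPTER <n>`, `APPENDIX <letter>` at the start of a line, indented entries on the contents page) is purely an assumption, and a dictionary stands in for the identifiers stored in Matisse.

```python
import re

# Hypothetical heading convention: a chapter or appendix starts at the beginning
# of a line with "CHAPTER <n>" or "APPENDIX <letter>"; entries of the contents
# page are assumed to be indented, so they are not mistaken for headings.
HEADING = re.compile(r"^(CHAPTER \d+|APPENDIX [A-Z])\b.*$", re.MULTILINE)

def split_into_nodes(text):
    """Stage 1: cut the ASCII text into units and number them.

    Node 0 is the text before the first heading (the contents page).
    The returned dict (node number -> (heading, body)) stands in for
    the identifiers that 'Nestor' stores in Matisse.
    """
    matches = list(HEADING.finditer(text))
    first = matches[0].start() if matches else len(text)
    nodes = {0: ("CONTENTS", text[:first].strip())}
    for number, match in enumerate(matches, start=1):
        # A unit ends where the next heading starts, or at the end of the text.
        end = matches[number].start() if number < len(matches) else len(text)
        nodes[number] = (match.group(0).strip(), text[match.end():end].strip())
    return nodes

sample = (
    "CONTENTS\n"
    "  CHAPTER 1 Scope\n"
    "  APPENDIX A Examples\n"
    "CHAPTER 1 Scope\n"
    "This recommendation defines ...\n"
    "APPENDIX A Examples\n"
    "Example of a call set-up.\n"
)
nodes = split_into_nodes(sample)
for number, (heading, body) in nodes.items():
    print(number, heading)
```

The second stage would then format each entry as an HTML file and derive the structural links from the node order.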

Personnalisation

This module lets users enrich the recommendations with annotations, keywords and, above all, personalised links. For the latter, the user can at any moment obtain, for the current node, a snapshot of all the public personalised links held by the group. Likewise, the plurality of points of view on an anchor must be taken into account. For instance, the expression Anchor (Cl) (Sc) (Dl) indicates to the user that:
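A request that creates a personalised link goes through the three CGI steps described for the general architecture: capture, processing, response. A minimal sketch, assuming hypothetical form parameters `anchor`, `destination`, `commentary` and `public`; it parses the query string with `urllib.parse.parse_qs` rather than the `cgi` module mentioned above (which has since been removed from Python, in version 3.13), and it omits the database write.

```python
import html
import os
from urllib.parse import parse_qs

def create_link(query_string, author):
    """Handle a hypothetical 'create personalised link' request."""
    # (i) capture: parse_qs decodes the URL-encoded query string
    params = parse_qs(query_string)
    anchor = params.get("anchor", [""])[0]
    destination = params.get("destination", [""])[0]
    commentary = params.get("commentary", [""])[0]
    public = params.get("public", ["no"])[0] == "yes"

    # (ii) process: classify according to the DL / CL / SC table
    if destination and not destination.isdigit():
        kind = None  # destinations are node numbers
    elif destination:
        kind = "CL" if commentary else "DL"
    elif commentary:
        kind = "SC"
    else:
        kind = None

    if not anchor or kind is None:
        body = "<p>Invalid request: an anchor and a destination node or a commentary are required.</p>"
    else:
        # A real script would now store the link in the database (omitted here).
        body = "<p>%s link by %s on &quot;%s&quot; recorded as %s.</p>" % (
            kind, html.escape(author), html.escape(anchor),
            "public" if public else "private")
    # (iii) respond: CGI header lines, an empty line, then the document
    return "Content-Type: text/html\n\n" + body

if __name__ == "__main__":
    # Under a Web server, QUERY_STRING carries the GET parameters;
    # REMOTE_USER is only set when the server has authenticated the user.
    print(create_link(os.environ.get("QUERY_STRING", ""),
                      os.environ.get("REMOTE_USER", "anonymous")), end="")
```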


Conclusion and Perspectives

The 'Nestor' prototype has recently been tested in a real environment. Two sets of ASCII documents were compiled: a set of recommendations for CNET and the documentation of the Python programming language. The results of the two compilations are given in the following table:

                            Python documentation   CNET recommendations
Number of documents                            2                      6
Number of bytes                          183 471                518 845
Number of pages                               86                    213
Number of nodes                              122                     85
Number of structural links                   427                    408
Number of reference links                     11                    104
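The table lends itself to a quick derived comparison per node. The figures below are copied from the table; the ratios are our own arithmetic and not results reported by the experiment.

```python
# Figures copied from the table above:
# (documents, bytes, pages, nodes, structural links, reference links)
corpora = {
    "Python documentation": (2, 183471, 86, 122, 427, 11),
    "CNET recommendations": (6, 518845, 213, 85, 408, 104),
}

def per_node(row):
    """Structural links, reference links and bytes per node."""
    documents, size, pages, nodes, structural, reference = row
    return structural / nodes, reference / nodes, size / nodes

for name, row in corpora.items():
    structural, reference, size = per_node(row)
    print("%s: %.1f structural links, %.2f reference links, %.0f bytes per node"
          % (name, structural, reference, size))
```

This prints 3.5 and 4.8 structural links per node respectively, and about 1504 and 6104 bytes per node: the CNET recommendations are cut into fewer, larger nodes, and carry more than ten times as many reference links per node (1.22 against 0.09).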

We are currently looking for people willing to test 'Nestor', in order to gather comments on its behaviour and its ergonomics. In 'Nestor', adaptability manifests itself in the fact that users enrich the hypertext with their own knowledge. With adaptivity, by contrast, the hypertext system itself takes the initiative by proposing links that might be relevant to the user. To find such informal links, a neural approach based on the Hopfield model is currently under development.
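For readers unfamiliar with the Hopfield model, the sketch below shows a generic Hopfield associative memory: Hebbian weights and asynchronous threshold updates that complete a stored pattern from a partial probe. It does not describe the network under development for 'Nestor'; reading each +1/-1 vector as "node consulted / not consulted" during a session is our hypothetical illustration of how a completed pattern could suggest a link to a missing node.

```python
def train(patterns):
    """Hebbian rule: w_ij = sum over patterns of p_i * p_j, with w_ii = 0."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w

def recall(w, state, max_sweeps=10):
    """Asynchronous updates until no unit changes (or max_sweeps is reached)."""
    s = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if h > 0 else -1 if h < 0 else s[i]  # keep current value on a tie
            if new != s[i]:
                s[i], changed = new, True
        if not changed:
            break
    return s

# Hypothetical sessions over 8 nodes: +1 = node consulted, -1 = not consulted.
session_a = [1, 1, 1, 1, -1, -1, -1, -1]
session_b = [1, -1, 1, -1, 1, -1, 1, -1]
w = train([session_a, session_b])
probe = [-1, 1, 1, 1, -1, -1, -1, -1]  # session_a with node 0 missing
print(recall(w, probe))                # [1, 1, 1, 1, -1, -1, -1, -1]
```

Here the network restores node 0, which in this reading would be proposed to the user as a candidate link.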

Literature References

[Barthès et al. 86] Barthès, J-P., Vayssade, M., & Miaczynska-Znamierowska, M. (1986). Property Driven Data Bases. Internal report, University of Technology of Compiègne.

[Berners-Lee 94] Berners-Lee, T. (1994). World Wide Web Initiative, CERN - European Particle Physics Laboratory. http://info.cern.ch/hypertext/WWW/TheProject.html

[Bush 45] Bush, V. (1945). As we may think. Atlantic Monthly, July 1945, 101-108.

[Chen 76] Chen, P.P. (1976). The Entity-Relationship model: toward a unified view of data. ACM Trans. on Database Systems, 1 (1), 9-36.

[Engelbart 68] Engelbart, D. (1968). A research Center for Augmenting human intellect. AFIPS conference proceedings, 33 (1). Thompson books, Washington DC, 1968.

[Glushko 89] Glushko, R. (1989). Design Issues for Multi-Document Hypertexts. Proceedings Hypertext '89, Special Issue SIGCHI Bulletin, 1989, 51-60.

[Kahn 89] Kahn, P. (1989). Linking together books : Experiments in adapting published material into hypertext. Hypermedia, 1 (2), 1991, 111-145.

[Lévy 90] Lévy, P. (1990). Les technologies de l'intelligence. L'avenir de la pensée à l'ère informatique. La Découverte.

[Mc Cool 94] Mc Cool, R. (1994). The Common Gateway Interface. National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, http://hoohoo.ncsa.uiuc.edu/cgi

[Minsky 74] Minsky, M. (1974). A framework for representing knowledge. Cambridge Mass., M.I.T.

[Morris 95] Morris, M. (1995). HTML for fun and profit. SunSoft Press, Prentice Hall.

[Nanard 95] Nanard, M. (1995). Les hypertextes : au-delà des liens, la connaissance. Sciences et techniques éducatives, 2 (1), 1995, 31-59.

[NCSA 93] National Center for Supercomputing Applications (1993), University of Illinois at Urbana-Champaign. NCSA Mosaic. A WWW Browser, http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html

[Nelson 88] Nelson, T. (1988). Managing Immense Storage: Project Xanadu provides a model for the possible future of mass storage. Byte, 13 (1), January 1988, 225-238.

[Nielsen 90] Nielsen, J. (1990). Hypertext and Hypermedia. Academic Press.

[Otlet 34] Otlet, P. (1934) : Extracts from "Traité de documentation, le livre sur le livre" taken from "La Pensée", n°281, May/June 1991, 66-71.

[Quillian 68] Quillian, R. (1968). Semantic memory. In Minsky, M. (ed.), Semantic Information Processing. Cambridge Mass., M.I.T. Press, 227-270.

[Van Rossum 93] Van Rossum, G. (1993). An Introduction to Python for UNIX/C Programmers, in the proceedings of the NLUUG najaarsconferentie 1993.

[Yankelovich 87] Yankelovich, N. (1987). Creating Hypermedia Material for English Students, Sigcue-Outlook.

Acknowledgements

Special thanks to (i) CNET, for funding the project, (ii) Jean-Pierre Poitou (CREPCO, University of Provence), who first drew our attention to Paul Otlet's writings, and (iii) Darren Millward, for his invaluable help in preparing this English version.