Originally published in Technology Review, July/August 2000
"Survival of the hittest" leaves a precious record crumbling [*]
Think of the Web as an enormous, slow hard disk. Shared by the entire world, this disk holds a record of radical media experimentation, the history of a form that sprang up less than a decade ago to infect popular consciousness and transform the way we use information. Yet despite a few archival projects, no one is backing up our collective disk.
That's not what you'd conclude from a casual glance at leading Web sites. Almost every major Web magazine has an "archive" which holds old content. These are not real archives, however, any more than home pages are real homes, or real pages. They do not preserve early versions of the site — they only keep the most popular old content online and accessible, for the sake of additional banner-ad revenue. The archive of HotWired is typical in leaving out some early content: A serialized novel and an advice column, early Web-based experiments in these forms, are omitted. The old content that remains online is seldom in its original format, even though form is of clear importance to the Web's development.
Those who forget the past are condemned to reload it. The Web's advance has been rapid, but that's all the more reason to study it with care. In all the confusion, it's easy to lose track of what publications and business models have already been tried, and with what results. For students of new media, understanding the sometimes arcane structure of Web sites is even more difficult without knowing about what has come before. Even important political phenomena such as the groundswell of opposition to the Communications Decency Act cannot be accurately considered without looking at the essential, primary source: the blackened Web pages of 1996. From a business, media-studies and historical perspective, the Web's past is worth remembering.
The Web began as an all-text system at the end of 1990, used by an international community of physicists. In early 1993, it had fewer than 100 servers providing scientific and technological information. The turning point came in November 1993, with the release of the Mosaic browser, well-designed software that could load and display graphics. Mosaic, which came from the National Center for Supercomputing Applications (NCSA), was not the first graphical browser. It was, however, simple, effective and compatible with different operating systems. Mosaic led to incredible innovation in the following years. By the end of 1994 the Web's 2500 servers hosted cultural magazines, banner advertising and unique efforts such as the Internet Movie Database. That project, a collaborative attempt to create a comprehensive filmography of the world's movies, would have been impossible without a far-reaching data repository like the Web.
Scrambling to put together Web sites with no guidelines or precedents, early Web developers tossed together a salad of old-media forms and genres, graphic and interactive design and both esoteric and offhand document organization schemes. In this experimental time, all of today's principles of cross-platform design and site navigation were devised. "Survival of the hittest" determined what types of sites worked. Universities found the Web an easy and money-saving way to provide information to different groups: prospective students, enrolled students, alumni and others. Hardware vendors such as Dell and Cisco found that Web sites perfectly suited their technically proficient buyers. Even the Web's losers made interesting advances in interactive design, online writing and business development, some of which were simply before their time. Security First Network Bank launched in 1995 as the first Internet bank, for instance. The vice president of Chase Manhattan's e-commerce division denounced it as "a dismal failure" in 1998, when it was sold, along with an associated software company, to the Royal Bank of Canada for $29 million. Yet major players like American Express are now starting to offer similar Internet-only banking services.
The innovation that characterized the mid-1990s is a thing of the past. It has been replaced by many different forms of creative development, of course, as designers refine the fundamental advances made during the Web's early era. But by the end of 1996 the Web's basic conventions had been established, in only three post-Mosaic years. For the printed book, this early developmental period — in which "incunabula" were printed (and things like page numbers and tables of contents were figured out) — lasted about fifty years.
What has happened to the actual Web pages that were marked up during the Web's salad days? Some are still around, but for the most part they are changed beyond recognition, and the original versions exist only on some obscure and offline hard disk. The online novels Delirium and As Francesca and the early Web serial The East Village can no longer be seen — and those works were offered on major Web sites whose parent companies are still around. Some concerted attempts are being made to preserve material from this era: The 1996 presidential election Web sites, for instance, are the target of a current Internet Archive project. (That organization also plans to maintain copies of the whole Web, stored at various points in time.) The election sites have popular appeal as candidates for preservation. They're considered notable by the offline world and are of some importance to this country's political history. But they're not relevant to the development of the Web as a medium.
Consider instead the first daily Web magazine, the cranky and crack-joke-filled Suck, an exemplary first-wave site (to which I contributed some pseudonymous items). Suck, which appeared in August 1995, put fresh daily content right on the home page instead of burying it within the site. Each elegantly laid-out column of text tore into many of the day's sillier Web experiments: Turner Entertainment's Spiv, Web soap operas like The Spot, the subscriber-based model of Microsoft's Slate. In the early days, this edgy text was illustrated with images lifted from the sites it panned, presented at a characteristic tilt. Suck's writers figured out how to use the hypertext medium cleverly. They linked to pages not to refer the reader to further information, but to lampoon absurdities and recontextualize Web pages humorously. For instance, a link to the Netly News (a publication of Time Warner, presented as part of the Pathfinder site) wasn't there so the reader could actually learn more about some newsworthy topic. It was there to point out how the tone and daily release of the Netly News were weakly ripping off the Suck concept.
Suck is still up and twitching.[**] It has gone through expansion and reduction, and been sold twice: first to Wired Digital and then, along with other Wired Digital properties, to Lycos. The original simple and direct design has been made more convoluted by upgrades, but catchy illustrations have been added. The writers now take on a broader range of pop-culture targets, humiliating TV programs and youth subcultures, not just Web sites. The site has stayed irreverent and somewhat relevant, and has not become as thoroughly encrusted with features and rimmed with "portal" links as has most of the Web today. But looking through an early Suck article now reveals the fate of many mid-decade Web pages. The writers, complaining about how stupid and doomed to failure many of the early attempts really were, were largely right. Many witty links are now dead, leaving the wry hypertext without its digital straight man.
The decay of 5-year-old digital humor may not be cause for mourning, except among scholars of new media. The loss of early Web sites isn't entirely academic, though. Those who are plunging into startups today should look closely at what succeeded and failed back around 1995, when ".com" was a dirty phrase instead of a lucrative suffix. Take, for example, one of the Web's most successful companies, Yahoo!, which started as a student home page at Stanford University. Yahoo! gained popularity and, with its more useful organizational scheme, eclipsed the well-established index of the day, NCSA's What's New. Of course, many of the sites this early Yahoo! actually linked to are gone. The real woe is that businesspeople hoping to emulate Yahoo!'s success, as well as students of computing and media history, can't easily see what version 1.0 of Yahoo! looked like and compare it to the rival index. The original Stanford home page — the Web's first table of contents — is long gone. The Internet Movie Database has been even more drastically transformed. The collection of movie reviews, originally contributed by volunteers who had no financial interest in the films they wrote about, is now owned by Amazon.com and used to market videos.
The disappearance of valuable Web content will not be stopped by simply selecting "Save As ... ." Despite the digital nature of online information, real archiving comes at a cost. For one thing, sites that are stored for posterity must be maintained in a way that is verifiably legal and respects the copyright and privacy interests of content creators. Legacy browsers have to be kept on hand, too, so one can see the early Web in the way surfers encountered it back around 1994. Finally, magnetic media and even CD-ROMs degrade after decades, and data has to be copied over every few years if the material is to be safely preserved.
The Internet Archive project is taking such factors seriously, although that project has made some curious omissions. For instance, although one of the project's directors is a librarian, the Archive does not have an archivist on its board. The organization's recent approach of focusing on a handful of specific sites is sound, but pages of greater importance to the culture and medium of the Web could have been selected. Choosing a few sites, though, is certainly a better idea than the Internet Archive's original plan to preserve every bit of the Web using data from the company Alexa — a Sisyphean task to which the organization remains devoted. The issues of copyright, privacy and access are tractable when specific sites are chosen for preservation. To write the whole Web to a giant array of hard disks, on the other hand, is a showy and largely useless technological gesture.
The Web sites of lasting interest are early versions of innovative business, publishing and artistic ventures — not Ross Perot's home page. Unless we act to preserve important sites, many of which are already offline, the Web's origins will become even murkier. Developing the Web intelligently, and trying to understand it, will be made harder by our lack of perspective.
It's quite possible that the origins of the most technologically advanced worldwide system for publishing and communication, now less than a decade old, may one day be known only through isolated scraps of information and the hazy recollections of aging geeks, trying to recall when they created their first animated GIF, or when they first used a Web conferencing system, or when they visited Yahoo! back in the day, when it was on konishiki.stanford.edu ... or was that akebono.stanford.edu?
[*] It is hard to imagine a more succulent irony, but this essay about the disappearance of Web content, which was published on the Web as well as in print, was removed from the Web in late 2001, when Technology Review began selling PDFs of articles instead of making them freely available.
[**] Alas, no more. In June 2001 Suck ceased publishing new materials. The articles that have been published remain online.