Is All of Human Knowledge on the Internet?

One of the primary roadblocks to expanding comprehensive digital libraries is copyright.

Like a nerdier Nostradamus, H.G. Wells practically predicted the Internet in his 1937 essay "World Brain: The Idea of a Permanent World Encyclopedia."

In it, Wells describes this futuristic encyclopedia (made possible in his mind by revolutionary microfilm) as a "world organ to 'pull the mind of the world together,' which will be not so much a rival to the universities, as a supplementary and coordinating addition to their educational activities - on a planetary scale."

And in many ways, Wells' vision has been realized by the Internet. Digital archives scattered among servers around the world house innumerable books, documents, records, photographs and films that collectively represent an outpouring of human knowledge.

"That (H.G. Wells) essay collection is utopian, but really, if you look at what we're all trying to do, this idea of a permanent world encyclopedia that he has, it's really a template for what's happening," said Paul Jones, director of the digital archive Ibiblio and associate professor of information science at the University of North Carolina, Chapel Hill.

"The real question is can that ever be accomplished, and the answer is 'no' - but why not try?" Jones told Discovery News.


For the past 18 years, Jones and others working with Ibiblio have been digitally preserving collections as well as "vernacular work," meaning freely accessible works in the public domain. A well-known example of vernacular work is the collection of songs composed by Roger McGuinn, former leader of The Byrds, which he's published under a Creative Commons shared licensing agreement.

Although establishing digital libraries depends on server space, the real tug-of-war over how many knowledge works (books, recordings, other documents) will end up accessible online happens between librarians and lawyers.

Why? One word: copyright.

"One of the primary roadblocks (to expanding digital libraries) is copyright," said Maura Marx, a fellow at the Harvard Berkman Center and lead organizer of its Digital Public Library of America initiative. "Its one-size-fits-all nature locks up all works as if they will remain commercially viable for extended periods of time. Not everything is 'Harry Potter' - there is no provision, for example, for circulation of scholarly works after an initial period of commercial distribution, or for any other deviation from locking things up for life, plus 70 years."

While the legal system hashes out copyright issues, establishing the Digital Public Library of America could bring together many disparate archiving initiatives, such as digital collections at universities and other institutions, into a single unified resource.

The European Council, Norway and the Netherlands have already made significant strides toward national and international digital databases like this, and Marx thinks that a similar resource in the United States would broaden awareness and use of the wealth of knowledge works archived online.

This type of nonprofit resource is distinctly different from commercial digital repositories like Google Books, Marx says.

"Amazon and Google are commercially driven and have a responsibility to shareholders, the bottom line," Marx said. "A knowledge commons can support the emergence of other types of transactions."

But setting aside legal and cooperative issues, what is the current sum of all this archiving? Despite copyright restrictions, how much human knowledge is on the Internet at this point – if it's even quantifiable?

Since the Internet is composed of an ever-changing number of servers, pinning down the precise amount of data contained online is practically impossible.

"At any rate, you can increase the amount of data on the Internet simply by turning on a new server, which happens every second of the day," said Jonathan Strickland, tech expert and co-host of the TechStuff podcast.

Really, the only certainty about the amount of human knowledge online is that it will continually grow.

"I'd compare it to asking the question 'How many books are in the library?'" said Strickland. "If you take the question literally, you'd have to check the library's inventory as well as all the books that had been loaned out and subtract the second figure from the first, and by that time more books will have been returned and loaned out, making the figure meaningless."