Although search engines like Yahoo!, Bing and Google index billions
of web pages and other electronic documents, their indexes cover only a
tiny part of the total information available on the World Wide Web. To
unearth the buried treasure, you have to understand how to mine the
data.
Two Layers of Data
Think of the Web as having two layers: a shallow surface and an
almost bottomless, deep level. In the top layer, the Surface Web, you
will find web pages like the one that you're reading now. This
page and others like it have fixed web addresses or URLs (in this case,
http://www.learnthenet.com/how-to/search-the-deep-web). Also, the
information contained in the page doesn't change very often.
The Deep Web contains pages with dynamic content--data that changes
frequently and can't be indexed easily by search engines. Most of this
information is stored in databases and is assembled "on the fly" when
you query the database. For instance, when you search for an item on
eBay, information is pulled from eBay's database and instantly assembled
on a web page for you. That page did not exist until you performed your
search, which is what makes it dynamic; it was customized in response
to your query. Search engines build their indexes mainly by following
links from page to page, not by typing queries into search forms, so
they can't readily index this information.
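To make "on the fly" concrete, here is a minimal sketch in Python of how a site might build a results page from a database query. It is an illustration only, not how eBay actually works: the items table, the sample data, and the function names build_catalog and render_search_page are all made up for this example.

```python
import sqlite3
from html import escape


def build_catalog() -> sqlite3.Connection:
    """Create a small in-memory database (hypothetical sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (title TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [
            ("Vintage camera", 120.0),
            ("Camera tripod", 35.5),
            ("Road bike", 410.0),
        ],
    )
    return conn


def render_search_page(conn: sqlite3.Connection, query: str) -> str:
    """Assemble an HTML page for one search; nothing is stored on disk."""
    # Look up matching rows; the ? placeholder keeps the query text safe.
    rows = conn.execute(
        "SELECT title, price FROM items WHERE title LIKE ? ORDER BY title",
        (f"%{query}%",),
    ).fetchall()
    # Turn each row into a list item, escaping text for safe display.
    items = "\n".join(
        f"<li>{escape(title)}: ${price:.2f}</li>" for title, price in rows
    )
    return (
        f"<html><body><h1>Results for {escape(query)}</h1>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )


if __name__ == "__main__":
    catalog = build_catalog()
    # The page text is created only now, in response to this query.
    print(render_search_page(catalog, "camera"))
```

The page exists only as the text returned by render_search_page. A search engine that simply follows links never calls that function with the word "camera", so it never sees the result.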
Other types of "deep" information include:
- Multimedia (audio, music and video)
- Photos and graphics
- Job listings
- Financial data (stock and bond prices, currency rates)
- News
- Travel-related data (airline and train schedules)
- Information on sites that require passwords