DeepWeb Ad

Saturday, April 25, 2015

Mining the Deep Web

Although search engines like Yahoo!, Bing and Google index billions of web pages and other electronic documents, what they index is only a tiny part of the total information available on the World Wide Web. To unearth the buried treasure, you have to understand how to mine the data.

Two Layers of Data

Think of the Web as having two layers: a shallow surface and an almost bottomless, deep level. In the top layer, the Surface Web, you will find all the web pages like the one that you're now reading. This page and others like it have fixed web addresses or URLs (in this case, http://www.learnthenet.com/how-to/search-the-deep-web). Also, the information contained in the page doesn't change very often.
The Deep Web contains pages with dynamic content--data that changes frequently and can't be indexed easily by search engines. Most of this information is stored in databases and is assembled "on the fly" when you query the database. For instance, when you search for an item on eBay, information is pulled from eBay's database and instantly assembled on a web page for you. That page did not exist until you performed your search, which is what makes it dynamic; it was customized in response to your query. Because such pages exist only as the answer to a specific query, search engines can't readily index them.
Other types of "deep" information include:
  • Multimedia (audio, music and video)
  • Photos and graphics
  • Job listings
  • Financial data (stock and bond prices, currency rates)
  • News
  • Travel-related data (airline and train schedules)
  • Information on sites that require passwords
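
To make the "on the fly" idea concrete, here is a minimal sketch in Python of a page that does not exist until someone searches. It is an illustration only, not how eBay or any real site works: the `items` table, its sample rows, and the `build_catalog` and `render_search_page` functions are all hypothetical, and an in-memory SQLite database stands in for the site's back end.

```python
import sqlite3
from html import escape


def build_catalog() -> sqlite3.Connection:
    """Create a small in-memory database standing in for a site's back end.

    The table name and rows are made up for this example.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (title TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        [("Vintage camera", 45.0), ("Camera tripod", 19.5), ("Desk lamp", 12.0)],
    )
    return conn


def render_search_page(conn: sqlite3.Connection, query: str) -> str:
    """Assemble an HTML results page for one search query."""
    # The page content comes from the database at request time.
    rows = conn.execute(
        "SELECT title, price FROM items WHERE title LIKE ? ORDER BY title",
        (f"%{query}%",),
    ).fetchall()
    # Escape text before placing it in HTML.
    items = "\n".join(
        f"<li>{escape(title)}: ${price:.2f}</li>" for title, price in rows
    )
    return (
        f"<html><body><h1>Results for {escape(query)}</h1>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )


if __name__ == "__main__":
    catalog = build_catalog()
    # This page is built only now, in response to this particular query.
    print(render_search_page(catalog, "camera"))
```

Nothing in this sketch is stored as a fixed page at a fixed address. A crawler that only follows links never types "camera" into the search box, so it never sees the page that the query produces.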

Thursday, April 2, 2015

NSF's Dark Web: Life Imitates Art

The National Science Foundation (NSF) is funding a University of Arizona project called Dark Web, which aims to track down terrorists on the net.
When I read the NSF press release that my friend Randy A. pointed out to me, I could have sworn some of it was describing chapters of The Dark Net.


"They can put booby-traps in their Web forums," Chen explains, "and the spider can bring back viruses to our machines." This online cat-and-mouse game means Dark Web must be constantly vigilant against these and other counter-measures deployed by the terrorists.

Dark Web's capabilities are also being used to study the online presence of extremist groups and other social movement organizations. Chen sees applications for this Web mining approach for other academic fields.

"What we are doing is using this to study societal change," Chen says. "Evidence of this change is appearing online, and computational science can help other disciplines better understand this change."