deep web - ReadWriteWeb
http://www.readwriteweb.com/feeds/tag/deep web
Copyright 2012 Richard MacManus, readwriteweb@gmail.com
Tue, 14 Feb 2012 18:04:00 -0800

Google: "We're Not Doing a Good Job with Structured Data"

During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google's Alon Halevy admitted that the search giant has "not been doing a good job" presenting the structured data found on the web to its users. By "structured data," Halevy was referring to the databases of the "deep web" - those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.

Google's Deep Web Search

Halevy, who heads the "Deep Web" search initiative at Google, described the "Shallow Web" as containing about 5 million web pages while the "Deep Web" is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google's automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web - dubbed "vertical searching" - Halevy also referenced two other types of Deep Web Search: semantic search and product search.
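The query-submission approach described above can be illustrated with a small sketch. The article does not say how Google chooses its queries, so everything below is an assumption: the in-memory `HIDDEN_RECORDS` "database," the `submit_form` function standing in for a site's search form, and the `surface` probing loop are hypothetical names invented for illustration.

```python
# Toy stand-in for a site whose records are reachable only through a search form.
# HIDDEN_RECORDS, submit_form and surface are hypothetical, not Google's actual system.
HIDDEN_RECORDS = {
    1: "used honda civic 2004 boston",
    2: "used toyota corolla 2006 cambridge",
    3: "new honda accord 2009 boston",
}


def submit_form(keyword):
    """Simulate submitting one query to the form; return matching (id, text) pairs."""
    return [(rid, text) for rid, text in HIDDEN_RECORDS.items()
            if keyword in text.split()]


def surface(seed_keywords, max_queries=10):
    """Probe the form with keywords, index what comes back, reuse new words as probes."""
    index = {}
    pending = list(seed_keywords)
    tried = set()
    while pending and len(tried) < max_queries:
        keyword = pending.pop(0)
        if keyword in tried:
            continue
        tried.add(keyword)
        for rid, text in submit_form(keyword):
            if rid not in index:
                index[rid] = text
                # Words from newly found records become candidate probes.
                pending.extend(w for w in text.split() if w not in tried)
    return index


indexed = surface(["honda"])
print(sorted(indexed))
```

Starting from a single seed word, the loop reaches records the seed alone would not match, because words found in early results are fed back in as new queries. The `max_queries` cap reflects that a real crawler would have to limit the load it places on each site.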

Google wants to also be able to retrieve the data found in structured tables on the web, said Halevy, citing a table on a page listing the U.S. presidents as an example. There are 14 billion such tables on the web, and, after filtering, about 154 million of them are interesting enough to be worth indexing.

Can Google Dig into the Deep Web?

The question that remains is whether Google's current search engine technology will be adept at all the different types of Deep Web indexing or whether the company will need to come up with something new. As of now, Google uses the Bigtable database and the MapReduce framework for everything search-related, notes Alex Esterkin, Chief Architect at Infobright, Inc., a company delivering open source data warehousing solutions. During the talk, Halevy listed a number of analytical database application challenges that Google is currently dealing with: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data-level synonym discovery. These challenges are addressed by Infobright's technology, said Esterkin, but "Google will have to solve these problems the hard way."

Also mentioned during the speech was how Google plans to organize "aspects" of search queries. The company wants to be able to separate exploratory queries (e.g., "Vietnam travel") from ones where a user is in search of a particular fact ("Vietnam population"). The former query should deliver information about visa requirements, weather and tour packages, etc. In a way, this is like what the search service offered by Kosmix is doing. But Google wants to go further, said Halevy. "Kosmix will give you an 'aspect,' but it's attached to an information source. In our case, all the aspects might be just Web search results, but we'd organize them differently."

Yahoo Working on Similar Structured Data Retrieval

The challenges facing Google today are also being addressed by their nearest competitor in search, Yahoo. In December, Yahoo announced that they were taking their SearchMonkey technology in-house to automate the extraction of structured information from large classes of web sites. The results of that in-house extraction technique will allow Yahoo to augment their Yahoo Search results with key information returned alongside the URLs.

In this aspect of web search, it's clear that no single company yet dominates. However, even if a non-Google company surges ahead, it may not be enough to get people to switch engines. Today, "Google" has become synonymous with web search, just like "Kleenex" is a tissue, "Band-Aid" is an adhesive bandage, and "Xerox" is a way to make photocopies. Once that psychological mark has been made on our collective psyche and the habit formed, people tend to stick with what they know, regardless of who does it better. That's a bit troublesome - if better search technology for indexing the Deep Web comes into existence outside of Google, the world may not end up using it until Google either duplicates or acquires the invention.

Still, it's far too soon to write Google off. They clearly have a lead when it comes to search, and that came from hard work, incredibly smart people, and innovative technical achievements. No doubt they can figure out this Deep Web thing, too. (We hope.)

http://www.readwriteweb.com/archives/google_were_not_doing_a_good_job_with_structured_data.php Trends Mon, 02 Feb 2009 07:32:07 -0800 Sarah Perez
DeepDyve: Indexing the Deep Web

DeepDyve is a new search engine that is aimed at students, academics, and knowledge workers. DeepDyve's mission is to index the 'deep web' that is hidden behind pay walls and subscription fees. We first looked at DeepDyve in September, when it was still called Infovell and hidden behind a pay wall itself. Starting today, Infovell has not only changed its name, but is also available in a free version.

Since the launch of its first version in September, DeepDyve has slightly improved its user interface, but if you have used a subscription database before, DeepDyve's interface and feature set, with the ability to narrow your results by subject areas and save your searches, will look quite familiar.

Paid Version

DeepDyve also released a paid version of its service for $45 a month, which will also allow you to refine your searches by content type. Other features of the paid version include dynamic clustering, visual clustering, and advanced search.

Verticals

DeepDyve is slowly expanding into more search verticals, but for now, its focus is on life sciences, physical sciences, and patents, though it also indexes a few humanities journals. The service also indexes newspapers and lets you search for Wikipedia articles as well. Overall, DeepDyve's index consists of about 500 million pages.

What's New?

DeepDyve launched with a good amount of hype this morning, but after our initial tests, we have come away somewhat disillusioned. Most users who need to search academic sources can already do so through databases like Academic Search Premier, Lexis-Nexis, PubMed, or Science Direct. These services also typically feature more advanced search functions and often give you direct access to the full text of your sources as well.

Information is Still Behind a Pay Wall

As useful as it can be to be able to search the deep web, most of the articles retrieved by DeepDyve still sit behind paywalls anyway, and you either need to have access to an institutional subscription to access these sources or pay a hefty fee per article.

DeepDyve markets itself as being the first search engine that allows its users to "access a wealth of untapped information that resides on the 'Deep Web'" - and if you forget about Google Scholar and the myriad of subscription databases, then that is surely true. In its current incarnation, however, DeepDyve is mostly an interesting technical experiment.

http://www.readwriteweb.com/archives/deepdyve_indexing_the_deep_web.php Product Reviews Tue, 11 Nov 2008 09:41:17 -0800 Frederic Lardinois
Weekly Wrapup, 15-19 September 2008

It's time for our weekly summary of Web Technology news, products and trends. This week we surveyed the leading online banking products and 10 recommended photo sharing sites. We also checked out a new 'deep web' search tool and reported on Joost's move to the browser. Our prediction question this week was about the controversial 'Twitter for enterprise' app Yammer - check out the results below. On the trends side, we looked into a report about "super influencers", gave you some suggestions for quality social media consultants, reported on the latest Tim Berners-Lee foundation, and analyzed how the economy shake-ups this week affect the tech sector. Last but not least, we bring you the latest from our new Enterprise Channel.

Web Products

Banking 2.0: Money Management Moves to The Cloud

There was a time when managing finances from your computer meant you had to use desktop software. Today, that's no longer the case. There are now a number of applications that let you do your banking in the cloud, a trend we've dubbed "banking 2.0."

These sites aren't just simplified versions of our former desktop apps, either. Instead, they offer a number of features that take advantage of their "always on" status. Forget downloading updates and typing in your transactions line-by-line, these new banking 2.0 sites can offer you better insight into your financial situation with no additional effort on your part beyond just logging in.

Store, Tag and Print: 10 Great Photo Sharing Services

The photo sharing market is growing at a steady clip and new services are released regularly. In this round-up, we compare the features and usability of 10 of our favorite photo sharing sites. Two years ago, we published a similar list. So now seems a good time for us to revisit the topic.

Some of these sites focus more on mainstream users and photo-finishing, while others stand out because of their extensive social features. Note: we've included a full table of features for the services listed (see below).

Click here for the full-size version of this table.

Sometimes Google Isn't Enough: New Research Engine Searches "Deep Web"

What do you do when you need to research something on the web? You just google it, right? Using a web search engine like Google is usually fine for casual searches, but when you need to delve deep into a subject, it just won't do. What you really need is a research engine that explores the unindexed reaches of the Deep Web. For that, there's now Infovell, "the world's research engine."

Less than 0.2% of the web is indexed and some of the most valuable information lies beyond the search results returned from traditional engines. That's where a service like Infovell can help. This new subscription-based software-as-a-service (SaaS) engine lets you explore content found on the Deep Web.

See also: Semantics + Google = SemantiFind

Finally: Joost Now Available on the Web

This week Joost announced that all of its content is now available directly on its website and not just through its desktop client. Joost was one of the most hyped-up companies on the web when the peer-to-peer streaming video service was still in stealth mode in 2006, and beta invites were rare and coveted. However, once users actually got a look at Joost, disillusion quickly set in. Joost's video quality was very high and it had signed up a wide range of content producers, but its downfall was its reliance on a desktop client. Users were already switching to viewing video on the web, and having to start up a client just to watch video was simply too inconvenient.

RWW Predictions: Funding for Yammer

This week's prediction question focused on the winner of the TechCrunch50 event: Yammer. Yammer is a communications product that duplicates the functionality of Twitter, but with an enterprise twist. We certainly have our doubts about Yammer as an enterprise tool. However, we wanted to know your prediction for the financial future of Yammer. Will Yammer raise a round of funding in 2008 or 2009? If so, how much will they take? At the time of writing, here were the results:


Web Trends

Are You a Super Influencer?

A new report from Universal McCann discusses the rise of "a new breed of super influencers" that has been created by "the tools of the social media revolution." Before we all don our superhero capes, let's look more closely at the findings of the report.

The report is entitled "When did we start trusting strangers? How the internet turned us all into influencers," and its premise is that influence has moved beyond "professional and top down" (mainstream media) and into Web-enabled peer-to-peer influence. But despite McCann calling this a "democratisation of influence", not all influencers are equal. There are "super influencers" who are "extremely heavy users of social media, particularly in terms of content creation." Are you one of these people? Let's check out what the characteristics are...

Seven Social Media Consultants That Deliver Tangible Value

Is social media nothing but snake oil? Sometimes it can seem that way. As economies shift and trends emerge, would-be experts start popping up like weeds. Really good social media experts are a treasure - and they're not always easy to find.

In this post we highlight seven social media consultants that consistently bring tangible value to the table. These folks aren't full of hot air - they use their blogs to offer clear examples, links, tutorials and other resources you can put to use. If the goods you can see for free are so solid, that's all the more reason to investigate paying for these people's services. We hope this list will help you get smarter and maybe save a whole lot of money and anguish.

Tim Berners-Lee Launches World Wide Web Foundation - Will it Be Effective?

Tim Berners-Lee, the inventor of the World Wide Web, announced this week the formation of a new organization dedicated to studying how the web works and expanding access to the billions of people who can't get online today. The World Wide Web Foundation kicked off with $5 million in support from media funder the Knight Foundation.

Can yet another organization really make a difference? Some observers seem to be suffering from Organization Fatigue, but we're interested to see what Berners-Lee can do. A group dedicated to deep study of the web and the obstacles to its growth sounds like a great idea to us. Not everyone agrees.

How Decoupled is The Innovation Economy From Rest of The Economy?

What a week of market mayhem! How odd having that as the backdrop to the Web 2.0 Expo in New York. We have been sounding alerts about the economic backdrop to our world of innovation for nearly a year. Back in February we wrote that this is not our bubble. Since then, the news from the economy has gotten worse and nobody is suggesting it will get better any time soon. Reading the papers is pretty grim (unless you stick to Sports or Arts). Yet we contend that it is not grim in the 'innovation economy'. Here's why...


RWW Enterprise Channel

Report: Nearly 70% of Businesses Allow Social Media Usage

A new report about enterprise adoption of Web 2.0 technologies, by Awareness, Inc., shows that employers are increasingly allowing staff to use social media applications during working hours. Awareness puts the figure at 69 percent of businesses in 2008, up from 37 percent last year.

It's the latest in a string of reports this year - from Awareness, Forrester and others - which provide data about the growth of web 2.0 in the enterprise. It'll be a $4.6 billion industry by 2013, according to Forrester. See more of Awareness' findings in this post.

Email us if you're interested in writing for ReadWriteWeb's Enterprise Channel.


That's a wrap for another week! Enjoy your weekend everyone.

http://www.readwriteweb.com/archives/weekly_wrapup_15-19_september_2008.php Weekly Wrap-ups Sat, 20 Sep 2008 05:00:00 -0800 Richard MacManus
Sometimes Google Isn't Enough: New Research Engine Searches "Deep Web"

What do you do when you need to research something on the web? You just google it, right? Using a web search engine like Google is usually fine for casual searches, but when you need to delve deep into a subject, it just won't do. What you really need is a research engine that explores the unindexed reaches of the Deep Web. For that, there's now Infovell, "the world's research engine."

Less than 0.2% of the web is indexed and some of the most valuable information lies beyond the search results returned from traditional engines. That's where a service like Infovell can help. This new subscription-based software-as-a-service (SaaS) engine lets you explore content found on the Deep Web.

What Does Infovell Do?

The engine scours open-access repositories of information like PubMed Central and the U.S. Patent and Trademark Office Claims, but it also allows access to scholarly journals such as those from Oxford University Press, SAGE, Taylor & Francis, Annual Reviews, Mary Ann Liebert Publications, and more. Together, these billions of pages, currently unindexed by other engines, give you access to content in the areas of Life Sciences, Medicines, Patents, Industry News, and other reference content from expert sources. In addition to functioning as a search engine, Infovell can also deliver breaking news alerts, which are automatically sent to your email, PDA, or any other device you choose.

It May Look Boring, But It's Not

In the demo (see video below), the team from Infovell showed how their engine could be used for researching a medical condition - something that many people try to do today using Google, but with little success. Generally, web searches only return results from sources of general information like the Mayo Clinic, WebMD, or online support groups. To be able to research something by reading through the actual journal articles that the doctors have access to would be a huge step towards democratizing the world's knowledge.



Why Can't Information Be Free?

Unfortunately, that knowledge is not being set free with Infovell. Instead, the service will exist behind a pay wall, which once again puts the power of information into the hands of those that can afford its access. Although expected, it's disappointing to see that this service will be yet another source of critical information which most people won't have the time or financial resources to use. Case in point, if someone needs to research a medical condition in that much detail, it's a sure bet that they have doctors' bills that are a bigger priority than a subscription fee to a search engine.

Why isn't anyone building a Google for the Deep Web? If Infovell is offering a collection of scholarly information and putting a price tag on its access, why can't someone else build a similar collection and wrap ads around the service to monetize it? We love the idea of this type of service, but would rather see a bigger effort to open up the unindexed web and deliver it to the public for free.

Infovell will be available for a 30-day free trial, starting September 22nd.

http://www.readwriteweb.com/archives/sometimes_google_isnt_enough_when_researching_deep_web.php Product Reviews Thu, 18 Sep 2008 08:26:30 -0800 Sarah Perez
WorldWideScience: Like Google for Deep Web Science

Need to get access to real scientific data but having trouble finding any relevant search results in Google? That could be because a lot of the science and technology documents on the web aren't typically indexed by major search engines. They're a part of the "deep web," the repository of web pages usually generated by database-driven sites that search engines' spiders can't access. One resource to help open up the deep web for scientific research is WorldWideScience. This portal allows you to query more than 200 million documents not typically indexed by today's search engines.

About WorldWideScience

WorldWideScience is a science portal developed and maintained by the Office of Scientific and Technical Information (OSTI), an element of the Office of Science within the U.S. Department of Energy. The WorldWideScience Alliance, a partnership consisting of participating member countries, provides the governance structure for the WorldWideScience.org portal.

When it debuted back in June 2007, it linked to 12 databases from 10 countries. Today, the portal links to 32 national, scientific databases and portals from 44 different countries.

WorldWideScience Homepage

How To Use WWS

To use the portal, you just enter a search term, as you would with any search engine, and click "search." An advanced search feature lets you specify more details like title, author, or year, and lets you choose which databases to query.

Unlike Google, where results are ranked based on an algorithm that essentially displays items by popularity, WorldWideScience provides only authoritative scientific information by relevance - a ranking that is noted by the number of stars next to the result. The higher the number of stars, the more relevant the result.

Another difference between WWS and other search engines is that WorldWideScience's results are retrieved in real time. So, as you search and results come in, you may see a box appear with an "include these results" button. Clicking this will update the list with the latest information.
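As a rough illustration of results arriving in real time from several databases, here is a minimal sketch. It is not WorldWideScience's implementation, which the article does not describe; the `SOURCES` data, `query_source`, and `federated_search` are hypothetical names, and in-memory lists stand in for remote databases.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical stand-ins for remote scientific databases.
SOURCES = {
    "source_a": ["solar cell efficiency", "wind turbine noise"],
    "source_b": ["solar storage batteries"],
    "source_c": ["crop yield models"],
}


def query_source(name, term):
    """Pretend to query one remote database; return (source, title) hits."""
    return [(name, title) for title in SOURCES[name] if term in title]


def federated_search(term):
    """Yield each source's batch of hits as soon as that source responds."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(query_source, name, term) for name in SOURCES]
        for future in as_completed(futures):
            yield future.result()


results = []
for batch in federated_search("solar"):
    # A user interface could pause here and offer to include the new batch.
    results.extend(batch)
print(sorted(results))
```

Because batches are yielded in the order the sources answer, a slow database does not hold up results from faster ones, which is one plausible reason a portal would add late results only on request.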

On your search results page, there are several features that make it easy to find the answers you need. On the left are "clusters," which let you narrow down a broad subject by specifying topics or dates. On the right, a snippet from Wikipedia provides a quick definition and link to an article about the subject you queried. Below that, a "EurekAlert!" section provides links to relevant articles from EurekAlert!, an online, global news service operated by AAAS, the science society. EurekAlert! is like a PR news wire for scientific research, providing a central place through which universities, medical centers, journals, government agencies, corporations and other organizations can bring their news to the media.

WorldWideScience Search Results

The WorldWideScience portal is a great resource for anyone looking for the most current findings from fields such as technology, energy, medicine, agriculture, environment, and more. You don't have to be a student, professor, or researcher to enjoy the richness of the data provided, either, as WWS has been designed to be easy enough for anyone to use. You can try it out for yourself here: worldwidescience.org

http://www.readwriteweb.com/archives/worldwidescience_like_google_for_deep_web_science_stuff.php Product Reviews Mon, 16 Jun 2008 07:29:00 -0800 Sarah Perez