Apple co-founder Steve Wozniak is joining the advisory board of the research engine DeepDyve, a search engine designed to scour the "deep web." This "deep web" is the part of the internet that isn't currently indexed by modern-day search engines like Google, and yet it accounts for an estimated 99.8% of the Internet. Any company that is able to successfully tap into this data will be the one to introduce the next breakthrough technology in search as we know it. Will that be DeepDyve?
According to Wozniak, most of the information on the web is "collecting dust because nobody's come up with a way to mine the data in a way that's useful to researchers and consumers." He believes that DeepDyve has the potential to transform Deep Web search and says he's "excited to bring about that transformation."
Wozniak's role at DeepDyve isn't restricted in any way, but the company expects that his contributions will focus on the DeepDyve technology, especially as it relates to the user experience. As a member of the Advisory Board, he will also meet formally with the company twice per year.
DeepDyve is currently known for its KeyPhrase technology, which lets you enter anything into the search box, from a few words to entire paragraphs that you copy and paste. The search engine's algorithm itself was developed by two scientists who worked on the Human Genome Project. Just as that project required pattern-matching techniques across large amounts of data, search engines need to be able to analyze large amounts of data in the same way. That's precisely what DeepDyve does.
Typically, keyword search on other engines breaks down as queries grow in length - but not on DeepDyve. The more search terms you enter, the more relevant your results. The DeepDyve engine actually encourages longer search queries. This type of search technique certainly comes at a good time, as query length is growing each year, with 8-plus keyword searches having increased 20% year-over-year as of February 2009.
We looked at DeepDyve back in September when it was still behind a paywall, then reviewed it again in November when they introduced a free version. At the time, we noted that there were still some issues with any "deep web" search engine - most notably that a lot of the information which DeepDyve uncovers is still behind additional paywalls on subscription-based web sites. Today, that issue still remains.
However, the complaints of many of the commenters on the last post were not about the paywalls but about how the site forced you to register before you could do any searches. As one anonymous commenter noted, to paraphrase, "if you're going to launch a search engine, open it up so people can use it."
It seems DeepDyve took that advice. The search box on the homepage is immediately accessible and even the results pages have a more refined look today than they did only a few months prior.

These sorts of complaints highlight the problem with reviewing cutting-edge technologies when they're still in such a raw form - people go there expecting Google, find what appears to be a boring research experiment and then become disillusioned. What they fail to see - what they cannot see, in fact, from a cursory glance - is the technology behind the site or service. The technology in DeepDyve, for instance, involves advanced algorithms that other engines don't even have yet. The usability issues will be addressed in time and the issues with access to content behind paywalls could always eventually be worked out through partnership deals. But that business side of the DeepDyve project isn't anywhere near as interesting as the potential of gaining access to the other 99.8% of the Internet!
Comments
I've been trying out various "semantic" search engines with similar queries over some time now to see if any of them do a better job on answering questions than conventional means.
Deep Dyve's generic answers seem significantly worse than Google's for many of my queries -- but then, so do most other such search engines. As a general rule, I'm more likely to be able to read the answer to my question in the excerpted blue Google links than I am in semantic search result lists.
There's a limit to how good an answer you can give just by rummaging through Wikipedia or Freebase, like most of these engines do (due to lack of resources to cover the web in general), in order to answer generic queries. But PowerSet and various others are no better either.
However, in their specialty areas like Life Sciences, Deep Dyve may well do a much better job -- I'm just not competent to judge those kinds of specialized queries.
Hi,
I am interested in the algorithms that are used to match a whole paragraph entered as a single query. Can you name them specifically, please?
Sarah said...
Typically, keyword search on other engines breaks down as queries grow in length - but not on DeepDyve. The more search terms you enter, the more relevant your results.
Sarah, that's not necessarily true. That might be true in old Boolean-type or Vector Space-type search engines, but not in LSI (Latent Semantic Indexing) type search. LSI can query using a single keyword or a bag of keywords (a phrase, etc.). The longer the bag of keywords, the more relevant the search is. There are different types of LSI in existence today, and they vary in their accuracy. It is reported here that Google is already using LSI. Perhaps the difficulty for them (Google) is how to combine LSI and their PageRank algorithm into one output, because the two algorithms are completely different. LSI works on a matrix of "document by keyword" frequency, while PageRank works on a matrix of "document-inbound by document-outbound" links.
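To make the contrast between those two matrices concrete, here is a minimal Python sketch on toy data. The three documents, the link list, the function names and the 0.85 damping factor are all illustrative assumptions. The ranking step is the textbook power-iteration form of PageRank, not Google's production system and not anything DeepDyve has published.

```python
# Toy data and helper names are hypothetical, chosen only for illustration.

def term_document_matrix(docs):
    """'Document by keyword' matrix: one row per document, one column per keyword."""
    vocab = sorted({word for text in docs for word in text.lower().split()})
    rows = [[text.lower().split().count(word) for word in vocab] for text in docs]
    return vocab, rows


def link_matrix(links, n):
    """'Inbound by outbound' matrix: cell [i][j] is 1 when document j links to document i."""
    matrix = [[0] * n for _ in range(n)]
    for src, dst in links:
        matrix[dst][src] = 1
    return matrix


def pagerank(links, n, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a list of (source, target) links."""
    out_degree = [0] * n
    for src, _ in links:
        out_degree[src] += 1
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        # Each document passes its rank, split evenly, to the documents it links to.
        for src, dst in links:
            new_rank[dst] += damping * rank[src] / out_degree[src]
        # Documents with no outbound links spread their rank over everyone.
        dangling = sum(rank[i] for i in range(n) if out_degree[i] == 0)
        rank = [r + damping * dangling / n for r in new_rank]
    return rank


docs = ["deep web search", "search engine index", "web index"]
links = [(0, 1), (2, 1), (1, 0)]  # (source document, target document)

vocab, counts = term_document_matrix(docs)
inbound = link_matrix(links, len(docs))
ranks = pagerank(links, len(docs))
```

The first matrix is built from the words inside each document, the second from links between documents. Neither one contains any information from the other, which is why merging the two scores into a single result list is not straightforward.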
Mansoor Ahmed, what you're looking for is LSI. It is based on an algorithm called SVD (singular value decomposition), and there is plenty of open-source code on the internet that already contains the SVD algorithm.
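As a rough illustration of the LSI idea, here is a small self-contained Python sketch. Everything in it is an assumption made for the example: the four toy documents, the helper names, and the choice of two latent dimensions. Instead of calling a library SVD routine, it approximates the top singular vectors with plain power iteration, which is enough for a matrix this small. It shows the general technique only and says nothing about what DeepDyve actually runs.

```python
import math

# Hypothetical toy corpus: two documents about vehicles, two about plants.
docs = [
    "car engine wheel",
    "automobile engine wheel",
    "flower petal garden",
    "flower garden",
]


def build_matrix(texts):
    """Document-by-keyword count matrix plus its sorted vocabulary."""
    vocab = sorted({word for text in texts for word in text.split()})
    return vocab, [[float(text.split().count(word)) for word in vocab] for text in texts]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def top_term_vectors(matrix, k, iterations=300):
    """Approximate the top-k right singular vectors by power iteration on A^T A."""
    n_docs, n_terms = len(matrix), len(matrix[0])
    basis = []
    for _ in range(k):
        v = [1.0] * n_terms
        for _ in range(iterations):
            # Apply A^T A without forming it: compute A v, then A^T (A v).
            av = [dot(row, v) for row in matrix]
            v = [sum(av[i] * matrix[i][j] for i in range(n_docs)) for j in range(n_terms)]
            # Remove the directions already found, then renormalize.
            for b in basis:
                overlap = dot(v, b)
                v = [x - overlap * y for x, y in zip(v, b)]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        basis.append(v)
    return basis


def project(vector, basis):
    """Map a keyword-space vector into the reduced latent space."""
    return [dot(vector, b) for b in basis]


def cosine(a, b):
    denom = math.sqrt(dot(a, a)) * math.sqrt(dot(b, b))
    return dot(a, b) / denom if denom else 0.0


vocab, matrix = build_matrix(docs)
basis = top_term_vectors(matrix, k=2)

# One-word query: "car".
query = [1.0 if word == "car" else 0.0 for word in vocab]

raw_scores = [cosine(query, row) for row in matrix]
lsi_scores = [cosine(project(query, basis), project(row, basis)) for row in matrix]
```

With plain keyword matching, the query "car" has zero similarity to "automobile engine wheel" because the two share no words. In the reduced space the two land close together, since "car" and "automobile" appear alongside the same other keywords, while the plant documents stay far away.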
Interesting, I recall reading LSI papers at Bellcore almost 20 years ago.
I don't think they are using LSI or semantic approaches. If you look at their results, the information they are bringing back is "different" than what Google generates and, more often than not, in a good way. Google is very good at getting to a specific answer you know is out there, DeepDyve seems to be very good at helping you learn about things you may not be familiar with. I experienced this when I was looking for a specific health issue that a friend had, starting with the name of the disease the doctor told him and then using the 'more like this' functionality to explore deeper into information. Within a few minutes I was definitely better informed and I didn't have to type in anything else except for that starting disease name. Is DeepDyve perfect? Not by a long shot, but it sure makes it extremely easy to research information and learn about things you may not be familiar with.