What do you do when you need to research something on the web? You just google it, right? Using a web search engine like Google is usually fine for casual searches, but when you need to delve deep into a subject, it just won't do. What you really need is a research engine that explores the unindexed reaches of the Deep Web. For that, there's now Infovell, "the world's research engine."
Less than 0.2% of the web is indexed and some of the most valuable information lies beyond the search results returned from traditional engines. That's where a service like Infovell can help. This new subscription-based software-as-a-service (SaaS) engine lets you explore content found on the Deep Web.
The engine scours open-access repositories of information like PubMed Central and the U.S. Patent and Trademark Office Claims, but it also allows access to scholarly journals such as those from Oxford University Press, SAGE, Taylor & Francis, Annual Reviews, Mary Ann Liebert Publications, and more. Together, these billions of pages, currently unindexed by other engines, give you access to content in the areas of Life Sciences, Medicines, Patents, Industry News, and other reference content from expert sources. In addition to functioning as a search engine, Infovell can also deliver breaking news alerts, which are automatically sent to your email, PDA, or any other device you choose.

It May Look Boring, But It's Not
In the demo (see video below), the team from Infovell showed how their engine could be used for researching a medical condition - something that many people try to do today using Google, but with little success. Generally, web searches only return results from sources of general information like the Mayo Clinic, WebMD, or online support groups. To be able to research something by reading through the actual journal articles that the doctors have access to would be a huge step towards democratizing the world's knowledge.
Unfortunately, that knowledge is not being set free with Infovell. Instead, the service will exist behind a pay wall, which once again puts the power of information into the hands of those who can afford access. Although expected, it's disappointing to see that this service will be yet another source of critical information that most people won't have the time or financial resources to use. Case in point: if someone needs to research a medical condition in that much detail, it's a sure bet that they have doctors' bills that are a bigger priority than a subscription fee to a search engine.
Why isn't anyone building a Google for the Deep Web? If Infovell is offering a collection of scholarly information and putting a price tag on its access, why can't someone else build a similar collection and wrap ads around the service to monetize it? We love the idea of this type of service, but we would rather see a bigger effort to open up the unindexed web and deliver it to the public for free.
Infovell will be available for a 30-day free trial, starting September 22nd.
Comments
The reason it can't be free is that publishers make big bucks (way more than they could through advertising) selling this content into libraries. To protect that market, they can't make the information free.
Advertising is not a one-size-fits-all solution, and in the case of many information providers, ads would provide less than 1/100th the revenue per use that institutional sales do.
It's not ad supported because they likely want to actually make money.
I'd like to know what the source was for the 0.2% (percentage of web indexed) figure.
At any rate, I think it's a bit absurd and misleading to imply, even indirectly, that when you make a (free) Google or Yahoo search you're missing out on 99.8% of the total possible (valid) results. Especially in the context of the overview of a paid service.
It may not be free but I'm sure the quality of the content and search responses are far more accurate and credible than any free search out of Google. Someone who is willing to spend the money clearly wants more than what Google can offer and this is a good niche to fit into.
Craig
www.budgetpulse.com
@Justin: In the video, he says the figure comes from a U.C. Berkeley report. Take that as you may.
Products need tiered pricing, where a large subset of functions always remains free. Restricting information, especially things like medical information, to only those who can afford it does not follow the Google ("Don't be evil") creed. Don't expect to be acquired by Goog unless you spend a little bit more time on the Robin Hood mentality.
With 30-day free trials, people can use multiple accounts and locations anyway. However, that again restricts the info to people with higher intelligence, which once again defeats the purpose. Those less educated and with less money should have the right to information that may benefit them, at least in terms of health if not prosperity.
What are the true barriers to entry for this? No network effects, no restrictions on who can access the "deep" info (let's hope). I like the concept, but semantic and context-based searching is another area that will reduce the benefit this service provides. All that said, I like these types of service advancements.
We'll all just ask the computer what we need and get what we want based on our current context in the future anyway. Got to take the small steps in advancement to get there though.
Posted by: harleycw.pip.verisignlabs.com | September 18, 2008 12:48 PM
Actually, a lot of what they seem to be indexing is available via pay-to-play academic databases (technically not search engines, but they have a search interface), some of which are available for free through public libraries:
http://www.wakegov.com/libraries/research/databases/medical.htm
Others would be available via academic libraries or the various companies that provide the databases. This stuff isn't free because many of the academic journals are very expensive.
If you're really interested in making research freely available, check out the Open Access movement which would lead to much more research being readily accessible:
http://www.earlham.edu/~peters/fos/fosblog.html
A similar, though much simpler, initiative is ScienceRoll Medical Search, which also searches deep web sources. We are open to adding any free targets at the request of the community.
I would take issue with that 0.2% figure from Berkeley as well. Google is certainly not the be-all and end-all solution for online search, but with a bit of skill one can certainly narrow the results to a bearable number.
I have no issue with a fee-based service if I were in need of that information. Heaven knows we've paid Lexis/Nexis enough.
I can find the same within seconds using my own Google Co-Op search engine ( http://psychosearch.googlepages.com/ ) and some search engines like http://medline.cognition.com/ ...and it's free.
The only new concept here is the ability to use big queries as search terms. But again: you can cut all the words that are not really important from a cut-and-paste query and get similar results on free search engines.
If you have to pay, you can just subscribe to one of the databases like Wiley, SAGE, etc., or join a library that gives you access to digital academic databases. But for search-only purposes you don't need to subscribe to a database, because there are several free databases of abstracts. In most cases there's no real need to search the full text, just the title and abstract.
You might consider ISEN as a Google for the deep web. But it isn't. It's more like an ISBN number and DNS system for deep web interfaces. See http://www.isen.org for more details.
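The idea of trimming a pasted passage down to its important words can be sketched in a few lines of Python. This is only an illustration of the commenter's suggestion, not how Infovell or any free engine processes queries; the `STOPWORDS` list and the `trim_query` helper are hypothetical names invented for this sketch.

```python
import re
from collections import Counter

# A tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "and", "or", "to", "is", "are",
    "was", "were", "with", "for", "that", "this", "by", "as", "at", "it",
    "be", "has", "have", "had", "from",
}

def trim_query(text, max_terms=6):
    """Reduce a long pasted passage to its most frequent non-stopword terms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Count only words that are not stopwords and are longer than two letters.
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(max_terms)]

passage = (
    "Statin therapy reduces the risk of stroke in patients with diabetes. "
    "Statin therapy in diabetes patients was studied."
)
# The four terms that appear twice in the passage become the short query.
short_query = " ".join(trim_query(passage, max_terms=4))
```

The resulting short query can then be pasted into any free search engine in place of the full passage.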
Infovell does have some unique features and search algorithms, based on their presentations to me at Stanford University's HighWire Press.
For individuals who want access to a lot of research information without having to take out a subscription, I can suggest these alternatives:
1. Google Scholar. GS is a focused part of the google index and searches only the research literature.
http://scholar.google.com/
2. PubMed. PubMed is a government (National Institutes of Health) database that is focused only on the biomedical research literature. There are about 15 million articles indexed in it, and it is linked to databases or information that provide an overview to patients and caregivers who want to find out about a topic that is new to them.
http://www.ncbi.nlm.nih.gov/pubmed/
Both of the above provide links to the full text of the research literature. Abstracts (which contain a summary of the findings in the full article) are always free.
In the case of the research journals hosted by my organization at Stanford University -- which are indexed by Google Scholar and PubMed -- about half of the full text content (about 2 million of the 4 million articles we have online) is freely available -- articles may be free on publication or free a few months after publication. And, in the cases where the article isn't free, the abstract is; in many cases if you write to the publisher (via a link on the site) and ask for a free copy as a patient or caregiver, they will send you a copy (Pay Per View is also available, but the prices range from $5-$35/article). In PubMed, and beginning in Google Scholar, the link to the article text will tell you if it is free.
By all means try out Infovell during its free period. You might also try Google Scholar and PubMed.
John Sack,
Director, HighWire Press, Stanford University
Another free deep web search: Science.gov http://www.science.gov/ offers free research documents and science information from 13 major federal science agencies. These agencies contribute their science and technology documents which aren't typically indexed by major search engines.
Approximately 200 million pages are available and results are ranked by relevance. Options for sorting in other ways are also available.
Since searches are in real time, users get the most up-to-date information available at each participating site at the time of their search.
Valerie Allen
Science.gov Product Manager
U.S. Department of Energy
Office of Scientific and Technical Information http://www.osti.gov/
The most important question to ask is: Is the technology used by Infovell really better than the one used by Google or even the open-source Lucene search engine?
Most search engine technologies already have some weighting functions on words and considerations on word proximity.
Also, the so-called KeyPhrases technology may sometimes put too much weight on nonsense phrases like "Although this is a ..." unless the phrases get curated manually.
In terms of biological/medical information, there is really no better place to go than http://www.ncbi.nlm.nih.gov/.
So, in order for Infovell to succeed, they really need to show some convincing proof of their technology in terms of recall/precision, not just some cosmetically polished search results.
Mark Smith
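For readers unfamiliar with the recall/precision measures mentioned in the comment above, here is a minimal Python sketch of how they are usually computed for a single query. The document IDs are made up for illustration and the `precision_recall` helper is a hypothetical name; nothing here reflects actual Infovell or Google results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Made-up example: the engine returns four documents, two of which are
# among the three documents a human judge marked as relevant.
returned_docs = ["d1", "d2", "d3", "d4"]
judged_relevant = ["d2", "d4", "d7"]
precision, recall = precision_recall(returned_docs, judged_relevant)
```

In this made-up case, precision is 2 out of 4 and recall is 2 out of 3. A fair comparison between engines would average these figures over many queries with independently judged results.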