research tools - ReadWriteWeb http://www.readwriteweb.com/feeds/tag/research tools en Copyright 2010 Richard MacManus readwriteweb@gmail.com Sat, 13 Mar 2010 05:00:00 -0800 http://www.sixapart.com/movabletype/?v=4.23-en http://blogs.law.harvard.edu/tech/rss
Zoetrope: New Web Crawler Allows For Searching, Analyzing The Ever-Changing Web

Does Adobe think they can out-Google Google? Perhaps. The company is involved with Zoetrope, a joint project with researchers at the University of Washington. What they're building is a tool that allows for manipulating the web over time. Instead of the snapshot of the web you see today when googling, Zoetrope will let anyone use keyword searches to discover archived web information and look for patterns in the data found.

About Zoetrope

As with the Internet Archive, the data in Zoetrope's database is a backup of the entire web, including those pages which have changed over time. But this archive won't be limited to the somewhat inconsistent periodic snapshots of the web's content like the Internet Archive offers. It will encompass everything.

Using the intuitive Zoetrope interface, a user could compare historical changes of various data through time by comparing snapshots of different pages on the web. Analyzing different, changing elements on web pages side by side and over a period of time is downright difficult today - if not impossible. But Zoetrope makes it happen.

The process is done using Zoetrope "lenses" to draw boxes around elements, connect data from one site to another, and pull up charts of relevant data, all while manipulating a slider to scroll back and forth through time. That may sound hard, but if you watch this video, you'll see that it looks surprisingly easy.
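For readers who think in code, the lens-and-slider idea can be roughly sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Zoetrope is actually built: the `SNAPSHOTS` data, the element names, and the `snapshot_at` and `lens` helpers are all made up for the example.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical archive: dated snapshots of one page, each a dict of
# element name -> text. All values here are invented for illustration.
SNAPSHOTS = [
    (date(2008, 8, 8), {"headline": "Games open", "aqi": "95"}),
    (date(2008, 8, 9), {"headline": "First medals", "aqi": "78"}),
    (date(2008, 8, 11), {"headline": "Records fall", "aqi": "82"}),
]


def snapshot_at(snapshots, when):
    """Return the latest snapshot taken on or before `when` (the 'slider')."""
    dates = [d for d, _ in snapshots]
    i = bisect_right(dates, when)
    if i == 0:
        return None  # nothing had been archived yet on that date
    return snapshots[i - 1][1]


def lens(snapshots, element):
    """Follow one page element through time (the 'box' drawn around it)."""
    return [(d, page[element]) for d, page in snapshots if element in page]


if __name__ == "__main__":
    print(snapshot_at(SNAPSHOTS, date(2008, 8, 10))["headline"])
    for day, value in lens(SNAPSHOTS, "aqi"):
        print(day.isoformat(), value)
```

Moving the slider corresponds to calling `snapshot_at` with a different date, while a lens is just the same element read out of every snapshot, which is what makes charting it over time possible.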

For Everyone, Not Just The Computer Savvy

In a way, this project is similar to Google's new visualization API, which lets developers use historical web data to build charts, graphs, gadgets, and the like. However, where Google's tool is aimed at the technically savvy programmer, Zoetrope is for the average user. Says Dan Weld, a UW computer science and engineering professor who worked on the project, "Zoetrope is aimed at the casual researcher. It's really for anyone who has a question."

As noted in the University of Washington article on the project, example uses of Zoetrope could range from the basic, such as checking historical rankings of favorite players on a sports team, to the advanced, such as comparing daily air pollution levels in Beijing to the number of world records broken each day in the 2008 Olympics.

"Your browser is really just a window into the Web as it exists today," said Eytan Adar, University of Washington computer science and engineering doctoral student who's also a co-author of the research paper on the project.

"When you search for something online, you're only getting today's results...This is really a new way to think about storing information on the Web."

The researchers hope to offer Zoetrope for free as early as next summer.

Image credits: Color, Torley; Others, University of Washington

http://www.readwriteweb.com/archives/zoetrope_new_web_crawler_searches_analyzes_ever_changing_web.php Product Reviews Fri, 21 Nov 2008 07:47:01 -0800 Sarah Perez
Speed Up Your Research with ChunkIt

ChunkIt is a browser plugin for Firefox and Internet Explorer that wants to help you speed up your online research. To do so, ChunkIt preloads and searches through all the links on a given page and displays the search results in a large sidebar on the left side of your browser window.

The idea behind ChunkIt is that this will help you get to relevant search results faster, as you get to see your search terms within their context and not just in a short excerpt on a search engine.

ChunkIt installs a toolbar in your browser, which allows you to 'chunk' searches in your favorite search engine (ChunkIt supports Google, Live, Yahoo, AOL, and ASK), as well as the links on the currently visible tab. You can also use ChunkIt to search the currently active page itself.

Search Results in Context

In our tests, ChunkIt usually turned out to be the most useful when using a search engine. Having ChunkIt display all the instances of a keyword in their context does indeed save you from having to click through to all the top search results. Often, if you are just looking for a specific fact, the 'chunks' in the sidebar will already give you all the information you need without having to visit any other site.
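To make the idea of 'chunks' concrete, here is a minimal Python sketch of showing keyword hits in their surrounding context. It is an assumption-laden illustration rather than ChunkIt's actual implementation: the `PAGES` dictionary stands in for linked pages that have already been fetched, and the `chunks` and `chunk_pages` helpers are hypothetical names.

```python
import re

# Hypothetical stand-in for pages that have already been fetched:
# URL -> plain text. How ChunkIt really fetches and ranks pages is not
# described in this review, so none of that is modeled here.
PAGES = {
    "http://example.com/a": "The zoetrope was invented long before film. A zoetrope spins.",
    "http://example.com/b": "Nothing relevant on this page.",
}


def chunks(text, keyword, width=20):
    """Return each occurrence of `keyword` with `width` characters of context."""
    found = []
    for match in re.finditer(re.escape(keyword), text, flags=re.IGNORECASE):
        start = max(0, match.start() - width)
        end = min(len(text), match.end() + width)
        found.append(text[start:end].strip())
    return found


def chunk_pages(pages, keyword, width=20):
    """Map each URL to its chunks, skipping pages with no match."""
    results = {}
    for url, text in pages.items():
        hits = chunks(text, keyword, width)
        if hits:
            results[url] = hits
    return results


if __name__ == "__main__":
    for url, hits in chunk_pages(PAGES, "zoetrope").items():
        print(url)
        for hit in hits:
            print("  ...", hit, "...")
```

The sketch cuts context by character count, which can split words at the edges; a friendlier version would probably snap to sentence boundaries.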

Being able to search through the links from a given page is also quite useful. You could, for example, go to a newspaper homepage or a Wikipedia article and find all the links from this page that contain a certain keyword. This feature is also very useful when searching academic databases. Of course, ChunkIt would be even more useful for a lot of academic and business users if it could also search PDF files, though that would probably slow the extension to a crawl.

We did notice, however, that ChunkIt often ran into trouble with sites that featured a very large number of links and often just refused to work on these pages.

If you are doing a lot of research on the web, ChunkIt is definitely an extension that's worth trying.

If you want to see ChunkIt in action without installing the extension, here is a short video introducing the service:

http://www.readwriteweb.com/archives/speed_up_your_research_with_ch.php Product Reviews Sat, 11 Oct 2008 01:00:27 -0800 Frederic Lardinois
Sometimes Google Isn't Enough: New Research Engine Searches "Deep Web"

What do you do when you need to research something on the web? You just google it, right? Using a web search engine like Google is usually fine for casual searches, but when you need to delve deep into a subject, it just won't do. What you really need is a research engine that explores the unindexed reaches of the Deep Web. For that, there's now Infovell, "the world's research engine."

Less than 0.2% of the web is indexed, and some of the most valuable information lies beyond the search results returned from traditional engines. That's where a service like Infovell can help. This new subscription-based software-as-a-service (SaaS) engine lets you explore content found on the Deep Web.

What Does Infovell Do?

The engine scours open-access repositories of information like PubMed Central and the U.S. Patent and Trademark Office Claims, but it also allows access to scholarly journals such as those from Oxford University Press, SAGE, Taylor & Francis, Annual Reviews, Mary Ann Liebert Publications, and more. Together, these billions of pages, currently unindexed by other engines, give you access to content in the areas of Life Sciences, Medicines, Patents, Industry News, and other reference content from expert sources. In addition to functioning as a search engine, Infovell can also deliver breaking news alerts which are automatically sent to your email, PDA, or any other device you choose.

It May Look Boring, But It's Not

In the demo (see video below), the team from Infovell showed how their engine could be used for researching a medical condition - something that many people try to do today using Google, but with little success. Generally, web searches only return links to sources of general information like the Mayo Clinic, WebMD, or online support groups. To be able to research something by reading through the actual journal articles that the doctors have access to would be a huge step towards democratizing the world's knowledge.



Why Can't Information Be Free?

Unfortunately, that knowledge is not being set free with Infovell. Instead, the service will exist behind a pay wall, which once again puts the power of information into the hands of those that can afford its access. Although expected, it's disappointing to see that this service will be yet another source of critical information which most people won't have the time or financial resources to use. Case in point, if someone needs to research a medical condition in that much detail, it's a sure bet that they have doctors' bills that are a bigger priority than a subscription fee to a search engine.

Why isn't anyone building a Google for the Deep Web? If Infovell is offering a collection of scholarly information and putting a price tag on its access, why can't someone else build a similar collection and wrap ads around the service to monetize it? We love the idea of this type of service, but would rather see a bigger effort to open up the unindexed web and deliver it to the public for free.

Infovell will be available for a 30-day free trial, starting September 22nd.

http://www.readwriteweb.com/archives/sometimes_google_isnt_enough_when_researching_deep_web.php Product Reviews Thu, 18 Sep 2008 08:26:30 -0800 Sarah Perez
Our Kids Are Failing - And It's All Wikipedia's Fault!

Talk about a knee-jerk reaction. Yesterday, news broke in Scotland that the internet was to blame for Scotland's failing exam pass rates. The Scottish Parent Teacher Council (SPTC) cited Wikipedia, among other sources, as the reason the students were failing. Is this a case of the internet making us stupid? Or do students just need to learn how to use the new research tools of the web a little more appropriately?

It's All Wikipedia's Fault!

According to the report, Eleanor Coner, the SPTC's information officer, said: "Children are very IT-savvy, but they are rubbish at researching." She noted that today's students do the majority of their research online instead of using books or other resources that could be found at the library.

The internet encyclopedia, Wikipedia, was one of the Council's main concerns because its very nature allows it to be edited by anyone and it is not updated by verified researchers, they said.

In addition, the Council was worried that students don't know how to research and tend to put faith in the validity of online resources. Says Ronnie Smith, the general secretary of the Educational Institute of Scotland, "We need to make sure youngsters don't take what they read online as fact."

Sounds familiar, doesn't it?

We've heard many iterations of that phrase before - "don't believe everything you read," "don't believe everything you see on TV,"...now it's the internet's turn to be held up to scrutiny.

Is The Internet Making Us Stupid?

A quick glance at this news could lead one to believe that this is a clear case in support of Nicholas Carr's recent argument that the internet (or Google, as he says) is making us stupid. Easy access to a stream of information via the internet, he says, is affecting our ability to focus for long periods, like when reading and absorbing long articles (a trend we also mentioned here). In a way, Wikipedia could be seen as the ultimate manifestation of our convenience culture when it comes to information retrieval.

How many times per day do you do a quick search to look up a quick fact and wind up skimming the highlights on a Wikipedia page? (For me at least, I'll admit it's at the very least a daily occurrence, if not more.)

But are the failing test scores really an indication of our brains' ability to reprogram themselves to adapt to this new way of learning, as Carr mentioned in his article? And is that ability really affecting our intelligence?

...Or Do Students Need To Learn How To Research?

Perhaps not. If anything, this problem could point to the fact that educational institutions need to adapt their curriculums to include teaching students what real research is all about. A Google search may or may not lead them to valuable resources online, but many students today clearly don't know how to differentiate between what's legitimate and what's not. Being able to look at a piece of information online and challenge it in order to determine whether or not it is a fact is simply not a skill that many online users have. However, once this process is learned, students can apply it throughout their education - no matter what medium they use for research.

Image credits: Library: Canadian Veggie

http://www.readwriteweb.com/archives/our_kids_are_failing_-_and_its_wikipedias_fault.php Trends Mon, 23 Jun 2008 09:00:00 -0800 Sarah Perez
Digital Image Resources on the Deep Web

Sometimes you stumble across something that really makes you say "wow" and reminds you that there's so much more to this internet thing than just the latest web app. Case in point is this article describing some of the visual resources available on the web. The deep web. These images won't show up in search engines' image searches or on Flickr (save one exception), but instead can only be accessed via the links below.

The images are a part of online collections created by institutions in the U.S. Some of the images may be a part of the public domain, but many will require permission or accreditation in order to use. So, no, these aren't necessarily images you can use in your next blog post, but that doesn't mean they're not useful. Instead, if given permission, these images could be used in the classroom, in private study, or even included in a media project or publication.

Collaborative digital collections

  • Alabama Mosaic: Thousands of images that can be searched by keyword. Images are from historical collections featuring content from libraries, archives and museums from across Alabama.
  • Alaska Digital Archives: More than 5,000 quality digital images of Alaska's heritage in a searchable online database.
  • Calisphere: A free online collection of more than 150,000 digitized primary materials contributed by libraries, archives, and museums from all over California. Search for content by keyword, by browsing the alphabetized subject list and exploring theme collections, such as the Gold Rush Era and World War II. Lesson plans are also available for elementary and secondary schoolteachers.

  • Library of Congress American History and Culture Collections: These collections began as a pilot project in 1990 to provide middle school as well as high school teachers and students with digital surrogates of collection material on CD-ROM. Over the years, the collection has become a "National Digital Library" with diverse institutions from all across the United States contributing content. Search or browse alphabetized subject lists, time periods, and geographical locations. American Memory Historical Collections features more than 100 thematic subjects ranging from advertising to maps to women's rights.
  • Library of Congress International Collections: Access content from American Memory Historical Collections as well as international visual resource collections, such as the Abdul Hamid II collection of photographs of the Ottoman Empire and the Prokudin-Gorskii collection of photographs of the Russian Empire. Additionally, through partnerships with national libraries in other countries, you can access collections that highlight the history of the United States in relation to other nations, such as "France in America" and "The Meeting of Frontiers: Siberia, Alaska and the American West."
  • University of Washington Digital Collections: Access to tens of thousands of digital images covering a wide variety of subjects, but with an emphasis on the Pacific Northwest. The digital collections include image-heavy resources, such as the J. Willis Sayre Photographs of actors, vaudeville performers, and movie stills; the Washington Women's History Consortium Fashion Plate Collection; the Dearborn-Massar Photographs of Architecture; and the Seattle Photographs Collection.
  • Photomuse: A research resource for the history of photography. Features online exhibitions, a chronology of the evolution of photography complete with visuals and historical information, as well as an image database.

University digital image collections

  • Duke Digital Collections: Featured collections are freely available on the Internet and include the Emergence of Advertising in America, Ration Coupons on the Home Front (1942-1945), and the 50,000-item William Gedney Photographs and Writings collection.
  • Yale University Library Digital Collections: More than 100,000 digital images are searchable and viewable by the public.
  • Harvard University Library: A Selection of Web-Accessible Collections: A list of visual resource collections that are unique to Harvard University, but reside in different repositories on the Harvard campus. Collections include the Harvard Daguerreotype Collection, the Hedda Morrison Photographs of China, Immigration to the United States (1789-1930), Legal Portraits Online, and the Latin American Pamphlet Digital Collection.

Digital image collections at public libraries and archives

  • Historical Photograph Collections at the Arizona State Archives: 33,000 digital images of primary materials from the historical photograph collections. Most of the photographs available through the public online database date to before 1940 and include examples of all types of photographic processes, including tintypes, glass lantern slides, and photographic postcards.
  • Library of Congress Prints and Photographs Online Catalog: Get access to more than 1 million digital images via one of the largest digital image databases in the world. Search for images by keyword, by browsing lists of alphabetized subjects, or by choosing a collection and looking through individual image records.
  • Los Angeles Public Library: More than 60,000 images featuring the work of many notable photographers active in the Los Angeles area over many decades, including some contemporary photographers. Search by keyword or photographer.
  • New York Public Library Digital Gallery: One of the largest open-access image databases available on the Internet featuring more than 600,000 digital images, including all kinds of primary materials, such as manuscripts, maps, photographs, prints, restaurant menus, sheet music covers, and much more.

Digital image collections at historical societies

  • Indiana Historical Society: An extensive collection, covering topics ranging from architecture to railroads to sporting events.
  • Wisconsin Historical Society: A visual resource for Wisconsin history containing 35,000 photographs. Of special interest is the Wisconsin Historical Museum's Children's Clothing Collection where visitors may browse images of more than 2,000 articles of children's clothing dating back to the 18th century.

Other

Library of Congress

You can learn more about the history of these collections and get details on how to search them from the article here.

http://www.readwriteweb.com/archives/digital_image_resources_on_the_deep_web.php Product Reviews Wed, 14 May 2008 08:33:20 -0800 Sarah Perez
Twine Disappoints After Semantic Web Hype

Twine is the most hyped semantic app of the season and recently opened up for some press previews. General availability of this smart, social bookmarking and research tool may come in a matter of weeks.

If that's the case, it will probably be too soon. Twine has some major shortcomings that I think are going to drastically hinder the service's adoption. Perhaps unsurprisingly, those shortcomings come down to usability and performance. Hopefully these problems will be resolved, but it isn't going to be easy.

Richard MacManus said months ago that Twine might be the first mainstream semantic web application to hit the market. Semantic technology seems very likely to be key to the future of the web, but Twine demonstrates just how hard it's going to be for that technology to operate close to the surface of the user interface.

The Basic Idea

Twine looks at content and parses it automatically for the names of people, places, organizations and other subject tags. Users are then able to navigate between related content, view recommended content and connect with recommended people with related interests.
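As a very rough illustration of that idea, and emphatically not a description of Twine's actual technology, here is a small Python sketch that tags saved items with known names and then relates items by shared tags. The `KNOWN_ENTITIES` list, the sample items, and the `tag` and `related` helpers are all hypothetical, and exact string matching is a deliberate oversimplification of real semantic analysis.

```python
# Hypothetical gazetteer of known names -> tag type. A real system would
# need far more than exact string matching; this is only an illustration.
KNOWN_ENTITIES = {
    "Nova Spivack": "person",
    "Twine": "organization",
    "Seattle": "place",
}


def tag(text):
    """Return the set of known entity names that appear in `text`."""
    return {name for name in KNOWN_ENTITIES if name in text}


def related(items, title):
    """Rank other items by how many tags they share with `title`."""
    own = tag(items[title])
    scored = []
    for other, text in items.items():
        if other == title:
            continue
        shared = own & tag(text)
        if shared:
            scored.append((len(shared), other))
    # Most shared tags first; ties broken alphabetically by item name.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


if __name__ == "__main__":
    items = {
        "interview": "Nova Spivack talks about Twine.",
        "launch": "Twine opens its beta.",
        "travel": "A weekend in Seattle.",
    }
    print(related(items, "interview"))
```

Even this toy version hints at the difficulty discussed below: if a page's text is messy or a name is missing from the list, no tags come back and nothing can be related.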

The semantic analysis is faster and smarter than full text search. Making content online machine readable cuts time and thought out of discovery of related information, letting users focus on higher levels of engagement. It's a great idea and I hope Twine can overcome the issues I'm seeing with it.


Problem: It Doesn't Work Very Well

The biggest problem with Twine right now may be that it doesn't work as well as it should. It doesn't consistently grab summary text or tags for pages you save in Twine, it doesn't recognize article authors as relevant people and it often captures summary information about the domain you're on instead of a particular page's content.

Twine founder Nova Spivack saw that I was saving pages that weren't coming in with summary information and commented on one of my items that the page at issue was irregularly formatted. That's why Twine wasn't able to analyze it, he said. That is a major problem; most of the web is made up of ugly, non-standard pages. Fundamental to the value proposition of a top-down semantic analysis tool should be the ability to discover meaning from unstructured data. Many of the other problems Twine faces will be challenging but do seem solvable. This one could be a deal breaker.

Serious researchers will also be frustrated with the lack of support for authenticated (password protected) pages and the absence of RSS feeds - though feeds may come as soon as the app is public.

Problem: It's Poorly Organized

Twine has bitten off a whole lot to chew on. It's an impressive service for the most part. Unfortunately, full-featured social bookmarking is information-dense enough that adding all the semantic features and recommendations from Twine turns information architecture and User Experience into huge challenges.

Twine's user experience is confusing. It's hard to keep track of all the levels and types of information available, site navigation is dizzying and my use of the service happened in spite of the interface.

There are a lot of little things Twine could do to help, like defaulting the saved item path to the same category I saved the previous item in.

There are a variety of different approaches already explored in the social bookmarking market. Del.icio.us is simple and does what it says it does, nothing more, nothing less. Ma.gnolia does a little bit more, looks great and is relatively self-explanatory. Furl.net was probably better technology than either Del.icio.us or Ma.gnolia, but the user experience makes you want to punch someone and the service has withered accordingly. Twine needs to blow this category out of the water but it doesn't.

I'm sure with some practice I could learn to use Twine more easily, but that's not an ideal first experience. I don't feel compelled to keep trying, other than because of my interest in the semantic technology. There's no visualization, just flat interlinked pages; the only zing to the product today is the recommendation feature.

I would use Twine for recommendation alone, but the value of that feature is minimal until the service finds a large number of users. As it stands, that's not likely to occur. When it comes to collective organization and discovery of content - nothing is as important as network effect.

Twine's in closed beta right now - but it's been in the oven for a long time, has substantial investor backing and is highly anticipated. Despite all that support, it still feels half-baked. I hate to say that because everyone says the trouble with the semantic web is that products never come to market - but I don't think Twine is ready. I don't know if it ever will be. Someone else may have to be the first mainstream semantic web app - or maybe no one will be. Semantics may be best suited to the back end. I hope Twine, or someone, can bring something like this to market that I want to use.

You can join the long line of people requesting beta access and make your own decision about Twine, probably later this month. For a more positive review, see Rafe Needleman's write-up at Webware.

http://www.readwriteweb.com/archives/twine_disappoints.php Product Reviews Tue, 11 Mar 2008 10:41:04 -0800 Marshall Kirkpatrick