Wired has an awesome top story today on the world of startups using scraped data from big companies to offer new layers of value for their own users. It's a roughly objective piece that I highly recommend reading, but it was also the inspiration for me to finally record a screencast on the subject (see below).
I love RSS, probably more than anything on the web. If you're not familiar with the concept, see my very old definition of RSS and my almost-as-old post on teaching people about RSS.
Not every page on the web publishes an RSS feed, though. Thus the need for these wonderful screen scraping tools. I've written about a variety of tools you can use to create a feed for a site or page that doesn't have one. Sometimes, though, you've got to pull out the big guns. In those cases, it's time for Dapper.
Dapper is a company founded in Israel that is now venture backed and was named in the aforementioned Wired article. It is the sweetness.
Dapper will let you pull data from almost any web page and get it in a wide variety of outputs, including RSS, email, iCal, a Google Gadget, CSV and Google Maps. Is that incredible or what?
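If you're curious what "pull data from a page and get it out as RSS" looks like underneath, here's a minimal sketch in Python using only the standard library. To be clear, this is not how Dapper works internally (I don't know its implementation) and it is nowhere near as flexible; it just grabs every link on a page and wraps the results in a bare-bones RSS 2.0 feed. The names LinkCollector and page_to_rss, the sample HTML and the example.com URLs are all made up for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collects (title, url) pairs from every <a href="..."> in a page.

    Hypothetical helper for illustration only; not part of Dapper.
    """

    def __init__(self):
        super().__init__()
        self.items = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:  # skip links with no visible text
                self.items.append((title, self._href))
            self._href = None


def page_to_rss(html, feed_title, feed_link):
    """Turn the links found in an HTML string into a minimal RSS 2.0 document."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Feed scraped from " + feed_link

    for title, url in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url

    return ET.tostring(rss, encoding="unicode")


# Made-up sample page; a real scraper would fetch the HTML over HTTP first.
SAMPLE_HTML = (
    '<ul>'
    '<li><a href="http://example.com/a">First post</a></li>'
    '<li><a href="http://example.com/b">Second post</a></li>'
    '</ul>'
)

feed_xml = page_to_rss(SAMPLE_HTML, "Example feed", "http://example.com/")
print(feed_xml)
```

A real tool has to do a lot more than this: pick out just the fields you care about, cope with messy markup, and re-check the page on a schedule. That is exactly the work a point-and-click service saves you.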
Let's let the video do the talking. I have an awful cold (it's almost better, Mom!) so please excuse the very rough voice. I made the following screencast using JingProject, setting up an RSS feed of search results in Del.icio.us for articles tagged from ReadWriteWeb.
Clicking on the image below will open up another window so you can view the 4 minute video full screen.

If you're as excited about Dapper as I am, you should check out DapperCamp, a two day free conference all about Dapper coming up in early February in San Francisco. IBM and Mindtouch are sponsoring the event and Mitch Kapor is keynoting it. It looks like it's going to be a lot of fun.
Take that, Wired Mag ambivalence! Really, though, you should read that Wired article - it's a good one that discusses some issues that are going to be very big once more people figure out how exciting data portability is.
Comments
Just watched the screencast. That was way cool.
Very good screencast. Great enthusiasm.
Dapper is perhaps the best scraping tool available today: extremely user-friendly, especially for us non-programmer types, with registration to first Dapp in less than 5 minutes. I also appreciate their customer service, which certainly stands apart for an early stage company.
Well deserved coverage!
This is a great post. My first exposure to scraping. Looking good.
Great screencast, Marshall. Explains it all very nicely.
I tried it and didn't like it. I still have it as a Firefox add-on (DapperFox), but it's disabled.
I can't see what all the excitement is about. I have other ways for doing all (or, at least most of) the things that Dapper can do.
Biggest downside: Seemed to seriously impact Firefox performance.
Hi All,
Same as you, I love RSS. To help non-geek people, I created http://milliondollarchannel.com, which is intended to provide an alternative list of feeds to Technorati etc.
Does it make sense? I plan to add more feeds to it. Which feeds do you think are worth listing?
Thanks!
This is cool, thanks for the screencast, now I know where to go. :)
Nhick
http://www.itrush.com
Marshall
Great screencast. It's kind of sad that more people aren't aware of the power of services like Dapper, which, IMO, can really increase their productivity.
I have been using http://www.feedyes.com for scraping websites for a long time. It creates a valid RSS feed, works for every website and with every browser, and is reliable. The only thing: feeds are updated once every hour, not in real time.
Funny you mention Dapper in relation to DataPortability, Marshall - I recently posted to the GraphSync project (sponsored by the DataPortability workgroup) about using Dapper and others for the cause.
Here it is:
http://groups.google.com/group/graphsync/browse_thread/thread/81c69e0aa786a3df
Marshall
A very interesting tool and a very user-friendly screencast. Many thanks.
Best wishes for 2008!
Charles
P.S. I also find Jing to be a very handy free screencast tool.
I have used Dapper to get some good RSS feeds while experimenting with http://semantisize.com
Dapper seems to work great so far.
Orchestr8's AlchemyPoint is another solution that offers advanced screen-scraping capability. A screencast is available here. Another note: AlchemyPoint can be utilized "in-the-cloud" or installed locally for behind-the-firewall screen scraping.
Dapper is a great tool... at first, it can be a bit confusing, but quickly becomes easy to understand. I love it... Great post, very good information on Dapper.
The ListPic example from Wired serves as a great example ... I think we can draw something concerning "fair use" from this. After all, we aren't talking about someone sharing links with friends and the occasional reader:
I've been beavering away at the distinction between "marketing" and, say, sales or coding or Q&A or other more obviously production-related aspects of rolling something out.
IP arises from effort, ok? So buddy created a web-scraping site ... and a good one ... well-coded enough to deliver the goods with that many hits, and (I can only guess) well designed enough to attract a following. That deserves something.
But he was poaching ... picking up his stock from someone else's loading dock, re-packaging and then selling.
Metaphorically, he had an attractive store in a good location ... but the stuff folk were selecting from his shelves, where did it come from?
I've been in "stealth mode" for a good long while *Toooo long!* for a single simple reason: I can't find a way of sharing cream and cheese without some marketing-type taking not only my cow but more ... and I don't intend on giving away the farm.
Don't forget about openKapow. It's an amazing product, and openKapow is free.
http://kapowtech.com/products_data_collection.html?google&keyword=dataextraction&gclid=CI7s0tHU2JACFRE_OAodjQgbZQ
Thanks, Marshall, great information here and I can't wait to play around with Dapper further. I've shared this tool with others on TechSoup: http://tinyurl.com/282jo6
Great tutorial. Do you know if Dapper supports the ability to create an RSS feed with enclosures - e.g. photos, MP3's, etc.?