This post is a result of an email exchange between Greg Pass from Summize and myself (Alex Iskold). Big thanks to Greg for his original ideas and the technical collaboration.
We spend most of our time online searching for information. This is not
surprising, since the Web is a vast sea of information, where finding exactly what you
are looking for is not easy. But why is it that when we find something on one site it is
still not easy to find it on another? Say you found a Harry Potter book on Barnes and
Noble, why is it still hard to find the same item on other sites like Amazon and Powells?
Why is search a one-time deal?
We are used to a Web where each site has its own copy of the information. Each web site is a silo. But that does not need to be the case. If web sites agreed on how to represent things like books, music, movies, travel destinations and gadgets, then we would spend a lot less time searching. Imagine that the URL for the Harry Potter Goblet of Fire book is this:
http://www.amazon.com/books/j-k-rowling/harry-potter-and-the-goblet-of-fire
In other words, if there were a standard way to turn things into URLs, then finding information would be a lot easier.

The basic idea behind standard URLs is simple - given a type of object, like a book or a movie or a music album, create a URL schema that can be used by any site. Here are some basic examples to get us started:
First, the objects are divided into categories such as books, music and movies. The category is followed by a major attribute such as author, artist or director. Finally there is the title of the object. So for example, if this scheme worked, we could type in:
http://www.netflix.com/movies/alejandro-inarritu/babel
...to get to that movie on Netflix. Or:
http://www.blockbuster.com/movies/alejandro-inarritu/babel
...to get to the movie on Blockbuster.com.
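To make the scheme concrete, here is a minimal Python sketch of how a site might turn a category, a major attribute and a title into such a URL. The function names `slugify` and `standard_url`, and the normalization rules (lowercase, accents stripped, words joined by hyphens), are assumptions made for illustration; no agreed standard defines them.

```python
import re
import unicodedata


def slugify(text: str) -> str:
    """Turn free text into a lowercase, hyphen-separated URL segment.

    The normalization rules here are an assumption, not an agreed standard.
    """
    # Decompose accented characters, then drop anything that is not ASCII.
    ascii_text = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Collapse every run of non-alphanumeric characters into one hyphen.
    return re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")


def standard_url(site: str, category: str, major_attribute: str, title: str) -> str:
    """Build a URL of the form http://site/category/major-attribute/title."""
    return f"http://{site}/{category}/{slugify(major_attribute)}/{slugify(title)}"


if __name__ == "__main__":
    # The same object, addressed the same way on two different sites.
    for site in ("www.netflix.com", "www.blockbuster.com"):
        print(standard_url(site, "movies", "Alejandro Iñárritu", "Babel"))
```

The point of the sketch is that the path is computed from the object alone, so only the host name changes from site to site.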
There are three big benefits to standard URLs:
Extending the idea of standard URLs, we can think of web sites as directories. For example, /books should match all books, while /books/michael-pollan should match all books by Michael Pollan. So in a way, instead of a search, users will be doing a directory listing - which is much more reliable. If this works, the next step would be auto-completion as the user keys in the URL. It would work by having the browser query the list of possible matches from the web site. However, doing auto-complete on URLs would be harder than doing auto-complete on Google search today.
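The directory view and the auto-completion step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the in-memory `CATALOG` list stands in for a site's real data, and `list_directory` and `autocomplete` are hypothetical names, not part of any existing browser or site API.

```python
# Sample data only: a real site would back this with its own catalog.
CATALOG = [
    "/books/michael-pollan/the-omnivores-dilemma",
    "/books/michael-pollan/the-botany-of-desire",
    "/books/j-k-rowling/harry-potter-and-the-goblet-of-fire",
    "/movies/alejandro-inarritu/babel",
]


def list_directory(path: str, catalog: list = CATALOG) -> list:
    """Return every object at or below `path`, matching whole segments only."""
    prefix = path.rstrip("/")
    return sorted(
        p for p in catalog if p == prefix or p.startswith(prefix + "/")
    )


def autocomplete(partial: str, catalog: list = CATALOG, limit: int = 10) -> list:
    """Return up to `limit` paths that start with what the user typed so far."""
    return sorted(p for p in catalog if p.startswith(partial))[:limit]


if __name__ == "__main__":
    print(list_directory("/books/michael-pollan"))
    print(autocomplete("/movies/al"))
```

Note the difference between the two: the directory listing respects segment boundaries, so /book matches nothing, while auto-completion matches any typed prefix, including a half-finished segment.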
Quite a few companies are doing this already. del.icio.us was one of the first companies to start using standard URLs. However, del.icio.us does this only for tags. So a URL like http://del.icio.us/tags/books returns all posts tagged with books. A richer example is the review aggregator called Metacritic, which we covered here. Metacritic developed proprietary representations for objects, similar to the one we discussed above. For example, here is a link to a music album:
http://www.metacritic.com/music/artists/arcadefire/neonbible
Amazon is also trying to do this, but it seems like there are legacy issues that prevent the eCommerce giant from fully implementing standard URLs. The example below shows that there is still the need to have an ASIN (universal identifier for all Amazon products) as part of the URL:
http://www.amazon.com/Songs-About-Jane-Maroon-5/dp/B00006879E
The actual nuances of the protocol are not really that important. To paraphrase Dave Winer, it does not matter what the standard is, as long as there is a standard. This is a really important observation, as a lot of times we argue over the details - forgetting that there is an important bigger goal that we are trying to get to. Greg and I discussed specific, fairly simple flavors of the possible protocol. The main idea is to represent objects like this:
/topic/major-attribute/title/[one or more minor attributes]
Each object needs to be presented so that it is as distinct as possible. The disambiguation is done by adding one or more minor attributes after the title. For example, for a book a minor attribute could be a type - softcover or hardcover. It is important to agree on the sequence of the minor attributes for each topic. For example, for music it could be year, followed by record label followed by genre.
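A short Python sketch may help show why the agreed order of minor attributes matters. It parses a path of the form /topic/major-attribute/title/[minor attributes] and names each trailing segment by position. The `MINOR_ATTRIBUTE_ORDER` table, the `ObjectPath` class and `parse_path` are hypothetical; the attribute names simply follow the examples in the text (a format for books; year, label and genre for music).

```python
from dataclasses import dataclass, field

# Assumed, illustrative ordering of minor attributes per topic.
MINOR_ATTRIBUTE_ORDER = {
    "books": ["format"],
    "music": ["year", "label", "genre"],
}


@dataclass
class ObjectPath:
    topic: str
    major_attribute: str
    title: str
    minor: dict = field(default_factory=dict)


def parse_path(path: str) -> ObjectPath:
    """Split a standard path into topic, major attribute, title and minor attributes."""
    segments = [s for s in path.split("/") if s]
    if len(segments) < 3:
        raise ValueError("expected at least /topic/major-attribute/title")
    topic, major, title, *rest = segments
    names = MINOR_ATTRIBUTE_ORDER.get(topic, [])
    if len(rest) > len(names):
        raise ValueError(f"too many minor attributes for topic {topic!r}")
    # Trailing segments are named purely by position, which is why every
    # site has to agree on the sequence for each topic.
    return ObjectPath(topic, major, title, dict(zip(names, rest)))


if __name__ == "__main__":
    print(parse_path("/music/arcade-fire/neon-bible/2007/merge"))
    print(parse_path("/books/j-k-rowling/harry-potter-and-the-goblet-of-fire/hardcover"))
```

Because minor attributes are optional, a path that stops at the title is still valid and simply leaves `minor` empty.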
No matter what the specifics are, it is unlikely that a protocol will be able to eliminate ambiguity completely. That is ok, as long as it works most of the time - the benefits will be greater than the glitches. In the worst case scenario, users will see all matching objects instead of exactly one. That is, in the worst case scenario we are back to search - except that it would be much more precise, since it would actually be a directory listing.
Can this actually work? Yes, but it will take a big community effort. Adopting a standard on a web-scale is no easy endeavor, but this one could be worth considering. There is a big incentive for the companies as well - they want users to get to their content as quickly as possible.
Let us know what you think about this idea.
Comments
1) Language. How does this idea scale across the multitude of languages used on the net? A Harry Potter book in German would have a different title.
2) The idea seems to work fine for specific things: books, CDs, movies, computers, etc. But how would it work for ideas and things less tangible? Let's say I want to research "climate change" or "Taoist philosophy." How would the URLs work?
3) It seems the idea means webmasters must design their sites to fit the directory schema of the idea. Not sure how this would go over...
Even with all that, I like the idea. Just not sure how it would be implemented.
I like the idea as it reduces dependency on search engines and increases savings. The group you need to partner with, which also has benefits and more reach than any other group online, including Google, is domain owners. Domain owners want to increase direct navigation and address bar usage, and I think this idea may finally bring the two groups to work together.
To take it further, I suggest talking to the ICA, http://www.internetcommerce.org/
Nice article - Zoomin (a New Zealand based mapping service) uses something very similar and I'm sure I read sometime ago about a 'patent', or maybe that it was 'making it a standard'.
Anyway, here's where I work and just by reading it (and knowing Wellington, NZ) you probably don't have to click it to know where I am:
http://www.zoomin.co.nz/nz/wellington/wellington+wharf/queens+wharf/3/
@Bob Morris
The idea is to tackle the simple things first. Trying to do it universally is pretty complicated, as one needs to devise a complete ontology of everything. That's difficult / impossible.
Alex
Neat Article!
Why do I get the feeling of Web 1.0 Directory Services, of course in a URL package?
http://www.example.TLD/Parameter1/.../ParameterN
----------------------/Sub-Folder1/.../Sub-FolderN
?
DJ
I second this motion. Nice summary post and well written. The only argument I can bring up against "Speaking URLs" is the extra DB storage it takes to index those string values as opposed to the normal primary key integer value. Even so, the benefits trump that point entirely - I don't think I need to preach to the choir on this one. Let's start a community site, agree on the obvious standard, work out the details, and facilitate change by providing quality code samples, techniques and how-tos... Sign me up.
The web has become successful because of its chaotic and loose structure, not despite it. The effort required to add content to the web is minimal -- I don't need to be aware of schemas for specific content types (content here means books, movies, etc., not HTML, XML, etc.) or agree with others on formatting or any details. Any joe can add to the web with minimal skills, planning and effort.
A former manager of mine used to work at AT&T in the pre-web days and according to her, AT&T folks were convinced that gopher, not the web, would dominate the net because it was so highly structured. Trying to add too much structure to the retrieval aspect of the web seems to be going the gopher route...
Having said this, I'm completely in favor of sites exposing a user-friendly URL scheme. But trying to coordinate many sites to adopt the same URL scheme for content seems very difficult for non trivial examples. I would instead focus on the equally difficult but more fruitful task of getting many sites to agree on a common format for representing similar content (microformats, ontologies, etc). To me, the representation of content is a more important problem than the retrieval.
And what's wrong with search anyway?!?
Isn't that why they created the Dewey Decimal System?
It organizes everything in a way that you could easily find the same book in two completely separate libraries -- however, I can only imagine the hassle librarians went through so a few people here and there will use it.
Would this improve search? Maybe. A URL, like its syntax, has no determination on the listing's position.
In my opinion this would be a lot of work for the web giants to do (and all the littler guys to do) so that a few people could use it.
I mean, seriously. Now that 95% of libraries have their contents indexed in a database searchable by visitors and librarians alike, the Dewey Decimal System is "just another annoyance" to please a few people.
As much as I'd like to simplify the web, this is a bad idea.
I'd rather standardize content organization via meta tags where we can keep adding multiple properties to express information and later retrieve it in a better way.
In the example of the book search, if you don't know some bit of information in the URL you would be going to a search engine again. It seems very inflexible. In reality, sometimes you will know the title, other times you'll know the author, yet other times you'd like to search for anything that has "virtual" as part of its main description, etc. In addition, it's not very extensible: if we found out that there is one more relevant property for books, everybody would have to go and change the URLs. Plus it does not allow multiple hierarchical organizations of the same data at the same time without being redundant.
It would be great for commerce and web organization in general if the web could agree to a set of content descriptors, even if not the ultimate thing just a reasonable set of properties that everybody could use. Then, we would not have to rely on obscure search algorithms and searches would provide much better results.
Not sure how this would take care of spam, that's a separate picture ... I think most likely only identification of all participants will be the only way to have a web clean of spam (similar to how it is in real life).
That's a nice idea, but I think standards like what you're suggesting will appear naturally. Most well designed sites are using common sense URLs these days.
For my own tag based directory, I used this style:
http://bla.st/photography/
and not
http://bla.st/tags/photography/
like del.icio.us. Perhaps I should have? Does it really matter?
This is a totally absurd idea.
It's so braindead, in fact, that I thought the post was being ironic for the first half. Oh well.
So, what you're essentially suggesting is:
1) We need to devise a global ontology.
2) We need to devise a global hierarchy.
3) We need to forcefully partition the global URI space.
How do you tackle localisation? French? German? Italian?
Ontologies and hierarchies are SO dependent on the person who invented them that it is absolutely impossible to come up with one that applies to all people everywhere.
I might like to arrange a collection of hats (for example) by size, you might think that colour or style is more important.
The notion of organisation and categorisation is not only unique to the individual but unique to societies. French people might think that the publisher of a book is the most important top level category, Spanish people might think it's the author and I might personally think it's the subject matter.
As for number 3, you don't stand a cat in hell's chance of convincing anyone to do this. It also directly goes against the Architecture of the World Wide Web, W3C Recommendation on the use of "reserved names". [1]
[1] http://www.w3.org/TR/webarch/
Standard URLs would create a lot of transparency. Do all companies that are into e-commerce really want this? I mean, there may be some sites that are more expensive than their competitors, but still have a big client base for other reasons. I am not sure those sites really want to make it that easy for regular and potential customers to compare prices and switch to the cheapest store just by addressing several shops for the same product in the way you described. They are fed up enough with price comparison sites. I have my doubts that the implementation of standard URLs would be supported by the majority of e-commerce companies.
Interesting post. A few reactions to this:
First, remember that a high proportion of users use search to locate a site because it's the most convenient way of doing it, not because they don't know the URL. So even if what you've described was implemented, it would still be much easier to Google "amazon harry potter 3" than enter the URL. That doesn't make the idea have less value, but there's more to the problem than discovery - there's navigation as well.
Second, there has to be a way to do this without imposing the same standard on everyone. This is a bit like the point I was trying to make on your post the other day about AdaptiveBlue vs RDF. A given site ought to be able to do this with its own choice of vocabulary and the idea needs to still work if other sites choose different vocabularies. Vocabularies are like XML namespaces - I have to be able to use my own, and know it won't collide with something someone else happens to be defining.
Third - isn't there a semantic web approach to this? Suppose the amazon page referencing that book has metadata in the form of an RDF assertion or microformat (eg an hISBN if there is such a thing), then I ought to be able to find that page by URL, eg
(search metaphor)
amazon/books/search?hISBN.title=harry%20potter
or (resource metaphor)
amazon/books/meta/title/harry%20potter
That is, you have a navigable URI space that can be used to locate the items in the way you describe, but that URI space is not representative of the actual page organisation, but is mapped to the pages by semantic assertions.
(This also allows you to support multiple vocabularies, eg in the sense of
amazon/books/meta/dc:title/harry%20potter
or a more user-friendly equivalent.)
It is a good idea. But a standard URL will make crawling much easier. Anyone who knows about crawling will build a spider, feed it with a standard database of book names, and send it out to grab all the book selling information on the web, then form a starter page. If standard URLs are implemented, I definitely won't go to something like www.amazon.com/books/jkrolling/harry-potter-and-the-order-of-pheonix but to a webpage like www.allbooks.com/books/jkrolling/harry-potter-and-the-order-of-pheonix
where I can find a mockup page with all the information about this book. Security and respect for a website's content will be greatly damaged.
As for me, I am currently working on a project that accumulates car prices from all over the web. If every car website implemented a standard URL, it would be much easier for me.
Why? Seems like a solution in search of a problem...not to mention that even if this was possible (and I don't think it is) it still doesn't help agents/bots because the urls are not the (main) problem when it comes to information retrieval. The real issue there is semantic and that's why we've got things like microformats slowly taking hold, no?
Interesting feedback, thanks everyone. Seems like most people like the idea, but a few really don't for a few reasons, the major one being the impossibility of having a standard like this across sites and languages.
Please keep in mind that the idea is not to do this for everything, but for simple, everyday things like books, music and movies first. This way the problem becomes much simpler. There is no need for a global ontology of everything.
In terms of handling things like sorting, Greg and I discussed that there is no reason why this scheme can't keep parameters like current URLs do. For example URL?sort=blue is fine.
Alex
Specifically for movies, take a look at http://www.seegest.com where the URLs are quite 'standard', that is to say that if you see one you can guess the others.
mod_rewrite should be able to support standard URLs even for systems with a lot of 'legacy' URLs.
As a developer, I would like to say that this is easy to implement (in new sites) by using Ruby on Rails. It would also be straightforward in Django.
Reading over the story above, I think http://linkedwords.com has something to do with a web with less search, relying on contextual URLs -- they have almost perfect contextual paths in virtually every topic or category possible in life ....
This article sounds like it was written by a librarian who just jumped on the internet for the first time, couldn't find any dewey labels and started freaking out.
If you are using Firefox, try typing 'amazon goblet of fire' into your address bar without any www.'s or .com's or slashes or anything. Google doesn't even bother to show you a search page, it just goes straight to 'http://www.amazon.com/Harry-Potter-Goblet-Fire-Book/dp/0439139597'
In the social book-cataloguing website LibraryThing, they already have "easy linking", which lets you formulate your URLs either by ISBN or "sloppy title" or author, thus:
* http://www.librarything.com/isbn/0441172717
* http://www.librarything.com/isbn/0380976749
* http://www.librarything.com/title/voyage_of_the_dawn_treader
* http://www.librarything.com/title/the+educated+imagination
* http://www.librarything.com/title/tender violence
* http://www.librarything.com/author/jkrowling
* http://www.librarything.com/author/williamshirer
and in most cases it'll get you to the correct page. Seems similar to what's being proposed.
Good idea to make it a standard but it's already been treated by web 2.0 companies as de-facto standard. REST protocol is followed by most of the web 2.0 companies.
This is what URIs are supposed to be - call it a REST-ish approach to data structuring on the web
It would be easy to create a proxy service to do this. You could mask all the crazy URLs out there that use ids or identifiers with no meaning. Most CMS apps are very guilty of this (look at the URLs of any major newspaper online). As for databases, it just makes sense except for when you don't want your directory to be guessable
I am always punching in URLs directly in this way (eg. mininova.org/search//seeds ;) or Wikipedia URLs) and for those that don't support it I rely on a Google redirect with eg 'imdb babel' or 'amazon harry potter'
Neat idea, but unwieldy and unnecessary. Might have been great to have done this in the 90's, but we didn't.
Saw a demo of Powerset last night. Natural language search, basic understanding of the text of a page, is far more important, and far more powerful.
And far more useful to the less technical.
You will understand when you see it, it's a paradigm shift.
Well argued, but a horrible, horrible idea.
The Web already has a way of declaring a standard way to find stuff: link metadata. So instead of standardizing on "/books", we might standardize on, say, a "book" rel tag, permitting each site to have its own <link rel="book" href="/libres">.
And FWIW, this is definitely not RESTful as it violates the hypermedia as the engine of application state constraint.
See also; http://esw.w3.org/topic/UriSpaceSquatting
@23
Mark,
1) People do not know what link meta is, this is the point.
2) the hidden rel stuff is great, but again people do not see it and getting it requires spidering, which is costly. Why should each site build a spider just to figure out how to link to other sites?
@22
Ryan,
I can't wait to see it, and I do hope that it will be a true breakthrough. Right now, I am still skeptical.
Alex
People in this thread are saying -- "we need the Semantic web instead of this". But the semantic web isn't going to work as intended... it will just be a top-down directory system imposed on us as yet another syntax (not a true semantics).
My observation is that while people dream of the semantic Web, sites like Amazon, IMDB, etc. are already creating humble working directories. Granted, Amazon's links aren't that readable. But they are the de facto standard for referencing individual books now, and aside from readability they already have all the properties we want. You could say the same thing about IMDB, a de facto standard mapping of URLs to movies.
I am working on a stealth mode "Web 2.0" company, and we are working on readable directory URLs in yet another problem domain. We don't seriously expect many people to craft them, but we still think it's an important part of our strategy to become the de facto standard in our sector.
Alex, this type of link metadata is pervasive. Most web pages include some. Authors may or may not know what it's called, but they certainly know how to use it.
Anyhow, there are several reasons why link metadata works far better than standardized URL structure, and most are issues of coordination and independent evolution. Consider what a standardized "/books" would mean to sites which already have "/books" but don't conform? In addition, consider what it would mean to spiders like Googlebot; they'd have to update their software each time a new standard is developed, otherwise they wouldn't be able to get from "/books" to "/books/author/". The architecture of the Web requires that these transitions be made explicit via links, rather than implicit via agreement, so that spiders can exist. There's a bunch of other reasons. See the UriSpaceSquatting link I provided for more information on some of those. Or read the REST dissertation for the nitty gritty of the value of this approach to enabling independent evolution between untrusted parties.
Oh and most non-tech web users rarely type in URLs. I watch people struggle with just domain names, let alone paths (just like they don't type paths into their local file browsers either)
Sahar Sarid, Domain Guru sums it up perfectly on his blog:
"In terms of technology there’s nothing new here. All that is needed is to put a proposal out for web sites to agree on, or if you want to get mass reach quick, get the big search engines to adopt such a proposal and give benefits to those who structure accordinally. On a Second thought, I’m sure search engines will want nothing to do with it as it reduces their value offering. More direct navigation, any sort of it, is less power to search engines."
http://www.conceptualist.com/?p=256
This, unfortunately is a really bad idea, even for books and movies.
Most books have a single author, but a significant number don't. Are you going to categorize movies based on director, producer, or distributor? Even in those cases it's not clean. Let's say you know one of the creators but not the other, and it's labeled by the other creator first.
Ontologies, web or otherwise, don't work. They never have. They're kludgy solutions to information problems. For one they require that everything have a direct parent, which is not the case.
This is not the first or even the second or third time this idea of standardized URLs has come up. This is an idea that gets revived every couple of years and these days, it would be so costly that it has no hope of getting implemented.
So ultimately it's not possible to do, even for simple stuff like movies and books. Additionally, the predictability that you're hoping would make your life easier cannot exist. Right now spelling mistakes don't actually hurt you that much, but in a system where you're trying to have predictability, spelling mistakes kill you. You simply miss them because they're not where you thought they were. And for even more issues, rather than discovery, you're supplying a list of metadata and then telling the thing to crawl, but that's an insane amount of work that must be done constantly. Metadata should be exposed, not created. You should never find yourself making lists of things that represent other things; those lists should be made for you by grabbing the metadata from the objects themselves. It's the only technique we've found so far that scales.
This is an interesting idea. But as several others have mentioned, it is hard to realize. In fact, "standardization" is a dangerous term, especially when we apply it to the web. Most of the time, web standards are "adapted" gradually rather than enforced. Due to the complexity of semantics, it is almost impossible to start a standard at the global level from the beginning. In contrast, it would be better if we could implement some strategy to encourage local standardization within small communities. Our web needs to be "deeper weaved" to enable really facilitated search.
Other than standardizing URLs, would a search web be practical in reality?
-- Yihong
lol this would never ever work
nice idea for a world take over >:)
It would simply allow web site owners and webmasters to specify search words and .... anti-thesaurus paper, asking, "What web site would want less traffic?
http://www.testi32.com/di/michael-jackson/index.php