ReadWriteWeb

Info Architecture

Fun with XSLT - my draft thematic taxonomy

By Richard MacManus / January 27, 2004 9:53 PM

Over the past few days I've been doing some work on a new XSLT-based topic navigation for my weblog. I started it over xmas, but had parked it since the new year because of a couple of bugs. My goal was to swap my Radio Userland-hosted OPML-to-HTML transform (see Weblog Archive - by Topic in my menu) with a custom XML-to-HTML transform hosted on my own server. The reason I want to use XML over OPML is that it's more flexible - I can potentially do lots of clever things with the XML data in the future, using XPath and the like, whereas OPML would be limiting in that respect. Also I want to host it on my own server to enhance download speed. So I picked up the XML topic nav work again this week and I pretty quickly solved the issues that were bugging me at xmas. It's funny how parking troublesome code for a couple of weeks can clear the mind and make the fog disappear!

My ideal is to do the XSL transformation on the server-side, rather than the client (browser) side. The reason for this is that due to the proliferation of different browsers on the Web, it'll be a nightmare to second-guess how all of them will process the XSL transformation. Whereas with a server-side transformation, I know how my server will handle the task. Basically it comes down to this: it's one less thing for the user's browser to do when reading my site. Why get the client to process the XSL transform if I can do it on my own turf (my server)?

But having just said all that, I currently don't have the correct server configuration to do the XSLT processing. I'm used to working with IIS at work, so I was able to come up with a nifty ASP solution to transform my XML to HTML. But my weblog runs on Apache, so ASP can't be used (there may be a plug-in somewhere to get around this, but I wouldn't bet on it being an easy implementation). Far better that I do it in a language Apache understands, and the obvious one is PHP. So I've investigated using PHP to do the transformation and this will probably be my long-term solution. However it requires me to install two things on my server, which I've yet to do - Sablotron and Expat. These things will enable me to do XSLT transformations on my Apache server using PHP. There are other options too: Java/Cocoon and Perl/AxKit, to name a couple I found while searching. However I know very little about those options. If there are any XSLT experts out there who can advise me on the best method to run XSLT transformations server-side, I'd appreciate it.
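To make the server-side idea a bit more concrete, here is a minimal sketch in Python. To be clear about the assumptions: it does not use XSLT, PHP, Sablotron or Expat at all. It simply walks a small XML topic file with the standard library's xml.etree.ElementTree and writes out nested HTML lists, which is roughly the job the XSL stylesheet would do. The element names (topics, topic, post), the attributes and the URLs are all made up for illustration; they are not the real file.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical topic file: element names, attributes and URLs are assumptions.
TOPICS_XML = """
<topics>
  <topic name="Writing">
    <topic name="Nanowrimo 2003">
      <post title="Nanowrimo Day 15" url="/posts/nanowrimo-day-15.html"/>
    </topic>
  </topic>
  <topic name="Microcontent">
    <post title="More on weblog topics" url="/posts/more-on-weblog-topics.html"/>
  </topic>
</topics>
"""


def render(node):
    """Return an HTML <ul> for the topics and posts directly under node."""
    items = []
    for child in node:
        if child.tag == "topic":
            # A topic becomes a list item that holds its own nested list.
            label = escape(child.get("name", ""))
            items.append("<li>" + label + render(child) + "</li>")
        elif child.tag == "post":
            href = escape(child.get("url", ""))
            title = escape(child.get("title", ""))
            items.append('<li><a href="%s">%s</a></li>' % (href, title))
    if not items:
        return ""
    return "<ul>" + "".join(items) + "</ul>"


def transform(xml_text):
    """Parse the XML text and return the HTML navigation fragment."""
    return render(ET.fromstring(xml_text))


html_out = transform(TOPICS_XML)
print(html_out)
```

The point of the sketch is only that the transform runs once, on the server, and the browser receives plain HTML. A real XSLT processor would replace the render function with a stylesheet.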

My short-term solution is to do the XSL transformations on the client-side, using Javascript. And yes I know I just talked myself out of doing this a couple of paragraphs up. But I really want to see how my XML transforms look now and a Javascript is the quickest way. Besides it doesn't hurt to experiment with both client-side and server-side, to see for myself the differences.

Here is a test page I've done: it's an HTML page with Javascript (c/o W3Schools) that uses an XSL file to transform a selected section of my XML file into HTML. A caveat: it currently only works with Internet Explorer. I haven't been able to track down a cross-platform version that will work in Mozilla and Firebird etc. Note that there is also a way to transform straight XML-to-XSL-to-HTML in modern browsers (eg IE6) without using Javascript. However to do that I'd need multiple XML files (or else do some tricky things to bundle multiple XSL files into 1 XSL file). As the purpose of my topic nav is to have a single XML file to update, and bearing in mind this is a short-term solution, I decided to use Javascript to do the job.

Hey, what are you trying to achieve with all this XSLT processing?

That's a good question; allow me to explain. I recently converted my taxonomy to a flatter hierarchy, with a maximum of 3 levels. In line with this, I also decided I only want to categorise each weblog post into one category. This may seem to go against the grain of the latest in weblog taxonomy trends (see Jon Udell's Dynamic Categories post), but there is a method to my madness. I hope. 

I was browsing through an introductory book on Wittgenstein, as you do, and I read that his major philosophical work called the Tractatus is ordered using a decimal numbering system. He lays out his arguments like so:

1 -> 2 -> 3 (first level)
1.1 -> 1.2 -> 1.3 (subordinate to first level)
1.11 -> 1.12 -> 1.13 (subordinate to second level)

My understanding, based on my limited reading of Wittgenstein, is that he structured the Tractatus using seven main theses. For each thesis, he drilled down and analysed it using the above numbering system. Not to sound pompous or anything, but this is similar to what I do with my weblog. I have a dozen or so topics that I regularly write on and it's tempting to think of these as theses. They're probably more like themes than theses, but hey it's just one letter difference :-) My recurring themes are things like Universal Canvas and Microcontent.

To make a long story short, I discovered Wittgenstein's numbering system wasn't suitable for my weblog taxonomy. However I did end up with a manifesto of 12 themes that generally revolve around the subject of the Two-Way Web (it's not restrictive though). I've categorised each of my posts into one of those 12 themes. There is one further level below that, so that I can bundle things together if need be - e.g. my collection of posts about Nanowrimo 2003 is categorised as Top > Writing > Nanowrimo 2003.
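As a rough sketch of how that shape could look in a single XML file, here is a small Python example that builds the breadcrumb for each post. The element names (top, theme, group, post) are hypothetical, and filing "More on weblog topics" under Microcontent is just an assumption to have a second theme to show. Each post sits in exactly one place, so it gets exactly one breadcrumb, and nothing goes deeper than three levels.

```python
import xml.etree.ElementTree as ET

# Hypothetical taxonomy file: Top > theme > optional group, one home per post.
TAXONOMY_XML = """
<top>
  <theme name="Writing">
    <group name="Nanowrimo 2003">
      <post title="Nanowrimo Day 15"/>
    </group>
  </theme>
  <theme name="Microcontent">
    <post title="More on weblog topics"/>
  </theme>
</top>
"""


def breadcrumbs(xml_text):
    """Map each post title to its single breadcrumb string."""
    root = ET.fromstring(xml_text)
    result = {}
    for theme in root.findall("theme"):
        # Posts filed directly under a theme: two levels deep.
        for post in theme.findall("post"):
            result[post.get("title")] = " > ".join(["Top", theme.get("name")])
        # Posts bundled into a group under a theme: three levels deep.
        for group in theme.findall("group"):
            for post in group.findall("post"):
                result[post.get("title")] = " > ".join(
                    ["Top", theme.get("name"), group.get("name")]
                )
    return result


crumbs = breadcrumbs(TAXONOMY_XML)
print(crumbs["Nanowrimo Day 15"])  # Top > Writing > Nanowrimo 2003
```

Because the loops only ever look at themes and one level of groups beneath them, the three-level limit is built into the reader rather than checked separately.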

To bring this post full circle, currently I'm using Radio Userland's OPML-to-HTML service to produce the above taxonomy. As I mentioned, I've got a draft XML-to-HTML version going that uses client-side XSLT. Here is my list of 12 main categories (a plain html file for now) and here is a list of all my weblog posts categorised using this taxonomy (this one uses XSL).

In future I will add extra bits of data to the XML file (e.g. dates, maybe even the content of the posts). This is another advantage of using XML over OPML. I'll also eventually introduce some dynamic categorisation, a la Udell. All of this XML exploration may be leading me inexorably towards a tool like Syncato, which stores all its content in XML.

In summary I think my thematic taxonomy will help me keep my weblog writing on topic. And from a reader's perspective, you will be able to explore any one of '12 paths to Two-Way Web enlightenment' ;-)

Do we really need Web Design and Taxonomy?

By Richard MacManus / December 17, 2003 10:32 PM

Two recent memes from the blogosphere seem to me to be ripe for mixing:

Meme 1) The current trend for tech blog re-designs to have a minimalist, lotsa-white-space look that places emphasis on the content. Dave Winer probably started this trend with his re-design, but I've seen it elsewhere before him (e.g. Peter Lindberg and Erik Benson). And now Robert Scoble and Marc Canter. Mark Pilgrim's site is also bathed in white space nowadays. Hey I guess my site is pretty minimalist too. Maybe it's just a tech blog thing?

Meme 2) Jason Kottke's comment today: "Nothing takes the fun and personality out of writing like metadata." Jason points out that blogs have lots of extra design bits in them to help people organise and link together information, but it distracts from the main content.

What's the connection? Maybe it's that some of us bloggers are trying to push extraneous pieces of visual design out of our weblogs. And what's this a trend towards? The usurpation of websites by RSS perhaps. I'm beginning to sound like Steve Gillmor or Dave Winer. What I mean is: does web design, in the visual / graphical sense of the phrase, really matter anymore? Does ontology / taxonomy of a physical website mean much nowadays? If the majority of people read a weblog via an RSS Aggregator - and that's not the case yet, but it's heading that way - then does Web Design or Taxonomy matter a hill of beans? Why bother putting in all these design flourishes and metadata if our readers don't see/use it?

I'll give you an example. I use k-collector to categorise each weblog post I write into topics. But those topics can't be seen via RSS Aggregators (at least not in the one I use - let me know if you do see them). Another example: trackbacks aren't visible in the RSS Aggregator. A link to comments is available in my Aggregator, but there is no context - ie I don't know how many people have commented on a post, I have to click the link to open it up in my web browser.

And some people still don't provide the whole of their text in their RSS feeds. Movable Type people are the biggest offenders (if that's the right word), but only because it is the default behavior to include only excerpts in their RSS feeds. Doing this may be The Last Bastion of Web Design, because it's forcing us readers to go out of our RSS Aggregators and visit their websites. It's noticeable that most of the people who I've categorised as "Designers" in my Bloglines RSS Aggregator exhibit this "click to see" behavior. Can't blame them, they've got nice pretty sites and they want people to view them.

Mind you I've noticed in my own referrer logs that about half of my visitors (to my actual site) get here via a search engine. So that alone is probably a good case for me to continue to provide a nice design and a helpful taxonomy. Plus of course you want to make a good impression generally speaking with your web presence. It's like you don't want people to see your house when it's messy and has things strewn all around the lounge. You want to vacuum the place and have your furniture arranged in an orderly fashion before visitors call. So design and taxonomy have their place, even in our increasingly RSS-ified world.

But RSS (and/or Atom) is the Future. How long before we can represent our content's taxonomy/ontology in our RSS feeds? I mentioned this in a previous post and Dave Winer commented: "I plan to make my aggregator work with categories." That's definitely a good start. What are other aggregator developers planning to do in this regard?

And how long before we can cram all those bits of metadata that Jason mentions into our RSS feeds? That wasn't Jason's point of course, he was saying all that metadata necessarily de-emphasizes the main content. I agree with that sentiment, but I have to admit also that I'm addicted to those little bits of metadata. I like reading comments, clicking on the trackbacks, seeing the referrers, etc. It all adds to the community aspect of weblogs. And if we can cram all that community into our RSS or Atom feeds, then all the better.
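On the question of representing a site's taxonomy in its feed: RSS 2.0 items can already carry category elements, and the spec allows the value to be a forward-slash-separated path into a taxonomy. Here is a minimal Python sketch of what an aggregator could do with that. The feed content is invented for illustration (the category values are assumptions, not what any real feed publishes), and this is only one possible way to group items, not how any particular aggregator works.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Made-up RSS 2.0 feed; the category values are assumptions for this sketch.
FEED_XML = """
<rss version="2.0">
  <channel>
    <title>Example weblog</title>
    <item>
      <title>Nanowrimo Day 15</title>
      <category>Writing/Nanowrimo 2003</category>
    </item>
    <item>
      <title>Simplicity and extensibility</title>
      <category>Topics</category>
    </item>
  </channel>
</rss>
"""


def items_by_category(feed_text):
    """Group item titles under each category path found in the feed."""
    root = ET.fromstring(feed_text)
    grouped = defaultdict(list)
    for item in root.iter("item"):
        title = item.findtext("title", default="(untitled)")
        for cat in item.findall("category"):
            # Split "Writing/Nanowrimo 2003" into ("Writing", "Nanowrimo 2003").
            parts = [p.strip() for p in (cat.text or "").split("/")]
            path = tuple(p for p in parts if p)
            grouped[path].append(title)
    return dict(grouped)


grouped = items_by_category(FEED_XML)
for path, titles in sorted(grouped.items()):
    print(" > ".join(path), "->", titles)
```

An item with several category elements simply shows up under several paths, so the same approach would cope with posts filed in more than one place.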

Update on Weblog Ontologies

By Richard MacManus / December 14, 2003 1:44 PM

Couple of bits of feedback from last night's post on weblog ontologies. Bill Seitz points out that his Wikilog does in fact have a hierarchical view, the user has to enable it though (via their user settings when they visit Bill's site). For example the post of his I used as an example yesterday has this hierarchy:

Front Page > Personal Network Architecture > Group Ware > Collaboration Ware > Wiki For Collaboration Ware > Summarizing Is Necessary

I don't think it is a hierarchy of categories. Bill explains it as "the chain of generations of pages whose creation led to the current page."

Secondly Bill notes that a key question to building an ontology is asking yourself: what's the point? This is something I was pondering last night when I went to sleep (dreams being one way I think through technical things...sad as that makes me sound!). I was also thinking about why we put so much effort into organising our weblog sites, when the majority of our readers read our content via an RSS Aggregator - which doesn't care about the content structure. How long before some bright spark creates an RSS Aggregator that does take into account each publisher's content ontology? Or maybe the question should be asked the other way round: how long before site developers figure out how to create an RSS feed(s) that represents its home site's ontology?

Andrew also makes a good point: "The ontologies are nice, but they shouldn't require oodles of work to set up, maintain, and categorize things into."

Amen to that. This is the drawback to using XTM topic maps - it's going to require a lot of work to set it up. Same could be said of RDF. Hmm, thinking more...

Weblog Ontologies, Part 1

By Richard MacManus / December 13, 2003 9:22 PM

I've been jotting down re-design ideas in my trusty paper notebook. On the Web there is an unwritten maxim: learn (steal?) from the best. So I decided to review some of the weblog ontologies/taxonomies on the Web that I admire. My method of review is informal and non-judgmental. I try to illustrate my findings with a test drive of each site. In no particular order...

1. FTrain - Paul Ford: Trust me to start with the most complex ;-) Paul Ford's site is graphically striking and he's one of the few bloggers to have implemented a Semantic Web-like structure. Not to mention his writing is mind-blowing. But to the design. I don't claim to fully understand it yet, but basically it seems every piece of content is connected to other content according to various types of relationships. Here's how Paul describes it:

"Ftrain is a hierarchy. Any given page has one or more of parent, children, and sibling pages, and every page lives somewhere in the hierarchy."

Further down that page, he states:

"Ftrain is this complicated because it has over 1000 separate nodes, all of them connected to one another in some way, with something like 700,000 words between them, and all extensible."

I decided to start at a recent Ftrain article, A Response to Clay Shirky's "The Semantic Web, Syllogism, and Worldview", and browse from there. If you scroll down to the end of the article, you'll find some navigation and external links. First there is a "Links Related To" table, which has 1 external link. Below that there is a statement of the hierarchy:

"This is A Response to Clay Shirky's "The Semantic Web, Syllogism, and Worldview", a technical essay by Paul Ford, published Monday, November 10, 2003. It is part of Theory, which is part of Ftrain.com."

By this I understood I am at the third level down from the homepage. Breadcrumb-like: Home > Theory > A Response to Clay etc. But the "technical essay" bit threw me. I clicked on that and discovered this article had been cross-posted to another category: Home > Taxanomy > Things > Ways of Communicating > Forms of Expression > Essays > Technical Essays.

So I back-button back to the Clay essay. The next thing after the hierarchy statement is a list of 6 links under the heading 'Related'. 3 of these links are also attached to 'Technical Essays'. The other 3 aren't immediately obvious relations.

Below this is a group of links called "Navigate by Hierarchy", which is two other links from the category 'Theory'. Finally there is a "Navigate by Time" option.

So all up, a pretty complex navigational structure and who knows how it is done under the hood. The Ftrain Sitekit provides some clues - we find out the site is built using XML technologies and XSLT.

2. Erik Benson: Erik bases his site around the concept of nodes, which are grouped into categories, which are placed in a section. So it is a hierarchy, like Paul Ford's site. Hmm, already I'm sensing a pattern in the ontologies I admire - they're hierarchies. To be honest, I hadn't clicked that Ftrain and Erik Benson's sites were hierarchical (grouped by categories) until now.

I took Erik's most recent article, The future has already happened, as my starting point to check out the ontology. Under the title, the following breadcrumb displays:

"home > thing > idea"

I clicked on "idea" and it took me to that category page, which displayed an alphabetical list of all the other nodes in the "idea" category. There's also a chronological navigation, under Weblog Archive. The article above is listed as:

"home > weblog > general > archive for Dec 2003"

Another feature is the "Related Nodes" functionality. I'm not quite sure how this works yet, I'll have to come back to it.

3. Dave Winer's Scripting News. As most people are aware, Dave has recently switched to a category-based design. Right at the top of his homepage is the following breadcrumb:

"Top > Dave's World > Weblog Archive > 2003 > December > 12"

This is chronological, but Dave is also categorising each of his weblog entries. To find the category listing, you have to go to the search drop-down box and click "All Cats". Then you will see a long list of all Dave's categories. For my research purposes I clicked on "Politics / Money" and got a page which displayed all the posts in that category.

4. Bill Seitz's Wikilog. Bill's site is a cross between a Wiki and a weblog. When I first saw it a few months ago I was blown away by it. As I've been following it I've gotten to like the way all content on one topic is grouped together on a single topic page. So no matter if two entries on the same topic were written a year apart, both entries end up on the same page. This has huge benefits in terms of linking and relating ideas together. Bill calls it his "thinking space".

Usually with Bill's site, I track his RSS feed of headings in my RSS Aggregator (Bloglines). When I see a heading that looks interesting, I click on it. For example, a recent one was SummarizingIsNecessary. If you scroll to the bottom of this page, you'll see a list of "Backlinks". According to Bill, backlinks are:

"...a list of all pages referring to the current page. This is useful for finding "related" information. (This is the Two Way Links feature available in pre-World Wide Web Hyper Text environments, that people like Ted Nelson have been complaining about since the Web came about.)"

Summary of Part 1:

I'm sure I haven't done justice to the sites I've analysed so far. They are all pretty complex and very well-developed. But there are some patterns emerging for my purposes: they all in some way use the concept of "nodes", 3 of the 4 use hierarchical categories, all but Dave's have a "related links" feature.

This is just a start. There are other sites whose ontology/taxonomy I admire. Andrew Chen, Mark Pilgrim, Phil Pearson - to name just a few. But I'll be here all night if I write about them now. Maybe tomorrow. For now, I'll think more about the "category vs topics" dilemma that I'm stuck on currently. I'm very keen on having a topic-based navigation, which has the benefit of a bottom-up "flat" structure of content - and I was thinking of using XTM topic mapping to achieve "related link" functionality. However given that hierarchical categories are being used to great effect by 3 of the 4 people listed above, maybe I'll change tack. Hmm.
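One cheap way to get a "related links" feature without any topic mapping is the backlinks idea from Bill's Wikilog: a list of all pages that refer to the current page. Here is a minimal Python sketch of it. The page names are borrowed from the hierarchy on Bill's site, but the links between them are invented for illustration, and this says nothing about how the Wikilog actually does it under the hood.

```python
from collections import defaultdict

# Hypothetical link graph: page name -> pages it links to.
# The page names come from the Wikilog; the links themselves are made up.
LINKS = {
    "WikiForCollaborationWare": ["SummarizingIsNecessary", "CollaborationWare"],
    "CollaborationWare": ["GroupWare"],
    "SummarizingIsNecessary": ["CollaborationWare"],
}


def backlinks(links):
    """Invert the link graph: for each page, list the pages that link to it."""
    index = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            index[target].add(source)
    # Sort so the output is stable from run to run.
    return {page: sorted(sources) for page, sources in index.items()}


related = backlinks(LINKS)
print(related["CollaborationWare"])
```

The nice property is that nobody has to maintain the "related" list by hand; it falls out of the links people were already writing.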

Nanowrimo Day 15 - plus some thoughts on categories and topics

By Richard MacManus / November 15, 2003 9:16 PM

27,563 words. Here's the latest (ch. 34 onwards). I'm hoping to reach the 30,000 mark by end of tomorrow. That will give me a nice round figure to aim for of 10,000 words per week for the final two weeks.

I'm enjoying having two storylines intertwining now. On the one hand, Declan Atomz is now beginning to understand the alien world. It'll be interesting to see how far this character goes (grows?). The other more recently introduced storyline is about Dave Darwin and his social software company called "Social Kinetics". I think I've come up with a new type of social software (well, as it deals with multimedia it wouldn't surprise me if Marc Canter has already done it). Here's a description from my novel - but to get the full picture, you'll have to read the rest of it ;-)

Today was the day Dave would announce Social-Kinetic's new social software product to the world. The product was simply named after the company - "Social Kinetics" - and it was made up of three main ingredients: firstly it was an online community space on the Web, which was like other online communities. That is, it was a website with a URL (web address) and, in order to use the website, people were required to sign-up and register an account. The second main ingredient was the personality assessment and physical body mapping. Once a person had registered to become a member of the "Social Kinetics" community, that person would fill in a questionnaire to establish the basic parameters of their personality. This was a very superficial personality type, similar to Myers-Briggs. Social Kinetics had 50 initial "types" (eventually the number of types would number in the hundreds). The person would also have a "mapping" done of their face and body, so as to approximate their physical appearance in the avatar. The software and management team had decided on the following policy: people must use mappings of their own person as a base for their avatars. This decision was driven by Dave's philosophy that people should not be able to "hide" behind an avatar. The principle of "What you see is what you get" should apply, so that people learned to trust one another and be honest with their interactions in the community. Dave felt that if people could select a graphical online persona like in previous examples of virtual worlds - a wizard, or a dog, or a green two-headed alien - then that would only encourage other falsities. The purpose of Social Kinetics was to encourage people to extend themselves via their avatars - meet new people, and make new connections. Dave felt that a big determining factor in the success of his software was that the avatar should approximate its human owner as much as possible - not just personality but physical likeness.
The third ingredient was the avatar software itself. Once a personality was assigned and physical characteristics mapped, the customer would be given an avatar. At that point the avatar would join the community and at the same time begin to build up its own identity, by collecting and aggregating data about its owner.

P.S. I miss my blog! There are some real interesting things happening in the blog world right now. Particularly Dave Winer's new category-based blog design. His use of categories is not quite the same as topics that e-vectors and Phil Pearson are doing. But it's pretty damn close. I suggested in a couple of comments that it'd be great to add a "topic" tag to the RSS2.0 spec. It would be a sub-element of the "item" tag, just like the "category" tag (that Dave is using). I agree with Paulo and Marc that a "category" is different to "topic", but in practice they are complementary. Dave's use of categories seems to give more scope for hierarchical organisation, while using topics gives a flatter and (IMHO) more flexible organisation. It'll be interesting to see how it evolves. Anyway, all this is inspiring my next idea for a project - a re-design of my own weblog! There goes December ;-)

P.P.S. if you're wondering, here's a description of Nanowrimo.

P.P.P.S. GO THE ALL BLACKS!!!

Is this the beginning of the Age of Topic-focused Blogs?

By Richard MacManus / October 15, 2003 9:42 PM

I read with interest Matt Haughey's essay Blogging for Dollars, where he relates his experiences running Google's Adsense adverts on his TiVo-focused weblog, PVRblog. Matt is making a pretty penny running the Google ads on his TiVo blog and one of the main reasons why is that it is focused on a single topic. He advises:

"In order to have any remote chance of success gaining an interested audience and getting good on-topic ads showing up, pick a narrow topic you are passionate about and run with it."

I'm not that concerned with the controversy surrounding the terms and conditions of Google Adsense, as that has been covered by many others. What interests me is blogging on a narrowly-defined topic, which if it's one that attracts plenty of e-commerce action (like TiVo) could even make a buck.

But it's not a case of "there's gold in them thar blogs", like in the Dot Com days. The Google ads won't make you a paper millionaire and they won't lead to an IPO, but if you're fortunate you may get a comfy chair like Matt did. So just to get that straight, I'm not talking about anything that starts with "e-".

One person that has been pushing the envelope with topic-focused blogging is Elwyn Jenkins. In his case, it is also commercial as he runs a business based on his blog Microdoc News. The topic of his blog is nano-publishing:

"Nano Publishing is a tiny web-based operation that publishes online as its primary focus and usually runs a weblog as a core part of its primary income earning activity."

So Elwyn Jenkins is making money by blogging about how to make money by blogging. Niiice! But he succeeds because his content is (usually) interesting and informative. He does however employ some tricks of the trade in order to get people to read his blog, and hence make his money. These could be construed as cynical.

For example I am subscribed to Microdoc News in my RSS Aggregator, but when I see a new article that I want to read I have to click on a link that opens up Microdoc News in my web browser. Most blogs I subscribe to let me read the entire article in my RSS Aggregator, which is how I prefer it. I should add that some other blogs do it too - in fact Movable Type calls it a feature, making you click for the whole story. And there is another reason - web designers like Zeldman and Asterisk actually want people to click out of their RSS Aggregators to view their beautifully designed websites. Which is fine, because I enjoy viewing well-designed websites. But I believe Elwyn Jenkins makes people click out of their RSS Aggregators because it makes people visit his website, which earns him advertising cash.

There are other tricks too. Microdoc News is styled as an "online magazine", but it's just run by one bloke. It's amazing how many times I see people on the Web refer to Microdoc News as a "they" instead of a "he". Microdoc promotes itself as an authority on blogging and it runs a blog portal, under a different domain name. Microdoc is also almost obsessively focused on Google, which allows him to feed off of Google's high profile and success. It reminds me of those birds that travel around on the back of large animals like rhinos, on the African plains. Oxpeckers, I think they're called. Google would be the rhino and Microdoc the oxpecker.

Now I don't mean any of this in a rude way, I greatly admire Microdoc News and have had it in my blogroll ever since my own weblog started. Microdoc News is one of the sites that inspires me and its content is always worth a peek. But it's also a good example of how weblog advertising is subtly affecting the reader's experience, because of all the little tricks that Microdoc employs to entice people to visit it and treat it as an authority on its main topic (nano-publishing).

Finally, today I noticed that Tom Coates has started a new topic-focused blog called Everything in Moderation (the topic is managing online communities and user-generated content). There's no advertising on the site, so I'm not suggesting that there is a monetary incentive to this. But even if there was, I'd still applaud Tom's new site because it's focused on a specific topic and hence it will attract an audience of people who are interested in that topic. And if his readers are as passionate about the topic as Tom is, then it's likely they'll become regular readers. Perhaps even contributors.

In conclusion, I view topic-focused blogs as a further extension of the Two-Way Web rather than just another attempt to milk money out of the Web. Because these blogs are attached to a single topic, it means the content will nearly always be relevant to its audience. This in turn encourages interaction with the reader, maybe even the reader writing back to the blog. Yes I am beginning to like the idea of topic-focused blogs, even discounting the fringe benefit that they may make money. Hmmm, I'll have to think of what narrow topic I can focus on in order to start a new blog. I could use some pocket money too!

Topic of the Pops

By Richard MacManus / September 29, 2003 10:45 PM

CSS and XHTML are still dominating my mind's attention.xml file. As you can see in my menu, they're numbers 1 and 2 in my Weekly Topic Top 10. btw the Topic Top 10 is going to be a weekly record (pardon the pun) of the most popular topics on my mind. I've actually created some XML files to store each week's top 10, so I can track what topics are occupying my mind over time. I'll see if I can implement this into my Radio blog, so the menu automatically extracts the data from the XML files. In fact this feature could be extended across the blogosphere too (but by a better programmer than me!).

Wouldn't it be fun to have a Rick Dees-like weekly countdown of the Top 40 topics in the blogosphere? Popdex has something similar - a popularity index of weblog posts and stories. Technorati comes very close by sorting posts from different weblogs into topics (nice work again Dave Sifry!). But ideally I want something attuned to my interests.

To make the Weekly Top 40 relevant for different groups of people, you'd need to categorize topics...like they do with the Billboard charts - there's a pop chart, an R&B chart, a country chart, etc. Likewise in the blogosphere you could have a Tech Blogs chart, a Political Blogs chart, a Personal Journal chart, etc.

In the meantime, you can tune into my Weekly Top 10 topics chart. CSS is number 1 this week, but there are many topics vying for my attention currently. What will be number 1 next week?
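For the curious, here is a minimal Python sketch of how the weekly XML files could be read back to track a topic's chart position over time. Everything about the file layout is an assumption (the top10 and topic elements, the week and rank attributes), and the earlier week's rankings are made up purely so there are two weeks to compare.

```python
import xml.etree.ElementTree as ET

# Illustrative weekly files: the element names, week labels and the earlier
# week's rankings are invented for this sketch.
WEEKLY_FILES = [
    '<top10 week="2003-09-22">'
    '<topic rank="1">XHTML</topic><topic rank="2">CSS</topic>'
    '</top10>',
    '<top10 week="2003-09-29">'
    '<topic rank="1">CSS</topic><topic rank="2">XHTML</topic>'
    '</top10>',
]


def rank_history(weekly_docs):
    """Return {topic: [(week, rank), ...]} in the order the weeks are given."""
    history = {}
    for doc in weekly_docs:
        chart = ET.fromstring(doc)
        week = chart.get("week")
        for topic in chart.findall("topic"):
            entry = (week, int(topic.get("rank")))
            history.setdefault(topic.text, []).append(entry)
    return history


history = rank_history(WEEKLY_FILES)
for topic, ranks in sorted(history.items()):
    print(topic, ranks)
```

With one small file per week, a menu could pull the latest chart from the newest file, and the same loop gives the "climbers and fallers" for free.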

Simplicity and extensibility

By Richard MacManus / July 21, 2003 9:06 PM

Tim O'Reilly writes in Dan Gillmor's comments: "Simplicity and extensibility should not be orthogonal. And any technology that sets them up as opposed, instead of complements, has clearly done something wrong."

Note: orthogonal means "independent or well separated".

Tim O'Reilly is talking about RSS2.0 (simple) and RSS1.0 (extensible). Lately I've been thinking and reading about weblog topics. There seem to be the same issues of simplicity vs extensibility in this space too, although nowhere near as much mud-flinging.

XTM stands for XML Topic Maps. For a general introduction, check out the Cover Pages:

"A topic map is a kind of index or information overlay which can be constructed separate from a set of resources, identifying instances of subjects and relationships within the set of resources."

The key things to note are that topic maps are separate from the actual content and they are used to organise content into topics or categories. Although XTM was created only in 2001, topic mapping dates back to 1993 and has its roots in SGML. Right there is a giveaway that this spec is a complex beast. SGML is like the queen ant of XML (to borrow Scoble's ant metaphor) and it has given birth to many XML ants.

The XTM spec is a bulky insect, weighing in at 100 pages long. But being heavy gives it the advantage of extensibility. Using XTM, you can define not only topics but also associations, occurrences, characteristics, hierarchies, mergers - the list goes on.

XTM even has a fancy term for creating a topic: reification. The spec defines this as:

"The act of creating a topic. When anything is reified it becomes the subject of the topic thus created; to reify something is therefore to create a topic of which that thing is the subject."

Riiiiiight. Now I understand why they used Shakespeare as an example topic in the spec :-) But it also illustrates that XTM has a lot of scope and you can define topics for Africa.

Compare this to the ENT specification. ENT stands for Easy News Topics and it was built as an add-on to RSS2.0. The authors, Matt Mower and Paulo Valdemarin, make a point of emphasizing the simplicity of ENT:

"ENT is intended to be a very simple standard for describing how topic information can be introduced into an RSS2.0 news feed."

ENT is a lightweight ant, weighing in at only 8 pages. It has only two main concepts: the "topic" and the "cloud", which is like a map of topics. ENT necessarily doesn't have the same extensibility, or breadth of functionality, that XTM has. But, here's the kicker. ENT can reference XTM. ENT topics can be linked to an XTM topic map (as well as RDF), via a URI within a cloud. Whoa, lotta acronyms in that last sentence. But the point is, using ENT along with XTM means you get both simplicity and extensibility.
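To make the ENT-plus-XTM idea a little more concrete, here is a minimal Python sketch of ENT's two concepts. It is only a data model, not ENT's actual XML syntax; the class names, the `map_uri` field and the example URI are all my own assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Topic:
    """A single topic, identified by a short id within its cloud."""
    topic_id: str
    label: str


@dataclass
class Cloud:
    """A 'cloud' is a named collection of topics.

    map_uri is hypothetical: it stands for the URI of an external topic
    map (for example an XTM document) that defines the topics in more depth.
    """
    name: str
    map_uri: Optional[str] = None
    topics: Dict[str, Topic] = field(default_factory=dict)

    def add(self, topic: Topic) -> None:
        self.topics[topic.topic_id] = topic

    def resolve(self, topic_id: str) -> str:
        """Return a reference to the richer definition, if one is linked."""
        if topic_id not in self.topics:
            raise KeyError(topic_id)
        if self.map_uri is None:
            return topic_id  # simple case: the id is all we have
        return f"{self.map_uri}#{topic_id}"


# Assumed example data, not taken from any real feed.
cloud = Cloud(name="weblog", map_uri="http://example.org/topics.xtm")
cloud.add(Topic("xslt", "XSLT"))
print(cloud.resolve("xslt"))
```

The simple layer (a topic id in a cloud) works on its own, and the link to a heavier topic map is optional. That is roughly the split between simplicity and extensibility described above.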

And all this can be done in RSS2.0, and no doubt in RSS1.0 and Atom too. Tim O'Reilly is right: simplicity and extensibility don't have to be orthogonal. You can have your cake and eat it too. That is, as long as the ants don't eat it first ;-)

More on weblog topics

By Richard MacManus / July 7, 2003 10:16 PM

Couple of interesting comments to my last post. Harvey Kirkpatrick from itopik wrote:

"I would argue that all the efforts are complementary and can be automated by some and humanified by others. We are choosing to humanify a bit the process hoping to be a bit more intelligent in our organization as Yahoo was in the beginning. Seeing linkages that perhaps software might miss. Granted slower, but in the end a lower signal to noise ratio we think."

It's a good point - we humans have a marvelous ability to 'think outside the square' and see linkages where computers can't (right now). But I think computers are at their most useful when they automate tasks, which frees up the human brain for more creative things.

I am day-dreaming though. In practical terms itopik is successfully 'doing the business' promoting weblog topics, as are k-collector and Topic Exchange. The developers are out there taking risks and building stuff. All I can do is applaud and support those efforts in my weblog. As Harvey says:

"It is my hope that we can build a village of good efforts and be mutually supportive."

The other interesting comment on my previous post was from Prometheus 6, who linked to me out of blogging courtesy (or charity?). But I'm glad he did because it made me realise I need to clarify one thing. When I said that "Topics can and should be 'exactly the size of one idea'", I meant to make it clear that each weblog post can be associated with more than one topic. Prometheus 6 said it well:

"Really good, really informative writing can draw of diverse conceptual roots, and the "topic" can be "these multiple things correlate in this fashion," but it might not be...Good writing just kind of flows."

Speaking of "flow", I read Rogers Cadenhead's post the other night about the Hungarian-American positivity guru Mihaly Csikszentmihalyi, who wrote a book called 'Flow'. Rogers quoted this gem, which I'll end my post with ('cause it's so darn positive):

"Problems are solved only when we devote a great deal of attention to them and in a creative way."

Organizing weblogs by topic

By Richard MacManus / July 4, 2003 11:03 PM / Comments

My post in response to Clay Shirky's article on Corante generated some interesting discussion. The time is ripe to discuss weblog topics, thanks to innovative new tools such as k-collector, Phillip Pearson's Topic Exchange, and itopik. I want to address a few points about organizing weblog posts by topic.

1) I still believe authorship is important. I have favourite bloggers who I will read no matter what topic they write on. They are authoritative voices and I trust them to inform and/or entertain me. But I also believe the blogosphere should allow for the emergence of new and alternative voices. One way to achieve this is to have a system that organizes information via topics. Otherwise A List bloggers will continue to dominate the blogosphere, like A List actors dominate Hollywood. Do the rest of us really want to be waiting tables the rest of our lives, looking for our big break in the blogosphere? Hmmm maybe the majority of us are better suited to acting in local plays, than on the big screen ;-) But either way, organising weblog posts by topics potentially gives more people a chance to be read in the blogosphere. And the more bloggers that are 'in the mix', the better the chance of finding new and unique ideas.

2) Topic generation should be automated. I've seen a few comments along the lines of: "Oh I wouldn't know what topic to choose, and anyway who's to say my definition of a certain topic will match other people's definition?" This is a fair point and as I've been using k-collector, I've often wondered if I'm choosing the correct topics for my posts. There have also been instances of duplication or overlap of topics - e.g. there have been two topics about the new Matrix movie on the same k-collector cloud.

The answer (easier said than done) is to automate creation and management of topics, so we humans don't have to worry our pretty little heads about it. k-collector and Topic Exchange are on the right track, as they already automate some functions. For example when you need to choose a topic for your post using k-collector, the software automatically presents you with a list of potential topics to select from. Matt Mower has previously suggested there may be ways to fully automate topic assignation, which in a past entry I likened to an automated Yahoo!.
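As a rough illustration of the kind of automation I mean, here is a small Python sketch that checks a new topic label against the topics already in a cloud and suggests near-duplicates, like the two Matrix topics. This is not how k-collector or Topic Exchange actually work; the function names, the sample topics and the similarity cutoff are assumptions of mine, and the matching is just simple string similarity from Python's standard `difflib` module.

```python
import difflib
from typing import List


def normalize(label: str) -> str:
    """Lower-case and collapse whitespace so near-identical labels compare equal."""
    return " ".join(label.lower().split())


def suggest_topics(candidate: str, existing: List[str], cutoff: float = 0.6) -> List[str]:
    """Return up to three existing topics whose labels resemble the candidate."""
    by_norm = {normalize(t): t for t in existing}
    matches = difflib.get_close_matches(
        normalize(candidate), list(by_norm), n=3, cutoff=cutoff
    )
    return [by_norm[m] for m in matches]


# Assumed example cloud, for illustration only.
existing = ["Matrix Reloaded", "RSS", "Topic Maps"]
print(suggest_topics("The Matrix Reloaded", existing))
```

String similarity only catches labels that look alike. Two topics that mean the same thing but are worded differently would still slip through, which is part of why full automation is easier said than done.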

I'd like to imagine also that topics can someday be managed in a decentralized way, like the World Wide Web itself. Currently k-collector and Topic Exchange both maintain topics on a central server. Perhaps there is a peer-to-peer way to manage topics?

And my final point for now:

3) Topics are different than categories. I use Radio Userland as my weblog authoring tool and I have the option of dividing my posts into 'categories'. However I choose not to, because categories are too broad and they aren't flexible. Anil Dash posted an entry today about posts being the "atomic element" of weblogs:

"When I first wrote up the idea that had been percolating in my mind for the microcontent client, the one element that kept popping up was "meme-sized chunks [are] the natural idiom of the Internet". A post is that memetic chunk, exactly the size of one idea. Not coincidentally, a lot of emails are that size, as are a lot of instant messaging conversations."

Topics can and should be "exactly the size of one idea", whereas categories usually encompass a number of similar ideas. For example if I have a category called ".NET", then I may use it to file links to information about ASP.NET, my thoughts on how .NET can be used to build a Universal Canvas, how Microsoft is using .NET as the base for their next Operating System, etc. Many topics, but just one category.
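Here is a minimal Python sketch of that difference, using the .NET example. The post titles and topic names are made up for illustration; the point is only that each post sits in one broad category but can carry several idea-sized topics.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical posts: each has exactly one broad category but several topics.
posts = [
    {"title": "ASP.NET links", "category": ".NET", "topics": ["ASP.NET"]},
    {"title": "Building a Universal Canvas", "category": ".NET",
     "topics": ["Universal Canvas", ".NET Framework"]},
    {"title": "The next Windows", "category": ".NET",
     "topics": ["Operating Systems", ".NET Framework"]},
]


def index_by(posts: List[dict], key: str) -> Dict[str, List[str]]:
    """Map each category or topic to the titles filed under it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for post in posts:
        values = post[key]
        # A category is a single string; topics are a list of strings.
        if isinstance(values, str):
            values = [values]
        for value in values:
            index[value].append(post["title"])
    return dict(index)


by_category = index_by(posts, "category")
by_topic = index_by(posts, "topics")
print(len(by_category), "category,", len(by_topic), "topics")
```

Indexing by category lumps all three posts together, while indexing by topic lets a reader pull out just the posts about one idea, and lets a single post show up under more than one topic.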

---

Here are some trackbacks from my original post...manually tracked mind you. Bring on the Radio trackback Matt :-)

http://www.corante.com/many/20030701.shtml

http://blogdex.media.mit.edu/track.asp?id=6045980

http://chutry.wordherders.net

http://home.earthlink.net/~prometheus_6/

http://blogs.it/0100198/2003/07/02.html

http://www.rolandtanglao.com

http://www.downes.ca/news/OLDaily.htm

http://randgaenge.net/

http://infodesign.no/jonblogg.htm
