Classifying Along the Other Axis – To Talk of Many Things


Although I still think the blogging community should work together and create an optional controlled vocabulary for tagging, the real problem goes well beyond blogging and comes back to indexing the Web: We're classifying along the wrong axis. When used with some skill, good search engines like Google can tell us what a Web item is about. What's much harder to determine is what I call the “literary form”: Is the Web item a FAQ? An online store? A blog? A tutorial? A forum? A photo gallery? This might be considered the “shape” of the item being sought. One has to be very clever to filter on the literary form, simply because there's nothing in the document that unambiguously says what the item is, as opposed to what it is about.

The most pressing need is to exclude online stores. I sometimes try to find good technical info on a technology gadget, only to find that I have to filter out 15,000 or more online stores that tell me nothing except that they carry the item in question.

Metadata frameworks exist that could handle this. The Dublin Core Metadata Initiative (DCMI) is the oldest and probably the best known, but it's little used outside of university circles, probably due to its complexity. In any event, it doesn't really have a spot for what I'm talking about. DCMI suggests a controlled vocabulary for “Type”, but what they mean by “type” is type of data (still image, sound, text, etc.) rather than the intended purpose of some collection of data of various types. A blog, for example, can include text, still images, sound, and video. Each piece of the blog may be stored in a separate file and each file tagged with its DCMI Type, but nothing tells me that it's a blog.
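To make the gap concrete, here is a minimal Python sketch. The file names are made up for illustration; the values are terms from the DCMI Type Vocabulary. Each piece of a hypothetical blog gets its own DCMI Type, yet gathering all of those tags together still yields nothing that says “blog.”

```python
# Hypothetical files making up one blog; the names are illustrative only.
# The values are terms from the DCMI Type Vocabulary.
blog_files = {
    "post-001.html": "Text",
    "cat-photo.jpg": "StillImage",
    "podcast-12.mp3": "Sound",
    "demo-clip.mpg": "MovingImage",
}


def types_present(files):
    """Return the sorted set of DCMI Type terms used across the files."""
    return sorted(set(files.values()))


summary = types_present(blog_files)
print(summary)  # describes the kinds of data, not what the collection is
```

The summary lists kinds of data only. Nothing in it distinguishes a blog from, say, a tutorial that happens to mix the same media.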

I doubt we'd need more than twenty terms in a controlled vocabulary for “literary form.” (We really need a distinct technical term for this; I'd like to use “genre” but that's considered a synonym for “type” in the DCMI definitions.) . . .
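As a rough sketch of what such a vocabulary might look like in practice, the Python below defines a handful of terms and uses them to drop online stores from a list of results. Everything here is an assumption made for illustration: the `LiteraryForm` name, the term spellings, the `form` field, and the example URLs are all hypothetical, and the six terms are simply the examples mentioned earlier in this post, not a proposed standard.

```python
from enum import Enum


class LiteraryForm(Enum):
    """Hypothetical controlled vocabulary; terms are this post's own examples."""
    FAQ = "faq"
    ONLINE_STORE = "online-store"
    BLOG = "blog"
    TUTORIAL = "tutorial"
    FORUM = "forum"
    PHOTO_GALLERY = "photo-gallery"


def parse_form(tag):
    """Map a free-text tag to a vocabulary term, or None if it is not listed."""
    normalized = tag.strip().lower().replace(" ", "-")
    try:
        return LiteraryForm(normalized)
    except ValueError:
        return None


def exclude_form(items, unwanted):
    """Keep every item whose declared form is anything other than `unwanted`."""
    return [item for item in items if parse_form(item["form"]) is not unwanted]


# Illustrative search results; the "form" field is an assumed piece of metadata.
results = [
    {"url": "http://example.com/shop/widget", "form": "Online Store"},
    {"url": "http://example.com/widget-howto", "form": "tutorial"},
    {"url": "http://example.org/widget-faq", "form": "FAQ"},
]

kept = exclude_form(results, LiteraryForm.ONLINE_STORE)
print([item["url"] for item in kept])
```

Note that an item with an unrecognized tag passes through the filter untouched; whether unknown forms should be kept or dropped is a design choice this sketch does not settle.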

Even with a controlled vocabulary agreed upon and in the can, there remains the problem of how to apply the category tags to a useful number of Web items. Nobody said it would be easy. I'm throwing all this out just to keep the subject in play. I have a couple of ideas of how to do this (drawing upon my ancient plan for world domination called Aardmarks) but I'm getting tired of the subject and want to spend some time on other things. [Jeff Duntemann's ContraPositive Diary]
