Cross posted from AIIM …
Inspired by a recent project. Explained to my (almost) 12yr old. Guess who “got it” quicker.
Posted on February 17, 2012 by christianpwalker
Category: General, Metadata, Search
Tags: query, screw, search, steve
Try telling that to Windows drive users with no restrictions! I wish!
Yes, good explanation.
However, am I being curmudgeonly if I point out that the actual storage of the nail is still in ‘folders’ – aisles – and is near things of similar ‘content’ – type of hardware? And that the retrieval engine – the person – has been trained in how to use metadata, has a map in their head, and (usually) a lot of experience retrieving? And that the nail has had a lot of metadata attached to it before it got to the shelf? My curmudgeonly self wants information creators and retrievers to know that retrieval does not happen by magic, but only after a long process of identifying useful terms and tagging ‘nails’ with those terms. And that they need to make good decisions about applying metadata when they create their stuff or ‘nails’. 🙂
I wouldn’t say you’re being curmudgeonly.
You’re correct that there would be a lot of training required on how to tag content and how to search for content. I don’t actually agree that you’d still have folders since if content is properly tagged it can be chucked into a big bucket and retrieved at will.
I’m not actually advocating getting rid of folders altogether, since I think they do have their place. However, they are not a substitute for good information architecture and tagging. Think about how you would create and expose a folder structure for tens of thousands of insurance claim files. If you’re one of the people needing to go through content across multiple claims, it’s not practical to traverse a hierarchy that’s 6+ levels deep and has 30K or 40K nodes at the lowest level. Think about an ediscovery or FOI request; would you rather do a metadata search or browse folders?
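To make the contrast concrete, here is a minimal sketch (claim numbers and field names are hypothetical) of what the "big bucket" approach looks like: one metadata query spans every claim at once, with no hierarchy to traverse.

```python
# A flat "big bucket" of documents, each carrying its own metadata.
claims_bucket = [
    {"doc_id": 1, "claim_no": "C-1001", "region": "West", "type": "appraisal"},
    {"doc_id": 2, "claim_no": "C-1002", "region": "East", "type": "appraisal"},
    {"doc_id": 3, "claim_no": "C-1001", "region": "West", "type": "invoice"},
]

def metadata_search(bucket, **criteria):
    """Return every document matching all of the given metadata criteria."""
    return [d for d in bucket if all(d.get(k) == v for k, v in criteria.items())]

# One query crosses all claims at once -- no 6-level folder tree to walk:
appraisals = metadata_search(claims_bucket, type="appraisal")
```

The equivalent folder-browsing exercise would mean opening every claim's folder in turn just to see whether it contains an appraisal.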
OK, for a single screw that really might work.
But what if I requested an (electronic) employee file? Wouldn’t folders come in very handy? Without them, we’d need strictly enforced, controlled vocabularies. So we’d exchange the folder structure (an aggregation) for another constraining construct.
Best to Canada!
I think an employee file is the wrong example in this case; check out the Shirky piece that Peter responded with. Employee files and the like are probably great examples of where a folder paradigm is valid. There’s only one entity at the centre (the employee), the content scope is known, the vocabulary is simple, etc. I think the biggest issue then becomes how unwieldy the folder structure / hierarchy is.
Where folders are horrible is in true knowledge-driven endeavours. Half the time when I’m looking for stuff I don’t know exactly what it is I’m looking for; I just kind of know it is sort of like something. If I had to rely on folders I would be screwed.
When we’re referring to a domain that is well understood, finite, defined (taxonomy / vocabulary) and/or heavily regulated, I think folders are fine. They’re even better when we can leverage the folder construct to automate the application of metadata to the content that resides within.
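The "leverage the folder construct to automate metadata" idea can be sketched very simply (the path layout and field names below are hypothetical): when a hierarchy follows a well-known convention, its segments can be read back as metadata fields.

```python
# A minimal sketch: derive metadata automatically from a conventional
# folder path laid out as /department/year/doc-type/filename.

def metadata_from_path(path):
    """Split a well-known path layout into metadata fields."""
    _, department, year, doc_type, filename = path.split("/")
    return {
        "department": department,
        "year": int(year),
        "doc_type": doc_type,
        "filename": filename,
    }

meta = metadata_from_path("/claims/2012/appraisal/C-1001.pdf")
```

Content dropped into the right folder thus gets tagged for free, which is exactly why folders work well in finite, well-understood domains.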
I’d love to get your thoughts about the Shirky piece.
Well, I have read the Shirky piece in the meantime, and admittedly, it offers quite a few interesting aspects.
On the other hand, finding the weak points is also easy. One could start with its strong reliance on del.icio.us. As we know, that service was almost shut down by Yahoo during one of their numerous cost-cutting efforts. So history proved, between then (2005) and now, that the concept of social tagging wasn’t as successful as the author suggested, or at least expected it to become. In the end, could it be that Mr. and Mrs. User have simply… been too lazy to tag?
Or that he re-iterates the IS-A discussion. Mamma mia, we had that already with object orientation back in the ’90s. He could have read the answer in the landmark 1994 Design Patterns book by the “Gang of Four” (still ranked #2574 on Amazon in 2012): http://www.amazon.com/Design-Patterns-Elements-Reusable-Object-Oriented/dp/0201633612/ref=sr_1_1?ie=UTF8&qid=1330171224&sr=8-1. It defines the IS-A relationship as a static, “design-time” decision of a system, whereas dynamic relationships between two entities are considered “runtime”, late decisions. Transposed to the content world, IS-A is likely a pre-determined classification (by SMEs and/or aggregated ratings), whereas individual tagging and search results could be “runtime”, late decisions.
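The design-time/runtime distinction is easy to see in code. A minimal sketch (class and tag names hypothetical): IS-A is fixed when the class is written, while tags attach and detach freely while the program runs.

```python
# Design-time decision: every Report IS-A Document, fixed when the code
# is written and not changeable without a redesign.
class Document:
    pass

class Report(Document):
    pass

report = Report()

# Runtime decision: tags come and go freely while the system is live.
tags = {"quarterly", "finance"}
tags.discard("finance")
tags.add("draft")

is_a = isinstance(report, Document)   # always True for a Report
```

Pre-classification behaves like the inheritance line; social tagging behaves like the mutable set.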
But boiling the criticism down to these facts would be fairly cheap. Rather, I think the author is pointing straight to where – IMHO – ECM really sucks (likely without intending or noticing it). Shirky is as stringent as an army officer with his legitimate criticism of ontologies, thesauri, and pre-defined classification schemes in general; but at the same time he juggles around like a clown with terms such as URI and URL, calling them “addresses”. He does not provide any further definition of these terms in the article. Well, it happens that URLs (and more generally URIs) in fact ARE addresses and thus… subject to hierarchical classification schemes themselves!
Doubts? Then let’s look at your home address. If you live in a modestly civilized area, the postal service classification of your home might consist of: street name, number within that street, postal code (internationally enriched with a country code), and name of town. Now, any of these parameters could eventually change. European readers perhaps remember when Germany, back in 1993, had to expand its postal codes from 4 to 5 digits due to re-unification and to keep pace with modern logistics demands. It became an IT effort comparable only to Y2K. If somebody had lived, say, at Kennedy Allee 45, D-6000 Frankfurt, their new address after ’93 was Kennedy Allee 45, D-60596 Frankfurt. As one of the side effects of the new code scheme, the town name became unimportant.
I’ll convert this now to URIs.
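A minimal sketch of that conversion (the path layout is my own hypothetical choice, not a standard): each classification level of the postal address becomes one URI path segment, so a change at any level changes the whole "address".

```python
# Map the classification levels of a postal address onto URI path segments.
def address_to_uri(country, postal_code, town, street, number):
    segments = [country, postal_code, town, street.replace(" ", "-"), str(number)]
    return "/" + "/".join(s.lower() for s in segments)

# The Frankfurt example from above, before and after the 1993 code change:
old = address_to_uri("de", "6000", "frankfurt", "Kennedy Allee", 45)
new = address_to_uri("de", "60596", "frankfurt", "Kennedy Allee", 45)
# One classification level changed, and the whole path changed with it.
```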
Conclusion: any address actually represents one single path within a classification scheme. And therefore it is (as well explained by Shirky) subject to change.
I have studied the nature of URIs and HTTP for half a decade now, as a member of the RESTful community. While it is general practice to say that URIs shall be opaque from a consumer’s perspective (i.e. consumers need not care how URIs are structured), it becomes very relevant to the creator how to structure such a system. For example, postal services could change the addressing scheme radically and base it on a global coordinate system. You would still get your snail mail; however, the postal services’ motivation for such a step would likely be that they had introduced a super-trouper satellite-navigation-based dispatching system throughout their logistics chain, built on a world-wide coordinate classification scheme.
In other words, Shirky is prescribing a medicine that is based on the illness he wants to cure: pre-classification. Somebody already had to classify the thing (resource) that others subsequently embellish with tags.
I like to be practical, so look at this very blog entry: at the time of writing, its URL is christianpwalker.wordpress.com/2012/02/17/search-metadata-and-bye-bye-folders/. So friendly Chris provides us with a lot of context already in the address. He shows that the blog entry is well classified/addressed (and it is!) on the wordpress host (so it must be a blog), provides us with his name (sub-domain), the publishing date (dates tend to be extremely stable classification elements), and the title of the publication. Maybe one day he moves on to another provider, but I’m fairly sure that “/2012/02/17/search-metadata-and-bye-bye-folders” will survive as the URI path element (if the future provider supports this scheme); only the hostname will need to change (unfortunately).
Of course, dear Chris could have been very unfriendly, by saying http://www.somewhere.com/a/id=456789. See, it makes a difference. But that’s exactly what most ECM systems offer. They’re so selfish… and offer us their (private) primary key as part of the URL (actually even worse, as query parameter). When such content gets moved to a new repository, it gets a new URI with a new ID included, because the next system will likely be selfish, too. And here it is, our address change.
There has been a lot of talk about silos recently, but I have rarely come across a recipe featuring an old and extremely proven medicine: the address.
And here, I think, comes the tragedy of the ECM industry: there’s absolutely no notion of a long-term path to information chunks. 17 years ago (!), the W3C coined the term “resource” (the ‘R’ in URL, URI, etc.) because “document” had become too restrictive. A URI can essentially point to anywhere and anything (information resources, physical things like homes, or… nowhere). Well designed, it can become a long-term address to information, providing a lot of useful context to its consumers (incl. search engines). I’m simply depressed by how the ECM industry ignores the concept of good addresses, i.e. neutral, system-independent addresses, to information chunks. Obviously, it’s not in the vendors’ interest.
But back to Shirky’s article. As life proves over and over, rarely are things black or white; there’s always some gray. The other day, I tried to find a book which I had held in my hands some 10 years ago; I wanted to recommend it to somebody. Too bad I only remembered terms like “information architecture” and “users are lazy”. So I entered these terms plus “book” into Google. Well, I don’t need to tell you what a crappy hit list I got back. One reason is of course that Google does not know what I actually mean by “book” (a common problem with late-only classification). This improved massively at Amazon, where I could tell their search engine that I was looking for a printed book in English: they did a pre-classification for me. And they can do that without risk: it’s not as if Harry Potter could mutate a book into a dishwasher. Effectively, I chose one of their pre-defined search classes first, then searched and browsed within a few dozen hits.
I think that’s the key for any future information design: finding a system’s right blend of pre-classification, search, and (social) tagging. And boy, base it on a meaningful and long-term addressing scheme. Search engines will appreciate that too. And of course humans… who wants to find your home address within a numeric coordinate system? I don’t. Kennedy Allee 45, Frankfurt, Deutschland sounds much better to me.
There would be many more things to say about addresses and URIs, for instance, that YOU should own yours.
Where can I officially crowd-tag your article?
Tags: #infogov #e20 #information_architecture #REST #HTTP #URI #URL #IRI #search #eDiscovery #bigdata #tags #shirky #ECM #classification #IA_needs_standards
Didn’t find the book on Amazon either. They didn’t understand “lazy users”. Plus, I’m afraid the title is out of print anyway.
Twitter’s @kevinc2003 just gave up on #wcxm tag. He returned to #wem, for Web Experience Management, which conflicts with the West Edmonton Mall.
You moved a resource (e.g. a blog page) to some other location. Which is the HTTP return code to let your clients know the correct address to redirect the request to?
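For anyone who wants to check their answer: it's HTTP 301 ("Moved Permanently"), with the new address carried in the Location header. A minimal sketch (the target URL below is hypothetical):

```python
# Build the head of a 301 response, the status that tells clients a
# resource's correct new address via the Location header.
from http import HTTPStatus

NEW_LOCATION = "https://example.org/2012/02/17/search-metadata-and-bye-bye-folders/"

status = HTTPStatus.MOVED_PERMANENTLY
response_head = (
    f"HTTP/1.1 {status.value} {status.phrase}\r\n"
    f"Location: {NEW_LOCATION}\r\n"
    "\r\n"
)
print(response_head.splitlines()[0])   # HTTP/1.1 301 Moved Permanently
```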
Don’t make me trot out ye olde “Ontology is Overrated” link, yet again…
Oh ok, here it is: http://www.shirky.com/writings/ontology_overrated.html
Well, since you had the decency to trot it out, I guess I’ll have the decency to read it. 🙂
How indecent of you! 😉