Metadata, or information about information, is potentially boundless, yet can be harnessed to enable more effective business decisions.
by Loraine Lawson, IT Business Edge
The Business Edge's Loraine Lawson spoke with Evan Levy, a partner at Baseline Consulting and an instructor at The Data Warehousing Institute, about why metadata matters and the business problems it can create.
Lawson: Could you explain what metadata is?
Levy: The whole premise of metadata is, give me the information and context of the data that I'm looking at or want to use.
So you might get a five-digit ZIP code, might get a street address, but ultimately what you want to understand is, tell me about this data. Where did it come from? If it's a field in a database or field in the computer system, where is it located in that computer system? When was it created? So if I want to talk to someone about this information, I can describe what it is, because it's not really tangible.
I go to the grocery store and I ask for Red Delicious apples. When I hold it in my hand, what's metadata about a Red Delicious apple? Its type, its variety, how much it weighs, where did it come from, when was it picked off a tree - the same thing with data. Metadata is the information about the data.
Lawson: Now when you're designing Web pages and you put in metadata, you put it in the code. Is that how it works for a database too?
Levy: Actually, it's funny. There are several different ways that metadata is stored in technology. You just mentioned I could put metadata or comments in the actual Web page, but I think you have to consider one thing. And I'm being a little esoteric, so pardon me, metadata is the content or the information about the contents you're talking about.
Now, where it's stored is usually the challenge, because there aren't standards for how that information is always stored. It varies depending on the type of technology you're talking about. In a database, there's something called a dictionary, but realistically that information isn't always filled in.
That's actually part of the premise of master data management. Metadata is almost boundless. If you consider the concept of a Social Security number, one would say, well, what about the rules of who is allowed to look at a Social Security number? Someone might say, "Hey, that's metadata also." The premise of master data management is being able to decouple all those rules and details about data from where people typically store it, which is in a database or in an application, and in fact providing a mechanism for coupling that information with the data itself. Does that make sense?
Lawson: Yes. And do you want it coupled with the data or do you want it decoupled?
Levy: Usually you'll hear with master data management the whole premise of decoupling - that part is where applications are coupled to the data.
In fact, you would like metadata to be coupled or attached to the data itself. Let's go back to our apples example. You go buy a jar of applesauce. You want to know what the brand is. You want to know how much it weighs. And you want that right on the jar. You don't want to go look someplace else. I mean, how annoying is it that you can't look up the price of the item on the item?
The biggest challenge with metadata is that there's not a good way to attach that information about the content to the content. That's part of the rationale behind XML, by the way. Extensible markup language not only gives you the value, but it gives you all the details about the value. If you rip open a Web page and you see the HTML, if this is what you're referring to, it's, "Okay, here's the value five, and I have tags about what the font color is and all the other stuff that describes that five." You can also add other tags - where did it come from, who is responsible for it, security details and so forth.
The challenge is metadata tends to vary based upon the way it's delivered.
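To make Levy's XML point concrete, here is a minimal sketch of a value that carries its own descriptive details. It is only an illustration, not something from the interview: the element name, the attribute names (source, owner, created, access) and their values are all hypothetical, and it uses Python's standard xml.etree.ElementTree module to read them back.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the element and attribute names below are
# invented for illustration and do not come from any real schema.
document = (
    '<reading>'
    '<value source="billing_db" owner="finance_team" '
    'created="2008-01-15" access="restricted">5</value>'
    '</reading>'
)

root = ET.fromstring(document)
value_element = root.find("value")

# The data itself: the value five.
value = int(value_element.text)

# The metadata: details about the value, attached to the value.
metadata = dict(value_element.attrib)

print("value:", value)
for name, detail in metadata.items():
    print(f"  {name}: {detail}")
```

The value and the information about it travel together in one place, which is the "price on the jar" idea from the applesauce example. Nothing forces two systems to agree on the same attribute names, which is the variation Levy describes.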
Lawson: And does that create business problems? Or just technology issues?
Levy: Enormous. You know, the real issue is exacerbated by technology; it's not created by technology.
Because data is not tangible - you can't touch it, it's not physical - it's kind of hard to attach information to it. But the fact is, you see those problems all the time when someone fills out a form or prints a report: Where did this piece of paper come from? So, what happens, people as a business standard say, "You always have to put the date on the bottom left, the page number on the bottom right, who is responsible for it on there too." So you have all of these business conventions. And people sometimes follow them, sometimes they don't, but they can't always be enforced.
Lawson: So what kind of business problems does it create? Can you give some examples?
Levy: Sure, sure. There's a zillion of them. The first example is, so you want to attach a board to a tree outside. I need a screw. Okay, go get a drywall screw. But one screw is not like any other screw. You see a brass screw or a steel screw that won't rust. A drywall screw will. So you put the board on the tree and six months later, it rusts and the board falls off. Why? Because the information about what you were using wasn't available.
From a more practical perspective, when it comes to data and metadata as opposed to metadata about objects, Verizon launched a marketing campaign where they advertised they would sell long distance to New Jersey. What they didn't realize was those names weren't approved to be sold to; the data didn't have the metadata. It's really more of a business rule, but for all of those customers that opted into marketing, what they didn't know was the regulatory approval wasn't there. That's not data -- that's actually data about data.
There are other instances, and you see this more often than not, where someone comes to report and someone says, "Well, where did this come from?" It's fairly common in companies where people run reports from two different places, but there's no way of knowing that it was from two different places.
My favorite is when people want a cash report and an accrual report, but because it wasn't labeled correctly, they don't know that they're both accurate, but they show two different numbers.