Irregular Observations

Friday, October 11, 2013

It Liiives, It Liiives Maaaster!

Well, like a Frankenstein rising to haunt the village, I'm going to attempt to bring this blog back to life. This is inspired by two recent events: first, my job has folded up and gone away, and second, there's been some new interesting discussion on the still-living XML Dev list. So this might turn into an attempt to get myself some kind of attention job-wise, or maybe a way to monetize page views (that would be something!), or maybe, most likely, just more ramblings about metadata, modern data management and the web.

The new discussion on XML Dev has centered on modelling. It was sparked by a query as to whether XML Schema can be used to model data in general. As someone who has always seen a need for a higher level view of data than anything XML or DDL or even a logical ER diagram could ever provide, this struck me as about as giant a step in the wrong direction as one could wish to make. What surprised me a little was that I found myself sketching out a view of the world where metadata and metadata about metadata could be used to dynamically generate models of any flavor and more. I was even bold enough to propose that this could now be done at "Enterprise" scale, something I would not have argued for as recently as two years ago.

The change is that graph databases such as Neo4j, Titan and others are now mature enough to risk the business on. So now you have a way to store your metadata in something akin to its natural habitat. Even better, you can visualize the data, and you can scale the whole thing out to span terabytes and petabytes should the need arise. This is game changing. Ten years ago I designed a system driven purely by metadata. Given the technology of the time and the resources available, this got built out on top of a relational DB. The result worked, but it was hard work to make it scale to even the terabyte range. I believe that I could now go back and rebuild the guts of that system with a quarter of the resources and end up with a result that scaled pretty much as far as one cared to drive it.

What would one do with such a system, you ask? (Well, that's probably a polite way to put it...) You'd use it to describe the data inside your Enterprise down to the deployment level and to build up the relationships of the actual data. The actual data can now reside in Cassandra, or HBase, or Berkeley DB, or Redis, or Riak, or even your traditional relational database as you might desire. Under the covers, indexing into the graph to find primary entities might be Lucene or Elasticsearch, but from there graph traversal directly discovers your data with minimal overhead, and join optimization becomes a thing of the past. This is the nimble startup architecture all grown up and ready to take over an existing Enterprise. Of course that won't happen overnight, so at some point you may use the metadata to create XSD, DDL, WSDL, or other pieces of the alphabet soup of standards that your existing systems currently use to talk to each other. That too can be driven by the metadata as needed.

Ultimately, what you want is a system where decisions about what physical deployment and IO characteristics are needed for a given piece of data are no longer made by humans. A marketing manager sketches out a proposal for a new campaign, the description of that campaign gets fleshed out, and analysts identify the pieces of important data. Automagically, deep within the cloud / data center, a couple of terabytes shift from slow cheap drives to SSD, new indexes get built, some form of web descriptors / metadata for RESTful services gets built and handed off to existing front end generation, screens get generated, artistic assets get merged and the campaign is live. Conception to completion might be measured in hours or days, but certainly not weeks.

I know there are a dozen missing pieces for this to become reality for everyone, but I'm pretty sure some companies are working on making it happen. If yours isn't one of them you may end up lost in the dust...
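
To make the "driven by the metadata" idea a little more concrete, here is a minimal sketch in Python of generating both DDL and an XSD fragment from one shared description. Everything in it is an assumption made for illustration: the `Campaign` entity, its attributes, the type mappings and the function names are all invented, and a plain dictionary stands in for what would really live in a graph database.

```python
# Hypothetical metadata: one entity with a few attributes. In the system
# described above this would be read from a graph store, not hard-coded.
METADATA = {
    "Campaign": [
        {"name": "campaign_id", "type": "integer", "required": True},
        {"name": "title", "type": "string", "required": True},
        {"name": "start_date", "type": "date", "required": False},
    ],
}

# Assumed mappings from the abstract types to each target notation.
SQL_TYPES = {"integer": "INTEGER", "string": "VARCHAR(255)", "date": "DATE"}
XSD_TYPES = {"integer": "xs:integer", "string": "xs:string", "date": "xs:date"}


def to_ddl(entity, attrs):
    """Render a CREATE TABLE statement from the attribute metadata."""
    cols = []
    for a in attrs:
        null = " NOT NULL" if a["required"] else ""
        cols.append(f"  {a['name']} {SQL_TYPES[a['type']]}{null}")
    return f"CREATE TABLE {entity} (\n" + ",\n".join(cols) + "\n);"


def to_xsd(entity, attrs):
    """Render an XSD element declaration from the same metadata."""
    lines = [
        f'<xs:element name="{entity}">',
        "  <xs:complexType>",
        "    <xs:sequence>",
    ]
    for a in attrs:
        min_occurs = "1" if a["required"] else "0"
        lines.append(
            f'      <xs:element name="{a["name"]}" '
            f'type="{XSD_TYPES[a["type"]]}" minOccurs="{min_occurs}"/>'
        )
    lines += ["    </xs:sequence>", "  </xs:complexType>", "</xs:element>"]
    return "\n".join(lines)


if __name__ == "__main__":
    for entity, attrs in METADATA.items():
        print(to_ddl(entity, attrs))
        print(to_xsd(entity, attrs))
```

The point of the sketch is only that the table definition and the schema fragment are both derived views; neither is the model itself.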

Friday, August 01, 2008

All right then; Opaque URI's it is

At first it would appear maybe not:

http://knol.google.com/k/jon-awbrey/semeiotic/3fkwvf69kridz/4#

However, the reduced form:

http://knol.google.com/k/-/-/3fkwvf69kridz/4#

is also being supported.
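
As a rough illustration of the relationship between the two forms, here is a small Python sketch that rewrites the first URL into the second. It assumes, based on nothing more than the two examples above, that the two path segments following `/k/` are human-readable labels that can be replaced with `-` while the opaque identifier that follows does the real work; the function name is made up.

```python
def reduce_knol_url(url):
    """Replace the two readable segments after /k/ with '-' placeholders.

    Assumption: the URL has the shape .../k/<author>/<title>/<opaque-id>/...
    as in the two examples above.
    """
    prefix, sep, rest = url.partition("/k/")
    if not sep:
        raise ValueError("expected a URL containing /k/")
    parts = rest.split("/")
    if len(parts) < 3:
        raise ValueError("expected author, title and identifier segments")
    parts[0] = "-"  # author label
    parts[1] = "-"  # title label
    return prefix + sep + "/".join(parts)


if __name__ == "__main__":
    print(reduce_knol_url(
        "http://knol.google.com/k/jon-awbrey/semeiotic/3fkwvf69kridz/4#"
    ))
```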

This of course makes sense if your intention is to become a global authority; automated translation anyone?

Now that issue has been put to rest (pun intended, the implications should be obvious) the next bogeyman is how to model the resultant mess. One could argue that this is a beast all unto itself, but if you ask me, approaching this from the intersection of database modeling, code modeling, and semantic structuring (the web schema world) will prove the most useful in the long run. I have my own biases but that's a topic for another day. In the meantime one can only wonder if this marks the point in history that will eventually be looked on as the dawn of the first machine intelligence? Bwahahahaha......

Thursday, November 17, 2005

Would a:rose by any other URN smell as sweet?

Here I continue with the questions raised in the previous post. My answer to this question is basically summed up in this xml-dev post. Reading between the lines, the answer to the subject line is “maybe”. If it’s a different URN it’s a different rose; they may not smell the same at all.

Naming things distinctly is crucial to the REST camp’s vision of the world. If you can’t name something you can’t identify it and retrieve it. However, this means we need a lot of URNs. Not only do we need a lot of URNs, we need to have some way of understanding them. An identifier might be opaque (what does id=1234567687 mean?), but once you start naming things you really want to know the semantics of the names. If you don’t know the semantics you need some way of discovering them. If I’m looking for roses I need to know that Rosa and Rose may mean the same thing.

Alternatively, you don’t name everything. You serve up bigger blobs that people can do discovery inside of. When looking for roses you’ve got to know to look for florists first. Either way, discovery is part of the future of the semantic web. That seems to mean that pure REST isn’t possible in the broadest sense, but I wonder if anyone ever thought it would solve all the world’s problems?

More thoughts on discovery and naming from xml-dev.

Wednesday, April 20, 2005

Separated at birth

So what do you think, is this just coincidence? The implications of it being one and the same person are "interesting". Maybe there's truth to the idea that Microsoft is "evil"? Even worse, is there perhaps an "evil" penguin trying to penetrate the Linux world?

For the most part humans aren't going to have trouble deciding these are really two different people. However, computers don't have it so easy. Part of the solution may lie in ontologies. Ontologies tell us how everything is related to each other. Once you know something is related to something you have a clue as to whether two things are the same thing: are they related to the same things? If not, they can't be the same thing. If they are related to the same things, well, we still may not know for sure. Conversely, we can't build ontologies if we don't agree on identities. Ontologies are work: people have to build them and agree to them. Identifying things for ontologies seems even harder.
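
The test described above can be written down as a few lines of Python. This is only a sketch of the rule as stated, not a real ontology tool: the identifiers, the relation names and the `compare_identity` function are all invented for illustration, and it takes the rule at face value by treating any difference in relations as proof of difference, which quietly assumes we know every relation each thing has.

```python
# Hypothetical facts: each identifier maps to the set of
# (relation, target) pairs we believe hold for it.
RELATIONS = {
    "person:A": {("worksFor", "org:X"), ("livesIn", "city:Y")},
    "person:B": {("worksFor", "org:X"), ("livesIn", "city:Y")},
    "person:C": {("worksFor", "org:Z")},
}


def compare_identity(a, b, relations=RELATIONS):
    """Apply the rule above: different relations rule identity out,
    identical relations only leave the question open."""
    related_a = relations.get(a, set())
    related_b = relations.get(b, set())
    if related_a != related_b:
        return "different"
    return "possibly the same"


if __name__ == "__main__":
    print(compare_identity("person:A", "person:B"))
    print(compare_identity("person:A", "person:C"))
```

Note that the function can never answer "the same". That asymmetry is the whole problem.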

Don't look to WWW URIs for help on this issue. In particular, don't ever assume that you really even know what lies at the end of a URI. It's a resource, but there really isn't a concrete definition of a "resource". That's mostly by design: the WWW doesn't think URIs in themselves hand out guarantees. I can buy into this, but if we can't know that two resources are the same then building ontologies becomes even more complex.

So is there a solution on the horizon? Not that I can see (pun intended...)

Friday, November 05, 2004

This is my brother Bubba, and this is my other brother Bubba

One lazy summer morning several years after I moved to Memphis I was out mowing the lawn when a beat up red station wagon sputtered to a stop in front of my house. A young woman wearing blue jean shorts and a tube top emerged, followed by a very large fellow wearing overalls with no shirt underneath. A moment later another even larger fellow exited the back seat. (I can't recall what he was wearing.) As I wandered over to the car the young lady dispatched the first fellow with the statement "Bubba, go check that we're really out of gas". I volunteered that I had gas for the lawn mower across the yard. This time the lady fixed her eye on the other fellow and told him "Bubba, go help him carry the gas over here". When I questioned whether both fellows were in fact named Bubba I was told that neither was really named Bubba but both her brothers were called Bubba. I don't know about you, but that certainly cleared up any confusion I might have had...

Having done data modeling for many years I can't help but wonder whether the family tree in this particular case was in fact acyclic. I suspect neither Bubba would have known what I meant by such a query, but both of them would have been able to tell me how they "was related to each other".

The reason for my rambling on about this here is to point at an ongoing discussion about relationships known as xml-dev. This mailing list is ostensibly about XML related issues. Turns out you can't really get very far in talking about XML before you get to talking about how things are related to each other. To me, it is this underlying theme to xml-dev that is the most important reason to read the list. Ultimately, what we do with information systems is to attempt to relate pieces of data together so that we can get some better understanding of the data. Relational databases, Resource Description Framework (RDF), ontologies, the semantic web, triples, tuples: all different ways of getting at the same issue.

As posts turn up on xml-dev that help to illuminate this central issue I will on occasion point to them here:

The first of these is on the issue of dynamically generated schema. The conclusion I arrived at as I worked through this discussion is that currently there is no good way to model all the different ways that things are related to each other.

The second thread is even longer and more dispersed. It started as a discussion on what standards needed attention after XQuery but much of the discussion ended up centered on the theme of how do we relate things to each other. My conclusion this time is that the ultimate goal of the semantic web is admirable, but a long way from fruition.

Tuesday, August 31, 2004

Evidence of brain tampering by space aliens

Friday I had LASIK. I really hadn’t had all that much time to contemplate the implications of the process before I did it. We have two children, 19 months and 5 years, and between home and work, time for reflection is rare. Saturday we attended a dinner party and after a couple of conversations where I described the procedure I got to thinking about the whole process. Some guy, who I’d previously met for all of 2 minutes, took tiny little diamond-coated knives and cut open my eyeballs and then proceeded to burn off parts of the insides of them with a high-powered laser. Not only did I volunteer for this process, but I paid good money for it. Perhaps even stranger, I’m happy with the results.