Friday, October 11, 2013
It Liiives, It Liiives Maaaster!
Well, like a Frankenstein rising to haunt the village, I'm going to attempt to bring this blog back to life. This is inspired by two recent events: first, my job has folded up and gone away, and second, there's been some interesting new discussion on the still-living XML Dev list. So this might turn into an attempt to get myself some kind of attention job-wise, or maybe a way to monetize page views (that would be something!), or maybe, most likely, just more ramblings about metadata, modern data management and the web.
The new discussion on XML Dev has centered on modelling. It was sparked by a query as to whether XML Schema can be used to model data in general. As someone who has always seen a need for a higher level view of data than anything XML or DDL or even logical ER diagrams could ever provide, this struck me as about as giant a step in the wrong direction as one could wish to make. What surprised me a little was that I found myself sketching out a view of the world where metadata, and metadata about metadata, could be used to dynamically generate models of any flavor and more. I was even bold enough to propose that this could now be done at "Enterprise" scale, something I would not have argued for as recently as two years ago. The change is that graph databases such as Neo4j, Titan and others are now mature enough to risk the business on. So now you have a way to store your metadata in something akin to its natural habitat. Even better, you can visualize the data, and you can scale the whole thing out to span terabytes and petabytes should the need arise. This is game changing. Ten years ago I designed a system driven purely by metadata. Given the technology of the time and the resources available, this got built out on top of a relational DB. The result worked, but it was hard work to make it scale to even the terabyte range. I believe that I could now go back and rebuild the guts of that system with a quarter of the resources and end up with a result that scaled pretty much as far as one cared to drive it.
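To make "metadata and metadata about metadata" a little more concrete, here is a minimal sketch in Python. It is not the system described above and it does not use Neo4j or Titan; it just keeps a toy property graph in memory. The class `MetadataGraph`, the edge labels `INSTANCE_OF` and `HAS_ATTRIBUTE`, and the `Customer` entity are all made up for illustration.

```python
from collections import defaultdict


class MetadataGraph:
    """Toy in-memory property graph: nodes with properties, labelled directed edges."""

    def __init__(self):
        self.nodes = {}                 # node id -> property dict
        self.edges = defaultdict(list)  # node id -> [(label, target id), ...]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = props

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def neighbours(self, source, label):
        """Follow every outgoing edge of `source` that carries `label`."""
        return [target for (lbl, target) in self.edges[source] if lbl == label]


def build_example():
    g = MetadataGraph()
    # Metadata about metadata: the kinds of thing a model may contain.
    g.add_node("Entity", kind="metatype")
    g.add_node("Attribute", kind="metatype")
    # Metadata: a hypothetical Customer entity with two attributes.
    g.add_node("Customer", kind="entity")
    g.add_node("Customer.id", kind="attribute", datatype="integer", required=True)
    g.add_node("Customer.email", kind="attribute", datatype="string", required=False)
    g.add_edge("Customer", "INSTANCE_OF", "Entity")
    for attr in ("Customer.id", "Customer.email"):
        g.add_edge(attr, "INSTANCE_OF", "Attribute")
        g.add_edge("Customer", "HAS_ATTRIBUTE", attr)
    return g


def describe(g, entity):
    """Traverse the graph to get a model-neutral description of one entity."""
    return [
        (attr.split(".", 1)[1], g.nodes[attr]["datatype"], g.nodes[attr]["required"])
        for attr in g.neighbours(entity, "HAS_ATTRIBUTE")
    ]


graph = build_example()
print(describe(graph, "Customer"))
# [('id', 'integer', True), ('email', 'string', False)]
```

Nothing in `describe` knows about XML, SQL or ER notation. It only walks edges, which is why the same stored metadata could feed a generator for any of those flavors.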
What would one do with such a system, you ask? (Well, that's probably a polite way to put it...) You'd use it to describe the data inside your Enterprise down to the deployment level and to build up the relationships of the actual data. The actual data can now reside in Cassandra, or HBase, or Berkeley DB, or Redis, or Riak, or even your traditional relational database, as you might desire. Under the covers, the indexing used to find primary entities in the graph might be Lucene or Elasticsearch, but from there graph traversal directly discovers your data with minimal overhead, and join optimization becomes a thing of the past. This is the nimble startup architecture all grown up and ready to take over an existing Enterprise. Of course that won't happen overnight, so at some point you may use the metadata to create XSD, DDL, WSDL, or other pieces of the alphabet soup of standards that your existing systems currently use to talk to each other. That too can be driven by the metadata as needed.
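As a rough illustration of driving the alphabet soup from metadata, the sketch below turns one entity description into both a DDL statement and an XSD fragment. The metadata layout (a plain dict), the `customer` entity and the two type-mapping tables are assumptions made up for this example; a real system would read the description out of the graph and would need a far richer type mapping.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical entity description, as it might come out of a metadata store.
CUSTOMER = {
    "name": "customer",
    "attributes": [
        {"name": "id", "datatype": "integer", "required": True},
        {"name": "email", "datatype": "string", "required": False},
    ],
}

# Assumed mappings from abstract datatypes to each target dialect.
SQL_TYPES = {"integer": "INTEGER", "string": "VARCHAR(255)"}
XSD_TYPES = {"integer": "xs:integer", "string": "xs:string"}


def to_ddl(entity):
    """Render the entity as a CREATE TABLE statement."""
    columns = []
    for attr in entity["attributes"]:
        not_null = " NOT NULL" if attr["required"] else ""
        columns.append(f"  {attr['name']} {SQL_TYPES[attr['datatype']]}{not_null}")
    return f"CREATE TABLE {entity['name']} (\n" + ",\n".join(columns) + "\n);"


def to_xsd(entity):
    """Render the same entity as an XML Schema element declaration."""
    lines = [
        '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">',
        f'  <xs:element name={quoteattr(entity["name"])}>',
        "    <xs:complexType>",
        "      <xs:sequence>",
    ]
    for attr in entity["attributes"]:
        min_occurs = "1" if attr["required"] else "0"
        lines.append(
            f'        <xs:element name={quoteattr(attr["name"])} '
            f'type="{XSD_TYPES[attr["datatype"]]}" minOccurs="{min_occurs}"/>'
        )
    lines += [
        "      </xs:sequence>",
        "    </xs:complexType>",
        "  </xs:element>",
        "</xs:schema>",
    ]
    return "\n".join(lines)


print(to_ddl(CUSTOMER))
xsd = to_xsd(CUSTOMER)
ET.fromstring(xsd)  # raises ParseError if the generated XSD is not well-formed XML
print(xsd)
```

The point is only that the DDL and the XSD are two renderings of one description, so a change to the metadata shows up in both without anyone hand-editing either.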
Ultimately, what you want is a system where decisions about what physical deployment and IO characteristics are needed for a given piece of data are no longer made by humans. A marketing manager sketches out a proposal for a new campaign, the description of that campaign gets fleshed out, and analysts identify the important pieces of data. Automagically, deep within the cloud / data center, a couple of terabytes shift from slow cheap drives to SSD, new indexes get built, some form of web descriptors / metadata for RESTful services gets built and handed off to existing front end generation, screens get generated, artistic assets get merged and the campaign is live. Conception to completion might be measured in hours or days, but certainly not weeks.
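The planning step in that scenario can be sketched very roughly as follows. Everything here is hypothetical: the `Dataset` and `Campaign` shapes, the two storage tiers and the action names are invented for illustration, and the sketch only decides what should happen; it does not move any data or build any index.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    name: str
    tier: str  # "hdd" (slow, cheap) or "ssd"
    indexed_fields: set = field(default_factory=set)


@dataclass
class Campaign:
    name: str
    needs: dict  # dataset name -> set of fields the campaign will query on


def plan(campaign, catalog):
    """Compare what the campaign needs with what the catalog says exists,
    and return the actions required before the campaign can go live."""
    actions = []
    for ds_name, wanted in sorted(campaign.needs.items()):
        ds = catalog[ds_name]
        if ds.tier != "ssd":
            actions.append(("move_to_ssd", ds_name))
        for missing in sorted(wanted - ds.indexed_fields):
            actions.append(("build_index", f"{ds_name}.{missing}"))
    return actions


catalog = {
    "orders": Dataset("orders", tier="hdd", indexed_fields={"order_id"}),
    "customers": Dataset("customers", tier="ssd", indexed_fields={"customer_id", "region"}),
}
campaign = Campaign(
    "autumn_promo",
    needs={"orders": {"order_id", "sku"}, "customers": {"region"}},
)
print(plan(campaign, catalog))
# [('move_to_ssd', 'orders'), ('build_index', 'orders.sku')]
```

The humans only describe the campaign and the data it touches; the list of physical actions falls out of comparing that description with the metadata already on file.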
I know there are a dozen missing pieces for this to become reality for everyone, but I'm pretty sure some companies are working on making it happen. If yours isn't one of them, you may end up lost in the dust...