Handling legacy data

I have a new client these days:  a small division of a major publisher.  This division represents several different “imprints” that the company has developed or acquired over the years.  Currently, those many imprints have their Web presence scattered over a handful of hosting providers using different content management and e-commerce systems.  The idea is to consolidate them so that there is one hosting provider and a single CMS/e-commerce solution.  The primary driver here is the employees rather than upper management.  Because this is a large company, and a publishing company in particular, there is some resistance to change, so we have to move carefully.

In a discussion with the client today, one of their corporate IT managers (a big champion of this effort) discouraged us from trying to use a Web service to interface our new systems directly with the existing back end database that handles the manual order processing and accounting systems.  Those are old mainframe databases, with the typical legacy restrictions like limited field widths, upper-case characters, and inconsistent field usages that have evolved over the years.  The reason?  To paraphrase his observation:  “Connecting a Web service to our back end database is the fastest way to spread bad data throughout the organization.”  In this case, “bad” doesn’t necessarily imply “wrong.”  Just poorly formatted and inconsistent.

The answer is to create a middle tier that collects data from multiple legacy sources, uses heuristics and a healthy bit of human intervention to determine which of the many possibly conflicting pieces of information is authoritative, and then stores that data in a format that downstream systems can use.  You can then rewrite or eliminate some of the legacy systems over time without affecting the downstream applications.  On the surface this might look like an expensive way to go, but it’s much less painful and more reliable than trying to reconcile conflicting data at the application end, or changing the legacy system’s data storage rules to accommodate new systems.
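
To make the idea concrete, here is a minimal sketch of the reconciliation step in such a middle tier.  Everything in it is an assumption for illustration rather than the client's actual design:  the source names ("accounting", "orders"), the trust ranking, and the rule that a value is accepted automatically only when every disagreeing record comes from a less trusted source.  Anything else is handed to a person.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional

# Hypothetical trust ranking for the legacy sources; a higher number wins.
SOURCE_PRIORITY = {"accounting": 2, "orders": 1}


@dataclass
class LegacyValue:
    source: str    # which legacy system supplied the value
    raw: str       # the field exactly as stored (padded, upper-case, etc.)
    updated: date  # when the legacy record was last changed


@dataclass
class Resolution:
    value: Optional[str]  # the authoritative value, or None if undecided
    needs_review: bool    # True when a human has to make the call
    candidates: List[str] = field(default_factory=list)


def clean(raw: str) -> str:
    """Strip fixed-width padding and collapse runs of whitespace."""
    return " ".join(raw.split())


def rank(v: LegacyValue):
    """Order records by source trust first, then by recency."""
    return (SOURCE_PRIORITY.get(v.source, 0), v.updated)


def reconcile(values: List[LegacyValue]) -> Resolution:
    usable = [v for v in values if clean(v.raw)]
    if not usable:
        return Resolution(None, needs_review=True)

    best = max(usable, key=rank)
    best_key = clean(best.raw).casefold()
    # Records that still disagree after ignoring case and padding.
    rivals = [v for v in usable if clean(v.raw).casefold() != best_key]
    candidates = sorted({clean(v.raw) for v in usable})

    if not rivals:
        # Every source agrees once formatting differences are ignored.
        return Resolution(clean(best.raw), False, candidates)

    top_rival = max(SOURCE_PRIORITY.get(v.source, 0) for v in rivals)
    if SOURCE_PRIORITY.get(best.source, 0) > top_rival:
        # The heuristic is confident: a more trusted source outranks the rest.
        return Resolution(clean(best.raw), False, candidates)

    # Equally trusted sources disagree, so queue the field for a person.
    return Resolution(None, True, candidates)
```

With this sketch, "ACME  BOOKS   " from the order system and "Acme Books" from accounting resolve to "Acme Books" without review, while two order records reading "ACME BOOKS" and "ACME PRESS" are flagged for a human.  Downstream applications would read only the resolved values, which is what lets the legacy systems be replaced later without touching them.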