Project Management from the trenches: Data Configuration Management

Over a period of decades, IT management has developed a mature discipline around Software Configuration Management (SCM). The growing challenge in IT is that data increasingly controls and affects software logic, yet it often sidesteps the rigorous change management controls that are in place for software.

What is Software Configuration Management?
Simply put, this is the process of tracking and controlling changes in software. Configuration management practices include revision control and the establishment of baselines. From a management perspective, this allows for control over what changes are made, how software versions differ, and how to roll changes back. A robust set of tools exists for managing changes, versioning software components, comparing source code, and automating builds.

The power of data
Users have always been able to change data via applications. However, there are certain categories of data that I would argue need to be managed with the controls already devised for software:

Business Logic
Configuring systems to take different courses of action based on data settings. What was once hard-coded can be exposed to end-user configuration.
Rules based systems
These systems are explicitly designed to formalize rules, generally eschewing code for data. It is an easy step to enable end users to edit the rules.
Templates
Templates affect display, formats, transformations, and any number of system inputs and outputs. XSLT stylesheets are one example.
Dynamic data
It is considered extremely chaotic to have code actually change code in real time. This rarely occurs on purpose outside academia. Code is typically considered static and unchanging. However, Configuration Data is easily changed in real time.
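
To make the point concrete, here is a minimal sketch of business logic driven by data. The `DiscountRule` type, the `discount_for` function, and the thresholds are all hypothetical, invented for illustration only. The code never changes between the two calls, yet the system behaves differently once someone edits the rule data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiscountRule:
    """One row of hypothetical configuration data: a threshold and a rate."""
    min_order_total: float
    discount_rate: float


def discount_for(order_total: float, rules: list[DiscountRule]) -> float:
    """Return the discount given by the highest threshold the order meets."""
    applicable = [r for r in rules if order_total >= r.min_order_total]
    if not applicable:
        return 0.0
    best = max(applicable, key=lambda r: r.min_order_total)
    return order_total * best.discount_rate


# The function above is the "code"; the lists below are the "data".
rules_before = [DiscountRule(100.0, 0.05)]
rules_after = [DiscountRule(100.0, 0.05), DiscountRule(50.0, 0.5)]

print(discount_for(80.0, rules_before))  # no rule applies to this order
print(discount_for(80.0, rules_after))   # the edited data now grants a discount
```

Adding the second rule has the same effect as a code change to the pricing logic, but it would typically pass through none of the reviews a code change would.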

The heightened challenge of data
Data typically is not retained with the same rigor as source code:

Data often exists as a point in time
When a data element changes, the history is typically not retained. Beyond undoing an uncommitted transaction, databases commonly do not offer a native way to roll back to an earlier version in the way source code management systems do.
Data does not allow for comments
Source code (based on the underlying language) allows for in-line comments. Data almost by definition does not support comments.
Data versioning today requires point-solutions
If you want a history of data changes, it typically requires the addition of a dimension for each element. This is custom coding, and under the pressures of today’s development and the need for performance, the data cannot easily be versioned.
Data is undated
Examine a database; can you tell when a given field within one record was changed? And by whom?
Data does not execute serially
What makes source code relatively easy to walk through is the sequential and serial nature of executing instructions. This is a part of the Von Neumann architecture, first described by the visionary computer scientist John von Neumann. It distinguishes instructions from the data they operate on, even though both are held in the same memory, and describes a control unit (today part of a CPU) that executes instructions serially. Data has no such built-in order of execution, which leads to the next point.
Data has no meaning without context
The interpretation of data is left to the reader. The data by itself can be asserted to be meaningless without the context of its definition, purpose, restrictions, or values. Source Code by definition can be interpreted through the lens of the compiler or interpreter for which it is explicitly designed.
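
Several of the gaps above (no history, no comments, no record of when or by whom) can be narrowed by storing each change as a new row rather than overwriting the old value. The following is only a sketch under stated assumptions: the `config_history` table, its columns, and the helper functions are hypothetical names of my own, and it uses Python's built-in `sqlite3` module with an in-memory database purely for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def open_store() -> sqlite3.Connection:
    """Create an in-memory store where every change is a new row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE config_history (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               setting_name TEXT NOT NULL,
               setting_value TEXT NOT NULL,
               changed_by TEXT NOT NULL,
               changed_at TEXT NOT NULL,
               comment TEXT NOT NULL
           )"""
    )
    return conn


def set_value(conn, name, value, changed_by, comment):
    """Append a new version instead of overwriting the old one."""
    conn.execute(
        "INSERT INTO config_history"
        " (setting_name, setting_value, changed_by, changed_at, comment)"
        " VALUES (?, ?, ?, ?, ?)",
        (name, value, changed_by, datetime.now(timezone.utc).isoformat(), comment),
    )


def history(conn, name):
    """Return every version of a setting, oldest first."""
    return conn.execute(
        "SELECT setting_value, changed_by, changed_at, comment"
        " FROM config_history WHERE setting_name = ? ORDER BY id",
        (name,),
    ).fetchall()


def current_value(conn, name):
    """The current value is simply the most recent version."""
    versions = history(conn, name)
    return versions[-1][0] if versions else None


def roll_back(conn, name, changed_by):
    """Re-apply the previous value as a new version, so the rollback is logged too."""
    versions = history(conn, name)
    if len(versions) < 2:
        raise ValueError(f"no earlier version of {name!r} to roll back to")
    set_value(conn, name, versions[-2][0], changed_by, "rollback")


conn = open_store()
set_value(conn, "max_retries", "3", "alice", "initial value")
set_value(conn, "max_retries", "10", "bob", "raised for the batch job")
roll_back(conn, "max_retries", "alice")
print(current_value(conn, "max_retries"))
```

The cost is exactly the one described above: an extra dimension on every element, plus custom code to maintain it.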

There are several gradual trends that have led to data-driven systems taking on aspects of source code:

Cloud Computing / SaaS
Software as a Service may just mean hosted application software, but where multi-tenancy provides economies of scale, you have multiple instances of the application serving many customers. Each customer may have the software configured slightly differently, and herein lies the rub. Customization is often done via data configuration settings, and those settings need to be treated with the same care as source code.
Need for Speed
The business desire to move quickly conflicts with configuration, change, and release management, which impose control disciplines. Business users gradually realize they can bypass the iron grip of these processes by changing configuration information.
Template standardization
As software is commoditized, templates become standardized, both within applications and across the industry. This provides a rich and powerful capability for transforming software behavior without programming or touching source code.

The crux of the issue
There has long been a focus on Intellectual Property (IP), data protection, ensuring against data loss, and data availability, but what about managing changes to data that can impact the organization through how systems behave? Are changes tested and tracked, approved, and deployed on a schedule, with the ability to undo? Is there a systemic way to associate system behavioral changes with control data changes?

Network and security specialists have long grappled with this issue, as firewall rules and network routing are commonly data that has the capacity for significant impact on the organization. Access control related to financial data has taken on stricter requirements as a result of SOX. However, Configuration Data remains the 900 lb elephant in the room that many prefer to ignore.

Best practices
1. Limit users that can change “Configuration Data”
2. Ensure there is a log of Configuration Data changes
3. Ban Configuration Data that is changed dynamically by an application
4. Demand the ability to roll back from any change, and retain the control of roll-back within IT
5. Segregate roles of those that change data, and those that approve the changes
6. Have data changes made first in a test environment
7. Consider using QA to vet changes
8. Define Configuration Data clearly, and put Configuration Data changes through existing change control processes
9. Establish policies that reflect the above, publish them, and track that all affected staff accept them annually
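
As a rough illustration of practices 1, 2, and 5, the sketch below limits who may request a change, refuses to let the requester approve their own change, and logs each step. The `ConfigStore` and `ChangeRequest` classes and the user names are hypothetical, and a real system would lean on its existing change control tooling rather than anything this simple.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChangeRequest:
    setting: str
    new_value: str
    requested_by: str
    approved_by: Optional[str] = None


@dataclass
class ConfigStore:
    editors: set[str]      # practice 1: only these users may request changes
    approvers: set[str]    # practice 5: approval is a separate role
    values: dict[str, str] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)  # practice 2: a change log

    def request(self, user: str, setting: str, new_value: str) -> ChangeRequest:
        if user not in self.editors:
            raise PermissionError(f"{user} may not change Configuration Data")
        self.log.append(f"{user} requested {setting}={new_value}")
        return ChangeRequest(setting, new_value, user)

    def approve(self, user: str, change: ChangeRequest) -> None:
        if user not in self.approvers:
            raise PermissionError(f"{user} may not approve changes")
        if user == change.requested_by:
            raise PermissionError("a requester cannot approve their own change")
        change.approved_by = user
        self.log.append(f"{user} approved {change.setting}={change.new_value}")

    def apply(self, change: ChangeRequest) -> None:
        if change.approved_by is None:
            raise PermissionError("change has not been approved")
        self.values[change.setting] = change.new_value
        self.log.append(f"applied {change.setting}={change.new_value}")


store = ConfigStore(editors={"erin"}, approvers={"erin", "omar"})
change = store.request("erin", "session_timeout", "30")
store.approve("omar", change)
store.apply(change)
print(store.values)
print(store.log)
```

Even though "erin" holds both roles here, the check in `approve` still forces a second person to sign off on her change.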

Thoughts for the future
A checksum can be used to verify that configuration data has not changed. Systems can be designed to report changes to configuration data, or even to refuse to start when it is preferable not to activate a system at all rather than run it with suspect data.
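
A minimal sketch of the checksum idea follows, assuming the configuration can be represented as a JSON-serializable dictionary and that the approved checksum is recorded somewhere the person editing the data cannot also change. The function names and the `SuspectConfigurationError` exception are hypothetical, and SHA-256 is simply one reasonable choice of hash.

```python
import hashlib
import json


def config_checksum(config: dict) -> str:
    """Hash a canonical JSON form so that key order does not affect the result."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SuspectConfigurationError(RuntimeError):
    """Raised when the live configuration does not match the approved checksum."""


def start_system(config: dict, approved_checksum: str) -> str:
    """Refuse to start if the configuration differs from what was approved."""
    actual = config_checksum(config)
    if actual != approved_checksum:
        raise SuspectConfigurationError(
            f"configuration checksum {actual} does not match the approved value"
        )
    return "started"


approved_config = {"max_retries": 3, "region": "eu"}
approved = config_checksum(approved_config)  # recorded at approval time

print(start_system({"region": "eu", "max_retries": 3}, approved))  # same data, different key order

try:
    start_system({"max_retries": 10, "region": "eu"}, approved)
except SuspectConfigurationError as err:
    print("refused to start:", err)
```

Whether a mismatch should block startup or only raise an alert is a policy decision. The sketch shows the stricter option.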

If Configuration Data is effectively indistinguishable from code in its effect, then why should Configuration Data be subject to less stringent controls than code? In summary, Configuration Data needs to be managed with the same level of control as source code. To be effective, one first needs to define crisply what Configuration Data is, so as not to swamp an organization with unnecessary controls.

Sunday, January 25, 2009
