We all take for granted that our live production systems are safe. We configure file-system security to stop users casually navigating to data directories; we encrypt data at rest so that backups falling into the wrong hands are no longer a worry; we define role-based security to lock down access within applications; and we even go to the trouble of encrypting traffic on our network to stop the more technically savvy from inspecting packets of data. We all do these things, right?
But what about your development, test and even training systems? How safe are they? Having developed toolkit-based workflow software for almost 20 years, I am very familiar with the notion of having numerous development and test systems in which to implement changes before publishing to live. I am also very aware of, and increasingly concerned by, the practice of taking a copy of the production database for development and test purposes. I understand why: it’s a convenient way to create a representative dataset that makes it simple to build and tweak workflows, whilst the volumes of data allow load and performance testing.
But consider what happens if you copy your production database:
- Secondary environments contain data that is no longer being held for its intended purpose, which could fall foul of data protection regulations.
- Internal security processes may be less stringent for non-production environments, which could lead to a data breach or inappropriate access to confidential and sensitive information.
- Performing development, test or training exercises on a copy of live data could result in workflows being accidentally executed, generating letters, invoices, emails or updates to other internal or external systems.
With GDPR coming into force next year, can you safely continue with this approach?
Actually, yes. I have recently been researching the topic of Data Obfuscation (DO). Simply put, DO is the process of masking, nulling or substituting sensitive values to randomise and anonymise the data whilst leaving a representative working system. It sounds simple, but in practice it can be quite a challenge; once implemented, though, it should be repeatable without much additional effort.
Typically the steps you take are:
- Data Identification: The process of deciding which items of data must be changed. Many systems have elements of fixed data common across all implementations and custom data created using the tools within the application. Your system developers, product owners or business analysts have the knowledge required for this task.
- Data Classification: Each item identified should be classified with an appropriate obfuscation method. I tend to favour three of the most common forms (sketched in code after this list):
  - Data substitution replaces identifiable data with representative fictitious alternatives.
  - Nulling resets original values back to blank.
  - Masking replaces some or all characters with a single alternative; for example, invoices typically mask the majority of an account or credit card number with an ‘X’, leaving a few digits for identification.
- Replacement Datasets: The most popular obfuscation technique replaces identifiable data with random alternatives. To achieve this, a standalone repository of replacement data must be created in a format matching the original data being obfuscated.
- Field Mapping: A mapping exercise must be performed, linking each live data field to its replacement dataset, mask settings or an instruction to null the data (see the second sketch below).
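To make the three classification methods concrete, here is a minimal Python sketch. It is not the Visualfiles tool, and the function names, sample values and replacement dataset are all hypothetical; a real implementation would read from and write back to the database.

```python
import random

# Hypothetical replacement dataset: fictitious surnames held in the
# same format as the original data being obfuscated.
REPLACEMENT_SURNAMES = ["Archer", "Bishop", "Carter", "Dawson", "Ellis"]

def substitute(value, replacements):
    """Data substitution: swap the real value for a random fictitious one."""
    return random.choice(replacements)

def null(value):
    """Nulling: reset the original value back to blank."""
    return ""

def mask(value, keep_last=4, mask_char="X"):
    """Masking: replace all but the last few characters with a single
    alternative, as on a printed invoice or card statement."""
    if len(value) <= keep_last:
        return value
    return mask_char * (len(value) - keep_last) + value[-keep_last:]

print(substitute("Smith", REPLACEMENT_SURNAMES))  # e.g. 'Carter'
print(null("07700 900123"))                       # ''
print(mask("4929123456781234"))                   # 'XXXXXXXXXXXX1234'
```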
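Field mapping can then be expressed as a simple lookup linking each field to one of those methods. Again, this is only an illustrative sketch building on the functions above; the field names and mapping format are invented, not those used by the Visualfiles tool.

```python
# Hypothetical mapping of live data fields to obfuscation methods.
# In practice these settings would be saved and re-applied each time
# a new copy of production data is taken.
FIELD_MAP = {
    "surname":     lambda v: substitute(v, REPLACEMENT_SURNAMES),
    "notes":       null,
    "card_number": mask,
}

def obfuscate_record(record):
    """Apply the mapped method to each field; unmapped fields pass through."""
    return {field: FIELD_MAP.get(field, lambda v: v)(value)
            for field, value in record.items()}

live_row = {
    "surname": "Smith",
    "notes": "Client called re: settlement",
    "card_number": "4929123456781234",
    "matter_id": "M-1042",  # not sensitive, so left unmapped
}
print(obfuscate_record(live_row))
# e.g. {'surname': 'Ellis', 'notes': '', 'card_number': 'XXXXXXXXXXXX1234',
#       'matter_id': 'M-1042'}
```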
Fortunately for Lexis Visualfiles users, help is at hand. An obfuscation tool has been created to simplify these steps. Designed to run against non-production systems and based on recognised best practice, it helps you identify data by listing all fixed and custom data fields, create or import replacement datasets, and define masks and nulls. It even saves your settings for repeated use each time a new copy of production data is taken. Do you know what data is held in your development and test systems?