Sunday, 30 July 2017

Keeping track of databases

In most production settings, you end up with a number of databases. Some conform to a naming scheme - perhaps a name of a well used application, perhaps according to customer, perhaps according to some internal identifier. However, in most cases, the current scheme has not always been in place, and there's a collection of unknown databases (or ones that simply don't fit the pattern), which don't cause much trouble, but simply sit there, waiting to trip up the person who has rarely if ever interacted with that database, but is charged with supporting it.

As regulations such as the EU's GDPR come into focus, we are increasingly called to account for all the data we hold, particularly if it is personally identifiable. What this means, is we need to be able to identify what's in these 'miscellaneous' databases, and to what they relate. Frankly, it's a good practice anyway, so you can better manage the data that you hold. It's also helpful for maintenance work to understand the effect if you take a database instance offline; even if your organisation is a 24x7 operation, you will probably have databases that don't need to be online all the time - for example the payroll database (as the team who look at this only work in the day). You may have varying definitions of 'day', particularly if you cross timezones.

The question then is, how to identify what these systems relate to. You may have a documentation source for each database (I often find ours to be regrettably out of date), but even if this is in date, it is designed to be looked at for a specific system, so this still means trawling lots of documentation to find out what we want to know in this case - and the documentation doesn't automatically update as databases move.

The solution I've started to use is extended properties on the database. These are a place to put an sql variant value (which can hold text or numbers for example) properties of an entity, and which forms part of that object and can be scripted out with it - so they can be source controlled. They move with the database, so as it is backed up / moved, the properties stay with the object to which they relate.

So why doesn't everyone do this? Well, firstly, it's a schema change. This is a blocker in some organisations, or for some supplier-supported systems. I've found that asking, and explaining why, goes a long way here. Secondly, there's the question of how to use them.

Extended properties can be viewed in the SQL Server Management Studio GUI, however this is clunky at best, and I tend to use scripts to manipulate them, to ensure consistency.

Extended properties can be added, edited, or deleted using system stored procedures. To read them, a function (fn_listextendedproperty) is called, or a system view is available. The key problem here is how to read from all the databases you have in an estate, both within an instance and across multiple instances.
I use policy based management, with a check that looks for the specific extended property I use, to gather this information, but you could equally well query this from Powershell, as a property on the database object as outlined by Buck Woody.

Using properties like this mean you can also tie to other data you might have - like who is responsible for which systems in the wider business - to give a list of people to notify about potential issues to database instances, which can be updated without updating lots of documents; this means it's more likely to be up to date, when you need it most.

This list of databases, purposes and potentially business owners may also be helpful to your information assurance & governance teams, as they look to ensure they have a good understanding of what databases you hold, and why.