Sunday, 30 July 2017

Keeping track of databases

In most production settings, you end up with a number of databases. Some conform to a naming scheme - perhaps named after a well-used application, perhaps after a customer, perhaps after some internal identifier. In most cases, though, the current scheme has not always been in place, and there's a collection of unknown databases (or ones that simply don't fit the pattern). These don't cause much trouble; they simply sit there, waiting to trip up whoever is charged with supporting a database they have rarely, if ever, interacted with.

As regulations such as the EU's GDPR come into focus, we are increasingly called to account for all the data we hold, particularly if it is personally identifiable. This means we need to be able to identify what's in these 'miscellaneous' databases, and to what they relate. Frankly, it's good practice anyway, so you can better manage the data you hold. It also helps maintenance work to understand the effect of taking a database instance offline; even if your organisation is a 24x7 operation, you will probably have databases that don't need to be online all the time - for example the payroll database, if the team who use it only work during the day. (You may have varying definitions of 'day', particularly if you cross timezones.)

The question, then, is how to identify what these systems relate to. You may have a documentation source for each database (I often find ours to be regrettably out of date), but even where it is current, documentation is designed to be consulted for a specific system - so answering this question still means trawling through lots of documents, and the documentation doesn't automatically update as databases move.

The solution I've started to use is extended properties on the database. These let you attach named sql_variant values (which can hold text or numbers, for example) to an entity; they form part of that object and can be scripted out with it, so they can be source controlled. They also move with the database, so as it is backed up or moved, the properties stay with the object to which they relate.
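As a minimal sketch, adding a database-level extended property looks like this - the property name 'Purpose' and its value are purely illustrative, not a convention from this post:

```sql
-- Run in the context of the database you want to tag.
-- Omitting the level0/1/2 parameters attaches the property
-- to the database itself.
EXEC sys.sp_addextendedproperty
    @name  = N'Purpose',
    @value = N'Payroll - HR team, business hours only';

-- Later changes use sp_updateextendedproperty / sp_dropextendedproperty
-- with the same @name.
```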

So why doesn't everyone do this? Well, firstly, it's a schema change. This is a blocker in some organisations, or for some supplier-supported systems. I've found that asking, and explaining why, goes a long way here. Secondly, there's the question of how to use them.

Extended properties can be viewed in the SQL Server Management Studio GUI; however, this is clunky at best, and I tend to use scripts to manipulate them, to ensure consistency.

Extended properties can be added, edited, or deleted using system stored procedures (sp_addextendedproperty, sp_updateextendedproperty, sp_dropextendedproperty). To read them, you can call the fn_listextendedproperty function, or query the sys.extended_properties catalog view. The key problem is how to read from all the databases in your estate, both within an instance and across multiple instances.
I use Policy-Based Management, with a check that looks for the specific extended property I use, to gather this information - but you could equally well query it from PowerShell, as a property on the database object, as outlined by Buck Woody.
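For reading them back, a sketch along these lines works; the sweep across databases uses sp_MSforeachdb, which is undocumented (though widely used), so treat that part with appropriate caution:

```sql
-- Database-level properties for the current database.
-- class = 0 means the property belongs to the database itself.
SELECT name, value
FROM sys.extended_properties
WHERE class = 0;

-- The same via the function: all NULL arguments = database level.
SELECT name, value
FROM fn_listextendedproperty(NULL, NULL, NULL, NULL, NULL, NULL, NULL);

-- A rough sweep across every database on the instance.
EXEC sp_MSforeachdb N'
    SELECT N''?'' AS database_name, name, value
    FROM [?].sys.extended_properties
    WHERE class = 0;';
```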

Using properties like this means you can also tie in other data you might have - like who in the wider business is responsible for which systems - to give a list of people to notify about potential issues with database instances. That list can be updated without editing lots of documents, which means it's more likely to be up to date when you need it most.

This list of databases, purposes and potentially business owners may also be helpful to your information assurance & governance teams, as they look to ensure they have a good understanding of what databases you hold, and why.

Friday, 23 October 2015

Always check the basics - backups

Are your databases being backed up? Can you restore from them? Are you sure?

SQL Server has a built-in way to check when a database was last backed up. You can access this data in a couple of ways:

Via T-SQL - query the msdb.dbo.backupset table, which records backup history (joining to sys.databases catches databases that have never been backed up at all). You can identify the last time each database had a full, differential or transaction log backup
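A sketch of such a query - the column aliases are mine, and the type codes ('D' = full, 'I' = differential, 'L' = log) are the ones backupset records:

```sql
-- Latest backup of each type per database; a NULL means
-- no backup of that type is recorded in msdb at all.
SELECT d.name,
       MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date END) AS last_full,
       MAX(CASE WHEN b.type = 'I' THEN b.backup_finish_date END) AS last_diff,
       MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date END) AS last_log
FROM sys.databases AS d
LEFT JOIN msdb.dbo.backupset AS b
       ON b.database_name = d.name
GROUP BY d.name
ORDER BY d.name;
```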

You can also use a Policy Based Management policy to do this - I use this to at least ensure that each database has a full or differential backup from within the last day, and as a prompt to act if it doesn't.

You can of course use a number of means to back up, but the only way to test a backup properly is to try restoring it. By running a restore you check the backup itself and, in the case of transaction log backups, the whole backup chain.

It's only by testing the restore that you can be sure the backup is any use - and that your data is safe. The follow-on step is to run consistency checks (DBCC CHECKDB in SQL Server) on the restored copy.
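The restore-then-check step can be sketched as below; the database name, file paths, and logical file names are all illustrative assumptions - check your own backup's logical names with RESTORE FILELISTONLY first:

```sql
-- Restore the latest full backup under a scratch name, then run
-- a consistency check on the restored copy.
RESTORE DATABASE PayrollCheck
FROM DISK = N'X:\Backups\Payroll_Full.bak'
WITH MOVE N'Payroll'     TO N'X:\Scratch\PayrollCheck.mdf',
     MOVE N'Payroll_log' TO N'X:\Scratch\PayrollCheck_log.ldf',
     REPLACE, STATS = 10;

DBCC CHECKDB (PayrollCheck) WITH NO_INFOMSGS, ALL_ERRORMSGS;

-- Drop the scratch copy once the check comes back clean.
DROP DATABASE PayrollCheck;
```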

Now we know:

  • The database was backed up
  • It can be restored to when the backup was taken
  • The restored database is not corrupt
  • By implication, the original database wasn't corrupt when the backup was taken
This means it is a backup that you, and your business, can rely on.

The next step is to work with your backup software to automate this process, so you can be sure that each backup meets these criteria every time it runs.