Musings on Data, Data Management and SQL Server: 2015

Friday, 23 October 2015

Always check the basics - backups

Are your databases being backed up? Can you restore from them? Are you sure?

SQL Server has a built in way to check when the database was last backed up. You can access this data in a couple of ways:

Via T-SQL - query the sys.databases table. You can identify the last time the database had a full, differential or transaction log backup

You can also use a Policy Based Management policy to do this - I use this to at least ensure that each database has a full or differential backup from within the last day, and as a prompt to act if it doesn't.

You can of course use a number of means to back up, but the only way of testing the backup properly is to try restoring it. By running a restore you check the backup, and of course the backup chain in the case of transaction log backups.

It's only by testing the restore that you can be sure that the backup is any use - and that your data is safe. So the follow on process is to ensure that you run consistency checks (DBCC CHECKDB in SQL Server) on the restored version.

Now we know:

The database was backed up
It can be restored to when the backup was taken
The restored database is not corrupt
By implication, the original database wasn't corrupt when the backup was taken

This means it is a backup that you, and your business can rely on.

The next step is to work with your backup software to automate this process, so you can be sure that the backup meets this criteria each time it runs.

Tuesday, 8 September 2015

Strategies for managing an enterprise - Policy Based Management (T-SQL Tuesday)

You may be aware of T-SQL Tuesday; it's a concept conceived by Adam Machanic ( b | t ) some years ago, to encourage a number of posts on a different specific topic on a monthly basis. This month's topic, as hosted by Jen,McCown, is Strategies for managing an enterprise.

A Strategy is a plan of action designed to achieve a long term goal or aim, so I thought I'd write about how I'd planned and achieved a solution to the problem of assuring myself and the business that our SQL servers met the policies and best practices we aspired to achieve.

I had recently been faced with the need to gain confidence in the state of a set of SQL Servers whose management hitherto has at best been a part time activity, and not owned by any one person or group. This had meant that, understandably, consistency and best practice have suffered, to the extent that a base level of audit and assurance is required. There's a few ways to achieve this, and I needed to come up with a strategy that would hit several goals.

Easy to implement

Technically advanced technologies are great, but sometimes you just need basic checks and the ability to build on the basis they provide. In our case, we needed to use the existing skills base to achieve our goal.

Automated

I didn't have the luxury of assuming there will be a full time DBA in the organisation in the future, and in any case we are always pushed to do more with less resource, so it is likely that any solution that involves manual steps will not be carried out regularly. We therefore needed an automated solution.

Visible

For any audit tool to be effective, particularly a tactical one, we needed to be able to provide a level of reporting, to allow the business to perceive the health of the assets, and have confidence that issues are addressed.

Comprehensive

There's no point in having a solution that doesn't cover all the bases. We needed one that will allow us to expand on the 'out of the box' functions to do custom metrics.

Cost effective

There are a lot of valuable solutions out there that really do provide great monitoring for large server estates, allowing real-time automated resolution of issues, and confirmation of policy adherence. Our needs were more modest however, and it should be noted that the goal was to achieve the monitoring without spending out on licences; this is something we can aspire to once the base level of audit is in place.

So once these goals are defined, I looked at some options. Third party solutions like Nagios, which is a great monitoring tool I've used in the past, were not an option due to existing skills bases. I needed to use something that would use existing developer or administrative skillsets. I explored Policy Based Management (PBM) in SQL server, and decided this was a good avenue because it allowed simple best practices to be easily monitored, in some cases bad practices prevented, and could be easily rolled out to the enterprise via a central management server.

Using PBM, you can write policies that prevent some behaviours, but due partly to existing codebases, and in some cases third party products that we can't modify, as well as a relatively relaxed regulatory framework, I've chosen to alert rather than prevent. I can write conditions that only target certain server classes, and have taken to using extended properties on databases to denote whether certain conditions apply (and on the master database to denote server environment characteristics, such as Production/UAT/Dev). This gives the flexibility to write one policy for a condition and have it apply wherever an applicable system is hosted in the enterprise.

This setup was still missing one part however - the visible bit. By default, when you have a non-compliant server, all you get is an icon in SQL Server Management Studio, and an error code fired which means you can alert via an event. This isn't even very visible if you know what you're looking for - and doesn't tell you what's wrong.

The solution was to use the Enterprise Policy Management Framework, a combination of Powershell and SSRS to query an estate of servers and present the results in a report, which could be displayed centrally (think intranet, or screen in the management office), or scheduled and circulated like other SSRS reports, and is simple to visually check on a regular basis - a screen full of red is bad!

Did it work? Certainly, and it allowed identification of issues such as backups being missed off the schedules, security permissions which were against policy, orphaned users and the like. This allowed the strategy to achieve the aim of giving confidence in the level of servers across the enterprise.

There are some areas for further thought, not least in terms of the security permissions required to run reports, which are necessarily high - however any system which monitors areas such as sensitive configuration will require a comparatively high level of access. I'm also looking at modifying some of the reports to better fit our needs, but this is a minor change given the data is now in place.

I found that adopting a strategy of implementing a monitoring solution to check the basics are in place has given me the confidence that we have a firmer base to build on. The strategies and goals they aim to achieve will evolve, as they all should, as time is freed up and the needs of the business dictate. I've not given up on implementing a more integrated monitoring solution in the longer term, but this is a great way of assuring the business that the service we provide is fit for purpose, and can be relied upon.

Monday, 20 July 2015

Solving data problems cost effectively

I believe that most bad data can be categorised as due to one of:

Human data entry issues
Bad processing of data we hold
Corruption of data in storage

The last of these is a longer topic, and whilst there are mitigation strategies they are mostly a cost / spending exercise to achieve the required level of risk reduction. Entry issues have a few mitigation strategies - double entry can be used (principally used historically in finance audit systems) to prevent typos, but in an increasingly real-time data collection this is less practical. However what can be used are methods such as field validation rules, and repeating back to the user related data, to check the details. An example might be to look up the postcode a user entered and present a selection of valid addresses (assuming you have a source of these) for that post code. Thus, if an incorrect postcode has been entered the mistake should be obvious immediately, allowing it to be checked. Another method would be to indicate the location on a map. The method of checking will depend on the way that we have of communicating with the person from whom the data is collected. Part of the goal here is not only to ensure that our data is not erroneous, but that we do actually have the correct data. If you think of a scenario where we are collecting data from people in Plymouth, who will all have a post code starting PL... then if a users enters a post code starting PO.. (Portsmouth) then this is clearly incorrect - however it doesn't tell us what the correct post code actually is. If looking at questions of a more view based type, it may be possible to ask a similar question in two ways and cross check the answers.

Once we have the correct data then we need to ensure the correct processing of it, and it is here that the issues can get costly. The cost of getting data is small compared to the cost of getting it *again* if that data is lost - not only is there a monetary cost but also a cost to reputation. So this implies that we need to ensure our processing of data is valid. The best defence we have against the human quality of fallibility is testing, and more generally quality control. I would suggest that any software (most data processing is now done in software, but the principles can still be applied to manual processes) that is not meeting a basic quality standard should not be judged fit for use. Such standards, whilst defined to meet your project / organisational needs, should consider:

Have we identified the requirement this fulfils?
Does the documentation explain how we fulfil it?
Does the testing check all aspects of the requirement?
Does the testing check for entry of unusual data?
Does the unit of code work well on it's own?
How about if we put it within the overall system?
Is it maintainable (i.e. according to policies, design standards, etc.)?
Has someone else code-reviewed it, and the documentation and test, to check for errors?

The last point is important because a test failure would simply imply that the results didn't match what the test expected - it is up to the requirements documentation (or business representative) to arbitrate as to which is incorrect, the test or the code. I believe that all of these are valid and should form your definition of when code is "done" however you develop your solution - a traditional waterfall approach, an Agile methodology, or a more ad-hoc process. By catching these issues before a product is released, and actively processing real data, we remove the costs associated with loss of data, or correcting for accuracy problems which result from our processing.

Whilst we have above covered some basics this is clearly a much broader topic; it amazes me that I still see organisations who do not have robust processes to ensure their data processing does not only meet the legal obligations such as the Data Protection Act (such as Principal 4, accuracy), but also avoid compiling information inaccurately and therefore leave themselves open to making decisions on a flawed set of data, potentially invalidating the decision, and risking significant financial and reputation damage.

Tuesday, 14 April 2015

Information Vs Data Vs Security

A subject that has been close to my mind recently, is as to when data becomes information, and how the security of that asset is managed - and by whom.

We have probably all seen the triangle:

By Longlivetheux (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)], via Wikimedia Commons

Which as you will gather, denotes that Information is comprised of (useful sets of) Data. There's also the concept, at least in UK law, of data, which comprises of Information:

Data means information which –(a) is being processed by means of equipment operating automatically in response to instructions given for that purpose,
(b) is recorded with the intention that it should be processed by means of such equipment,
(c) is recorded as part of a relevant filing system or with the intention that it should form part of a relevant filing system,
(d) does not fall within paragraph (a), (b) or (c) but forms part of an accessible record as defined by section 68, or
(e) is recorded information held by a public authority and does not fall within any of paragraphs (a) to (d).

https://ico.org.uk/for-organisations/guide-to-data-protection/key-definitions/

So it is clear that these terms are not defined exclusively, and that the terminology alone may present a problem.

The business often has a nominated Information Asset Manager, who looks after the types of information an organisation processes, and is responsible (perhaps to a Senior Information Risk Officer, or SIRO). The organisational policy may well give this person (delegated from the SIRO) the last word on policy - however this doesn't mean this person has the technical background to ensure that policy is implemented - or even practical.

There may also be someone in the IT infrastructure whose remit is security - either a dedicated IT Security Officer, or perhaps someone who has this responsibility as part of their work. Again, this is quite a wide ranging remit, and this person may feel it is better left to subject matter experts in specific systems, and lay guidance as to the principles to be observed.

This leaves the DBA with two potential places to go for security instructions, yet neither may understand the detailed technical processes necessary to actually undertake the remit, as their remit is either business focused or too wide for the focus on the specific technology of databases. That's where the DBA brings his or her expertise to bear, and as such brings value to the business.

Often the DBA role is also to mediate where the tension exists between data security and information use. This is because they are the guardian of the security on the data - and so the first place that gets approached when technical permission is denied. This brings to light the difference between a technical prevention measure, and a business policy allowing, forbidding, or laying restrictions on a course of action.

Whilst clearly data security will almost always win the balance between security and flexibility, the role of the data professional is increasingly to suggest a way to merge the data once permission is secured and in accordance with best practice. This is a difficult balancing act, which may not be fully understood by information users.

So how can we alleviate the issues this generates? Firstly - publicity. Explain what is necessary, and have a ready to roll out example of why you wouldn't like your data to be misused. Perhaps an example might be of sensitive personal data being unexpectedly available to the world for misuse, or the potential for processing of data in a way it wasn't gathered for - both breaches of the DPA, but still things that get requested disturbingly often.

A second avenue is to engage with the other data professionals in a proactive way, and ensure that clear paths are laid down to get issues circulated for discussion - and engage with the business to resolve them. Generally, there's a good way to do things, and a bad way - for example, would anonymised data do for a task? Could the risk be mitigated somehow? Perhaps we can't do something with the data we have, but changing how it is captured to ensure the correct permissions are granted gives us the ability to do this in the future. If the policies don't exist (or haven't been reviewed recently) then these can be bolstered to ensure they are fit for purpose - and in line with recent changes to regulation.

With these approaches, we lay down a safe way of working, that also keeps us within legal boundaries and hence providing value to the business with both of these points.

Wednesday, 4 March 2015

Justifying Unit Tests for Databases

It’s a shame that unit testing of databases is, relatively speaking, the poor cousin of application unit tests. After all, the same benefits of surety of code and ease of refactoring can be achieved by this simple means, and ultimately costs saved by ensuring that the (inevitable) bugs in an application and database are caught earlier in the product lifecycle.

Whilst Unit testing is a key point in Agile software development, there are too many projects out there where the existing codebase is perceived to be too large to allow unit tests to be built up for a product. This isn’t necessarily the only way to tackle an existing codebase- I’d recommend starting with bug fixes, and building up from there.

I can foresee that an idea that is common in microservice provision, to allow a development team to “adopt” or retain responsibility for a software product for the lifecycle of that product – from initial concept to final retirement, with all the rollout, upgrade and support in the middle – will be what finally convinces database developers that it’s not worth risking buggy code – or the chance of breaking something else when the inevitable support call does come. This merging of the traditional development and operations functions into one service-oriented team makes it in ones’ own interest to minimise total effort on issues – and a few other best practices like good documentation and minimising “just in case” code along the way.

As a strong proponent of data integrity, I see unit testing as part of the armoury of tools necessary to develop and maintain a system to capture information in a robust format, and ensure that the data retains it’s integrity through its’ lifetime.

If you want to find out more about how to justify unit testing for databases, I’ve recently published a course on this very subject through Pluralsight. Whilst Pluralsight isn't free to access, there is a free trial available, so set some time aside and watch it to help you build a business case for adopting this best practice.