You may be aware of T-SQL Tuesday; it's a concept conceived by Adam Machanic (
b |
t ) some years ago, to encourage a number of posts on a different specific topic on a monthly basis. This month's topic, as hosted by Jen,McCown, is
Strategies for managing an enterprise.
A Strategy is
a plan of action designed to achieve a long term goal or aim, so I thought I'd write about how I'd planned and achieved a solution to the problem of assuring myself and the business that our SQL servers met the policies and best practices we aspired to achieve.
I had recently been faced with the need to gain confidence in the state of a set of SQL Servers whose management hitherto has at best been a part time activity, and not owned by any one person or group. This had meant that, understandably, consistency and best practice have suffered, to the extent that a base level of audit and assurance is required. There's a few ways to achieve this, and I needed to come up with a strategy that would hit several goals.
Easy to implement
Technically advanced technologies are great, but sometimes you just need basic checks and the ability to build on the basis they provide. In our case, we needed to use the existing skills base to achieve our goal.
Automated
I didn't have the luxury of assuming there will be a full time DBA in the organisation in the future, and in any case we are always pushed to do more with less resource, so it is likely that any solution that involves manual steps will not be carried out regularly. We therefore needed an automated solution.
Visible
For any audit tool to be effective, particularly a tactical one, we needed to be able to provide a level of reporting, to allow the business to perceive the health of the assets, and have confidence that issues are addressed.
Comprehensive
There's no point in having a solution that doesn't cover all the bases. We needed one that will allow us to expand on the 'out of the box' functions to do custom metrics.
Cost effective
There are a lot of valuable solutions out there that really do provide great monitoring for large server estates, allowing real-time automated resolution of issues, and confirmation of policy adherence. Our needs were more modest however, and it should be noted that the goal was to achieve the monitoring without spending out on licences; this is something we can aspire to once the base level of audit is in place.
So once these goals are defined, I looked at some options. Third party solutions like
Nagios, which is a great monitoring tool I've used in the past, were not an option due to existing skills bases. I needed to use something that would use existing developer or administrative skillsets. I explored
Policy Based Management (PBM) in SQL server, and decided this was a good avenue because it allowed simple best practices to be easily monitored, in some cases bad practices prevented, and could be easily rolled out to the enterprise via a
central management server.
Using PBM, you can write policies that prevent some behaviours, but due partly to existing codebases, and in some cases third party products that we can't modify, as well as a relatively relaxed regulatory framework, I've chosen to alert rather than prevent. I can write conditions that only target certain server classes, and have taken to using extended properties on databases to denote whether certain conditions apply (and on the master database to denote server environment characteristics, such as Production/UAT/Dev). This gives the flexibility to write one policy for a condition and have it apply wherever an applicable system is hosted in the enterprise.
This setup was still missing one part however - the visible bit. By default, when you have a non-compliant server, all you get is an icon in SQL Server Management Studio, and an error code fired which means you can alert via an event. This isn't even very visible if you know what you're looking for - and doesn't tell you what's wrong.
The solution was to use the
Enterprise Policy Management Framework, a combination of Powershell and SSRS to query an estate of servers and present the results in a report, which could be displayed centrally (think intranet, or screen in the management office), or scheduled and circulated like other SSRS reports, and is simple to visually check on a regular basis - a screen full of red is bad!
Did it work? Certainly, and it allowed identification of issues such as backups being missed off the schedules, security permissions which were against policy, orphaned users and the like. This allowed the strategy to achieve the aim of giving confidence in the level of servers across the enterprise.
There are some areas for further thought, not least in terms of the security permissions required to run reports, which are necessarily high - however any system which monitors areas such as sensitive configuration will require a comparatively high level of access. I'm also looking at modifying some of the reports to better fit our needs, but this is a minor change given the data is now in place.
I found that adopting a strategy of implementing a monitoring solution to check the basics are in place has given me the confidence that we have a firmer base to build on. The strategies and goals they aim to achieve will evolve, as they all should, as time is freed up and the needs of the business dictate. I've not given up on implementing a more integrated monitoring solution in the longer term, but this is a great way of assuring the business that the service we provide is fit for purpose, and can be relied upon.