Monday 31 December 2012

Testing variable messages just got easier in tSQLt

I recently noticed that the open source unit testing framework I use (and have previously written about) - tSQLt - has had a new release, and with it a new type of assert. Asserts are how you test whether the result you got in a test was what you wanted, and hence whether the test should pass or fail.

This new assert routine, AssertLike, addresses one of the more common situations I've had to work around - the case where you know roughly what message you are checking for, but not its exact text. Let's take the example of a stored procedure which gets all addresses for a given state ID. This is somewhat contrived - there are reasons why you wouldn't do exactly this in a real application, but it makes for an easy-to-follow example. I'm using the AdventureWorks 2012 database.

/*
   Version       :  1.0
   Written on    :  26/12/2012
   Written by    :  Dave Green
   Purpose of SP :  To retrieve all address records in a given state.
*/

CREATE PROCEDURE dbo.usp_GetAddressesInState
    @StateProvinceCode VARCHAR(100)
AS

DECLARE @StateProvinceID INT, @ErrorMessage VARCHAR(100)

--Attempt to get province ID from StateProvince table
SELECT @StateProvinceID = StateProvinceID
FROM   Person.StateProvince
WHERE  StateProvinceCode = @StateProvinceCode

IF @StateProvinceID IS NULL --If the stateprovince wasn't valid, raise an error
BEGIN
    SET @ErrorMessage = 'Unable to get StateProvinceID'
    RAISERROR (@ErrorMessage, 16, 1)
END

--Return customer records
SELECT  AddressID ,
        AddressLine1 ,
        AddressLine2 ,
        City ,
        StateProvinceID ,
        PostalCode ,
        SpatialLocation ,
        ModifiedDate
FROM    Person.Address
WHERE   StateProvinceID = @StateProvinceID
GO

This stored procedure checks that the province ID exists before it returns the results; this means that an empty result set genuinely means there are no addresses, rather than an invalid (or unrecorded) state code. This sort of distinction is quite common in applications where you want to return a more helpful message to the user to allow them to take action.

We can test this SP using the following tSQLt code:

/*
   Version       :  1.0
   Written on    :  26/12/2012
   Written by    :  Dave Green
   Purpose of Class :  Demo of AssertLike
*/
EXEC tSQLt.NewTestClass 'ALDemo'

GO

/*
   Version    :  1.0
   Written on :  26/12/2012
   Written by :  Dave Green
   Purpose    :  check that invalid States give an error
*/
CREATE PROCEDURE ALDemo.[test invalid State code produces error]
AS
/**********************************************
**** To verify that the SP throws an error
**** when fed with an invalid state code
***********************************************/
--Assemble

--Act
--  Execute the code under test
BEGIN TRY
    EXEC dbo.usp_GetAddressesInState @StateProvinceCode = 'test' -- varchar(100)

    --Assert
    EXEC tSQLt.Fail 'Expected error was not thrown'
END TRY
BEGIN CATCH
    DECLARE @Actual NVARCHAR(MAX) = ERROR_MESSAGE()
    EXEC tSQLt.AssertEqualsString
        @Expected = N'Unable to get StateProvinceID', -- nvarchar(max)
        @Actual   = @Actual,                          -- nvarchar(max)
        @Message  = 'The error thrown was different to that which was expected' -- nvarchar(max)
END CATCH
GO

EXEC tSQLt.Run 'ALDemo' -- Run the test

This works successfully, and tests the proc, checking that the correct error message is thrown (i.e. there isn't another error which is causing the proc to fail first).

This is all well and good, but then a change request comes in: the error message shown to the caller should specify the erroneous value. This is easily accomplished in the SP:

/*
   Written on :  26/12/2012
   Written by :  Dave Green
   Purpose    :  Get address records for a given state.
*/

/* Updated by Dave Green 26/12/2012 to return any erroneous state in the error message.*/


ALTER PROCEDURE usp_GetAddressesInState
    @StateProvinceCode VARCHAR(100)
AS

DECLARE @StateProvinceID INT, @ErrorMessage VARCHAR(100)

--Attempt to get province ID from StateProvince table
SELECT @StateProvinceID = StateProvinceID
FROM   Person.StateProvince
WHERE  StateProvinceCode = @StateProvinceCode

IF @StateProvinceID IS NULL --If the stateprovince wasn't valid, raise an error
BEGIN
    SET @ErrorMessage = 'Unable to get StateProvinceID - Province '
                        + @StateProvinceCode + ' may be invalid.'
    RAISERROR (@ErrorMessage, 16, 1)
END

--Return customer records
SELECT  AddressID ,
        AddressLine1 ,
        AddressLine2 ,
        City ,
        StateProvinceID ,
        PostalCode ,
        SpatialLocation ,
        ModifiedDate
FROM    Person.Address
WHERE   StateProvinceID = @StateProvinceID
GO

However, how can we adjust the unit test to accommodate this? The error message will change each time the SP is called - to reflect the input parameter.

The answer has always been to put in place custom logic in the 'CATCH' block - 'If error message is like this, then fine, else fail'.
This is realised in code as:

BEGIN CATCH
    DECLARE @Actual NVARCHAR(MAX) = ERROR_MESSAGE()

    IF @Actual IS NULL
       OR @Actual NOT LIKE 'Unable to get StateProvinceID - Province % may be invalid.'
        EXEC tSQLt.Fail @Message0 = 'The error thrown was different to that which was expected'
END CATCH

Of course, if you are checking a specific value, or a date/time, this gets more complicated. A line number in the error message makes life even more unpredictable. This is in my view a little ugly, and doesn't lend itself to easy reading in the same way that the simple assert did. The good news is that the latest version of tSQLt now has an AssertLike assert procedure, so we can simplify this to:

BEGIN CATCH
    DECLARE @Actual NVARCHAR(MAX) = ERROR_MESSAGE()

    EXEC tSQLt.AssertLike
        @ExpectedPattern = N'Unable to get StateProvinceID - Province % may be invalid.', -- nvarchar(max)
        @Actual          = @Actual,                                                       -- nvarchar(max)
        @Message         = N'The error thrown was different to that which was expected'   -- nvarchar(max)
END CATCH

The new assert gives us a standardised approach which also includes a NULL check (on both sides). This is neater than writing custom logic for this basic type of check, and is particularly useful where the line number of an error is reported back as part of the message - tests then don't break simply because a comment has been added to the procedure, which reduces the work required to 'fix' such tests.
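For instance, where an error message embeds something that genuinely varies from run to run - a value, a date, or a line number - a wildcard pattern keeps the test stable. A quick sketch, sitting in a CATCH block as before (the message text here is made up purely for illustration):

EXEC tSQLt.AssertLike
    @ExpectedPattern = N'Error converting value % at line % of the batch.', -- wildcards where the value and line number vary
    @Actual          = @Actual,
    @Message         = N'The error thrown was different to that which was expected'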

This seems to me to be much neater, and should make for a more maintainable test in the future too, as it follows the same form as the asserts in other tests.

Tuesday 20 November 2012

When database code just doesn't smell right

Have you ever had a feeling that although your (or your developers') code works fine and passes all the unit / functional tests, perhaps it isn't all that it might be? It may be that you have inherited historic code, the requirements have developed organically, or perhaps that it was simply written in a hurry as a proof of concept that has stayed in production far longer than you would like to admit.

There are best practice tools such as SQL Cop to help identify this in databases, but perhaps you want something a little different. There can be a requirement to do this not only on a compiled database, but also on a series of script files - perhaps scripts which differentiate your database for different scenarios, or utility scripts on which you simply want a heads-up about any hidden nasties they contain.

This was something that had obviously struck Dave Ballantyne, and so he has created TSQLSmells, an open source powershell script which analyses SQL script files for common "smells". Whilst it's a work in progress at the moment, Dave is actively developing it, welcoming and responding to feedback.

I've had a working AdventureWorks database in a continuous integration job for some time, so this seemed the perfect place to try TSQLSmells out.



Firstly - the requirements. TSQLSmells needs powershell version 3, so you may need to update your copy to use it. This is a free download from Microsoft, so whilst I'd recommend trying it on a testbed first, it shouldn't cause any issues.

Next, let's look at how to run the script. The command line I use is as follows:

powershell -ExecutionPolicy Bypass -File "c:\cifiles\TSQLSmells.ps1" -FolderPath "%teamcity.build.checkoutDir%\Database" -ConfigFolder "C:\CIfiles" -minSeverity  10  -SupressGUI

Now, you will see I'm calling powershell bypassing the execution policy. You could update your build agent to set this appropriately, or even sign the script if you wanted. The File parameter points to the location of the powershell script. FolderPath is used to denote where the scripts under test are. I'm pointing it at a directory where my source controlled database is located. You can use a FileSpec parameter to denote when your files have a different extension to .sql (the default).

TSQLSmells assigns a "severity" to each error it searches for. These are configurable within the errors.txt file (which is a simple CSV format), and this allows you to exclude certain types of error if you do not want them to be reported upon. This is where the MinSeverity parameter comes in, as it is the threshold for reporting errors - simply set it above the severities you wish to exclude. You can also exclude specific error numbers with the ExcludeErrors parameter.

The output of the script is put in the config folder, and called Smells.xml. Whilst this isn't ideal from a CI point of view, it's a relatively simple matter to update the output location in the script so that the file is written within the CI job folder. I store the file as an artifact on each build, so that it can be referred to, and stored with the test results.



Although the Smells.xml file is not in a format which the CI system directly understands, it is an easily understandable format and would not be difficult to associate with an XSD file for display. (I am exploring with Dave at the moment whether the output could be altered for better integration with CI systems in the future.)




If you don't want to run the script as part of a CI process, it does have a GUI element - you'll note I used the SupressGUI parameter above; if you exclude this parameter then the GUI will run and return the results to you.

The choice to use powershell rather than SQL to analyse the database scripts has both pros and cons. The trade-off is that whilst you can examine specific scripts outside the database, you lose the easy extensibility for those not familiar with powershell - which may include many database devs/DBAs, although powershell is becoming more widely adopted. I feel that this is a distinguishing feature which gives TSQLSmells an edge: it allows scripts to be examined outside of a database, before they are put into the final environment. Although I am in this example using a source controlled database, there is no reason why the database need be involved at all. This keeps the sanity check light-weight - and therefore something that can be run ad-hoc more frequently.

I think that this script shows great promise and will for me form another method of screening code for potential issues. Whilst it's not a cast-iron guarantee of a problem, it's a useful way of highlighting potential problems, and gives a list of files to check in more detail.

Thursday 1 November 2012

Competition Winners

I'm pleased to announce that we have picked the 5 lucky winners of a copy of Red Gate Source Control.

The winners are:

Michelle Gartner (@michmlr)
Brandon Grant (@brandongrant223)
Jamie Walden (@jamiewalden)
Brian Beverage (@B_Beverage)
JustinS (@jswinf)

Red Gate will be in touch shortly to give you your license!

(If you have any problems - please drop me a tweet - @d_a_green)

Thanks again to all those who entered.

Thursday 11 October 2012

How do you convince the business that Source Control for databases is a good idea?

Read on to find out how you can win software to help you integrate Source Control into your development environment! 


What is source control?

Source control is a means of keeping every change you make to your code, so that you can at any point see what's changed, or go back to a point in time. This is a relatively new concept to database developers, who are used to dealing with the database as a database, not as a collection of objects and data which can be scripted.

Source control has been in use by application developers for many years – and there are many flavours to choose from; some open-source (e.g. Git, Subversion), some closed source (Accurev, Team Foundation Server). There are many pros and cons to each system, and there’s a good source control comparison here which may help to work out which might suit you best. You may have one elsewhere in the business which can be used. This may help to reduce resistance to adoption – but you need to ensure that the source control system you choose can work for you.

The second linked area to consider is integration with your development environment. I find that application switching is time consuming and saps concentration – so I want an easy to use solution which isn’t going to distract me. Some developers will use SQL Server Management Studio for database development, others will use Visual Studio. Some even use a text editor like Notepad. With the exception of the latter category, there are tools that will integrate with your environment to some degree or another - whatever you use.

A key factor in how you use source control is whether you go for a shared development model, or a dedicated model. Whilst there are advantages to both, I've found that the dedicated model integrates better with source control, and developer SQL Server editions are really quite cost effective so the cost of multiple systems is less of a factor than it once was.

It is perhaps worth noting that many developers have very strong feelings about the source control system that they are comfortable with, and the reasons they don't like others. I suggest that before you fully implement a given source control system you think about your workflow, what you are trying to develop, and read the adverse comments about your chosen source control system. You'll no doubt find reasons others don't like it - and you can then get a better feeling as to whether this could prove a problem for your team in the future.

Why is it good for developers?

Source control gives you version control of your changes, including a full history – and a dedicated development model will allow you to do your work without affecting others, yet also work collaboratively should you wish to. The use of database continuous integration requires source control – or at least a predictable build script. I’m lazy – and so I prefer the easiest option, in this case source control. Continuous Integration is a key component of certain development methodologies such as Agile (Scrum, etc.), so if your company is looking at implementing these then you may need to consider it.

One area where database scripting tools have previously lacked polish is upgrade scripts – how to make DDL changes without losing data, and without having to hand-script each upgrade. Having these generated for you saves time and allows a repeatable upgrade – which is great for ensuring your production systems can be predictably upgraded, and can even lead on to continuous deployment. Source control allows upgrade ("migration") scripts to be used to convert between specific, known versions of the software. Most tools will "guess", and some allow you to override specific changes and save the override.

Features such as branches or streams allow you to develop new code separately to patches to existing systems, and have a way to get to the database code at any point in time. By having your application in the same source control as your database, you give yourself the key ability to get your application and database from any given point, in a cohesive way so that you know it should build – which is great for reproducing the system in a known state, for bug fixes etc. You can use the power of your source control system to get change history, and follow changes to help resolve issues.

Really key, however, in my opinion is getting the solution integrated into the development environment. This greatly reduces the overhead of having source control, and thus you get the benefits more quickly. Integration is often the first area to suffer when costs are cut, which reduces the immediate productivity benefit of the solution.

What does the business get?

For any change to development methodologies to become adopted by an organisation, it needs to be bringing something to the business – not just making Developers’ lives easier. So what does Source Control give the business?

A good source control implementation will mean that developers are developing against the same code base – less need for concern about a forgotten table, so the resultant installation will be repeatable. This means it scales well for new employees, and existing employees are usually given a list of changed files/objects – so you’re less susceptible to human error in forgetting to check in a change.

The business may want to support a number of different versions of the product – for example customer specific installations, or support of previous releases for a period of time. By having the application and database code in the same source control repository you know you are always looking at a consistent version of all layers (e.g. Application, Data Access Layer, and Database) so you know it will all hang together.

Increasingly we live in a world where auditing is required. Source control gives you an ability to know when a change was made, and by whom. With the aid of a standard style of commit message, you can also know why the change was made (perhaps a link to the support ticket or feature request?). Using Source Control provides you with a known truth for the database under development. Think how much time and worry it could save if you know exactly where to get all the database changes to implement, and you know that it's all being backed up for safe keeping should the worst happen.

By implementing Source Control you lower the barriers to implementing deployment testing, perhaps within a continuous integration environment. This means you can test your deployment on supported hardware/software configurations and run a standard set of tests to give confidence in the product. I’ve previously written about how to set this up.

Why doesn’t everyone use it already?

Whilst application developers have used source control for a number of years, it is relatively new to the database development scene, and as such the tools to make it easy have only started to appear in the last year or so. Database developers (and particularly DBAs who need to be on board) are often reluctant to leap to the latest change until it has been proven to be robust and reliable. This isn't a bad thing - new tools and ways of working take time to adopt, and you want to be sure you have the right one.

Adopting source control is a change to the toolset. You therefore need to be aware that it will take investment in the short term, to yield benefit in the longer term. Immediate benefits – such as having a central location for all changes, which can be easily backed up and rolled back – are often overlooked by those who haven't used such systems in the past.

To get a change in the development practices and tools, you need a champion in the business - this is really important - because developer led solutions rarely get the budget approval (in terms of time or money) to proceed. You do of course need a development team that is open to it too!

In terms of costs, this will depend somewhat upon the size of your team, but more importantly the features that you want to deploy, and whether you go for a commercial or free product. There's no right answer here - but I suggest you start at the features you want, and go from there - the cost of a commercial product is offset by support, and the adoption costs of any source control systems should be quickly made back in terms of increased productivity and reduced errors. If you want to adopt the dedicated model (which I recommend to avoid conflict) then you'll need to think about a developer database installation - these can be quite cost effective, and may be covered by existing licensing (e.g. MSDN). You need to also think about licensing for a continuous integration server if you're going that way - it's often not included in the team's tooling licenses (but this could be done slightly later, once the team has become comfortable with source control). You need to budget some space for the code repository to live (and be backed up) as well as some time to manage the system - at least in its early days whilst things are settling down. 

The most important aspect is probably the user side - but I have found that it is sometimes helpful to try it with a small group first, to allow any problems with workflow to be picked up and the tooling choice confirmed.

Overall

Why not try it? You can adopt it in stages - perhaps start using it only on a new project, or run source control in parallel with your current arrangement. The business benefits of having a more robust development process are clear, particularly where you have many flavours of your product. It also allows better isolation between development and production (important for certain industries).

Source control is rapidly becoming a must-have for database developers, in the way that it has been for application developers for some time.

Win a Red Gate Source Control license!

If I haven’t convinced you that it’s a better way to work – why not find out for yourself for free? Red Gate have kindly provided 5 licenses for their Source Control product to give away! Red Gate's Source Control software integrates with SQL Server Management Studio, to give a user friendly way of scripting out all of your changes and checking them into your source control system, as well as implementing others' changes in your database. It can work with Subversion, TFS, Vault, and many other systems you may already have (I've used it with Accurev), or even a file location, to allow you to try out source control for databases in your environment.

To be in with a chance of winning, simply tweet a link to this post and leave a comment below, including your twitter handle so we can contact you. For example, why do you think that database source control is a good (or bad) idea? Why not detail your experiences - good or bad - of using or adopting source control for databases? If you don't leave your twitter handle you can't win - sorry!

Because there's no right or wrong answer - it's all based on your viewpoint - this is a prize draw. You can enter until 26th October 2012 (23:59 BST).

Wednesday 19 September 2012

Displaying database unit test package names in CI

I've been using tSQLt tests for a while now with Continuous Integration, and one of the things I've noticed is that test results from nUnit or JUnit are presented with a nice package name, but that the tSQLt tests either have no package name, or list as (root) - depending on your CI system.

TeamCity

Jenkins

I have recently had the occasion to realise quite how annoying this can be, particularly if you're using Jenkins and running tests with the same names in multiple databases (TeamCity does list the database if you dig into the tests tab, but oddly not everywhere). This presents the problem that you can't easily see the name of the database in which a failure has occurred, to aid prompt troubleshooting.

So - I've done a little digging, and found this post on why this might be the case for Jenkins. I then modified the way in which tSQLt test results are output to achieve the test labelling which Jenkins needs, i.e. "classname.testname". I felt that the database name was the best class to suit my needs, but you can modify this to suit your requirements.

The modification is to [tSQLt].[XmlResultFormatter]. I changed the selects in the union, for example in the snippet below:



I've changed the 'Class' lines to read:


Do this for all the 'Class' lines in the procedure. There is 1 change in the second union, and 2 in each of the third and fourth unions.
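As a rough sketch of the idea only (the real lines live inside [tSQLt].[XmlResultFormatter] and their exact aliases vary between tSQLt versions, so the stand-alone query below is an illustration rather than the procedure's actual code), the substitution is simply "report the database name where the test class used to go":

-- Stand-alone illustration against tSQLt's results table (not the formatter's real code):
-- the database name becomes the class, so a CI system sees "databasename.testname".
SELECT  DB_NAME()  AS [classname],
        TestCase   AS [testname],
        Result
FROM    tSQLt.TestResult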

Making this change gives us the following result when we re-run the tests from Continuous Integration:

TeamCity
Jenkins

As you can see, this small modification makes it much easier to see where a failure has occurred in the test results display.

Once you've modified the framework, you can either script the change as part of your build process, or if you have a source controlled database, then you can simply change the source controlled version of the [tSQLt].[XmlResultFormatter] stored procedure.

By the way - you can also output test execution times to your CI system by modifying this same procedure - see this mailing list post for details.

I think these changes really help to integrate database unit tests within the CI process, and make them easier to use in a complex development environment.

Wednesday 12 September 2012

10 things I wish I'd known when I started with tSQLt and SQLTest

Simple Talk have published my article which details some tips and tricks I've learned about unit testing using the tSQLt framework. More details here.

Sunday 2 September 2012

Agile Database Practice Training

As you will probably know from reading previous entries on this blog, I've become a convert to using Agile approaches to database development. When coupled with Continuous Integration and Test Driven Development, it can really help you to turn around database development quickly.

And it's not just developers who need to be part of this loop. Continuous Deployment and automated best practice testing can help allow DBAs to be responsive in deploying new developments, without huge amounts of extra work. The key advantage of this? You can turn around software developments more quickly - and that means you are more valuable, and your employer can do better in the market. At the recent SQL in the City conference in London, Red Gate Software's Joint CEO Simon Galbraith gave a very impressive overview of how adopting Agile approaches had helped Red Gate to achieve better market adoption and a product better tuned to market need.

The authors of the tSQLt database unit testing framework (which is what I use for database unit testing and test driven development) are running a course on how to get started. This covers everything from using tSQLt to write unit tests, to adopting this within a CI environment and fitting Agile development into the development cycle. The course follows the US tour of SQL in the City and you can find more details (and book your place!) on this website. It's not free, but should provide great business value to anyone looking to adopt Agile practices within database development, or as part of a wider development team. The course also includes a license for Red Gate's Source Control software, as well as an automation license and a discount on SQL Test - together worth $1235 - which gives you the tools necessary to get started.

It looks like a great course and one I wish had been held after the London SQL in the City event - the session that Dennis and Sebastian presented was a great introduction to tSQLt as a test framework.

Monday 23 July 2012

Database CI the easy way
- A first look at Red Gate's new TeamCity plugin

Update (June 2014): Since this article was written, the folks at Red Gate have announced a new version of the plugin, which enhances and speeds up the setup process. You can read more about it here, and thanks to Alex for the comment below alerting me to it.

I was recently tipped the nod by David Atkinson of Red Gate that they have a new plugin for the TeamCity Continuous Integration system in Early Access Preview. The plugin allows GUI integration of SQL Compare with TeamCity build steps – a great improvement for those starting out with TeamCity who may not be familiar with how to set it up, and who might find command line steps a little daunting.

This means that this step in my earlier article at Simple Talk can now be streamlined and made much easier to use; this also applies to the steps I demonstrated at the recent SQL In The City conference in London.

Now, this new plugin doesn’t do things that you couldn’t do with the command line, but it does make the whole process easy to use, as well as easier for others to maintain. To me, this is a key part of getting people to use the system - if it is easy and well supported then you can more easily convince developers and managers that it can be adopted. It also makes your initial setup easier and avoids a lot of configuration errors! Of course, the nice interface doesn't prevent you using the command line if your needs are a little more bespoke.

So, let's have a look at the plugin then. Installation is manual, but easy – just drag and drop the plugin into the TeamCity Plugins directory. There are some simple instructions included to help you if you're not too sure.

One thing I should mention – you need to be on version 7.x of TeamCity. My test area wasn't, and I had to upgrade – but if you’re new to TeamCity you’ll probably be downloading and using the latest version.

As an example, let’s take an existing build configuration and update it to the new way of doing things.
So, my existing build steps (actually within a template, but the steps are the same whether you use templates or not) are: 



It's the second step, "Update database schema" which we are looking at. The actual step currently has the command line :
"C:\Program Files (x86)\Red Gate\SQL Compare 10\SQLCompare.exe" /scripts1:"%teamcity.build.checkoutDir%\Database" /database2:%DBName% /server2:"%SQLserver%" /ignoreparsererrors /sync
which I am sure you will agree is a little ugly.

So let's create a new step (the Add Build Step link) and select the new build runner type of Red Gate (SQL Server):


This gives us a nice screen into which we can put the parameters previously used. I've used parameters here (which are defined as TeamCity build parameters) - but if you're not using parameters, standard file paths / server & database names are fine.

Note - I've used Windows Authentication here - I'd recommend you do the same if possible. That uses the Build Agent's credentials to authenticate to SQL Server. If you choose SQL Server Authentication, your access credentials will be stored and appear in the build log in plaintext, which isn't ideal. Red Gate assure me they are aware of this issue and are looking at how to resolve it.

So, having created the new step, I then disabled the previous step (there's nothing like having a simple rollback solution!) and re-ordered the build steps so the new one would execute at the same point in the build that the now disabled version had:


It's worth noting that the plugin helpfully makes the parameters used visible on this summary screen, which certainly aids the visibility of the process.

At this point, I ran a build, so that I could check the process was working. The good news, of course, is that it did work.

Now, if we go to the build log of the process, we can see a little more detail:



A few things to note from this build log extract:
  1. The command line used is output as part of the build (highlighted above). This is great for debugging if things aren't quite as you expect, and gives a great starting point for your own tweaks to the process.
  2. The actual copy of SQL Compare used is unique to the plugin - it's not using the same version as if I'd launched it from the command line. This would be worth exploring if you're planning a version upgrade, and worth being aware of if troubleshooting. In this case the plugin is actually using a more up to date version than I have installed on my test system.
  3. Red Gate's Automation License is being used for this tool. As you will see from the screenshot above though, the tool comes with a free 14 day trial period.
As you can see, this is a simple way to get a database built from source control, and one that I can see being a great help in quickly getting up and running with continuous integration for databases, whether in a database-only build (as in this simple example), or as part of a bigger application test process.

I hope that Red Gate can expand this plugin to cover steps like clearing out the existing database (perhaps a tick box in the current implementation?) - this would cover my first step, which I think most implementations will want in order to give a predictable build. Another useful step would be to integrate with SQL Data Compare to load source controlled data into the database, which a number of implementations use. Ideally, a mature plugin would also run any SQL Test (i.e. tSQLt) tests defined in the database schema, and load the test results back into TeamCity. This would then form a complete Continuous Integration solution for databases, without any need for the command line steps, and so allow easy adoption and maintenance.

All in all I'm very pleased with the work that Red Gate have put in to producing an easy to use tool here; it's something that I can see fulfilling the majority of needs, and expanding over time to do more common Database CI tasks.

Why not drop Red Gate a line to give it a go yourself?


Monday 16 July 2012

SQL In The City - A first time speaker's experiences

I have (with a colleague) recently presented at SQL In The City (Red Gate’s industry conference) in London. This was the largest audience to which I have thus far presented, and I thought I’d write up my thoughts on how it came about, how we prepared, and how it went.

Disclaimer – I’ve not had the session feedback yet – so this post is from my perspective!
Firstly, I’ll outline a little of the background to my session.  I wrote an article for Red Gate’s community web site Simple Talk in February 2012. This reflected a system we had implemented at work, and was positively received.

Fast forward 4 months. I was intending to attend the SQL In The City conference on the Friday, and had booked the day off work. Imagine my surprise when I received an email from Michael Francis and David Atkinson at Red Gate, inviting me to speak on Continuous Integration at this very conference.
So, I first of all took a moment to talk to my colleagues and other members of my local User Group. This was very helpful. I was cheered that the general thought was that I could do this, but given a word of caution by the user group leader, that I should think about how much time I was to be speaking for, and how much that time might be worth.
As an example, an audience of 150 * 1 hour * £20 per hour = £3,000 – was my talk worthy of that much time?
Well, after much deliberation I decided to do it – co-presenting with a colleague, Annette Allen. Annette had been involved in the implementation from the beginning, and this gave us the chance to cover both the developer experience and the system set-up and implementation side.
This meant that we had only 6 weeks to write the presentation, practice it, get it approved by Red Gate, and generally polish it. We then had a kind offer to do a ‘preview’ of the session at the next User Group meeting – which we wanted to do as it would give us some (largely) unbiased feedback. So that gave us 3 weeks to write the presentation, practice it, and get comfortable with the content. Oh, and we had to keep up the day job, too!
I’d like to thank my employer for their open and supportive attitude towards our doing this presentation, which really helped.
As you will have gathered, given that I am writing this, we made it. We had useful feedback from the User Group session, and from Red Gate’s product managers, which we incorporated into the presentation.
So, what of the event itself?  SQL In the City was a 2 day event in London (It’s on tour in the US this summer), and hosted about 600 people over those two days. I turned up the evening before, to find a great atmosphere as Red Gaters and Speakers helped get the place set up, then headed out for a great meal at a local restaurant.
Come the morning of the event, we had an early start – at the venue for 0730 (and Robin Williams was right about what the ‘O’ stands for). There were many great sessions – some solo, from speakers such as Steve Jones and Grant Fritchey; and others a double act, like Sebastian Meine and Dennis Lloyd (on Unit Testing), or Annette and myself. I was pleased to get a chance to enjoy some of these sessions in the run up to my session.  
When it actually came time to speak, I have to say I was initially nervous (and perhaps relied on my notes a little much), but everyone there was very supportive, and I soon relaxed into it. We ran close to time both days – so I apologise if we didn’t get a chance to answer your questions; both Annette and I were kept busy with questions all through subsequent breaks! (If you have more – do get in touch).
Quite aside from the presentations – at which I hope everyone learns something – the breaks gave a great opportunity for asking questions, learning from the Red Gate staff members, and getting to know your peers. It was great to see people I've known for a while, and I was really pleased to meet lots of people who I had hitherto only conversed with by email or knew from blogs, forums & tweets.
The days ended well, with Beer and Competitions, and all in all the event was very well attended and supported, and most enjoyable. If you went and haven’t sent in your feedback form yet – please do so (http://www.red-gate.com/sitcfeedback). I hope to see many of you at the event next year.

Thursday 17 May 2012

Killing processes

There's a fairly standard way of killing out of control or overrunning processes - identify the culprit, work out if it can be killed, and let it roll back the transaction. But what do you do if this doesn't roll back quickly?

Having not had this immediately to the forefront of my mind today, I've collated some information on killing and rollback of connections.

To identify the blocking process SPID:

EXEC sp_who2

See what's long running or blocking, assess it, then if appropriate kill off the culprit with the command:

KILL XX

(Where XX is the ID of the SPID)

Great, but then it rolls back.
So, how can you tell how long that will take?

select wait_type, wait_time, percent_complete from sys.dm_exec_requests where session_id = XX

Alternatively

KILL XX WITH STATUSONLY

This latter query also gives you an estimation of the time remaining for the rollback to complete.

If you need the process gone more urgently, you're limited to doing a dirty rollback.

This can be accomplished by connecting to the database and setting it to single user mode with rollback immediate. Note however this leaves the data in an uncertain state and therefore is NOT recommended for production databases, as it is a "dirty" rollback.

If you needed to do this (e.g. if a rollback has hung) on a production server you could set the database offline with rollback immediate and then restore the database from backups.
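For reference, the commands involved are along these lines (a sketch with a placeholder database name - and, as above, something to reach for only when you accept the consequences):

-- Force a "dirty" rollback by taking the database to single user mode
-- (placeholder database name - not something to run lightly against production)
ALTER DATABASE MyDatabase SET SINGLE_USER WITH ROLLBACK IMMEDIATE
-- ...and back again once you're done:
ALTER DATABASE MyDatabase SET MULTI_USER

-- Or, if a rollback has hung and you intend to restore from backup anyway:
ALTER DATABASE MyDatabase SET OFFLINE WITH ROLLBACK IMMEDIATE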

Rollback is mostly single-threaded, which means that a rollback can take significantly longer than the query had been running before the rollback was issued.

Rollback can also take longer because SQL Server may use cached plans for the initial query but be unable to use them for the rollback.

Restarting the server isn't always the answer either - and in any case is a very drastic step on a production server! (It also drops the connection which can be problematic for determining remaining time).

See this MSDN blog post for more information.

Note that if an external process is 'killed' then it may not rollback all the way. To avoid this, try to kill the external process at the OS level.

There is some discussion of that in this MSDN social post.

This connect article has more information on what to do about unresponsive rollbacks.

Of course the ideal is that we don't get into this situation in production - and code reviews etc help here, but the world I inhabit isn't always perfect.

Tuesday 13 March 2012

NULL values - NOT IN, or not?

I've recently come across a piece of behaviour which surprised me around the IN operator, or more specifically the NOT IN variant of it.

So, the usual behaviour of this is to give us everything which is in one list but not another:
/* Example Script 1 - "NULL values - NOT IN, or not?"
Dave Green, http://d-a-green.blogspot.com */
/* Create main table */
CREATE TABLE #mya (aa INT)
/* Insert sample values */
INSERT #mya (aa) VALUES (1),(2),(3),(4),(5)
/* Create comparison table */
CREATE TABLE #myb (aa INT)
INSERT #myb(aa) VALUES (6)
/* Run the select, 5 rows returned; no rows excluded */
SELECT * FROM #mya WHERE aa NOT IN(SELECT aa FROM #myb)
/* Clearup */
DROP TABLE #mya
DROP TABLE #myb

This returns the five results:
aa
-----------
1
2
3
4
5
(5 row(s) affected)

 So far, so good.

So, what if the table #myb (the comparison table) contains no values?

Well, that works well too:

/* Example Script 2 - "NULL values - NOT IN, or not?"
Dave Green, http://d-a-green.blogspot.com */
/* Create main table */
CREATE TABLE #mya (aa INT)
/* Insert sample values */
INSERT #mya (aa) VALUES (1),(2),(3),(4),(5)
/* Create comparison table */
CREATE TABLE #myb (aa INT)
/* Run the select, 5 rows returned; no rows excluded */
SELECT * FROM #mya WHERE aa NOT IN(SELECT aa FROM #myb)
/* Clearup */
DROP TABLE #mya
DROP TABLE #myb

Which gives us:
aa
-----------
1
2
3
4
5
(5 row(s) affected)
This is all well and good. But what if the table DOES contain a row, but the value in the field in that row is null?

/* Example Script 3 - "NULL values - NOT IN, or not?"
Dave Green, http://d-a-green.blogspot.com */
/* Create main table */
CREATE TABLE #mya (aa INT)
/* Insert sample values */
INSERT #mya (aa) VALUES (1),(2),(3),(4),(5)
/* Create comparison table */
CREATE TABLE #myb (aa INT)
INSERT #myb(aa) VALUES (NULL)
/* Run the select */
SELECT * FROM #mya WHERE aa NOT IN(SELECT aa FROM #myb)
/* Clearup */
DROP TABLE #mya
DROP TABLE #myb

Which gives us:

aa
-----------
(0 row(s) affected)
 
This is unexpected. It's actually occurring because the comparison between the NULL in the record and the values in the table results in UNKNOWN, which is neither TRUE nor FALSE.

As BOL puts it
Any null values returned by subquery or expression that are compared to test_expression using IN or NOT IN return UNKNOWN. Using null values in together with IN or NOT IN can produce unexpected results.
This seems to me to be an understatement!

Now, this behaviour isn't so bad if the only record in the set is null - that seems unlikely. But what if you have only one bad record in a series of otherwise good ones?
/* Example Script 4 - "NULL values - NOT IN, or not?"
Dave Green, http://d-a-green.blogspot.com */
/* Create main table */
CREATE TABLE #mya (aa INT)
/* Insert sample values */
INSERT #mya (aa) VALUES (1),(2),(3),(4),(5)
/* Create comparison table */
CREATE TABLE #myb (aa INT)
/* Insert integer records 6 through 99 */
DECLARE @i INT
SET @i=6
WHILE @i < 100
BEGIN
INSERT #myb(aa) VALUES (@i)
SET @i +=1
END
/* Insert one null record */
INSERT #myb(aa) VALUES (NULL)
/* Run the select */
SELECT * FROM #mya WHERE aa NOT IN(SELECT aa FROM #myb)
/* Clearup */
DROP TABLE #mya
DROP TABLE #myb

Again, we get no results:

aa
-----------
(0 row(s) affected)

You can simply comment the line which inserts the null value into #myb in or out, to prove the difference for yourself.

Clearly, this means that if you're using NOT IN and there is any possibility that the values you are comparing against could be NULL, then you need to take this behaviour into account. Alternatively, declare the columns you are referencing as NOT NULL to avoid it.
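Following on from the NOT NULL suggestion, a minimal sketch using the comparison table from the examples above:

/* Declaring the comparison column NOT NULL means a NULL can never reach the subquery. */
CREATE TABLE #myb (aa INT NOT NULL)
INSERT #myb(aa) VALUES (6)      -- fine
INSERT #myb(aa) VALUES (NULL)   -- fails with a "Cannot insert the value NULL" error
DROP TABLE #myb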

What happens if we put a null value in the main table (#mya in the above examples)? The null record is never returned, whether there is a null in the comparison (#myb) table or not. And it's worth noting that this last point is true for IN and NOT IN.
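A quick script to demonstrate that point, based on the earlier examples:

/* A NULL row in the main table is never returned, by IN or NOT IN. */
CREATE TABLE #mya (aa INT)
INSERT #mya (aa) VALUES (1),(2),(NULL)
CREATE TABLE #myb (aa INT)
INSERT #myb(aa) VALUES (1)
SELECT * FROM #mya WHERE aa IN (SELECT aa FROM #myb)     -- returns 1 only
SELECT * FROM #mya WHERE aa NOT IN (SELECT aa FROM #myb) -- returns 2 only; the NULL row appears in neither
DROP TABLE #mya
DROP TABLE #myb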

So, what happens if we're not using a table but rather a set of values?

/* Example Script 5 - "NULL values - NOT IN, or not?"
Dave Green, http://d-a-green.blogspot.com */
/* Create main table */
CREATE TABLE #mya (aa INT)
/* Insert sample values */
INSERT #mya (aa) VALUES (1),(2),(3),(4),(5),(null)
/* Get results */
SELECT * FROM #mya WHERE aa NOT IN (1,2,null)
SELECT * FROM #mya WHERE aa IN (1,2,null)
/* Clearup */
DROP TABLE #mya

We get:

aa
-----------
(0 row(s) affected)
 
aa
-----------
1
2
(2 row(s) affected)

So this isn't quite the same behaviour we saw with tables; here the NULL in the NOT IN list again means that nothing evaluates as TRUE (as before), but IN is explicitly looking for TRUE matches - so apart from the NULL row in #mya (whose comparison even with NULL is UNKNOWN), we get the expected 2 records.

It's worth noting that the behaviour experienced can change when the ANSI_NULLS setting is changed, but that this setting is mandated ON in a future SQL Server release (per the note in BOL).

In summary, use caution when looking at IN with NULLable fields or values.

Further reading : http://stackoverflow.com/questions/129077/sql-not-in-constraint-and-null-values