Unit Testing Stored Procedure ETL SQL Server03 Aug 2017
In many application development teams, unit testing is a non negotiable, but the SQL world still seems a long way behind when it comes to unit testing. Regardless of platform, when you have code, and that code is manipulating data, unit testing offers the same benefits as it does in the application world.
When you have lots of complex Stored Procedures, making one minor change to a table may affect other parts of the code that you were not expecting, unit testing gives you the confidence that all is well. After merging code from a development server into a test or live environment, running the unit test suite can give you confidence of a successful deployment. You can create static code analysis unit tests to monitor for occurrences of things that could cause problems, for example if your system is based on UTC dates everywhere, you could check for the existence of GetDate() as opposed to GetUtcDate() Like the above, you could also test code quality, for example, that your index names follow a particular convention. New developers on a project can get started much quicker due to the extra confidence the unit tests give them that they have not broken anything, and also the tests assist in understanding the intent of the code they are testing It makes it possible to develop in a TDD style, and also allows one to develop against a test data set created for the purpose, rather than relying on whatever data happens to be in your development environment, which may or may not exercise all conditions it needs to. Developing against unit tests drives a good specification and requirements, and asking important questions earlier on in the development process So, if all this makes sense, why are so many places not doing this?
I think its down to a few reasons;
Since most ETL database code will be manipulating data in tables and moving it from one place to another, its common that development is done in a shared development environment with existing data from a test source system, to feed the procs with test data and ensure a test is re-runnable and not fragile, the data needs isolating. This is difficult when the product, in this case SQL server does not have the concepts one would use in the application world to achieve this, for example, dependancy injection. It is however, possible to do this using the tSQLt framework, by using the FakeTable functionality, which essentially copies your table into a new blank table that you can insert data into. The tests all run as apart of a transaction, thus any changes you make during the test execution are all rolled back.
Many ETL and SQL developers don’t have a background in the application development world, and may have never been exposed to unit testing, and therefore may not understand the benefits it brings, as well as how to actually write a good unit test. Often this requires time and experience.
What to test?
So your Proc takes some data from table A, and puts a selection of its columns into table B.. What could possible go wrong?
Whilst some procedures look simple on face value, There’s a bit more than meets the eye, and ETL often does fail. If we have some actual business logic (A Transformation) it may be fairly obvious what to test here, but sometimes we are simply moving data. However, its still a possibility that we have made assumptions about the source data, or joins, and it would be good to cover that with a unit test.
Even if the unit test acts simple as a smoke test, and simply runs the proc, and it fails because a DDL change was made to a source or target table, this is still offering value and allows you to easily verify wether a change you made had any knock on effects elsewhere.
Whilst there are unit testing frameworks out there, often they run in the users IDE, and offer no shared view of results. Its possible to output tSQLt results as a JUnit XML file, which you could then export to a reporting system.
Where to run?
Often teams work on a shared server where changes are all made on a central database, and unit testing may be affected by what another developer is currently working on. It would be good to be able to run tests in a completely isolated environment. This way Developer A can confirm his tests are working, and developer B the same. When it comes to both developers merging their code there could still be failures.
THOUGHTS ON SOLUTIONS
I like to run Version Control on database code (More on that in another article) but this gives us a point at which we can run unit tests, when a developer commits code.
Imagine we have a group of developers working on a project on a shared development server, and once their features are committed, the code from the source control develop branch is used to build a database, and the tests are run on that database, and a nice report is sent out with unit testing results.
This gives us a chance to see how that code affected the project as a whole, for example identify any failures resulting from that code. It makes sense to run these tests outside of the shared development server, as there could be none functional uncommitted code there.
I like the tSQLt framework, its simple to use, and easy for SQL developers to get into since it uses Stored Procedures as tests, and Schemas as test classes, and allows some nice isolation options.
One can use a CI tool to execute the tests, such as Jenkins, and produce a nice unit test report that automatically emails out to the team (As well as identify which developer broke the tests and therefore owes the team a beer 🙂
I’ve written a simple utility that you can kick off from Jenkins or any kind of task scheduler, that will run tSQLt tests, and feed the output into a nice HTML report wherever you choose.