T-SQL Tuesday #136 Blog About Your Favorite Data Type
T-SQL Tuesday is the brainchild of Adam Machanic (Blog | Twitter). December 2009 was the first T-SQL Tuesday invitation that went out by Adam. It is a monthly blog party on the second Tuesday of each month. Currently, Steve Jones (Blog | Twitter) organises the event and maintains a website with all previous posts which you can find here. Everyone is welcome to participate in this monthly blog post.
What to choose!
So I left it late and boy am I not prepared! This tweet came out last night;
And this is how I felt getting started with my post last night because…yknow…OMG BRENT WILL READ MY POST!!!. I had intended to start earlier last week, but I have been revising for a certification, left it until Monday evening and the pressure was on. So in preparing for this, I went and had a look at the documentation around data types. If you’ve never had a read, it’s worth setting some time aside to understand characteristics of the various data types available to you. It helps inform decisions on the correct and appropriate data type for your needs.
The data type I dislike
The data type I dislike the most is the one that wasn’t considered. There are many times where we as developers don’t have sufficient info to critically design database structures correctly however, oftentimes I see issues that could have been avoided with a little more care. Here’s some examples I’ve seen recently;
VARCHAR = INT
LEFT([col],4) = 1234
A column stored as VARCHAR, using LEFT to strip all but the first 4 digits will still be a VARCHAR even if the values returned are numeric. Because of the way in which SQL handles datatype precedence , each value will be implicitly converted before being compared to 1234. Here’s how it should have looked;
LEFT([col],4) = '1234'
VARCHAR stores an extra 2 bytes that holds the width of the record. The way in which SQL stores fixed and variable width columns means not only are you adding additional bytes for no good reason, it is also less efficient for SQL to search on those records. Here’s how it should have looked;
INT for flags
INT often appears to be the go to datatype for many design choices but it can often be really unecessary given there are both SMALLINT and TINYINT options. Using INT for flags that have just a few options (or worse still just 1 or 0!) is an unecessary overhead.
Save the planet
We’re in the middle of a climate crisis right? All of these simple issues can result in additional CPU cycles, excess memory use, excess data storage, network traffic…all unecessary effort. I challenge you to do your bit to reduce this unecessary work and in turn, reduce your carbon footprint! Treat it as a small contribution to climate change :)
The data type I like
Ok, so this one is a bit tenuous because it’s only partly about data types but bear with me, it’s worth it! The data type I’m going to put forward is VARBINARY. Some time ago I fell down a rabbit hole exploring the inner workings of the ENCRYPTBYKEY function in SQL as part of some work exploring options for securing sensitive information. ENCRYPTBYKEY returns VARBINARY with a maximum size of 8,000 bytes (that’s as far as me following the topic goes).
I was referred to this MSDN article from 2009. In the article, it outlines the parts that make up the algorithm, bit heavy…but the part I wish to draw your attention to is this;
InnerMessageHeader := MagicNumber + IntegrityBytesLength + PlaintextLength
The first part of the InnerMessageHeader consists of a magic number, which is a fixed value that is used to ensure the message format is valid. The number used for this algorithm is 3131961357, the hexidecimal representation of which is 0xBAADF00D which reads “bad food”.
This led to me discovering Hexspeak and in turn, a list of the finest magic numbers, which can be seen here . My absolute favourite has to be Open Solaris’ core dumps that use the code 0xDEFEC8ED (“defecated”) AMAZING!!!