Concatenating Strings in SQL Server
Updated: Nov 11, 2020
Recently I needed to run a long batch inside a string using sys.sp_executesql. So I declared a variable with the NVARCHAR(MAX) data type, assigned the text of the batch into it, and tried to execute it. When I assigned the text into the variable, I used the concatenation of a few strings. Something like that:
DECLARE @Statement AS NVARCHAR(MAX);
SET @Statement = N’Long String…’ + @Variable + N’Another Long String…’ + …
I have probably done that a million times before. But this time I received an error. The error resulted from the fact that my string was truncated. The length of my string should have been more than 10,000 bytes. I checked the length of the @Statement variable using the DATALENGTH function, and it turned out to be only 8,000 bytes.
I guess it’s the first time that I’m trying to assign such a long string to an NVARCHAR(MAX) variable. At first, I couldn’t figure it out. What’s the whole point of the NVARCHAR(MAX) data type, if such a variable cannot contain more than 8,000 bytes? It doesn’t make sense…
I thought that maybe there was something about my text that caused this strange behavior, so I decided to try something much simpler:
DECLARE @String AS NVARCHAR(MAX);
SET @String = REPLICATE (N’X’ , 5000);
SELECT DATALENGTH (@String);
I expected the length to be 10,000 bytes, but it was 8,000 bytes again. At this point, I started to get nervous. It says “MAX”, doesn’t it?
I checked the value of @@TEXTSIZE, but it was set to the maximum value (2147483647). I played with it for a while, and after some (wasted) time, I finally understood what’s going on. So here it is…
Every expression in SQL Server has a data type. This includes a column in a table, a variable, a returned value from a scalar function, and even a literal, such as the expression N’ABCD’. In the case of the first three types of expressions, the data type is well defined. But in the case of a literal, it’s a little less clear what the data type should be. SQL Server has its own rules as of what data type to assign to a literal. So what’s the data type of N’ABCD’?
I can think of several options that would make sense, but according to SQL Server’s rules, the data type of N’ABCD’ is NVARCHAR(4). As long as the number of characters (n) in the string literal is less than or equal to 4,000, the data type would be NVARCHAR(n). If the number of characters is greater than 4,000, the data type would be NVARCHAR(MAX). So far it makes sense, doesn’t it?
Now, let’s talk about concatenation. What do you think would be the data type of the following expression?
N’ABCD’ + N’EFG’
Correct! It’s NVARCHAR(7). Everything is making sense. Isn’t it great?
Now, let’s complicate things just a little bit. Suppose you have an expression that is a concatenation of two string literals – one of them contains 3,000 characters and the other contains 2,000 characters. Something like that:
What would be the data type in this case? Well, it can’t be NVARCHAR(5000), so it should be NVARCHAR(MAX), right? Wrong! The data type of this expression is NVARCHAR(4000), and the last 1,000 characters simply get truncated. So if you assign this expression to an NVARCHAR(4000) variable, the value of the variable would be 3,000 X’s and 1,000 Y’s. No errors, no warnings.
If you assign it to an NVARCHAR(MAX) variable, then the following things will happen. First, SQL Server will perform the concatenation and assign the data type NVARCHAR(4000) to the expression (and truncate the last 1,000 characters). Then, SQL Server will implicitly convert the NVARCHAR(4000) expression to NVARCHAR(MAX) and assign it to the variable. So the type of the variable would be NVARCHAR(MAX), but its value would still be 3,000 X’s and 1,000 Y’s.
This explains the behavior that I originally experienced. Since I concatenated a few long strings, the result was truncated to accommodate 4,000 characters and then assigned to my NVARCHAR(MAX) variable. That’s great, isn’t it?
So how do we solve this problem? If we concatenate an NVARCHAR(MAX) expression with any other string expression, the result would be NVARCHAR(MAX). It doesn’t matter how many expressions we want to concatenate. As long as at least one of them would be NVARCHAR(MAX), the result would be NVARCHAR(MAX) as well. So all we need to do is to explicitly convert one of the literals to NVARCHAR(MAX), like this:
The data type of this expression is NVARCHAR(MAX), and its length is 5,000 characters, as expected. Not so pretty, but it works.
Remember the REPLICATE (N’X’ , 5000) example? Why is the data type of this expression NVARCHAR(4000) and not NVARCHAR(MAX)? Well, the answer is simple – The REPLICATE function simply concatenates the same expression the requested number of times. It’s just like the following expression:
N’X’ + N’X’ + N’X’ + …
And we already know what happens here…
By the way, if you try to print an NVARCHAR(MAX) expression using the PRINT statement, it will also truncate the value to 4,000 characters, even if the expression is a true NVARCHAR(MAX) with more than 4,000 characters. This has nothing to do with the explanation given above. It happens because the PRINT statement can’t handle more than 4,000 Unicode characters or 8,000 non-Unicode characters. So if it’s given an NVARCHAR(MAX) value, it truncates it to NVARCHAR(4000). Again, no errors, no warnings.
This is why I use my own version of PRINT, which is a stored procedure called “PrintMax”. You can read all about it and download the code from here.