I was answering a question on Experts Exchange this morning. A simple question that comes back every so often.
The question was: how do I split the content of a value from a row into multiple rows?
I had to provide 2 answers. One useful only for the lucky who are using Microsoft SQL Server 2016 and above. And another one for the unlucky ones (but limited to SQL 2008).
No downloadable file this month
Since all the code is provided here, I decided not to create a downloadable file this month.
Unnormalized data
This situation can be found many times when working with unnormalized data or data that is imported from files.
You end up with a value in a column containing multiple values. You will need to have a separator in this column to be able to do something with it.
The resulting dataset will be that each value contained in that column will be spread into as many rows as needed bringing every other column with it.
Creating some test data
To test the 2 ways of handling that situation, I will create a very simple table containing 3 columns. Two columns are containing a single value while a third column contains something like a CSV.
DECLARE @data TABLE (
school VARCHAR(3),
testid VARCHAR(2),
grade VARCHAR(50)
)
INSERT INTO @data (
school, testid, grade
)
VALUES ('001', 'A1', '00,01,02')
, ('002', 'A2', '01,02')
, ('003', 'A3', '')
, ('004', 'A4', '1001,1002,1003,1004,2001,2002,2003')
SELECT 'original data', * FROM @data
The result of this query is shown in figure 1.
Figure 1: the test data
The easy solution
If you are using a relatively new version of the tool, read here Microsoft SQL Server 2016 or better (including SQL Azure), you can use the built-in STRING_SPLIT function.
This new function is somewhat easy to use. You specify the column name in the first argument and your separator in the second one. Notice that the separator needs to be a single character.
This query is an example of how we can use it to query the test data created earlier:
SELECT 'using STRING_SPLIT', school, testid, value
FROM @data
CROSS APPLY STRING_SPLIT(grade, ',')
ORDER BY school, value
The STRING_SPLIT function returns a single-column table for each row of your original data. Therefore we need to use a mechanism like CROSS APPLY to join the columns containing a single value to the (possibly) multiple rows returned by the STRING_SPLIT function.
The output is shown in figure 2.
Figure 2: the result of STRING_SPLIT
The other solution
Not everybody has the chance of using (almost) up-to-date versions of SQL Server. That was the situation in which the asker of the question was.
If this is your case too, I have another solution for you that will work if you are using at least SQL Server 2008. If you are using an older version, I would really recommend upgrading as you are out of support!
This solution is using a recursive CTE and the STUFF function (both appeared in SQL Server 2008):
;WITH tmp(school, testid, DataItem, grade) AS
(
SELECT
school,
testid,
CAST(LEFT(grade, CHARINDEX(',', grade + ',') - 1) AS VARCHAR(250)),
CAST(STUFF(grade, 1, CHARINDEX(',', grade + ','), '') AS VARCHAR(250))
FROM @data
UNION ALL
SELECT
school,
testid,
CAST(LEFT(grade, CHARINDEX(',', grade + ',') - 1) AS VARCHAR(250)),
CAST(STUFF(grade, 1, CHARINDEX(',', grade + ','), '') AS VARCHAR(250))
FROM tmp
WHERE grade > ''
)
SELECT 'using a CTE', tmp.school, tmp.testid, tmp.DataItem
FROM tmp
ORDER BY tmp.school, tmp.DataItem
By default, the recursion of the CTE will work up to 100 iterations. If you need more than 100 (if you have more than 100 values separated by your separator, you can add this line to your query:
OPTION (maxrecursion 0)
The output is shown in figure 3.
Figure 3: the result of the recursive CTE
Multi-characters separator
One limitation of the STRING_SPLIT function is that the separator needs to be a single character.
If you ever met a situation in which your separator contains more than one character, you can surely update your data to obtain a single one and thus satisfy the requirement.
If not, my second solution can be easily modified to handle that situation.
To be able to test it, you first need to ensure that your separator is a multi-character one like this:
DECLARE @sep VARCHAR(5) = '(-)'
UPDATE @data
SET grade = REPLACE(grade,',',@sep)
SELECT * FROM @data
Now we have a 3-character separator. The STUFF function now needs to consider the length of the separator like this:
;WITH tmp(school, testid, DataItem, grade) AS
(
SELECT
school,
testid,
CAST(LEFT(grade, CHARINDEX(@sep, grade + @sep) - 1) AS VARCHAR(250)),
CAST(STUFF(grade, 1, CHARINDEX(@sep, grade + @sep)+LEN(@sep)-1, '') AS VARCHAR(250))
FROM @data
UNION ALL
SELECT
school,
testid,
CAST(LEFT(grade, CHARINDEX(@sep, grade + @sep) - 1) AS VARCHAR(250)),
CAST(STUFF(grade, 1, CHARINDEX(@sep, grade + @sep)+LEN(@sep)-1, '') AS VARCHAR(250))
FROM tmp
WHERE grade > ''
)
Results are exactly the same as before.
Performance
I was surprised to see that for this test table containing only 4 rows which is in-memory and has no indexes, there isn’t significant differences between the 2 approaches.
I leave up to you to test with a much larger dataset.
Conclusion
2 ways of solving a frequent question about Microsoft SQL Server to split the content of a columns are presented here. Even a way of working around one of the limitations has been presented.
I hope this article will prove to be helpful to many of you in the future!