Tuesday, November 15, 2011

Convert Comma Separated String to Table : 4 different approaches

Recently, I came across a piece of TSQL code that would take a comma separated string as an input and parse it to return a single column table from it.

Lets’ call this function as Split1. The code is as follows:



CREATE FUNCTION [dbo].Split1(@input AS Varchar(4000) )

RETURNS

@Result TABLE(Value BIGINT)

AS

BEGIN

DECLARE @str VARCHAR(20)

DECLARE @ind Int

IF(@input is not null)

BEGIN

SET @ind = CharIndex(',',@input)

WHILE @ind > 0

BEGIN

SET @str = SUBSTRING(@input,1,@ind-1)

SET @input = SUBSTRING(@input,@ind+1,LEN(@input)-@ind)

INSERT INTO @Result values (@str)

SET @ind = CharIndex(',',@input)

END

SET @str = @input

INSERT INTO @Result values (@str)

END

RETURN

END

This is a very old fashioned (but still effective enough) script which does a loop over a string to cut out all possible string values that are separated by a comma.

Let’s see now, how the same could be achieved in modern day TSQL languages (such as SQL 2005 or SQL 2008).

Approach 1: Common Table Expression (CTE)



Lets call this function as Split2. here we are using

CREATE FUNCTION dbo.Split2 ( @strString varchar(4000))

RETURNS @Result TABLE(Value BIGINT)

AS

begin

WITH StrCTE(start, stop) AS

(

SELECT 1, CHARINDEX(',' , @strString )

UNION ALL

SELECT stop + 1, CHARINDEX(',' ,@strString , stop + 1)

FROM StrCTE

WHERE stop > 0

)



insert into @Result

SELECT SUBSTRING(@strString , start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END) AS stringValue

FROM StrCTE



return

end



GO



Approach 2: XML (surprise)

XML could be applied to do some type of string parsing (see this) Let’s call this function as Split3.



CREATE FUNCTION dbo.Split3 ( @strString varchar(4000))

RETURNS @Result TABLE(Value BIGINT)

AS

BEGIN



DECLARE @x XML

SELECT @x = CAST(''+ REPLACE(@strString,',','')+ '' AS XML)



INSERT INTO @Result

SELECT t.value('.', 'int') AS inVal

FROM @x.nodes('/A') AS x(t)



RETURN

END

GO



Approach 4: Classic TSQL Way

I got this approach from SQL Server Central site. This approach is slightly unusual but very effective. this needs you to create a table of sequential numbers called a Tally Table.



SELECT TOP 11000 --equates to more than 30 years of dates

IDENTITY(INT,1,1) AS N

INTO dbo.Tally

FROM Master.dbo.SysColumns sc1,

Master.dbo.SysColumns sc2



Lets index the table for better performance.



ALTER TABLE dbo.Tally

ADD CONSTRAINT PK_Tally_N PRIMARY KEY CLUSTERED (N)

WITH FILLFACTOR = 100



Finally out Split4 function.





CREATE FUNCTION dbo.Split4 ( @strString varchar(4000))

RETURNS @Result TABLE(Value BIGINT)

AS

BEGIN



SET @strString = ','+@strString +','



INSERT INTO @t (Value)

SELECT SUBSTRING(@strString,N+1,CHARINDEX(',',@strString,N+1)-N-1)

FROM dbo.Tally

WHERE N < LEN(@strString) AND SUBSTRING(@strString,N,1) = ',' --Notice how we find the comma RETURN END GO


BEST OF ALL

DECLARE @str VARCHAR(4000)

= '6,7,7,8,10,12,13,14,16,44,46,47,394,396,417,488,714,717,718,719,722,725,811,818,832,833,836,837,846,913,914,919,922,923,924,925,926,927,927,928,929,929,930,931,932,934,935,1029,1072,1187,1188,1192,1196,1197,1199,1199,1199,1199,1200,1201,1202,1203,1204,1205,1206,1207,1208,1209,1366,1367,1387,1388,1666,1759,1870,2042,2045,2163,2261,2374,2445,2550,2676,2879,2880,2881,2892,2893,2894'

Declare @x XML

select @x = cast(''+ replace(@str,',','')+ '' as xml)



select t.value('.', 'int') as inVal

from @x.nodes('/A') as x(t)