I'm doing a migration for a customer and setting up the data (cleaning, filtering, setting up views for mig jobs, etc.) on SQL Server 2017 (SSMS 18).
Now I'm stuck on following problem:
From the source system, I'll get a string containing customer addresses, with a length from 10-120 characters.
The destination system has max. 5 address columns (Address_1, Address_2, etc.) with a length of 35 characters (not changeable).
My first attempt was to simply cut it like Address_1 = 1-35 characters, Address_2 = 35-70, etc.
But the customer is not satisfied with cutting the string in the middle of a word.
Challenge is to cut the string after the last "Space" before exceeding 35 characters and continue like this.
Some example - given a source address
'Cecilia Chapman 711-2880 Nulla St. Mankato Mississippi 96522 (257) 563-7401'
I would like to get:
ADDRESS_1 = 'Cecilia Chapman 711-2880 Nulla'
ADDRESS_2 = 'St. Mankato Mississippi 96522'
ADDRESS_3 = '(257) 563-7401'`
Note: there is no pattern in the source system. It's a free text field, filled from different employees around the world.
Above is sample data from google since I'm not allowed to share address data of my customer.
What I already tried
CASE
WHEN
LEN(REPLACE(REPLACE(REPLACE(OE.ADDR,CHAR(10),' '),CHAR(13),' '),';',',')) > 35
THEN LEFT(REPLACE(REPLACE(REPLACE(OE.ADDR,CHAR(10),' '),CHAR(13),' '),';',','),35 - CHARINDEX(REVERSE(' '), REVERSE(LEFT(OE.ADDR,35))))
ELSE REPLACE(REPLACE(REPLACE(OE.ADDR,CHAR(10),' '),CHAR(13),' '),';',',')
END AS ADDRESS1,
CASE
WHEN
LEN(REPLACE(REPLACE(REPLACE(OE.ADDR,CHAR(10),' '),CHAR(13),' '),';',',')) > 35
THEN SUBSTRING(REPLACE(REPLACE(REPLACE(OE.ADDR,CHAR(10),' '), CHAR(13),' '), ';', ','),37 - CHARINDEX(REVERSE(' '), REVERSE(LEFT(OE.ADDR, 35))), 35-CHARINDEX(REVERSE(' '), REVERSE(SUBSTRING(OE.ADDR, 37 - CHARINDEX(REVERSE(' '), REVERSE(LEFT(OE.ADDR, 35))), 35))))
END AS ADDRESS2,
CASE
WHEN
LEN(REPLACE(REPLACE(REPLACE(OE.ADDR,CHAR(10),' '),CHAR(13),' '),';',',')) > (35 - CHARINDEX(REVERSE(' '), REVERSE(LEFT(OE.ADDR,35)))) + (35-CHARINDEX(REVERSE(' '), REVERSE(SUBSTRING(OE.ADDR,35 - CHARINDEX(REVERSE(' '), REVERSE(LEFT(OE.ADDR,35))),35))))
THEN
SUBSTRING(REPLACE(REPLACE(REPLACE(OE.ADDR,CHAR(10),' '),CHAR(13),' '),';',','), (36 - CHARINDEX(REVERSE(' '), REVERSE(LEFT(OE.ADDR,35)))) + (36-CHARINDEX(REVERSE(' '), REVERSE(SUBSTRING(OE.ADDR,37 - CHARINDEX(REVERSE(' '), REVERSE(LEFT(OE.ADDR,35))),35)))) ,35 - CHARINDEX(REVERSE(' '), REVERSE(SUBSTRING(OE.ADDR, (36 - CHARINDEX(REVERSE(' '), REVERSE(LEFT(OE.ADDR,35)))) + (36-CHARINDEX(REVERSE(' '), REVERSE(SUBSTRING(OE.ADDR,37 - CHARINDEX(REVERSE(' '), REVERSE(LEFT(OE.ADDR,35))),35)))),35))))
ELSE
''
END AS ADDRESS3,`
I also tried to split it and thought about some "WHILE < 35" to put it back together, but the whole thing is giving me a headache.
SELECT ROW_ID, OE.ADDR, VALUE
FROM IDTB_ORG_EXT OE
CROSS APPLY STRING_SPLIT(OE.ADDR, ' ')
ORDER BY LEN(OE.ADDR) DESC
3条答案
按热度按时间tnkciper1#
You're likely going to need to achieve this with recursion, where you'll need to "loop" through each individual string, aggregating the length and "resetting" with you hit a value greater than 35. For a single value this won't be awfully slow, but if you need to do this on a large data set, it will likely be quite (awfully) slow.
Using this method gives you the following:
I use
STRING_SPLIT
and the ordinal parameter here, however, it was then noted that you are using 2017 after I wrote the solution in my environment, which does not support the ordinal parameter. You will need to use a different splitter function that provides oridinal positions, such as a JSON splitter, ordelimitedsplit8K_LEAD
.Note that this solution does not give you the expected results, as your description and the result you expect don't match. The result this gives is (with the value lengths added):
| Value | Value Length |
| ------------ | ------------ |
| Cecilia Chapman 711-2880 Nulla St. | 34 |
| Mankato Mississippi 96522 (257) | 31 |
| 563-7401 | 8 |
If you need these in columns, rather than rows, you'll need to then pivot the data. I use conditional aggregation here:
iyfjxgzm2#
When asking questions like this providing example DDL/DML is important. Failing to do so will directly affect the quality of answer you get, as well as the votes your question receives. I prefer to use table variables for this, as there's less to clean up. Consider:
Text/String manipulation is not something a RDBMS is really good at, but in a pinch it can be done. Consider:
There's a few things going on here. First we're grabbing the left 35 characters (per your description) and then reversing them to look for the last (but first in the reversed string) instance of a white space. Once we know where that is we know where to terminate the first split. Then we use the same technique, but this time to take all the characters right of that position. Finally we look for a open paren in the remaining (right side) string so we can also split out the phone number.
6jygbczu3#
Without using SPLIT_STRING, I came up with this. You can pass in the input string, the delimiter to split the string, and the cut off length that each substring cannot exceed. You can also pass in a label if you don't want the output rows to get labelled as "address n".
to run
or CROSS APPLY it to a table of existing addresses/string