SQL Server: Separate a string with SUBSTRING and PATINDEX - last step

Posted by yftpprvb on 2023-11-16 in Other

I need to finalize a query. The query returns a column which contains values like "P100+P200" or "SUMME(P400:P1200)".

In the end, the result should be:
| Column A | Column B | Column C |
| ------------ | ------------ | ------------ |
| P100 | + | P200 |
| P400 | : | P1200 |

I have already managed to extract Column A and Column B.

For the first two steps I used this code:

    MAX(SUBSTRING(t3.formel, PATINDEX('%[A-Z][0-9]%', t3.formel), PATINDEX('%[+:-]%', SUBSTRING(t3.formel, PATINDEX('%[A-Z][0-9]%', t3.formel), LEN(t3.formel))) - 1)) "Formelteil 1",
    MAX(SUBSTRING(t3.formel, PATINDEX('%[+:.-]%', t3.formel), 1)) AS Sonderzeichen

But I seem to be blind to the solution for the third step.

Answer 1 (lnvxswe2)

As mentioned in the comments, this is not really a job for SQL Server.

When asking questions like this it's helpful to provide example DDL/DML:

    DECLARE @Table TABLE (formel NVARCHAR(100));
    INSERT INTO @Table (formel) VALUES
    ('P100+P200'), ('G100/G200'), ('a100*z200'), ('P1005-P2005'), ('SUMME(P400:P1200)');

You're two thirds of the way there. Since there only seems to be one extra character to worry about (the closing parenthesis), we can start the last part at the operator's position + 1, pass a length that is at least as long as the remaining characters (LEN(t3.formel) is always enough), and then replace the ')' with nothing:

    SELECT t3.formel,
           -- a: from the first letter followed by a digit up to the operator
           SUBSTRING(t3.formel, PATINDEX('%[A-Za-z][0-9]%', t3.formel), PATINDEX('%[-*/+:]%', t3.formel) - PATINDEX('%[A-Za-z][0-9]%', t3.formel)) AS a,
           -- b: the single operator character
           SUBSTRING(t3.formel, PATINDEX('%[-*/+:]%', t3.formel), 1) AS b,
           -- c: everything after the operator, with the closing ')' removed
           REPLACE(SUBSTRING(t3.formel, PATINDEX('%[-*/+:]%', t3.formel) + 1, LEN(t3.formel)), ')', '') AS c
    FROM @Table t3;
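
Result:

| formel | a | b | c |
| ------------ | ------------ | ------------ | ------------ |
| P100+P200 | P100 | + | P200 |
| G100/G200 | G100 | / | G200 |
| a100*z200 | a100 | * | z200 |
| P1005-P2005 | P1005 | - | P2005 |
| SUMME(P400:P1200) | P400 | : | P1200 |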
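To see why the position arithmetic works, here is a small Python sketch of the same three steps. `patindex` and `substring` are hypothetical helpers written only for this illustration: they use 1-based positions like the T-SQL functions, but they take Python regular expressions instead of LIKE patterns, so they only approximate PATINDEX and SUBSTRING.

```python
import re


def patindex(pattern: str, text: str) -> int:
    """Hypothetical stand-in for T-SQL PATINDEX: 1-based position of the
    first regex match, or 0 if there is none (regex, not a LIKE pattern)."""
    m = re.search(pattern, text)
    return m.start() + 1 if m else 0


def substring(text: str, start: int, length: int) -> str:
    """Hypothetical stand-in for T-SQL SUBSTRING with a 1-based start."""
    return text[start - 1:start - 1 + length]


def split_formel(formel: str) -> tuple[str, str, str]:
    start_a = patindex(r"[A-Za-z][0-9]", formel)  # first letter followed by a digit
    op_pos = patindex(r"[-*/+:]", formel)         # first operator character
    a = substring(formel, start_a, op_pos - start_a)
    b = substring(formel, op_pos, 1)
    # Everything after the operator; len(formel) is always long enough,
    # then the trailing ')' is stripped like the REPLACE in the query.
    c = substring(formel, op_pos + 1, len(formel)).replace(")", "")
    return a, b, c


for f in ["P100+P200", "G100/G200", "a100*z200", "P1005-P2005", "SUMME(P400:P1200)"]:
    print(f, split_formel(f))
```

For SUMME(P400:P1200) the operand starts at position 7 and the operator sits at position 11, so a is SUBSTRING(formel, 7, 4) = P400, and c is SUBSTRING(formel, 12, 17) = P1200) before the REPLACE removes the parenthesis.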
Answer 2 (vltsax25)

T-SQL isn't a text manipulation language and doesn't even have regular expressions. It's a lot easier to do this task in a client language, using a regular expression like ([A-Z\d]+)([+:.-])([A-Z\d]+) to capture the three parts.
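For reference, the same regular expression can be tried outside Power BI with Python's standard `re` module. `split_with_regex` is a hypothetical helper name used only for this sketch; it returns the three capture groups of the first match, or None if the value doesn't match.

```python
import re

# The three-part pattern from this answer: operand, operator, operand.
PATTERN = re.compile(r"([A-Z\d]+)([+:.-])([A-Z\d]+)")


def split_with_regex(formel):
    """Hypothetical helper: (a, b, c) for the first match, or None."""
    m = PATTERN.search(formel)
    return m.groups() if m else None


print(split_with_regex("P100+P200"))          # ('P100', '+', 'P200')
print(split_with_regex("SUMME(P400:P1200)"))  # ('P400', ':', 'P1200')
```

Because re.search scans the whole string, the SUMME( prefix is skipped automatically: SUMME is followed by '(' rather than an operator, so the first successful match starts at P400.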

In the comments you mention that the data is used in Power BI. In the Power Query Editor you can add a Python script step (Transform > Run Python script) that applies a regular expression to the data with pandas' str.extract, which automatically extracts the parts into separate columns.

The script for the Power BI step is essentially a one-liner:

    import pandas as pd
    # Power BI passes the current table to the script as the DataFrame `dataset`
    pattern = r"([A-Z\d]+)([+:.-])([A-Z\d]+)"
    dataset[['a', 'b', 'c']] = dataset['formel'].str.extract(pattern)

str.extract applies the regular expression to every value of the formel column (a Series) and puts each capture group into a separate column. Assigning the result to dataset[['a','b','c']] stores those columns in the original dataset under the names a, b and c.

You can easily test Python scripts in the command line or a Jupyter Notebook in VS Code.

The following script, run either in a Python session or in a Jupyter notebook in VS Code:

    import pandas as pd
    dataset = pd.DataFrame({'formel': ['P100+P400', 'SUMME(P200:P300)']})
    pattern = r"([A-Z\d]+)([+:.-])([A-Z\d]+)"
    dataset[['a', 'b', 'c']] = dataset['formel'].str.extract(pattern)
    dataset

displays:

                 formel     a  b     c
    0         P100+P400  P100  +  P400
    1  SUMME(P200:P300)  P200  :  P300
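One thing worth checking: the pattern in this answer only allows uppercase letters and the operators + : . -, so some rows from the first answer's sample data (G100/G200, a100*z200) don't match, and str.extract returns NaN in every extracted column for them. The widened pattern below is a hypothetical variant for illustration, not part of the original answer.

```python
import pandas as pd

# Sample rows taken from the first answer's test data.
rows = ['P100+P200', 'G100/G200', 'a100*z200', 'SUMME(P400:P1200)']

# Pattern from this answer: uppercase letters/digits, operators + : . - only.
narrow = r"([A-Z\d]+)([+:.-])([A-Z\d]+)"
# Hypothetical widened pattern: also lowercase letters and the * and / operators.
wide = r"([A-Za-z\d]+)([-+*/:.])([A-Za-z\d]+)"

dataset = pd.DataFrame({'formel': rows})
dataset[['a', 'b', 'c']] = dataset['formel'].str.extract(narrow)
print(dataset)  # rows with no match get NaN in a, b and c

dataset[['a', 'b', 'c']] = dataset['formel'].str.extract(wide)
print(dataset)  # every sample row now splits into three parts
```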