INITCAP in a query in SQL Server

lkaoscv7  posted on 2023-02-11  in SQL Server

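Using SQL Server 2016, I need to scrub whitespace in a particular way and implement INITCAP.
The whitespace scrubber is simple. I'm having trouble getting the INITCAP replacement to work properly.
The accepted answer to "Initcap equivalent in mssql" is wrong, as noted in its first comment.
My data contains values with multiple spaces in a row and special characters (&, %, etc.).
stuff(): In SQL Server 2016, string_split has no option to provide an ordinal value and does not guarantee that results are returned in any particular order, so I need to write code to make sure the values come back from string_split in the correct order.
convert(xml, ...): Decodes most of the XML-encoded values.
convert(varchar(max), ...): ...because XML can't be used when SELECT DISTINCT is needed.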
SQL Fiddle

with T as (
  select *
  from (
  values ('Minesota Mining and   Manufacturing')
  , ('Minesota Mining & Manufacturing   ')
  , (' tillamook')
  , ('MUTUAL OF OMAHA')
  , ('   ')
  ) q(s)
),
scrubbed as (
  select T.s as InitialValue
  , CASE 
      WHEN LEN(RTRIM(T.s)) > 0
        THEN LTRIM(RTRIM(T.s))
    END as s
  from T
)
select distinct s.InitialValue
, stuff(
    (
      SELECT ' ' + t2.word
      from (
          select str.value
          , upper(substring(str.value, 1, 1)) + 
            case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
          , charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
          from string_split(s.s, ' ') str
        ) t2
      order by t2.idx
      FOR XML PATH('')
    ), 
    1, 
    1, 
    ''
  ) as INITCAP_xml
, convert(
    varchar(max), 
    convert(
      xml, 
      stuff(
        (
          SELECT ' ' + t2.word
          from (
              select str.value
              , upper(substring(str.value, 1, 1)) + 
                case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
              , charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
              from string_split(s.s, ' ') str
            ) t2
          order by t2.idx
          FOR XML PATH('')
        ), 
        1, 
        1, 
        ''
      )
    )
  ) as INITCAP_decoded
from scrubbed s

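As you can see in the output, using FOR XML causes some characters to be encoded (for example [space] = &#x20; and & = &amp;). Converting to the XML data type decodes some of those characters, but others (like &) stay encoded.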
| InitialValue | INITCAP_attempt1 | INITCAP_xml | INITCAP_decoded |
| --- | --- | --- | --- |
| Minesota Mining and Manufacturing | [garbled] | Minesota Mining And Manufacturing | Minesota Mining And Manufacturing |
| Minesota Mining & Manufacturing | Minesota Mining & Manufacturing | Minesota Mining &amp; Manufacturing | Minesota Mining &amp; Manufacturing |
|  tillamook | Tillamook | [garbled] | Tillamook |
| MUTUAL OF OMAHA | Mutual Of Omaha | Mutual Of Omaha | Mutual Of Omaha |
|    | NULL | NULL | NULL |
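REPLACE(s, '&amp;', '&') doesn't seem like a reasonable option, because I don't know what other values I'll run into over time. Is there a good, generic way to handle the characters that FOR XML encodes?
Within a view (so, without user-defined functions or stored procedures), is there a better way to implement INITCAP in SQL Server?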
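To make the ordering concern concrete, here is a minimal sketch of one way to split a string while keeping each word's position on SQL Server 2016. It swaps in OPENJSON and STRING_ESCAPE (both available from SQL Server 2016, OPENJSON requiring compatibility level 130) in place of string_split; this is an assumption for illustration, not part of the original query. The variable @s and the aliases ordinal and word are hypothetical names.

```sql
-- Sketch: order-preserving split via OPENJSON (not the original approach).
DECLARE @s varchar(max) = 'new york new jersey';

-- Turn the string into a JSON array: ["new","york","new","jersey"].
-- STRING_ESCAPE protects quotes, backslashes and control characters.
SELECT CAST(j.[key] AS int) AS ordinal,  -- zero-based array index
       j.[value]            AS word
FROM OPENJSON('["' + REPLACE(STRING_ESCAPE(@s, 'json'), ' ', '","') + '"]') AS j
ORDER BY ordinal;
```

Because [key] is the array index, repeated words such as the two occurrences of "new" keep distinct positions, whereas a charindex lookup on the word itself would return the same position for both. Consecutive spaces produce empty-string elements, which still need to be filtered or collapsed separately.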

nwlls2ji1#

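If you are interested in a scalar-valued function (SVF), here is a scaled-down version that allows customization and edge cases. For example, you get PhD rather than Phd, and also MacDonald and O'Neil.
This is a dramatically scaled-down version. I keep my own rules and exceptions in a generic map table.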

Example
select *
       ,[dbo].[svf-Str-Proper] (S)
  from (
  values ('Minesota Mining and   Manufacturing')
  , ('Minesota Mining & Manufacturing   ')
  , (' tillamook')
  , ('MUTUAL OF OMAHA')
  , ('   ')
  ) q(s)

Results

s                                       (No column name)
Minesota Mining and   Manufacturing     Minesota Mining And Manufacturing
Minesota Mining & Manufacturing         Minesota Mining & Manufacturing
 tillamook                              Tillamook
MUTUAL OF OMAHA                         Mutual Of Omaha
The function
CREATE FUNCTION [dbo].[svf-Str-Proper] (@S varchar(max))
Returns varchar(max)
As
Begin
    Set @S = ' '+ltrim(rtrim(replace(replace(replace(lower(@S),' ','†‡'),'‡†',''),'†‡',' ')))+' '
    ;with cte1 as (Select * From (Values(' '),('-'),('/'),('\'),('['),('{'),('('),('.'),(','),('&'),(' Mc'),(' Mac'),(' O''') ) A(P))
         ,cte2 as (Select * From (Values('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M')
                                       ,('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')
                                       ,('LLC'),('PhD'),('MD'),('DDS'),('II'),('III'),('IV')
                                 ) A(S))
         ,cte3 as (Select F = Lower(A.P+B.S),T = A.P+B.S From cte1 A Cross Join cte2 B 
                   Union All 
                   Select F = Lower(B.S+A.P),T = B.S+A.P From cte1 A Cross Join cte2 B where A.P in ('&') 
                  ) 
    Select @S = replace(@S,F,T) From cte3 
    Return rtrim(ltrim(@S))
End
-- Syntax : Select [dbo].[svf-Str-Proper]('john cappelletti')
--          Select [dbo].[svf-Str-Proper]('james e. o''neil')
--          Select [dbo].[svf-Str-Proper]('CAPPELLETTI II,john old macdonald iv phd,dds llc b&o railroad bank-one at&t BD&I Bank-Five dr. Langdon,dds')
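The first statement of the function collapses runs of spaces with a three-step REPLACE. The sketch below isolates that step so it can be tried on its own. It assumes the marker characters † and ‡ never occur in the data, and it uses nvarchar literals (an assumption made here so the markers do not depend on the database code page; the function above uses varchar). The variable name @S is reused only for illustration.

```sql
-- Sketch: collapse consecutive spaces without a loop or a function.
DECLARE @S nvarchar(max) = N'Minesota   Mining  and Manufacturing';

SELECT REPLACE(
         REPLACE(
           REPLACE(@S, N' ', N'†‡'),  -- 1. every space becomes the pair †‡
           N'‡†', N''),               -- 2. adjacent pairs cancel, leaving one †‡ per run
         N'†‡', N' ')                 -- 3. each remaining pair becomes a single space
       AS collapsed;
```

For the sample value this should return 'Minesota Mining and Manufacturing': three spaces become †‡†‡†‡, removing each ‡† leaves a single †‡, and that pair is turned back into one space. Since it is a plain expression, the same nesting can also be used inside a view.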

enxuqcxy2#

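Please try the following solution.
It uses SQL Server XML, XQuery, and its FLWOR expression.
Notable points:

  • cast as xs:token? takes care of the whitespace, i.e.:
    • All invisible tab, carriage return, and line feed characters are replaced with spaces.
    • Leading and trailing spaces are then removed from the value.
    • In addition, runs of consecutive spaces are replaced with a single space.
  • The FLWOR expression handles the proper casing.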
SQL
-- DDL and sample data population, start
DECLARE @tbl TABLE (tokens VARCHAR(MAX));
INSERT @tbl (tokens) VALUES
('mineSota Mining and   MaNufacturing'),
('Minesota Mining & Manufacturing   '),
(' tillamook'),
('MUTUAL  OF   OMAHA'),
('   ');
-- DDL and sample data population, end

DECLARE @separator CHAR(1) = SPACE(1);
SELECT t.*, scrubbed
    , result = c.query('
        for $x in /root/r/text()
        return concat(upper-case(substring($x,1,1)),lower-case(substring($x,2,1000)))
        ').value('text()[1]', 'VARCHAR(MAX)')
FROM @tbl AS t
CROSS APPLY (SELECT TRY_CAST('<r><![CDATA[' + tokens + ' ' + ']]></r>' AS XML).value('(/r/text())[1] cast as xs:token?','VARCHAR(MAX)')) AS t1(scrubbed)
CROSS APPLY (SELECT TRY_CAST('<root><r><![CDATA[' + 
      REPLACE(scrubbed, @separator, ']]></r><r><![CDATA[') + 
      ']]></r></root>' AS XML)) AS t2(c);
Output

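| tokens | scrubbed | result |
| --- | --- | --- |
| mineSota Mining and   MaNufacturing | mineSota Mining and MaNufacturing | Minesota Mining And Manufacturing |
| Minesota Mining & Manufacturing | Minesota Mining & Manufacturing | Minesota Mining & Manufacturing |
|  tillamook | tillamook | Tillamook |
| MUTUAL  OF   OMAHA | MUTUAL OF OMAHA | Mutual Of Omaha |
| NULL | | |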
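The whitespace handling attributed to xs:token above can be tried in isolation. The sketch below reuses the same expression as the scrubbed column, applied to a single variable that contains a tab, a carriage return plus line feed, and repeated spaces. The variable name @raw and the sample value are hypothetical.

```sql
-- Sketch: whitespace normalization via "cast as xs:token?" only.
DECLARE @raw varchar(max) =
    CHAR(9) + 'mineSota   Mining' + CHAR(13) + CHAR(10) + 'and  MaNufacturing  ';

SELECT @raw AS original,
       -- CDATA keeps characters such as & from breaking the XML cast.
       TRY_CAST('<r><![CDATA[' + @raw + ' ' + ']]></r>' AS XML)
           .value('(/r/text())[1] cast as xs:token?', 'VARCHAR(MAX)') AS scrubbed;
```

Wrapping the value in CDATA is what lets ampersands and angle brackets pass through; a value that itself contains the sequence ]]> would end the CDATA section early, in which case TRY_CAST returns NULL instead of raising an error.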

kmpatx3s3#

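You have made the classic SQL Server XML mistake: you can't just use PATH(''). You have to use the more convoluted PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') form to get the characters back correctly decoded.
Here is your fixed version: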

with T as (
  select *
  from (
  values ('Minesota Mining and   Manufacturing')
  , ('Minesota Mining & Manufacturing   ')
  , (' tillamook')
  , ('MUTUAL OF OMAHA')
  , ('   ')
  ) q(s)
),
scrubbed as (
  select T.s as InitialValue
  , CASE 
      WHEN LEN(RTRIM(T.s)) > 0
        THEN LTRIM(RTRIM(T.s))
    END as s
  from T
)
select distinct s.InitialValue
, stuff(
    (
      SELECT ' ' + t2.word
      from (
          select str.value
          , upper(substring(str.value, 1, 1)) + 
            case when len(str.value) > 1 then lower(substring(str.value, 2, len(str.value) - 1)) else '' end as word
          , charindex(' ' + str.value + ' ', ' ' + s.s + ' ') as idx
          from string_split(s.s, ' ') str
        ) t2
      order by t2.idx
      FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)'), 
    1, 
    1, 
    ''
  ) as INITCAP
from scrubbed s
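The difference between the two forms is easiest to see side by side. This is a minimal sketch with a hypothetical three-row words set; it uses commas as separators so the comparison does not depend on how FOR XML treats whitespace.

```sql
-- Sketch: plain FOR XML PATH('') versus the TYPE + .value() form.
WITH words AS (
    SELECT *
    FROM (VALUES (1, 'Mining'), (2, '&'), (3, 'Manufacturing')) AS v(ordinal, word)
)
SELECT
    -- Returned as text, so the ampersand stays entitized as &amp;
    STUFF((SELECT ',' + w.word
           FROM words AS w
           ORDER BY w.ordinal
           FOR XML PATH('')), 1, 1, '') AS entitized,
    -- Returned as XML, then read back as a string, so entities are decoded
    STUFF((SELECT ',' + w.word
           FROM words AS w
           ORDER BY w.ordinal
           FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 1, '') AS decoded;
```

The first column comes back as Mining,&amp;,Manufacturing and the second as Mining,&,Manufacturing. The TYPE directive makes the subquery return the xml data type, and .value('.', ...) extracts its string value, which is why no per-character REPLACE is needed.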
