SQL Server Clear text with SQL [duplicate]

ao218c7q  于 2024-01-05  发布在  其他
关注(0)|答案(1)|浏览(103)

This question already has answers here:

How to strip HTML tags from a string in SQL Server? (12 answers)
Closed 12 days ago.

I have this text string that has certain HTML tags like \r, <a href= ............, and I want to remove them to leave the text completely clean, without tags, but I cannot remove them.

I have this query

DECLARE @text_original NVARCHAR(MAX);
DECLARE @clean_text NVARCHAR(MAX);

SET @text_original = ' several pop gems in their wake.\r\n\nThe groups original linitarist/vocal drummer/vocalist Debbi Peterson responded to an advertisement that guitarist/vocalist <a href=\"spotify:artist:2Sc4ukCRllIu02LZfHF0RL\">Susanna Hoffs</a> had placed in a local Los Angeles paper, The Recycler. Taking the name the Bangs, the trio released a single, \"Getting Out of Hand\"/\"Call on Me,\" on their own label, Downkiddie. They had to change their name early the following year to the Bangles, since there was already a New York-based group called the Bangs. After an appearance on a Rodney on the ROQ compilation and a series of local concerts which featured new bassist Annette Zilinskas, Miles Copeland signed the Bangles to the <a href=\"spotify:search:label%3A%22IRS%22\">IRS</a> subsidiary <a href=\"spotify:search:label%3A%22Faulty+Products%22\">Faulty himself, and when it came time to find a producer for the Bangles fifth record, <a href=\"spotify:artist:2Sc4ukCRllIu02LZfHF0RL\">Hoffs</a> didnt have far to look. The resulting sunny, very California-sounding Sweetheart of the Sun was released in September 2011. The following years saw the band regularly touring while playing some big shows like 2012s Rewind Festival, the 50th anniversary celebration for the famous L.A. nightclub the Whisky a Go Go, and 2013s Paisley Aboveground, which featured the Bangles playing alongside reunited paisley underground bands <a href=\"spotify:artist:3iayA3QkB3jfkUgcXqsL7C\">the Three OClock</a> and <a href=\"spotify:artist:1e9cOgMYTLJ0KFOjmTx13F\">Rain Parade</a>. In 2014, the band released Ladies and Gentlemen... The Bangles!, a compilation of early singles, EP tracks, and rarities. In 2019 the band appeared alongside <a href=\"spotify:artist:1e9cOgMYTLJ0KFOjmTx13F\">Rain Parade</a>, <a href=\"spotify:artist:1l0eT7EY5r7U1gMVm2SyoS\">the Dream Syndicate</a>, and <a href=\"spotify:artist:3iayA3QkB3jfkUgcXqsL7C\">the Three OClock</a> on the Paisley Underground compilation 3X4, which saw each band covering a song by the other three groups. ~ Stephen Thomas Erlewine & Andrew Leahey, Rovi'
SET @clean_text = REPLACE(
                    REPLACE(
                        REPLACE(
                            REPLACE(
                                REPLACE(
                                    REPLACE(
                                        REPLACE(
                                            REPLACE(
                                                REPLACE(
                                                    REPLACE(
                                                        @text_original,
                                                        '</a>', ''
                                                    ),
                                                    '<a[^>]*>', ''
                                                ),
                                                'href="spotify:[^"]*"', ''
                                            ),
                                            'href="[^"]*"', ''
                                        ),
                                        '\r\n\n', ' '
                                    ),
                                    '\r\n', ' '
                                ),
                                CHAR(13), ' '
                            ),
                            CHAR(10), ' '
                        ),
                        '>', ''
                    ),
                    '<', ''
                );
SELECT @clean_text AS clean_text;

But the output still showing the tags, any idea?

xqkwcwgp

xqkwcwgp1#

Here is one way to solve the puzzle.

Though it is not a generic sustainable solution.

I had to apply few REPLACE(...) calls to clean up the input string and cast is as a legitimate XML.

SQL

DECLARE @text_original NVARCHAR(MAX) = ' several pop gems in their wake.\r\n\nThe groups original linitarist/vocal drummer/vocalist Debbi Peterson responded to an advertisement that guitarist/vocalist <a href=\"spotify:artist:2Sc4ukCRllIu02LZfHF0RL\">Susanna Hoffs</a> had placed in a local Los Angeles paper, The Recycler. Taking the name the Bangs, the trio released a single, \"Getting Out of Hand\"/\"Call on Me,\" on their own label, Downkiddie. They had to change their name early the following year to the Bangles, since there was already a New York-based group called the Bangs. After an appearance on a Rodney on the ROQ compilation and a series of local concerts which featured new bassist Annette Zilinskas, Miles Copeland signed the Bangles to the <a href=\"spotify:search:label%3A%22IRS%22\">IRS</a> subsidiary <a href=\"spotify:search:label%3A%22Faulty+Products%22\">Faulty himself, and when it came time to find a producer for the Bangles fifth record, <a href=\"spotify:artist:2Sc4ukCRllIu02LZfHF0RL\">Hoffs</a> didnt have far to look. The resulting sunny, very California-sounding Sweetheart of the Sun was released in September 2011. The following years saw the band regularly touring while playing some big shows like 2012s Rewind Festival, the 50th anniversary celebration for the famous L.A. nightclub the Whisky a Go Go, and 2013s Paisley Aboveground, which featured the Bangles playing alongside reunited paisley underground bands <a href=\"spotify:artist:3iayA3QkB3jfkUgcXqsL7C\">the Three OClock</a> and <a href=\"spotify:artist:1e9cOgMYTLJ0KFOjmTx13F\">Rain Parade</a>. In 2014, the band released Ladies and Gentlemen... The Bangles!, a compilation of early singles, EP tracks, and rarities. In 2019 the band appeared alongside <a href=\"spotify:artist:1e9cOgMYTLJ0KFOjmTx13F\">Rain Parade</a>, <a href=\"spotify:artist:1l0eT7EY5r7U1gMVm2SyoS\">the Dream Syndicate</a>, and <a href=\"spotify:artist:3iayA3QkB3jfkUgcXqsL7C\">the Three OClock</a> on the Paisley Underground compilation 3X4, which saw each band covering a song by the other three groups. ~ Stephen Thomas Erlewine & Andrew Leahey, Rovi';
DECLARE @clean_text NVARCHAR(MAX);

DECLARE @xml XML = CAST('<root>' + 
    REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@text_original, '\"', '"')
        , '&', '&amp;')
        , 'Bangles fifth record', 'Bangles fifth record</a>')
        , '\r', '')
        , '\n', '')
    + '</root>' AS XML);

SET @clean_text = @xml.value('data(.)', 'NVARCHAR(MAX)');

-- test
SELECT @clean_text;

相关问题