如何在Oracle中知道文本是阿拉伯语还是英语

gcmastyq  于 12个月前  发布在  Oracle
关注(0)|答案(3)|浏览(121)

假设我有一张这样的table

table:
C1
ابي
امي
mo

字符串
我想指定阿拉伯语和英语单词,所以我希望我的选择是这样的

C1 C2
ابي Ar
امي Ar
mo EN


我怎么能做这样的事呢?

92vpleto

92vpleto1#

TRANSLATEREGEXP_LIKECASE表达式一起使用。假设您的英语单词仅包含字母字符、连字符或撇号,则:

SELECT c1,
       CASE
       WHEN TRANSLATE(
              c1,
              '0abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-''',
              '0'
            ) IS NULL
       THEN 'EN'
       ELSE 'AR'
       END AS c2
FROM   table_name;

字符串
或者使用REGEXP_LIKE(可能输入较少,但可能效率较低):

SELECT c1,
       CASE WHEN REGEXP_LIKE(c1, '^[a-zA-Z''-]+$') THEN 'EN' ELSE 'AR' END
         AS c2
FROM   table_name;


其中,对于样本数据:

CREATE TABLE table_name (c1) AS
SELECT 'ابي' FROM DUAL UNION ALL
SELECT 'امي' FROM DUAL UNION ALL
SELECT 'mo' FROM DUAL


两个输出:
| C1| C2|
| --|--|
| ابي |AR|
| امي |AR|
| 莫|EN|
fiddle

dphi5xsq

dphi5xsq2#

可以使用REGEXP_LIKE来搜索阿拉伯语unicode范围0600- 06 FF:

SELECT c1, CASE WHEN regexp_like(c1, UNISTR('[\0600-\06FF]')) THEN 'AR' ELSE 'EN' END AS c2
FROM mytable

字符串
Demo here

3df52oht

3df52oht3#

如果只有英语和阿拉伯语,那么最简单的方法是利用阿拉伯字母是多字节字符的事实。你可以比较字符长度和字节长度来区分它们:

-- Answer updated after comment below (thanks p3consulting)
WITH    -- Sample Data:
    tbl AS
        (   Select 1 "ID", 'ابي' "C1" From Dual Union All
            Select 2, 'امي' From Dual Union All
            Select 3, 'اسم عملة الاتحاد الأوروبي هو اليورو والعلامة هي €' From Dual Union All
            Select 4, 'Name of EU currency is euro and the sign is €' From Dual )
Select ID, Length(C1) "LEN", LengthB(C1) "LEN_BYTES" From tbl
/* 
        ID        LEN  LEN_BYTES
---------- ---------- ----------
         1          3          6
         2          3          6
         3         49         91
         4          2          2 */
 
Select  C1, 
        -- if there are some multibyte characters in english text (like €) then
        -- case expression in next line will give false result for english text (it would say Arabic)
        Case When Length(C1) != LengthB(C1) Then 'Arabic' Else 'English' End "LANG",
        -- This (case below) should solve the above issue (if exists)
        -- case when char length increased 70% is still less than byte length then ARABIC
        -- 70% is arbitrary - you could define it differently
        Case When Length(C1) * 1.7 < LengthB(C1) Then 'ARABIC' Else 'ENGLISH' End "LANG_2"
From tbl
/*
C1                                                  LANG    LANG_2
--------------------------------------------------  ------- -------
ابي                                                 Arabic  ARABIC
ابي                                                 Arabic  ARABIC
اسم عملة الاتحاد الأوروبي هو اليورو والعلامة هي €      Arabic  ARABIC
Name of EU currency is euro and the sign is €       Arabic  ENGLISH */

字符串
这应该比regexp快得多。

相关问题