Scala/Spark: regex for 'COIN' in column values with the rlike method

Posted by 8xiog9wr on 2022-11-09 in Scala

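I want to check whether a column value contains 'COIN' or similar values. Is it possible to change my regular expression so that it does not have to list "CRYPTCOIN|KUCOIN|COINBASE" explicitly? I would like something like
"a regex tied to the word COIN|BTCBIT.NET".
My current code is below: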

val CRYPTO_CARD_INDICATOR: String = ("BTCBIT.NET|KUCOIN|COINBASE|CRYPTCOIN")
val CryptoCheckDataset = df.withColumn("is_crypto_indicator",when(upper(col("company_name")).rlike(CRYPTO_CARD_INDICATOR), 1).otherwise(0))

Answer #1 (hjzp0vay)

I think the following pattern should work:

COIN|BTCBIT.NET
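The shorter pattern works because rlike succeeds when the pattern is found anywhere in the string, so the single alternative COIN already covers KUCOIN, COINBASE and CRYPTCOIN. Here is a minimal sketch of that behavior without Spark. It assumes Python's re.search is an acceptable stand-in for rlike on these simple patterns, and is_crypto is a hypothetical helper that is not part of the original code:

```python
import re

# Same pattern as in the answer; search() looks for it anywhere in the string.
PATTERN = re.compile(r"COIN|BTCBIT.NET")

def is_crypto(name: str) -> int:
    """Hypothetical helper: 1 if the upper-cased name contains a match, else 0."""
    return 1 if PATTERN.search(name.upper()) else 0

names = ["kucoin", "coinbase", "cryptcoin", "btcbit.net", "crypto"]
flags = {name: is_crypto(name) for name in names}
print(flags)
```

Only 'crypto' gets 0, because it contains neither COIN nor BTCBIT.NET.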

Full test in PySpark:

# Assumes an active SparkSession named `spark`, as in the pyspark shell.
from pyspark.sql.functions import col, upper, when
CRYPTO_CARD_INDICATOR = "COIN|BTCBIT.NET"
df = spark.createDataFrame([('kucoin',), ('coinbase',), ('crypto',)], ['company_name'])

CryptoCheckDataset = df.withColumn("is_crypto_indicator", when(upper(col("company_name")).rlike(CRYPTO_CARD_INDICATOR), 1).otherwise(0))
CryptoCheckDataset.show()

# +------------+-------------------+
# |company_name|is_crypto_indicator|
# +------------+-------------------+
# |      kucoin|                  1|
# |    coinbase|                  1|
# |      crypto|                  0|
# +------------+-------------------+
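One caveat: the dot in BTCBIT.NET is a regex wildcard, so the pattern also matches names such as BTCBITXNET. If only the literal domain should match, the dot can be escaped. The sketch below again uses Python's re module as a stand-in for rlike. The names LOOSE and STRICT are illustrative, and the escaped pattern is a suggestion rather than part of the original answer:

```python
import re

LOOSE = re.compile(r"COIN|BTCBIT.NET")    # '.' matches any single character
STRICT = re.compile(r"COIN|BTCBIT\.NET")  # '\.' matches only a literal dot

sample = "BTCBITXNET"
loose_hit = bool(LOOSE.search(sample))    # wildcard dot accepts the 'X'
strict_hit = bool(STRICT.search(sample))  # escaped dot rejects it
literal_hit = bool(STRICT.search("BTCBIT.NET"))
print(loose_hit, strict_hit, literal_hit)
```

In Scala the same pattern would be written as "COIN|BTCBIT\\.NET", because the backslash must itself be escaped inside a regular string literal.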
