Scala: how to conditionally remove the first two characters from a column

Posted by dauxcl2d on 2021-05-29 in Hadoop


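I have the following phone-record data, and I want to remove the first two digits from each record because they are the country code. Is there a way to do this with Scala, Spark, or Hive?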

phone
|917799423934|
|019331224595|
|  8981251522|
|917271767899|

I would like the result to be:

phone
|7799423934|
|9331224595|
|8981251522|
|7271767899|

How can I remove the prefix 91 or 01 from each record (row) of this column?

hadoop Hive scala apache-spark

Source: https://stackoverflow.com/questions/52173674/how-to-conditionally-remove-the-first-two-characters-from-a-column

4 Answers


Answer #1 (66bbxpm5)

If they are strings, then with a Hive query:

sql("select substring(phone,3) from table").show


Answer #2 (u91tlkcl)

This could be improved, I think; I would prefer a list with contains or an equivalent, but here goes:

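import org.apache.spark.sql.functions._
import spark.implicits._  // needed for toDS(); assumes a SparkSession named `spark`, as in spark-shell

case class Tel(telnum: String)
val ds = Seq(
     Tel("917799423934"),
     Tel("019331224595"),
     Tel("8981251522"),
     Tel("+4553")).toDS()

// Strip the first two characters only when they are "91" or "01"; otherwise keep the value as is.
val prefix = substring(col("telnum"), 1, 2)
val ds2 = ds.withColumn("new_telnum",
  when(prefix === "91" || prefix === "01", expr("substring(telnum, 3, length(telnum) - 2)"))
    .otherwise(col("telnum")))
ds2.show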

Returns:

+------------+----------+
|      telnum|new_telnum|
+------------+----------+
|917799423934|7799423934|
|019331224595|9331224595|
|  8981251522|8981251522|
|       +4553|     +4553|
+------------+----------+

We may need to account for the +, but the question says nothing about it.
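The "list with contains" idea mentioned above could look roughly like the sketch below, which uses Column.isin against a list of prefixes. The names prefixes, tels and result, the local SparkSession settings, and the extra trim are assumptions made for illustration; they are not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, expr, substring, trim, when}

// Assumption: a local session; in spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("strip-prefix-isin").getOrCreate()

// Hypothetical list of two-character prefixes to strip; extend as needed.
val prefixes = Seq("91", "01")

val tels = spark.createDataFrame(Seq(
  Tuple1("917799423934"),
  Tuple1("019331224595"),
  Tuple1("  8981251522"),
  Tuple1("+4553")
)).toDF("telnum")

// Trim first so that padded values are compared on their real leading characters.
val trimmed = trim(col("telnum"))

// Drop the first two characters only when they appear in the prefix list.
val result = tels.withColumn(
  "new_telnum",
  when(substring(trimmed, 1, 2).isin(prefixes: _*), expr("substring(trim(telnum), 3)"))
    .otherwise(trimmed))

result.show()
```

With this input, the first two values lose their 91 and 01 prefixes, while 8981251522 (after trimming) and +4553 pass through unchanged. Adding another code then only means adding an entry to the list.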


Answer #3 (toiithl6)

Using regular expressions
Use regexp_replace (add more codes to the alternation if necessary):

select regexp_replace(trim(phone),'^(91|01)','') as phone --removes leading 91, 01 and all leading and trailing spaces
from table;

The same using regexp_extract:

select regexp_extract(trim(phone),'^(91|01)?(\\d+)',2) as phone --removes leading and trailing spaces, extract numbers except first (91 or 01) 
from table;
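To see how the pattern behaves on the sample values without a cluster, here is a plain Scala sketch that applies the same regular expression with String.replaceAll. It runs outside Spark and Hive, so it only illustrates the regex itself. The helper names stripPrefix and stripPrefixStrict are hypothetical. The strict variant assumes that a number carrying a country code has exactly 10 digits after the prefix, which matches the sample data but is not stated in the question.

```scala
// Sample values from the question, including the one padded with spaces.
val samples = Seq("917799423934", "019331224595", "  8981251522", "917271767899")

// Mirrors regexp_replace(trim(phone), '^(91|01)', ''): trim, then drop a leading 91 or 01.
def stripPrefix(phone: String): String =
  phone.trim.replaceAll("^(91|01)", "")

// Assumed stricter variant: drop the prefix only when exactly 10 digits follow it,
// so a 10-digit number that happens to start with 91 or 01 is left alone.
def stripPrefixStrict(phone: String): String =
  phone.trim.replaceAll("^(91|01)(?=\\d{10}$)", "")

val stripped = samples.map(stripPrefix)
stripped.foreach(println)

println(stripPrefix("9123456789"))        // prints 23456789
println(stripPrefixStrict("9123456789"))  // prints 9123456789
println(stripPrefixStrict("917799423934")) // prints 7799423934
```

The last three lines show the difference: without the lookahead, a local number that begins with 91 is shortened as well. Whether that matters depends on the data.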


Answer #4 (agxfikkp)

Phone numbers can vary in length, so this construct can be used (Scala):

df.withColumn("phone", expr("substring(phone,3,length(phone)-2)"))
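Note that the expression above drops the first two characters of every row, so a value such as 8981251522 with no leading whitespace would become 81251522. If only the 91 and 01 prefixes should go, one possible DataFrame-API variant is sketched below using regexp_replace. The names phones and cleanedPhones and the local SparkSession settings are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace, trim}

// Assumption: a local session; in spark-shell, getOrCreate() returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("strip-prefix-regex").getOrCreate()

// Sample rows from the question, including the value padded with spaces.
val phones = spark.createDataFrame(Seq(
  Tuple1("917799423934"),
  Tuple1("019331224595"),
  Tuple1("  8981251522"),
  Tuple1("917271767899")
)).toDF("phone")

// Trim, then remove a leading 91 or 01; rows without such a prefix are left unchanged.
val cleanedPhones = phones.withColumn(
  "phone",
  regexp_replace(trim(col("phone")), "^(91|01)", ""))

cleanedPhones.show()
```

For the four sample rows this yields 7799423934, 9331224595, 8981251522 and 7271767899, the result asked for in the question.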
