I am using libpostal (via pypostal), but I only need `road` and `country`, returned as an array: `["franklin ave","usa"]`, `["leonard st","united kingdom"]`.

How can I do this?

The return type is `net.razorvine.pickle.objects.ClassDictConstructor`.
```
from pyspark.sql.functions import udf

LIBPOSTAL_LOADED = False

@udf("string")
def parse(address):
    # Imported inside the UDF so the import runs on the executors
    from postal.parser import parse_address
    address_parsed = parse_address(address)
    return str(address_parsed)

spark.createDataFrame(['781 Franklin Ave Crown Heights Brooklyn NYC NY 11216 USA','The Book Club 100-106 Leonard St, Shoreditch, London, Greater London, EC2A 4RH, United Kingdom'], "string").toDF("address").select(parse("address")).show(truncate=False)
```
![](https://i.stack.imgur.com/D9ZXB.png)
Update at @mck's request:

```
@udf("array<string>")
def parse(address):
    from postal.parser import parse_address
    # Keep only the values whose label is 'road' or 'country'
    address_parsed = [a[0] for a in parse_address(address) if a[1] in ['road', 'country']]
    return address_parsed
```

```
+------------------+
|[franklin ave,usa]|
+------------------+
```

This is as expected.
```
@udf("array<string>")
def parse(address):
    from postal.parser import parse_address
    address_parsed = [a[0] for a in parse_address(address) if a[1] in ['road', 'country']]
    return address_parsed[0]
```

```
+-----+
|null |
+-----+
```

This is not what I expected. I expected the first element of `address_parsed`, which is `franklin ave`.
1 Answer
You can try a list comprehension before returning the parsed address:
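A minimal sketch of that filtering step, written as plain Python so it runs without Spark or libpostal installed. `SAMPLE_PARSED` is a hand-written stand-in shaped like the `(value, label)` tuples that `parse_address` returns, and the helper names `keep_labels` and `first_or_none` are hypothetical, introduced only for illustration.

```python
# Hand-written sample shaped like parse_address output: (value, label) tuples.
# It stands in for a real libpostal call so the snippet runs on its own.
SAMPLE_PARSED = [
    ("781", "house_number"),
    ("franklin ave", "road"),
    ("crown heights", "suburb"),
    ("brooklyn", "city_district"),
    ("nyc", "city"),
    ("ny", "state"),
    ("11216", "postcode"),
    ("usa", "country"),
]


def keep_labels(parsed, wanted=("road", "country")):
    """Return the values whose label is in `wanted`, in parser order."""
    return [value for value, label in parsed if label in wanted]


def first_or_none(values):
    """Return the first value, or None when nothing matched."""
    return values[0] if values else None


road_and_country = keep_labels(SAMPLE_PARSED)
print(road_and_country)
print(first_or_none(road_and_country))
```

Inside the UDF, the declared return type probably needs to match what the function returns: `array<string>` when returning the filtered list, and `string` when returning a single element such as `address_parsed[0]`. A plain string returned from a UDF declared with an array type typically comes back as null, which would be consistent with the output shown in the question.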