PySpark: maximum string length of each column in a DataFrame

ljsrvy3e · published 2023-01-01 in Spark

I am trying this in Databricks. Please let me know which PySpark libraries I need to import, and the code to produce the output below in Azure Databricks PySpark.

Example input DataFrame:

| column1 | column2 | column3 | column4 |
| --- | --- | --- | --- |
| a | bbbbb | cc | >dddddddd |
| >aaaaaaaaaaaaaa | bb | c | dddd |
| aa | >bbbbbbbbbbbb | >ccccccc | ddddd |
| aaaaa | bbbb | ccc | d |

(The `>` prefix marks the longest value in each column.)

Expected output DataFrame:

| column | maxLength |
| --- | --- |
| column1 | 14 |
| column2 | 12 |
| column3 | 7 |
| column4 | 8 |

kx7yvsdv1#

```
>>> from pyspark.sql import functions as sf
>>> df = sc.parallelize([['a','bbbbb','ccc','ddd'],['aaaa','bbb','ccccccc','dddd']]).toDF(["column1", "column2", "column3", "column4"])
>>> df1 = df.select([sf.length(col).alias(col) for col in df.columns])
>>> df1.groupby().max().show()
+------------+------------+------------+------------+
|max(column1)|max(column2)|max(column3)|max(column4)|
+------------+------------+------------+------------+
|           4|           5|           7|           4|
+------------+------------+------------+------------+
```

Then use this link to melt (unpivot) the previous DataFrame into the desired (column, maxLength) shape.
