pyspark: How to read different types of segments from an input file in Spark Scala?

bvjveswy · posted 2023-10-15 · in Spark

Sample data arrives in a single input file and contains three segment types: BS (Borrower), RS (Relationship), and CR (Credit / trade line). Each segment has its own layout. How can the file be read and each segment parsed separately using Spark?

Segment layout definitions:

BS - Borrower: Indicator, First name, Last name, Company name, Joining date
RS - Relationship: Indicator, Name, Role, Company name
CR - Credit facility: Indicator, Type of loan, Account number, Sanctioned date, Amount

Sample data:

BS,Rohan,Mundle,Infy,20230101
RS,Sohan Mundle,Director,Croma
CR,Home Loan, 10023045, 20200101, 10000.00
BS,Priyatee,Sinha,L&T,20220101
RS,Mohan Mehta,Owner, ABC Tech
CR,Home Loan, 20023045, 20200301, 50000.00

How can the above data be read in Spark Scala using a type-safe approach?

e4yzc0pl1#

You can extract the three different layouts from one text file by reading the data into a Dataset[String], filtering the lines by their indicator, and assigning the correct schema to each subset:

import org.apache.spark.sql.{DataFrame, Dataset}
import org.apache.spark.sql.types._

// Read the mixed file as plain text lines first
val mixedData: Dataset[String] = spark.read.textFile("sampleData.csv")

// Keep only the lines of one segment type and parse them as CSV with that segment's schema
def readWithSchema(indicator: String, schema: StructType): DataFrame = {
  val segmentData = mixedData.filter(_.startsWith(indicator))
  spark.read.schema(schema).csv(segmentData)
}

val borrowerSchema = StructType(
  Seq(
    StructField(name = "Indicator", dataType = StringType),
    StructField(name = "First name", dataType = StringType),
    StructField(name = "Last name", dataType = StringType),
    StructField(name = " Company name", dataType = StringType),
    StructField(name = " joining date", dataType = StringType)
  )
)
val borrowers = readWithSchema("BS", borrowerSchema)
// Declare schemas for `Relationship` and `Credit facility` likewise, and read them with `readWithSchema`
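For completeness, the remaining two schemas can be declared the same way. This is a sketch, not part of the original answer; the field names follow the layout definitions above, and everything is kept as StringType (typed columns such as Amount can be cast afterwards, since the sample data contains leading spaces around values):

```scala
import org.apache.spark.sql.types._

// Sketch of the remaining segment schemas, mirroring borrowerSchema
val relationshipSchema = StructType(Seq(
  StructField("Indicator", StringType),
  StructField("Name", StringType),
  StructField("Role", StringType),
  StructField("Company name", StringType)
))

val creditSchema = StructType(Seq(
  StructField("Indicator", StringType),
  StructField("Type of loan", StringType),
  StructField("Account number", StringType),
  StructField("Sanctioned date", StringType),
  StructField("Amount", StringType)  // cast to DecimalType after trimming if needed
))

val relationships = readWithSchema("RS", relationshipSchema)
val credits = readWithSchema("CR", creditSchema)
```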

Output:

+---------+----------+---------+-------------+-------------+
|Indicator|First name|Last name| Company name| joining date|
+---------+----------+---------+-------------+-------------+
|BS       |Rohan     |Mundle   |Infy         |20230101     |
|BS       |Priyatee  |Sinha    |L&T          |20220101     |
+---------+----------+---------+-------------+-------------+
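Since the question asks for a type-safe approach, each segment can also be modelled as a case class instead of a generic DataFrame row. Below is a minimal, Spark-free sketch of the mapping; the names `Segment`, `Borrower`, `Relationship`, `Credit`, and `parseSegment` are illustrative, not from the original answer. In Spark you could apply the same parsing per segment, e.g. `mixedData.filter(_.startsWith("BS")).map(...)` to obtain a `Dataset[Borrower]`:

```scala
// One case class per segment layout, under a common sealed trait
sealed trait Segment
case class Borrower(firstName: String, lastName: String, company: String, joiningDate: String) extends Segment
case class Relationship(name: String, role: String, company: String) extends Segment
case class Credit(loanType: String, accountNumber: String, sanctionedDate: String, amount: BigDecimal) extends Segment

// Pick the case class from the indicator; trim fields because the
// sample data has spaces after some commas
def parseSegment(line: String): Option[Segment] = {
  val f = line.split(",").map(_.trim)
  f.headOption match {
    case Some("BS") => Some(Borrower(f(1), f(2), f(3), f(4)))
    case Some("RS") => Some(Relationship(f(1), f(2), f(3)))
    case Some("CR") => Some(Credit(f(1), f(2), f(3), BigDecimal(f(4))))
    case _          => None // unknown or malformed segment
  }
}
```

Returning `Option[Segment]` makes malformed lines explicit rather than failing the whole job on one bad record.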
