Reshape function in PySpark (transposing columns)

Posted by fae0ux8s on 2021-07-09 in Spark

I'm converting an R script to a PySpark script, but I'm stuck at one point and need some help. Here is the R code:


```r
## Transposing the stacked trt section to wider set for Outcome ID set

## c("Trial.ID", "Arm.ID", "Re.randomized.arm.id", "Phase.ID", "Period.ID")

TRT <- select(trt_stacked, -Planned.Treatment.ID) %>% 
  renameCol(c("Treatment.Administration.days.x", "Treatment.Administration.days.y"),
            c("Treatment.Administration.days.plan", "Treatment.Administration.days.act") )  %>% 
  reshape(direction = "wide", 
          idvar = c("Trial.ID", "Arm.ID", "Re.randomized.arm.id", "Phase.ID", "Period.ID",
                    "Phase", "Phase.Duration", "Phase.Duration.unit", "Phase.Description",
                    "Period", "Period.Duration", "Period.Dur.Unit", "Period.Description"), 
          timevar="Treatment.ID")
```

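I need to convert this code to PySpark. Spark has a `pivot` function for transposing, but I don't fully understand what this `reshape` function does. What I do know is that its output spreads every column not listed in `idvar` into a separate set of columns for each distinct value of `treatment_id`, and that it appends the `treatment_id` value to each of those column names, as shown below: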

```
Titration.1
Titration.Duration.1
Titration.Duration.Unit.1
Titration.Target.1
Titration.Value.1
Titration.unit.1
Treatment.name.1
Treatment.Class.1
Treatment.Description.1
Treatment.Start.Time.1
Treatment.End.Time.1

Titration.2
Titration.Duration.2
Titration.Duration.Unit.2
Titration.Target.2
Titration.Value.2
Titration.unit.2
Treatment.name.2
Treatment.Class.2
Treatment.Description.2
Treatment.Start.Time.2
Treatment.End.Time.2
```
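The behaviour described above can be sketched in plain Python. This is a minimal illustration, not a port of R's source: `reshape_wide` is a hypothetical helper, rows are assumed to be dicts, and the semantics are my understanding of base R's `stats::reshape(direction = "wide")`. That is, one output row per distinct combination of the `idvar` columns, every other column copied to `<column>.<timevar value>`, NA values carried over rather than dropped, and, when several rows share the same id and time value, only the first one kept (R warns "multiple rows match ... first taken" in that case).

```python
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def reshape_wide(rows: Iterable[Row], idvar: Sequence[str], timevar: str,
                 sep: str = ".") -> List[Row]:
    """Hypothetical helper mimicking R's reshape(direction = "wide") on dict rows."""
    out: Dict[tuple, Row] = {}  # one wide row per idvar combination, first-seen order
    seen = set()                # (id key, time value) pairs already spread
    for row in rows:
        key = tuple(row[c] for c in idvar)
        wide = out.setdefault(key, {c: row[c] for c in idvar})
        time = row[timevar]
        if (key, time) in seen:  # duplicate id/time pair: keep the first row, as R does
            continue
        seen.add((key, time))
        for col, val in row.items():
            if col in idvar or col == timevar:
                continue
            wide[f"{col}{sep}{time}"] = val  # NA/None values are copied, not dropped

    # Give every output row every wide column; missing combinations become None (NA).
    columns: List[str] = []
    for wide in out.values():
        for col in wide:
            if col not in columns:
                columns.append(col)
    return [{col: wide.get(col) for col in columns} for wide in out.values()]


# Three rows shaped like the question's sample input, trimmed to a few columns.
rows = [
    {"treatment_id": 1, "trial_id": 16, "arm_id": 1, "titration": "titration", "titration_value": None},
    {"treatment_id": 2, "trial_id": 16, "arm_id": 1, "titration": "titration", "titration_value": None},
    {"treatment_id": 2, "trial_id": 16, "arm_id": 1, "titration": "No titration", "titration_value": None},
]
wide_rows = reshape_wide(rows, idvar=["trial_id", "arm_id"], timevar="treatment_id")
print(wide_rows)
```

Because `treatment_id` is the `timevar` and not part of `idvar`, all three rows collapse into a single wide row whose `titration.1` and `titration.2` are both `"titration"`; the `"No titration"` row is discarded as a second match for treatment 2. In PySpark the nearest built-in is probably `df.groupBy(*idvar).pivot("treatment_id").agg(*[F.first(c).alias(c) for c in value_cols])` (with `from pyspark.sql import functions as F`, and `value_cols` a placeholder for the non-id columns). Keep in mind that `F.first` only picks the earliest row if the data is explicitly ordered, and the generated column names would still need renaming to match R's.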

Does R's `reshape` function also drop null (NA) values? Can anyone help me find a similar function in Spark or Python?

Input:

|treatment_id|arm_id|re_randomized_arm_id|trial_id|phase_id|phase|phase_duration|phase_duration_unit|phase_description|titration|titration_duration|titration_duration_unit|titration_target|titration_value|titration_unit|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|1|1|-999|16|1|Active|NA|NA|NA|titration|NA|NA|NA|NA|NA|
|2|1|-999|16|1|Active|NA|NA|NA|titration|NA|NA|NA|NA|NA|
|2|1|-999|16|1|Active|NA|NA|NA|No titration|NA|NA|NA|NA|NA|

Expected output after transposing:

|treatment_id|arm_id|re_randomized_arm_id|trial_id|phase_id|phase|phase_duration|phase_duration_unit|phase_description|titration_1|titration_duration_1|titration_duration_unit_1|titration_target_1|titration_value_1|titration_unit_1|titration_2|titration_duration_2|titration_duration_unit_2|titration_target_2|titration_value_2|titration_unit_2|
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|1|1|-999|16|1|Active|NA|NA|NA|titration|NA|NA|NA|NA|NA|||||||
|2|1|-999|16|1|Active|NA|NA|NA|titration|NA|NA|NA|NA|NA|No titration|NA|NA|NA|NA|NA|
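The expected output above appears to follow a slightly different rule than the R call: as far as I can tell, it keeps `treatment_id` among the key columns and numbers repeated rows within each key (`_1`, `_2` and so on) in input order. Here is a minimal pure-Python sketch of that rule, assuming dict rows and using a hypothetical helper `spread_by_occurrence`:

```python
from typing import Any, Dict, Iterable, List, Sequence

Row = Dict[str, Any]


def spread_by_occurrence(rows: Iterable[Row], key_cols: Sequence[str],
                         value_cols: Sequence[str]) -> List[Row]:
    """Hypothetical helper: one row per key, repeated rows spread into <col>_<n> columns."""
    out: Dict[tuple, Row] = {}
    counts: Dict[tuple, int] = {}
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        n = counts.get(key, 0) + 1  # 1-based occurrence number within this key, in input order
        counts[key] = n
        wide = out.setdefault(key, {c: row[c] for c in key_cols})
        for col in value_cols:
            wide[f"{col}_{n}"] = row[col]
    return list(out.values())


# The question's sample input, parsed from its pipe-delimited form.
header = ("treatment_id|arm_id|re_randomized_arm_id|trial_id|phase_id|phase|phase_duration|"
          "phase_duration_unit|phase_description|titration|titration_duration|"
          "titration_duration_unit|titration_target|titration_value|titration_unit").split("|")
lines = [
    "1|1|-999|16|1|Active|NA|NA|NA|titration|NA|NA|NA|NA|NA",
    "2|1|-999|16|1|Active|NA|NA|NA|titration|NA|NA|NA|NA|NA",
    "2|1|-999|16|1|Active|NA|NA|NA|No titration|NA|NA|NA|NA|NA",
]
rows = [dict(zip(header, line.split("|"))) for line in lines]

# First nine columns are the key, the six titration columns are spread.
result = spread_by_occurrence(rows, key_cols=header[:9], value_cols=header[9:])
for r in result:
    print("|".join(str(v) for v in r.values()))
```

Printed pipe-delimited, the two result rows match the expected output above; a key with fewer repeats simply lacks the higher-numbered columns. In PySpark, the occurrence number could plausibly come from `F.row_number().over(Window.partitionBy(*key_cols).orderBy(order_col))`, where `order_col` is a hypothetical column recording the original row order (Spark DataFrames have no implicit order). That would be followed by `groupBy(*key_cols).pivot("n").agg(...)` with one `F.first(...)` per value column and a final rename to the `<col>_<n>` form.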

Any help is appreciated; I can't work out the logic or find a library that does this.

Thanks!
