pandas: How to convert an irregular list to a DataFrame

y53ybaqx  posted on 2023-02-17 in Other

I have an irregular list:

['6', '20553737100', '6', '20431084172', '25200.00', '4536.00', 'PEN', '09', 'EG01', '124', '2022-06-20', '29735.43', ['POLO MANGA LARGA T L', '600.00', '16.90', '19.942', '1825.20', '10140.00', '18.00'], ['POLO MANGA LARGA T M', '600.00', '16.90', '19.942', '1825.20', '10140.00', '18.00'], ['LENTE LUNA CLARA TSG-100 ANTIEMPAÑO SIMPLE', '800.00', '2.65', '3.127', '381.60', '2120.00', '18.00'], ['LENTE LUNA OSCURA TSG-100 ANTIEMPAÑO C/CORDON', '800.00', '3.50', '4.13', '504.00', '2800.00', '18.00']]

I would like it to look like this in a DataFrame:

0       6   20553737100 6   20431284172 25200   4536    PEN 09  EG01    124 2022-06-02  29735.43    POLO MANGA LARGA T L    600 16.9    19.942  10140   1825.2  18
1       6   20553737100 6   20431284172 25200   4536    PEN 09  EG01    124 2022-06-02  29735.43    POLO MANGA LARGA T M    600 16.9    19.942  10140   1825.2  18
2       6   20553737100 6   20431284172 25200   4536    PEN 09  EG01    124 2022-06-02  29735.43    LENTE LUNA OSCURA TSG   800 2.65    3.127   2120    381.6   18
3       6   20553737100 6   20431284172 25200   4536    PEN 09  EG01    124 2022-06-02  29735.43    LENTE LUNA OSCURA JAE   800 3.5 4.13    2800    504 18

Sorry, my earlier question was incomplete; my actual list is considerably more complex.

jdzmm42g1#

Use a cross merge:

import pandas as pd

l = [['a'],['b'],['c'],['d'],[[5],[9],[7],[4]]]

(pd.DataFrame(l[:4]).T.merge(pd.DataFrame(l[4]), how='cross')
   .set_axis([f'column_{i+1}' for i in range(5)], axis=1)
 )

Output:

  column_1 column_2 column_3 column_4  column_5
0        a        b        c        d         5
1        a        b        c        d         9
2        a        b        c        d         7
3        a        b        c        d         4

Another idea is to use a custom function to unnest the original list:

def unnest(l):
    if len(l) == 1:
        return l[0]
    return [unnest(x) for x in l]

pd.DataFrame([unnest(l)]).explode(4, ignore_index=True)

Or, select the columns programmatically to avoid having to name the column to explode:

(pd.DataFrame([unnest(l)])
   .pipe(lambda d: d.explode(list(d.columns[d.iloc[0].str.len().gt(1)]), ignore_index=True))
 )

Output:

   0  1  2  3  4
0  a  b  c  d  5
1  a  b  c  d  9
2  a  b  c  d  7
3  a  b  c  d  4
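
One caveat with the length-based test: `.str.len()` also reports the length of plain strings, so a scalar such as '20553737100' would be picked as a column to explode. A minimal sketch that selects columns by type instead is shown here; the toy `row` and the name `list_cols` are made up for illustration, and passing a list of columns to `explode` assumes pandas 1.3 or later.

```python
import pandas as pd

# Hypothetical toy row: scalar strings of varying length plus one list-valued cell.
row = ['6', '20553737100', 'PEN', [5, 9, 7, 4]]
d = pd.DataFrame([row])

# Pick the columns whose first-row value is a list, regardless of string length.
list_cols = [c for c in d.columns if isinstance(d.at[0, c], list)]

# Explode only those columns; the scalar cells are repeated on every new row.
out = d.explode(list_cols, ignore_index=True)
print(out)
```

This only inspects the first row, so it assumes every row keeps its lists in the same columns.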

zrfyljdw2#

I worked through the solutions I was given step by step, and this is what I arrived at:

df = (pd.DataFrame(filas[:12])
        .T
        .merge(pd.DataFrame(filas[13:17]), how='cross')
        .set_axis([f'column_{i+1}' for i in range(19)], axis=1))

The result looks like this:

column_1     column_2 column_3     column_4  column_5 column_6 column_7 column_8  ...   column_12                                      column_13 column_14 column_15 column_16 column_17 column_18 column_19
0        6  20553737100        6  20431084172  25200.00  4536.00      PEN       09  ...  2022-06-20                           POLO MANGA LARGA T L    600.00     16.90    19.942  10140.00   1825.20     18.00
1        6  20553737100        6  20431084172  25200.00  4536.00      PEN       09  ...  2022-06-20                           POLO MANGA LARGA T M    600.00     16.90    19.942  10140.00   1825.20     18.00
2        6  20553737100        6  20431084172  25200.00  4536.00      PEN       09  ...  2022-06-20     LENTE LUNA CLARA TSG-100 ANTIEMPAÑO SIMPLE    800.00      2.65     3.127   2120.00    381.60     18.00
3        6  20553737100        6  20431084172  25200.00  4536.00      PEN       09  ...  2022-06-20  LENTE LUNA OSCURA TSG-100 ANTIEMPAÑO C/CORDON    800.00      3.50      4.13   2800.00    504.00     18.00
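
For comparison, here is a sketch of the same cross merge applied to the list exactly as posted in the question. In that list the nested rows start at index 12, so the slices here are `lst[:12]` and `lst[12:]`; the `filas` variable above may be laid out differently. The names `head` and `items` are illustrative, and `how='cross'` assumes pandas 1.2 or later.

```python
import pandas as pd

lst = ['6', '20553737100', '6', '20431084172', '25200.00', '4536.00', 'PEN', '09',
       'EG01', '124', '2022-06-20', '29735.43',
       ['POLO MANGA LARGA T L', '600.00', '16.90', '19.942', '1825.20', '10140.00', '18.00'],
       ['POLO MANGA LARGA T M', '600.00', '16.90', '19.942', '1825.20', '10140.00', '18.00'],
       ['LENTE LUNA CLARA TSG-100 ANTIEMPAÑO SIMPLE', '800.00', '2.65', '3.127', '381.60', '2120.00', '18.00'],
       ['LENTE LUNA OSCURA TSG-100 ANTIEMPAÑO C/CORDON', '800.00', '3.50', '4.13', '504.00', '2800.00', '18.00']]

head = pd.DataFrame([lst[:12]])   # one row holding the 12 scalar fields
items = pd.DataFrame(lst[12:])    # one row per nested list (7 fields each)

# The cross merge pairs the single header row with every item row.
df = head.merge(items, how='cross')

# Replace the suffixed default labels with column_1 .. column_19.
df.columns = [f'column_{i + 1}' for i in range(df.shape[1])]
print(df)
```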

Thanks for everything.

yqlxgs2m3#

You can split the input list into two parts, build two DataFrames from them, and then concatenate them:

import pandas as pd

lst = ['6', '20553737100', '6', '20431084172', '25200.00', '4536.00', 'PEN', '09', 'EG01', '124', '2022-06-20', '29735.43', ['POLO MANGA LARGA T L', '600.00', '16.90', '19.942', '1825.20', '10140.00', '18.00'], ['POLO MANGA LARGA T M', '600.00', '16.90', '19.942', '1825.20', '10140.00', '18.00'], ['LENTE LUNA CLARA TSG-100 ANTIEMPAÑO SIMPLE', '800.00', '2.65', '3.127', '381.60', '2120.00', '18.00'], ['LENTE LUNA OSCURA TSG-100 ANTIEMPAÑO C/CORDON', '800.00', '3.50', '4.13', '504.00', '2800.00', '18.00']]
df = pd.concat([pd.DataFrame({f'column_{i}': v for i, v in enumerate(lst[:12], 1)}, index=[0]),
                pd.DataFrame(columns=[f'column_{i}' for i in range(13, 13 + len(lst[12]))],
                             data=lst[12:])], axis=1).ffill()
print(df)
column_1     column_2 column_3     column_4  column_5 column_6 column_7  \
0        6  20553737100        6  20431084172  25200.00  4536.00      PEN   
1        6  20553737100        6  20431084172  25200.00  4536.00      PEN   
2        6  20553737100        6  20431084172  25200.00  4536.00      PEN   
3        6  20553737100        6  20431084172  25200.00  4536.00      PEN   

  column_8 column_9 column_10   column_11 column_12  \
0       09     EG01       124  2022-06-20  29735.43   
1       09     EG01       124  2022-06-20  29735.43   
2       09     EG01       124  2022-06-20  29735.43   
3       09     EG01       124  2022-06-20  29735.43   

                                       column_13 column_14 column_15  \
0                           POLO MANGA LARGA T L    600.00     16.90   
1                           POLO MANGA LARGA T M    600.00     16.90   
2     LENTE LUNA CLARA TSG-100 ANTIEMPAÑO SIMPLE    800.00      2.65   
3  LENTE LUNA OSCURA TSG-100 ANTIEMPAÑO C/CORDON    800.00      3.50   

  column_16 column_17 column_18 column_19  
0    19.942   1825.20  10140.00     18.00  
1    19.942   1825.20  10140.00     18.00  
2     3.127    381.60   2120.00     18.00  
3      4.13    504.00   2800.00     18.00
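
If the position of the split is not known in advance, the two parts can also be separated by type rather than by a fixed index. The following is a minimal sketch of that idea, not part of the answer above: `flatten_record` is a hypothetical helper, the `record` is a shortened, made-up list in the same shape as the question's data, and it assumes at least one nested list is present.

```python
import pandas as pd

def flatten_record(record):
    """Return one flat row per nested list, each prefixed with the scalar fields."""
    head = [x for x in record if not isinstance(x, list)]   # shared scalar fields
    items = [x for x in record if isinstance(x, list)]      # one nested list per row
    return [head + item for item in items]

# Shortened, made-up record in the same shape as the question's list.
record = ['6', 'PEN', '2022-06-20',
          ['POLO L', '600.00', '18.00'],
          ['POLO M', '600.00', '18.00']]

rows = flatten_record(record)
df = pd.DataFrame(rows, columns=[f'column_{i}' for i in range(1, len(rows[0]) + 1)])
print(df)
```

Building plain Python rows first avoids the `ffill()` step, since every row already carries the shared fields.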
