How can I ungroup a polars DataFrame in Python?

Posted by 0dxa2lsx 12 months ago in Python


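I have a polars DataFrame in which one column contains a repeating pattern. I grouped the DataFrame by that column and added a new column to the grouped DataFrame. Now I need to unpack/ungroup it again. How can I do this in polars?
My original DataFrame looks like this: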
| FILE | col1 | col2 |
| -- | -- | -- |
| A | cell 1 | cell 2 |
| B | cell 3 | cell 4 |
| A | cell 5 | cell 6 |
| B | cell 7 | cell 8 |
I used groupby to group the DataFrame by FILE, then added the new column I wanted, and got the following output:
| FILE | col1 | col2 | FOLDER |
| -- | -- | -- | -- |
| A | [cell 1, cell 5] | [cell 2, cell 6] | [file1, file2] |
| B | [cell 3, cell 7] | [cell 4, cell 8] | [file1, file2] |
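Now I want to ungroup the nested DataFrame above back into its original format, keeping the new column. How can I do this? My real nested DataFrame is huge, with many rows and columns, so iterating over it is inefficient and quite slow. Is there a function that can be applied to the whole nested DataFrame instead of iterating column by column?
Final desired output: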
| FILE | col1 | col2 | FOLDER |
| -- | -- | -- | -- |
| A | cell 1 | cell 2 | file1 |
| B | cell 3 | cell 4 | file1 |
| A | cell 5 | cell 6 | file2 |
| B | cell 7 | cell 8 | file2 |
Here is what I did:

import polars as pl

dfg = df.group_by('FILE', maintain_order=True).agg(pl.all())   # group the rows by FILE (called groupby before polars 0.19)
newdf = dfg.with_columns(FOLDER=pl.repeat(['file1', 'file2'], dfg.height))   # add the desired column

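How can I get the desired output efficiently? Note that my nested DataFrame is very large, so column-by-column iteration takes too long.
PS: I fixed a typo in the final table. Because the entries in the FILE column repeat every few rows, each repetition should be assigned a new FOLDER name.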


Source: https://stackoverflow.com/questions/76913197/how-can-i-ungroup-a-polars-dataframe-in-python

2 Answers


jk9hmnmh1#

It looks like you are trying to "enumerate" each group.
You can use .cum_count() evaluated over each group with .over():

import polars as pl

df = pl.from_repr("""
┌──────┬─────────┬──────────┐
│ file ┆ col1    ┆ col2     │
│ ---  ┆ ---     ┆ ---      │
│ str  ┆ str     ┆ str      │
╞══════╪═════════╪══════════╡
│ A    ┆ cell 1  ┆ cell 2   │
│ B    ┆ cell 3  ┆ cell 4   │
│ A    ┆ cell 5  ┆ cell 6   │
│ B    ┆ cell 7  ┆ cell 8   │
│ A    ┆ cell 9  ┆ cell 10  │
│ B    ┆ cell 11 ┆ cell 12  │
│ A    ┆ cell 13 ┆ cell 14  │
│ B    ┆ cell 15 ┆ cell 16  │
│ A    ┆ cell 17 ┆ cell 18  │
│ B    ┆ cell 19 ┆ cell 20  │
└──────┴─────────┴──────────┘
""")

df.with_columns(folder = 
   pl.col("file").cum_count().over("file")
)

shape: (10, 4)
┌──────┬─────────┬─────────┬────────┐
│ file ┆ col1    ┆ col2    ┆ folder │
│ ---  ┆ ---     ┆ ---     ┆ ---    │
│ str  ┆ str     ┆ str     ┆ u32    │
╞══════╪═════════╪═════════╪════════╡
│ A    ┆ cell 1  ┆ cell 2  ┆ 0      │
│ B    ┆ cell 3  ┆ cell 4  ┆ 0      │
│ A    ┆ cell 5  ┆ cell 6  ┆ 1      │
│ B    ┆ cell 7  ┆ cell 8  ┆ 1      │
│ …    ┆ …       ┆ …       ┆ …      │
│ A    ┆ cell 13 ┆ cell 14 ┆ 3      │
│ B    ┆ cell 15 ┆ cell 16 ┆ 3      │
│ A    ┆ cell 17 ┆ cell 18 ┆ 4      │
│ B    ┆ cell 19 ┆ cell 20 ┆ 4      │
└──────┴─────────┴─────────┴────────┘

You can turn it into a "repeating sequence" with modulo arithmetic:

df.with_columns(folder = 
   pl.col("file").cum_count().over("file").mod(3)
)

shape: (10, 4)
┌──────┬─────────┬─────────┬────────┐
│ file ┆ col1    ┆ col2    ┆ folder │
│ ---  ┆ ---     ┆ ---     ┆ ---    │
│ str  ┆ str     ┆ str     ┆ u32    │
╞══════╪═════════╪═════════╪════════╡
│ A    ┆ cell 1  ┆ cell 2  ┆ 0      │
│ B    ┆ cell 3  ┆ cell 4  ┆ 0      │
│ A    ┆ cell 5  ┆ cell 6  ┆ 1      │
│ B    ┆ cell 7  ┆ cell 8  ┆ 1      │
│ …    ┆ …       ┆ …       ┆ …      │
│ A    ┆ cell 13 ┆ cell 14 ┆ 0      │
│ B    ┆ cell 15 ┆ cell 16 ┆ 0      │
│ A    ┆ cell 17 ┆ cell 18 ┆ 1      │
│ B    ┆ cell 19 ┆ cell 20 ┆ 1      │
└──────┴─────────┴─────────┴────────┘

You can then build the folder name as a string with pl.format():

df.with_columns(folder = 
   pl.format("file{}", pl.col("file").cum_count().over("file").mod(3) + 1)
)

shape: (10, 4)
┌──────┬─────────┬─────────┬────────┐
│ file ┆ col1    ┆ col2    ┆ folder │
│ ---  ┆ ---     ┆ ---     ┆ ---    │
│ str  ┆ str     ┆ str     ┆ str    │
╞══════╪═════════╪═════════╪════════╡
│ A    ┆ cell 1  ┆ cell 2  ┆ file1  │
│ B    ┆ cell 3  ┆ cell 4  ┆ file1  │
│ A    ┆ cell 5  ┆ cell 6  ┆ file2  │
│ B    ┆ cell 7  ┆ cell 8  ┆ file2  │
│ …    ┆ …       ┆ …       ┆ …      │
│ A    ┆ cell 13 ┆ cell 14 ┆ file1  │
│ B    ┆ cell 15 ┆ cell 16 ┆ file1  │
│ A    ┆ cell 17 ┆ cell 18 ┆ file2  │
│ B    ┆ cell 19 ┆ cell 20 ┆ file2  │
└──────┴─────────┴─────────┴────────┘


lstz6jyr2#

You can explode it:

dfg.explode(pl.exclude('file'))

However, your problem is probably better solved with a join or some kind of over expression:

import polars as pl

df = pl.DataFrame(
    {
        'file': ['A', 'B'] * 2,
        'col1': [f'cell {i}' for i in range(1, 9, 2)],
        'col2': [f'cell {i}' for i in range(2, 9, 2)],
    }
)
df2 = pl.DataFrame({'file': ['A', 'B'], 'folder': ['file1', 'file2']})

df.join(df2, on='file')

shape: (4, 4)
┌──────┬────────┬────────┬────────┐
│ file ┆ col1   ┆ col2   ┆ folder │
│ ---  ┆ ---    ┆ ---    ┆ ---    │
│ str  ┆ str    ┆ str    ┆ str    │
╞══════╪════════╪════════╪════════╡
│ A    ┆ cell 1 ┆ cell 2 ┆ file1  │
│ B    ┆ cell 3 ┆ cell 4 ┆ file2  │
│ A    ┆ cell 5 ┆ cell 6 ┆ file1  │
│ B    ┆ cell 7 ┆ cell 8 ┆ file2  │
└──────┴────────┴────────┴────────┘
