Make a constant column in Polars

Posted by quhf5bfb on 2024-01-05 in Python


In Polars 0.13.14, I could create a DataFrame with an all-constant column like this:

import polars as pl
pl.DataFrame(dict(x=pl.repeat(1, 3)))
# shape: (3, 1)
# ┌─────┐
# │ x   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# ├╌╌╌╌╌┤
# │ 1   │
# ├╌╌╌╌╌┤
# │ 1   │
# └─────┘

But in Polars 0.13.15, this raises an error:

ValueError: Series constructor not called properly.

How do I fill a column with a constant value in Polars?


Source: https://stackoverflow.com/questions/71624674/make-a-constant-column-in-polars

3 Answers


smdnsysy #1

You are probably looking for pl.lit(..):

import polars as pl
df = pl.DataFrame({"a": [1,2,3], "b": [4, 5, 6]})
print(df)
# with_columns replaces with_column, which newer Polars releases deprecated
print(df.with_columns(pl.lit(1).alias("constant_column")))

This gives the following output:

shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 5   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ 6   │
└─────┴─────┘
shape: (3, 3)
┌─────┬─────┬─────────────────┐
│ a   ┆ b   ┆ constant_column │
│ --- ┆ --- ┆ ---             │
│ i64 ┆ i64 ┆ i32             │
╞═════╪═════╪═════════════════╡
│ 1   ┆ 4   ┆ 1               │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2   ┆ 5   ┆ 1               │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 3   ┆ 6   ┆ 1               │
└─────┴─────┴─────────────────┘


vc6uscn9 #2

As of Polars 0.13.15, repeat became a lazy function by default, and lazy functions are not evaluated in the DataFrame constructor. You can restore the eager behavior with the eager=True flag:

import polars as pl
pl.DataFrame(dict(x=pl.repeat(1, 3, eager=True)))

Or you can use an expression context like this:

import polars as pl
pl.DataFrame().with_columns(pl.repeat(1, 3).alias('x'))


xyhw6mcr #3

The signature of pl.repeat (Polars v0.20.0) is:

pl.repeat(
    value: 'IntoExpr | None',
    n: 'int | Expr',
    *,
    dtype: 'PolarsDataType | None' = None,
    eager: 'bool' = False,
) -> 'Expr | Series'

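By default, it returns a lazy expression. To evaluate it immediately, use

pl.repeat(1, 2, eager=True)

as mentioned in the answer above.
Expressions can also be run with pl.select and converted to a Series: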

In[265]: pl.select(pl.repeat(1, 2)).to_series()
Out[265]:
shape: (2,)
Series: 'repeat' [i32]
[
        1
        1
]

pl.select runs expressions in parallel, so you can simplify and speed up the process:

pl.DataFrame(pl.select(a=pl.repeat(1, 100), b=pl.repeat(1, 100)).to_dict())

This can be handy when you have many expressions to compute. To demonstrate:

In [277]: %timeit pl.select(a=pl.repeat(1, 100, eager=True))
150 µs ± 6.2 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [278]: %timeit pl.select(a=pl.repeat(1, 100, eager=True), b=pl.repeat(1, 100, eager=True))
331 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [279]: %timeit pl.select(a=pl.repeat(1, 100), b=pl.repeat(1, 100))
128 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [280]: %timeit pl.select(a=pl.repeat(1, 100))
104 µs ± 8.09 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [281]: %timeit pl.repeat(1, 100, eager=True)
99 µs ± 5.18 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [282]: %timeit pl.repeat(1, 100, eager=True); pl.repeat(1, 100, eager=True)
208 µs ± 14.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
