Is there a Pandas equivalent to tidyr's uncount?

Posted by 4urapxun on 2023-04-03 in Other

Suppose we have a table containing combinations of variables and their frequencies.
In R:

> df

# A tibble: 3 x 3
  Cough Fever cases
  <lgl> <lgl> <dbl>
1 TRUE  FALSE     1
2 FALSE FALSE     2
3 TRUE  TRUE      3

We can then use tidyr::uncount to get a data frame with one row per individual case:

> uncount(df, cases)

# A tibble: 6 x 2
  Cough Fever
  <lgl> <lgl>
1 TRUE  FALSE
2 FALSE FALSE
3 FALSE FALSE
4 TRUE  TRUE 
5 TRUE  TRUE 
6 TRUE  TRUE

Is there an equivalent in Python/Pandas?

Source: https://stackoverflow.com/questions/61533786/is-there-a-pandas-equivalent-to-tidyrs-uncount

4 Answers

z0qdvdin1#

In addition to the other solutions, you can combine take, repeat, and drop:

import pandas as pd
df = pd.DataFrame({'Cough': [True, False, True],
                   'Fever': [False, False, True],
                   'cases': [1, 2, 3]})

df.take(df.index.repeat(df.cases)).drop(columns="cases")

    Cough   Fever
0   True    False
1   False   False
1   False   False
2   True    True
2   True    True
2   True    True

You can also select the columns first, before indexing by position:

df.loc(axis=1)[:'Fever'].take(df.index.repeat(df.cases))
   Cough  Fever
0   True  False
1  False  False
1  False  False
2   True   True
2   True   True
2   True   True
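Note that both results above keep the repeated index labels (0, 1, 1, 2, 2, 2), whereas tidyr renumbers the rows. A minimal sketch of a reusable helper that also resets the index follows. The function name `uncount` and its `weights` parameter are hypothetical names chosen here to mirror tidyr, not a pandas API, and the sketch assumes the index labels are unique.

```python
import pandas as pd


def uncount(df: pd.DataFrame, weights: str) -> pd.DataFrame:
    """Repeat each row of df as many times as its value in the weights column."""
    # Index.repeat duplicates each label; .loc then selects rows by label.
    # Assumes the index labels are unique (true for a default RangeIndex).
    repeated = df.loc[df.index.repeat(df[weights])]
    # Drop the count column and renumber the rows from 0.
    return repeated.drop(columns=weights).reset_index(drop=True)


df = pd.DataFrame({'Cough': [True, False, True],
                   'Fever': [False, False, True],
                   'cases': [1, 2, 3]})
result = uncount(df, 'cases')
print(result)
```

Rows whose count is 0 are simply left out of the result, because repeating a label zero times produces nothing to select.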

ymdaylpp2#

You take the row index and repeat it according to the counts. For example, in R you could do:

df[rep(1:nrow(df),df$cases),]

First, to get data like yours:

import numpy as np
import pandas as pd

df = pd.DataFrame({'x':[1,1,2,2,2,2],'y':[0,1,0,1,1,1]})
counts = df.groupby(['x','y']).size().reset_index()
counts.columns = ['x','y','n']

    x   y   n
0   1   0   1
1   1   1   1
2   2   0   1
3   2   1   3

Then:

counts.iloc[np.repeat(np.arange(len(counts)),counts.n),:2]

    x   y
0   1   0
1   1   1
2   2   0
3   2   1
3   2   1
3   2   1
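Because this approach repeats row positions rather than index labels, it should not depend on what the index looks like. The sketch below illustrates that with the same counts table, under the assumption of string row labels ('a' to 'd') that are hypothetical and added only for the demonstration.

```python
import numpy as np
import pandas as pd

# Same counts as above, but with hypothetical string labels as the index.
counts = pd.DataFrame({'x': [1, 1, 2, 2],
                       'y': [0, 1, 0, 1],
                       'n': [1, 1, 1, 3]},
                      index=['a', 'b', 'c', 'd'])

# Repeat each row position n times, then select by position with iloc.
positions = np.repeat(np.arange(len(counts)), counts['n'].to_numpy())
expanded = counts.iloc[positions, :2]  # :2 keeps only the x and y columns
print(expanded)
```

The repeated rows keep their original labels, so chain `.reset_index(drop=True)` if consecutive row numbers are wanted.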

von4xj4u3#

I haven't found an equivalent function in Python, but this works:

df2 = df.pop('cases')
df = pd.DataFrame(df.values.repeat(df2, axis=0), columns=df.columns)

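The 'cases' column is popped out of df into df2, and a new DataFrame is then built by repeating the rows of the original DataFrame according to the counts in df2. Let me know if this helps.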
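One thing to watch with the `df.values` route: when the columns have different dtypes, `df.values` is a single object array, so the rebuilt columns may end up as object rather than their original types. The sketch below compares it with repeating the index instead. The `label` column is a hypothetical addition, there only to make the dtypes mixed.

```python
import pandas as pd

df = pd.DataFrame({'Cough': [True, False, True],
                   'label': ['mild', 'none', 'severe'],  # hypothetical extra column
                   'cases': [1, 2, 3]})
counts = df.pop('cases')  # removes 'cases' from df and returns it

# Route 1: repeat the underlying array, as in the answer above.
via_values = pd.DataFrame(df.values.repeat(counts, axis=0), columns=df.columns)

# Route 2: repeat the index labels and select, which keeps each column's dtype.
via_index = df.loc[df.index.repeat(counts)].reset_index(drop=True)

print(via_values.dtypes)
print(via_index.dtypes)
```

If every column shares one dtype, as in the original all-boolean example, the two routes should give equivalent results.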

wztqucjr4#

It's as simple as using tidyr's API, via datar:

>>> from datar.all import f, tribble, uncount
>>> df = tribble(
...     f.Cough, f.Fever, f.cases,
...     True,    False,   1,
...     False,   False,   2,
...     True,    True,    3
... )
>>> uncount(df, f.cases)
   Cough  Fever
  <bool> <bool>
0   True  False
1  False  False
2  False  False
3   True   True
4   True   True
5   True   True

I am the author of the package. Feel free to submit an issue if you have any questions.
