What is the best way to shuffle/permute every n rows of a pandas DataFrame in Python?

Asked by cgfeq70w on 2022-12-31 in Python


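I want to shuffle every n rows (the window size) of a DataFrame, but I'm not sure how to do it in a Pythonic way. I found answers that shuffle all rows, but none for a given window size: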

import pandas as pd

def permute(df: pd.DataFrame, window_size: int = 10) -> pd.DataFrame:
    """How would you shuffle every window_size rows for the modifiable columns?"""
    df_permuted = df.copy()
    # modifiable_columns (the columns that may be shuffled) is not defined in the question
    df_permuted.loc[:, modifiable_columns]
    ...
    return df_permuted


Source: https://stackoverflow.com/questions/74970303/what-is-the-best-way-to-shuffle-permute-each-n-rows-of-a-data-frame-in-python

3 Answers


Answer 1 (kb5ga3dv)

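This code defines a function named permute that takes a pandas DataFrame and a window size (default 10) and returns a new, shuffled DataFrame.
The function first computes the number of windows by dividing the length of the input DataFrame by the window size. It then iterates over the windows and shuffles the rows of each one with the DataFrame's sample method (with frac=1 it returns all rows in random order). Finally, it joins all the shuffled windows into a single DataFrame with pd.concat and returns it.
The code then tests the function: it creates a small DataFrame and prints it, calls permute on it with a window size of 3, and prints the shuffled result.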

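import pandas as pd

def permute(df: pd.DataFrame, window_size: int = 10) -> pd.DataFrame:
    # Round up so that a final, shorter window is also kept
    # (plain floor division would silently drop the trailing rows).
    num_windows = -(-len(df) // window_size)

    compil = []
    for i in range(num_windows):
        start = i * window_size
        end = (i + 1) * window_size
        # sample(frac=1) returns every row of this window in random order
        compil.append(df.iloc[start:end].sample(frac=1))

    return pd.concat(compil)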

# Test the permute function
df = pd.DataFrame({"A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   "B": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20]})
print(df)

df_permuted = permute(df, window_size=3)
print(df_permuted)
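If you need reproducible results, the same loop can draw its randomness from one seeded source. The following is a minimal sketch under that assumption; permute_seeded and its seed parameter are hypothetical names, not part of the answer. It takes one permutation per window from a single NumPy Generator and indexes with iloc, so different windows still get different shuffles and the final shorter window is kept.

```python
import numpy as np
import pandas as pd


def permute_seeded(df: pd.DataFrame, window_size: int = 10, seed=None) -> pd.DataFrame:
    """Hypothetical variant: shuffle rows within consecutive windows of
    window_size rows, using one NumPy Generator seeded by `seed`."""
    rng = np.random.default_rng(seed)
    positions = []
    for start in range(0, len(df), window_size):
        # Row positions of this window; the last window may be shorter.
        window = np.arange(start, min(start + window_size, len(df)))
        positions.append(rng.permutation(window))
    order = np.concatenate(positions) if positions else np.array([], dtype=int)
    # iloc is positional, so this does not depend on a default RangeIndex.
    return df.iloc[order]


df = pd.DataFrame({"A": range(1, 11), "B": range(11, 21)})
print(permute_seeded(df, window_size=3, seed=0))
```

Calling it twice with the same seed returns the same order, while seed=None gives a fresh shuffle each time.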


Answered 2022-12-31

Answer 2 (w8f9ii69)

The accepted answer is not vectorized. Using groupby.sample (available since pandas 1.1) is a better option:

import numpy as np
df.groupby(np.arange(len(df)) // N).sample(frac=1)  # N is the window size
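To see why this one-liner works, it helps to look at the grouping key. The sketch below assumes N = 3 and a small illustrative DataFrame (both are example values, not from the answer): integer-dividing the row positions by N labels each row with its window number, and sample(frac=1) then shuffles every group independently.

```python
import numpy as np
import pandas as pd

N = 3  # window size (illustrative value)
df = pd.DataFrame({"A": range(1, 11), "B": range(11, 21)})

# Integer division of the row positions gives one group label per window:
# [0 0 0 1 1 1 2 2 2 3] for 10 rows and N = 3, so the last, shorter window is kept.
keys = np.arange(len(df)) // N
print(keys)

# GroupBy.sample(frac=1) returns every row of each group in random order,
# with the groups concatenated in key order.
shuffled = df.groupby(keys).sample(frac=1)
print(shuffled)
```

Because the keys are built from positions rather than index labels, this also works when df does not have a default RangeIndex.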

Answered 2022-12-31

Answer 3 (sc4hvdpw)

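To cover an additional requirement that appears in the code comment but not in the question text, here is a version that also takes the modifiable columns into account.
In the example below, mod and mod2 are modifiable columns, while nomod is not.
I don't think the modifiable columns can be handled with the vectorized approach, so I added that to the accepted answer's approach instead. Also, the accepted answer keeps a second full copy of the entire df in memory, whereas my version only holds window_size rows at a time.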

import numpy as np
import pandas as pd

df = pd.DataFrame([np.arange(0, 12)]*3).T
df.columns = ['mod', 'nomod', 'mod2']
df

    mod  nomod  mod2
0     0      0     0
1     1      1     1
2     2      2     2
3     3      3     3
4     4      4     4
5     5      5     5
6     6      6     6
7     7      7     7
8     8      8     8
9     9      9     9
10   10     10    10
11   11     11    11

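def permute(df, window_size, modifiable_columns):
    # Number of full windows; trailing rows that do not fill a window stay unshuffled
    num_chunks = int(len(df) / window_size)
    for i in range(0, num_chunks):
        start_ind = i * window_size
        end_ind = i * window_size + window_size

        # Shuffle only the modifiable columns of this window. .loc slicing is
        # label-based and end-inclusive, so this assumes a default RangeIndex.
        # Note: the fixed random_state=1 gives every window the same permutation,
        # as the output below shows.
        df_row_subset = df.loc[start_ind:end_ind-1, modifiable_columns].sample(frac=1, random_state=1)
        # Relabel the shuffled rows; otherwise the .loc assignment below would
        # align on the old labels and undo the shuffle.
        df_row_subset.index = np.arange(start_ind, end_ind)

        # Write back in place (this modifies the caller's df).
        df.loc[df_row_subset.index, modifiable_columns] = df_row_subset

    return df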

permute(df, 4, ['mod', 'mod2'])

    mod  nomod  mod2
0     3      0     3
1     2      1     2
2     0      2     0
3     1      3     1
4     7      4     7
5     6      5     6
6     4      6     4
7     5      7     5
8    11      8    11
9    10      9    10
10    8     10     8
11    9     11     9
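For comparison, here is one possible vectorized way to shuffle only selected columns within windows. It is not taken from any of the answers; permute_columns and its seed parameter are hypothetical names, and this is a sketch rather than a tested library routine. It builds a single row order with NumPy's lexsort (primary key: window number, secondary key: a random number per row), so rows never leave their window and each window gets its own random order, unlike the fixed random_state=1 used in the answer's code.

```python
import numpy as np
import pandas as pd


def permute_columns(df, window_size, modifiable_columns, seed=None):
    """Hypothetical vectorized sketch: shuffle modifiable_columns within
    consecutive windows of window_size rows; other columns stay in place.
    Returns a new DataFrame instead of modifying df."""
    rng = np.random.default_rng(seed)
    n = len(df)
    window_id = np.arange(n) // window_size
    # lexsort sorts by the last key first: rows stay grouped by window,
    # and within a window they are ordered by an independent random key.
    order = np.lexsort((rng.random(n), window_id))
    out = df.copy()
    # Assign raw values by position so pandas does not realign on the index.
    out[modifiable_columns] = df[modifiable_columns].to_numpy()[order]
    return out


df = pd.DataFrame([np.arange(0, 12)] * 3).T
df.columns = ["mod", "nomod", "mod2"]
print(permute_columns(df, 4, ["mod", "mod2"], seed=1))
```

The modifiable columns are moved together, as in the answer, and a final partial window is shuffled as well. The trade-off is that it materializes the selected columns as one NumPy array, so it does not share the answer's window-at-a-time memory profile.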

Answered 2022-12-31
