pandas: find consecutive indices of common values in a group

Posted by vdzxcuhz on 2023-02-02 in Other


Given the following DataFrame:

import pandas as pd

df = pd.DataFrame({'Name': ['Tesla','Tesla','Tesla','Toyota','Ford','Ford','Ford','BMW','BMW','BMW','Mercedes','Mercedes','Mercedes'],
                   'Type': ['Model X','Model X','Model X','Corolla','Bronco','Bronco','Mustang','3 Series','-','-','Alpha','Alpha','S-Class'],
                   'Year': [2015, 2015, 2015, 2017, 2018, 2018, 2020, 2015, 2015, 2017, 2018, 2018, 2020],
                   'Price': [85000, 90000, 95000, 20000, 35000, 35000, 45000, 40000, 40000, 65000, 50000, 50000, 75000],
                   'Color': ['White','White','White','Red','Blue','Blue','Yellow','Silver','Silver','Black','White','White','Black']
                  })

df

    Name      Type      Year  Price  Color
0   Tesla     Model X   2015  85000  White
1   Tesla     Model X   2015  90000  White
2   Tesla     Model X   2015  95000  White
3   Toyota    Corolla   2017  20000  Red
4   Ford      Bronco    2018  35000  Blue
5   Ford      Bronco    2018  35000  Blue
6   Ford      Mustang   2020  45000  Yellow
7   BMW       3 Series  2015  40000  Silver
8   BMW       -         2015  40000  Silver
9   BMW       -         2017  65000  Black
10  Mercedes  Alpha     2018  50000  White
11  Mercedes  Alpha     2018  50000  White
12  Mercedes  S-Class   2020  75000  Black

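For each group in the Name column, I am trying to find the start index and end index of consecutive common values in the other columns. A value that repeats on consecutive rows should be treated as one common value and stored in a dictionary, with the start index as the key and the end index as the value.
For example, Tesla has Model X in common in the Type column, so for Type I should get the start index and end index of Model X as the first entry of the dictionary, stored as a start index: end index key-value pair.
Similarly, in the Color column, Blue should have start index 4 and end index 5, and in the Year column, 2015 should have start index 0 and end index 2.
Expected output:
Name: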

{0: 2, 3: 3, 4: 6, 7: 9, 10: 12}

Type:

{0: 2, 3: 3, 4: 5, 6: 6, 7: 7, 8: 9, 10: 11, 12: 12}

Year:

{0: 2, 3: 3, 4: 5, 6: 6, 7: 8, 9: 9, 10: 11, 12: 12}

Price:

{0: 0, 1: 1, 2: 2, 3: 3, 4: 5, 6: 6, 7: 8, 9: 9, 10: 11, 12: 12}

Color:

{0: 2, 3: 3, 4: 5, 6: 6, 7: 8, 9: 9, 10: 11, 12: 12}

What I tried

def find_indices(df, column):
    df1 = df.index.to_series().groupby(df[column]).agg(['first', 'last']).reset_index()
    df1 = df1.sort_values("first").reset_index()
    first_last_rows = df1.set_index('first')['last'].to_dict()
    print(column + ":")
    print(first_last_rows)

The output I get

Name:
{0: 2, 3: 3, 4: 6, 7: 9, 10: 12}
Type:
{0: 2, 3: 3, 4: 5, 6: 6, 7: 7, 8: 9, 10: 11, 12: 12}
Year:
{0: 8, 3: 9, 4: 11, 6: 12}
Price:
{0: 0, 1: 1, 2: 2, 3: 3, 4: 5, 6: 6, 7: 8, 9: 9, 10: 11, 12: 12}
Color:
{0: 11, 3: 3, 4: 5, 6: 6, 7: 8, 9: 12}

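In this output, the final values for the Year and Color columns come from disjoint groups, so I cannot find the consecutive common values (or the values for different Names).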
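A minimal sketch of why the attempt above merges disjoint runs, using a hypothetical four-row Year column rather than the full DataFrame: grouping by the value itself pools every row that holds that value, whether or not the rows are adjacent.

```python
import pandas as pd

# Toy column (illustrative only): 2015 appears in two separate runs,
# rows 0-1 and row 3.
year = pd.Series([2015, 2015, 2017, 2015], name="Year")

# Grouping the index by value collects all rows with the same value,
# so "first" and "last" span both runs of 2015.
by_value = year.index.to_series().groupby(year).agg(["first", "last"])
print(by_value)
```

Here 2015 is reported as running from index 0 to index 3 even though row 2 holds 2017, which is the same effect seen for Year and Color above.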

pandas

Source: https://stackoverflow.com/questions/75292018/find-consecutive-indices-of-common-values-in-a-group

1 Answer


jyztefdp1#

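It looks like you need groups of consecutive values. A well-known approach for this is to combine shift and cumsum and then groupby: comparing each value with the previous one marks where a new run starts, and the cumulative sum of those marks gives every run its own id. For your problem, you can then take the first and last index of each group and build a dict as follows: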

col = "Year"
col_dict = dict(
    df.groupby(df[col].ne(df[col].shift()).cumsum())[col]
    .agg(lambda x: (x.index[0], x.index[-1]))
    .values
)
print(col_dict)

{0: 2, 3: 3, 4: 5, 6: 6, 7: 8, 9: 9, 10: 11, 12: 12}
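The same idea can be wrapped in a helper that runs over several columns. The sketch below is one possible variant, not part of the original answer: the function name `consecutive_runs` and its `by` parameter are made up for illustration. It also starts a new run whenever Name changes, on the assumption that runs should never span two Name groups. For the question's data this makes no difference, since no value repeats across a Name boundary.

```python
import pandas as pd

def consecutive_runs(df, col, by="Name"):
    """Map the first index of each run of equal values in `col` to its last index.

    Hypothetical helper: a run also ends when the `by` column changes.
    """
    # True on every row where the value or the `by` group differs from the previous row.
    changed = df[col].ne(df[col].shift()) | df[by].ne(df[by].shift())
    run_id = changed.cumsum()  # one integer id per consecutive run
    bounds = df.index.to_series().groupby(run_id).agg(["first", "last"])
    return {int(a): int(b) for a, b in zip(bounds["first"], bounds["last"])}

# Name, Year and Color columns copied from the question.
df = pd.DataFrame({
    "Name": ["Tesla"] * 3 + ["Toyota"] + ["Ford"] * 3 + ["BMW"] * 3 + ["Mercedes"] * 3,
    "Year": [2015, 2015, 2015, 2017, 2018, 2018, 2020, 2015, 2015, 2017, 2018, 2018, 2020],
    "Color": ["White", "White", "White", "Red", "Blue", "Blue", "Yellow",
              "Silver", "Silver", "Black", "White", "White", "Black"],
})

for col in ["Name", "Year", "Color"]:
    print(col + ":")
    print(consecutive_runs(df, col))
```

If two adjacent rows with different Names happened to share a value, this variant would report them as two separate runs, whereas the single-column version above would merge them.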

Answered 2023-02-02
