DataFrame: select all rows whose row count is greater than x

u59ebvdq  posted on 2021-07-13  in  Java

How do I select all rows belonging to a group whose row count is >= 2?
I have the following pandas DataFrame.

    import pandas as pd

    df = pd.DataFrame({
        "date": ["2000-01-03", "2000-01-04", "2000-01-04", "2000-01-04", "2000-01-04",
                 "2000-01-03", "2000-01-04", "2000-01-05", "2000-01-05",
                 "2000-01-03", "2000-01-05", "2000-01-05",
                 "2000-01-04", "2000-01-05"],
        "sym": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "D", "E"],
        "val1": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 2, 2],
        "val2": [2, 2, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2],
    })

df:

              date sym  val1  val2
    0   2000-01-03   A     1     2
    1   2000-01-04   A     1     2
    2   2000-01-04   A     1     2
    3   2000-01-04   A     1     2
    4   2000-01-04   A     1     2
    5   2000-01-03   B     2     2
    6   2000-01-04   B     2     3
    7   2000-01-05   B     2     3
    8   2000-01-05   B     2     3
    9   2000-01-03   C     3     1
    10  2000-01-05   C     3     1
    11  2000-01-05   C     3     2
    12  2000-01-04   D     2     2
    13  2000-01-05   E     2     2

I applied

    df.groupby(['date', 'sym'], as_index=False).mean().sort_values(['sym', 'date'])

to average val1 and val2 for each symbol on each date.

             date sym  val1  val2
    0  2000-01-03   A   1.0   2.0
    3  2000-01-04   A   1.0   2.0
    1  2000-01-03   B   2.0   2.0
    4  2000-01-04   B   2.0   3.0
    6  2000-01-05   B   2.0   3.0
    2  2000-01-03   C   3.0   1.0
    7  2000-01-05   C   3.0   1.5
    5  2000-01-04   D   2.0   2.0
    8  2000-01-05   E   2.0   2.0

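Next, I need to select all rows for every "sym" whose row count is >= 2. In this example, the resulting df would contain all rows with sym = A, B, or C.
Expected output: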

             date sym  val1  val2
    0  2000-01-03   A   1.0   2.0
    3  2000-01-04   A   1.0   2.0
    1  2000-01-03   B   2.0   2.0
    4  2000-01-04   B   2.0   3.0
    6  2000-01-05   B   2.0   3.0
    2  2000-01-03   C   3.0   1.0
    7  2000-01-05   C   3.0   1.5

I tried combinations of groupby, pivot, and count, but had no luck.

mhd8tkvw #1

See: How to filter a DataFrame based on value counts?
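
As a rough illustration of the value-count filtering idea named in that title, here is a minimal sketch on a toy Series. The data and the names `counts`, `per_row`, `mask`, and `kept` are made up for this example and do not come from the question.

```python
import pandas as pd

# Toy column, used only for illustration (not the question's data).
sym = pd.Series(["A", "A", "B", "C", "C", "C"])

# Number of occurrences of each distinct value, indexed by the value.
counts = sym.value_counts()

# Look up every row's value in `counts`, giving one count per row.
per_row = sym.map(counts)

# Boolean mask: True where the row's value occurs at least twice.
mask = per_row >= 2

# Boolean indexing keeps only the rows where the mask is True.
kept = sym[mask]
print(kept.tolist())
```

Because `per_row` has the same index as `sym`, the mask lines up row by row, so it can be used directly for boolean indexing.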

    import pandas as pd

    df = pd.DataFrame({
        "date": ["2000-01-03", "2000-01-04", "2000-01-04", "2000-01-04",
                 "2000-01-04", "2000-01-03", "2000-01-04", "2000-01-05",
                 "2000-01-05", "2000-01-03", "2000-01-05", "2000-01-05",
                 "2000-01-04", "2000-01-05"],
        "sym": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C",
                "D", "E"],
        "val1": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 2, 2],
        "val2": [2, 2, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2],
    })

    # Average val1 and val2 for each (date, sym) pair, then order by sym and date.
    df = (
        df.groupby(['date', 'sym'], as_index=False)
        .mean()
        .sort_values(['sym', 'date'])
    )

    # Keep only the rows whose sym occurs at least twice in the averaged frame.
    df = df[df['sym'].map(df['sym'].value_counts()) >= 2]
    print(df)

Output:

             date sym  val1  val2
    0  2000-01-03   A   1.0   2.0
    3  2000-01-04   A   1.0   2.0
    1  2000-01-03   B   2.0   2.0
    4  2000-01-04   B   2.0   3.0
    6  2000-01-05   B   2.0   3.0
    2  2000-01-03   C   3.0   1.0
    7  2000-01-05   C   3.0   1.5
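
A possible alternative, not part of the answer above, is to let `groupby` do the counting. The sketch below assumes the same input data as the question; the names `agg`, `group_size`, `by_transform`, and `by_filter` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2000-01-03", "2000-01-04", "2000-01-04", "2000-01-04",
             "2000-01-04", "2000-01-03", "2000-01-04", "2000-01-05",
             "2000-01-05", "2000-01-03", "2000-01-05", "2000-01-05",
             "2000-01-04", "2000-01-05"],
    "sym": ["A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C",
            "D", "E"],
    "val1": [1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 2, 2],
    "val2": [2, 2, 2, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2],
})

# Same aggregation as in the question: one averaged row per (date, sym).
agg = (
    df.groupby(["date", "sym"], as_index=False)
    .mean()
    .sort_values(["sym", "date"])
)

# Option 1: broadcast each sym group's size back onto its rows, then mask.
group_size = agg.groupby("sym")["sym"].transform("size")
by_transform = agg[group_size >= 2]

# Option 2: keep or drop whole groups with a predicate on the group.
by_filter = agg.groupby("sym").filter(lambda g: len(g) >= 2)

print(by_transform)
```

`transform("size")` is vectorized and tends to scale better on large frames, while `filter` with a Python lambda is arguably easier to read but calls the function once per group.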