pandas: how do I select the row with the fewest NULL values in each groupby group?

k4emjkb1  posted on 2023-03-28  in Other

Example:

    row_number | id | firstname | middlename | lastname
    0          | 1  | John      | NULL       | Doe
    1          | 1  | John      | Jacob      | Doe
    2          | 2  | Alison    | Marie      | Smith
    3          | 2  | NULL      | Marie      | Smith
    4          | 2  | Alison    | Marie      | Smith

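I'm trying to figure out how to group by id and then, for each group, get the row with the fewest NULL values. It is fine to drop any extra rows that tie for the fewest NULLs (for example, dropping row_number 4, because it ties with row_number 2 for the fewest NULLs within id=2).
The answer for this example would be row_numbers 1 and 2.
ANSI SQL is preferred, but I can translate from other languages (such as Python with pandas) if you can think of a way to do it.
Edit: added a row to cover the tie-breaking case.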

jdzmm42g  #1

If you want to do this in pandas, you can use the following:

    df[df.assign(NC=df.isnull().sum(axis=1)).groupby('id')['NC'].transform(lambda x: x == x.min())]

Output:

       row_number  id firstname middlename lastname
    1           1   1      John      Jacob      Doe
    2           2   2    Alison      Marie    Smith
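
For readers who want to try this end to end, here is a minimal self-contained sketch of the same idea, split into named steps. It assumes the NULLs from the question are stored as `np.nan`, and the variable names (`null_count`, `group_min`, `fewest_nulls`) are illustrative only. Run against the five-row table from the question, this filter keeps every tied row, so row_number 4 is returned alongside 1 and 2.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NULL is represented as np.nan (an assumption).
df = pd.DataFrame({
    "row_number": [0, 1, 2, 3, 4],
    "id": [1, 1, 2, 2, 2],
    "firstname": ["John", "John", "Alison", np.nan, "Alison"],
    "middlename": [np.nan, "Jacob", "Marie", "Marie", "Marie"],
    "lastname": ["Doe", "Doe", "Smith", "Smith", "Smith"],
})

# Number of NULLs in each row.
null_count = df.isnull().sum(axis=1)

# Smallest NULL count within each id, broadcast back to every row.
group_min = null_count.groupby(df["id"]).transform("min")

# Keep every row that matches its group's minimum (ties are all kept).
fewest_nulls = df[null_count == group_min]
print(fewest_nulls)
```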

Tie-breaking:
Add a row:

    df.loc[4, ['row_number', 'id', 'firstname', 'middlename', 'lastname']] = ['4', 2, 'Mary', 'Maxine', 'Maxwell']

Then use groupby, transform, and idxmin:

    df[df.index == df.assign(NC=df.isnull().sum(axis=1)).groupby('id')['NC'].transform('idxmin')]

Output:

       row_number  id firstname middlename lastname
    1           1   1      John      Jacob      Doe
    2           2   2    Alison      Marie    Smith
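
A minimal sketch of the tie-breaking variant, again assuming NULLs are stored as `np.nan` and using illustrative variable names. It relies on `idxmin` returning the index label of the first minimum in each group, so among tied rows the earliest one wins, and `df.loc` then pulls exactly one row per id.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NULL is represented as np.nan (an assumption).
df = pd.DataFrame({
    "row_number": [0, 1, 2, 3, 4],
    "id": [1, 1, 2, 2, 2],
    "firstname": ["John", "John", "Alison", np.nan, "Alison"],
    "middlename": [np.nan, "Jacob", "Marie", "Marie", "Marie"],
    "lastname": ["Doe", "Doe", "Smith", "Smith", "Smith"],
})

# Number of NULLs in each row.
null_count = df.isnull().sum(axis=1)

# Index label of the first row with the fewest NULLs in each id group.
best_index = null_count.groupby(df["id"]).idxmin()

# Exactly one row per id; ties resolve to the earliest row.
one_per_id = df.loc[best_index]
print(one_per_id)
```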

jm81lzqq  #2

Oh, you want the row with the fewest NULL values. I would suggest:

    select t.*
    from (select t.*,
                 -- rank rows within each id by NULL count, fewest first;
                 -- dense_rank() keeps ties, row_number() would return one row per id
                 dense_rank() over (partition by id
                                    order by (case when firstname is null then 1 else 0 end) +
                                             (case when middlename is null then 1 else 0 end) +
                                             (case when lastname is null then 1 else 0 end)
                                   ) as seqnum
          from t
         ) t
    where seqnum = 1;

This is ANSI-standard SQL.
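
For comparison, here is a rough pandas analogue of the ranking approach, assuming the rank is computed separately within each id and that NULLs are stored as `np.nan`. It is a sketch rather than a translation of the query: `rank(method="dense")` plays the role of `dense_rank()` and keeps ties, while `rank(method="first")` behaves like `row_number()` and keeps one row per id. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NULL is represented as np.nan (an assumption).
df = pd.DataFrame({
    "row_number": [0, 1, 2, 3, 4],
    "id": [1, 1, 2, 2, 2],
    "firstname": ["John", "John", "Alison", np.nan, "Alison"],
    "middlename": [np.nan, "Jacob", "Marie", "Marie", "Marie"],
    "lastname": ["Doe", "Doe", "Smith", "Smith", "Smith"],
})

# NULL count over the three name columns only, as in the SQL CASE expressions.
null_count = df[["firstname", "middlename", "lastname"]].isnull().sum(axis=1)
per_id = null_count.groupby(df["id"])

# Like dense_rank(): tied rows share rank 1, so all of them are kept.
with_ties = df[per_id.rank(method="dense") == 1]

# Like row_number(): ties are ranked in order of appearance, so one row per id.
single_row = df[per_id.rank(method="first") == 1]

print(with_ties)
print(single_row)
```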
