pandas: select rows by two conditions and put the value of the row they start with into a new column

Posted by zpf6vheq on 2022-12-21 in Other

I am trying to label each row with its "root".
Here is an example:
import pandas as pd

collection = {
    'REZID': ["E0912", "E0912", "E0912", "E0912", "E0912", "E0912", "E0912", "E0912", "E0913", "E0913"],
    'POS': ["01", "0101", "0102", "0103", "010301", "010302", "02", "0201", "01", "0101"],
    'KOMPID': ['k01', 'k02', 'k03', 'k04', 'k05', 'k06', 'k07', 'k08', 'k09', 'k10'],
    'WEIGHT': [1000, 300, 400, 300, 150, 150, 1400, 1400, 1200, 500],
}

df = pd.DataFrame(collection, columns=["REZID", "POS", "KOMPID", "WEIGHT"])

REZID     POS KOMPID  WEIGHT
0  E0912      01    k01    1000
1  E0912    0101    k02     300
2  E0912    0102    k03     400
3  E0912    0103    k04     300
4  E0912  010301    k05     150
5  E0912  010302    k06     150
6  E0912      02    k07    1400
7  E0912    0201    k08    1400
8  E0913      01    k09    1200
9  E0913    0101    k10     500

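I want to label every row whose POS has more than 2 digits with its root, i.e. the row whose POS is the 2-digit prefix. The root rows themselves are:
root = df.loc[df['POS'].str.len() == 2]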

    REZID   POS KOMPID  WEIGHT
0   E0912   01  k01     1000
6   E0912   02  k07     1400
8   E0913   01  k09     1200
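A minimal sketch of that selection, plus the link from every row to its root. It assumes, as in the example, that the root position is always the first two characters of POS. The column name `ROOT_POS` is a hypothetical helper, not something from the question. Note that the prefix alone is not unique ("01" appears under both E0912 and E0913), so any lookup must also match on REZID.

```python
import pandas as pd

df = pd.DataFrame({
    'REZID': ["E0912"] * 8 + ["E0913"] * 2,
    'POS': ["01", "0101", "0102", "0103", "010301", "010302", "02", "0201", "01", "0101"],
    'KOMPID': ['k01', 'k02', 'k03', 'k04', 'k05', 'k06', 'k07', 'k08', 'k09', 'k10'],
    'WEIGHT': [1000, 300, 400, 300, 150, 150, 1400, 1400, 1200, 500],
})

# Root rows: POS is exactly two characters long
root = df.loc[df['POS'].str.len() == 2]

# Hypothetical helper column: the first two characters name each row's root POS
df['ROOT_POS'] = df['POS'].str[:2]

print(root)
print(df[['REZID', 'POS', 'ROOT_POS']])
```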

What I want:

REZID     POS KOMPID  WEIGHT ROOT
0  E0912      01    k01    1000  k01
1  E0912    0101    k02     300  k01
2  E0912    0102    k03     400  k01
3  E0912    0103    k04     300  k01
4  E0912  010301    k05     150  k01
5  E0912  010302    k06     150  k01
6  E0912      02    k07    1400  k07
7  E0912    0201    k08    1400  k07
8  E0913      01    k09    1200  k09
9  E0913    0101    k10     500  k09
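One way to build that ROOT column, sketched under the assumption that every POS has at least two characters: build a lookup table of the root rows keyed by (REZID, 2-digit POS) and left-join every row to it. Unlike approaches that rely on row order, a merge does not need each root to appear before its children. The names `roots`, `out` and `ROOT_POS` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'REZID': ["E0912"] * 8 + ["E0913"] * 2,
    'POS': ["01", "0101", "0102", "0103", "010301", "010302", "02", "0201", "01", "0101"],
    'KOMPID': ['k01', 'k02', 'k03', 'k04', 'k05', 'k06', 'k07', 'k08', 'k09', 'k10'],
    'WEIGHT': [1000, 300, 400, 300, 150, 150, 1400, 1400, 1200, 500],
})

# Lookup table of roots, keyed by REZID and the 2-digit POS
roots = (
    df.loc[df['POS'].str.len() == 2, ['REZID', 'POS', 'KOMPID']]
      .rename(columns={'POS': 'ROOT_POS', 'KOMPID': 'ROOT'})
)

# Left-join every row to the root sharing its REZID and POS prefix;
# a left merge keeps the original row order
out = (
    df.assign(ROOT_POS=df['POS'].str[:2])
      .merge(roots, on=['REZID', 'ROOT_POS'], how='left')
      .drop(columns='ROOT_POS')
)
print(out)
```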

Answer 1 (2vuwiymt)

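You can simply create a new column named ROOT and fill it with KOMPID on the rows where POS has exactly 2 digits:
df['ROOT'] = root['KOMPID']  # aligns on the index, so non-root rows get NaN
This returns: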

    REZID   POS     KOMPID  WEIGHT  ROOT
0   E0912   01      k01     1000    k01
1   E0912   0101    k02     300     NaN
2   E0912   0102    k03     400     NaN
3   E0912   0103    k04     300     NaN
4   E0912   010301  k05     150     NaN
5   E0912   010302  k06     150     NaN
6   E0912   02      k07     1400    k07
7   E0912   0201    k08     1400    NaN
8   E0913   01      k09     1200    k09
9   E0913   0101    k10     500     NaN
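The assignment above works through index alignment: `root` keeps the original row labels (0, 6 and 8), so every other row receives NaN. A minimal sketch of the same intermediate column without a separate `root` frame, using `Series.where` (a swapped-in technique, not the answerer's code):

```python
import pandas as pd

df = pd.DataFrame({
    'REZID': ["E0912"] * 8 + ["E0913"] * 2,
    'POS': ["01", "0101", "0102", "0103", "010301", "010302", "02", "0201", "01", "0101"],
    'KOMPID': ['k01', 'k02', 'k03', 'k04', 'k05', 'k06', 'k07', 'k08', 'k09', 'k10'],
    'WEIGHT': [1000, 300, 400, 300, 150, 150, 1400, 1400, 1200, 500],
})

# Keep KOMPID only where POS has exactly two characters; other rows become NaN
df['ROOT'] = df['KOMPID'].where(df['POS'].str.len() == 2)
print(df)
```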

Then forward-fill the empty cells with the last available ROOT value:
df['ROOT'] = df['ROOT'].ffill()
This returns:

        REZID   POS     KOMPID  WEIGHT  ROOT
0       E0912   01      k01     1000    k01
1       E0912   0101    k02     300     k01
2       E0912   0102    k03     400     k01
3       E0912   0103    k04     300     k01
4       E0912   010301  k05     150     k01
5       E0912   010302  k06     150     k01
6       E0912   02      k07     1400    k07
7       E0912   0201    k08     1400    k07
8       E0913   01      k09     1200    k09
9       E0913   0101    k10     500     k09
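A slightly more defensive sketch of the forward-fill step: it fills only the ROOT column and restarts the fill for every REZID, so a group that (hypothetically) lacked a root row would keep NaN instead of inheriting the previous group's root. It still assumes each root row comes before its children. `df.fillna(method='ffill')` is deprecated since pandas 2.1 in favour of `ffill()`.

```python
import pandas as pd

df = pd.DataFrame({
    'REZID': ["E0912"] * 8 + ["E0913"] * 2,
    'POS': ["01", "0101", "0102", "0103", "010301", "010302", "02", "0201", "01", "0101"],
    'KOMPID': ['k01', 'k02', 'k03', 'k04', 'k05', 'k06', 'k07', 'k08', 'k09', 'k10'],
    'WEIGHT': [1000, 300, 400, 300, 150, 150, 1400, 1400, 1200, 500],
})

# Step 1: ROOT holds KOMPID on root rows only (NaN elsewhere)
df['ROOT'] = df['KOMPID'].where(df['POS'].str.len() == 2)

# Step 2: forward-fill ROOT alone, restarting within each REZID
df['ROOT'] = df.groupby('REZID')['ROOT'].ffill()
print(df)
```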

Answer 2 (cbjzeqam)

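(df['REZID'] == row['REZID']) keeps the rows that have the same REZID as the current row.
(df['POS'].str.startswith(root_pos)) keeps the rows whose POS starts with the root position, i.e. the first two digits of the current row's POS.
.iloc[0] returns the first matching row, which is the root as long as the root row comes before its children.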

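def get_root(row):
    # The root position is the first two digits of this row's POS
    root_pos = row['POS'][:2]
    # Rows with the same REZID whose POS starts with root_pos; the first one is the root
    root_kompid = df[(df['REZID'] == row['REZID']) & (df['POS'].str.startswith(root_pos))]['KOMPID'].iloc[0]
    return root_kompid

# Call get_root once per row (axis=1) to build the ROOT column
df['ROOT'] = df.apply(get_root, axis=1)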
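The row-wise `apply` re-filters the whole frame for every row. A vectorized sketch with the same "first row in the same REZID whose POS starts with these two digits" semantics groups by REZID plus the 2-character prefix and takes the first KOMPID of each group. It assumes every POS has at least two characters; `prefix` is a hypothetical name.

```python
import pandas as pd

df = pd.DataFrame({
    'REZID': ["E0912"] * 8 + ["E0913"] * 2,
    'POS': ["01", "0101", "0102", "0103", "010301", "010302", "02", "0201", "01", "0101"],
    'KOMPID': ['k01', 'k02', 'k03', 'k04', 'k05', 'k06', 'k07', 'k08', 'k09', 'k10'],
    'WEIGHT': [1000, 300, 400, 300, 150, 150, 1400, 1400, 1200, 500],
})

# Each (REZID, 2-char prefix) group is one root plus its descendants
prefix = df['POS'].str[:2].rename('ROOT_POS')

# transform('first') broadcasts the group's first KOMPID back to every row
df['ROOT'] = df.groupby(['REZID', prefix])['KOMPID'].transform('first')
print(df)
```

Like the `apply` version, this picks the first row of each group in frame order, so it also expects the root row to appear before its children.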
