Suppose I have a DataFrame like this:
RING CLLI RR root CIRCUIT
N100 M200 200.1 OC1 Circuit1
N100 M200 200.1 OC1 Circuit2
N100 M201 200.2 OC1 Circuit3
N100 M202 200.3 OC1 Circuit1
N101 M300 300.1 OC2 Circuit1
N101 M304 301.8 OC2 Circuit11
N101 M147 500.5 OC2 Circuit10
N102 M874 568.7 OC4 Circuit11
N102 M874 568.7 OC4 Circuit114
N102 M874 568.7 OC4 Circuit113
N102 M874 568.7 OC4 Circuit112
N104 M643 414.1 OC8 Circuit2
N104 M643 414.1 OC8 Circuit234
N104 M643 414.1 OC8 Circuit11
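I want to check whether a value in the CIRCUIT column is repeated in other rows. If it is repeated and the RING is different, I want to add another column that gives the CLLI of those other rows for that circuit. The final DataFrame should look like this: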
RING CLLI RR root CIRCUIT NeigbourCLLI
N100 M200 200.1 OC1 Circuit1 M300
N100 M200 200.1 OC1 Circuit2 M643
N100 M201 200.2 OC1 Circuit3 NaN
N100 M202 200.3 OC1 Circuit1 NaN
N101 M300 300.1 OC2 Circuit1 M200
N101 M304 301.8 OC2 Circuit11 M874, M643
N101 M147 500.5 OC2 Circuit10 NaN
N102 M874 568.7 OC4 Circuit11 M304, M643
N102 M874 568.7 OC4 Circuit114 NaN
N102 M874 568.7 OC4 Circuit113 NaN
N102 M874 568.7 OC4 Circuit112 NaN
N104 M643 414.1 OC8 Circuit2 M200
N104 M643 414.1 OC8 Circuit234 NaN
N104 M643 414.1 OC8 Circuit11 M874, M304
I tried the code below, but it is not very good: it iterates a lot, and because my DataFrame is huge it takes a long time:
# First pass: remember the first RING seen for each circuit and collect CLLIs
ring_clli_dict = {}
for index, row in df.iterrows():
    circuit = row['CIRCUIT']
    ring = row['RING']
    clli = row['CLLI']
    if circuit in ring_clli_dict:
        if ring != ring_clli_dict[circuit]['RING']:
            ring_clli_dict[circuit]['NeighbourCLLI'].append(clli)
    else:
        ring_clli_dict[circuit] = {'RING': ring, 'NeighbourCLLI': [clli]}
# Second pass: write the collected CLLIs back onto every row of that circuit
for index, row in df.iterrows():
    circuit = row['CIRCUIT']
    if circuit in ring_clli_dict:
        df.at[index, 'NeighbourCLLI'] = ', '.join(ring_clli_dict[circuit]['NeighbourCLLI'])
Is there a better way to solve this?
UPD:
OK, the code below works well, but as I mentioned, my DataFrame has 500,000 rows, and with code like this it takes a very long time to finish.
import numpy as np

# True for every circuit that appears on more than one ring
grouped = df.groupby('CIRCUIT').apply(lambda x: x['RING'].nunique() > 1)

def find_neighboring_cllis(row):
    circuit, ring = row['CIRCUIT'], row['RING']
    if grouped[circuit]:
        # Scans the whole DataFrame once per row, which is what makes this slow
        neighbors = df[(df['CIRCUIT'] == circuit) & (df['RING'] != ring)]['CLLI'].unique()
        if neighbors.size > 0:
            return ', '.join(neighbors)
    return np.nan

df['NeigbourCLLI'] = df.apply(find_neighboring_cllis, axis=1)
Is it possible to make this faster with a more "pandas" solution, rather than going row by row?
2 Answers
gtlvzcf81#
Try this:
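One way to do this without row-wise iteration, sketched here under assumptions rather than taken from the answer itself, is to merge the table with itself on CIRCUIT, keep only the matches whose RING differs, and join the resulting CLLIs per (CIRCUIT, RING). The function name `add_neighbour_clli` and the `_other` suffix are made up for illustration. The sample rows are a subset of the question's data with the RR and root columns left out, and the column name follows the spelling `NeigbourCLLI` used in the question. The intent is to match the behaviour of `find_neighboring_cllis` from the question's update, not necessarily every cell of the expected table.

```python
import pandas as pd


def add_neighbour_clli(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add a NeigbourCLLI column via a self-merge."""
    # One row per distinct (CIRCUIT, RING, CLLI) keeps the self-join small.
    pairs = df[["CIRCUIT", "RING", "CLLI"]].drop_duplicates()
    # Self-join on CIRCUIT: every pair meets every pair on the same circuit.
    joined = pairs.merge(pairs, on="CIRCUIT", suffixes=("", "_other"))
    # Keep only the matches that sit on a different ring.
    joined = joined[joined["RING"] != joined["RING_other"]]
    # Collapse the neighbouring CLLIs into one string per (CIRCUIT, RING).
    neighbours = (
        joined.groupby(["CIRCUIT", "RING"], sort=False)["CLLI_other"]
        .agg(lambda s: ", ".join(s.unique()))
        .rename("NeigbourCLLI")
        .reset_index()
    )
    # Left join back onto the original rows; rows without a neighbour get NaN.
    return df.merge(neighbours, on=["CIRCUIT", "RING"], how="left")


# A few rows from the question (RR and root omitted for brevity).
df = pd.DataFrame(
    {
        "RING": ["N100", "N100", "N101", "N101", "N102", "N104"],
        "CLLI": ["M200", "M201", "M300", "M304", "M874", "M643"],
        "CIRCUIT": ["Circuit1", "Circuit3", "Circuit1",
                    "Circuit11", "Circuit11", "Circuit11"],
    }
)
result = add_neighbour_clli(df)
print(result)
```

Note that `merge` returns a new DataFrame with a fresh integer index, so a custom index on `df` would need to be restored afterwards. The self-join grows with the square of the number of distinct (RING, CLLI) pairs per circuit, which is why the pairs are deduplicated first.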
4nkexdtk2#
Try this:
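A possible alternative, again only a sketch and not the answer's original code, is to build a plain dictionary from each circuit to its (RING, CLLI) pairs once, then look every row up in it. This avoids filtering the whole DataFrame for every row, as the `df.apply` version in the question does. The name `neighbour_clli_lookup` is hypothetical, and the sample rows are a subset of the question's data without the RR and root columns.

```python
import numpy as np
import pandas as pd


def neighbour_clli_lookup(df: pd.DataFrame) -> list:
    """Hypothetical helper: one NeigbourCLLI value per row of df."""
    pairs = df[["CIRCUIT", "RING", "CLLI"]].drop_duplicates()
    # A circuit seen on a single ring can never have a neighbour; drop it early.
    multi_ring = pairs.groupby("CIRCUIT")["RING"].transform("nunique") > 1
    pairs = pairs[multi_ring]

    # circuit -> list of (ring, clli) pairs, in order of appearance
    by_circuit = {}
    for circuit, ring, clli in pairs.itertuples(index=False, name=None):
        by_circuit.setdefault(circuit, []).append((ring, clli))

    def lookup(circuit, ring):
        cllis = [c for r, c in by_circuit.get(circuit, []) if r != ring]
        # dict.fromkeys drops duplicates while keeping first-seen order.
        return ", ".join(dict.fromkeys(cllis)) if cllis else np.nan

    return [lookup(c, r) for c, r in zip(df["CIRCUIT"], df["RING"])]


# A few rows from the question (RR and root omitted for brevity).
df = pd.DataFrame(
    {
        "RING": ["N100", "N100", "N101", "N101", "N102", "N104"],
        "CLLI": ["M200", "M201", "M300", "M304", "M874", "M643"],
        "CIRCUIT": ["Circuit1", "Circuit3", "Circuit1",
                    "Circuit11", "Circuit11", "Circuit11"],
    }
)
df["NeigbourCLLI"] = neighbour_clli_lookup(df)
print(df)
```

This still loops in Python, but each row costs one dictionary lookup plus a scan of that circuit's few pairs, and the original index of `df` is left untouched.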
Output: