Renaming columns in a Pandas DataFrame by matching multiple patterns at once with regex groups

Posted by ktca8awb on 2023-05-27 in Other

The column list of the DataFrame df is as follows:
cols_2 = ['state', 'population', 'male population', 'female population', 'working population', 'male working population', 'female working population', 'female population in the age group 0 to 6 years', 'male population in the age group 0 to 6 years', 'population in the age group 0 to 6 years']
For obvious reasons, I want to shorten the names by replacing every occurrence of the following:
population -> pop
male -> m_
female -> f_
working -> w
in the age group 0 to 6 years -> _minor
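Note the spaces included in the patterns.
This Stack Overflow discussion was my starting point, but the requirement there was only to get rid of square brackets by matching a single pattern.
My goal is to get multiple matches for multiple patterns.
Any kind of help is really appreciated!
PS: This is my first time here!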

Answer #1, by ffscu2ro

You can try:

import re

new_cols = []

# Match "male"/"female" or "working" (plus any trailing whitespace), or "population".
pat = re.compile(r"(?:fe)?male\s*|population|working\s*")
for col in cols_2:
    # Matches starting with f, m or w become "<first letter>_"; "population" is cut to "pop".
    new_col = pat.sub(
        lambda g: f"{g[0][0]}_" if g[0][0] in "fmw" else g[0][:3], col
    )
    new_col = new_col.replace(" in the age group 0 to 6 years", "_minor")
    new_cols.append(new_col)

print(new_cols)

# To replace the columns in the DataFrame:
# df.columns = new_cols

This prints:

[
    "state",
    "pop",
    "m_pop",
    "f_pop",
    "w_pop",
    "m_w_pop",
    "f_w_pop",
    "f_pop_minor",
    "m_pop_minor",
    "pop_minor",
]
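
If the list of abbreviations is likely to grow, the same idea can be driven by a lookup table instead of a hand-written lambda. The following is only a sketch under a few assumptions: the names REPLACEMENTS, PATTERN and compress_name are hypothetical and not part of the answer above, the trailing spaces in the keys follow the question's note about spaces, and "working " is mapped to "w_" to match the output shown above.

```python
import re

# Hypothetical lookup table (an assumption for illustration): literal text -> replacement.
# The trailing/leading spaces are part of the keys, as the question points out.
REPLACEMENTS = {
    " in the age group 0 to 6 years": "_minor",
    "population": "pop",
    "female ": "f_",
    "male ": "m_",
    "working ": "w_",  # assumed "w_" so the result matches the answer's output
}

# Build one alternation from the escaped keys. Sorting longest-first is a general
# precaution for keys that share a prefix; this particular table does not need it.
PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(REPLACEMENTS, key=len, reverse=True))
)


def compress_name(col: str) -> str:
    """Replace every table key found in `col` with its abbreviation, in one pass."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], col)


cols_2 = [
    "state",
    "population",
    "male population",
    "female population",
    "working population",
    "male working population",
    "female working population",
    "female population in the age group 0 to 6 years",
    "male population in the age group 0 to 6 years",
    "population in the age group 0 to 6 years",
]

new_cols = [compress_name(col) for col in cols_2]
print(new_cols)
```

As in the answer, the result can be assigned with df.columns = new_cols. Because DataFrame.rename accepts a callable for its columns argument, df.rename(columns=compress_name) should work as well.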
