Regex on a pandas column does not produce the expected output

56lgkhnf posted 12 months ago in Other

I have a DataFrame like this:

dfsupport = pd.DataFrame({'Date': ['8/12/2020','8/12/2020','13/1/2020','24/5/2020','31/10/2020','11/7/2020','11/7/2020'],
                          'Category': ['Table','Chair','Cushion','Table','Chair','Mats','Mats'],
                          'Sales': ['1 table','3chairs','8 cushions','3Tables','12 Chairs','12Mats','4Mats'],
                          'Paid': ['Yes','Yes','Yes','Yes','No','Yes','Yes',],
                          'Amount': ['93.78','$51.99','44.99','38.24','£29.99','29 21 only','18']
                          })

In tabular form it looks like this:

         Date  Category       Sales  Paid      Amount
0   8/12/2020     Table     1 table   Yes       93.78
1   8/12/2020     Chair     3chairs   Yes      $51.99
2   13/1/2020   Cushion  8 cushions   Yes       44.99
3   24/5/2020     Table     3Tables   Yes       38.24
4  31/10/2020     Chair   12 Chairs    No      £29.99
5   11/7/2020      Mats      12Mats   Yes  29 21 only
6   11/7/2020      Mats       4Mats   Yes          18


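I want to remove the two kinds of stray string content from the Amount column above. I have already learned how to replace the $ and £ symbols with: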

patternv='|'.join(re.escape(x) for x in ['$', '£'])
dfsupport['Amount'] = dfsupport['Amount'].str.replace(patternv,regex=True)


I now want to replace the "29 21 only" entry in the Amount column. My attempt was:

patterns="{r'(\d{1,})\s(\d{1,2})\D+' : r'\1 \2'}"
dfsupport['Amount']=dfsupport['Amount'].str.replace(patterns,regex=True)


However, my attempt resulted in an error:

Traceback (most recent call last):
  File "/home/cloud/code/learning/howmany.py", line 160, in <module>
    dfsupport['Amount'] = dfsupport['Amount'].str.replace(patternv,regex=True)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cloud/.venv/lib/python3.12/site-packages/pandas/core/strings/accessor.py", line 136, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: StringMethods.replace() missing 1 required positional argument: 'repl'


How can I fix this?
I should add that the output I am looking for is "29.21".
I was following the question here.

sczxawaw1#

You are missing the second (required) argument of str.replace, the replacement string repl:

dfsupport['Amount'] = dfsupport['Amount'].str.replace(r'(\d{1,})\s(\d{1,2})\D+',
                                                      r'\1.\2', regex=True)
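To see what the pattern does and does not touch, here is a small sketch using Python's built-in re module on plain strings. The sample list is made up for illustration (the last entry, '29 21', is not in the question's data); it is only meant to show that the trailing \D+ requires at least one non-digit character after the second number.

```python
import re

# Same pattern and replacement as in the str.replace call above.
pattern = r'(\d{1,})\s(\d{1,2})\D+'
replacement = r'\1.\2'

# Illustrative samples; '29 21' is a hypothetical value not present in the question.
samples = ['29 21 only', '93.78', '$51.99', '18', '29 21']

# Apply the substitution to each sample and keep the results for inspection.
results = {s: re.sub(pattern, replacement, s) for s in samples}

for original, fixed in results.items():
    print(f'{original!r} -> {fixed!r}')
```

Only '29 21 only' is rewritten (to '29.21'). Values without whitespace between two digit groups are left alone, and so is a bare '29 21', because nothing follows the second number for \D+ to match. If such values can occur in your data, the pattern would need adjusting.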

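It also looks like you wanted to use a dictionary in patterns, but that does not work the way you wrote it: your patterns is a single string that merely looks like a dictionary, and str.replace expects a regular expression plus a replacement.
If you want to pass a dictionary, use replace (without the str. accessor):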

patterns = {r'(\d{1,})\s(\d{1,2})\D+' : r'\1.\2'}
dfsupport['Amount'] = dfsupport['Amount'].replace(patterns,regex=True)


Output:

         Date  Category       Sales  Paid  Amount
0   8/12/2020     Table     1 table   Yes   93.78
1   8/12/2020     Chair     3chairs   Yes  $51.99
2   13/1/2020   Cushion  8 cushions   Yes   44.99
3   24/5/2020     Table     3Tables   Yes   38.24
4  31/10/2020     Chair   12 Chairs    No  £29.99
5   11/7/2020      Mats      12Mats   Yes   29.21
6   11/7/2020      Mats       4Mats   Yes      18
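Note that the traceback in the question points at the patternv line, so the currency-symbol replacement needs a repl argument as well. A minimal sketch of both steps together, assuming the goal is a numeric Amount column (the variable names currency, cleaned and numeric are mine, and the final float conversion is an assumption about what you want, not something the question states):

```python
import re
import pandas as pd

# The Amount column from the question, as a standalone Series.
amount = pd.Series(['93.78', '$51.99', '44.99', '38.24', '£29.99', '29 21 only', '18'])

# Step 1: strip currency symbols. The empty string is the required `repl` argument.
currency = '|'.join(re.escape(x) for x in ['$', '£'])
cleaned = amount.str.replace(currency, '', regex=True)

# Step 2: turn "29 21 only" into "29.21" using the two capture groups.
cleaned = cleaned.str.replace(r'(\d{1,})\s(\d{1,2})\D+', r'\1.\2', regex=True)

# Step 3 (assumption): convert the cleaned strings to floats.
numeric = cleaned.astype(float)

print(cleaned.tolist())
print(numeric.tolist())
```

After step 2 every entry is a plain decimal string, so the float conversion in step 3 should go through without errors on this data.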
