Python: replace existing column values in a df based on a separate lookup df of code/value columns (KeyError from get_loc)

Posted by mqkwyuun on 2022-10-30 in Python

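Question

I want to replace the existing values in a column of a target dataframe using a lookup against a separate source dataframe that holds "code/value" columns, i.e. replace each code in the target column with the matching value from the source. Basically, replace something like "10" with "your full name".

KeyError when trying the code

This attempt raised a KeyError.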

    countynames.set_index('CountyCode')
    employee['County.Code'] = countynames.lookup(countynames.index, countynames['CountyCode'])

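Potential solution ideas

Something like an apply() function that looks up employee['County.Code'] in the dataframe 'countynames' and replaces/overwrites/updates the existing employee['County.Code'] with countynames['Value'].
I am looking for alternatives because my first attempt resulted in a KeyError.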

    ### potential approach 1:
    employee['County.Code'] = countynames.apply(lambda x: employee.loc[x['County.Code'], x['Value']], axis=1)
    ### potential approach 2:
    employee['County.Code'] <- lapply(employee, function(x) look$class[match(x, look$CountyCode)])
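
Approach 2 above is R syntax (`lapply`, `match`, `$`) and will not run in Python. One rough pandas counterpart of a `match`-style lookup is a left merge on the code column. The sketch below is only an illustration under stated assumptions: both frames are built by hand, the codes are chosen for the example, and the name given for code 34 is a placeholder rather than real data.

```python
import pandas as pd

# Hand-built stand-ins; the name for code 34 is a placeholder, not real data.
employee = pd.DataFrame({"County.Code": [34, 34, 5]})
countynames = pd.DataFrame({
    "CountyCode": [5, 34],
    "Value": ["Calaveras", "Placeholder County"],
})

# A left merge keeps every employee row, in order, and attaches the matching name.
merged = employee.merge(
    countynames, how="left", left_on="County.Code", right_on="CountyCode"
)

# Overwrite the code column with the looked-up name and drop the helper columns.
merged["County.Code"] = merged["Value"]
result = merged.drop(columns=["CountyCode", "Value"])

print(result["County.Code"].tolist())
```

Codes without a match in the lookup table would come back as NaN after the merge, so they may need separate handling.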

Sample code

    import pandas as pd

    employee = pd.read_csv("employee_data.csv")
    countynames = pd.read_csv("County Codes.csv")
    employee['County.Code']

    0    34
    1    34
    2    34
    3    34
    4    55
    Name: County.Code, dtype: int64

The source (lookup) dataframe:

    countynames.head()

       CountyCode      Value
    0           1    Alameda
    1           2     Alpine
    2           3     Amador
    3           4      Butte
    4           5  Calaveras

Error: KeyError

The error is raised at columns.get_loc(item)

    KeyError                                  Traceback (most recent call last)
    Input In [410], in <cell line: 2>()
          1 countynames.set_index('CountyCode')
    ----> 2 employee['County.Code'] = countynames.lookup(countynames.index, countynames['CountyCode'])

    File ~\anaconda3\lib\site-packages\pandas\core\frame.py:4602, in DataFrame.lookup(self, row_labels, col_labels)
       4600 result = np.empty(n, dtype="O")
       4601 for i, (r, c) in enumerate(zip(row_labels, col_labels)):
    -> 4602 result[i] = self._get_value(r, c)
       4604 if is_object_dtype(result):
       4605 result = lib.maybe_convert_objects(result)

    File ~\anaconda3\lib\site-packages\pandas\core\frame.py:3615, in DataFrame._get_value(self, index, col, takeable)
       3612 series = self._ixs(col, axis=1)
       3613 return series._values[index]
    -> 3615 series = self._get_item_cache(col)
       3616 engine = self.index._engine
       3618 if not isinstance(self.index, MultiIndex):
       3619 # CategoricalIndex: Trying to use the engine fastpath may give incorrect
       3620 # results if our categories are integers that dont match our codes
       3621 # IntervalIndex: IntervalTree has no get_loc

    File ~\anaconda3\lib\site-packages\pandas\core\frame.py:3931, in DataFrame._get_item_cache(self, item)
       3926 res = cache.get(item)
       3927 if res is None:
       3928 # All places that call _get_item_cache have unique columns,
       3929 # pending resolution of GH#33047
    -> 3931 loc = self.columns.get_loc(item)
       3932 res = self._ixs(loc, axis=1)
       3934 cache[item] = res

    File ~\anaconda3\lib\site-packages\pandas\core\indexes\base.py:3623, in Index.get_loc(self, key, method, tolerance)
       3621 return self._engine.get_loc(casted_key)
       3622 except KeyError as err:
    -> 3623 raise KeyError(key) from err
       3624 except TypeError:
       3625 # If we have a listlike key, _check_indexing_error will raise
       3626 # InvalidIndexError. Otherwise we fall through and re-raise
       3627 # the TypeError.
       3628 self._check_indexing_error(key)

    KeyError: 1
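
A note on why the traceback ends in `KeyError: 1`: `DataFrame.lookup(row_labels, col_labels)` treats its second argument as column labels, so passing the values of `countynames['CountyCode']` asks for a column named `1`, while `countynames` only has the columns `CountyCode` and `Value`. In addition, `set_index` returns a new frame, so calling it without assigning the result leaves `countynames` unchanged. The sketch below reproduces the same failure on a small hand-built frame (the five rows are copied from the `head()` output in the question; the variable names `index_unchanged` and `error_key` are introduced here for illustration). It calls `columns.get_loc` directly instead of `lookup`, because `lookup` was deprecated in pandas 1.2 and removed in 2.0.

```python
import pandas as pd

# Hand-built stand-in for the lookup table (rows taken from countynames.head()).
countynames = pd.DataFrame({
    "CountyCode": [1, 2, 3, 4, 5],
    "Value": ["Alameda", "Alpine", "Amador", "Butte", "Calaveras"],
})

# set_index returns a new DataFrame; without assignment the original is unchanged.
countynames.set_index("CountyCode")
index_unchanged = list(countynames.index) == [0, 1, 2, 3, 4]

# lookup() received the CountyCode values as *column* labels, so pandas
# searched the columns for a label 1, which is what this call does directly.
try:
    countynames.columns.get_loc(1)
    error_key = None
except KeyError as err:
    error_key = err.args[0]

print(index_unchanged)  # True
print(error_key)        # 1
```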

Answer #1 (hsvhsicv)

It is always hard without the data.
But try this:

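    # Assign the result back; inplace=True on a selected column may not update the frame in recent pandas
    employee['County.Code'] = employee['County.Code'].replace(countynames.set_index("CountyCode")["Value"].to_dict())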
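
A self-contained sketch of the same idea, using `Series.map` rather than `replace`. This is one possible variant, not the answer's exact method. The frames are built by hand from the output shown in the question; the lookup row for code 34 and its name (`Placeholder County`) are hypothetical, since the question only shows the first five rows of the lookup table. Code 55 is deliberately left out of the lookup to show what happens to unmatched codes: `map` yields NaN for them, and the sketch falls back to the original code as a string.

```python
import pandas as pd

# Stand-ins for the two CSV files; values beyond the question's output are assumptions.
employee = pd.DataFrame({"County.Code": [34, 34, 34, 34, 55]})
countynames = pd.DataFrame({
    "CountyCode": [1, 2, 3, 4, 5, 34],
    "Value": ["Alameda", "Alpine", "Amador", "Butte", "Calaveras",
              "Placeholder County"],  # hypothetical name for code 34
})

# Series indexed by code, so each code maps to its name.
code_to_name = countynames.set_index("CountyCode")["Value"]

# map() returns NaN for codes absent from the lookup (55 here).
mapped = employee["County.Code"].map(code_to_name)

# Fall back to the original code (as text) wherever no name was found.
employee["County.Code"] = mapped.fillna(employee["County.Code"].astype(str))

print(employee["County.Code"].tolist())
```

Unlike `replace`, which leaves unmatched values untouched, `map` makes missing codes visible as NaN before the fallback is applied, which can help when checking the lookup table for gaps.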
