pandas / Python: matching a DataFrame against a classification tree

Posted by 3phpmpom on 2022-12-25 in Python

Given the following two DataFrames, one holding transaction data and one holding categorization rules:

import pandas as pd

data = {'Transaction_description': ['sfdsjk fsjfdkj;f sfsdf RESTARANT', 'fsdk ;kjf;lskf;m gjkf NL111111111111 klkfdlo', 'golf kjnfksdn DE111111111112 fkdkk', 'jhfjd jhfj Jumbo jhf'], 'Amount': [-20, -21, -30, 10]}
Transactions = pd.DataFrame(data)

data = {
    'Priority': [1, 1, 2, 2, 3, 3],
    'Type': ['IBAN', 'IBAN', 'Company', 'Company', 'Keyword','Keyword'],
    'Value': ['NL111111111111', 'DE111111111112', 'AMAZON', 'JUMBO','Restaurant','Golf'],
    'Description': ['', '', '', '','',''],
    'MappingCode': ['A1', 'A2', 'B1', 'B2','B1','B2']
    } 
Categorization = pd.DataFrame(data)

I want to classify every [Transaction_description] according to [Priority], searching for:

(1) IBAN
(2) Company
(3) Keyword.

What is the best way to obtain the following expected result:

data = {
    'Transaction_description': ['sfdsjk fsjfdkj;f sfsdf RESTARANT', 'fsdk ;kjf;lskf;m gjkf NL111111111111 klkfdlo', 'golf kjnfksdn DE111111111112 fkdkk', 'jhfjd jhfj Jumbo jhf'], 
    'Amount': [-20, -21, -30, 10],
    'MappingCode': ['B1','A1','A2','B2']
    } 
TransactionsClassified = pd.DataFrame(data)

Thank you, Highbury

Answer 1 (velaa5lx)

Your data is somewhat unstructured, which makes this a bit harder.

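  • First, clean up your data: fix the typos (for example RESTARANT) and convert everything to lowercase.
  • Next, add a column to the Transactions DataFrame that holds the string to merge on. You can do this by building a list of candidate strings and then applying np.where() for each one. The order of the list determines the classification order: a later match overwrites an earlier one, so the highest-priority values go last.
  • Finally, merge your Categorization DataFrame on that string.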
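As a small illustration of the cleanup step, the sketch below lowercases a description and replaces near-miss words with the closest rule value, using difflib.get_close_matches from the standard library. The helper name normalize_description and the 0.85 cutoff are assumptions made for this example and are not part of the original answer. Fuzzy matching can also produce false positives, so the cutoff would need tuning on real data.

```python
import difflib

def normalize_description(text, known_values, cutoff=0.85):
    """Lowercase `text` and replace near-miss words with a known rule value.

    Hypothetical helper for illustration; `cutoff` is an assumed threshold.
    """
    cleaned = []
    for word in text.lower().split():
        # Ask for at most one candidate whose similarity ratio is >= cutoff
        close = difflib.get_close_matches(word, known_values, n=1, cutoff=cutoff)
        cleaned.append(close[0] if close else word)
    return " ".join(cleaned)

# Rule values that are ordinary words (IBANs are left out of the fuzzy step)
known_values = ["restaurant", "golf", "jumbo", "amazon"]
fixed = normalize_description("sfdsjk fsjfdkj;f sfsdf RESTARANT", known_values)
print(fixed)
```

Applied to the whole column, this could look like Transactions["Transaction_description"].map(lambda t: normalize_description(t, known_values)), run before the matching step.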

Your code could look like this:

import numpy as np
import pandas as pd

# Data input
data = {'Transaction_description': ['sfdsjk fsjfdkj;f sfsdf RESTAURANT', 'fsdk ;kjf;lskf;m gjkf NL111111111111 klkfdlo', 'golf kjnfksdn DE111111111112 fkdkk', 'jhfjd jhfj Jumbo jhf'], 'Amount': [-20, -21, -30, 10]} 
Transactions = pd.DataFrame(data)  

data = {
    'Priority': [1, 1, 2, 2, 3, 3], 
    'Type': ['IBAN', 'IBAN', 'Company', 'Company', 'Keyword','Keyword'],
    'Value': ['NL111111111111', 'DE111111111112', 'AMAZON', 'JUMBO','Restaurant','Golf'],
    'Description': ['', '', '', '','',''],
    'MappingCode': ['A1', 'A2', 'B1', 'B2','B1','B2']
    } 
Categorization = pd.DataFrame(data)

# Make everything lowercase
Transactions["Transaction_description"] =  Transactions["Transaction_description"].str.lower()
Categorization["Value"] = Categorization["Value"].str.lower()

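# Create the column you can merge on.
# Later matches overwrite earlier ones, so the list runs from the lowest
# priority (keywords) to the highest priority (IBANs).
keywordList = list(Categorization[Categorization["Type"] == "Keyword"]["Value"])
ibanList = list(Categorization[Categorization["Type"] == "IBAN"]["Value"])
companyList = list(Categorization[Categorization["Type"] == "Company"]["Value"])

allList = keywordList + companyList + ibanList
# Start from an object column so matched strings are not mixed into a float NaN column
Transactions["Value"] = None
for element in allList:
    # regex=False treats the rule value as a literal substring, not a pattern
    Transactions["Value"] = np.where(
        Transactions["Transaction_description"].str.contains(element, regex=False),
        element,
        Transactions["Value"],
    )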

# Merge the dataframes (how="left" keeps transactions that matched no rule)
TransactionsClassified = Transactions[["Transaction_description", "Value", "Amount"]].merge(Categorization[["Value", "MappingCode"]], on="Value", how="left")
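
As an alternative, here is a sketch that takes the search order from the Priority column instead of a hand-built list: the rules are sorted by priority, and the first rule whose value occurs in the description wins. The helper classify, the lowercase names transactions and rules, and the case-insensitive substring comparison are assumptions of this example, not part of the original answer. It leaves the original description untouched and returns None when no rule matches.

```python
import pandas as pd

transactions = pd.DataFrame({
    "Transaction_description": [
        "sfdsjk fsjfdkj;f sfsdf RESTAURANT",
        "fsdk ;kjf;lskf;m gjkf NL111111111111 klkfdlo",
        "golf kjnfksdn DE111111111112 fkdkk",
        "jhfjd jhfj Jumbo jhf",
    ],
    "Amount": [-20, -21, -30, 10],
})
rules = pd.DataFrame({
    "Priority": [1, 1, 2, 2, 3, 3],
    "Value": ["NL111111111111", "DE111111111112", "AMAZON", "JUMBO", "Restaurant", "Golf"],
    "MappingCode": ["A1", "A2", "B1", "B2", "B1", "B2"],
})

# Lowest Priority number first; a stable sort keeps the input order within one priority
ordered = rules.sort_values("Priority", kind="stable")
rule_pairs = list(zip(ordered["Value"].str.lower(), ordered["MappingCode"]))

def classify(description):
    """Return the MappingCode of the first (highest-priority) matching rule."""
    text = description.lower()
    for value, code in rule_pairs:
        if value in text:
            return code
    return None  # no rule matched

transactions["MappingCode"] = transactions["Transaction_description"].map(classify)
print(transactions)
```

Because the loop stops at the first hit, the third transaction (which contains both "golf" and an IBAN) is assigned the IBAN's code. A plain Python loop per row is easy to follow but may be slow on large tables.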
