TypeError: 'int' object is not subscriptable when reading from HDFS

kcwpcxri · posted 2021-07-15 in Hadoop
Follow (0) | Answers (1) | Views (410)

I am reading a file from HDFS and keep getting the following error: TypeError: 'int' object is not subscriptable. The CSV file:

CLAIM_NUM,BEN_ST,AGE,MEDICAL_ONLY_IND,TTL_MED_LOSS,TTL_IND_LOSS,TTL_MED_EXP,TTL_IND_EXP,BP_CD,NI_CD,legalrep,depression,cardiac,diabetes,hypertension,obesity,smoker,subabuse,arthritis,asthma,CPT_codes,D,P,NDC_codes
123456789,IL,99,1,2201.26,0,97.16,0,31,4,1,0,0,0,0,0,0,0,0,0,NA,8409~71941,NA,NA
987654321,AL,98,1,568.12,0,20.82,0,42,52,1,0,0,0,0,0,0,0,0,0,NA,7242~8472~E9273,NA,NA

My code:

with hdfs.open("/user/ras.csv") as f: 
    reader = f.read()

    for i, row in enumerate(reader, start=1):
        root = ET.Element('cbcalc')
        icdNode = ET.SubElement(root, "icdcodes")

        for code in row['D'].split('~'):
            ET.SubElement(icdNode, "code").text = code
        ET.SubElement(root, "clientid").text = row['CLAIM_NUM']
        ET.SubElement(root, "state").text = row['BEN_ST']
        ET.SubElement(root, "country").text = "US"  
        ET.SubElement(root, "age").text = row['AGE']
        ET.SubElement(root, "jobclass").text = "1" 
        ET.SubElement(root, "fulloutput").text ="Y"

        cfNode = ET.SubElement(root, "cfactors")
        for k in ['legalrep', 'depression', 'diabetes',
                 'hypertension', 'obesity', 'smoker', 'subabuse']:
            ET.SubElement(cfNode, k.lower()).text = str(row[k])

        psNode = ET.SubElement(root, "prosummary")

        psicdNode = ET.SubElement(psNode, "icd")
        for code in row['P'].split('~'):
            ET.SubElement(psNode, "code").text = code

        psndcNode = ET.SubElement(psNode, "ndc")
        for code in row['NDC_codes'].split('~'):
            ET.SubElement(psNode, "code").text = code 

        cptNode = ET.SubElement(psNode, "cpt")
        for code in row['CPT_codes'].split('~'):
            ET.SubElement(cptNode, "code").text = code

        ET.SubElement(psNode, "hcpcs")

        doc = ET.tostring(root, method='xml', encoding="UTF-8")

        response = requests.post(target_url, data=doc, headers=login_details)
        response_data = json.loads(response.text)
        if type(response_data)==dict and 'error' in response_data.keys():
            error_results.append(response_data)
        else:
            api_results.append(response_data)

What changes do I need to make so that I can loop through the CSV file and convert the data to XML for the API calls?
I have tested this code in plain Python and it seems to work, but as soon as I point it at my file in HDFS it starts crashing.


q9rjltbz · answer #1

The problem is (probably; I don't have this library installed) that f.read() is returning a bytes object. If you iterate over it (with enumerate, for example) you will be looking at ints (one per byte of the file, depending on context), not structured "row" objects of any kind.
You need some extra processing before entering the loop you want to write.
Something like this may do what you want:
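To see why the loop blows up, here is a quick standalone demonstration (no HDFS needed) of what enumerate does with a bytes object:

```python
# Iterating (or enumerating) a bytes object yields ints, one per byte,
# so row['D'] subscripts an int -- hence the TypeError.
data = b"CLAIM_NUM,BEN_ST\n123456789,IL\n"

for i, row in enumerate(data, start=1):
    print(i, row)       # row is an int, e.g. 67 (the byte value of 'C')
    break

try:
    row['D']            # the same subscript the original loop performs
except TypeError as e:
    print(e)            # 'int' object is not subscriptable
```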

import pydoop.hdfs as hdfs
from io import TextIOWrapper
from csv import DictReader

with hdfs.open("/user/ras.csv") as h, \
     TextIOWrapper(h, *unknown_settings) as w:
    dict_reader = DictReader(w, *defaults_are_probably_ok)
    for row in dict_reader:
        ...
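Putting the pieces together with the question's XML-building loop, a minimal runnable sketch; io.BytesIO stands in here for the binary handle that hdfs.open returns, and the utf-8 encoding is an assumption about your file:

```python
import xml.etree.ElementTree as ET
from csv import DictReader
from io import BytesIO, TextIOWrapper

# BytesIO stands in for the binary handle returned by hdfs.open(...)
raw = BytesIO(
    b"CLAIM_NUM,BEN_ST,AGE,D\n"
    b"123456789,IL,99,8409~71941\n"
)

with TextIOWrapper(raw, encoding="utf-8") as w:  # encoding is an assumption
    for row in DictReader(w):                    # row is now a dict, not an int
        root = ET.Element("cbcalc")
        icdNode = ET.SubElement(root, "icdcodes")
        for code in row["D"].split("~"):
            ET.SubElement(icdNode, "code").text = code
        ET.SubElement(root, "clientid").text = row["CLAIM_NUM"]
        ET.SubElement(root, "state").text = row["BEN_ST"]
        ET.SubElement(root, "age").text = row["AGE"]
        doc = ET.tostring(root, encoding="UTF-8")
        print(doc)  # ready to POST to the API
```

Swapping BytesIO back for hdfs.open("/user/ras.csv") should leave the rest of the loop unchanged.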
