我想先说我没有XML文件的工作经验。我只是在过去的几天里开始研究这些数据,并有一些代码,有点工作,它只是需要一些调整。我的代码在这里:
import pandas as pd
import numpy as np
import xml.etree.ElementTree as ET
#parses the orders.xml file and stores in the "tree" variable
tree = ET.parse("orders (1).xml")
#gets the root of the parsed tree
root = tree.getroot()
#creating lists to store xml data in
OrderID = []
CustomerID = []
SKU = []
Price = []
Quantity = []
#goes through the xml file looking for the specified tag and stores the value into the list created above
for orderid in root.iter('OrderId'):
OrderID.append(orderid.text)
for customerid in root.iter('CustomerId'):
CustomerID.append(customerid.text)
for sku in root.iter('Sku'):
SKU.append(sku.text)
for price in root.iter('PriceInclTax'):
Price.append(price.text)
for quantity in root.iter('Quantity'):
Quantity.append(quantity.text)
# writes the data stored in the lists above to a pandas dataframe
df = pd.DataFrame(
list(zip(OrderID, CustomerID, SKU, Price, Quantity)), columns=['Order ID', 'Customer ID', 'SKU', 'Price', 'Quantity']
)
# prints the dataframe
print(df)
结果如下:
Order ID Customer ID SKU Price Quantity
0 518 18321022 2BF207HT750-150 83.2300 8
1 517 18265518 4BF209HT750-175 107.5200 2
2 516 18344682 HUCP211-35KX 560.0000 2
3 515 18344503 6206-2RC4KX 58.8300 1
4 514 18352425 6206-ZZC4KX 47.8900 13
5 513 18265596 HUCF206-19KX 242.0000 8
6 512 18352395 4BF209HT750-175 352.7800 10
7 511 18266176 4BF209SHT750-175P 825.7200 10
8 510 18351938 SUCF208-24 28251299.4606 4
9 509 18265518 125855 567.4500 1
10 508 18351852 6003-2RC4KX 73.4686 1
11 506 18350111 4BF212HT750-250 152.4800 4
12 505 18327149 6200-2RSH 4.4600 6
13 504 18351172 6205-2RC4KX 68.0700 1
14 503 18266179 6203-2RC4KX 62.3800 1
15 502 18266179 4BF210HT750-200 162.2300 6
16 501 18265517 6204-ZZC4KX 34.8400 1
17 500 18265548 CG3-7000-NE 0.0000 1
18 499 18348139 CG3-7000-E 0.0000 1
19 498 18321909 CG3-7000-E 8920.0000 1
20 497 18337101 CG3-7000-NE 10770.0000 1
21 495 18348878 4BF207HT750-150 281.6700 24
22 494 18348911 UCF211-35 185.4930 6
23 493 18321909 HUC207-23KX 276.3900 2
24 492 18265594 4BF316SHT100-400P 1801.3900 20
25 491 18265533 2PB316SHT750-400P 1651.4400 1
26 490 18347438 SUCF205-16 134.8800 2
我遇到的问题是其中一些订单上有多个项目,这并没有反映出这一点。每个订单的项目数据存储在标记中,然后。我的代码没有说明每个订单可以有多个。下面是订单ID 515的XML文档的源代码
<Order>
<OrderId>515</OrderId>
<OrderGuid></OrderGuid>
<StoreId>1</StoreId>
<CustomerId></CustomerId>
<OrderStatusId></OrderStatusId>
<PaymentStatusId></PaymentStatusId>
<ShippingStatusId></ShippingStatusId>
<CustomerLanguageId></CustomerLanguageId>
<CustomerTaxDisplayTypeId></CustomerTaxDisplayTypeId>
<CustomerIp></CustomerIp>
<OrderSubtotalInclTax>681.4000</OrderSubtotalInclTax>
<OrderSubtotalExclTax>681.4000</OrderSubtotalExclTax>
<OrderSubTotalDiscountInclTax>0.0000</OrderSubTotalDiscountInclTax>
<OrderSubTotalDiscountExclTax>0.0000</OrderSubTotalDiscountExclTax>
<OrderShippingInclTax>0.0000</OrderShippingInclTax>
<OrderShippingExclTax>0.0000</OrderShippingExclTax>
<PaymentMethodAdditionalFeeInclTax>0.0000</PaymentMethodAdditionalFeeInclTax>
<PaymentMethodAdditionalFeeExclTax>0.0000</PaymentMethodAdditionalFeeExclTax>
<TaxRates>0.00:0.00; </TaxRates>
<OrderTax>0.0000</OrderTax>
<OrderTotal>681.4000</OrderTotal>
<RefundedAmount>0.0000</RefundedAmount>
<OrderDiscount>0.0000</OrderDiscount>
<CurrencyRate>1.0000</CurrencyRate>
<CustomerCurrencyCode>USD</CustomerCurrencyCode>
<AffiliateId>0</AffiliateId>
<AllowStoringCreditCardNumber>False</AllowStoringCreditCardNumber>
<CardType />
<CardName />
<CardNumber />
<MaskedCreditCardNumber />
<CardCvv2 />
<CardExpirationMonth />
<CardExpirationYear />
<PaymentMethodSystemName>Payments.PurchaseOrder</PaymentMethodSystemName>
<AuthorizationTransactionId />
<AuthorizationTransactionCode />
<AuthorizationTransactionResult />
<CaptureTransactionId />
<CaptureTransactionResult />
<SubscriptionTransactionId />
<PaidDateUtc>03/08/2023 21:29:31</PaidDateUtc>
<ShippingMethod>UPS Next Day Air®</ShippingMethod>
<ShippingRateComputationMethodSystemName>Shipping.Director</ShippingRateComputationMethodSystemName>
<CustomValuesXml></CustomValuesXml>
<VatNumber />
<Deleted>False</Deleted>
<CreatedOnUtc>3/8/2023 9:29:31 PM</CreatedOnUtc>
<OrderItems>
<OrderItem>
<Id>633</Id>
<OrderItemGuid>1cb035ca-73c8-4124-94d9-c14c54b669d8</OrderItemGuid>
<Name>30mm 6206 High Temperature 220 °C Sealed C4 Bearing</Name>
<Sku>6206-2RC4KX</Sku>
<PriceExclTax>58.8300</PriceExclTax>
<PriceInclTax>58.8300</PriceInclTax>
<Quantity>1</Quantity>
<DiscountExclTax>0.0000</DiscountExclTax>
<DiscountInclTax>0.0000</DiscountInclTax>
<TotalExclTax>58.8300</TotalExclTax>
<TotalInclTax>58.8300</TotalInclTax>
</OrderItem>
<OrderItem>
<Id>634</Id>
<OrderItemGuid>1a4a005f-e669-4420-9379-611e5b3f637c</OrderItemGuid>
<Name>30mm 6206 High Temperature 250 °C Shielded ZZ C4 Bearing</Name>
<Sku>6206-ZZC4KX</Sku>
<PriceExclTax>47.8900</PriceExclTax>
<PriceInclTax>47.8900</PriceInclTax>
<Quantity>13</Quantity>
<DiscountExclTax>0.0000</DiscountExclTax>
<DiscountInclTax>0.0000</DiscountInclTax>
<TotalExclTax>622.5700</TotalExclTax>
<TotalInclTax>622.5700</TotalInclTax>
</OrderItem>
</OrderItems>
<Shipments>
<Shipment>
<ShipmentId>391</ShipmentId>
<TrackingNumber>1Z4054470191053867</TrackingNumber>
<TotalWeight>6.1600</TotalWeight>
<ShippedDateUtc>3/8/2023 9:31:38 PM</ShippedDateUtc>
<DeliveryDateUtc />
<CreatedOnUtc>03/08/2023 21:31:38</CreatedOnUtc>
</Shipment>
</Shipments>
</Order>
我需要帮助的是拥有包含多个行项目的订单,并能够将其写入panda Dataframe 。OrderID 515应该是索引3和索引4中的SKU。
我原以为有多个项目的订单会有多个行,而不是订单515中的项目转到下一个订单ID为516的下一行。
下面是两个订单的XML:
<Order>
<OrderId>516</OrderId>
<OrderGuid></OrderGuid>
<StoreId></StoreId>
<CustomerId></CustomerId>
<OrderStatusId>30</OrderStatusId>
<PaymentStatusId>30</PaymentStatusId>
<ShippingStatusId>30</ShippingStatusId>
<CustomerLanguageId>1</CustomerLanguageId>
<CustomerTaxDisplayTypeId>10</CustomerTaxDisplayTypeId>
<CustomerIp></CustomerIp>
<OrderSubtotalInclTax>1120.0000</OrderSubtotalInclTax>
<OrderSubtotalExclTax>1120.0000</OrderSubtotalExclTax>
<OrderSubTotalDiscountInclTax>0.0000</OrderSubTotalDiscountInclTax>
<OrderSubTotalDiscountExclTax>0.0000</OrderSubTotalDiscountExclTax>
<OrderShippingInclTax>0.0000</OrderShippingInclTax>
<OrderShippingExclTax>0.0000</OrderShippingExclTax>
<PaymentMethodAdditionalFeeInclTax>0.0000</PaymentMethodAdditionalFeeInclTax>
<PaymentMethodAdditionalFeeExclTax>0.0000</PaymentMethodAdditionalFeeExclTax>
<TaxRates>0.00:0.00;</TaxRates>
<OrderTax>0.0000</OrderTax>
<OrderTotal>1120.0000</OrderTotal>
<RefundedAmount>0.0000</RefundedAmount>
<OrderDiscount>0.0000</OrderDiscount>
<CurrencyRate>1.0000</CurrencyRate>
<CustomerCurrencyCode>USD</CustomerCurrencyCode>
<AffiliateId>0</AffiliateId>
<AllowStoringCreditCardNumber>False</AllowStoringCreditCardNumber>
<CardType />
<CardName />
<CardNumber />
<MaskedCreditCardNumber></MaskedCreditCardNumber>
<CardCvv2 />
<CardExpirationMonth />
<CardExpirationYear />
<PaymentMethodSystemName></PaymentMethodSystemName>
<AuthorizationTransactionId></AuthorizationTransactionId>
<AuthorizationTransactionCode></AuthorizationTransactionCode>
<AuthorizationTransactionResult>Approved (1: This transaction has been approved.)</AuthorizationTransactionResult>
<CaptureTransactionId></CaptureTransactionId>
<CaptureTransactionResult>Approved (1: This transaction has been approved.)</CaptureTransactionResult>
<SubscriptionTransactionId />
<PaidDateUtc>03/09/2023 14:43:09</PaidDateUtc>
<ShippingMethod>Other</ShippingMethod>
<ShippingRateComputationMethodSystemName>Shipping.Director</ShippingRateComputationMethodSystemName>
<CustomValuesXml />
<VatNumber />
<Deleted>False</Deleted>
<CreatedOnUtc>3/9/2023 2:39:57 PM</CreatedOnUtc>
<OrderItems>
<OrderItem>
<Id>635</Id>
<OrderItemGuid>9dac1add-310b-41f7-a5af-36ff680b46de</OrderItemGuid>
<Name>2-3/16" High Temperature 500 °F Mounted Pillow Block Bearing</Name>
<Sku>HUCP211-35KX</Sku>
<PriceExclTax>560.0000</PriceExclTax>
<PriceInclTax>560.0000</PriceInclTax>
<Quantity>2</Quantity>
<DiscountExclTax>0.0000</DiscountExclTax>
<DiscountInclTax>0.0000</DiscountInclTax>
<TotalExclTax>1120.0000</TotalExclTax>
<TotalInclTax>1120.0000</TotalInclTax>
</OrderItem>
</OrderItems>
<Shipments>
<Shipment>
<ShipmentId>392</ShipmentId>
<TrackingNumber>1ZB3B6620328442937</TrackingNumber>
<TotalWeight>17.0000</TotalWeight>
<ShippedDateUtc>3/9/2023 2:40:56 PM</ShippedDateUtc>
<DeliveryDateUtc />
<CreatedOnUtc>03/09/2023 14:40:55</CreatedOnUtc>
</Shipment>
</Shipments>
</Order>
<Order>
<OrderId>515</OrderId>
<OrderGuid></OrderGuid>
<StoreId>1</StoreId>
<CustomerId></CustomerId>
<OrderStatusId>30</OrderStatusId>
<PaymentStatusId>10</PaymentStatusId>
<ShippingStatusId>30</ShippingStatusId>
<CustomerLanguageId>1</CustomerLanguageId>
<CustomerTaxDisplayTypeId>10</CustomerTaxDisplayTypeId>
<CustomerIp></CustomerIp>
<OrderSubtotalInclTax>681.4000</OrderSubtotalInclTax>
<OrderSubtotalExclTax>681.4000</OrderSubtotalExclTax>
<OrderSubTotalDiscountInclTax>0.0000</OrderSubTotalDiscountInclTax>
<OrderSubTotalDiscountExclTax>0.0000</OrderSubTotalDiscountExclTax>
<OrderShippingInclTax>0.0000</OrderShippingInclTax>
<OrderShippingExclTax>0.0000</OrderShippingExclTax>
<PaymentMethodAdditionalFeeInclTax>0.0000</PaymentMethodAdditionalFeeInclTax>
<PaymentMethodAdditionalFeeExclTax>0.0000</PaymentMethodAdditionalFeeExclTax>
<TaxRates>0.00:0.00; </TaxRates>
<OrderTax>0.0000</OrderTax>
<OrderTotal>681.4000</OrderTotal>
<RefundedAmount>0.0000</RefundedAmount>
<OrderDiscount>0.0000</OrderDiscount>
<CurrencyRate>1.0000</CurrencyRate>
<CustomerCurrencyCode>USD</CustomerCurrencyCode>
<AffiliateId>0</AffiliateId>
<AllowStoringCreditCardNumber>False</AllowStoringCreditCardNumber>
<CardType />
<CardName />
<CardNumber />
<MaskedCreditCardNumber />
<CardCvv2 />
<CardExpirationMonth />
<CardExpirationYear />
<PaymentMethodSystemName>Payments.PurchaseOrder</PaymentMethodSystemName>
<AuthorizationTransactionId />
<AuthorizationTransactionCode />
<AuthorizationTransactionResult />
<CaptureTransactionId />
<CaptureTransactionResult />
<SubscriptionTransactionId />
<PaidDateUtc>03/08/2023 21:29:31</PaidDateUtc>
<ShippingMethod>UPS Next Day Air®</ShippingMethod>
<ShippingRateComputationMethodSystemName>Shipping.Director</ShippingRateComputationMethodSystemName>
<CustomValuesXml></CustomValuesXml>
<VatNumber />
<Deleted>False</Deleted>
<CreatedOnUtc>3/8/2023 9:29:31 PM</CreatedOnUtc>
<OrderItems>
<OrderItem>
<Id>633</Id>
<OrderItemGuid>1cb035ca-73c8-4124-94d9-c14c54b669d8</OrderItemGuid>
<Name>30mm 6206 High Temperature 220 °C Sealed C4 Bearing</Name>
<Sku>6206-2RC4KX</Sku>
<PriceExclTax>58.8300</PriceExclTax>
<PriceInclTax>58.8300</PriceInclTax>
<Quantity>1</Quantity>
<DiscountExclTax>0.0000</DiscountExclTax>
<DiscountInclTax>0.0000</DiscountInclTax>
<TotalExclTax>58.8300</TotalExclTax>
<TotalInclTax>58.8300</TotalInclTax>
</OrderItem>
<OrderItem>
<Id>634</Id>
<OrderItemGuid>1a4a005f-e669-4420-9379-611e5b3f637c</OrderItemGuid>
<Name>30mm 6206 High Temperature 250 °C Shielded ZZ C4 Bearing</Name>
<Sku>6206-ZZC4KX</Sku>
<PriceExclTax>47.8900</PriceExclTax>
<PriceInclTax>47.8900</PriceInclTax>
<Quantity>13</Quantity>
<DiscountExclTax>0.0000</DiscountExclTax>
<DiscountInclTax>0.0000</DiscountInclTax>
<TotalExclTax>622.5700</TotalExclTax>
<TotalInclTax>622.5700</TotalInclTax>
</OrderItem>
</OrderItems>
<Shipments>
<Shipment>
<ShipmentId>391</ShipmentId>
<TrackingNumber>1Z4054470191053867</TrackingNumber>
<TotalWeight>6.1600</TotalWeight>
<ShippedDateUtc>3/8/2023 9:31:38 PM</ShippedDateUtc>
<DeliveryDateUtc />
<CreatedOnUtc>03/08/2023 21:31:38</CreatedOnUtc>
</Shipment>
</Shipments>
</Order>
2条答案
按热度按时间pgccezyw1#
考虑将迭代与嵌套循环联系起来,这可以通过解析实现:
jdgnovmf2#
尝试使用pandas.read_xml
对于xpath,
.
选择当前节点,//
从当前节点中选择文档中与所选内容匹配的节点,无论这些节点位于何处。|
是or运算符。