How to parse possibly malformed XML into a dataframe?


明月笑刀無情 2021-04-30 14:16:48
I have XML coming from an API that looks like this:

import requests
import pandas as pd
import lxml.etree as et
from lxml import etree

url = 'abc.com'
xml_data1 = requests.get(url).content
print(xml_data1)

xml_data1:

<?xml version="1.0" encoding="utf-8"?>
<Leads>
  <Lead Id="123" LeadTitle="test, test.,  , (123) 456-7890, " CreateDate="01/01/2017 11:11:11" ModifyDate="01/04/2017 03:03:03" ACount="1" LCount="4" RCount="0" ROnly="false" Flagged="false" LastDistributionDate="01/01/2017 10:10:10" LeadFormType="test test">
    <Campaign CampaignId="123" CampaignTitle="abc" />
    <Status StatusId="123" StatusTitle="test" />
    <Agent AgentId="123" AgentName="test, test" AgentEmail="a@a.com">
      <AgentCustomFields custom1="test test, test" custom2="test" custom3="" custom4="" />
    </Agent>
    <Fields>
      <Field FieldId="7" Value="a@a.com" FieldTitle="test" FieldType="test" />
      <Field FieldId="8" Value="test" FieldTitle="test 1" FieldType="test" />
      <Field FieldId="9" Value="test" FieldTitle="City" FieldType="Text" />
      <Field FieldId="10" Value="test" FieldTitle="State" FieldType="State" />
      <Field FieldId="11" Value="test" FieldTitle="test" FieldType="Zip" />
      <Field FieldId="950" Value="test." FieldTitle="Business Name" FieldType="Text" />
      <Field FieldId="1261" Value="Intuit Desktop" FieldTitle="test" FieldType="Text" />
      <Field FieldId="1262" Value="test" FieldTitle="test" FieldType="Text" />
      <Field FieldId="1263" Value="test" FieldTitle="test" FieldType="Number" />

Because of work concerns I can't post the entire XML string, but it follows the structure above. According to an XML validator this XML is well-formed. However, when I make another API call it returns a different XML string that may be malformed, and when I pass that string to the function above I get this error:

AttributeError: 'xml.etree.ElementTree.Element' object has no attribute 'getroottree'

Since the possibly malformed XML has multiple values in the same tag, I think the function can't parse it. I'd like to load the possibly malformed XML into a flat dataframe.
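Two separate things may be going on here. `getroottree()` is a method of lxml elements; the standard library's `xml.etree.ElementTree.Element` has no such method, so the error suggests that a standard-library element reached code expecting lxml elements, which can happen whether or not the XML is malformed. To check whether a given response really is malformed, a quick well-formedness test helps. A minimal sketch using only the standard library, assuming the response body is bytes (as `requests`' `.content` is); `check_well_formed` is a hypothetical helper name, and the duplicated attribute is only an assumed example of "multiple values in the same tag":

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def check_well_formed(data: bytes) -> Optional[Tuple[int, int]]:
    """Return None if `data` parses as XML, else the (line, column) of the first error.

    Hypothetical helper for illustration. It only checks well-formedness;
    it does not validate against any schema.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError as err:
        # ParseError.position holds the (line, column) reported by the parser.
        return err.position
    return None


good = b'<?xml version="1.0" encoding="utf-8"?><Leads><Lead Id="123" /></Leads>'
# Assumed example of a malformed tag: the same attribute name appears twice.
bad = b'<Leads><Lead Id="123" Id="456" /></Leads>'

print(check_well_formed(good))  # None
print(check_well_formed(bad))   # (line, column) where parsing failed
```

In the question's setup this would be called as `check_well_formed(xml_data1)`.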

2 Answers

九州編程


Since you updated your question, I decided to post a separate answer using the new XML.


from bs4 import BeautifulSoup
import pandas as pd

# The XML declaration must be the very first thing in the document, so no newline before it.
xml = """<?xml version="1.0" encoding="utf-8"?>
<Leads>
  <Lead Id="123" LeadTitle="test, test.,  , (123) 456-7890, " CreateDate="01/01/2017 11:11:11" ModifyDate="01/04/2017 03:03:03" ACount="1" LCount="4" RCount="0" ROnly="false" Flagged="false" LastDistributionDate="01/01/2017 10:10:10" LeadFormType="test test">
    <Campaign CampaignId="123" CampaignTitle="abc" />
    <Status StatusId="123" StatusTitle="test" />
    <Agent AgentId="123" AgentName="test, test" AgentEmail="a@a.com">
      <AgentCustomFields custom1="test test, test" custom2="test" custom3="" custom4="" />
    </Agent>
    <Fields>
      <Field FieldId="7" Value="a@a.com" FieldTitle="test" FieldType="test" />
      <Field FieldId="8" Value="test" FieldTitle="test 1" FieldType="test" />
      <Field FieldId="9" Value="test" FieldTitle="City" FieldType="Text" />
      <Field FieldId="10" Value="test" FieldTitle="State" FieldType="State" />
      <Field FieldId="11" Value="test" FieldTitle="test" FieldType="Zip" />
      <Field FieldId="950" Value="test." FieldTitle="Business Name" FieldType="Text" />
      <Field FieldId="1261" Value="Intuit Desktop" FieldTitle="test" FieldType="Text" />
      <Field FieldId="1262" Value="test" FieldTitle="test" FieldType="Text" />
      <Field FieldId="1263" Value="test" FieldTitle="test" FieldType="Number" />
      <Field FieldId="1267" Value="test" FieldTitle="test" FieldType="Text" />
      <Field FieldId="1310" Value="test" FieldTitle="test" FieldType="Phone" />
      <Field FieldId="1319" Value="test" FieldTitle="test" FieldType="Number" />
      <Field FieldId="1485" Value="test" FieldTitle="tst" FieldType="State" />
    </Fields>
    <Logs>
      <StatusLog>
        <Status LogId="123" LogDate="01/04/2017 03:08:44" StatusId="28" StatusTitle="test" AgentId="19" AgentName="test" AgentEmail="test@test.com" />
      </StatusLog>
      <ActionLog>
        <Action LogId="123" ActionTypeId="73" ActionTypeName="test" MilestoneId="1" ActionDate="01/04/2017 03:08:44" ActionNote="test" AgentId="19" AgentName="test,test" AgentEmail="test@test.com" />
      </ActionLog>
      <EmailLog>
        <Email LogId="123" SendDate="01/01/2017 20:53:39" EmailTemplateId="1" EmailTemplateName="test " AgentId="1" AgentName="test" AgentEmail="test@test.com" />
      </EmailLog>
      <DistributionLog>
        <Distribution LogId="1" LogDate="01/01/2017 10:10:08" DistributionProgramId="1" DistributionProgramName="test" AssignedAgentId="1" AssignedAgentName="test,test" AssignedAgentEmail="test@test.com" />
      </DistributionLog>
      <CreationLog LogId="1" LogDate="01/01/2017 10:10:05" Imported="true" CreatedByAgentId="1" CreatedByAgentName="test, test" CreatedByAgentEmail="test@test.com" />
    </Logs>
  </Lead>
</Leads>
"""


# The "xml" feature tells BeautifulSoup to use lxml's XML parser (lxml must be installed).
soup = BeautifulSoup(xml, "xml")

# Get the attributes of every element
attrs = []
for elm in soup():  # soup() is equivalent to soup.find_all()
    attrs.append(elm.attrs)

# Since you want the data in a dataframe, it makes sense for each Field to become a row
# that also carries the attributes of all the other elements
fields_attribute_list = [x for x in attrs if 'FieldId' in x]
other_attribute_list = [x for x in attrs if 'FieldId' not in x and x != {}]

# Merge the attributes of all elements except the `Field` elements into one dictionary.
# setdefault keeps the first value seen, so later attributes with the same name are dropped.
attribute_dict = {}
for d in other_attribute_list:
    for k, v in d.items():
        attribute_dict.setdefault(k, v)

# Combine each Field's attributes with those of all the other elements.
# Copy first so the tags' own attrs dicts inside `soup` are not modified.
full_list = []
for field in fields_attribute_list:
    row = dict(field)
    row.update(attribute_dict)
    full_list.append(row)

# Make the dataframe
df = pd.DataFrame(full_list)

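Note, however, that attributes sharing a name (such as LogId, which appears on several log elements in your XML) collapse into a single column, and only the first value encountered is kept. Still, this code should get you started.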
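One way around those name collisions is to key each attribute by the path of the element it came from. The sketch below swaps BeautifulSoup for the standard library's `xml.etree.ElementTree`, so it assumes the input is well-formed (it raises `ParseError` otherwise) and that each `<Field>` row should carry the attributes of its own `<Lead>`; `flatten_leads` and `_collect` are hypothetical names, not part of any library:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def _collect(elem: ET.Element, path: str,
             shared: Dict[str, str], fields: List[Dict[str, str]]) -> None:
    """Walk elem's descendants: <Field> attributes go to `fields`, the rest to `shared`."""
    for child in elem:
        child_path = f"{path}.{child.tag}"
        if child.tag == "Field":
            fields.append(dict(child.attrib))
            continue
        for name, value in child.attrib.items():
            # Key by element path so Lead.Status.StatusId and
            # Lead.Logs.StatusLog.Status.StatusId stay separate columns.
            # setdefault keeps the first value if the same path repeats.
            shared.setdefault(f"{child_path}.{name}", value)
        _collect(child, child_path, shared, fields)


def flatten_leads(xml_bytes: bytes) -> List[Dict[str, str]]:
    """Return one row per <Field>, each carrying its <Lead>'s other attributes."""
    root = ET.fromstring(xml_bytes)
    rows: List[Dict[str, str]] = []
    for lead in root.iter("Lead"):
        shared = {f"Lead.{name}": value for name, value in lead.attrib.items()}
        fields: List[Dict[str, str]] = []
        _collect(lead, "Lead", shared, fields)
        rows.extend({**field, **shared} for field in fields)
    return rows


sample = b"""<?xml version="1.0" encoding="utf-8"?>
<Leads>
  <Lead Id="123">
    <Status StatusId="123" StatusTitle="test" />
    <Fields>
      <Field FieldId="7" Value="a@a.com" />
      <Field FieldId="9" Value="test" />
    </Fields>
    <Logs>
      <StatusLog>
        <Status LogId="123" StatusId="28" />
      </StatusLog>
    </Logs>
  </Lead>
</Leads>"""

rows = flatten_leads(sample)
print(rows[0]["Lead.Logs.StatusLog.Status.StatusId"])  # 28
```

`pd.DataFrame(flatten_leads(...))` would then give one row per field, with columns such as `Lead.Logs.StatusLog.Status.LogId` and `Lead.Logs.CreationLog.LogId` instead of one shared `LogId`.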


Answered 2021-05-25
白衣染霜花


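I think you'll find XML/HTML parsing much easier with BeautifulSoup. It also handles malformed XML and HTML well. Its "xml" parser is backed by lxml, so install both:


pip install beautifulsoup4 lxml


Here is how to parse the XML you provided with BeautifulSoup.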


from bs4 import BeautifulSoup
import pandas as pd

# The XML declaration must be the very first thing in the document, so no newline before it.
xml = """<?xml version="1.0" encoding="utf-8"?>
<Leads>
    <Lead Id="123" LeadTitle="test, test.,  , (123) 456-7890, " CreateDate="01/01/2017 11:11:11" ModifyDate="01/04/2017 03:03:03" ACount="1" LCount="4" RCount="0" ROnly="false" Flagged="false" LastDistributionDate="01/01/2017 10:10:10" LeadFormType="test test"></Lead>
    <Lead Id="123" />
    <Lead Id="456" />
</Leads>
"""

soup = BeautifulSoup(xml, "xml")  # "xml" selects lxml's XML parser
leads = soup.find_all('Lead')  # find_all is the current name for findAll

# One dict of attributes per <Lead>
lead_list = []
for lead in leads:
    lead_list.append(lead.attrs)

df = pd.DataFrame(lead_list)
df  # displays the dataframe in a notebook; use print(df) in a script

Output:

   ACount           CreateDate  Flagged   Id  LCount  LastDistributionDate  LeadFormType                        LeadTitle           ModifyDate  RCount  ROnly
0       1  01/01/2017 11:11:11    false  123       4   01/01/2017 10:10:10     test test  test, test., , (123) 456-7890,   01/04/2017 03:03:03       0  false
1     NaN                  NaN      NaN  123     NaN                   NaN           NaN                              NaN                  NaN     NaN    NaN
2     NaN                  NaN      NaN  456     NaN                   NaN           NaN                              NaN                  NaN     NaN    NaN
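When a response does turn out to be well-formed, the same one-row-per-Lead table can be built without BeautifulSoup or lxml. A minimal sketch using only the standard library; `lead_rows` is a hypothetical helper name, and unlike BeautifulSoup's lxml-backed parser, `ET.fromstring` raises `ParseError` on malformed input instead of recovering:

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def lead_rows(xml_bytes: bytes) -> List[Dict[str, str]]:
    """Return one attribute dict per <Lead> element (stdlib-only sketch)."""
    root = ET.fromstring(xml_bytes)
    # Copy each attrib dict so callers can modify rows without touching the tree.
    return [dict(lead.attrib) for lead in root.iter("Lead")]


sample = b"""<?xml version="1.0" encoding="utf-8"?>
<Leads>
    <Lead Id="123" ACount="1" LCount="4"></Lead>
    <Lead Id="123" />
    <Lead Id="456" />
</Leads>"""

rows = lead_rows(sample)
print(len(rows))  # 3
```

Passing the result to `pd.DataFrame(rows)` aligns the dicts by key, so attributes a `<Lead>` lacks show up as NaN, as in the output above.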


Answered 2021-05-25