
Trying to scrape multiple tables from 30 similar links with Python

DIEA 2023-06-20 10:16:43
I have 10 company links:

https://www.zaubacorp.com/company/ASHRAFI-MEDIA-NETWORK-PRIVATE-LIMITED/U22120GJ2019PTC111757
https://www.zaubacorp.com/company/METTLE-PUBLICATIONS-PRIVATE-LIMITED/U22120MH2019PTC329729
https://www.zaubacorp.com/company/PRINTSCAPE-INDIA-PRIVATE-LIMITED/U22120MH2020PTC335354
https://www.zaubacorp.com/company/CHARVAKA-TELEVISION-NETWORK-PRIVATE-LIMITED/U22121KA2019PTC126665
https://www.zaubacorp.com/company/BHOOKA-NANGA-FILMS-PRIVATE-LIMITED/U22130DL2019PTC353194
https://www.zaubacorp.com/company/WHITE-CAMERA-SCHOOL-OF-PHOTOGRAPHY-PRIVATE-LIMITED/U22130JH2019PTC013311
https://www.zaubacorp.com/company/RLE-PRODUCTIONS-PRIVATE-LIMITED/U22130KL2019PTC059208
https://www.zaubacorp.com/company/CATALIZADOR-MEDIA-PRIVATE-LIMITED/U22130KL2019PTC059793
https://www.zaubacorp.com/company/TRIPPLED-MEDIAWORKS-OPC-PRIVATE-LIMITED/U22130MH2019OPC333171
https://www.zaubacorp.com/company/KRYSTAL-CINEMAZ-PRIVATE-LIMITED/U22130MH2019PTC330391

Now I am trying to scrape the tables from these links and save the data in well-formatted CSV columns. I want to scrape the tables for "Company Details", "Share Capital & Number of Employees", "Listing and Annual Compliance Details", "Contact Details" and "Director Details". If any table has no data or is missing a column, I want that column to be blank in the output CSV file. I wrote some code but could not get any output. What am I doing wrong here? Please help.

import pandas as pd
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
import csv
import lxml

url_file = "Zaubalinks.txt"
with open(url_file, "r") as url:
    url_pages = url.read()

# we need to split the urls into a list to make them iterable
pages = url_pages.split("\n")  # split by lines using \n

# now we run a for loop to visit the urls one by one
data = []
for single_page in pages:
    r = requests.get(single_page)
    soup = BeautifulSoup(r.content, 'html5lib')
    table = soup.find_all('table')  # finds all tables
    table_top = pd.read_html(str(table))[0]  # the top table

2 Answers

UYOU
1878 contributions, 4+ upvotes

import requests
import pandas as pd

companies = {
    'ASHRAFI-MEDIA-NETWORK-PRIVATE-LIMITED/U22120GJ2019PTC111757',
    'METTLE-PUBLICATIONS-PRIVATE-LIMITED/U22120MH2019PTC329729',
    'PRINTSCAPE-INDIA-PRIVATE-LIMITED/U22120MH2020PTC335354',
    'CHARVAKA-TELEVISION-NETWORK-PRIVATE-LIMITED/U22121KA2019PTC126665',
    'BHOOKA-NANGA-FILMS-PRIVATE-LIMITED/U22130DL2019PTC353194',
    'WHITE-CAMERA-SCHOOL-OF-PHOTOGRAPHY-PRIVATE-LIMITED/U22130JH2019PTC013311',
    'RLE-PRODUCTIONS-PRIVATE-LIMITED/U22130KL2019PTC059208',
    'CATALIZADOR-MEDIA-PRIVATE-LIMITED/U22130KL2019PTC059793',
    'TRIPPLED-MEDIAWORKS-OPC-PRIVATE-LIMITED/U22130MH2019OPC333171',
    'KRYSTAL-CINEMAZ-PRIVATE-LIMITED/U22130MH2019PTC330391'
}

def main(url):
    with requests.Session() as req:
        goal = []
        for company in companies:
            r = req.get(url.format(company))
            # read_html returns a list of all tables on the page
            df = pd.read_html(r.content)
            # tables 0, 3 and 4 hold the details we need; transpose each
            # so the row labels become column headers, then join them
            # side by side into one row per company
            target = pd.concat([df[x].T for x in [0, 3, 4]], axis=1)
            goal.append(target)
        # stack the per-company rows and write them out
        new = pd.concat(goal)
        new.to_csv("data.csv")

main("https://www.zaubacorp.com/company/{}")
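The question also asked for blank columns when a table has no data or is missing a field. One way to guarantee a fixed set of output columns (a sketch, not part of the answer above; the column names below are made-up placeholders) is to reindex each scraped frame against the full column list before concatenating, so missing fields come out as empty cells in the CSV:

```python
import pandas as pd

# Hypothetical example: two scraped frames where the second one is
# missing the "Email" column entirely.
a = pd.DataFrame({"CIN": ["U22120GJ2019PTC111757"], "Email": ["x@example.com"]})
b = pd.DataFrame({"CIN": ["U22120MH2019PTC329729"]})

# The full set of columns the CSV should always contain.
wanted = ["CIN", "Email", "Date of Incorporation"]

# reindex() adds any missing column filled with NaN, which pandas
# writes out as an empty cell in the CSV.
combined = pd.concat([df.reindex(columns=wanted) for df in (a, b)],
                     ignore_index=True)
combined.to_csv("data.csv", index=False)
```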


Replied 2023-06-20
largeQ
2039 contributions, 8+ upvotes

Fortunately, it looks like you can get there with a much simpler approach. Taking one of the links as an example, it would go like this:


import pandas as pd

url = 'https://www.zaubacorp.com/company/CHARVAKA-TELEVISION-NETWORK-PRIVATE-LIMITED/U22121KA2019PTC126665'
tables = pd.read_html(url)

From there, the tables you want are in tables[0], tables[3], tables[4], tables[15] and so on. Just use a for loop to cycle through all the urls.
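The loop described above might look like the sketch below. The table indices follow this answer; skipping out-of-range indices and catching failed requests are added assumptions, not something the site's layout guarantees:

```python
import pandas as pd

urls = [
    # the ten company links from the question would go here
    "https://www.zaubacorp.com/company/CHARVAKA-TELEVISION-NETWORK-PRIVATE-LIMITED/U22121KA2019PTC126665",
]

def pick_tables(tables, indices=(0, 3, 4)):
    """Transpose the selected tables so row labels become column headers
    and join them side by side; indices that are out of range (a table
    missing from the page) are simply skipped."""
    frames = [tables[i].T for i in indices if i < len(tables)]
    return pd.concat(frames, axis=1)

def scrape(urls):
    rows = []
    for url in urls:
        try:
            rows.append(pick_tables(pd.read_html(url)))
        except Exception as exc:  # unreachable page, no tables found, etc.
            print(f"skipping {url}: {exc}")
    return pd.concat(rows, ignore_index=True)

# To run for real:
# scrape(urls).to_csv("data.csv", index=False)
```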


Replied 2023-06-20
  • 2 answers
  • 0 followers
  • 175 views