4 Answers

This user has contributed 2080 experience points and earned 4+ upvotes
You can collect the 404 URLs in a separate set (assuming the 404 URLs are fewer than the valid ones) and then take the set difference, like so:

from urllib.request import urlopen
from urllib.error import HTTPError

exclude_urls = set()
for url in all_urls:
    try:
        urlopen(url)
    except HTTPError as err:
        if err.code == 404:
            exclude_urls.add(url)
valid_urls = set(all_urls) - exclude_urls
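The set-difference approach above can be wrapped in a function. This is a minimal sketch; the `check` parameter (defaulting to `urlopen`) is an addition of this sketch so the logic can be exercised without touching the network:

```python
from urllib.error import HTTPError
from urllib.request import urlopen

def filter_404(all_urls, check=urlopen):
    """Return the subset of all_urls that does not answer with HTTP 404.

    `check` defaults to urlopen; it is injectable purely so the
    filtering logic can be tested offline with a stub.
    """
    exclude_urls = set()
    for url in all_urls:
        try:
            check(url)
        except HTTPError as err:
            if err.code == 404:
                exclude_urls.add(url)
    # Keep everything that was not explicitly excluded as a 404.
    return set(all_urls) - exclude_urls
```

Note that the result is a set, so the original ordering of the URLs is not preserved.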

This user has contributed 1789 experience points and earned 10+ upvotes
You could do it like this:

from urllib.request import urlopen
from urllib.error import HTTPError

def load_data(csv_name):
    ...

def save_data(data, csv_name):
    ...

links = load_data(csv_name)
new_links = set()
for i in links:
    try:
        urlopen(i)
    except HTTPError as err:
        if err.code == 404:
            print('invalid')
            continue  # skip 404s; other HTTP errors still keep the link
    new_links.add(i)
save_data(list(new_links), csv_name)
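The `load_data` / `save_data` stubs are left to the reader; one possible implementation with the standard `csv` module (assuming one URL per row, in the first column) might look like this:

```python
import csv

def load_data(csv_name):
    """Read one URL per row from the first column of the CSV."""
    with open(csv_name, newline='') as f:
        return [row[0] for row in csv.reader(f) if row]

def save_data(data, csv_name):
    """Write each URL back out as its own row."""
    with open(csv_name, 'w', newline='') as f:
        writer = csv.writer(f)
        for item in data:
            writer.writerow([item])
```

Passing `newline=''` to `open` is the documented way to use the `csv` module; it prevents spurious blank rows on Windows.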

This user has contributed 1783 experience points and earned 4+ upvotes
Try something like this:

import csv
from urllib.request import urlopen
from urllib.error import HTTPError

# 1. Load the CSV file into a list
with open('urls.csv', 'r') as file:
    reader = csv.reader(file)
    urls = [row[0] for row in reader]  # Assuming each row has one URL

# 2. Check each URL for validity
valid_urls = []
for url in urls:
    try:
        urlopen(url)
        valid_urls.append(url)
    except HTTPError as err:
        if err.code == 404:
            print(f'Invalid URL: {url}')
        else:
            raise  # If it's another type of error, raise it so you're aware

# 3. Write the cleaned list back to the CSV file
with open('cleaned_urls.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    for url in valid_urls:
        writer.writerow([url])
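Two practical caveats with a bare `urlopen(url)`: it downloads the whole response body just to learn the status, and a dead host raises `URLError` (not `HTTPError`), which would crash the loop. A hedged sketch that sends a HEAD request instead and treats unreachable hosts as invalid (the `is_reachable` name and the injectable `opener` parameter are inventions of this sketch, used for offline testing):

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

def is_reachable(url, timeout=5, opener=urlopen):
    """Return False for HTTP 404 or an unreachable host (URLError,
    e.g. a DNS failure); re-raise other HTTP errors so you notice them."""
    request = Request(url, method='HEAD')  # ask for status only, skip the body
    try:
        opener(request, timeout=timeout)
    except HTTPError as err:
        if err.code == 404:
            return False
        raise
    except URLError:
        return False
    return True
```

Some servers reject HEAD requests (405); if you hit that in practice, fall back to a normal GET for those hosts.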