
Extracting file paths from a pandas DataFrame in Python

Python

BIG陽 2022-05-24 10:35:11

I have an Excel file with a column that holds paths to folders, and a single row can store more than one path. I can read the Excel file into pandas without problems. What I want to do now is iterate row by row over my pandas DataFrame df and extract the stored directories, so that I can use them as input directories for other functions. If I access a row with iloc, I get a str-like object, but I want each row to be a list so that I can iterate over it.

Example of the format of the values in my DataFrame:

import pandas as pd

path_1 = '[\'C:\\\\tmp_patients\\\\Pat_MAV_BE_B01_\']'
path_2 = '[\'C:\\\\tmp_patients\\\\Pat_MAV_B16\', \'C:\\\\tmp_patients\\\\Pat_MAV_BE_B16_2017-06-30_08-49-28\']'
d = {'col1': [path_1, path_2]}
df = pd.DataFrame(data=d)
# or read the Excel file directly:
# df = pd.read_excel(filepath_to_excel)

for idx in range(len(df)):
    paths = df['col1'].iloc[idx]
    for a_single_path in paths:
        print(a_single_path)
        # todo: process all the files found at the location "a_single_path" with os.walk

(Screenshot not preserved: how the data looks after reading the file with pd.read_excel().)
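The core of the problem fits in a few lines. Here is a minimal sketch, assuming the cells hold Python-style list literals like the sample above. Iterating the str yields single characters, while ast.literal_eval turns it into a real list of paths.

```python
from ast import literal_eval

# A cell value as returned by the DataFrame: a str that only looks like a list
cell = "['C:\\\\tmp_patients\\\\Pat_MAV_B16', 'C:\\\\tmp_patients\\\\Pat_MAV_BE_B16_2017-06-30_08-49-28']"

# Iterating a str yields single characters, which is what the loop above does
first_items = list(cell)[:3]
print(first_items)  # ['[', "'", 'C']

# literal_eval safely parses the literal into a real list of path strings
paths = literal_eval(cell)
for single_path in paths:
    print(single_path)
```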


2 Answers

慕村9548890


If you want one row per directory:

Data:

Note that the column used here is named file_path_lists, whereas the column in the question's screenshot is col1.

from ast import literal_eval
from pathlib import Path

import pandas as pd

df = pd.read_excel('test.xlsx')
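If test.xlsx is not at hand, the same frame can be built in memory. This is a sketch that assumes the values from the question's example. The real contents and layout of test.xlsx are not shown here.

```python
import pandas as pd

# Hypothetical stand-in for test.xlsx: the question's sample values,
# stored under the column name file_path_lists used in this answer
path_1 = "['C:\\\\tmp_patients\\\\Pat_MAV_BE_B01_']"
path_2 = ("['C:\\\\tmp_patients\\\\Pat_MAV_B16', "
          "'C:\\\\tmp_patients\\\\Pat_MAV_BE_B16_2017-06-30_08-49-28']")
df = pd.DataFrame({'file_path_lists': [path_1, path_2]})

# At this point every cell is still a plain str
print(df.file_path_lists.map(type).tolist())
```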

Convert each row from str to list, then explode each list into separate rows:

df.file_path_lists = df.file_path_lists.apply(literal_eval)  # str -> list
df2 = df.explode('file_path_lists')  # one row per path; original index labels are kept
df2.dropna(inplace=True)  # drop rows coming from empty cells
print(repr(df2.file_path_lists[0]))

>>> 'C:\\tmp_patients\\Pat_MAV_BE_B01_'

Note that the paths are still str.
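explode keeps the original index labels. As a result, df2.file_path_lists[0] works here only because row 0 held a single path. The sketch below renumbers the rows and skips empty cells. It assumes an empty Excel cell arrives as a missing value (None/NaN), on which literal_eval would fail.

```python
from ast import literal_eval

import pandas as pd

# Sample values from the question plus one empty cell (assumed to arrive as None/NaN)
df = pd.DataFrame({'file_path_lists': [
    "['C:\\\\tmp_patients\\\\Pat_MAV_BE_B01_']",
    "['C:\\\\tmp_patients\\\\Pat_MAV_B16', "
    "'C:\\\\tmp_patients\\\\Pat_MAV_BE_B16_2017-06-30_08-49-28']",
    None,
]})

# Parse only real strings and leave missing cells untouched
df['file_path_lists'] = df['file_path_lists'].apply(
    lambda cell: literal_eval(cell) if isinstance(cell, str) else cell
)

# explode yields labels 0, 1, 1, 2; drop the empty row, then renumber 0..n-1
df2 = df.explode('file_path_lists').dropna().reset_index(drop=True)
print(df2)
```

On pandas 1.1 or later, explode(..., ignore_index=True) also renumbers the rows. However, a dropna run after it can leave gaps, which is why reset_index comes last here.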

Convert to pathlib objects:

pathlib is part of the standard library and should be used instead of os. See "Python 3's pathlib Module: Taming the File System".

df2.file_path_lists = df2.file_path_lists.apply(Path)  # str -> Path
print(repr(df2.file_path_lists[0]))

>>> WindowsPath('C:/tmp_patients/Pat_MAV_BE_B01_')

Now each entry is a pathlib object.
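Path(...) produces a WindowsPath only when the code runs on Windows. On Linux or macOS it creates a PosixPath and treats the backslashes as ordinary characters. The sketch below uses PureWindowsPath instead, which parses Windows paths on any OS without touching the file system. This is useful for inspecting the stored paths, though not for globbing.

```python
from pathlib import Path, PureWindowsPath

raw = 'C:\\tmp_patients\\Pat_MAV_BE_B01_'

# PureWindowsPath applies Windows parsing rules regardless of the current OS
win = PureWindowsPath(raw)
print(win.drive)       # C:
print(win.name)        # Pat_MAV_BE_B01_
print(win.as_posix())  # C:/tmp_patients/Pat_MAV_BE_B01_

# Path picks the flavour of the running OS: WindowsPath on Windows, PosixPath elsewhere
print(type(Path(raw)).__name__)
```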

Access each directory:

for directory in df2.file_path_lists:  # avoid shadowing the built-in dir()
    print(directory)

>>> C:\tmp_patients\Pat_MAV_BE_B01_
C:\tmp_patients\Pat_MAV_B16
C:\tmp_patients\Pat_MAV_BE_B16_2017-06-30_08-49-28

Print the list of files found in each patient directory:

for directory in df2.file_path_lists:
    patient_files = list(directory.glob('*.*'))  # use .rglob if there are subdirectories
    print(patient_files)
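The question mentions os.walk, and rglob covers the same ground. The sketch below compares the two on a throwaway folder. The patient folder and file names are made up for illustration. Note that rglob('*') combined with is_file() also picks up files without an extension, which the pattern '*.*' would skip.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical patient folder with one nested sub-directory
    root = Path(tmp) / 'Pat_MAV_B16'
    (root / 'sub').mkdir(parents=True)
    (root / 'scan.dcm').write_text('a')
    (root / 'sub' / 'notes.txt').write_text('b')

    # pathlib: rglob('*') descends into sub-directories; keep only files
    via_rglob = sorted(
        p.relative_to(root).as_posix() for p in root.rglob('*') if p.is_file()
    )

    # os.walk yields (dirpath, dirnames, filenames) for every directory under root
    via_walk = sorted(
        Path(dirpath, name).relative_to(root).as_posix()
        for dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
    )

print(via_rglob)              # ['scan.dcm', 'sub/notes.txt']
print(via_walk == via_rglob)  # True
```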

If you want rows of lists instead of one row per directory, skip .explode:

df = pd.read_excel('test.xlsx')
df.file_path_lists = df.file_path_lists.apply(literal_eval)
print(type(df.file_path_lists[0]))

>>> <class 'list'>

for row in df.file_path_lists:  # iterate the rows
    for x in row:  # iterate the list inside the row
        print(x)

>>> C:\tmp_patients\Pat_MAV_BE_B01_
C:\tmp_patients\Pat_MAV_B16
C:\tmp_patients\Pat_MAV_BE_B16_2017-06-30_08-49-28
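In this variant an empty Excel cell comes back as a missing value, and literal_eval fails on it. The sketch below uses a hypothetical helper, to_path_list, which maps such cells to an empty list so that the nested loop needs no special case. It assumes that non-str cells are always empty cells.

```python
from ast import literal_eval

import pandas as pd


def to_path_list(cell):
    """Hypothetical helper: parse a list-like str; treat anything else as no paths."""
    if isinstance(cell, str):
        return literal_eval(cell)
    return []  # NaN/None coming from an empty Excel cell


df = pd.DataFrame({'file_path_lists': [
    "['C:\\\\tmp_patients\\\\Pat_MAV_BE_B01_']",
    None,  # simulated empty cell
]})
df['file_path_lists'] = df['file_path_lists'].apply(to_path_list)

for row in df['file_path_lists']:  # iterate the rows
    for path in row:               # iterate the list inside the row (may be empty)
        print(path)
```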

2022-05-24

慕碼人8056858


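Your sample input contains strings that look like lists. I don't think read_excel produces those by itself, so depending on what your cells actually contain, you may not need the .apply(literal_eval) call below.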

Assuming you are using pandas 0.25 or later, you can use explode:

from ast import literal_eval

import pandas as pd

path_1 = "['C:\\\\develop\\\\python-util-script\\\\Pat_MAV_B01']"
path_2 = "['C:\\\\develop\\\\python-util-script\\\\Pat_MAV_B16', 'C:\\\\develop\\\\python-util-script\\\\Pat_MAV_BE_B16_2017-06-30_08-49-28']"
d = {'col1': [path_1, path_2]}
df = pd.DataFrame(data=d)

# parse each str into a list, then give every path its own row
print(df['col1'].apply(literal_eval).explode())

Output:

0 C:\develop\python-util-script\Pat_MAV_B01

1 C:\develop\python-util-script\Pat_MAV_B16

1 C:\develop\python-util-script\Pat_MAV_BE_B16_2...

Name: col1, dtype: object
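You may only need a flat Python list of paths to pass to other functions, or you may be on a pandas version older than 0.25. In either case, a nested comprehension does the same job without explode. Here is a sketch using the data above.

```python
from ast import literal_eval

import pandas as pd

path_1 = "['C:\\\\develop\\\\python-util-script\\\\Pat_MAV_B01']"
path_2 = ("['C:\\\\develop\\\\python-util-script\\\\Pat_MAV_B16', "
          "'C:\\\\develop\\\\python-util-script\\\\Pat_MAV_BE_B16_2017-06-30_08-49-28']")
df = pd.DataFrame({'col1': [path_1, path_2]})

# Flatten with a nested comprehension: no explode needed, so older pandas works too
all_paths = [p for cell in df['col1'] for p in literal_eval(cell)]

# Same result as the explode version, converted to a plain Python list
same = all_paths == df['col1'].apply(literal_eval).explode().tolist()
print(same)  # True

for p in all_paths:
    print(p)
```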

2022-05-24
