
What is the most Pythonic way to process a list of nested dictionaries?

Python

慕的地6264312 2023-10-11 21:21:18

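What is the most Pythonic way to identify the different types of nested dictionary returned by an API, so that the right kind of parsing can be applied to each?

I am making API calls to Reddit to get URLs, and I get back nested dictionaries with different key names and different nested structures. I am pulling the URLs I need, but I need a more Pythonic way to tell the different key names and structures apart. The if statements I tried inside a single for loop run into errors: when the dictionary does not contain the key, I get a NoneType error just from the if statement "asking" whether the key is in the dictionary.

I describe the problem over the next few paragraphs, but you may be able to dig straight into the dictionary examples and the code below and see my problem: I cannot identify which of the three dictionary types I have in a single pass. The nested dictionaries do not share the same structure, and my code is full of trys and what I think are redundant for loops.

I have a function that handles the three types of nested dictionary. topics_data (used below) is a Pandas DataFrame, and vid is the name of the column in topics_data that contains the nested dictionaries. Sometimes the object in a vid cell is None, when the post I am reading is not a video post.

The API returns only three main types of nested dictionary (when the value is not None). My biggest problem is identifying the first key name without getting a NoneType error when an if statement tries to catch a nested dictionary that starts with a different key (reddit_video instead of oembed, for example). Because of this, I iterate over the list of nested dictionaries three times, once for each of the three types. I would like to iterate over the list once and identify and process each type of nested dictionary in that single pass.

Below are examples of the three different types of nested dictionary I get, along with the ugly code I currently use to handle them. My code works, but it is ugly. Please dig in and take a look.

The nested dictionaries...

Nested dictionary one

{'reddit_video': {'fallback_url': 'https://v.redd.it/te7wsphl85121/DASH_2_4_M?source=fallback', 'height': 480, 'width': 480, 'scrubber_media_url': 'https://v.redd.it/te7wsphl85121/DASH_600_K', 'dash_url': 'https://v.redd.it/te7wsphl85121/DASHPlaylist.mpd?a=1604490293%2CYmQzNDllMmQ4MDVhMGZhODMyYmIxNDc4NTZmYWNlNzE2Nzc3ZGJjMmMzZGJjMmYxMjRiMjJiNDU4NGEzYzI4Yg%3D%3D&v=1&f=sd', 'duration': 17, 'hls_url': 'https://v.redd.it/te7wsphl85121/HLSPlaylist.m3u8?a=1604490293%2COTg2YmIxZmVmZGNlYTVjMmFiYjhkMzk5NDRlNWI0ZTY4OGE1NzgxNzUyMDhkYjFiNWYzN2IxYWNkZjM3ZDU2YQ%3D%3D&v=1&f=sd', 'is_gif': False, 'transcoding_status': 'completed'}}

Nested dictionary two

{'type': 'gfycat.com', 'oembed': {'provider_url': 'https://gfycat.com', 'description': 'Hi! We use cookies and similar technologies ("cookies"), including third-party cookies, on this website to help operate and improve your experience on our site, monitor our site performance, and for advertising purposes. By clicking "Accept Cookies" below, you are giving us consent to use cookies (except consent is not required for cookies necessary to run our site).', 'title': 'Protestors in Hong Kong are cutting down facial recognition towers.', 'type': 'video', 'author_name': 'Gfycat', 'height': 600, 'width': 600,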


1 Answer

拉丁的傳說

Contributed 1789 experience points, received more than 8 likes

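Update: I realized the OP's text is dealing with non-unique lookups. I added a paragraph describing how to handle that.

If you find yourself looping over a list of dictionaries several times to perform lookups, restructure the list into a dictionary so that the lookup field becomes the key. For example, this: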

a = [{"id": 1, "value": "foo"}, {"id": 2, "value": "bar"}]
for item in a:
    if item["id"] == 1:
        print(item["value"])

can become this:

a = [{"id": 1, "value": "foo"}, {"id": 2, "value": "bar"}]
a = {item["id"]: item for item in a}  # index by lookup field
print(a[1]["value"])  # no loop
...  # Now we can continue to look up by id, e.g. a[2], without a loop

If the lookup is non-unique, you can do something similar:

indexed = {}
a = [{"category": 1, "value": "foo"}, {"category": 2, "value": "bar"}, {"category": 1, "value": "baz"}]
for item in a:  # This loop only has to be executed once
    if indexed.get(item["category"], None) is not None:
        indexed[item["category"]].append(item)
    else:
        indexed[item["category"]] = [item]

# Now we can do:
all_category_1_data = indexed[1]
all_category_2_data = indexed[2]
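
The same grouping can also be written with collections.defaultdict from the standard library, which removes the "is the key already there?" branch. This is only a minimal sketch of the idea above; the helper name group_by is made up for illustration and is not part of the original answer.

```python
from collections import defaultdict


def group_by(items, key):
    """Group dictionaries by the value stored under `key`, in one pass."""
    groups = defaultdict(list)  # a missing key starts out as an empty list
    for item in items:
        groups[item[key]].append(item)
    return groups


a = [
    {"category": 1, "value": "foo"},
    {"category": 2, "value": "bar"},
    {"category": 1, "value": "baz"},
]
indexed = group_by(a, "category")
category_1_values = [item["value"] for item in indexed[1]]
```

One thing to keep in mind with this variant: reading a missing key from a defaultdict creates an empty list for it, so use `key in indexed` if you only want to test for presence.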

If you run into missing-key errors (KeyError), use dict.get() with a default value to handle them more easily:

if a.get(1, None) is not None:
    print(a[1]["value"])
else:
    print("1 was not in the dictionary")
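
For the nested case in the question, dict.get() alone is not quite enough, because the cell itself may be None and None has no .get(). Below is a rough sketch of a None-tolerant nested lookup. The helper name dig is hypothetical, and the sample post is a shortened version of "nested dictionary one" from the question.

```python
def dig(obj, *keys, default=None):
    """Follow `keys` through nested dicts.

    Returns `default` as soon as a level is not a dict or lacks the key,
    so it never raises on None or on a missing key.
    """
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return obj


post = {
    "reddit_video": {
        "fallback_url": "https://v.redd.it/te7wsphl85121/DASH_2_4_M?source=fallback"
    }
}
url = dig(post, "reddit_video", "fallback_url")   # key path exists
missing = dig(post, "oembed", "thumbnail_url")    # first key is absent
from_none = dig(None, "reddit_video", "fallback_url")  # non-video post
```

With a helper like this, each if test can ask for a full key path without a try block around it.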

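In my opinion there is nothing specifically "Pythonic" about this, but if an API returns a list that you need to loop over, it is probably a poorly designed API.

Update: OK, I will try to fix your code: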

def download_vid(topics_data, ydl_opts):
    # Pass 1: sort every nested dictionary into a bucket by type
    indexed_data = {'reddit': [], 'gfycat': [], 'thumbnail': []}
    for item in topics_data['vid']:
        if not isinstance(item, dict):  # None for non-video posts
            continue
        if item.get('reddit_video', None) is not None:
            indexed_data['reddit'].append(item)
        elif item.get('type', None) == "gfycat.com":
            indexed_data['gfycat'].append(item)
        elif item.get('oembed', None) is not None:
            if item['oembed'].get('thumbnail_url', None) is not None:
                indexed_data['thumbnail'].append(item)
    # Pass 2: process each bucket (v is the list of dictionaries for bucket k)
    for k, v in indexed_data.items():
        assert k in ('reddit', 'gfycat', 'thumbnail')
        for item in v:
            if k == 'reddit':
                B = item['reddit_video']['fallback_url']
                ...
            elif k == 'gfycat':
                C = item['oembed']['thumbnail_url']
                ...
            elif k == 'thumbnail':
                D = item['oembed']['thumbnail_url']
                ...
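
If the buckets are not needed afterwards, the identification and the URL extraction can also happen in a single pass, which is what the question asks for. The following is a sketch under assumptions, not the answer's own code: extract_url is a made-up name, plain Python lists stand in for the topics_data['vid'] column, and the example.com thumbnail URLs are placeholders, since the question's second dictionary is truncated before any thumbnail_url appears.

```python
def extract_url(item):
    """Return (kind, url) for one 'vid' cell, or None if nothing matches."""
    if not isinstance(item, dict):  # None for non-video posts
        return None
    reddit_video = item.get("reddit_video")
    if isinstance(reddit_video, dict) and "fallback_url" in reddit_video:
        return ("reddit", reddit_video["fallback_url"])
    oembed = item.get("oembed")
    if isinstance(oembed, dict) and "thumbnail_url" in oembed:
        kind = "gfycat" if item.get("type") == "gfycat.com" else "thumbnail"
        return (kind, oembed["thumbnail_url"])
    return None


# Stand-in for topics_data['vid']; thumbnail URLs are placeholders.
vids = [
    {"reddit_video": {"fallback_url": "https://v.redd.it/te7wsphl85121/DASH_2_4_M?source=fallback"}},
    None,
    {"type": "gfycat.com", "oembed": {"thumbnail_url": "https://example.com/a.jpg"}},
    {"type": "example.com", "oembed": {"thumbnail_url": "https://example.com/b.jpg"}},
]

# One loop over the data: classify and extract at the same time.
results = [found for found in map(extract_url, vids) if found is not None]
kinds = [kind for kind, _ in results]
```

The trade-off compared with the bucketed version is that the grouped lists are no longer available for later lookups.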

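In case it is not clear why this is better:

The OP loops over topics_data['vid'] three times. I loop twice: once over topics_data['vid'] to build the index, and once over the indexed groups.
More importantly, if more types are added, I still only loop twice. The OP would have to add another loop.
No exception handling is needed.
Each group of objects is now indexed, so the OP can use, for example, indexed_data['gfycat'] to get all of those objects if needed. That is a hash table lookup, so it is fast.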
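
To illustrate the point about adding more types without adding more loops, here is one possible sketch in which each type is described by a (name, predicate) pair. The CLASSIFIERS list and the index_by_type function are hypothetical names, and the sample data uses placeholder example.com URLs; none of this comes from the original answer.

```python
# Hypothetical registry: (bucket name, predicate). The first match wins,
# so more specific checks should come before more general ones.
CLASSIFIERS = [
    ("reddit", lambda d: d.get("reddit_video") is not None),
    ("gfycat", lambda d: d.get("type") == "gfycat.com"),
    ("thumbnail", lambda d: isinstance(d.get("oembed"), dict)
                            and "thumbnail_url" in d["oembed"]),
]


def index_by_type(items, classifiers=CLASSIFIERS):
    """Bucket dictionaries by the first matching predicate, in one pass."""
    indexed = {name: [] for name, _ in classifiers}
    for item in items:
        if not isinstance(item, dict):  # skip None cells
            continue
        for name, matches in classifiers:
            if matches(item):
                indexed[name].append(item)
                break
    return indexed


items = [
    {"reddit_video": {"fallback_url": "https://v.redd.it/te7wsphl85121/DASH_2_4_M?source=fallback"}},
    None,
    {"type": "gfycat.com", "oembed": {"thumbnail_url": "https://example.com/a.jpg"}},
    {"type": "example.com", "oembed": {"thumbnail_url": "https://example.com/b.jpg"}},
]
indexed_data = index_by_type(items)
```

Supporting a fourth type then means appending one entry to the list; the loop over the data stays the same.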

2023-10-11