首頁(yè) 猿問(wèn) 如何避免在 heapq 中使用...

如何避免在 heapq 中使用 _siftup 或 _siftdown

Python

慕桂英3389331 2021-12-17 16:41:05

我不知道如何在不使用_siftupor 的情況下有效地解決以下問(wèn)題_siftdown：當(dāng)一個(gè)元素?zé)o序時(shí)，如何恢復(fù)堆不變式？換句話說(shuō)，更新old_value中heap至new_value，并保持heap工作。您可以假設(shè)old_value堆中只有一個(gè)。功能定義如下：def update_value_in_heap(heap, old_value, new_value):這是我的真實(shí)場(chǎng)景，有興趣的可以看看。你可以想象它是一個(gè)小型的自動(dòng)完成系統(tǒng)。我需要統(tǒng)計(jì)單詞出現(xiàn)的頻率，并保持前k個(gè)max-count單詞，隨時(shí)準(zhǔn)備輸出。所以我heap在這里使用。當(dāng)一個(gè)字?jǐn)?shù)++時(shí)，如果它在堆中，我需要更新它。所有的詞和計(jì)數(shù)都存儲(chǔ)在 trie-tree 的葉子中，heaps存儲(chǔ)在 trie-tree 的中間節(jié)點(diǎn)中。如果你關(guān)心堆外這個(gè)詞，別擔(dān)心，我可以從trie-tree的葉子節(jié)點(diǎn)中得到它。當(dāng)用戶輸入一個(gè)單詞時(shí)，它將首先從堆中讀取然后更新它。為了獲得更好的性能，我們可以考慮通過(guò)批量更新來(lái)降低更新頻率。那么當(dāng)一個(gè)特定的字?jǐn)?shù)增加時(shí)，如何更新堆呢？這是 _siftup 或 _siftdown 版本的簡(jiǎn)單示例（不是我的場(chǎng)景）：>>> from heapq import _siftup, _siftdown, heapify, heappop>>> data = [10, 5, 18, 2, 37, 3, 8, 7, 19, 1]>>> heapify(data)>>> old, new = 8, 22 # increase the 8 to 22>>> i = data.index(old)>>> data[i] = new>>> _siftup(data, i)>>> [heappop(data) for i in range(len(data))][1, 2, 3, 5, 7, 10, 18, 19, 22, 37]>>> data = [10, 5, 18, 2, 37, 3, 8, 7, 19, 1]>>> heapify(data)>>> old, new = 8, 4 # decrease the 8 to 4>>> i = data.index(old)>>> data[i] = new>>> _siftdown(data, 0, i)>>> [heappop(data) for i in range(len(data))][1, 2, 3, 4, 5, 7, 10, 18, 19, 37]索引的成本為 O(n)，更新的成本為 O(logn)。heapify是另一種解決方案，但效率低于_siftup或_siftdown。但是_siftupand_siftdown是heapq中的protected成員，所以不建議從外部訪問(wèn)。那么有沒(méi)有更好更有效的方法來(lái)解決這個(gè)問(wèn)題呢？這種情況的最佳實(shí)踐？感謝您的閱讀，我真的很感謝它幫助我。:)

查看完整描述

2 回答

慕勒3428872

TA貢獻(xiàn)1848條經(jīng)驗(yàn) 獲得超6個(gè)贊

使用heapify。

您必須記住的一件重要事情是理論復(fù)雜性和性能是兩個(gè)不同的東西（即使它們是相關(guān)的）。換句話說(shuō)，實(shí)施也很重要。漸近復(fù)雜性為您提供了一些下限，您可以將其視為保證，例如 O(n) 中的算法確保在最壞的情況下，您將執(zhí)行許多輸入大小線性的指令。這里有兩件重要的事情：

常量被忽略，但常量在現(xiàn)實(shí)生活中很重要；
最壞的情況取決于您考慮的算法，而不僅僅是輸入。

根據(jù)您考慮的主題/問(wèn)題，第一點(diǎn)可能非常重要。在某些域中，隱藏在漸近復(fù)雜性中的常量非常大，以至于您甚至無(wú)法構(gòu)建大于常量的輸入（或者考慮該輸入是不現(xiàn)實(shí)的）。情況并非如此，但這是您始終必須牢記的事情。

給出這兩個(gè)觀察結(jié)果，你真的不能說(shuō)：實(shí)現(xiàn) B 比 A 快，因?yàn)?A 派生自 O(n) 算法而 B 派生自 O(log n) 算法。即使這是一個(gè)很好的論據(jù)，但它并不總是足夠的。當(dāng)所有輸入發(fā)生的可能性相同時(shí)，理論復(fù)雜性特別適合比較算法。換句話說(shuō)，當(dāng)你的算法非常通用時(shí)。

如果您知道您的用例和輸入是什么，您可以直接測(cè)試性能。使用測(cè)試和漸近復(fù)雜度將使您對(duì)算法的執(zhí)行方式有一個(gè)很好的了解（在極端情況下和任意實(shí)際情況下）。

話雖如此，讓我們對(duì)以下將實(shí)施三種不同策略的類(lèi)運(yùn)行一些性能測(cè)試（這里實(shí)際上有四種策略，但Invalidate 和 Reinsert在您的情況下似乎不正確，因?yàn)槟鷷?huì)使每個(gè)項(xiàng)目無(wú)效的次數(shù)與你看到一個(gè)給定的詞）。我將包括我的大部分代碼，以便您可以仔細(xì)檢查我沒(méi)有搞砸（您甚至可以檢查完整的筆記本）：

from heapq import _siftup, _siftdown, heapify, heappop

class Heap(list):

def __init__(self, values, sort=False, heap=False):

super().__init__(values)

heapify(self)

self._broken = False

self.sort = sort

self.heap = heap or not sort

# Solution 1) repair using the knowledge we have after every update:

def update(self, key, value):

old, self[key] = self[key], value

if value > old:

_siftup(self, key)

else:

_siftdown(self, 0, key)

# Solution 2 and 3) repair using sort/heapify in a lazzy way:

def __setitem__(self, key, value):

super().__setitem__(key, value)

self._broken = True

def __getitem__(self, key):

if self._broken:

self._repair()

self._broken = False

return super().__getitem__(key)

def _repair(self):

if self.sort:

self.sort()

elif self.heap:

heapify(self)

# … you'll also need to delegate all other heap functions, for example:

def pop(self):

self._repair()

return heappop(self)

我們可以首先檢查所有三種方法是否有效：

data = [10, 5, 18, 2, 37, 3, 8, 7, 19, 1]

heap = Heap(data[:])

heap.update(8, 22)

heap.update(7, 4)

print(heap)

heap = Heap(data[:], sort_fix=True)

heap[8] = 22

heap[7] = 4

print(heap)

heap = Heap(data[:], heap_fix=True)

heap[8] = 22

heap[7] = 4

print(heap)

然后我們可以使用以下函數(shù)運(yùn)行一些性能測(cè)試：

import time

import random

def rand_update(heap, lazzy_fix=False, **kwargs):

index = random.randint(0, len(heap)-1)

new_value = random.randint(max_int+1, max_int*2)

if lazzy_fix:

heap[index] = new_value

else:

heap.update(index, new_value)

def rand_updates(n, heap, lazzy_fix=False, **kwargs):

for _ in range(n):

rand_update(heap, lazzy_fix)

def run_perf_test(n, data, **kwargs):

test_heap = Heap(data[:], **kwargs)

t0 = time.time()

rand_updates(n, test_heap, **kwargs)

test_heap[0]

return (time.time() - t0)*1e3

results = []

max_int = 500

nb_updates = 1

for i in range(3, 7):

test_size = 10**i

test_data = [random.randint(0, max_int) for _ in range(test_size)]

perf = run_perf_test(nb_updates, test_data)

results.append((test_size, "update", perf))

perf = run_perf_test(nb_updates, test_data, lazzy_fix=True, heap_fix=True)

results.append((test_size, "heapify", perf))

perf = run_perf_test(nb_updates, test_data, lazzy_fix=True, sort_fix=True)

results.append((test_size, "sort", perf))

結(jié)果如下：

import pandas as pd

import seaborn as sns

dtf = pd.DataFrame(results, columns=["heap size", "method", "duration (ms)"])

print(dtf)

sns.lineplot(

data=dtf,

x="heap size",

y="duration (ms)",

hue="method",

)

從這些測(cè)試中我們可以看到，這heapify似乎是最合理的選擇，它在最壞的情況下具有相當(dāng)?shù)膹?fù)雜性：O(n) 并且在實(shí)踐中表現(xiàn)更好。另一方面，調(diào)查其他選項(xiàng)可能是個(gè)好主意（例如擁有專(zhuān)用于該特定問(wèn)題的數(shù)據(jù)結(jié)構(gòu)，例如使用垃圾箱將單詞放入其中，然后將它們從垃圾箱移動(dòng)到下一個(gè)看起來(lái)像可能的軌道調(diào)查）。

重要說(shuō)明：這種情況（更新與讀取比率為 1:1）對(duì)heapify和sort解決方案都不利。所以，如果你管理有AK：1分的比例，這一結(jié)論將更加清晰（你可以替換nb_updates = 1與nb_updates = k上面的代碼）。

數(shù)據(jù)框詳細(xì)信息：

heap size method duration in ms

0 1000 update 0.435114

1 1000 heapify 0.073195

2 1000 sort 0.101089

3 10000 update 1.668930

4 10000 heapify 0.480175

5 10000 sort 1.151085

6 100000 update 13.194084

7 100000 heapify 4.875898

8 100000 sort 11.922121

9 1000000 update 153.587103

10 1000000 heapify 51.237106

11 1000000 sort 145.306110

反對(duì) 回復(fù) 2021-12-17

RISEBY

TA貢獻(xiàn)1856條經(jīng)驗(yàn) 獲得超5個(gè)贊

他提供的代碼片段完全損壞了！它也很難閱讀。 _siftup()被調(diào)用 n//2 次，heapify()所以它不能比_siftup()它自己快。

要回答最初的問(wèn)題，沒(méi)有更好的方法。如果您擔(dān)心方法是私有的，請(qǐng)創(chuàng)建自己的方法來(lái)做同樣的事情。

我唯一同意的一點(diǎn)是，如果您長(zhǎng)時(shí)間不需要從堆中讀取數(shù)據(jù)，那么在需要時(shí)將其延遲可能會(huì)有所幫助heapify()。問(wèn)題是您是否應(yīng)該為此使用堆。

讓我們來(lái)看看他的片段的問(wèn)題：

該heapify()函數(shù)被多次調(diào)用以進(jìn)行“更新”運(yùn)行。導(dǎo)致這種情況的錯(cuò)誤鏈如下：

他通過(guò)了heap_fix，但期望heap和同樣適用于sort
如果self.sort總是False，self.heap則總是True
他重新定義__getitem__()和__setitem__()被稱為每次_siftup()的_siftdown()分配或讀的東西（注：這兩個(gè)是不是在C調(diào)用，因此他們使用__getitem__()和__setitem__()）
如果self.heap是True和__getitem__()與__setitem__()被調(diào)用時(shí)，_repair()函數(shù)被調(diào)用每一次_siftup()或siftdown()交換的元素。但是調(diào)用heapify()是在 C 中完成的，所以__getitem__()不會(huì)被調(diào)用，也不會(huì)以無(wú)限循環(huán)結(jié)束
他重新定義了self.sort這樣的稱呼，就像他試圖做的那樣，會(huì)失敗
他讀了一次，但更新一個(gè)項(xiàng)目的nb_updates次數(shù)，而不是他聲稱的 1:1

我修正了這個(gè)例子，我盡可能地驗(yàn)證它，但我們都會(huì)犯錯(cuò)誤。隨意檢查它自己。

代碼

import time

import random

from heapq import _siftup, _siftdown, heapify, heappop

class UpdateHeap(list):

def __init__(self, values):

super().__init__(values)

heapify(self)

def update(self, index, value):

old, self[index] = self[index], value

if value > old:

_siftup(self, index)

else:

_siftdown(self, 0, index)

def pop(self):

return heappop(self)

class SlowHeap(list):

def __init__(self, values):

super().__init__(values)

heapify(self)

self._broken = False

# Solution 2 and 3) repair using sort/heapify in a lazy way:

def update(self, index, value):

super().__setitem__(index, value)

self._broken = True

def __getitem__(self, index):

if self._broken:

self._repair()

self._broken = False

return super().__getitem__(index)

def _repair(self):

...

def pop(self):

if self._broken:

self._repair()

return heappop(self)

class HeapifyHeap(SlowHeap):

def _repair(self):

heapify(self)

class SortHeap(SlowHeap):

def _repair(self):

self.sort()

def rand_update(heap):

index = random.randint(0, len(heap)-1)

new_value = random.randint(max_int+1, max_int*2)

heap.update(index, new_value)

def rand_updates(update_count, heap):

for i in range(update_count):

rand_update(heap)

heap[0]

def verify(heap):

last = None

while heap:

item = heap.pop()

if last is not None and item < last:

raise RuntimeError(f"{item} was smaller than last {last}")

last = item

def run_perf_test(update_count, data, heap_class):

test_heap = heap_class(data)

t0 = time.time()

rand_updates(update_count, test_heap)

perf = (time.time() - t0)*1e3

verify(test_heap)

return perf

results = []

max_int = 500

update_count = 100

for i in range(2, 7):

test_size = 10**i

test_data = [random.randint(0, max_int) for _ in range(test_size)]

perf = run_perf_test(update_count, test_data, UpdateHeap)

results.append((test_size, "update", perf))

perf = run_perf_test(update_count, test_data, HeapifyHeap)

results.append((test_size, "heapify", perf))

perf = run_perf_test(update_count, test_data, SortHeap)

results.append((test_size, "sort", perf))

import pandas as pd

import seaborn as sns

dtf = pd.DataFrame(results, columns=["heap size", "method", "duration (ms)"])

print(dtf)

sns.lineplot(

data=dtf,

x="heap size",

y="duration (ms)",

hue="method",

)

結(jié)果

如您所見(jiàn)，使用_siftdown()和的“更新”方法_siftup()漸近地更快。

你應(yīng)該知道你的代碼做了什么，以及運(yùn)行需要多長(zhǎng)時(shí)間。如果有疑問(wèn)，您應(yīng)該檢查一下。@cglaced 檢查了執(zhí)行需要多長(zhǎng)時(shí)間，但他沒(méi)有質(zhì)疑應(yīng)該需要多長(zhǎng)時(shí)間。如果他這樣做了，他會(huì)發(fā)現(xiàn)兩者不匹配。而其他人則為之傾倒。

heap size method duration (ms)

0 100 update 0.219107

1 100 heapify 0.412703

2 100 sort 0.242710

3 1000 update 0.198841

4 1000 heapify 2.947330

5 1000 sort 0.605345

6 10000 update 0.203848

7 10000 heapify 32.759190

8 10000 sort 4.621506

9 100000 update 0.348568

10 100000 heapify 327.646971

11 100000 sort 49.481153

12 1000000 update 0.256062

13 1000000 heapify 3475.244761

14 1000000 sort 1106.570005

反對(duì) 回復(fù) 2021-12-17

2 回答
0 關(guān)注
313 瀏覽

關(guān)注

添加回答

舉報(bào)

0/150

提交

取消

使用 Ctrl+D 可將網(wǎng)站添加到書(shū)簽

微信客服

購(gòu)課補(bǔ)貼
聯(lián)系客服咨詢優(yōu)惠詳情

幫助反饋 APP下載

慕課網(wǎng)APP
您的移動(dòng)學(xué)習(xí)伙伴

公眾號(hào)

掃描二維碼
關(guān)注慕課網(wǎng)微信公眾號(hào)

第七色在线视频,2021少妇久久久久久久久久,亚洲欧洲精品成人久久av18,亚洲国产精品特色大片观看完整版,孙宇晨将参加特朗普的晚宴

熱搜

最近搜索清空

如何避免在 heapq 中使用 _siftup 或 _siftdown

如何避免在 heapq 中使用 _siftup 或 _siftdown

2 回答

代碼

添加回答