3 回答

TA貢獻(xiàn)1813條經(jīng)驗(yàn) 獲得超2個(gè)贊
上面給出的levenshtein <= 1的函數(shù)不正確 - 它給出了不正確的結(jié)果,例如“床”和“出價(jià)”。
我在第一個(gè)答案中修改了上面給出的“MySQL Levenshtein距離查詢”,以接受一個(gè)“限制”,這將加快它的速度?;旧希绻阒魂P(guān)心Levenshtein <= 1,將極限設(shè)置為“2”,如果它是0或1,函數(shù)將返回精確的levenshtein距離; 如果精確的levenshtein距離為2或更大,則為2。
這個(gè)mod使它快15%到50% - 搜索詞越長(zhǎng),優(yōu)勢(shì)越大(因?yàn)樗惴梢蕴崆氨a?。)例如,搜?00,000個(gè)單詞以查找距離1的所有匹配項(xiàng)“傻笑”,原版在我的筆記本電腦上花了3分47秒,而“極限”版本需要1:39。當(dāng)然,這些對(duì)于任何實(shí)時(shí)使用來(lái)說(shuō)都太慢了。
碼:
DELIMITER $$CREATE FUNCTION levenshtein_limit_n( s1 VARCHAR(255), s2 VARCHAR(255), n INT) RETURNS INT DETERMINISTIC BEGIN DECLARE s1_len, s2_len, i, j, c, c_temp, cost, c_min INT; DECLARE s1_char CHAR; -- max strlen=255 DECLARE cv0, cv1 VARBINARY(256); SET s1_len = CHAR_LENGTH(s1), s2_len = CHAR_LENGTH(s2), cv1 = 0x00, j = 1, i = 1, c = 0, c_min = 0; IF s1 = s2 THEN RETURN 0; ELSEIF s1_len = 0 THEN RETURN s2_len; ELSEIF s2_len = 0 THEN RETURN s1_len; ELSE WHILE j <= s2_len DO SET cv1 = CONCAT(cv1, UNHEX(HEX(j))), j = j + 1; END WHILE; WHILE i <= s1_len and c_min < n DO -- if actual levenshtein dist >= limit, don't bother computing it SET s1_char = SUBSTRING(s1, i, 1), c = i, c_min = i, cv0 = UNHEX(HEX(i)), j = 1; WHILE j <= s2_len DO SET c = c + 1; IF s1_char = SUBSTRING(s2, j, 1) THEN SET cost = 0; ELSE SET cost = 1; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j, 1)), 16, 10) + cost; IF c > c_temp THEN SET c = c_temp; END IF; SET c_temp = CONV(HEX(SUBSTRING(cv1, j+1, 1)), 16, 10) + 1; IF c > c_temp THEN SET c = c_temp; END IF; SET cv0 = CONCAT(cv0, UNHEX(HEX(c))), j = j + 1; IF c < c_min THEN SET c_min = c; END IF; END WHILE; SET cv1 = cv0, i = i + 1; END WHILE; END IF; IF i <= s1_len THEN -- we didn't finish, limit exceeded SET c = c_min; -- actual distance is >= c_min (i.e., the smallest value in the last computed row of the matrix) END IF; RETURN c; END$$

TA貢獻(xiàn)1801條經(jīng)驗(yàn) 獲得超16個(gè)贊
damerau-levenshtein距離的實(shí)現(xiàn)可以在這里找到: Damerau-Levenshtein算法:具有轉(zhuǎn)置 的Levenshtein對(duì)純Levenshtein距離的改進(jìn)是考慮字符的交換。我在schnaader鏈接的評(píng)論中找到了它,謝謝!
添加回答
舉報(bào)