
Searching string contents in Java for a partial match

Java

茅侃侃 2022-06-15 17:35:25

I am working on a project in which I need to search a passage of text for particular strings. I do not need an exact match, though; it is more of a percentage (%) match. For example, this is the passage I am searching:

"Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a corticosteroid indicated for the management of the nasal symptoms of perennial nonallergic rhinitis in adult and pediatric patients aged 4 years and older."

I then check whether any word in each of the following lines matches the passage:

1) Unspecified acute lower respiratory infection
2) Vasomotor rhinitis
3) Allergic rhinitis due to pollen
4) Other seasonal allergic rhinitis
5) Allergic rhinitis due to food
6) Allergic rhinitis due to animal (cat) (dog) hair and dander
7) Other allergic rhinitis
8) "Allergic rhinitis, unspecified"
9) Chronic rhinitis
10) Chronic nasopharyngitis

My initial approach was to use a boolean with contains:

    boolean found = med[x].toLowerCase().contains(condition[y].toLowerCase());

However, the result is false on every iteration of the loop. The results I expect are:

1) False
2) True
3) True
4) True
5) True
6) True
7) True
8) True
9) True
10) False

I am very new to Java and its methods. Basically, if any word in A matches any word in B, it should be flagged as true. How do I do that? Thanks!


3 Answers

隔江千里

1906 contribution points, 10+ upvotes

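You have to tokenize one of the strings first. What you are doing now is trying to match the whole line at once.

Something like this should work:

    String text = med[x].toLowerCase();
    boolean found =
        Arrays.stream(condition[y].split(" "))
            .map(String::toLowerCase)
            .map(s -> s.replaceAll("\\W", ""))   // strip punctuation from each token
            .filter(s -> !s.isEmpty())           // drop tokens that became empty
            .anyMatch(text::contains);           // substring test against the passage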

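I added the removal of punctuation and of any empty strings so that we do not get false matches on those. (\\W actually removes every character that is not in [A-Za-z_0-9], but you can change it to whatever you like.)

If you need this to be efficient because you have a lot of text, you may want to turn it around and use a Set, which has faster lookup.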
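To make the cleanup step concrete, here is a minimal sketch of what the split, lower-case, `\\W` replacement, and empty-string filter do to two of the condition lines from the question. The class and method names (`TokenCleanupDemo`, `cleanTokens`) are made up for illustration and are not part of the answer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class TokenCleanupDemo {

    // Splits on single spaces, lower-cases each token, strips every
    // non-word character, and drops tokens that end up empty.
    static List<String> cleanTokens(String line) {
        return Arrays.stream(line.split(" "))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Parentheses around "cat" and "dog" are removed.
        System.out.println(cleanTokens("Allergic rhinitis due to animal (cat) (dog) hair and dander"));
        // Quotes and the comma are removed; a lone "-" becomes empty and is dropped.
        System.out.println(cleanTokens("\"Allergic rhinitis, unspecified\" - other"));
    }
}
```

Without the empty-string filter, a token such as "-" would turn into "", and `text.contains("")` is always true, so every line would match.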

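    // Needs java.util.Arrays, java.util.Set, java.util.stream.Collectors
    // and java.util.stream.Stream.
    // The parameter is named "text" so it does not clash with the lambda
    // parameter "s" (Java rejects a lambda parameter that shadows a local).
    private Stream<String> tokenize(String text) {
        return Arrays.stream(text.split(" "))
            .map(String::toLowerCase)
            .map(s -> s.replaceAll("\\W", ""))
            .filter(s -> !s.isEmpty());
    }

    Set<String> words = tokenize(med[x]).collect(Collectors.toSet());
    boolean found = tokenize(condition[y]).anyMatch(words::contains);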
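One thing worth noting about the two snippets above: `text::contains` is a substring test, while `words::contains` on a Set only matches whole words, so the two can disagree. The sketch below illustrates the difference on a fragment of the passage from the question; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

class SubstringVsWordDemo {

    // Substring test, as String.contains does it.
    static boolean substringMatch(String text, String word) {
        return text.toLowerCase().contains(word.toLowerCase());
    }

    // Whole-word test: tokenize the text into a Set and look the word up.
    static boolean wordMatch(String text, String word) {
        Set<String> words = Arrays.stream(text.toLowerCase().split("\\s+"))
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty())
                .collect(Collectors.toSet());
        return words.contains(word.toLowerCase());
    }

    public static void main(String[] args) {
        String text = "management of the nasal symptoms of perennial nonallergic rhinitis";
        System.out.println(substringMatch(text, "to"));       // true: "to" occurs inside "symptoms"
        System.out.println(wordMatch(text, "to"));            // false: "to" is not a word of the text
        System.out.println(substringMatch(text, "allergic")); // true: inside "nonallergic"
        System.out.println(wordMatch(text, "allergic"));      // false
        System.out.println(wordMatch(text, "rhinitis"));      // true
    }
}
```

Which behaviour is wanted depends on the data. For the medical terms in the question, whole-word matching avoids treating "allergic" as a hit on "nonallergic".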

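You may also want to filter out stop words such as "to" and "and". You can take an existing stop-word list and add an extra filter, after the one that checks for empty strings, that checks that the string is not a stop word.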
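A minimal sketch of that extra filter, under the assumption of a deliberately tiny, made-up stop-word list (`STOP_WORDS`); a real list would be much longer. The names `StopWordFilterDemo` and `anyWordMatches` are hypothetical as well.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.stream.Collectors;

class StopWordFilterDemo {

    // Hypothetical, intentionally short stop-word list for illustration only.
    static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "the", "to", "of", "in", "for", "is"));

    // Same tokenization as in the answer, plus the stop-word filter
    // placed after the empty-string filter.
    static Set<String> tokenize(String text) {
        return Arrays.stream(text.split(" "))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty())
                .filter(w -> !STOP_WORDS.contains(w))
                .collect(Collectors.toSet());
    }

    // True if at least one non-stop word of the condition occurs in the text.
    static boolean anyWordMatches(String text, String condition) {
        Set<String> textWords = tokenize(text);
        return tokenize(condition).stream().anyMatch(textWords::contains);
    }

    public static void main(String[] args) {
        String med = "indicated for the management of the nasal symptoms of "
                + "perennial nonallergic rhinitis in adult and pediatric patients";
        System.out.println(anyWordMatches(med, "Chronic rhinitis")); // true, via "rhinitis"
        System.out.println(anyWordMatches(med, "Hair and dander"));  // false, "and" is ignored
    }
}
```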

Posted 2022-06-15

人到中年有點(diǎn)甜

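1895 contribution points, 7+ upvotes

This will give you a "crude" match percentage.

Here is how it works:

1) Split both the text being searched and the search term into a set of words. This is done with a regular-expression split. Each word is converted to upper case and added to a set.
2) Count how many of the words in the search term appear in the text.
3) Calculate the percentage of the search term's words that appear in the text.

You may want to enhance this by stripping out common words such as "a", "the", and so on.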

    import java.util.Arrays;
    import java.util.Set;
    import java.util.stream.Collectors;

    public class CrudeTextMatchThingy {

        public static void main(String[] args) {
            String searchText = "Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a \n" +
                    "corticosteroid indicated for the management of the nasal symptoms of \n" +
                    "perennial nonallergic rhinitis in adult and pediatric patients aged 4 years \n" +
                    "and older.";

            String[] searchTerms = {
                "Unspecified acute lower respiratory infection",
                "Vasomotor rhinitis",
                "Allergic rhinitis due to pollen",
                "Other seasonal allergic rhinitis",
                "Allergic rhinitis due to food",
                "Allergic rhinitis due to animal (cat) (dog) hair and dander",
                "Other allergic rhinitis",
                "Allergic rhinitis, unspecified",
                "Chronic rhinitis",
                "Chronic nasopharyngitis"
            };

            Arrays.stream(searchTerms).forEach(searchTerm -> {
                double matchPercent = findMatch(searchText, searchTerm);
                System.out.println(matchPercent + "% - " + searchTerm);
            });
        }

        // Percentage of the search term's distinct words that also occur in the text.
        private static double findMatch(String searchText, String searchTerm) {
            Set<String> wordsInSearchText = getWords(searchText);
            Set<String> wordsInSearchTerm = getWords(searchTerm);

            double wordsInSearchTermThatAreFound = wordsInSearchTerm.stream()
                    .filter(s -> wordsInSearchText.contains(s))
                    .count();

            return (wordsInSearchTermThatAreFound / wordsInSearchTerm.size()) * 100.0;
        }

        // Splits at word boundaries, upper-cases, and keeps only alphanumeric tokens.
        private static Set<String> getWords(String term) {
            return Arrays.stream(term.split("\\b"))
                    .map(String::trim)
                    .map(String::toUpperCase)
                    .filter(s -> s.matches("[A-Z0-9]+"))
                    .collect(Collectors.toSet());
        }
    }

Output:

    0.0% - Unspecified acute lower respiratory infection
    50.0% - Vasomotor rhinitis
    20.0% - Allergic rhinitis due to pollen
    25.0% - Other seasonal allergic rhinitis
    20.0% - Allergic rhinitis due to food
    20.0% - Allergic rhinitis due to animal (cat) (dog) hair and dander
    33.33333333333333% - Other allergic rhinitis
    33.33333333333333% - Allergic rhinitis, unspecified
    50.0% - Chronic rhinitis
    0.0% - Chronic nasopharyngitis

If you do not want a percentage but rather true or false, you can do this:

boolean matches = findMatch(searchText, searchTerm) > 0.0;
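A threshold other than zero is a natural variation. The sketch below is an assumption layered on top of this answer, not part of it: it reuses the same word-splitting idea, guards against an empty search term (which would otherwise divide by zero and yield NaN), and takes a hypothetical `minPercent` parameter.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

class MatchThresholdDemo {

    // Same word extraction as in the answer: split at word boundaries,
    // upper-case, keep only alphanumeric tokens.
    static Set<String> getWords(String term) {
        return Arrays.stream(term.split("\\b"))
                .map(String::trim)
                .map(String::toUpperCase)
                .filter(s -> s.matches("[A-Z0-9]+"))
                .collect(Collectors.toSet());
    }

    // Percentage of the term's distinct words found in the text;
    // returns 0.0 for a term with no words instead of NaN.
    static double findMatch(String searchText, String searchTerm) {
        Set<String> textWords = getWords(searchText);
        Set<String> termWords = getWords(searchTerm);
        if (termWords.isEmpty()) {
            return 0.0;
        }
        long found = termWords.stream().filter(textWords::contains).count();
        return (found * 100.0) / termWords.size();
    }

    // Hypothetical helper: true when the match percentage reaches minPercent.
    static boolean matches(String searchText, String searchTerm, double minPercent) {
        return findMatch(searchText, searchTerm) >= minPercent;
    }

    public static void main(String[] args) {
        String text = "perennial nonallergic rhinitis in adult and pediatric patients";
        System.out.println(matches(text, "Chronic rhinitis", 50.0));        // true  (1 of 2 words)
        System.out.println(matches(text, "Other allergic rhinitis", 50.0)); // false (1 of 3 words)
    }
}
```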

Hope this helps.

Posted 2022-06-15

瀟湘沐

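1816 contribution points, 6+ upvotes

This would be much easier if you built a list of searchable words. Assuming your passage is stored as a String:

    // "paragraph" is assumed to be a String that already holds the passage.
    // Needs java.util.ArrayList.
    ArrayList<String> dictionary = new ArrayList<>();
    dictionary.add("acute lower respiratory infection");
    dictionary.add("rhinitis");

    for (int i = 0; i < dictionary.size(); i++) {
        // Note: String.contains is case-sensitive.
        if (paragraph.contains(dictionary.get(i))) {
            System.out.println(i + " True");
        } else {
            System.out.println(i + " False");
        }
    }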
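Because `String.contains` is case-sensitive, a dictionary entry such as "rhinitis" would miss "Rhinitis" at the start of a sentence. A minimal sketch of a case-insensitive variant follows; the helper name `containsAny` is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class DictionaryLookupDemo {

    // True if the paragraph contains at least one dictionary entry,
    // compared as lower-cased substrings.
    static boolean containsAny(String paragraph, List<String> dictionary) {
        String haystack = paragraph.toLowerCase(Locale.ROOT);
        return dictionary.stream()
                .map(entry -> entry.toLowerCase(Locale.ROOT))
                .anyMatch(haystack::contains);
    }

    public static void main(String[] args) {
        List<String> dictionary =
                Arrays.asList("acute lower respiratory infection", "rhinitis");
        System.out.println(containsAny("Perennial nonallergic Rhinitis", dictionary)); // true
        System.out.println(containsAny("Chronic nasopharyngitis", dictionary));        // false
    }
}
```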

Posted 2022-06-15
