3 Answers

Contributor with 1906 experience points · 10+ likes
You first need to tokenize one of the strings. What you are doing right now is trying to match the entire line.
Something like this should work:
String text = med[x].toLowerCase();
boolean found =
    Arrays.stream(condition[y].split(" "))
        .map(String::toLowerCase)
        // strip punctuation and other non-word characters
        .map(s -> s.replaceAll("\\W", ""))
        // drop tokens that are empty after stripping
        .filter(s -> !s.isEmpty())
        .anyMatch(text::contains);
I added the removal of punctuation and of any blank strings so that we don't get false matches on those. (\\W actually removes every character that is not in [A-Za-z_0-9], but you can change it to whatever you like.)
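To make the effect of that regex concrete, here is a minimal sketch. The class name, helper name, and sample strings are made up for illustration and are not part of the original answer.

```java
public class NonWordStripDemo {
    // Removes every character that is not in [A-Za-z_0-9].
    static String stripNonWord(String s) {
        return s.replaceAll("\\W", "");
    }

    public static void main(String[] args) {
        System.out.println(stripNonWord("rhinitis,"));    // rhinitis
        System.out.println(stripNonWord("(cat)"));        // cat
        System.out.println(stripNonWord("snake_case!"));  // snake_case
        System.out.println(stripNonWord("--").isEmpty()); // true
    }
}
```

Note that by default \\W in Java only treats ASCII letters, digits, and the underscore as word characters, so accented letters are stripped as well. If that matters for your data, a character class such as [^\\p{L}\\p{N}] may be a better fit.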
If you need this to be efficient because you have a lot of text, you might want to turn it around and use a Set, which has much faster lookup.
private Stream<String> tokenize(String text) {
    return Arrays.stream(text.split(" "))
        .map(String::toLowerCase)
        .map(s -> s.replaceAll("\\W", ""))
        .filter(s -> !s.isEmpty());
}

// Build the word set once, then test each condition word against it.
// Note: this matches whole words, not substrings.
Set<String> words = tokenize(med[x]).collect(Collectors.toSet());
boolean found = tokenize(condition[y]).anyMatch(words::contains);
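You may also want to filter out stop words such as "to" and "and". You could take an existing stop-word list and add an extra filter, after the one that checks for blank strings, to check that the string is not a stop word.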
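A minimal sketch of that extra filter might look like the following. The STOP_WORDS set here is a deliberately tiny, hypothetical list, and the class name, sharesWord helper, and sample strings are assumptions for illustration. Set.of requires Java 9 or later.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class StopWordMatch {
    // Hypothetical, deliberately small stop-word list; swap in a fuller one.
    static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "the", "to", "of", "for", "in", "is");

    static Stream<String> tokenize(String text) {
        return Arrays.stream(text.split("\\s+"))
                .map(String::toLowerCase)
                .map(w -> w.replaceAll("\\W", ""))
                .filter(w -> !w.isEmpty())
                // extra filter: ignore stop words
                .filter(w -> !STOP_WORDS.contains(w));
    }

    // True if the two strings share at least one non-stop word.
    static boolean sharesWord(String text, String condition) {
        Set<String> words = tokenize(text).collect(Collectors.toSet());
        return tokenize(condition).anyMatch(words::contains);
    }

    public static void main(String[] args) {
        String med = "A corticosteroid indicated for the management of the nasal symptoms of rhinitis.";
        System.out.println(sharesWord(med, "Allergic rhinitis due to pollen")); // true
        System.out.println(sharesWord(med, "Disorders of the ear"));            // false
    }
}
```

Without the stop-word filter, the second call would report a match, because "of" and "the" appear in both strings.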

Contributor with 1895 experience points · 7+ likes
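This will give you a "crude" match percentage.
Here is how it works:
Split the text to be searched and the search term into a Set of words. This is done by splitting with a regular expression. Each word is converted to upper case and added to a set.
Count how many words in the search term appear in the text.
Calculate the percentage of words in the search term that appear in the text.
You might want to enhance this by stripping out common words such as "a", "the", etc.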
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

public class CrudeTextMatchThingy {

    public static void main(String[] args) {
        String searchText = "Fluticasone Propionate Nasal Spray, USP 50 mcg per spray is a \n" +
                "corticosteroid indicated for the management of the nasal symptoms of \n" +
                "perennial nonallergic rhinitis in adult and pediatric patients aged 4 years \n" +
                "and older.";

        String[] searchTerms = {
            "Unspecified acute lower respiratory infection",
            "Vasomotor rhinitis",
            "Allergic rhinitis due to pollen",
            "Other seasonal allergic rhinitis",
            "Allergic rhinitis due to food",
            "Allergic rhinitis due to animal (cat) (dog) hair and dander",
            "Other allergic rhinitis",
            "Allergic rhinitis, unspecified",
            "Chronic rhinitis",
            "Chronic nasopharyngitis"
        };

        Arrays.stream(searchTerms).forEach(searchTerm -> {
            double matchPercent = findMatch(searchText, searchTerm);
            System.out.println(matchPercent + "% - " + searchTerm);
        });
    }

    // Percentage of distinct words in searchTerm that also occur in searchText.
    private static double findMatch(String searchText, String searchTerm) {
        Set<String> wordsInSearchText = getWords(searchText);
        Set<String> wordsInSearchTerm = getWords(searchTerm);

        double wordsInSearchTermThatAreFound = wordsInSearchTerm.stream()
                .filter(s -> wordsInSearchText.contains(s))
                .count();

        return (wordsInSearchTermThatAreFound / wordsInSearchTerm.size()) * 100.0;
    }

    // Splits on word boundaries, upper-cases each piece, and keeps only
    // pieces made up entirely of letters and digits.
    private static Set<String> getWords(String term) {
        return Arrays.stream(term.split("\\b"))
                .map(String::trim)
                .map(String::toUpperCase)
                .filter(s -> s.matches("[A-Z0-9]+"))
                .collect(Collectors.toSet());
    }
}
Output:
0.0% - Unspecified acute lower respiratory infection
50.0% - Vasomotor rhinitis
20.0% - Allergic rhinitis due to pollen
25.0% - Other seasonal allergic rhinitis
20.0% - Allergic rhinitis due to food
20.0% - Allergic rhinitis due to animal (cat) (dog) hair and dander
33.33333333333333% - Other allergic rhinitis
33.33333333333333% - Allergic rhinitis, unspecified
50.0% - Chronic rhinitis
0.0% - Chronic nasopharyngitis
If you don't want a percentage but simply true or false, you can do this:
boolean matches = findMatch(searchText, searchTerm) > 0.0;
Hope this helps.

Contributor with 1816 experience points · 6+ likes
This would be much easier if you built a list of the searchable terms. Assuming your paragraph is stored as a String named paragraph:
ArrayList<String> dictionary = new ArrayList<>();
dictionary.add("acute lower respiratory infection");
dictionary.add("rhinitis");

// Report, for each dictionary entry, whether it occurs in the paragraph.
for (int i = 0; i < dictionary.size(); i++) {
    if (paragraph.contains(dictionary.get(i))) {
        System.out.println(i + " True");
    } else {
        System.out.println(i + " False");
    }
}
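One caveat is that String.contains is case-sensitive, so "Rhinitis" in the paragraph would not match "rhinitis" in the dictionary. A minimal, self-contained sketch of a case-insensitive variant follows. The class name, helper name, and sample paragraph are assumptions for illustration, and List.of requires Java 9 or later.

```java
import java.util.List;
import java.util.Locale;

public class DictionaryLookup {
    // Case-insensitive substring check: lower-case both sides before comparing.
    static boolean containsIgnoreCase(String paragraph, String term) {
        return paragraph.toLowerCase(Locale.ROOT)
                .contains(term.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        // Made-up sample text for the demonstration.
        String paragraph = "Indicated for the nasal symptoms of perennial nonallergic Rhinitis.";
        List<String> dictionary = List.of("acute lower respiratory infection", "rhinitis");

        for (String term : dictionary) {
            System.out.println(term + ": " + containsIgnoreCase(paragraph, term));
        }
        // prints:
        // acute lower respiratory infection: false
        // rhinitis: true
    }
}
```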