Using Apache Lucene to replace matched terms

I’m looking for a way to find and replace words in a text based on Apache Lucene queries. For example, given the text “Happy New Year!”, the fuzzy Lucene query “year~2”, and a replacement string (“###”), I want the result “Happy New ###!”. Is there a way to achieve this using Apache Lucene only?

Answer

In case anyone else needs this: I managed to solve the problem using Lucene’s Highlighter with a custom formatter that returns the replacement string instead of the matched text. See the code sample below.

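// Custom Formatter (as a lambda): mask every token group that the query matched.
Highlighter highlighter = new Highlighter((originalText, tokenGroup) -> {
    // A total score of 0 means nothing in this token group matched the query.
    if (tokenGroup.getTotalScore() <= 0) {
        return originalText;
    }
    return "###";
}, new QueryScorer(query));
// ...
// Up to 100 best-scoring fragments, joined with "...".
// Note: fragments that contain no match are not part of the result.
String highlighted = highlighter.getBestFragments(tokenStream, fieldText, 100, "...");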
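
To sanity-check the Highlighter output, it can help to have a small Lucene-free reference for what “year~2” is expected to do: match any word within an edit distance of 2 of “year” and swap it for the replacement string. The sketch below is plain Java and is not how Lucene implements fuzzy matching; the class `FuzzyMask`, the method names, and the simple letter-run tokenizer are assumptions made for illustration. It also uses plain Levenshtein distance, whereas Lucene’s fuzzy queries by default count a transposition of adjacent characters as a single edit.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FuzzyMask {

    // Plain Levenshtein distance (insert, delete, substitute) using two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Replaces every word within maxEdits of term (case-insensitive);
    // punctuation and whitespace between words are copied unchanged.
    static String mask(String text, String term, int maxEdits, String replacement) {
        Matcher m = Pattern.compile("\\p{L}+").matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            String word = m.group();
            boolean hit = editDistance(word.toLowerCase(), term.toLowerCase()) <= maxEdits;
            out.append(hit ? replacement : word);
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Happy New ###!"
        System.out.println(mask("Happy New Year!", "year", 2, "###"));
    }
}
```

If the Highlighter result for a short test string differs from what this reference produces, the analyzer and the fragmenter settings are the first things worth checking.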