Tag: lucene

Hibernate search with lucene does not index similar names correctly

I’m learning Hibernate Search 6.1.3.Final with Lucene 8.11.1 as the backend and Spring Boot 2.6.6. I’m trying to create a search for product names, barcodes and manufacturers. Currently, I’m writing an integration test to see what happens when a couple of products have similar names: As you can see in the test, I expect to obtain the two tobaccos with similar

Apache Lucene to replace found terms

full-text-search java lucene replace

I’m looking for a way to find and replace words in a text based on queries, using Apache Lucene. For example, I have the text “Happy New Year!”, the Lucene query “year~2” with fuzzy matching, and a replacement string (“###”). As a result I want the following: “Happy New ###!”. Is there a way to achieve this using Apache Lucene only?
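To make the desired result concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It rests on several assumptions: words are runs of letters, matching is case-insensitive, and "fuzzy" means plain Levenshtein distance, which may differ from what Lucene's fuzzy query accepts. The class and method names (`FuzzyReplace`, `editDistance`, `replaceFuzzy`) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FuzzyReplace {

    // Plain Levenshtein distance (insert, delete, substitute), two-row DP.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Replaces every word within maxEdits of term (assumption: words are
    // runs of letters, comparison ignores case) and keeps all other text.
    static String replaceFuzzy(String text, String term, int maxEdits, String replacement) {
        Matcher m = Pattern.compile("\\p{L}+").matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            String word = m.group();
            if (editDistance(word.toLowerCase(), term.toLowerCase()) <= maxEdits) {
                out.append(text, last, m.start()).append(replacement);
                last = m.end();
            }
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "Happy New ###!"
        System.out.println(replaceFuzzy("Happy New Year!", "year", 2, "###"));
    }
}
```

With the example from the question, "Happy" and "New" are both more than two edits away from "year", so only "Year" is replaced.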

How to sort a Lucene long date range by date

java lucene

Problem: I want to search books by date range and sort the results by date. Searching by date range works, but the documents are not sorted properly (insertion order, see ID?): To sort them by date, I changed my code to: Add NumericDocValuesField: Add a Sort: Question: What am I doing wrong? What do I need to change so the documents get

Apache Solr – Indexing ZIP files

apache-tika extract java lucene solr

My web app is an e-mail service. It stores email messages in a MySQL database, and email attachments are stored on disk. The database is similar to: I index it with the following data-config.xml: This works well for all files except compressed files such as .zip. For .zip files the attach_content field gets filled only with the file names

Lucene LongPoint Range search doesn’t work

indexing java java-11 lucene range-query

I am using Lucene 8.2.0 in Java 11. I am trying to index a Long value so that I can filter by it using a range query, for example like so: +my_range_field:[1 TO 200]. However, any variant of that, even my_range_field:[* TO *], returns 0 results in this minimal example. As soon as I remove the + from it to

Can a Hibernate Search FieldBridge configure facets for dynamic fields?

hibernate hibernate-search java lucene

Using Hibernate Search 5.11.3 with programmatic API (no annotations), is there a way to facet on dynamic fields added in a class or field bridge? I don’t see any ‘facet’ config available in FieldMetadataBuilder when using MetadataProvidingFieldBridge. I have tried various combinations of luceneOptions.addSortedDocValuesFieldToDocument() and luceneOptions.addFieldToDocument() in the set() method. This successfully updates the index, but I cannot perform facet

Add weights to documents Lucene 8

java lucene

I am currently working on a small search engine for college using Lucene 8. I already built it before, but without applying any weights to documents. I am now required to add the PageRanks of documents as a weight for each document, and I already computed the PageRank values. How can I add a weight to a Document object (not

Lucene split package: module reads package ‘org.apache.lucene.analysis.standard’ from both ‘lucene.analyzers.common’ and ‘lucene.core’

java java-platform-module-system lucene

Given my module-info.java: I get the following error: Module ‘my_module’ reads package ‘org.apache.lucene.analysis.standard’ from both ‘lucene.analyzers.common’ and ‘lucene.core’. In my code I use the following imports: How can I resolve this split package problem? Answer As you may already know, Lucene doesn’t support the Java Platform Module System properly, so it doesn’t define modules and contains split packages, which don’t work

Lemmatization with apache lucene

java lemmatization lucene nlp stemming

I’m developing a text analysis project using Apache Lucene. I need to lemmatize some text (transform the words to their canonical forms). I’ve already written the code that performs stemming. Using it, I am able to convert the following sentence The stem is the part of the word that never changes even when morphologically inflected; a lemma is the base

Lucene: Multi-word phrases as search terms

java lucene search

I’m trying to make a searchable phone/local business directory using Apache Lucene. I have fields for street name, business name, phone number etc. The problem that I’m having is that when I try to search by street where the street name has multiple words (e.g. ‘the crescent’), no results are returned. But if I try to search with just one