Skip to main content

Lucene, sample JAVA code to Search an indexed file folder


Please find below the Lucene sample JAVA code to search the files inside a folder. This code will search the indexed folder for a search query in an indexed field.

This java code is expecting the index path ( where the index files were created ) , field which need to be searched and the query need be searched as program arguments like  "java SearchFiles [-index dir] [-field f] [-query string]" .


import java.io.File;
import java.util.ArrayList;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SearchFiles {

public static void main(String[] args){
try{
String usage = "Usage: SearchFiles [-index dir] [-field f] [-query string] \n\n";
String index = "index";
String field = "contents";
String queryString = null;
int hitsPerPage = 100;
for (int i = 0; i < args.length; i++) {
if ("-index".equals(args[i])) {
index = args[i + 1];
i++;
} else if ("-field".equals(args[i])) {
field = args[i + 1];
i++;
} else if ("-query".equals(args[i])) {
queryString = args[i + 1];
i++;
}
}
new SearchFiles().searchFiles(index,field,queryString,hitsPerPage);
}catch (Exception e) {
e.printStackTrace();
}
}

public ArrayList<String> searchFiles(String index,String field,
String queryString,int hitsPerPage) throws Exception {

ArrayList<String> returnStringList = new ArrayList<String>();

IndexSearcher searcher = new IndexSearcher(FSDirectory.open(new File(index)));
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_31);
QueryParser parser = new QueryParser(Version.LUCENE_31, field, analyzer);
Query query = parser.parse(queryString);
int numberOfPages = 5;

TopDocs results = searcher.search(query, numberOfPages * hitsPerPage);

ScoreDoc[] hits = results.scoreDocs;
int numTotalHits = results.totalHits;
System.out.println(numTotalHits + " total matching documents");
returnStringList.add(numTotalHits + " total matching documents");

int start = 0;
int end = Math.min(numTotalHits, hitsPerPage);
for (int i = start; i < end; i++) {
Document doc = searcher.doc(hits[i].doc);
String path = doc.get("path");
if (path != null) {
System.out.println((i + 1) + ". " + path);
returnStringList.add((i + 1) + ". " + path);
String title = doc.get("title");
if (title != null) {
System.out.println("   Title: " + doc.get("title"));
returnStringList.add("   Title: " + doc.get("title"));
}
}
}
searcher.close();
return returnStringList;
}
}

Comments

Popular posts from this blog

Google Chrome shortcut keys

If you are a Google Chromey guy, please find below the list of shortcut keys for some of the most used features  :-) Find more shortcut keys @  http://www.google.com/support/chrome/bin/static.py?page=guide.cs&guide=25799&topic=28650

ATG User Profile schema ER diagram

Check out the Product Catalog  schema ER-Diagram @  http://tips4ufromsony.blogspot.in/2012/01/atg-product-catalog-schema-er-diagram.html Check out the O rder schema ER-Diagram @   http://tips4ufromsony.blogspot.in/2012/02/atg-order-schema-er-diagram.html If you would like to know the relationship between different User Profile schema tables, please find below screen shot of  Profile schema ER Diagrams.  

Basic design decisions for a commerce search setup ( with an ATG Search view)

In this blog I would like to explain the basic set of configuration/design decisions needed to setup an ATG search project. Most of these design decisions are common for all Enterprise search applications. 1. Decide the searchable properties :   This means the properties that the business want the user to search in the ecommerce platform. In ATG search these are configured as the text properties in the product-catalog-output-config.xml ( the definitionFile of the \atg\commerce\search\ProductCatalogOutputConfig). Usually the displayName of product/sku, displayName of department/category/sub-category, skuId, brandName are the properties configured as searchable. 2. Decide the search refinement properties or the faceted properties :   After a user search for a keyword, search refinement is the next step done to filter his results. ATG supports the search refinement using the Faceted Search concept. Read more about facted search @...

SOAP UI faster start up

If you feel like your SOAP UI is starting up very slowly, check whether this is due to any start up web page call. You can check this @ Preferences - UI Settings - Show Startup Page ==> Here you can deselect this option to improve the start-up time.

ATG Order update - InvalidVersionException and ConcurrentUpdateException

ATG repository item descriptor can have the version property. The atg.adapter.gsa.ItemTransactionState holds this version information. For example consider the Order item-descriptor. It has the version property defined against the table dcspp_order. Means, the dcspp_order table has the column version which defines which version of order is currently in the DB. Each order update flow will update this column.  <property name="version" display-name-resource="version" data-type="int" queryable="true" readable="true" column-name="version" hidden="false" category-resource="categoryInfo" expert="true" required="false" cache-mode="inherit" writable="true">     <attribute name="uiwritable" value="false"/>     <attribute name="propertySortPriority" value="30"/>   </property> ------------------------------- Du...