

Showing posts from October, 2011

Lucene: sample Java code to index a file folder

The Lucene sample code below indexes the files inside a folder. It creates fields for the file path, file title, modified date and contents of each file. The program expects the index path (where the index files will be created) and the file folder path as program arguments, like "java IndexFiles [-index INDEX_PATH] [-docs DOCS_PATH]". The code iterates through each file in the folder and calls the method indexDoc(), which creates the fields listed above and adds them to a Document object. Each file therefore gets its own Document object, and these Document objects are added to the IndexWriter. Screenshot of the indexed file folder: import; import; import; import; import; import java.util.Date; import org.apache.lucene.analysis.Analyzer; import org.apache.lucene.ana
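The per-file iteration described in this post can be sketched in plain Java without the Lucene dependency. This is only an illustration of the control flow, not the post's actual code: the class name IndexFilesSketch is hypothetical, a Map stands in for Lucene's Document, a List stands in for the IndexWriter, the file name is assumed to serve as the title, the default index path is an assumption, and sub-folders are walked recursively, which the post does not confirm.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IndexFilesSketch {

    /** Parses "[-index INDEX_PATH] [-docs DOCS_PATH]". The default index path is an assumption. */
    public static Map<String, String> parseArgs(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("index", "index");
        opts.put("docs", null);
        for (int i = 0; i + 1 < args.length; i++) {
            if ("-index".equals(args[i])) {
                opts.put("index", args[++i]);
            } else if ("-docs".equals(args[i])) {
                opts.put("docs", args[++i]);
            }
        }
        return opts;
    }

    /** Builds one "document" per file. The Map is a stand-in for a Lucene Document. */
    public static Map<String, String> indexDoc(Path file) throws IOException {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("path", file.toString());
        doc.put("title", file.getFileName().toString()); // assumption: file name used as title
        doc.put("modified", Long.toString(Files.getLastModifiedTime(file).toMillis()));
        doc.put("contents", new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        return doc;
    }

    /** Visits every regular file under docsDir. The returned List is a stand-in for an IndexWriter. */
    public static List<Map<String, String>> indexDocs(Path docsDir) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(docsDir)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Map<String, String>> docs = new ArrayList<>();
        for (Path file : files) {
            docs.add(indexDoc(file));
        }
        return docs;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> opts = parseArgs(args);
        if (opts.get("docs") == null) {
            System.err.println("Usage: java IndexFilesSketch [-index INDEX_PATH] [-docs DOCS_PATH]");
            return;
        }
        List<Map<String, String>> docs = indexDocs(Paths.get(opts.get("docs")));
        System.out.println("Collected " + docs.size() + " documents for index '" + opts.get("index") + "'");
    }
}
```

In a real Lucene program, indexDoc() would add Field objects to a Document and pass it to the IndexWriter instead of returning a Map.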

Good features of Eclipse 3.5 (Eclipse Galileo) JDT

This post lists the new features of Eclipse Galileo JDT. I will write another post about the features of Eclipse Helios and Eclipse Indigo. Read about Eclipse Helios features @

1. Toggle Breadcrumb: shows the name of the file and the name of the method at your cursor position, at the top of the Eclipse IDE. From here you can go to other methods, other classes in the same package, ….
[Screenshot: Toggle Breadcrumb]

2. From a method call, you can go either to the declaration or to the implementation.
[Screenshot: implementation call]

3. Advanced Open Type: you can restrict Open Type to a selected working set only.
[Screenshot: Advanced Open Type]

Apache Lucene quick links

Lucene home page –>
Download Lucene from –>
Lucene API Doc –>
Lucene docs for each release –>
Where can I get help from –>
Lucene wiki –>
Lucene, how to improve search speed –>
Lucene, how to improve index speed –>
Lucene FAQ –>

Apache Lucene Search Engine’s Features

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It was originally written by Doug Cutting, started out as part of the Apache Jakarta Project, and is now a top-level Apache project. While suitable for any application that requires full-text indexing and searching capability, Lucene has been widely recognized for its utility in the implementation of Internet search engines and local, single-site searching. Lucene is Doug Cutting’s wife’s middle name!

Features

1. Scalable, High-Performance Indexing
   Over 95GB/hour on modern hardware
   Small RAM requirements: only 1MB heap
   Incremental indexing as fast as batch indexing
   Index size roughly 20-30% the size of text indexed

2. Powerful, Accurate and Efficient Search Algorithms
   Ranked searching: best results returned first
   Sorting by any field
   Multiple-index searching with merged results
   Allows simultaneous update and searching

3. Flexible Queries
   Phrase queries –> like “star wars” –> search fo