Skip to main content

Apache Lucene Search Engine’s Features


Apache Lucene is a high-performance, full featured text search engine library written entirely in Java. It is part of Apache Jakarta Project. Lucene was originally written by Doug Cutting in Java. While suitable for any application which requires full text indexing and searching capability, Lucene has been widely recognized for its utility in the implementation of Internet search engines and local, single-site searching. Lucene is Doug Cutting’s wife’s middle name !

Features

1. Scalable, High-Performance Indexing

  • Over 95GB/hour on modern hardware
  • Small RAM requirements — only 1MB heap
  • Incremental indexing as fast as batch indexing
  • Index size roughly 20-30% the size of text indexed


2. Powerful, Accurate and Efficient Search Algorithms

  • Ranked searching — best results returned first
  • Sorting by any field
  • Multiple-index searching with merged results
  • Allows simultaneous update and searching


3. Flexible Queries

  • Phrase queries –>  like “star wars” –> search for the full word star wars.
  • Wildcard queries  –> like star* or  sta?  –> search for a single character or multi character replacements for the search words
  • Fuzzy queries  –> like star~0.8  –> search for the similar words with some weightage
  • Proximity queries  –> like  ”star wars”~10 –> search for a “star” and “wars” within 10 words of each other in a document
  • Range queries  –>  like {star-stun}  –>  search for documents in between star and stun. Exclusive queries are denoted by curly brackets
  • Fielded searching   –>  fields like  title, author, contents
  • Date-range searching   –> like [2006-2007]  –>  search for documents with field value in between 2006 and 2007. Inclusive queries are denoted by square brackets
  • Boolean Operators  –>  like star AND wars . The OR operator is the default conjunction operator.
  • Boosting a Term –>  like star^4  wars –> make documents with term star more relevant
  • + Operator  –>  like +star wars –>  search for documents that must contain “star” and may contain “wars”
  • - Operator  –>  like star -wars –>  search for documents that contain “star” and not contains “wars”
  • Grouping –>  like (star AND wars) OR website –>  using parentheses to group clauses to form sub queries
  • Escape special character –>  The current list special characters are   + – && || ! ( ) { } [ ] ^ ” ~ * ? : \  . To escape these character use the \ before the character.


4. Cross-Platform Solution

  • Available as Open Source software under the Apache License which lets you use Lucene in both commercial and Open Source programs
  • 100%-pure Java
  • Implementations in other programming languages available that are index-compatible


At the core of Lucene's logical architecture is the idea of a document containing fields of text. This flexibility allows Lucene's API to be independent of the file format. Text from PDFs, HTML, Microsoft Word, and OpenDocument documents, as well as many others (except images), can all be indexed as long as their textual information can be extracted.
Index  --> sequence of documents ( Directory)
Document  -->  sequence of fields
Field  --> named sequence of terms
Term  --> a text string (e.g., a word)
Terms:
A search query is broken up into terms and operators. There are two types of terms: Single Terms and Phrases. A Single Term is a single word such as "test" or "hello". A Phrase is a group of words surrounded by double quotes such as "hello dolly". Multiple terms can be combined together with Boolean operators to form a more complex query.

Fields:
When performing a search you can either specify a field, or use the default field. You can search any field by typing the field name followed by a colon ":" and then the term you are looking for.

Comments

Popular posts from this blog

Display date and time for a DATE field @ SqlDeveloper

For date fields, by default SQL Developer will display only the date without time. To set it to display the time as well, do the following: Go to SQL Developer –> Tools >> Preferences. Select Database >> NLS Parameters from the left panel. From the list of NLS parameters, enter DD-MON-RR HH24:MI:SS into the Date Format field. Save and close

Income Tax process and e-filing

http://financeminister.in/income_tax_calculator.php https://incometaxindiaefiling.gov.in Below I am listing the step-by-step activities of the Tax Process that a working professional need to do in a given Financial Year. Here FY refers to Financial Year  and  AY refers to Assesment Year. Each month we will pay the Tax (From APRIL 20xx to MARCH 20xx+1) through our Employer for the FY 20xx – 20xx+1. In the month April 20xx, we will give the investment details to the employer (in our employer specified portal) for the FY 20xx – 20xx+1. In the month January 20xx+1, we will give the investment proof details , Rent receipts… to the Employer Finance Department for the FY 20xx – 20xx+1. In the month MAY/JUNE 20xx+1, employer gives the Form 16 for the FY 20xx – 20xx+1 to us (The proof given by the employer to the employee for the tax paid by the employee). In the month July 20xx+1 (on or before July 31st of every year), we will fill the ITR forms (earlier it was NayaSaral f...

Eclipse Kepler - good features of Eclipse 4.3 ( Eclipse Kepler ) JDT / Workbench

Detached windows with sash When you detach a view or editor into its own separate window, it now has all the capabilities of a normal workbench window. They now support multiple stacks of views separated by sashes with arbitrary layouts. The detached parts will remain synchronized with the master window that they were detached from. This is especially handy for people developing with two or more monitors, so they can spread views across several monitors and keep them synchronized. Import nested projects The Import Projects wizard now has an option to continue searching for projects to import recursively within any project it finds. This allows you to import physically nested projects at the same time. Open Resource dialog enhancements The Open Resource (Ctrl+Shift+R) dialog now offers direct access to the Show In and Open With menus via drop-down buttons. On platforms that support mnemonics, the buttons are also accessible via Alt+W and Alt+H. Whole word option...

Google Chrome shortcut keys

If you are a Google Chromey guy, please find below the list of shortcut keys for some of the most used features  :-) Find more shortcut keys @  http://www.google.com/support/chrome/bin/static.py?page=guide.cs&guide=25799&topic=28650

ATG User Profile schema ER diagram

Check out the Product Catalog  schema ER-Diagram @  http://tips4ufromsony.blogspot.in/2012/01/atg-product-catalog-schema-er-diagram.html Check out the O rder schema ER-Diagram @   http://tips4ufromsony.blogspot.in/2012/02/atg-order-schema-er-diagram.html If you would like to know the relationship between different User Profile schema tables, please find below screen shot of  Profile schema ER Diagrams.