BlueShoes PHP Plugins: IndexServer

What do you need the IndexServer for?

The strength of this package is that it can index all sorts of things (websites, file systems, files, database tables, ...). You feed the indexer with information and query it later.
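The feed-then-query idea can be illustrated with a toy inverted index. This is a minimal sketch in Python, used here only for illustration; it is not the IndexServer API, and the names TinyIndex, feed and query are hypothetical.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: maps each lowercase word to the ids of the items containing it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def feed(self, item_id, text):
        # Split the text into words and remember which item each word came from.
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add(item_id)

    def query(self, word):
        # Return the ids of all items that contain the word, in sorted order.
        return sorted(self.postings.get(word.lower(), set()))


index = TinyIndex()
index.feed("page1", "Welcome to South Park")   # e.g. a web page
index.feed("row42", "Park opening hours")      # e.g. a database row
print(index.query("park"))
```

Anything that can be turned into text (a page, a file, a table row) can be fed in under its own id, which is why one index can cover very different sources.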

Features:

Boolean search operators such as + and - (and, not).
Search for "fixed names" (e.g. "Bill Gates") using right neighbors.
Stemming, metaphone, soundex, and fast part-word searches like foo*, foo*bar and *foo.
Relevance weighting of the results.
Foreign keys (for db-related indexes).
Stopword lists (multilingual).
Settings via XML (only the basics are implemented so far).
For databases: settings calculated automatically from naming conventions and table structures (table scanning).
Returns "Did you mean xy?" hints after a search.
Support for different data formats, currently:
- strings,
- arrays,
- db tables,
- text files,
- built-in mime-type handlers for HTML, PDF, DOC and XLS,
- a generic interface for custom mime-type handlers
Automatic creation of the internal (MySQL) database tables.
Highlighting of the matched words in the results.
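
To make the + and - operators and the part-word patterns from the feature list concrete, here is a rough sketch in Python. It is an illustration under assumptions, not the plugin's implementation: the function names are hypothetical, and it scans all words linearly instead of using the fast index lookups the plugin provides.

```python
import re


def parse_query(query):
    """Split a query into required (+word), excluded (-word) and optional terms."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)
    return required, excluded, optional


def term_matches(term, words):
    """True if any word matches the term; '*' stands for any run of characters."""
    pattern = ".*".join(re.escape(part) for part in term.split("*"))
    regex = re.compile(pattern)
    return any(regex.fullmatch(word) for word in words)


def matches(query, text):
    """Decide whether a text satisfies a query such as '+park -kenny epi*'."""
    words = set(re.findall(r"\w+", text.lower()))
    required, excluded, optional = parse_query(query)
    if any(term_matches(term, words) for term in excluded):
        return False
    if not all(term_matches(term, words) for term in required):
        return False
    # With no required terms, at least one optional term must match.
    return bool(required) or any(term_matches(term, words) for term in optional)


print(matches("+park -kenny", "South Park episode"))
print(matches("+park -kenny", "Kenny in South Park"))
print(matches("epi*", "South Park episode"))
```

In this sketch a query that contains only excluded terms matches nothing; that is a design choice of the example, not a statement about how the plugin behaves.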

Examples:

South Park: Episode script files are indexed line by line.
The search on this website is done using this plugin.
Shakespeare: A file that is indexed line by line.
The download contains the Shakespeare example, the South Park example and a file system example: some directories with text, HTML, PDF, Word and Excel files.

How does the weighting work?

Once results have been found for the keywords, they need to be ordered by relevance. This is done by weighting the different parts of the content differently.

Weight points can be given to the different parts of the content that gets indexed. Default weight properties exist for the different data types (databases, HTML). Examples:
- The words in the title of a website are more important than the words in the body.
- A CHAR(20) database field is more important than a BLOB. Foreign key fields are even less important.
A count is maintained for each word, so the indexer knows whether a word is special or common in your application. 'madonna' may be a special word if you're indexing the world, but not if you're indexing a database of Madonna songs.
A word that is used 30 times in a text of 1,000 words is more important than a word that is used once in 10,000 words.
Long words are considered more special and are therefore more important when searching.
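
The four factors above can be combined into one score in many ways. The following Python sketch shows one possible combination; the formula, the field weights and all names are assumptions made for illustration and are not taken from the plugin.

```python
import math

# Hypothetical field weights: title words count more than body words.
FIELD_WEIGHTS = {"title": 5.0, "body": 1.0}


def score(word, fields, index_count, index_total):
    """Score one indexed item for one search word.

    fields:      dict mapping a field name to the list of lowercase words in it
    index_count: how often the word occurs in the whole index
    index_total: total number of words in the whole index
    """
    total_words = sum(len(words) for words in fields.values())
    if total_words == 0 or index_count == 0:
        return 0.0
    # Field-weighted occurrences relative to the length of the text.
    weighted_hits = sum(
        FIELD_WEIGHTS.get(name, 1.0) * words.count(word)
        for name, words in fields.items()
    )
    density = weighted_hits / total_words
    # Words that are rare in this particular index count more than common ones.
    rarity = math.log(1 + index_total / index_count)
    # Longer words are treated as slightly more special.
    length_bonus = 1 + len(word) / 10
    return density * rarity * length_bonus


in_title = {"title": ["madonna"], "body": ["songs", "list"]}
in_body = {"title": ["songs"], "body": ["madonna", "list"]}
# The same word scores higher when it appears in the title than in the body.
print(score("madonna", in_title, 10, 1000) > score("madonna", in_body, 10, 1000))
```

Because rarity is computed from the counts of the index at hand, the same word can be special in one application and common in another, as in the 'madonna' example.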

Download:

See the download page.
This component ships with the BlueShoes Framework.

Documentation

API-Doc

License:

Available with

BlueShoes "developer basic" license
BlueShoes commercial licenses

Check the license overview page for details.

Let us know

Have you done something interesting with BlueShoes or one of its components?