Archive for the 'Search Engines' Category

Solr Post.jar – post to different Solr port other than 8983

What if you want to post data to a Solr instance whose HTTP port is not the default 8983, using the post.jar that ships with the Solr package?

For example, suppose your Solr HTTP address is: http://localhost:8080/solr

To post data to this Solr index, pass the update URL as a system property (the -D option must come before -jar): java -Durl=http://localhost:8080/solr/update -jar post.jar *.xml
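If you would rather script the same thing, here is a minimal Python sketch of what the command above does: send each XML file as an HTTP POST to the update URL. The function names build_update_url and post_xml_file are made up for this example, and the text/xml content type is an assumption about what your Solr version expects.

```python
import urllib.request


def build_update_url(host="localhost", port=8080, path="/solr"):
    """Return the update URL for a Solr instance on a non-default port."""
    return f"http://{host}:{port}{path}/update"


def post_xml_file(update_url, file_path):
    """POST one XML file to the update URL and return the HTTP status code.

    Assumption: the update handler accepts a raw XML body sent as text/xml.
    """
    with open(file_path, "rb") as handle:
        body = handle.read()
    request = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling post_xml_file(build_update_url(), "docs.xml") needs a running Solr at that address, and the documents still have to be committed afterwards.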

That’s all… :)

Select/delete all items in Solr

To select all documents that have a value for a given field, use the range query some_item:[* TO *]. Documents in which this field is missing will not be matched.

To select all documents, run the same query against the unique key field defined in /conf/schema.xml, since every document must have a value for it. For example, with <uniqueKey>solr_id</uniqueKey>, use solr_id:[* TO *].
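The query contains characters (colon, brackets, asterisk, spaces) that have to be URL-encoded when you send it over HTTP. A small sketch of building the select URL, assuming the /select handler under the base URL from the previous post; select_all_url is a hypothetical helper name:

```python
from urllib.parse import urlencode


def select_all_url(base_url, field):
    """Build a select URL that matches every document with a value in `field`."""
    query = f"{field}:[* TO *]"
    # urlencode takes care of escaping the colon, brackets, asterisks and spaces.
    return f"{base_url}/select?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(select_all_url("http://localhost:8080/solr", "solr_id"))
```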

Now that you have all documents selected you can delete them :D

To delete all documents in Solr, post this update XML:

<delete><query>solr_id:[* TO *]</query></delete>

and of course you have to commit:

<commit />
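If the unique key has a different name in your schema, the two messages can be generated instead of typed by hand. A rough sketch using only the Python standard library; delete_all_xml and commit_xml are names invented for this example, and the output still has to be posted to the update URL:

```python
import xml.etree.ElementTree as ET


def delete_all_xml(unique_key):
    """Return a delete-by-query message matching every document with a unique key."""
    delete = ET.Element("delete")
    query = ET.SubElement(delete, "query")
    query.text = f"{unique_key}:[* TO *]"
    return ET.tostring(delete, encoding="unicode")


def commit_xml():
    """Return the commit message that makes the delete visible."""
    return ET.tostring(ET.Element("commit"), encoding="unicode")


if __name__ == "__main__":
    print(delete_all_xml("solr_id"))
    print(commit_xml())
```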

That’s all !

Source: http://blog.tremend.ro/2007/03/02/selectdelete-all-items-in-solr/

How Search Engines Work

“Spiders” take a Web page’s content and create key search words that enable online users to find pages they’re looking for.
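To make the idea concrete, here is a toy sketch of turning page content into searchable key words with an inverted index (a map from each word to the pages that contain it). This is only an illustration, not how any particular search engine does it; the page names and the build_index and search helpers are made up.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map every lowercase word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, word):
    """Return the sorted list of pages that contain the given word."""
    return sorted(index.get(word.lower(), set()))


if __name__ == "__main__":
    pages = {
        "a.html": "Solr search server",
        "b.html": "Search engines crawl pages",
    }
    print(search(build_index(pages), "search"))
```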

Source: http://computer.howstuffworks.com/search-engine1.htm

Facets and Tagging

RawSugar Faceted Search

Carrot2 – Open Source Framework for Building Search Clustering Engines

Carrot2 is an Open Source Search Results Clustering Engine. It can automatically organize (cluster) search results into thematic categories:

 Search results clustered with Carrot2 (live demo)

Carrot2 provides an architecture for acquiring search results from various sources (YahooAPI, GoogleAPI, MSN Search API, eTools Meta Search, Alexa Web Search, PubMed, OpenSearch, Lucene index, SOLR), clustering the results and visualising the clusters. Currently, 5 clustering algorithms are available that are suitable for different kinds of document clustering tasks.
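To give a feel for what "clustering the results" means, here is a deliberately naive sketch that groups result titles under the terms they share. This is not one of Carrot2's algorithms and does not use its API; the stopword list and the cluster_by_shared_terms helper are assumptions made up for illustration.

```python
from collections import defaultdict

# Tiny stopword list, chosen only for this example.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def cluster_by_shared_terms(titles, min_size=2):
    """Group titles under each non-stopword term that at least min_size titles share."""
    groups = defaultdict(list)
    for title in titles:
        # A set ensures a term repeated inside one title is counted once.
        for term in sorted(set(title.lower().split()) - STOPWORDS):
            groups[term].append(title)
    return {term: members for term, members in groups.items() if len(members) >= min_size}


if __name__ == "__main__":
    results = [
        "Jaguar car review",
        "Jaguar habitat in the rainforest",
        "Used car prices",
    ]
    for label, members in cluster_by_shared_terms(results).items():
        print(label, members)
```

Note that a title can land in more than one group here, and the labels are single words; real clustering engines go to some length to produce readable, non-redundant labels.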

Thanks to its flexible architecture, high quality and a friendly BSD-like license, Carrot2 has been successfully used in a number of commercial and research applications and resulted in a number of interesting publications. To get started, please have a look at live demos and the downloads section. If you have any questions or comments about Carrot2, please let us know.

For consulting services, installation, maintenance and text mining expertise, please contact the Carrot2 spin-off company called Carrot Search. Carrot Search offers Lingo3G — the third generation high-performance document clustering engine featuring hierarchical clustering, ontologies, synonyms and advanced tuning capabilities.

Source: http://project.carrot2.org/index.html

