Warning: include(./views/auth.php): failed to open stream: Permission denied in /var/www/html/live/loginRightSlider.php on line 18

Warning: include(): Failed opening './views/auth.php' for inclusion (include_path='.:/usr/share/php') in /var/www/html/live/loginRightSlider.php on line 18

Apache Solr

Updated on 19 February 2020

By Mukesh Kumar

7 min read 8 views

Updated on 19 February 2020

Share this article

Copy Link Link Copied

Solr is a scalable, ready to deploy, search/storage engine optimized to search large volumes of text-centric data. Solr is enterprise-ready, fast and highly scalable.

Solr is an open-source search platform which is used to build search applications. It was built on top of Lucene (full text search engine). Solr is enterprise-ready, fast and highly scalable. The applications built using Solr are sophisticated and deliver high performance.

It was Yonik Seely who created Solr in 2004 in order to add search capabilities to the company website of CNET Networks. In Jan 2006, it was made an open-source project under Apache Software Foundation. Its latest version, Solr 6.0, was released in 2016 with support for execution of parallel SQL queries.

Solr can be used along with Hadoop. As Hadoop handles a large amount of data, Solr helps us in finding the required information from such a large source. Not only search, Solr can also be used for storage purpose. Like other NoSQL databases, it is a non-relational data storage and processing technology.

In short, Solr is a scalable, ready to deploy, search/storage engine optimized to search large volumes of text-centric data.

Features of Apache Solr

Restful APIs − To communicate with Solr, it is not mandatory to have Java programming skills. Instead you can use restful services to communicate with it. We enter documents in Solr in file formats like XML, JSON and .CSV and get results in the same file formats.
Full text search − Solr provides all the capabilities needed for a full text search such as tokens, phrases, spell check, wildcard, and auto-complete.
Enterprise ready − According to the need of the organization, Solr can be deployed in any kind of systems (big or small) such as standalone, distributed, cloud, etc.
Flexible and Extensible − By extending the Java classes and configuring accordingly, we can customize the components of Solr easily.
NoSQL database − Solr can also be used as big data scale NOSQL database where we can distribute the search tasks along a cluster.
Admin Interface − Solr provides an easy-to-use, user friendly, feature powered, user interface, using which we can perform all the possible tasks such as manage logs, add, delete, update and search documents.
Highly Scalable − While using Solr with Hadoop, we can scale its capacity by adding replicas.
Text-Centric and Sorted by Relevance − Solr is mostly used to search text documents and the results are delivered according to the relevance with the user’s query in order.

Lucene in Search Applications

Lucene is simple yet powerful Java-based search library. It can be used in any application to add search capability. Lucene is a scalable and high-performance library used to index and search virtually any kind of text. Lucene library provides the core operations which are required by any search application, such as Indexing and Searching.

How do Search Engines Work?

Any search application is required to perform some or all of the following operations.

Step	Title	Description
1	Acquire Raw Content	The very first step of any search application is to collect the target contents on which search is to be conducted.
2	Build the document	The next step is to build the document(s) from the raw contents which the search application can understand and interpret easily.
3	Analyze the document	Before indexing can start, the document is to be analyzed.
4	Indexing the document	Once the documents are built and analyzed, the next step is to index them so that this document can be retrieved based on certain keys, instead of the whole contents of the document. Indexing is similar to the indexes that we have at the end of a book where common words are shown with their page numbers so that these words can be tracked quickly, instead of searching the complete book.
5	User Interface for Search	Once a database of indexes is ready, then the application can perform search operations. To help the user make a search, the application must provide a user interface where the user can enter text and initiate the search process
6	Build Query	Once the user makes a request to search a text, the application should prepare a query object using that text, which can then be used to inquire the index database to get relevant details.
7	Search Query	Using the query object, the index database is checked to get the relevant details and the content documents.
8	Render Results	Once the required result is received, the application should decide how to display the results to the user using its User Interface.

SOLR URL in your browser.

http://localhost:8983/

If the installation process is successful, then you will get to see the dashboard of the Apache Solr user interface as shown below.

SOLR General Terminology

The following is a list of general terms that are used across all types of Solr setups −

Instance − Just like a tomcat instance or a jetty instance, this term refers to the application server, which runs inside a JVM. The home directory of Solr provides reference to each of these Solr instances, in which one or more cores can be configured to run in each instance.
Core − While running multiple indexes in your application, you can have multiple cores in each instance, instead of multiple instances each having one core.
Home − The term $SOLR_HOME refers to the home directory which has all the information regarding the cores and their indexes, configurations, and dependencies.
Shard − In distributed environments, the data is partitioned between multiple Solr instances, where each chunk of data can be called as a Shard. It contains a subset of the whole index.

SolrCloud Terminology

In an earlier chapter, we discussed how to install Apache Solr in standalone mode. Note that we can also install Solr in distributed mode (cloud environment) where Solr is installed in a master-slave pattern. In distributed mode, the index is created on the master server and it is replicated to one or more slave servers.

The key terms associated with Solr Cloud are as follows −

Node − In Solr cloud, each single instance of Solr is regarded as a node.
Cluster − All the nodes of the environment combined together make a cluster.
Collection − A cluster has a logical index that is known as a collection.
Shard − A shard is portion of the collection which has one or more replicas of the index.
Replica − In Solr Core, a copy of shard that runs in a node is known as a replica.
Leader − It is also a replica of shard, which distributes the requests of the Solr Cloud to the remaining replicas.
Zookeeper − It is an Apache project that Solr Cloud uses for centralized configuration and coordination, to manage the cluster and to elect a leader.

Configuration Files

The main configuration files in Apache Solr are as follows −

Solr.xml − It is the file in the $SOLR_HOME directory that contains Solr Cloud related information. To load the cores, Solr refers to this file, which helps in identifying them.
Solrconfig.xml − This file contains the definitions and core-specific configurations related to request handling and response formatting, along with indexing, configuring, managing memory and making commits.
Schema.xml − This file contains the whole schema along with the fields and field types.
Core.properties − This file contains the configurations specific to the core. It is referred for core discovery, as it contains the name of the core and path of the data directory. It can be used in any directory, which will then be treated as the core directory.

Starting Solr

After installing Solr, browse to the bin folder in Solr home directory and start Solr using the following command.

[Hadoop@localhost ~]$ cd

[Hadoop@localhost ~]$ cd Solr/

[Hadoop@localhost Solr]$ cd bin/

[Hadoop@localhost bin]$ ./Solr start

This command starts Solr in the background, listening on port 8983 by displaying the following message.

Waiting up to 30 seconds to see Solr running on port 8983 [\]

Started Solr server on port 8983 (pid = 6035). Happy searching!

Starting Solr in foreground

If you start Solr using the start command, then Solr will start in the background. Instead, you can start Solr in the foreground using the –f option.

[Hadoop@localhost bin]$ ./Solr start –f