Elasticsearch DSL for searching and ranking information

Published: June 9, 2021 8 min. read

By Svitla Team

Web Development

In modern information systems, the amount of data grows every hour. Each user adds new information, which in turn increases the size of backups, logs, duplicate transactions, and so on. Searching this data effectively requires tools built for the task. Moreover, the volume of information can be so large that searching, sorting, and matching must be spread across multiple threads. Elasticsearch can help with this, and the Elasticsearch DSL library lets you work with it from Python. In this blog, we’ll talk about the basic possibilities for finding information using these tools.

What is Elasticsearch?

Elasticsearch is a freely available search server. It provides a distributed, multitenant full-text search engine with an HTTP web interface and support for schema-free JSON documents. The first version of Elasticsearch went live in February 2010. Elasticsearch can be used to index and search any type of document. It provides scalable search with near real-time support. Elasticsearch is distributed: an index can be divided into shards, and each shard can have zero or more replicas. Each node hosts one or more shards and acts as a coordinator that delegates operations to the correct shard. Balancing and routing are performed automatically.

The following points speak in favor of Elasticsearch:

In Elasticsearch, you can perform and combine different kinds of searches, regardless of the data type. Information can include structured, unstructured, geographic, metric, and other data types.
Libraries for different programming languages and HTTP API requests are supported.
A GET request can quickly retrieve data in the required form.
Elasticsearch can efficiently analyze billions of records in a matter of seconds.
The system provides aggregations to help you investigate trends and patterns in your data.
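To make the last point more concrete, here is a minimal sketch of what an aggregation request body could look like. The field names are taken from the sample records used later in this article, and `eyeColor.keyword` assumes the default dynamic mapping for string fields; the aggregation names `by_eye_color` and `average_age` are made up for illustration.

```python
import json

# Group documents by eye color and compute the average age in each group.
# "eyeColor.keyword" assumes the default dynamic mapping for string fields.
aggregation_request = {
    "size": 0,  # return only aggregation results, no individual hits
    "aggs": {
        "by_eye_color": {
            "terms": {"field": "eyeColor.keyword"},
            "aggs": {"average_age": {"avg": {"field": "age"}}},
        }
    },
}

# This JSON text is what would be sent as the body of a _search request.
print(json.dumps(aggregation_request, indent=2))
```

The body would be posted to the `_search` endpoint of an index; nothing here contacts a server.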

In a nutshell, Elasticsearch provides scale-out search with multithreading support. Search indexes can be divided into shards, and each shard can have multiple replicas. Each node can host multiple shards and acts as a coordinator that delegates operations to the correct shard; rebalancing and routing are automatic. Related data is often stored in the same index, which consists of one or more primary shards and possibly multiple replicas.
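The idea behind routing a document to a shard can be sketched in a few lines. This is a simplified illustration, not Elasticsearch's actual implementation: the function name `pick_shard` is hypothetical, and `zlib.crc32` is only a stand-in for the hash function Elasticsearch uses internally. By default the routing value is the document `_id`.

```python
import zlib


def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value (by default the document _id) to a primary shard number."""
    # zlib.crc32 is an assumption for this sketch, not the hash Elasticsearch uses.
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


# Document ids copied from the indexing output shown later in this article.
doc_ids = ["BTj4CnkB-8-eDhz1rKIb", "Bjj4CnkB-8-eDhz1rqJ9", "Czj4CnkB-8-eDhz1saI1"]
placement = {doc_id: pick_shard(doc_id, num_primary_shards=3) for doc_id in doc_ids}

for doc_id, shard in placement.items():
    print(f"{doc_id} -> shard {shard}")
```

Because the shard number depends on the number of primary shards, the same document would land elsewhere if that number changed, which is one reason the primary shard count is fixed when an index is created.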

Installing Elasticsearch

Installing Elasticsearch is not difficult. On macOS, for example, you can use Homebrew (brew). Other systems have corresponding installation tools, which are described in the documentation on the Elasticsearch site.

brew tap elastic/tap

Tapping the repository takes a moment and produces output similar to the following:

==> New Formulae
archey4             dory                llvm@11             marcli              organize-tool       stp                 zinit
conftest            gnupg@2.2           lychee              minisat             revive              webhook
csvtk               lefthook            macchina            mr2                 six                 xplr
==> Updated Formulae
Updated 680 formulae.
==> Renamed Formulae
fcct -> butane

==> Tapping elastic/tap
Cloning into '/usr/local/Homebrew/Library/Taps/elastic/homebrew-tap'...
remote: Enumerating objects: 870, done.
remote: Counting objects: 100% (111/111), done.
remote: Compressing objects: 100% (84/84), done.
remote: Total 870 (delta 63), reused 55 (delta 26), pack-reused 759
Receiving objects: 100% (870/870), 202.89 KiB | 490.00 KiB/s, done.
Resolving deltas: 100% (649/649), done.
Tapped 17 formulae (50 files, 319.4KB).

The next command installs the full Elasticsearch distribution.

brew install elastic/tap/elasticsearch-full

Then, add the following lines to your .bash_profile:

ES_HOME=/usr/local/var/homebrew/linked/elasticsearch-full
export ES_HOME

This will help you run Elasticsearch from the installation directory on your computer.

Starting and Testing the Elasticsearch Installation

To start Elasticsearch now and have it restart at login:

brew services start elastic/tap/elasticsearch-full

Or, if you don't need a background service you can just run:

elasticsearch

To test the Elasticsearch installation, type:

curl localhost:9200

This will produce the following output:

{
  "name" : "MacBook-Pro-K-2",
  "cluster_name" : "elasticsearch_konst1970",
  "cluster_uuid" : "O2BwRmCCQ8amY3CiZa7Bpg",
  "version" : {
    "number" : "7.12.0",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "78722783c38caa25a70982b5b042074cde5d3b3a",
    "build_date" : "2021-03-18T06:17:15.410153305Z",
    "build_snapshot" : false,
    "lucene_version" : "8.8.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}

This means that Elasticsearch version 7.12.0 is up and running.
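The same check can be done from Python with only the standard library. The sketch below is an illustration under stated assumptions: the helper names `fetch_root_info` and `describe` are hypothetical, and the sample is an abridged copy of the response shown above so that the parsing step runs without a server.

```python
import json
import urllib.request


def fetch_root_info(url="http://localhost:9200"):
    """Fetch the root endpoint of a running Elasticsearch node (not called below)."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def describe(info):
    """Summarize the version block of the root endpoint response."""
    version = info["version"]
    return f"Elasticsearch {version['number']} (Lucene {version['lucene_version']})"


# Abridged copy of the curl output shown above.
sample = json.loads("""
{
  "name": "MacBook-Pro-K-2",
  "version": {"number": "7.12.0", "lucene_version": "8.8.0"},
  "tagline": "You Know, for Search"
}
""")

print(describe(sample))  # prints: Elasticsearch 7.12.0 (Lucene 8.8.0)
```

With a node running locally, `describe(fetch_root_info())` would report the live version instead of the sample.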

Using Python and Elasticsearch

There are many ways to use Elasticsearch with the Python programming language. For instance, you can send HTTP requests to the Elasticsearch API with your favorite Python network library. Or you can use the official low-level Python client library, known as elasticsearch.
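As a rough sketch of the first option, a search request can be assembled with the standard library alone. The helper name `build_search_request` is hypothetical, and the index name and query are borrowed from the example later in this article. The request is only built here, not sent.

```python
import json
import urllib.request


def build_search_request(index, query, base_url="http://localhost:9200"):
    """Build (but do not send) an HTTP request for the _search endpoint of an index."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("my-index", {"match": {"company": "UPLINX"}})
print(request.get_method(), request.full_url)

# Sending it requires a running server:
#   with urllib.request.urlopen(request) as response:
#       hits = json.loads(response.read())["hits"]["hits"]
```

Writing requests by hand like this works, but the client libraries described next take care of connection handling and response parsing.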

On a higher level, it is possible to use Elasticsearch DSL library for Python to create more compact and effective code. As mentioned on their website: “Elasticsearch DSL is a high-level library whose aim is to help with writing and running queries against Elasticsearch. It is built on top of the official low-level client (elasticsearch-py). It provides a more convenient and idiomatic way to write and manipulate queries. It stays close to the Elasticsearch JSON DSL, mirroring its terminology and structure. It exposes the whole range of the DSL from Python either directly using defined classes or a queryset-like expressions. It also provides an optional wrapper for working with documents as Python objects: defining mappings, retrieving and saving documents, wrapping the document data in user-defined classes.”

To install the elasticsearch and elasticsearch_dsl libraries on your computer, use pip. This assumes the Elasticsearch engine is already installed on your system.

pip install elasticsearch
pip install elasticsearch_dsl

To try Elasticsearch with Python please run the following code:

from elasticsearch import Elasticsearch

es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

print(es)

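This prints the client object with its configured connection. Note that creating the client does not contact the server yet; calling es.ping() can be used to check that the server responds: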

<Elasticsearch([{'host': 'localhost', 'port': 9200}])>

Please note that Elasticsearch can place indexes into a read-only mode through the index.blocks.read_only_allow_delete setting, which typically happens when disk space runs low. If requests that add or remove records are rejected with a read-only error, you can reset this setting with the following request:

curl -XPUT -H "Content-Type: application/json" http://127.0.0.1:9200/_all/_settings -d '{"index.blocks.read_only_allow_delete": null}'

A successful request is acknowledged by the server.

Add, search, and delete information from Elasticsearch.

Let’s create a simple Python script to add, search, and delete information in Elasticsearch. The script uses data records produced by a JSON data generator. It works as follows:

First, the script creates a connection to the Elasticsearch server on port 9200.
Then, it deletes all records from the search index.
Then, the script adds records to the search index.
Finally, the script finds all records whose company matches UPLINX and whose age is greater than or equal to 20.

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Q
from elasticsearch_dsl import Search

# establish connection with Elasticsearch
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

print(es)

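# delete all records from elasticsearch: every sample record has "index" >= 0
# (this call raises NotFoundError if 'my-index' does not exist yet)
s1 = Search(using=es, index='my-index').query("range", index={'gte': 0})
response = s1.delete()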

data = [
 {
   "index": 0,
   "guid": "e5598a00-b2ed-437c-b13b-1e6f387bf23f",
   "isActive": False,
   "balance": "$2,428.26",
   "picture": "http://placehold.it/32x32",
   "age": 21,
   "eyeColor": "blue",
   "name": "Cherry Baird",
   "gender": "female",
   "company": "UPLINX",
   "email": "cherrybaird@uplinx.com",
   "phone": "+1 (957) 504-3326",
   "address": "585 Ludlam Place, Deseret, Virginia, 5268",
   "about": "Cupidatat reprehenderit mollit et qui pariatur enim est commodo non duis sit. Do mollit esse commodo ad pariatur dolore qui. Deserunt ullamco eiusmod cillum eiusmod pariatur do minim elit minim veniam incididunt ad Lorem est. Quis elit nostrud non sit dolore. Ea nulla velit enim nostrud Lorem.\r\n",
   "registered": "2015-03-18T11:29:59 -02:00",
   "latitude": 11.444065,
   "longitude": -104.466353,
   "tags": [
     "veniam",
     "amet",
     "nostrud",
     "ipsum",
     "pariatur",
     "ad",
     "sunt"
   ],
   "friends": [
     {
       "id": 0,
       "name": "Clements Fletcher"
     },
     {
       "id": 1,
       "name": "Stuart Mcintosh"
     },
     {
       "id": 2,
       "name": "Finch Cleveland"
     }
   ],
   "greeting": "Hello, Cherry Baird! You have 10 unread messages.",
   "favoriteFruit": "strawberry"
 },
 ... # add more records here
]

# add records to elasticsearch
for body in data:
  result = es.index(index='my-index', body=body)
  print(result)

# make the newly indexed records visible to search right away
es.indices.refresh(index='my-index')

# form query for search: match company and age
query = Q('match', company='UPLINX') & Q('range', age={'gte': 20})
s = Search(using=es, index='my-index').query(query)
response = s.execute()

# print search results
for hit in response:
   print(hit.name)

This script will generate the following output:

<Elasticsearch([{'host': 'localhost', 'port': 9200}])>
{'_index': 'my-index', '_type': '_doc', '_id': 'BTj4CnkB-8-eDhz1rKIb', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 130, '_primary_term': 5}
{'_index': 'my-index', '_type': '_doc', '_id': 'Bjj4CnkB-8-eDhz1rqJ9', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 131, '_primary_term': 5}
...
{'_index': 'my-index', '_type': '_doc', '_id': 'Czj4CnkB-8-eDhz1saI1', '_version': 1, 'result': 'created', '_shards': {'total': 2, 'successful': 1, 'failed': 0}, '_seq_no': 136, '_primary_term': 5}
Cherry Baird
Manning Maddox
Hawkins Wilson

Here, we found three records that satisfy the query: the company matches and the age is greater than or equal to 20. This simple example demonstrates how to build a search application with Python libraries on top of a powerful search system such as Elasticsearch.
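It can help to see what the script's query looks like as JSON. As a rough guide, combining two Q objects with & yields a bool query with two must clauses, so the request body should look approximately like the dictionary below. This is a hand-written sketch rather than output captured from the library; with the library installed, calling s.to_dict() on the Search object shows the exact body.

```python
import json

# Approximate JSON equivalent of:
#   Q('match', company='UPLINX') & Q('range', age={'gte': 20})
search_body = {
    "query": {
        "bool": {
            "must": [
                {"match": {"company": "UPLINX"}},
                {"range": {"age": {"gte": 20}}},
            ]
        }
    }
}

print(json.dumps(search_body, indent=2))
```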

Conclusion

Even with a system as complex as Elasticsearch, you can work through a Python library and solve search tasks quite effectively. The main computational load stays on the Elasticsearch side, so Python adds little overhead to the search itself. The Elasticsearch DSL library also offers a rich set of query-building features for finding the information you need in practice.

Svitla Systems' qualified software engineers have extensive experience with Elasticsearch and information retrieval systems. You can contact our company for software development, including complex software systems in which it is necessary to implement various methods of information retrieval. Also, our developers know how to correctly build a system architecture that will work as quickly as possible with large amounts of information and using cloud systems.


FAQ

What is Elasticsearch DSL and how does it work for searching data?

Elasticsearch DSL is a high-level Python library that makes writing and running queries against an Elasticsearch server simpler and more idiomatic. It is built on the official low-level elasticsearch-py client. It lets you build complex search queries with Python classes and expressions rather than raw JSON, while closely mirroring the native JSON query DSL of Elasticsearch. You can define, combine, and run searches, and also treat documents as Python objects, which simplifies much of the interaction with Elasticsearch indexes. This speeds up the work of searching, filtering, and analyzing large datasets.

How can I use Elasticsearch DSL to rank search results effectively?

Use Elasticsearch DSL to build queries that take advantage of the scoring and relevance features available in Elasticsearch. By combining match, term, and range queries with boost or function_score, among others, you control how results are scored and ordered. You can tune field weights or condition weights within the DSL itself so that the most pertinent documents stay at the top of the result list. This flexibility lets you customize ranking for your particular data and the needs of your users.
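As an illustration, here is one possible request body that combines a per-field boost with function_score. It is a sketch under assumptions: the field names come from the sample records in this article, and the boost and weight values are arbitrary. The DSL mirrors this JSON structure, so the same query can be expressed with its Python classes.

```python
import json

ranking_body = {
    "query": {
        "function_score": {
            # Base relevance: a match on "name" counts three times as much
            # as a match on the free-text "about" field.
            "query": {
                "bool": {
                    "should": [
                        {"match": {"name": {"query": "Cherry", "boost": 3.0}}},
                        {"match": {"about": {"query": "Cherry"}}},
                    ]
                }
            },
            # Documents of active users get their score multiplied by 2.
            "functions": [
                {"filter": {"term": {"isActive": True}}, "weight": 2.0}
            ],
            "boost_mode": "multiply",
        }
    }
}

print(json.dumps(ranking_body, indent=2))
```

Good values for boosts and weights depend on the data, so they are usually found by trying representative queries and checking the order of the results.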

What are the main differences between Elasticsearch DSL and simple query strings?

Elasticsearch DSL provides a structured way to build complex queries with Python classes and objects, which makes advanced search logic easier to create, combine, and maintain. In contrast, simple query strings are basic text-based queries with limited flexibility, well suited to simple searches. Aggregations and scoring rules can be defined in detail within the DSL, whereas query strings lack that expressiveness and become hard to maintain as requirements grow. Robust search applications are therefore usually better built with the DSL.

How do I build complex search queries using Elasticsearch DSL?

Start by importing Search and Q, then create a Search(using=es, index='your-index') object and compose Boolean logic with Q objects, using &, |, and ~ to represent must, should, and must not clauses. Within each Q, specify granular criteria such as match, term, range, or even nested queries, and apply boosts or function_score for custom relevance tuning. Because the DSL mirrors Elasticsearch’s JSON syntax, every Python expression is automatically translated into the equivalent JSON query body, keeping code readable while letting you nest filters, aggregations, and scripting as needed. Finally, call .execute() to run the search and iterate over the ranked results just as you would with any Python iterable.
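The mapping from the three operators to bool clauses can be sketched with plain dictionaries. The helper names `all_of`, `any_of`, and `none_of` are hypothetical and are not part of Elasticsearch DSL; they only show roughly which JSON structure &, |, and ~ correspond to.

```python
import json


def all_of(*clauses):
    """Every clause must match (roughly what & builds between Q objects)."""
    return {"bool": {"must": list(clauses)}}


def any_of(*clauses):
    """At least one clause should match (roughly what | builds)."""
    return {"bool": {"should": list(clauses)}}


def none_of(*clauses):
    """No clause may match (roughly what ~ builds)."""
    return {"bool": {"must_not": list(clauses)}}


# (company is UPLINX OR eye color is blue) AND NOT inactive AND age >= 20
query = all_of(
    any_of({"match": {"company": "UPLINX"}}, {"match": {"eyeColor": "blue"}}),
    none_of({"term": {"isActive": False}}),
    {"range": {"age": {"gte": 20}}},
)

print(json.dumps({"query": query}, indent=2))
```

The printed body can be compared with the output of to_dict() on a Search object to check how a particular DSL expression is translated.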

What are common challenges when using Elasticsearch DSL for information retrieval and ranking?

Typical challenges include designing queries that balance relevance with performance, especially with large or complex data sets. Tuning scoring and ranking so that the top results are consistently relevant can be difficult when combining several query types or custom scoring functions. Other issues that require planning include index structure, data mappings, and edge cases such as missing or inconsistent data. For newcomers to Elasticsearch query concepts, there is also a learning curve in debugging queries so that they are both fast and accurate.
