November 21, 2024

Full text search: PostgreSQL and RavenDB out of the box capabilities comparison

Hello guys! Hope you are all ok and in good shape to learn something interesting. Recently, I was trying to acquire some NoSQL skills with RavenDB. I chose it because I worked with it on a project and it has really interesting features out of the box. One of the features I find interesting is the full text search capability. It really seems to work with little configuration necessary, according to their documentation.

Then I was also extending this thought of how to make it work for an app that is not using RavenDB. This is when I thought of PostgreSQL because it’s reliable, fast and can also contain JSON documents to emulate NoSQL capabilities. For more info about Postgres JSON columns check here. Then I checked what full text search capabilities PGSQL has to offer and it seems to have it covered as well.

SQL or NoSQL? Which one to the rescue?

As we’ll explore in this article, we’ll see that no matter if your app is more suitable to a SQL or NoSQL architecture, both databases can be used with full text search capabilities. Full text search is a nice asset to have in your app. My intention is to showcase the out of the box capabilities of full text search of both databases, so I will not go into much detail about what’s under the hood. I’ll provide instead links where you can read more.

What are we going to search for?

In order to exemplify I’m going to use the same data set for each of the database, let’s say a list of candidates for a certain position, doesn’t matter. It will be represented as rows for PostgreSQL and as documents for RavenDB. The generic one will look like this:

First NameLast nameDescription
JohnDoeProefficient individual looking towards the blazing stars
OlympiaKleithAuthor, mother and business owner at Star WEB
JohnnyPritchardNobel prize winner for individual recognitions strategies
MandyKleithActor, most renown for supporting role in Stargate SG1 and strong communication capabilities
NonaVergemeierBlogger and traveler, owns a flower making business Nona Flowers
ElenaMontiguaLawyer, advocate for human rights, holds a PhD in International Communications Strategy

Very well, now that we have the data, let’s start playing with PostgreSQL!

Prerequisites

I was about to forget to tell you about what you need to proceed. Of course you need both PostgreSQL and RavenDB up and running on your local machine or a server. I recommend to create a separate database for the purpose of this article, in order to avoid jeopardizing other data. Since RavenDB comes out of the box with an administration UI you won’t need another tool for it. You can use PgAdmin for PostgreSQL. Other tools are available, but I didn’t try any other.

Prepare the data

First step is to prepare the data.

PostgreSQL

Since we’re dealing with a SQL database, we need to create a table for our data:


CREATE TABLE public.candidates
(
    id integer NOT NULL GENERATED BY DEFAULT AS IDENTITY ( INCREMENT 1 START 1 MINVALUE 1 MAXVALUE 2147483647 CACHE 1 ),
    "description" text COLLATE pg_catalog."default",
    "firstName" text COLLATE pg_catalog."default",
    "lastName" text COLLATE pg_catalog."default",
    textsearchable_index_col tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig, ((((COALESCE("firstName", ''::text) || ' '::text) || COALESCE("lastName", ''::text)) || ' '::text) || COALESCE(description, ''::text)))) STORED,
    CONSTRAINT candidates_pkey PRIMARY KEY (id)
)

As you can see we define the 3 fields we need for each candidate plus an id field and the searchable column which is defined as tsvector, a special Postgres type used to tokenize text for search. This one will be defined as a concatenation of the firstName, lastName and description fields.

Now the second step is to insert the data into the database:


INSERT INTO candidates ("firstName", "lastName", "description")
VALUES 
(
	'John', 'Doe', 'Proefficient individual looking towards the blazing stars'
),
(
	'Olympia', 'Kleith', 'Author, mother and business owner at Star WEB'
),
(
	'Johnny', 'Pritchard', 'Nobel prize winner for individual recognitions strategies'
),
(
	'Mandy', 'Kleith', 'Actor, most renown for supporting role in Stargate SG1 and strong communication capabilities'
),
(
	'Nona', 'Vergemeier', 'Blogger and traveler, owns a flower making business Nona Flowers'
),
(
	'Elena', 'Montigua', 'Lawyer, advocate for human rights, holds a PhD in International Communications Strategy'
)
)

RavenDB

It’s time now to create the data in our RavenDB server. Unfortunately, I don’t know exactly how to bulk add data from the RavenDB UI, so all of these documents have to be added manually. If you notice, there is a metadata field where the @collection property will group these documents into SearchDocs collection. We will need this since we’ll create an index from this collection in the next step.


{
    "firstName": "John",
    "lastName": "Doe",
    "description": "Proefficient individual looking towards the blazing stars",
    "@metadata": {
        "@collection": "SearchDocs"
    }
}
{
    "firstName": "Olympia",
    "lastName": "Kleith",
    "description": "Author, mother and business owner at Star WEB",
    "@metadata": {
        "@collection": "SearchDocs"
    }
}
{
    "firstName": "Johnny",
    "lastName": "Pritchard",
    "description": "Nobel prize winner for individual recognitions strategies",
    "@metadata": {
        "@collection": "SearchDocs"
    }
}
{
    "firstName": "Mandy",
    "lastName": "Kleith",
    "description": "Actor, most renown for supporting role in Stargate SG1 and strong communication capabilities",
    "@metadata": {
        "@collection": "SearchDocs"
    }
}
{
    "firstName": "Nona",
    "lastName": "Vergemeier",
    "description": "Blogger and traveler, owns a flower making business Nona Flowers",
    "@metadata": {
        "@collection": "SearchDocs"
    }
}
{
    "firstName": "Elena",
    "lastName": "Montigua",
    "description": "Lawyer, advocate for human rights, holds a PhD in International Communications Strategy",
    "@metadata": {
        "@collection": "SearchDocs"
    }
}

Once you’ve added those into the database, it’s time to set up a new index that will be used to search for matching candidates. Similarly to the PostgreSQL part, this index will concatenate the firstName, lastName and description fields. The definition however will be totally different:


map("SearchDocs", (doc) => {
    return {
        firstName: doc.firstName,
        lastName: doc.lastName,
        description: doc.description,
        searchField: doc.firstName + " " + doc.lastName + " " + doc.description
    }
})

RavenDB gives you the option to write indexes with a LINQ-like syntax or with Javascript. I prefer Javascript because it seems to be more concise. Before saving the index, make sure you enable the search capability over searchField:

Now you can save the index and we are ready for the big step: Searching candidates! Yuhuu!

Searching

The queries needed to search for candidates are different but we expect to get similar results for each database.

PostgreSQL query


WITH qConstants (humanSearch) as (
   values ('TERM')
)

SELECT "firstName", "lastName", "description", ts_rank_cd(textsearchable_index_col, query) AS rank
FROM candidates, qConstants, websearch_to_tsquery(humanSearch) query
WHERE textsearchable_index_col @@ query
ORDER BY rank DESC

In this query we define the humanSearch variable in the WITH block. Then we have a query that takes the humanSearch content and returns matching rows from the table also with the default ranking mechanism supported by PostgreSQL. You can replace TERM with whatever content you want to look for.

RavenDB query


from index 'CandidatesSearch' where search(searchField , 'TERM') 

In comparison to the PostgreSQL query, this one is a lot easier. We just specify to get records from CandidatesSearch index where the search method finds matches over the searchField property. Again, you can replace TERM with whatever you want to look for.

It is important to bear in mind that neither of the databases look for matches inside a word, they will match the whole word by default. For instance, if you search for “and” then “Andrei” will not be matched. To enable it:

  • PostgreSQL: replace websearch_to_tsquery in the above query with to_tsquery and append “:*” characters. The search term will be “and:*”
  • RavenDB: add an asterix at the end of the search term to produce something like “and*”. According to their documentation, this will slow down searches.

A couple of search results

Good, let’s do now a couple of searches with both databases and see what they return. As we will notice, there are a couple of differences between them.

TermResults – PostgreSQLResults – RavenDB
John1. “John” “Doe” “Proefficient individual looking towards the blazing stars”1. “John” “Doe” “Proefficient individual looking towards the blazing stars”
business1. “Olympia” “Kleith” “Author, mother and business owner at Star WEB”
2. “Nona” “Vergemeier” “Blogger and traveler, owns a flower making business Nona Flowers”
1. “Olympia” “Kleith” “Author, mother and business owner at Star WEB”
2. “Nona” “Vergemeier” “Blogger and traveler, owns a flower making business Nona Flowers”
blogno resultsno results
support1. “Mandy” “Kleith” “Actor, most renown for supporting role in Stargate SG1 and strong communication capabilities”no results
capability1. “Mandy” “Kleith” “Actor, most renown for supporting role in Stargate SG1 and strong communication capabilities”no results
travel1. “Nona” “Vergemeier” “Blogger and traveler, owns a flower making business Nona Flowers”no results
efficientno resultsno results
mostno results1. “Mandy” “Kleith” “Actor, most renown for supporting role in Stargate SG1 and strong communication capabilities”
communication1. “Mandy” “Kleith” “Actor, most renown for supporting role in Stargate SG1 and strong communication capabilities”
2. “Elena” “Montigua” “Lawyer, advocate for human rights, holds a PhD in International Communications Strategy”
1. “Mandy” “Kleith” “Actor, most renown for supporting role in Stargate SG1 and strong communication capabilities”
make1. “Nona” “Vergemeier” “Blogger and traveler, owns a flower making business Nona Flowers”no results

And the list can go on, but we have enough data to talk about some facts

What conclusions can we draw

We can notice that for words that are fully matched the results are the same. If there is a match inside a word, but not the full word, the result is not returned. This is how both PostgreSQL and RavenDB work in regards to full text search. But we can use the wildcard operator as explained above and this can be mitigated. I think PostgreSQL doesn’t allow it as prefix, only as suffix.

The next thing I wanted to highlight is that for terms that are contained as different forms they are returned by PostgreSQL. This is why in our table we have results returned by Postgres, whereas RavenDB returns nothing. PostgreSQL has some kind of dictionary for English (also available for certain other languages) that detects different forms of words like we can see below:

  1. support => supporting
  2. capability => capabilities
  3. travel => traveler
  4. communication => communications
  5. make => making

Still this doesn’t yield any result for “blog”. I would expect “blogger” to be matched. I think it’s quite a setback for RavenDB to not have such a feature. While it may exist, I think it can be mitigated by using wildcards in searches.

The last thing to notice is that common English words are ignored by PostgreSQL. That’s why if we search for “most”, nothing will be returned by Postgres, while RavenDB returns one result. So I would say that both databases offer very decent full text search capabilities by default and I would say I like the PostgreSQL implementation because it returns more relevant results. RavenDB offers it without much hassle but may lack in terms of customization. I’m sure every application has its own set of requirements regarding search and I can bet that PostgreSQL will handle them with a bit of customization headache, just that RavenDB will work for most scenarios just fine.

Well, I think that’s quite about it for now, Hope you found this interesting. Please subscribe and contact me if you are interested in more articles in the future 😉

Thanks for reading, I hope you found this article useful and interesting. If you have any suggestions don’t hesitate to contact me. If you found my content useful please consider a small donation. Any support is greatly appreciated! Cheers  😉

afivan

Enthusiast adventurer, software developer with a high sense of creativity, discipline and achievement. I like to travel, I like music and outdoor sports. Because I have a broken ligament, I prefer safer activities like running or biking. In a couple of years, my ambition is to become a good technical lead with entrepreneurial mindset. From a personal point of view, I’d like to establish my own family, so I’ll have lots of things to do, there’s never time to get bored 😂

View all posts by afivan →