December 22, 2024

RavenDB noSQL โ€“ group by SQL equivalent explained

Hello guys and thank you for your interest in a new article about RavenDB. Hope you are enjoying the hot summer to the maximum! Today I was envying the birds who can fly in the sky and thus avoiding the heat. They also get a nice overview of the places they fly over. Ravens, I particularly like them because they are smart and can predict behavior of other animals. Similar to us people, they group together …

Leaving biology aside, did I just say raven and group? Yes, this leads me to the question of how to perform a GROUP BY in RavenDB with indexes. Because there is a group statement in Raven, but behind the hood it uses an index to perform this operation, we will check out to see how it works. Might be confusing at first, but it makes sense once you get used to it. In this article I’m going to show you how to do it! Regarding manual index creation, I prefer Javascript since you have a greater deal of control and I’m working on an application that is written in Typescript/Javascript so it’s much easier to create indexes in JS ๐Ÿ˜€

GROUP BY example in SQL

Let’s see a sample GROUP BY clause in SQL (Postgres syntax). We have a table with people, each person with a name and location. We want to retrieve the number of people for each location. The SQL would look like this:


create table people (id integer, name varchar(100), location varchar(100));
  insert into people (id, name, location) values (1, 'Chris Borrington', 'IR');
  insert into people (id, name, location) values (2, 'Maria Muierescu', 'RO');
  insert into people (id, name, location) values (3, 'Jakamoto Sujicucu', 'JP');
  insert into people (id, name, location) values (4, 'Johnny Bravo', 'US');
  insert into people (id, name, location) values (5, 'Natzaki Sulaki', 'JP');
	
select location, COUNT(*) from people GROUP BY location;

Running the above query will give us the following results:

As you can see we have 2 people from JP and for the rest we have only 1 person. The GROUP BY query is very simple and returns expected results.

The group statement in RavenDB

Now to the RavenDB part: let’s assume we have the same data structure as with the SQL example. If we go to the query editor and write the following query, it will return the same results as the SQL counterpart. And the query is quite the same:

from 'People'
group by location
select count(), location

If we run the above query, we will the the very same results:

It’s interesting what happens behind the hood when it comes to RavenDB. If you have a larger data set, you may notice that the first run will take a bit longer but subsequent runs of this query will run much faster not because of the cache, but because RavenDB actually creates an automatic index to handle this request for you. If you go to the Indexes section of your database, you’ll see this new index called Auto/People/ByCountReducedByLocation:

If you open this index, you’ll see it’s some kind of visually defined index which takes data from the People collection, it reduces by the location key and applies the Count() result:

How to create the same index but manually

Now that we have this index automatically created, why even bother with a manual one? Well, in a real world scenario you’ll for sure need to implement a more advanced index to tackle business requirements. By using a manual index with Javascript, you’ll for sure be able to implement those requirements ๐Ÿ˜‰

Let’s now jump to the index creation by selecting New Index from the RavenDB Studio. This new index will have the name “PersonsCountByLocation” with both map and reduce which look like this:

// Map
map('People', p => {
    return {
        id: p['@metadata']['@id'],
        name: p.name,
        location: p.location
    };
})

// Reduce
groupBy(p => ({ location: p.location }))
.aggregate(g => { 
    return {
        location: g.key.location,
        count: g.values.length,
    };
})

You can now save this new index and then see the results by pressing the corresponding Query button. They would look like this:

As you can observe, we’ve got the same results, except that they are not sorted. We can add an order by count in the query, but we’ll get a message stating that “count” it’s not an indexed field. However, the “location” field is indexed by default since it’s part of a reduce operation. However, we can go back to the index to edit the indexed fields as below and the query with order by count will work:

Try to rerun the query and you can see the order by now works:

Conclusion

I really like RavenDB for its speed and reliability. It can handle any kind of workload in any situation and will not sit on laurels when times are harsh ๐Ÿ™‚ Being a noSQL database system, it’s clear that the paradigms are different than a traditional SQL database but Raven makes it as easy as possible to adapt to it. As proof, you saw how similar the group by query was. Still, the indexing part can be confusing so I totally recommend to understand it properly in order to use it effectively. So there you go, I hope you found this article useful ๐Ÿ˜

Thanks for reading, I hope you found this article useful and interesting. If you have any suggestions donโ€™t hesitate toย contact me. If you found my content useful please consider a small donation. Any support is greatly appreciated! Cheersย ย ๐Ÿ˜‰

afivan

Enthusiast adventurer, software developer with a high sense of creativity, discipline and achievement. I like to travel, I like music and outdoor sports. Because I have a broken ligament, I prefer safer activities like running or biking. In a couple of years, my ambition is to become a good technical lead with entrepreneurial mindset. From a personal point of view, Iโ€™d like to establish my own family, so Iโ€™ll have lots of things to do, thereโ€™s never time to get bored ๐Ÿ˜‚

View all posts by afivan →