December 21, 2024

Many to one field GROUP BY using RavenDB index

Performing a GROUP BY on a single field is pretty easy. The plan today is to leverage an advanced map reduce index to GROUP BY a many to one field. This will add complexity to the classic SQL query, but the RavenDB counterpart simply handles it really well.

As well as with the other tutorials from the series, I’m going to use the same app structure hosted in Github and the code for this tutorial will be pushed to the tutorial6-advanced-group-by branch. If you are unfamiliar with the concepts, please start from the first tutorial to familiarize with the concepts. You can also read more about indexes in the official RavenDB page.

The data we’re going to use

As with the previous tutorials, I’m going to use the same simple dataset which stores info about movies. A movie contains the name, year and a list of associated tags:

In order to make it easier for people with a SQL background, the equivalent table structure for PostgresSQL looks like this:

I guess the tags could also be stored as text arrays in Postgres, but I don’t know if other SQL offerings support that so I preferred to stay classy in this example.

What do we want to query?

I was contemplating about a single query which doesn’t seem like much still it can give some headaches: let’s say we want all the movies grouped by tags. Well how would we accomplish that in SQL? We know that we have some basic aggregate functions like COUNT or SUM, still they won’t help because we want a full list of those movies. So starting from this type of query we need to elaborate:

SELECT t.tag, COUNT(m) FROM "tags" t
INNER JOIN movies m ON m.id = t.movie_id
GROUP BY t.tag

Instead of using COUNT, we need some function that aggregates the rows into a JSON list. Luckily, we have in Postgres the JSON_AGG function which can be combined with JSON_BUILD_OBJECT in order to build a JSON object array. Hence, the new query would look like this:

SELECT t."tag", JSON_AGG(JSON_BUILD_OBJECT('id', t."movie_id", 'name', m."name", 'year', m."year")) as "movies" 
FROM "tags" t
INNER JOIN "movies" m ON m."id" = t."movie_id"
GROUP BY t."tag"

This will yield the expected movie list per each and every tag.

NOTE: your JSON aggregate function may be different depending on your SQL database, here I’m exemplifying with PostgreSQL. I’m sure all modern SQL systems have similar functions which you can leverage.

Doing it with map-reduce in RavenDB is no elbow grease

YES! You got that right! So applying the same in RavenDB is for sure different and requires another approach but it’s not complicated at all! If you followed through the last tutorial, practically half of the work is done.

Jumping straight into business, let’s code the MovieGroupByYearIndex:

export class MovieGroupByTagMap {
  public tag: string;
  public movies: MovieEntity[];
}

export class MovieGroupByTagIndex extends AbstractJavaScriptIndexCreationTask<
  MovieEntity,
  MovieGroupByTagMap
> {
  constructor() {
    super();

    this.map(new MovieEntity().collectionName, (doc) => {
      const tags = doc.tags || [];
      return tags.map(t => ({
        tag: t,
        movies: [doc],
        count: 1,
      }));
    });

    this.reduce((res) => {
      return res
        .groupBy((x) => x.tag)
        .aggregate((g) => ({
          tag: g.key,
          movies: g.values.reduce((movies, val) => [...movies, ...val.movies], []),
          count: g.values.length || 0,
        }));
    });
  }
}

In the map function, the trick is that you can return not just one document, but an array of documents. This way we return the same document multiple times depending on the tags it has. Then the reduce operation will group by each tag and append all the movies into a single array. I also added a count field just to have a count of each movies per tag. Don’t forget to add your index to the execution list in PersistenceService:

The next step is to add our index function call in the MovieRepo file which will look like this:

public async retrieveMoviesGroupByTag(): Promise<MovieGroupByTagMap[]> {
  const session = this.documentStore.openSession();

  const query = session.advanced
      .rawQuery<MovieGroupByTagMap>(
          `from index '${MovieGroupByTagIndex.name}'`,
      )
      .projection('FromIndex');

  const results = await query.all();

  results.forEach(r => r.movies.forEach(this.metadataRemove));

  session.dispose();

  // To remove @metadata which is unnecessary
  return results.map(this.metadataRemove);
}

Finally, the natural step is to add a new endpoint in MovieController and test it out with Postman:

@Get('by/tagMovies')
async getMoviesGroupByTag() {
  return await this.movieRepo.retrieveMoviesGroupByTag();
}

Conclusion

As you can see, performing SQL equivalents with map-reduce operations is not that scary at all, it just requires a bit of a change in the approach and mindset. Armed with these new acquired skill, you can tackle lots of interesting scenarios right off RavenDB, leveraging its unexplored capabilities. Hope you find this useful and I wait for your feedback in the comments section below. Alternatively, support this page by buying me a coffee and by subscribing to the newsletter. Cheers!

Thanks for reading, I hope you found this article useful and interesting. If you have any suggestions don’t hesitate to contact me. If you found my content useful please consider a small donation. Any support is greatly appreciated! Cheers  😉

afivan

Enthusiast adventurer, software developer with a high sense of creativity, discipline and achievement. I like to travel, I like music and outdoor sports. Because I have a broken ligament, I prefer safer activities like running or biking. In a couple of years, my ambition is to become a good technical lead with entrepreneurial mindset. From a personal point of view, I’d like to establish my own family, so I’ll have lots of things to do, there’s never time to get bored 😂

View all posts by afivan →