October 30, 2024

RavenDB many to many relationships – the most efficient way

In my last article I showcased a very straightforward way of implementing a one to many relationship in RavenDB with an index. This is the equivalent of a SQL join but in RavenDB “terms”, which can be further expanded to design a many-to-many relationship. Stay tuned as I explain a plain empiric way of building such a many-to-many relation but also the most efficient way that I can imagine with the power and flexibility of RavenDB indexing.

Like all the articles of this RavenNest series I will rely on the same simple Movies database structure which can be found here (RavenDB backup file). Since an actor can have multiple movies as well, it can be seen as a natural expansion of the previous article.

The plain empiric way

Many requirements can have multiple solutions, which may simply fall into empiric or heuristic categories. I definitely prefer the latter, since I enjoy having a challenge and I don’t give in so easily until I find the most efficient implementation of something. What I see as empiric in this scenario is the equivalent implementation in SQL transposed into our RavenDB database. Basically, in order to implement many-to-many relationships in SQL you would rely on a third table which acts like a bond between the other 2 connecting tables. SQL databases would rely on foreign keys for that join table to ensure data integrity. A schema of such a design would look like this:

You can notice the MovieActor join table which contains the movieId and actorId fields binding a movie to an actor and vice-versa.

What is the issue with this approach? Well, it clutters the tablespace with tables that are not relevant for the structure and it can also lead to performance problems on huge databases since JOINs are costly by nature.

Practically, nothing stops you to implement the same design in RavenDB with a join collection but I definitely believe it’s not a good approach. Why go noSQL if you use the same DB mindset? Going to noSQL databases requires a mindset switch or a blank slate.

The second plain empiric way

Hehe, I didn’t tell you from the beginning that there is another second empiric way of implementing it. Going back to the one-to-many article from last time you can observe that a movie can contain many actors, hence an actor can be in only one movie. Each movie in Movies collection has an actors array with the IDs of the actors who acted in the movie.

If we want to expand to a many-to-many relationship, why wouldn’t we add a movies array inside each actor entity as well? The matter would be sorted out that way and it’s not hard to implement and no other join collection is necessary right?

What is the issue with this approach? The problem wouldn’t be a performance problem but a data consistency issue if something goes wrong in code or in a problematic transaction. In this case we have to update data in 2 places hence the principle of single source of truth is broken. If we add a movie to an actor but we forget to do the same thing the other way around then we will encounter consistency issues so I don’t recommend this approach.

The most efficient way

What I expect from an efficient approach is to be:

  • Fast regardless of the database size
  • Single source of truth, meaning that same data should not be scattered about
  • No additional collection clutter, meaning that adding another entity collection just to store this kind of relationship is overkill
  • Flexible, in the sense that our data can be manipulated for the scope of presentation as effortlessly as possible

All of the above expectations fulfilled in this approach? I would say yes, so let’s proceed with the RavenDB index implementation 😀

Database structure

First of all, the database structure remains unchanged: each Movie entity will have this actors array with actor IDs. Two movies may have the same actor ID inside the actors array, which will enable the many-to-many relationship.

We store just the IDs because we don’t want to fill up unnecessary space and to ensure the retrieved data has the actor details up to date with the corresponding entry in the collection.

Our magic index construction

It’s actually amazing how powerful the RavenDB indexes are once you understand them conceptually (I use the Javascript indexes for better results). I see an index as some sort of transmutation of data in order to facilitate the query process and presentation capabilities of the application. Just like a ViewModel would do. Now let’s see how we can retrieve actors as well with the movies where they acted.

In our scenario I’m using an index with two maps and a reduce. RavenDB allows you to have as many maps as you like starting from different collections as long as their returned properties are the same. It also allows you to return an array of results for one map, which is very interesting and useful for our implementation. The reduce operation is the equivalent of GROUP BY in SQL and is used to group the results from our maps based on a property value and then aggregate the grouped results.

Map 1

map('Actors', (doc) => {
    return {
        id: doc['@metadata']['@id'],
        name: doc.name,
        movies: [],
    };
})

What happens here is that we take all actors from the Actors collection and return an object containing the id, name and an empty movies array. At this stage we don’t have movies and we initialize this property with an empty value just to conform to the RavenDB requirements of having the same type of results for each map.

Map 2

map('Movies', (doc) => {
    return doc.actors.map(a => { 
        return {
            id: a,
            name: null,
            movies: [{
                name: doc.name,
                tags: doc.tags,
                year: doc.year,
            }],
        }
    });
})

In this second map, we start from the Movies collection by mapping each actor in every movie to an object containing the actor ID, the actor name (which is null at this stage and will be filled by the reduce) and the movies array with current movie information.

The reduce operation

This is the stage where we conclude the data and join it to return our desired result:

groupBy(x => x.id).aggregate(g => {
    const movies = g.values.flatMap(a => a.movies);
    return {
        id: g.key,
        name: g.values.find(a => a.name !== null),
        movies: movies,
    };
})

From the Map 1 and 2 we observe that we have common properties but data is missing for either one of them except for the id:

  • Map 1 is missing the movies but has the actor name
  • Map 2 is missing the actor name but has the movies

The actor ID is the only common part between them. This is why we need the reduce stage, to bring results by actor ID in the same group and return the right result with all the complete data. In g.key we get the key by which the groupBy was determined (in our case actor ID) and g.values will return all the results from the Map 1 and Map 2 which have the same actor ID (key).

We use a flatMap to put all the movie data into a flat array and for the name we use a find since only the element returned by Map 1 will have the name. This is how we complete the data and return the final result. This is the whole index overview:

Testing the index

My philosophy is “Don’t implement anything without testing” so let’s actually see how this behaves. If you query the index you’ll notice all actors retrieved correctly, but only one movie for each one. This is because the database is small, I just didn’t add that many things there.

Let’s pick this famous Jason Momoa and add another movie for him. Just go to “New Document” and add this:

{
    "name": "Sweet Girl",
    "year": 2021,
    "tags": [
        "action",
        "drama"
    ],
    "actors": [
        "483660ed-e767-498f-b678-ac3210943712"
    ],
    "@metadata": {
        "@collection": "Movies",
        "@nested-object-types": {},
        "Raven-Node-Type": null
    }
}

You notice in actors array we have this unique ID in place. The other fields are just ordinary movie details. Save the document and query again the index, you should see something like this:

Okay, so there you go, we achieved a many-to-many relationship inside RavenDB 😎 🤝

Conclusion

It’s been nice to finally figure it out in the end. This many-to-many relationship has been bugging me for some time and I hope it can help in your case too. Should you have any questions or suggestions regarding the implementation just leave a comment or contact me!

Thanks for reading, I hope you found this article useful and interesting. If you have any suggestions don’t hesitate to contact me. If you found my content useful please consider a small donation. Any support is greatly appreciated! Cheers  😉

afivan

Enthusiast adventurer, software developer with a high sense of creativity, discipline and achievement. I like to travel, I like music and outdoor sports. Because I have a broken ligament, I prefer safer activities like running or biking. In a couple of years, my ambition is to become a good technical lead with entrepreneurial mindset. From a personal point of view, I’d like to establish my own family, so I’ll have lots of things to do, there’s never time to get bored 😂

View all posts by afivan →