patternMinor
Is there a fundamental CS problem in ORMs that leads to N+1?
Viewed 0 times
problemormsthatfundamentalthereleads
Problem
Many of us are familiar with N+1 problem when working with database queries. The problem was known before ORMs (Object-relational mapping frameworks) came around, but it seems that ORMs exacerbated it.
The problem goes like this. If your database has a table of cars and each car has a list of wheels (stored in another table), if you first query for all cars (1) and then for each car you query for its wheels (N) you get orders of magnitude less efficient query because you in fact have N+1 queries, whereas you could just have one.
I'll show you one scenario in which it is difficult for a less experienced developer to spot that they have an N+1 problem while using ORM.
Let's assume we have the following tables definition in Microsoft Sql Server:
There is nothing fancy here, the main thing is that we have two tables that have a foreign key relationship.
Now let's assume that we use EntityFramework, and somewhere in the code we have a line like this:
Now the scores list gets passed around a few methods, and somewhere else it's used like this:
This last line will cause a separate query to be executed for each Article object. When the number of objects are large, the overhead as we all know serious.
Full example that can be run and examine
The problem goes like this. If your database has a table of cars and each car has a list of wheels (stored in another table), if you first query for all cars (1) and then for each car you query for its wheels (N) you get orders of magnitude less efficient query because you in fact have N+1 queries, whereas you could just have one.
I'll show you one scenario in which it is difficult for a less experienced developer to spot that they have an N+1 problem while using ORM.
Let's assume we have the following tables definition in Microsoft Sql Server:
CREATE TABLE [dbo].[Article](
[Id] [int] IDENTITY(1,1) NOT NULL,
[Title] [nvarchar](max) NULL,
CONSTRAINT [PK_dbo.Article] PRIMARY KEY CLUSTERED ([Id] ASC)
)
GO
CREATE TABLE [dbo].[ArticleScore](
[Id] [int] IDENTITY(1,1) NOT NULL,
[ArticleId] [int] NOT NULL,
[ActualCity] [nvarchar](10) NULL,
[Score] [int] NOT NULL,
CONSTRAINT [PK_dbo.ArticleScore] PRIMARY KEY CLUSTERED ([Id] ASC)
)
GO
ALTER TABLE [dbo].[ArticleScore] WITH CHECK ADD CONSTRAINT [FK_dbo.ArticleScore_dbo.Article_ArticleId] FOREIGN KEY([ArticleId])
REFERENCES [dbo].[Article] ([Id])
GOThere is nothing fancy here, the main thing is that we have two tables that have a foreign key relationship.
Now let's assume that we use EntityFramework, and somewhere in the code we have a line like this:
var scores = context.ArticleScores.Where(x => x.Score < 100).ToList();Now the scores list gets passed around a few methods, and somewhere else it's used like this:
var filtered = scores.Where(x => x.Article.Title != "trash").ToList();This last line will cause a separate query to be executed for each Article object. When the number of objects are large, the overhead as we all know serious.
Full example that can be run and examine
Solution
Challenge #1: Imperative code
You've shown code that is in a nicely functional form. But in many languages, in many cases, the code won't be in that nice form. Imagine if instead of
the code is instead
Now you're hosed: you're stuck with n+1 queries, and there's no hope for the ORM to avoid that. In many (imperative) programming languages, you'd be lucky for the code to show up in the former form; it'll more typically be in the latter form.
Challenge #2: Static analysis of code
Let's go back to your original example:
How can the ORM transform this into a single SQL query? To do that, it would have to do non-trivial static analysis of the source code. This can't be handled just through a simple library.
From the ORM's perspective, it is passed a function that determines which articles should be kept; it knows nothing about what the function does -- it can just call the function. That's not enough to do the transformation you want. To do the transformation you want, one would have to inspect the code of that function and figure out what it is doing and then somehow turn that into a SQL query. That is decidedly non-trivial (and might require both compiler support and non-trivial static analysis algorithm).
Put another way, from the ORM's perspective, the above code is equivalent to
where the ORM knows nothing about what
Fixing this would require a way to take code written in the underlying language and compile it to SQL. That basically means building a second compiler for the underlying programming language, and it's not easy.
In some languages, the ORM library actually obtains more than just a black-box function
You've shown code that is in a nicely functional form. But in many languages, in many cases, the code won't be in that nice form. Imagine if instead of
var filtered = scores.Where(x => x.Article.Title != "trash").ToList();the code is instead
var filtered = new List();
for (x in scores)
if (x.Article.Title != "trash")
filtered.append(x);Now you're hosed: you're stuck with n+1 queries, and there's no hope for the ORM to avoid that. In many (imperative) programming languages, you'd be lucky for the code to show up in the former form; it'll more typically be in the latter form.
Challenge #2: Static analysis of code
Let's go back to your original example:
var filtered = scores.Where(x => x.Article.Title != "trash").ToList();How can the ORM transform this into a single SQL query? To do that, it would have to do non-trivial static analysis of the source code. This can't be handled just through a simple library.
From the ORM's perspective, it is passed a function that determines which articles should be kept; it knows nothing about what the function does -- it can just call the function. That's not enough to do the transformation you want. To do the transformation you want, one would have to inspect the code of that function and figure out what it is doing and then somehow turn that into a SQL query. That is decidedly non-trivial (and might require both compiler support and non-trivial static analysis algorithm).
Put another way, from the ORM's perspective, the above code is equivalent to
var filtered = scores.Where(f).ToList();where the ORM knows nothing about what
f does; all the ORM can do is invoke f on values of its choice. When you don't know what f does, there's no way to transform that into a single SQL query.Fixing this would require a way to take code written in the underlying language and compile it to SQL. That basically means building a second compiler for the underlying programming language, and it's not easy.
In some languages, the ORM library actually obtains more than just a black-box function
f: by using support for domain-specific languages or reflection, the ORM library might be able to obtain a parse tree for f. In this case, the ORM library has more to work with, and in principle the ORM library could use static analysis of the parse tree of f to try to compile it to a single SQL query. It will be language-dependent whether an ORM library can do this, purely as a library, without compiler support. So in some languages achieving what you want will be easier than in others.Code Snippets
var filtered = scores.Where(x => x.Article.Title != "trash").ToList();var filtered = new List();
for (x in scores)
if (x.Article.Title != "trash")
filtered.append(x);var filtered = scores.Where(x => x.Article.Title != "trash").ToList();var filtered = scores.Where(f).ToList();Context
StackExchange Computer Science Q#57044, answer score: 3
Revisions (0)
No revisions yet.