HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

A relational algebra extended to model the full DML ("CRUD") domain

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
thefulldomainextendedrelationaldmlalgebramodelcrud

Problem

There are multiple references about the relational algebra for modeling queries (SELECT) but I have found very very little on the expanded algebra that would include concepts in all of DML such as INSERT, UPDATE, DELETE and maybe even MERGE, DECLARE, SET (as defined by MS SQL Server, whenever I must choose). I'm looking for information about such an expanded algebra to help me reason about DML operations.

I assume that MS SQL Server, for example, must have some meta-domain model and/or type system that helps it validate SQL Scripts. Here are a few of SQL SERVER 2008's SYS.MESSAGES (issued for scripted DML operations) that suggest to me violations of a type or domain-rule nature, not syntactic:

286 - The logical tables INSERTED and DELETED cannot be updated.
417 - TOP is not allowed in an UPDATE or DELETE statement against a partitioned view.
10729 - A nested INSERT, UPDATE, DELETE, or MERGE statement is not allowed in a SELECT statement that is not the immediate source of rows for an INSERT statement.
354 - The target '%.*ls' of the INSERT statement cannot be a view or common table expression when the FROM clause contains a nested INSERT, UPDATE, DELETE, or MERGE statement.
4005 - Cannot update columns from more than one underlying table in a single update call.


The database engine is compiling and analyzing the scripted commands it is given and emitting these messages in response to violations against its domain.

An engine executing this following statement isn't merely tacking on something trivial to the join of Table1 and Table2. Beyond what it needed to SELECT, for the UPDATE it has to track more information throughout the statement such as the actual tables that ColB and ColC in the resulting join came from so that 1) it knows that they are from the same table (Error #4005) and 2) so it can actually write the results to somewhere permanent (not just a local transformation of the join):

```
UPDATE Table2
SET
Col

Solution

Excellent question, and since you referred to us ("jOOQ developers", which I am - working for the company behind jOOQ), I feel qualified to give a partial answer.

A bit of historic context first

Since the very beginning of software, there had been:

  • Theory (which is what "Computer Science", i.e. this Stack Exchange subsite is about)



  • Practice (more like Stack Overflow, i.e. "how can I get the compiler to stop barking at me")



The origins of set theory, relational algebra, and relational calculus were clearly on the theoretical side. Up to this day, popular RDBMS implementations have only managed to approximate this theory, mostly through SQL.

Although there had been competing languages before SQL was standardised mainly by Oracle and IBM, the power of SQL lay in the fact that it was a very practical language, which worked sufficiently well both for computer programs (e.g. written in COBOL, which was an equally practical language), software engineers, and most importantly: "other people", including analysts, project managers, and also managers. Everyone can quickly write up a SELECT * FROM person WHERE first_name = 'John' query. More theoretical languages like QUEL, which would have been closer to relational calculus, lacked the business vision that Oracle had.

Since then, SQL has emancipated from the outdated relational model and embraced a lot of alternative models, including procedural extensions (stored procedures), OLAP features, XML, JSON, etc. Not all of these enhancements were very well executed. To this day, the XML integration, for instance, is rather weak, poorly standardised and hard to grasp. I'm just speculating that the ANSI/ISO/IEC SQL standards committee and the w3c didn't work together closely enough to truly embed XQuery into SQL at the time. Unfortunately for w3c, as hardly anyone is using XQuery today, while everyone still uses SQL.

The historic irony here is that a lot of innovation came from academia, and I'd like to specifically mention Michael Stonebraker (Quel, Ingres, the original Postgres, Vertica, H-Store, and much more) who was recently awarded with the well-deserved turing award. Stonebraker has always been innovating into new areas. Most of the recent in-memory column store "hype" (also often referred to as "NewSQL") evolved around him. Stonebraker is a good example of the difference between academia and business. We've blogged about this some time ago here:

  • http://blog.jooq.org/2013/08/24/mit-prof-michael-stonebraker-the-traditional-rdbms-wisdom-is-all-wrong (very interesting talk by Stonebraker)



  • http://blog.jooq.org/2013/08/27/column-stores-teaching-an-old-elephant-new-tricks



When you hear the excellent talks by Oracle's Andrew Mendelsohn for instance, or other influencers from companies like Microsoft, SAP, IBM, you will notice that business has always embraced SQL for very good reasons, which were often not at all academic.

Context for your question

You have to differentiate between around four levels of abstraction:

  • Relational calculus



  • The SQL standard



  • The SQL Server specification



  • The SQL Server implementation



Your example is a particularly interesting one, because UPDATE .. SET .. FROM is not part of the latest SQL:2011 standard. To our knowledge, the UPDATE .. FROM clause is supported in at least:

  • SAP HANA



  • Ingres



  • Postgres



  • SQL Server



  • Sybase



Other databases allow for updating views, e.g.

  • UPDATE t1 JOIN t2 ON ... SET ...



  • UPDATE (SELECT * FROM t1 JOIN t2 ON ...) SET ...



In terms of Codd's Rules, the following two would apply:


Rule 6: The view updating rule:


All views that are theoretically updatable must be updatable by the system.


Rule 7: High-level insert, update, and delete:


The system must support set-at-a-time insert, update, and delete operators. This means that data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

So, Codd demands that an RDBMS can update views using sets, but he wouldn't want to specify how to actually do that (e.g. syntax-wise).

In fact, the concrete rules which apply to these kinds of statement are rather different from database to database. This has nothing to do with the SQL language, or with relational calculus per se, but simply with the individual implementations of each database - often also because of how the locking / concurrency model is implemented - something that is left open to implementations by both Codd's Rules as well as to some extent by the SQL standard. For instance, the standard does not explicitly refer to MVCC.

In the particular case of UPDATE .. FROM, think of it this way (assumptions only):

  • Some customer probably wanted to be able to write UPDATE statements that take into account more than one table



  • Some da

Context

StackExchange Computer Science Q#43672, answer score: 8

Revisions (0)

No revisions yet.