HiveBrain v1.2.0
Get Started
← Back to all entries
snippetModerate

Good example of MDX vs SQL for analytical queries

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
sqlexamplemdxforgoodanalyticalqueries

Problem

Can anybody show me a good example of MDX's advantages over regular SQL when doing analytical queries? I would like to compare an MDX query with an SQL query that gives similar results.

Wikipedia says:


While it is possible to translate some of these into traditional SQL,
it would frequently require the synthesis of clumsy SQL expressions
even for very simple MDX expressions.

But there is neither a citation nor example. I am fully aware the underlying data must be organized differently, and OLAP will require more processing and storage per insert. (My proposal is to move from an Oracle RDBMS to Apache Kylin + Hadoop)

Context: I am trying to convince my company that we should be querying an OLAP database instead of an OLTP database. Most SIEM queries make heavy use of group-by, sort, and aggregation. Besides the performance boost, I think OLAP (MDX) queries would be more concise and easier to read/write than the equivalent OLTP SQL. A concrete example would drive the point home, but I am not an expert at SQL, much less MDX...

If it helps, here is a sample SIEM-related SQL query for firewall events that happened in the past week:

```
SELECT 'Seoul Average' AS term,
Substr(To_char(idate, 'HH24:MI'), 0, 4)
|| '0' AS event_time ,
Round(Avg(tot_accept)) AS cnt
FROM (
SELECT *
FROM st_event_100_#yyyymm-1m#
WHERE idate BETWEEN trunc(sysdate, 'iw')-7 AND trunc(sysdate, 'iw')-3 #stat_monitor_group_query#
UNION ALL
SELECT *
FROM st_event_100_#yyyymm#
WHERE idate BETWEEN trunc(sysdate, 'iw')-7 AND trunc(sysdate, 'iw')-3 #stat_monitor_group_query# ) pm
GROUP BY substr(to_char(idate, 'HH24:MI'), 0, 4)
|| '0'
UNION ALL
SELECT 'today' AS term ,
substr(to_char(idate, 'HH24:MI'), 0, 4)
|| '0' AS event_time ,
ro

Solution

MDX and SQL are in no way the same, and often not even comparable, as they are querying multidimensional and relational databases respectively. You cannot query your existing relational database with MDX.

The main advantage of using a multidimensional model and using MDX to query it is that you are querying pre-aggregated data and that MDX is optimized to query in a statistical way rather than a relational way. You no longer query rows and tables to produce a flat result set but you are using tuples and sets to slice and aggregate a multidimensional cube.

Think of it like this: if you use a SQL query to get the total sales amount for a particular item group you would need to write a query that sums up all invoice lines for all items in the item group. If you are using a cube and have aggregations on the item group level the result is calculated during processing and the aggregations are stored for each item group, making queries instantaneous.

Multidimensional and MDX is an entirely different concept from relational set-based SQL.

Your example might become a lot simpler because you would be doing the transformations such as the date parsing during your data load process and your last month comparison could be a calculated measure. Your seoul average and today could be calculated members

If your cubes are well designed for your requirements I believe you could be slicing and dicing your example's data set without even needing to write queries but do it in a pivottable or another analysis tool.

Then again there is no "just rewriting SQL in MDX". It requires a fair bit of knowledge to do it right and a different mindset. Think venn-diagrams instead of result sets.

To provide you with an example using the adventureworks database, imagine the requirement to list the number of sales orders by customer in the category bikes.

If you did that using SQL you would need to write a query that counts the number of sales orders containing a line with a product that happens to be of the category bikes and join that to the customers table, so that would become a fairly complex query.

-- need distinct count, we're counting orders, not order lines
SELECT count(DISTINCT soh.salesorderid)
    ,pers.FirstName + ' ' + pers.LastName
FROM sales.SalesOrderDetail sod
-- we need product details to get to the category
INNER JOIN Production.Product p ON sod.ProductID = p.ProductID
-- but we need to pass via subcategories
INNER JOIN Production.ProductSubcategory psc ON p.ProductSubcategoryID = psc.ProductSubcategoryID
-- we finally get to the category
INNER JOIN Production.ProductCategory pc ON psc.ProductCategoryID = pc.ProductCategoryID
-- we also need the headers because that's where the customer is stored
INNER JOIN sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID
-- finally the customer, but we don't have his name here
INNER JOIN sales.Customer c ON soh.CustomerID = c.CustomerID
-- customers
INNER JOIN Person.Person pers ON c.PersonID = pers.BusinessEntityID
-- filter on bikes
WHERE pc.Name = 'bikes'
-- but the customers table doesn't contain the concatenated name
GROUP BY pers.FirstName + ' ' + pers.LastName;


In MDX (provided your cube is well designed for this requirement) you could just write because the logic and complexity has moved elsewhere:

SELECT [Measures].[Internet Order Count] ON COLUMNS,
[Customer].[Customer].Members ON ROWS
FROM [Adventure Works]
WHERE [Product].[Product Categories].[Category].[Bikes]

Code Snippets

-- need distinct count, we're counting orders, not order lines
SELECT count(DISTINCT soh.salesorderid)
    ,pers.FirstName + ' ' + pers.LastName
FROM sales.SalesOrderDetail sod
-- we need product details to get to the category
INNER JOIN Production.Product p ON sod.ProductID = p.ProductID
-- but we need to pass via subcategories
INNER JOIN Production.ProductSubcategory psc ON p.ProductSubcategoryID = psc.ProductSubcategoryID
-- we finally get to the category
INNER JOIN Production.ProductCategory pc ON psc.ProductCategoryID = pc.ProductCategoryID
-- we also need the headers because that's where the customer is stored
INNER JOIN sales.SalesOrderHeader soh ON sod.SalesOrderID = soh.SalesOrderID
-- finally the customer, but we don't have his name here
INNER JOIN sales.Customer c ON soh.CustomerID = c.CustomerID
-- customers
INNER JOIN Person.Person pers ON c.PersonID = pers.BusinessEntityID
-- filter on bikes
WHERE pc.Name = 'bikes'
-- but the customers table doesn't contain the concatenated name
GROUP BY pers.FirstName + ' ' + pers.LastName;
SELECT [Measures].[Internet Order Count] ON COLUMNS,
[Customer].[Customer].Members ON ROWS
FROM [Adventure Works]
WHERE [Product].[Product Categories].[Category].[Bikes]

Context

StackExchange Database Administrators Q#138311, answer score: 11

Revisions (0)

No revisions yet.