HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Is the MySQL JSON data type bad for performance for data retrieval?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
thebadtypemysqlforperformancejsondataretrieval

Problem

Let's say I have a MySQL JSON data type called custom_properties for a media table:

An example of the json data stored in the custom_properties column could be:

{
 "company_id": 1, 
 "uploaded_by": "Name", 
 "document_type": "Policy", 
 "policy_signed_date": "04/04/2018"
}


In my PHP Laravel app I would do something like this:

$media = Media::where('custom_properties->company_id', Auth::user()->company_id)->orderBy('created_at', 'DESC')->get();


This would fetch all media items belonging to company 1.

My question is that lets say we have 1 million media records, would this be a bad way to fetch records in terms of performance? Can anyone shed some light on how MySQL indexes JSON data types?

From the MySQL official docs:


JSON documents stored in JSON columns are converted to an internal
format that permits quick read access to document elements. When the
server later must read a JSON value stored in this binary format, the
value need not be parsed from a text representation. The binary format
is structured to enable the server to look up subobjects or nested
values directly by key or array index without reading all values
before or after them in the document.

Solution

So I've made some tests with big data; I created these two tables:

CREATE TABLE `json` (
  `data` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

CREATE TABLE `users` (
  `name` varchar(255) DEFAULT NULL,
  `relation_id` int DEFAULT NULL,
  `description` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci


the first table is storing its data in a JSON field, the other uses columns.

I filled each table with 1,000,000 records.

I ran these two queries:

select * from users where relation_id > 100 order by relation_id limit 100

select json_extract(data, '$.name') as name,
       json_extract(data, '$.relation_id') as relation_id
       from json
        where json_extract(data, '$.relation_id') > 100 order by relation_id
        limit 100


The first query, which queries the table that does not use JSON, results in

[2022-03-02 00:37:46] 100 rows retrieved starting from 1 in 854 ms (execution: 838 ms, fetching: 16 ms)

the second query which uses JSON, results in:

[2022-03-02 00:42:52] 100 rows retrieved starting from 1 in 1 s 10 ms (execution: 994 ms, fetching: 16 ms)

so obviously it is not that much different in simple cases.

now trying to run aggregate functions, these two queries:

select name, relation_id, description , max(relation_id) as max_relation
from users
where relation_id > 100
group by name, relation_id, description
order by relation_id
limit 100; # executed in 5 s 3 ms

select data -> '$.name'             as name,
       data -> '$.relation_id'      as relation_id,
       data -> '$.description'      as description,
       max(data -> '$.relation_id') as max_relation
from json
where json_extract(data, '$.relation_id') > 100
group by data -> '$.name', data -> '$.relation_id', data -> '$.description'
order by relation_id
limit 100 # executed in 6 s 238 ms


The first query results in:

[2022-03-02 15:23:13] 100 rows retrieved starting from 1 in 5 s 21 ms (execution: 5 s 3 ms, fetching: 18 ms)


The second query results in:

[2022-03-02 15:23:25] 100 rows retrieved starting from 1 in 6 s 238 ms (execution: 6 s 218 ms, fetching: 20 ms)


so looks like there is a noticeable performance difference when using aggregate functions.

Code Snippets

CREATE TABLE `json` (
  `data` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci


CREATE TABLE `users` (
  `name` varchar(255) DEFAULT NULL,
  `relation_id` int DEFAULT NULL,
  `description` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
select * from users where relation_id > 100 order by relation_id limit 100


select json_extract(data, '$.name') as name,
       json_extract(data, '$.relation_id') as relation_id
       from json
        where json_extract(data, '$.relation_id') > 100 order by relation_id
        limit 100
select name, relation_id, description , max(relation_id) as max_relation
from users
where relation_id > 100
group by name, relation_id, description
order by relation_id
limit 100; # executed in 5 s 3 ms

select data -> '$.name'             as name,
       data -> '$.relation_id'      as relation_id,
       data -> '$.description'      as description,
       max(data -> '$.relation_id') as max_relation
from json
where json_extract(data, '$.relation_id') > 100
group by data -> '$.name', data -> '$.relation_id', data -> '$.description'
order by relation_id
limit 100 # executed in 6 s 238 ms
[2022-03-02 15:23:13] 100 rows retrieved starting from 1 in 5 s 21 ms (execution: 5 s 3 ms, fetching: 18 ms)
[2022-03-02 15:23:25] 100 rows retrieved starting from 1 in 6 s 238 ms (execution: 6 s 218 ms, fetching: 20 ms)

Context

StackExchange Database Administrators Q#203037, answer score: 9

Revisions (0)

No revisions yet.