patternsqlMinor
Is the MySQL JSON data type bad for performance for data retrieval?
Viewed 0 times
thebadtypemysqlforperformancejsondataretrieval
Problem
Let's say I have a MySQL JSON data type called
An example of the json data stored in the
In my PHP Laravel app I would do something like this:
This would fetch all media items belonging to company 1.
My question is that lets say we have 1 million media records, would this be a bad way to fetch records in terms of performance? Can anyone shed some light on how MySQL indexes JSON data types?
From the MySQL official docs:
JSON documents stored in JSON columns are converted to an internal
format that permits quick read access to document elements. When the
server later must read a JSON value stored in this binary format, the
value need not be parsed from a text representation. The binary format
is structured to enable the server to look up subobjects or nested
values directly by key or array index without reading all values
before or after them in the document.
custom_properties for a media table:An example of the json data stored in the
custom_properties column could be:{
"company_id": 1,
"uploaded_by": "Name",
"document_type": "Policy",
"policy_signed_date": "04/04/2018"
}In my PHP Laravel app I would do something like this:
$media = Media::where('custom_properties->company_id', Auth::user()->company_id)->orderBy('created_at', 'DESC')->get();This would fetch all media items belonging to company 1.
My question is that lets say we have 1 million media records, would this be a bad way to fetch records in terms of performance? Can anyone shed some light on how MySQL indexes JSON data types?
From the MySQL official docs:
JSON documents stored in JSON columns are converted to an internal
format that permits quick read access to document elements. When the
server later must read a JSON value stored in this binary format, the
value need not be parsed from a text representation. The binary format
is structured to enable the server to look up subobjects or nested
values directly by key or array index without reading all values
before or after them in the document.
Solution
So I've made some tests with big data; I created these two tables:
the first table is storing its data in a JSON field, the other uses columns.
I filled each table with 1,000,000 records.
I ran these two queries:
The first query, which queries the table that does not use JSON, results in
the second query which uses JSON, results in:
so obviously it is not that much different in simple cases.
now trying to run aggregate functions, these two queries:
The first query results in:
The second query results in:
so looks like there is a noticeable performance difference when using aggregate functions.
CREATE TABLE `json` (
`data` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
CREATE TABLE `users` (
`name` varchar(255) DEFAULT NULL,
`relation_id` int DEFAULT NULL,
`description` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_cithe first table is storing its data in a JSON field, the other uses columns.
I filled each table with 1,000,000 records.
I ran these two queries:
select * from users where relation_id > 100 order by relation_id limit 100
select json_extract(data, '$.name') as name,
json_extract(data, '$.relation_id') as relation_id
from json
where json_extract(data, '$.relation_id') > 100 order by relation_id
limit 100The first query, which queries the table that does not use JSON, results in
[2022-03-02 00:37:46] 100 rows retrieved starting from 1 in 854 ms (execution: 838 ms, fetching: 16 ms)the second query which uses JSON, results in:
[2022-03-02 00:42:52] 100 rows retrieved starting from 1 in 1 s 10 ms (execution: 994 ms, fetching: 16 ms)so obviously it is not that much different in simple cases.
now trying to run aggregate functions, these two queries:
select name, relation_id, description , max(relation_id) as max_relation
from users
where relation_id > 100
group by name, relation_id, description
order by relation_id
limit 100; # executed in 5 s 3 ms
select data -> '$.name' as name,
data -> '$.relation_id' as relation_id,
data -> '$.description' as description,
max(data -> '$.relation_id') as max_relation
from json
where json_extract(data, '$.relation_id') > 100
group by data -> '$.name', data -> '$.relation_id', data -> '$.description'
order by relation_id
limit 100 # executed in 6 s 238 msThe first query results in:
[2022-03-02 15:23:13] 100 rows retrieved starting from 1 in 5 s 21 ms (execution: 5 s 3 ms, fetching: 18 ms)The second query results in:
[2022-03-02 15:23:25] 100 rows retrieved starting from 1 in 6 s 238 ms (execution: 6 s 218 ms, fetching: 20 ms)so looks like there is a noticeable performance difference when using aggregate functions.
Code Snippets
CREATE TABLE `json` (
`data` json DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
CREATE TABLE `users` (
`name` varchar(255) DEFAULT NULL,
`relation_id` int DEFAULT NULL,
`description` text
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ciselect * from users where relation_id > 100 order by relation_id limit 100
select json_extract(data, '$.name') as name,
json_extract(data, '$.relation_id') as relation_id
from json
where json_extract(data, '$.relation_id') > 100 order by relation_id
limit 100select name, relation_id, description , max(relation_id) as max_relation
from users
where relation_id > 100
group by name, relation_id, description
order by relation_id
limit 100; # executed in 5 s 3 ms
select data -> '$.name' as name,
data -> '$.relation_id' as relation_id,
data -> '$.description' as description,
max(data -> '$.relation_id') as max_relation
from json
where json_extract(data, '$.relation_id') > 100
group by data -> '$.name', data -> '$.relation_id', data -> '$.description'
order by relation_id
limit 100 # executed in 6 s 238 ms[2022-03-02 15:23:13] 100 rows retrieved starting from 1 in 5 s 21 ms (execution: 5 s 3 ms, fetching: 18 ms)[2022-03-02 15:23:25] 100 rows retrieved starting from 1 in 6 s 238 ms (execution: 6 s 218 ms, fetching: 20 ms)Context
StackExchange Database Administrators Q#203037, answer score: 9
Revisions (0)
No revisions yet.