patternsqlMinor
Storing arrays in MYSQL?
Viewed 0 times
mysqlstoringarrays
Problem
I've googled. I keep getting answers that have to do with a PHP function named "serialize". I won't be working with PHP, so that solution won't work for me sadly.
I worked with a NoSQL db (MongoDB I think) before, and I basically define a collection as such
I'm wanting to store multiple values in one row for the tags column (honestly I don't know if I said that right, just began searching about MySQL)
The only idea that's coming to me is if I were to use MySQL to read the row, then just split the tags by a comma, that'd work most likely, but I'd be working with many more "tags" possibly in the millions.
My question really is, is MySQL a good choice if you want to store multiple values in one column?
I worked with a NoSQL db (MongoDB I think) before, and I basically define a collection as such
images [
{"title": "Cute puppy", "file"="cute.png", "tags": ["cute", "puppy", "summer", "pool"]}
]I'm wanting to store multiple values in one row for the tags column (honestly I don't know if I said that right, just began searching about MySQL)
+------------+----------+--------------------------+
| title | file | tags |
+------------+----------+--------------------------+
| Cute Puppy | cute.png | cute,puppy,summer,pool |
+------------+----------+--------------------------+The only idea that's coming to me is if I were to use MySQL to read the row, then just split the tags by a comma, that'd work most likely, but I'd be working with many more "tags" possibly in the millions.
My question really is, is MySQL a good choice if you want to store multiple values in one column?
Solution
My question really is, is MySQL a good choice if you want to store multiple values in one column?Codd's Rules
-
Rule 2: The guaranteed access rule:
- Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.
Just a note: This second rule (actually the 3rd one - numbering starts, in true geek style, from zero) would suggest that arrays have no place in a relational system, however there is no reason that an array cannot be considered "atomic". A string (an array of characters in
C for example) can be considered atomic, why not an array of numbers (or other datatype).It should also be recognised that arrays can be very useful and do correspond to real-life entities, so inclusion of the array type and attendant functionality should come as no surprise to "realists". Codd's rules were derived from his studies of relational calculus and some say that a complete implementation would be too ivory tower for real-world RDBMSs. However, we're drifting out of scope...
The question:
I would suggest that the answer to this question is no - there is a better choice on all 3 levels for the treatment of arrays in database servers.
- Simple arrays:
==============
MySQL doesn't really have an array datatype - the closest thing they have is the SET datatype, the functionality of which is very limited. As you note in your question, a search leads to lots of links to PHP code which implements array functionality in the app rather than the db.
PostgreSQL has a an
ARRAY datatype with a large range of functions to deal with manipulating, comparing and finding elements in arrays. Of particular interest are the UNNEST() and STRING_TO_ARRAY()/REGEXP_SPLIT_TO_ARRAY() functions which provide convenient methods for moving data to and from arrays to relational constructs.- JSON:
=====
Nowadays, arrays have been largely superseded by JSON - hardly surprising since JSON is basially an array++ (but XML-- :-)). This is true for MySQL and for PostgreSQL.
There are a large number of JSON functions available on both systems - MySQL and PostgreSQL - but more on PostgreSQL. I'll use the terms JSON and JSONB (binary - example is for mongodb but applies to binary JSON generally) interchangeably. MySQL JSON is only binary, PostgreSQL has both - but JSONB is generally regarded as much more efficient. See this post for a good discussion of what PostgreSQL "NoSQL types" are good for. The "NoSQL" is a misnomer; you can do SQL
JOINs using these types.With respect to indexing - MySQL essentially implements this by means of a GENERATED column hack whereas PostgreSQL implements 3 sorts of index (Gin, Btree and Hash - see section
8.14.4. jsonb Indexing here) for JSONB.Joins can be done between JSON and tables in both MySQL and PostgreSQL.
Overall, it is fair to say that JSON has been around longer and is more mature in PostgreSQL than in MySQL.
There are reports that PostgreSQL can actually perform even better than MongoDB with JSON documents - there is a caveat here in that the research was done by companies (EnterpriseDB and Ongres) which are PostgreSQL providers, however the tests that they performed were open-sourced and are therefore open to scrutiny.
Classic SQL:
You can of course implement some "multi-element" functionality by using a classic Associative Entity approach. Associative entities are otherwise known as
joining, bridging, many-to-many or m-to-n tables. This strategy is outlined here with an example using students enrolling in university courses.1 course can have many students and 1 student can attend many courses. The joining table contains the
PRIMARY KEY from both the course and student tables and the PK of the joining table itself is the two together - a student can't be in the same course twice!In favour of the classic approach, Erwin Brandtstetter (a guy well worth listening to about databases!) pointed out in his answer referring to a JSON solution: (
Aside: A normalized DB design with basic data types would be way more efficient for this.). JSON is not a panacea and judgement must be exercised in its use. There's been a lot of bandwagon-hopping in recent years with the NoSQL "[Cambrian Explosion][23]" from ~ 2008 - 2015 (Mongodb especially springs to mind). Now virtually all of the NoSQL systems have, or are trying to, put SQL interfaces and instrumentation into their systems. For my money, if your arrays are relatively simple, stick with the classic approach, otherwise go with JSONB.SQL wasn't designed for and is not good at manipulating comma-separated lists of values. In any case they are a breach of Rule 2 of Codd's Rules which states that
Each and every datum (atomic value) in a relational data base is guaranteed to be logically accessible by resorting to a combination of table name, primary key value and column name.. With .csv lists a furtheContext
StackExchange Database Administrators Q#252554, answer score: 8
Revisions (0)
No revisions yet.