patternsqlMinor
Matching single column against multiple values without self-joining table in MySQL
Viewed 0 times
withoutcolumntableagainstmysqlsinglemultiplevaluesmatchingjoining
Problem
We have a table that we use to store answers to questions. We need to be able to find users that have certain answers to particular questions. So, if our table consists of the following data:
and we want to find users who answer 'Pooch' for question 1 and 'Peach' for question 2, the following SQL will (obviously) not worK:
My first thought was to self-join the table for each answer we are looking for:
This works, but since we allow an arbitrary number of search filters, we need to find something much more efficient. My next solution was something like this:
However, we want users to be able to take the same questionnaire twice, so they could potentially have two answers to question 1 in the answers table.
So, now I'm at a loss. What's the best way to approach this? Thanks!
user_id question_id answer_value
Sally 1 Pooch
Sally 2 Peach
John 1 Pooch
John 2 Dukeand we want to find users who answer 'Pooch' for question 1 and 'Peach' for question 2, the following SQL will (obviously) not worK:
select user_id
from answers
where question_id=1
and answer_value = 'Pooch'
and question_id=2
and answer_value='Peach'My first thought was to self-join the table for each answer we are looking for:
select a.user_id
from answers a, answers b
where a.user_id = b.user_id
and a.question_id=1
and a.answer_value = 'Pooch'
and b.question_id=2
and b.answer_value='Peach'This works, but since we allow an arbitrary number of search filters, we need to find something much more efficient. My next solution was something like this:
select user_id, count(question_id)
from answers
where (
(question_id=2 and answer_value = 'Peach')
or (question_id=1 and answer_value = 'Pooch')
)
group by user_id
having count(question_id)>1However, we want users to be able to take the same questionnaire twice, so they could potentially have two answers to question 1 in the answers table.
So, now I'm at a loss. What's the best way to approach this? Thanks!
Solution
I have found a clever way to do this query without a self join.
I ran these commands in MySQL 5.5.8 for Windows and got the following results:
This display reveals that John gave two different answers to question 2 and Sally gave two different answers to question 1.
To catch which questions were answered differently by all users, just place the above query in a subquery and check for a comma in the list of given answers to get the count of distinct answers as follows:
I got this:
Now just filter out rows where multianswer_count = 1 using another subquery:
This is what I got:
Essentially, I performed three table scans: 1 on the main table, 2 on the small subqueries. NO JOINS !!!
Give it a Try !!!
I ran these commands in MySQL 5.5.8 for Windows and got the following results:
use test
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (user_id VARCHAR(10),question_id INT,answer_value VARCHAR(20));
INSERT INTO answers VALUES
('Sally',1,'Pouch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duke');
INSERT INTO answers VALUES
('Sally',1,'Pooch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duck');
SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id;
+---------+-------------+---------------+
| user_id | question_id | given_answers |
+---------+-------------+---------------+
| John | 1 | Pooch |
| John | 2 | Duke,Duck |
| Sally | 1 | Pouch,Pooch |
| Sally | 2 | Peach |
+---------+-------------+---------------+This display reveals that John gave two different answers to question 2 and Sally gave two different answers to question 1.
To catch which questions were answered differently by all users, just place the above query in a subquery and check for a comma in the list of given answers to get the count of distinct answers as follows:
SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A;I got this:
+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John | 1 | Pooch | 1 |
| John | 2 | Duke,Duck | 2 |
| Sally | 1 | Pouch,Pooch | 2 |
| Sally | 2 | Peach | 1 |
+---------+-------------+---------------+-------------------+Now just filter out rows where multianswer_count = 1 using another subquery:
SELECT * FROM (SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A) AA WHERE multianswer_count > 1;This is what I got:
+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John | 2 | Duke,Duck | 2 |
| Sally | 1 | Pouch,Pooch | 2 |
+---------+-------------+---------------+-------------------+Essentially, I performed three table scans: 1 on the main table, 2 on the small subqueries. NO JOINS !!!
Give it a Try !!!
Code Snippets
use test
DROP TABLE IF EXISTS answers;
CREATE TABLE answers (user_id VARCHAR(10),question_id INT,answer_value VARCHAR(20));
INSERT INTO answers VALUES
('Sally',1,'Pouch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duke');
INSERT INTO answers VALUES
('Sally',1,'Pooch'),
('Sally',2,'Peach'),
('John',1,'Pooch'),
('John',2,'Duck');
SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id;
+---------+-------------+---------------+
| user_id | question_id | given_answers |
+---------+-------------+---------------+
| John | 1 | Pooch |
| John | 2 | Duke,Duck |
| Sally | 1 | Pouch,Pooch |
| Sally | 2 | Peach |
+---------+-------------+---------------+SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A;+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John | 1 | Pooch | 1 |
| John | 2 | Duke,Duck | 2 |
| Sally | 1 | Pouch,Pooch | 2 |
| Sally | 2 | Peach | 1 |
+---------+-------------+---------------+-------------------+SELECT * FROM (SELECT user_id,question_id,given_answers,
(LENGTH(given_answers) - LENGTH(REPLACE(given_answers,',','')))+1 multianswer_count
FROM (SELECT user_id,question_id,GROUP_CONCAT(DISTINCT answer_value) given_answers
FROM answers GROUP BY user_id,question_id) A) AA WHERE multianswer_count > 1;+---------+-------------+---------------+-------------------+
| user_id | question_id | given_answers | multianswer_count |
+---------+-------------+---------------+-------------------+
| John | 2 | Duke,Duck | 2 |
| Sally | 1 | Pouch,Pooch | 2 |
+---------+-------------+---------------+-------------------+Context
StackExchange Database Administrators Q#1775, answer score: 9
Revisions (0)
No revisions yet.