patternphpMinor
Personalized, weighted recommendations - Rank all content with PHP
Viewed 0 times
rankallwithweightedphppersonalizedcontentrecommendations
Problem
First of all, I apologize for the length of this but I'm trying to show the data I'm working with and that I attempted to figure it out on my own.
I'm building a social site for musicians. I would like to take the whole list of songs from the database and rank each one based on its relevancy to the logged in user. One reason for this is that I'd like there to always be 10 songs recommended, even if the user signed up 45 seconds ago.
The factors I'm using are:
The (sub)genres belong to more general genres, so I figure I can either weight a song higher if the logged in user and the song's (sub)genre are the same, or a little less higher if they at least belong to the same general genre.
I'm not sure if I should be aiming for an Apriori or Bayesian algorithm... It seems to be somewhere in between. I have a couple of lines borrowed from explanations of the Hacker News algorithm that I've tweaked. I'm not quite sure how to put the final results together to get my ranking.
Here's the code I've written so far (for all intents and purposes, a studio is a song). I know queries in loops like this are never a good idea, but I've done it this way because there's some memcaching schemes I use and this will probably also be batched:
```
function studio_relevance($user_id, $user_genre) {
global $cxn; //MySQL connection
$query = "SELECT * FROM studio WHERE project_status='active'";
$result = mysqli_query($cxn, $query) or die($query.': '.mysqli_error($cxn));
$data = array();
while ($studio = mysqli_fetch_object($result)) {
//Find similarities in social connections and band
$query_b = "SELECT * FROM band
I'm building a social site for musicians. I would like to take the whole list of songs from the database and rank each one based on its relevancy to the logged in user. One reason for this is that I'd like there to always be 10 songs recommended, even if the user signed up 45 seconds ago.
The factors I'm using are:
- The band members of songs (all would be members of the site, might have all quit the song)
- The logged in user's member connections (may be none)
- The most recent update in the song (will at least be the day the song was "created")
- The (sub)genre of the song (will always be set)
- The (sub)genre of the user (will always be set)
The (sub)genres belong to more general genres, so I figure I can either weight a song higher if the logged in user and the song's (sub)genre are the same, or a little less higher if they at least belong to the same general genre.
I'm not sure if I should be aiming for an Apriori or Bayesian algorithm... It seems to be somewhere in between. I have a couple of lines borrowed from explanations of the Hacker News algorithm that I've tweaked. I'm not quite sure how to put the final results together to get my ranking.
Here's the code I've written so far (for all intents and purposes, a studio is a song). I know queries in loops like this are never a good idea, but I've done it this way because there's some memcaching schemes I use and this will probably also be batched:
```
function studio_relevance($user_id, $user_genre) {
global $cxn; //MySQL connection
$query = "SELECT * FROM studio WHERE project_status='active'";
$result = mysqli_query($cxn, $query) or die($query.': '.mysqli_error($cxn));
$data = array();
while ($studio = mysqli_fetch_object($result)) {
//Find similarities in social connections and band
$query_b = "SELECT * FROM band
Solution
Your function is way too long. This is a lot going on in one function and very likely has some good candidates for smaller, single-purpose functions to be abstracted out of it. This is very likely the first thing that I would work on, it would make the flow of your function a little better and would likely make your function a lot easier to read.
I also noticed that you are highly susceptible to SQL injection attacks. You should be, at the very least, escaping the user input before you put it into the database and would be even better if you decided to use prepared statements. I put in a code example below that details how you would go about changing one of your queries to use prepared statements.
We'll be swapping over to using prepared statements so you need to make sure that you have the mysqlnd extension properly installed.
So, as an example of using this new function to get a prepared statement:
So, now you're using prepared statements for this query and it is not as susceptible to SQL injection. Also, notice that I explicitly defined that I only want
Some other things that I've noticed:
There's some things that could be done to make the function more maintainable and duplication but I think the priority right now is swapping over to prepared statements and getting rid of your SQL Injection vulnerability.
I also noticed that you are highly susceptible to SQL injection attacks. You should be, at the very least, escaping the user input before you put it into the database and would be even better if you decided to use prepared statements. I put in a code example below that details how you would go about changing one of your queries to use prepared statements.
We'll be swapping over to using prepared statements so you need to make sure that you have the mysqlnd extension properly installed.
function getPreparedStatement($sql, array $parameters = array()) {
global $cxn;
if (!is_string($sql)) {
trigger_error('The SQL query passed must be in the form of a string.', E_USER_WARNING);
return false;
}
$stmt = mysqli_prepare($cxn, $sql);
if (!$stmt) {
trigger_error(mysqli_error($cxn), E_USER_WARNING);
return false;
}
// we are subtracting 1 from this because the parameters passed should include the
// 'xxx' string denoting data type
$numPassedParameters = count($parameters) - 1;
$stmtParamCount = mysqli_stmt_param_count($stmt);
if ($stmtParamCount !== $numPassedParameters) {
trigger_error('The number of parameters passed must match the number of parameters in the statement.', E_USER_WARNING);
return false;
}
if ($stmtParamCount > 0) {
$dataTypes = array_shift($parameters);
$refArray = array();
$refArray[] = $cxn;
$refArray[] = $datatypes;
// we are doing this because mysqli_stmt_bind_param is expecting a reference
foreach ($parameters as $key => $value) {
$refArray[] =& $parameters[$key];
}
return call_user_func_array('mysqli_stmt_bind_param', $refArray));
} else {
return $stmt;
}
}So, as an example of using this new function to get a prepared statement:
$query_b = "SELECT user_id FROM band WHERE studio_id=?";
$query_b_parameters = array('i', $studio->studio_id); // assume this int change 'i' to 's' if a string
$stmt_b = getPreparedStatement($query_b, $query_b_parameters);
$result_b = mysqli_stmt_execute($cxn, $stmt_b);
if (!$result_b) {
trigger_error(mysqli_error($cxn));
return false;
}
$the_band = array();
while ($people = mysqli_fetch_array($result_b, MYSQLI_ASSOC)) {
$the_band[] = $people['user_id'];
}So, now you're using prepared statements for this query and it is not as susceptible to SQL injection. Also, notice that I explicitly defined that I only want
user_id from the query and not all the things. You should be able to figure out from there how to replace the other SQL queries with prepared statements.Some other things that I've noticed:
- Under
$queryyou have$result_bagain and are requerying$query_b. Likely you meant to just send$query
- Not necessarily an error but the
*in SQL statements. If you aren't really getting all the data try to explicitly state what you want.
- Try to find a way to abstract getting the results from a query in the form of an array or an object into a couple functions that take a
mysqli_stmtand returns the appropriate data type for the result.
- Try to abstract out the business logic of comparing genres and finding the most recent activity into different functions so that your main function here simply controls the order in which queries get executed and then passes the appropriate queries off to functions that get the right data.
There's some things that could be done to make the function more maintainable and duplication but I think the priority right now is swapping over to prepared statements and getting rid of your SQL Injection vulnerability.
Code Snippets
function getPreparedStatement($sql, array $parameters = array()) {
global $cxn;
if (!is_string($sql)) {
trigger_error('The SQL query passed must be in the form of a string.', E_USER_WARNING);
return false;
}
$stmt = mysqli_prepare($cxn, $sql);
if (!$stmt) {
trigger_error(mysqli_error($cxn), E_USER_WARNING);
return false;
}
// we are subtracting 1 from this because the parameters passed should include the
// 'xxx' string denoting data type
$numPassedParameters = count($parameters) - 1;
$stmtParamCount = mysqli_stmt_param_count($stmt);
if ($stmtParamCount !== $numPassedParameters) {
trigger_error('The number of parameters passed must match the number of parameters in the statement.', E_USER_WARNING);
return false;
}
if ($stmtParamCount > 0) {
$dataTypes = array_shift($parameters);
$refArray = array();
$refArray[] = $cxn;
$refArray[] = $datatypes;
// we are doing this because mysqli_stmt_bind_param is expecting a reference
foreach ($parameters as $key => $value) {
$refArray[] =& $parameters[$key];
}
return call_user_func_array('mysqli_stmt_bind_param', $refArray));
} else {
return $stmt;
}
}$query_b = "SELECT user_id FROM band WHERE studio_id=?";
$query_b_parameters = array('i', $studio->studio_id); // assume this int change 'i' to 's' if a string
$stmt_b = getPreparedStatement($query_b, $query_b_parameters);
$result_b = mysqli_stmt_execute($cxn, $stmt_b);
if (!$result_b) {
trigger_error(mysqli_error($cxn));
return false;
}
$the_band = array();
while ($people = mysqli_fetch_array($result_b, MYSQLI_ASSOC)) {
$the_band[] = $people['user_id'];
}Context
StackExchange Code Review Q#8559, answer score: 2
Revisions (0)
No revisions yet.