73 Lines of Mayhem - Parse, Sort and Save to CSV in PHP CLI
Problem
Inside a folder named `txt` I have 138 text files (totaling 349 MB) full of email addresses. I have no idea (yet) how many addresses there are. They are separated from one another by line breaks. I created the following script to read all of these files into an array, dismiss the duplicates, then sort alphabetically and save in groups of 10K per CSV file. It works correctly, but it has also been running for over 8 hours (dual-core i3 with 4 GB of RAM, SATA 7200 RPM HDD), which seems excessive to me. `top` also tells me that my program's CPU usage is 100%, and it has been like that the whole time it has been running. Give my script a looksie and advise me on where I've gone so terribly wrong.
```
function writeFile($fileName, $fileData)
{
    $writeFileOpen = fopen('csv/' . $fileName, 'w');
    fwrite($writeFileOpen, $fileData) or die('Unable to write file: ' . $fileName);
    fclose($writeFileOpen);
}

function openFiles()
{
    $addressList = array();
    $preventRepeat = array();
    if ($handle = opendir('txt')) {
        while (false !== ($file = readdir($handle))) {
            if ($file != '.' && $file != '..') {
                $newList = explode("\n", trim(file_get_contents('txt/' . $file)));
                foreach ($newList as $key => $val) {
                    $val = str_replace(array(',', '"'), '', $val);
                    if (in_array($val, $preventRepeat) || !strpos($val, '@') || !$val) {
                        unset($newList[$key]);
                    }
                    $preventRepeat[] = $val;
                }
                if (empty($addressList)) {
                    $addressList = $newList;
                } else {
                    $addressList = array_merge($addressList, $newList);
                }
                unset($newList);
            }
        }
        closedir($handle);
    } else {
        echo 'Unable to Read Directory';
    }
    $lineNum = 1;
    $fileNum = 1;
    $fileData = '"Email Address"' . "\n";
    sor
```
Solution
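This will be much more efficient. It reads each file line by line with `fgets()` instead of loading it whole, and it removes duplicates by using the addresses as array keys, so each duplicate check is a hash lookup rather than an `in_array()` scan over every address seen so far: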
```
$result = array();

if (($handle = opendir('./txt/')) !== false)
{
    // Lift the time and memory limits: the input is hundreds of megabytes.
    set_time_limit(0);
    ini_set('memory_limit', -1);

    while (($file = readdir($handle)) !== false)
    {
        if (($file != '.') && ($file != '..'))
        {
            if (is_resource($file = fopen('./txt/' . $file, 'rb')) === true)
            {
                // Read line by line instead of loading the whole file at once.
                while (($email = fgets($file)) !== false)
                {
                    $email = trim(str_replace(array(',', '"'), '', $email));

                    if (filter_var($email, FILTER_VALIDATE_EMAIL) !== false)
                    {
                        // Array keys are unique, so duplicates collapse automatically.
                        $result[strtolower($email)] = true;
                    }
                }

                fclose($file);
            }
        }
    }

    closedir($handle);

    if (empty($result) !== true)
    {
        // Sort by key, i.e. alphabetically by address.
        ksort($result);

        // Write the sorted addresses out in files of 10,000 each.
        foreach (array_chunk($result, 10000, true) as $key => $value)
        {
            file_put_contents('./emailList-' . ($key + 1) . '.csv', implode("\n", array_keys($value)), LOCK_EX);
        }
    }

    echo 'Done!';
}
```
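Most of the difference between the two scripts comes down to how duplicates are detected. The following is only a minimal sketch of that one point: the function names `dedupeWithInArray` and `dedupeWithKeys` and the sample data are hypothetical and appear in neither script. The first function mirrors the `in_array()` approach, which rescans the whole list for every new line; the second mirrors the array-key approach, which does one hash lookup per line.

```php
<?php
// Hypothetical helpers for illustration; neither exists in the original scripts.

/** Deduplicate by scanning a growing list: every lookup walks the whole list. */
function dedupeWithInArray(array $emails): array
{
    $seen = array();
    foreach ($emails as $email) {
        $email = strtolower(trim($email));
        if ($email !== '' && !in_array($email, $seen, true)) {
            $seen[] = $email;
        }
    }
    sort($seen);
    return $seen;
}

/** Deduplicate by using each address as an array key: one hash lookup per line. */
function dedupeWithKeys(array $emails): array
{
    $seen = array();
    foreach ($emails as $email) {
        $email = strtolower(trim($email));
        if ($email !== '') {
            $seen[$email] = true; // assigning an existing key just overwrites it
        }
    }
    ksort($seen);
    return array_keys($seen);
}

// Made-up sample input: one duplicate, mixed case, one blank line.
$sample = array("b@example.com\n", "A@example.com\n", "b@example.com\n", "\n");

// Both approaches produce the same sorted, de-duplicated list.
var_dump(dedupeWithInArray($sample) === dedupeWithKeys($sample));
```

On a four-line sample the two are indistinguishable, but the cost of the first grows roughly with the square of the number of addresses, while the second grows roughly linearly (plus the final sort). With hundreds of megabytes of input, that is likely where most of the eight hours went.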
Context
StackExchange Code Review Q#1393, answer score: 3