patterncppMinor
Identify files within folder structure that have common file sizes
Viewed 0 times
fileidentifywithinsizesfilesfolderstructurethatcommonhave
Problem
The code takes in two arguments, a path to search and an output file name. It searches the supplied path (including sub directories) and writes out all files that have the same size as any other files in the path. This will form part of a multi-step process to identify duplicate files within the file system.
Example output, with 3 files 29 bytes long, 3 files 113 bytes long and 2 files 114 bytes long:
`#include "stdafx.h"
#include
#include
#include
#include
#include
using namespace std::experimental::filesystem;
std::map> build_filesize_map(const std::string &pathToSearch);
void write_possible_duplicate_files(const std::string &outputFileName, const std::map> &sizeMap);
int main(int argc, char *argv[])
{
if (argc != 3) {
std::cout ";
return EXIT_FAILURE;
}
std::string pathToSearch(argv[1]);
std::string outputFileName(argv[2]);
auto sizeMap = build_filesize_map(pathToSearch);
// Write out a list of all files that share their size with other files
write_possible_duplicate_files(outputFileName, sizeMap);
}
// Search directory, any files with size > 0 are considered. They are added to a set
// within the map, which is keyed on the file size
std::map> build_filesize_map(const std::string &pathToSearch) {
auto sizeMap = std::map>();
for (recursive_directory_iterator next(pathToSearch), end; next != end; ++next) {
auto currentPath = next->path();
auto currentF
Example output, with 3 files 29 bytes long, 3 files 113 bytes long and 2 files 114 bytes long:
29 c:\Program Files\Git\mingw64\libexec\git-core\mergetools\gvimdiff
29 c:\Program Files\Git\mingw64\libexec\git-core\mergetools\gvimdiff2
29 c:\Program Files\Git\mingw64\libexec\git-core\mergetools\gvimdiff3
113 c:\Program Files\Git\mingw64\lib\tcl8.6\tzdata\Etc\GMT-7
113 c:\Program Files\Git\mingw64\lib\tcl8.6\tzdata\Etc\GMT-8
113 c:\Program Files\Git\mingw64\lib\tcl8.6\tzdata\Etc\GMT-9
114 c:\Program Files\Git\mingw64\lib\tcl8.6\tzdata\Etc\GMT+4
114 c:\Program Files\Git\mingw64\lib\tcl8.6\tzdata\Etc\GMT+5
`#include "stdafx.h"
#include
#include
#include
#include
#include
using namespace std::experimental::filesystem;
std::map> build_filesize_map(const std::string &pathToSearch);
void write_possible_duplicate_files(const std::string &outputFileName, const std::map> &sizeMap);
int main(int argc, char *argv[])
{
if (argc != 3) {
std::cout ";
return EXIT_FAILURE;
}
std::string pathToSearch(argv[1]);
std::string outputFileName(argv[2]);
auto sizeMap = build_filesize_map(pathToSearch);
// Write out a list of all files that share their size with other files
write_possible_duplicate_files(outputFileName, sizeMap);
}
// Search directory, any files with size > 0 are considered. They are added to a set
// within the map, which is keyed on the file size
std::map> build_filesize_map(const std::string &pathToSearch) {
auto sizeMap = std::map>();
for (recursive_directory_iterator next(pathToSearch), end; next != end; ++next) {
auto currentPath = next->path();
auto currentF
Solution
Clean nice code, reasonable data structure. Good job!
Don't worry too much about returning the
See: https://en.wikipedia.org/wiki/Return_value_optimization
iterator naming
Since you asked about naming convention I have to admit I wouldn't name this iterator
It especially caught my eyes that "current path" is "next path" which is kind of strange.
const
My opinion is that there's never enough
std::map::operator[]
Most of the time I am frustrated by map
could be done as
Reason is that in case key given as argument to
See: http://en.cppreference.com/w/cpp/container/map/operator_at
iterating over map keys > 1
Since you are using an ordered map you might take advantage of that with
See: http://en.cppreference.com/w/cpp/container/map/lower_bound
could be done like this
Don't worry too much about returning the
std::map as even historically RVO would help you. See: https://en.wikipedia.org/wiki/Return_value_optimization
iterator naming
Since you asked about naming convention I have to admit I wouldn't name this iterator
next. More common name might be it but something more explicit like dir_entry would be probably even better.It especially caught my eyes that "current path" is "next path" which is kind of strange.
for (recursive_directory_iterator next(pathToSearch), end; next != end; ++next) {
auto currentPath = next->path();const
My opinion is that there's never enough
const. I would define all variables not intended to be changed later as const.const auto currentPath = next->path();
const auto currentFileSize = is_regular_file(currentPath) ? file_size(currentPath) : 0;
if (0 != currentFileSize) {
const auto existingEntry = sizeMap.find(currentFileSize);std::map::operator[]
Most of the time I am frustrated by map
operator[] behavior. You on the other hand are lucky as it might simplify your code.auto existingEntry = sizeMap.find(currentFileSize);
if (existingEntry == sizeMap.end()) {
std::set fileSet;
fileSet.emplace(currentPath);
sizeMap.emplace(currentFileSize, fileSet);
}
else {
existingEntry->second.emplace(currentPath);
}could be done as
sizeMap[currentFileSize].emplace(currentPath)Reason is that in case key given as argument to
std::map::operator[] is not found new pair is inserted as {key, default value of value_type} which in your case means empty set. See: http://en.cppreference.com/w/cpp/container/map/operator_at
iterating over map keys > 1
Since you are using an ordered map you might take advantage of that with
std::map::lower_bound() and skip the first part with count = 1.See: http://en.cppreference.com/w/cpp/container/map/lower_bound
for (auto sizeIterator = sizeMap.begin(); sizeIterator != sizeMap.end(); ++sizeIterator) {
if (sizeIterator->second.size() == 1) continue;
for (auto pathIteror = sizeIterator->second.begin(); pathIteror != sizeIterator->second.end(); ++pathIteror) {could be done like this
for (auto sizeIterator = sizeMap.lower_bound(1); sizeIterator != sizeMap.end(); ++sizeIterator) {
for (auto pathIteror = sizeIterator->second.begin(); pathIteror != sizeIterator->second.end(); ++pathIteror) {Code Snippets
for (recursive_directory_iterator next(pathToSearch), end; next != end; ++next) {
auto currentPath = next->path();const auto currentPath = next->path();
const auto currentFileSize = is_regular_file(currentPath) ? file_size(currentPath) : 0;
if (0 != currentFileSize) {
const auto existingEntry = sizeMap.find(currentFileSize);auto existingEntry = sizeMap.find(currentFileSize);
if (existingEntry == sizeMap.end()) {
std::set<path> fileSet;
fileSet.emplace(currentPath);
sizeMap.emplace(currentFileSize, fileSet);
}
else {
existingEntry->second.emplace(currentPath);
}sizeMap[currentFileSize].emplace(currentPath)for (auto sizeIterator = sizeMap.begin(); sizeIterator != sizeMap.end(); ++sizeIterator) {
if (sizeIterator->second.size() == 1) continue;
for (auto pathIteror = sizeIterator->second.begin(); pathIteror != sizeIterator->second.end(); ++pathIteror) {Context
StackExchange Code Review Q#132927, answer score: 8
Revisions (0)
No revisions yet.