Finding all non-empty directories and their files on an SFTP server with Paramiko
Problem
The purpose of the following function is to find all non-empty directories and the files they contain. It recursively checks each directory on an SFTP server to see whether it has any files, and if it does, adds it to a defaultdict using the path as the key. The function uses paramiko.SFTPClient and stat. I am specifically concerned about the performance; it is rather slow.
Prerequisite information
- sftp.listdir_attr returns a list of SFTPAttributes, which represent files, directories, symlinks, etc., and contain an st_mode that is used to determine whether an entry is a directory or a file. It can raise an IOError, for example if you don't have permission to inspect the path.
- stat.S_ISDIR inspects the mode to determine whether it is a directory.
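As a rough illustration of the error case mentioned above, here is a minimal sketch of a listing wrapper that treats an unreadable directory as empty. The name safe_listdir_attr and the FakeSFTP class are hypothetical and exist only for this example. The sketch assumes the failure surfaces as an IOError carrying errno.EACCES, which Python 3 exposes as a PermissionError.

```python
import errno


def safe_listdir_attr(sftp, path):
    """Return sftp.listdir_attr(path), or [] if access is denied.

    Hypothetical helper for illustration; not part of paramiko.
    """
    try:
        return sftp.listdir_attr(path)
    except PermissionError:
        # In Python 3, IOError is an alias of OSError, and an error built
        # with errno.EACCES is an instance of PermissionError.
        return []


class FakeSFTP:
    """Stand-in for an SFTP client, used only to exercise the helper."""

    def listdir_attr(self, path):
        if path == "./secret":
            raise IOError(errno.EACCES, "Permission denied")
        return ["entry"]
```

Whether silently skipping such directories is acceptable depends on the use case; logging the path before returning would be a reasonable variation.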
The function in question:
import os
from collections import defaultdict

def recursive_ftp(sftp, path='.', files=None):
    if files is None:
        files = defaultdict(list)
    # loop over list of SFTPAttributes (files with modes)
    for attr in sftp.listdir_attr(path):
        if stat.S_ISDIR(attr.st_mode):
            # If the file is a directory, recurse it
            recursive_ftp(sftp, os.path.join(path, attr.filename), files)
        else:
            # if the file is a file, add it to our dict
            files[path].append(attr.filename)
    return files
Use:
import paramiko
import stat
transport = paramiko.Transport((host, port))
transport.connect(username=username, password=password)
sftp = paramiko.SFTPClient.from_transport(transport)
files = recursive_ftp(sftp)
If we have an SFTP server that looks like this:
/foo
----a.csv
----b.csv
/bar
----c.csv
/baz
The function will return a dictionary like so:
{
    './foo': ['a.csv', 'b.csv'],
    './bar': ['c.csv']
}
Solution
There is nothing obviously wrong with your implementation that could explain the slowness. The slowest part is the use of listdir_attr; you might want to check by other means whether its speed matches what your network has to offer.
That being said, there are a few changes you can make to improve things a bit on your end:
- use a helper function so that files is not both a return value and modified in place;
- use paramiko's simulation of a working directory to remove the need for os.path;
- use a list comprehension to remove the need for defaultdict.
I'm also wondering whether you really want to list everything that is not a directory, or only regular files (i.e. no symlinks, no block devices, etc.). You can change the proposed list comprehension accordingly.
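To make that distinction concrete, the following sketch compares the two filters on a hand-built listing. The Entry tuple and the sample names are made up for illustration and mimic only the filename and st_mode fields of SFTPAttributes; the stat functions and constants are the standard library's.

```python
import stat
from collections import namedtuple

# Hypothetical stand-in for SFTPAttributes; only the two fields used here.
Entry = namedtuple("Entry", ["filename", "st_mode"])

listing = [
    Entry("data.csv", stat.S_IFREG | 0o644),  # regular file
    Entry("latest", stat.S_IFLNK | 0o777),    # symbolic link
    Entry("archive", stat.S_IFDIR | 0o755),   # directory
]

# Everything that is not a directory: symlinks and devices are kept.
not_dirs = [e.filename for e in listing if not stat.S_ISDIR(e.st_mode)]

# Regular files only: symlinks and devices are dropped.
regular = [e.filename for e in listing if stat.S_ISREG(e.st_mode)]
```

With this listing, not_dirs includes the symlink "latest" while regular does not, so the choice of predicate changes the result whenever the server holds anything other than plain files and directories.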
Proposed improvements
def _sftp_helper(sftp, files):
    stats = sftp.listdir_attr('.')
    files[sftp.getcwd()] = [attr.filename for attr in stats if stat.S_ISREG(attr.st_mode)]
    for attr in stats:
        if stat.S_ISDIR(attr.st_mode):  # If the file is a directory, recurse it
            sftp.chdir(attr.filename)
            _sftp_helper(sftp, files)
            sftp.chdir('..')

def filelist_recursive(sftp):
    files = {}
    # getcwd() returns None until chdir() has been called, so set a
    # working directory first; otherwise the top level is keyed by None
    sftp.chdir('.')
    _sftp_helper(sftp, files)
    return files
You can easily adapt this to add the optional path parameter back into filelist_recursive.
Context
StackExchange Code Review Q#127180, answer score: 5