
Parsing the lsblk output

Submitted by: @import:stackexchange-codereview

Problem

I am a Python beginner learning Python 3. I have written two small functions that parse the lsblk output and return Linux physical and logical disks. Here is the first function:

from subprocess import run, PIPE

def physical_drives():
    """
    Gets all physical drive names.

    Gets all physical drive names on a Linux system,
    parsing the lsblk utility output.

    Parameters
    ----------

    Returns
    -------
    list
        A list of strings representing drive names.

    """

    command = ['lsblk -d -o name -n']
    output = run(command, shell=True, stdout=PIPE)

    output_string = output.stdout.decode('utf-8')
    output_string = output_string.strip()

    results = output_string.split('\n')
    return results

def main():
    print(physical_drives())

if __name__ == '__main__':
    main()


The second function:

from subprocess import run, PIPE

def partitions(disk):
    """
    Gets all partitions for a given physical disk.

    Gets all partitions present on a physical disk
    on a Linux system.
    The function parses the lsblk utility output.

    Parameters
    ----------
    disk : string
        A string containing a disk name such as 'sda'

    Returns
    -------
    list
        A list of strings representing partitions.

    """

    command = ['lsblk -o name -n -s -l']
    output = run(command, shell=True, stdout=PIPE)

    output_string = output.stdout.decode('utf-8')
    output_string = output_string.strip()

    results = list()
    results.extend(output_string.split('\n'))
    results = [x for x in results if x != disk and disk in x]

    return results

def main():

    from disks import physical_drives

    for drive in physical_drives():

        print(drive)
        parts = partitions(drive)

        for partition in parts:
            print('\t' + partition)

if __name__ == '__main__':
    main()


The functions are in two different files in the same directory. I would appreciate a quick review on anything tha

Solution

lsblk

The -s option to lsblk was introduced to util-linux rather recently, in release 2.22. You may experience compatibility issues on slightly older GNU/Linux installations.

But I don't see why you would want the -s option at all — it just gives you an inverted device tree. For example, on my machine:

$ lsblk -o name -n -s -l
sda1
sda
sda2
sda
sr0
vg-root
sda3
sda
vg-var
sda3
sda
vg-data
sda3
sda


In the output, sda appears multiple times. To understand the output, you need to drop the -l flag so that the list appears in tree form:

$ lsblk -o name -n -s
sda1
└─sda
sda2
└─sda
sr0
vg-root
└─sda3
  └─sda
vg-var
└─sda3
  └─sda
vg-data
└─sda3
  └─sda


Now, it's more apparent that the -s option isn't helpful. If you drop it, then the output makes more sense:

$ lsblk -o name -n
sda
├─sda1
├─sda2
└─sda3
  ├─vg-root
  ├─vg-var
  └─vg-data
sr0
$ lsblk -o name -n -l
sda
sda1
sda2
sda3
vg-root
vg-var
vg-data
sr0


To list the devices on sda, it would be better to run lsblk -o name -n -l /dev/sda, which immediately drops sr0 from consideration, for example. Note that LVM volumes (such as vg-root above) would still appear in the output.

I don't think the substring search in your code (if x != disk and disk in x) is a reliable filter. It could be fooled if there are more than 26 physical disks: the 27th disk would be named sdaa, which contains sda. It might also be fooled by exceptionally tricky naming of LVM volumes.
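To make the failure mode concrete, here is a minimal sketch that compares the substring filter with an exact-name filter applied to the output of lsblk -o name -n -l /dev/sda. The sample listings and the helper names substring_filter and children_of are illustrative assumptions, not part of the original code.

```python
def substring_filter(names, disk):
    """The filter from the question: keeps any name that merely contains `disk`."""
    return [x for x in names if x != disk and disk in x]


def children_of(lsblk_list_output, disk):
    """Parse `lsblk -o name -n -l /dev/<disk>` output (one name per line).

    lsblk was already restricted to a single device, so every name other
    than the disk itself is a descendant of it; no substring test is needed.
    """
    names = [line.strip() for line in lsblk_list_output.splitlines() if line.strip()]
    return [name for name in names if name != disk]


# Hypothetical machine-wide listing on a box with more than 26 disks.
all_names = ['sda', 'sda1', 'sdaa', 'sdaa1', 'vg-root']
print(substring_filter(all_names, 'sda'))  # wrongly keeps sdaa and sdaa1

# Hypothetical output of `lsblk -o name -n -l /dev/sda`.
sample = 'sda\nsda1\nsda2\nsda3\nvg-root\n'
print(children_of(sample, 'sda'))
```

The second helper still returns LVM volumes such as vg-root, as noted above; it only avoids matching unrelated disks by accident.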

Subprocess execution

Whenever practical, I recommend avoiding the shell when executing subprocesses. The shell introduces a set of potential security vulnerabilities — for example, shenanigans with the PATH environment variable. Best practice would be to run the command with a specific executable and pre-parsed command-line options:

run('/bin/lsblk -o name -n -s -l'.split(), stdout=PIPE)
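As a rough sketch of the same idea with the argument list written out explicitly, the helper below builds the argv itself and never involves a shell. The function name lsblk_names and its runner parameter are hypothetical; /bin/lsblk is the path used in the line above and may differ on your distribution.

```python
from subprocess import PIPE, run

LSBLK = '/bin/lsblk'  # assumed location, taken from the answer; adjust if needed


def lsblk_names(device=None, nodeps=False, runner=run):
    """Return the device names printed by lsblk, one per output line.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without a real lsblk binary.
    """
    argv = [LSBLK, '-o', 'name', '-n', '-l']
    if nodeps:
        argv.append('-d')      # top-level devices only, as in physical_drives()
    if device is not None:
        argv.append(device)    # e.g. '/dev/sda'
    # No shell=True: argv is handed to the executable exactly as written.
    result = runner(argv, stdout=PIPE, check=True)
    text = result.stdout.decode('utf-8')
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Passing check=True makes a non-zero exit status raise CalledProcessError instead of silently returning an empty list.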


Alternative solution

I actually wouldn't bother with parsing the output of lsblk at all. After all, lsblk is just a way to report the contents of the sysfs filesystem. You would be better off inspecting /sys directly.

from glob import glob
from os.path import basename, dirname

def physical_drives():
    drive_glob = '/sys/block/*/device'
    return [basename(dirname(d)) for d in glob(drive_glob)]

def partitions(disk):
    if disk.startswith('.') or '/' in disk:
        raise ValueError('Invalid disk name {0}'.format(disk))
    partition_glob = '/sys/block/{0}/*/start'.format(disk)
    return [basename(dirname(p)) for p in glob(partition_glob)]
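
Because these functions read live system state, they are awkward to try out on a machine that lacks the expected disks. One possible variation, assuming the same sysfs layout as above, takes the root directory as a parameter so it can be pointed at a fake tree. The sys_block parameter, the sorted() calls (glob order is not guaranteed) and the fake directory layout are illustrative additions.

```python
import os
import tempfile
from glob import glob
from os.path import basename, dirname, join


def physical_drives(sys_block='/sys/block'):
    """Names of block devices that have a `device` entry under sys_block."""
    return sorted(basename(dirname(d)) for d in glob(join(sys_block, '*', 'device')))


def partitions(disk, sys_block='/sys/block'):
    """Names of subdirectories of the disk that contain a `start` file."""
    if disk.startswith('.') or '/' in disk:
        raise ValueError('Invalid disk name {0}'.format(disk))
    return sorted(basename(dirname(p)) for p in glob(join(sys_block, disk, '*', 'start')))


# Build a tiny fake tree that mimics the assumed layout.
fake_root = tempfile.mkdtemp()
for path in ('sda/device', 'sda/sda1', 'sda/sda2', 'sr0/device', 'loop0'):
    os.makedirs(join(fake_root, path))
for part in ('sda1', 'sda2'):
    open(join(fake_root, 'sda', part, 'start'), 'w').close()

print(physical_drives(fake_root))
print(partitions('sda', fake_root))
```

In this fake tree, loop0 has no device entry and is therefore skipped, which mirrors how the glob above selects only devices that expose one.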

Context

StackExchange Code Review Q#152486, answer score: 11
