HiveBrain v1.2.0
Get Started
← Back to all entries
patternbashMinor

Directory's disk usage list

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
usagedirectorydisklist

Problem

For my classes, I had to finish this task:


Directory's disk usage list


For indicated directory print a list of files and subdirectories in
descending order according to their total disk usage. Disk usage of a
simple file is equivalent to its size, for a subdirectory - a sum of
the sizes of all files in its filesystem branch (subtree). Printed
list format:


In the preceding print format
is in case of a file not being a directory an owner, otherwise, the
holder of a directory is the owner of the files having greatest total
disk usage in it. `` is a letter: d for a directory, - for a
standard file, l - for a symbolic link etc.


Below file disk usage list, print a summary:


Total disk usage of the files in directory's subtree, the list of
total disk usage of files in directory's subtree with respect to the
owners (sums over owners). CAUTION: during recursive directories
listing do not dereference symbolic links.

Here is my code:

#!/bin/bash

cd "$1"
ls -lA | awk 'NR != 1{
        name=$8
        for ( i=9; i/dev/null"
                while ( find_proc | getline )
                {
                        sum+=$1;usr_sum[$2]+=$1;
                        sum_all+=$1;usr_sum_all[$2]+=$1;      
                }
                close(find_proc)

                owner=""
                owner_sum=-1
                for ( iowner in usr_sum ) if ( usr_sum[iowner] > owner_sum ) {owner = iowner; owner_sum=usr_sum[iowner]}
                print sum " " owner " d " name
        }
}
END {
        print "Space taken: " sum_all
        for ( iowner in usr_sum_all ) print "\t user: " iowner " " usr_sum_all[iowner]
}' | sort -gr


Is it ok? Should I make any changes?

Solution

I just copied your code, pasted it in to a terminal, ran it, and it did exactly what it is supposed to do. Now, how to make it better?

You don't mention it in the question, but using ls, awk, find, and sort together somehow seems unlikely to be the answer your tutor expected. You don't mention what your course is in, but I would have expected there to be a less 'cobbled together' solution, using perhaps bash only, or perl, python, or something else.

I certainly would not have solved this problem this way (I would have used perl).

Still, assuming your code meets the expectations for tool use (as in, you are supposed to use awk), there are some things that are off.
Single-line multi-statements

You have a number of 1-liners that you should have expanded. You already have the massive awk code-block, why mix the compact 1-liner and block statements like you have done?

for ( iowner in usr_sum ) if ( usr_sum[iowner] > owner_sum ) {owner = iowner; owner_sum=usr_sum[iowner]}


The above should be:

for ( iowner in usr_sum )
{
    if ( usr_sum[iowner] > owner_sum )
    {
         owner = iowner
         owner_sum=usr_sum[iowner]
    }
}


Note, the use of block {} braces for the single if statement, and not using the semi-colon ; to separate statements on a single line.

These other statements should also be on their own lines (and the for statement should have a {} block):

for ( i=9; i<=NF; i++) name=name " " $i
...
sum=0;delete usr_sum;
...
sum+=$1;usr_sum[$2]+=$1;
sum_all+=$1;usr_sum_all[$2]+=$1;


Negative if

Your main if-block would be better if done as a positive check, not a negative check. your code is:

if (type!="d")
{
    ... do not a directory stuff
}
else
{
    ... do directory stuff
}


but would be more readable as:

if (type == "d")
{
    ... do directory stuff
}
else
{
    ... do not a directory stuff
}


Breathing space

Your code is suffocating since it has little breathing space. Many of your statements are compressed around the operators and other symbols. Consider lines like:

print $5,$3,type,name
          sum_all+=$5
          usr_sum_all[$3]+=$5


Which should be written as:

print $5, $3, type, name
            sum_all += $5
            usr_sum_all[$3] += $5


Performance

The multiple calls to find may be hurting your performance. In general I can't help but think there's a better way to do this without all the nested OS calls.

Still, it ran fast enough for me on my machine, so it is probably not a real problem.

Code Snippets

for ( iowner in usr_sum ) if ( usr_sum[iowner] > owner_sum ) {owner = iowner; owner_sum=usr_sum[iowner]}
for ( iowner in usr_sum )
{
    if ( usr_sum[iowner] > owner_sum )
    {
         owner = iowner
         owner_sum=usr_sum[iowner]
    }
}
for ( i=9; i<=NF; i++) name=name " " $i
...
sum=0;delete usr_sum;
...
sum+=$1;usr_sum[$2]+=$1;
sum_all+=$1;usr_sum_all[$2]+=$1;
if (type!="d")
{
    ... do not a directory stuff
}
else
{
    ... do directory stuff
}
if (type == "d")
{
    ... do directory stuff
}
else
{
    ... do not a directory stuff
}

Context

StackExchange Code Review Q#27603, answer score: 3

Revisions (0)

No revisions yet.