HiveBrain v1.2.0
Get Started
← Back to all entries
patternpythongitModerate

Avoiding shell injection when calling Git in Python

Submitted by: @import:stackexchange-codereview··
0
Viewed 0 times
shellgitpythonwheninjectioncallingavoiding

Problem

I'm developing a command line application in Python which processes information that users provide. I've been reading the Python documentation on this topic and found that the built-in module subprocess provides the call method, which executes a string as if it was run on a command line. The disadvantage of this approach is that it can lead to shell injection.

The solution I came up with uses a set of illegal substrings, such as rm or -rf, and then uses regex to check whether the provided string contains a substring from the set.

import re
from string import ascii_letters, punctuation

# Generate a set of characters used for string sanitization.
punctuation_minus = set(punctuation) - set(('.', '_', '-'))
substrings = set(("rm", "&&", "-f", "null", "/dev/null", "2>&1", "-rf",
              "rf"))
illegal = [re.escape(item) for item in (punctuation_minus | substrings)]
illegal_as_str = "|".join(illegal)

def sanitize_string(s):
    """Cleans a string from illegal characters."""
    # Get all legal strings.
    operations = re.split(illegal_as_str, s)
    # Remove trailing spaces
    no_trailings = [x.strip() for x in operations]
    # Remove empty strings.
    list_of_operations = filter(None, no_trailings)
    # Generate a safe string to return.
    operations_str = " ".join(list_of_operations)
    try:
        operations_str.decode("utf-8")
        return operations_str
    except UnicodeDecodeError as detail:
        print("UnicodeDecodeError: " + str(detail))
        return False


The inherent issue with this approach is that it heavily depends on the set of illegal characters I'm using. Is there any way I can sanitize a string taking into account the common attacks? And if not, how can I improve my code?

The user is expected to provide git commands, such as git add or git push. Each one of these git commands needs certain parameters that the user needs to provide. The next example shows how all the illegal substrings are removed from a user

Solution

Your defense strategy is known as "enumerating badness". As a rule, any strategy based on enumerating badness is doomed to fail. For example, you prohibit rm -rf /, but have you considered find / -delete, which basically does the same thing? How about some use of the dd if=/dev/zero? What if a user legitimately wants to perform an operation on a file named normal.py or even rm.c? Why are you silently altering the command to do something else?

If you ever need to execute an external program with user-supplied parameters, sensible practices would be to use subprocess.run() or subprocess.Popen()

  • with the command line as a list already split into words,



  • with the shell=False parameter (the best way to avoid shell injection is to avoid using the shell at all!),



  • with the command given using an absolute path (which is essential with shell=False),



  • with -- preceding the user-supplied filenames, so that filenames that start with - are treated literally.



Do not attempt to prohibit any "bad" words; you want your code to work for any legitimate filenames. Do check that the filenames are legitimate, though — for example, not containing too many ../../../...

Better yet, use a Python module that takes care of interacting with Git so that you don't have to deal with the problem yourself.

Context

StackExchange Code Review Q#155891, answer score: 12

Revisions (0)

No revisions yet.