patternpythongitModerate
Avoiding shell injection when calling Git in Python
Viewed 0 times
shellgitpythonwheninjectioncallingavoiding
Problem
I'm developing a command line application in Python which processes information that users provide. I've been reading the Python documentation on this topic and found that the built-in module
The solution I came up with uses a set of illegal substrings, such as
The inherent issue with this approach is that it heavily depends on the set of illegal characters I'm using. Is there any way I can sanitize a string taking into account the common attacks? And if not, how can I improve my code?
The user is expected to provide git commands, such as
subprocess provides the call method, which executes a string as if it was run on a command line. The disadvantage of this approach is that it can lead to shell injection.The solution I came up with uses a set of illegal substrings, such as
rm or -rf, and then uses regex to check whether the provided string contains a substring from the set.import re
from string import ascii_letters, punctuation
# Generate a set of characters used for string sanitization.
punctuation_minus = set(punctuation) - set(('.', '_', '-'))
substrings = set(("rm", "&&", "-f", "null", "/dev/null", "2>&1", "-rf",
"rf"))
illegal = [re.escape(item) for item in (punctuation_minus | substrings)]
illegal_as_str = "|".join(illegal)
def sanitize_string(s):
"""Cleans a string from illegal characters."""
# Get all legal strings.
operations = re.split(illegal_as_str, s)
# Remove trailing spaces
no_trailings = [x.strip() for x in operations]
# Remove empty strings.
list_of_operations = filter(None, no_trailings)
# Generate a safe string to return.
operations_str = " ".join(list_of_operations)
try:
operations_str.decode("utf-8")
return operations_str
except UnicodeDecodeError as detail:
print("UnicodeDecodeError: " + str(detail))
return FalseThe inherent issue with this approach is that it heavily depends on the set of illegal characters I'm using. Is there any way I can sanitize a string taking into account the common attacks? And if not, how can I improve my code?
The user is expected to provide git commands, such as
git add or git push. Each one of these git commands needs certain parameters that the user needs to provide. The next example shows how all the illegal substrings are removed from a userSolution
Your defense strategy is known as "enumerating badness". As a rule, any strategy based on enumerating badness is doomed to fail. For example, you prohibit
If you ever need to execute an external program with user-supplied parameters, sensible practices would be to use
Do not attempt to prohibit any "bad" words; you want your code to work for any legitimate filenames. Do check that the filenames are legitimate, though — for example, not containing too many
Better yet, use a Python module that takes care of interacting with Git so that you don't have to deal with the problem yourself.
rm -rf /, but have you considered find / -delete, which basically does the same thing? How about some use of the dd if=/dev/zero? What if a user legitimately wants to perform an operation on a file named normal.py or even rm.c? Why are you silently altering the command to do something else?If you ever need to execute an external program with user-supplied parameters, sensible practices would be to use
subprocess.run() or subprocess.Popen()- with the command line as a list already split into words,
- with the
shell=Falseparameter (the best way to avoid shell injection is to avoid using the shell at all!),
- with the command given using an absolute path (which is essential with
shell=False),
- with
--preceding the user-supplied filenames, so that filenames that start with-are treated literally.
Do not attempt to prohibit any "bad" words; you want your code to work for any legitimate filenames. Do check that the filenames are legitimate, though — for example, not containing too many
../../../...Better yet, use a Python module that takes care of interacting with Git so that you don't have to deal with the problem yourself.
Context
StackExchange Code Review Q#155891, answer score: 12
Revisions (0)
No revisions yet.