Sanitize user input

From Rosetta Code

"Never trust user input." If the Super Mario Bros. 3 Wrong Warp or Bobby Tables has taught programmers anything, it's that user input can be dangerous in unexpected ways.

In general, the task of preventing errors such as the above is best left to the built-in security features of the language rather than a filter of your own creation. This exercise is to test your ability to think about all the possible ways user input could break your program.

Task

Create a function that takes a list of 20 first and last names, and copies them to a record or struct. Ten of them must be typical input (i.e., consisting only of letters of the alphabet and punctuation), but the other ten must be deliberately chosen to cause problems with a program that expects only letters and punctuation. A few examples:

  • ASCII control codes such as NUL, CR, LF
  • Code for the language you are using that can result in damage (e.g. rm -rf, delete System32, DROP TABLE, etc.)
  • Numbers, symbols, foreign languages, emojis, etc.


(There were already solutions provided before the requirement that ten names are "normal" and ten are potentially harmful was added. Those answers satisfied the task requirements at the time they were submitted.)

Phix

As noted, there is no magic "one size fits all" solution. In the specific case of SQL, the use of sqlite3_prepare() and sqlite3_bind_text() is strongly recommended in preference to sqlite3_exec() or sqlite3_get_table(), at least for any questionable input. With sqlite3_bind_text() there is no problem whatsoever with having a student named (say) "Robert'); DROP TABLE students;--".
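The same prepare-and-bind idea can be sketched in Python, whose standard sqlite3 module binds values through "?" placeholders. This is only an illustration of the technique, not the Phix API: the students table and the add_student() helper are assumptions made up for the example.

```python
import sqlite3

def add_student(conn: sqlite3.Connection, name: str) -> None:
    """Insert a name via a bound parameter (hypothetical helper)."""
    # The "?" placeholder keeps the value out of the SQL text entirely,
    # so quotes and semicolons in the name are stored, not executed.
    conn.execute("INSERT INTO students (name) VALUES (?)", (name,))

# Hypothetical in-memory table, used only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")

hostile = "Robert'); DROP TABLE students;--"
add_student(conn, hostile)

# The table still exists and holds the name verbatim.
rows = conn.execute("SELECT name FROM students").fetchall()
print(rows)
```

Because the name never becomes part of the statement text, no escaping or filtering of the input is needed for this particular use.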

Given some suspect Phix source code to be run, it is simply not practical to cover cases such as system(rot13(reverse("se- ze"))) or any of the other myriad ways in which harmful content could be disguised. In case you have not guessed, that would execute "rm -rf", assuming the code also contains a working rot13() implementation.
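To make the point concrete, here is a small Python sketch (not Phix) of why scanning source text for dangerous strings does not work. The naive_scan() filter is a hypothetical stand-in for any such scanner; the decoding step uses Python's built-in rot13 codec and only computes the string, it runs nothing.

```python
import codecs

def naive_scan(source: str) -> bool:
    """Hypothetical filter: flag source that literally contains 'rm -rf'."""
    return "rm -rf" in source

# The disguised call from the paragraph above, held as plain text.
suspect = 'system(rot13(reverse("se- ze")))'

# Reverse the literal, then apply rot13, as the suspect code would.
decoded = codecs.encode("se- ze"[::-1], "rot13")

print(naive_scan(suspect))  # the scanner sees nothing suspicious
print(decoded)              # the command that would actually be built
```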

Of course you could block all use, even legitimate, of things like system(), as covered by Safe_mode and Untrusted_environment, or whitelist as per the Raku entry below.

The inverse problem recently arose in p2js, whereby otherwise perfectly valid code on desktop/Phix could and would generate invalid HTML/JavaScript if and when we tried to self-host (an effort which is still very much in progress, albeit slowly):

with javascript_semantics
string header = """
<!DOCTYPE html>
<html lang="en" >
 <head>
  <title>%%s</title>%s
 </head>
 <body>
  <scr!ipt src="p2js.js"></scr!ipt>%%s%s
"""
-- ...
header = substitute(header,"scr!ipt","script")

puts(1,header)  -- (make the example runnable)

In other words I had to "sanitize" a constant in the source code, in this particular case, and I could have gone further and done something similar with all the other tags, but in practice there was no need to because the generated JavaScript was already always inside a script tag.

Raku

It would be helpful if the task author would be a little more specific about what he is after. How user inputs must be "sanitized" entirely depends on how the data is going to be used.

For internal usage in Raku, where you are simply storing data into an internal data structure, it is pretty much a non-issue. Variables in Raku aren't executed without specific instructions to do so. Full stop.

Your name is a string of 2.6 million null bytes? Ok. Good luck typing that in.

You're called 'rm -rf /'? Wow. Sucks to be you.

Now, it may be a good idea to check for a maximum permitted length (2.6e6 null bytes), but Raku would handle it with no problem.

The problem mostly comes in when you need to interchange data with some 3rd party library or system, but every different system is going to have its own quirks and caveats. There is no "one size fits all" solution.

In general, when it comes to sanitizing user input, the best way to go about it is: don't. It's a losing game.

Instead, either validate input to make sure it follows a certain format, whitelist input so only a known few commands are permitted, or, if those aren't possible, use the tools the 3rd party system provides to make arbitrary input "safe" to run. Which one of these is used depends on what system you need to interact with.

For the case given, (Bobby Tables), where you are presumably putting names into some 3rd party data storage (nominally a database of some kind), you would use bound parameters to automatically "make safe" any user input. See the Raku entry under the Parametrized SQL statement task.

Validating is making sure the input matches some predetermined format, usually with some sort of regular expression. For names, you probably want to allow a fixed maximum (and minimum!) number of: any word or digit character, space and period characters, and possibly some small selection of non-word characters. It is a careful balance between too restrictive and too permissive. You need to avoid falling into preconceived assumptions about names, time, gender, addresses, phone numbers... the list goes on.
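A minimal sketch of that kind of validation, written in Python rather than Raku. The policy here is purely an assumption for illustration: the 1 to 40 character limits, the allowed apostrophe and hyphen, and the names NAME_PATTERN and is_valid_name() are all made up for the example.

```python
import re

# Hypothetical policy: 1 to 40 characters drawn from Unicode word
# characters (letters, digits, underscore), spaces, periods,
# apostrophes and hyphens.
NAME_PATTERN = re.compile(r"[\w .'\-]{1,40}")

def is_valid_name(name: str) -> bool:
    """Return True only if the whole string fits the assumed format."""
    return NAME_PATTERN.fullmatch(name) is not None

samples = ["O'Riley", "Schäfer", "", "Don\tald", "\x00" * 5,
           "Robert'); DROP TABLE students;--", "x" * 41]
for sample in samples:
    print(repr(sample[:20]), is_valid_name(sample))
```

Note that this checks format only. A string such as "rm -rf" consists entirely of permitted characters and would pass, which is fine as long as the value is only stored and never executed.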

When passing a user command to the operating system, you probably want to use whitelisting. Only a very few commands from a predetermined list are allowed to be used.

   if $command ∈ <ls time cd df> then { execute $command }

or some such. What the whitelist contains and how to determine if the input matches is a whole article in itself.
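As a rough Python rendering of the snippet above: the whitelist contents are copied from it, while the names ALLOWED_COMMANDS and parse_allowed() are hypothetical. The sketch only decides whether a command line is permitted and deliberately does not execute anything.

```python
import shlex

# Hypothetical whitelist, mirroring the names used in the snippet above.
ALLOWED_COMMANDS = frozenset({"ls", "time", "cd", "df"})

def parse_allowed(command_line: str) -> list[str]:
    """Split the input into an argument list; reject non-whitelisted commands."""
    argv = shlex.split(command_line)
    # Only the command name is checked here; arguments are left alone.
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise ValueError(f"command not permitted: {command_line!r}")
    return argv

for line in ["df -h", "rm -rf /", "ls; rm -rf /"]:
    try:
        print(parse_allowed(line))
    except ValueError as err:
        print(err)
```

If the resulting list were later executed, passing it as an argument list (for example to subprocess.run without shell=True) would avoid handing the raw string to a shell. Vetting the arguments themselves is a separate problem that this sketch does not address.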

Unfortunately, this is very vague and hand-wavy due to the vagueness of the task description. Really, any language could copy/paste 95% or more of the above, change the language name, and be done with it. But until the task description is made a little more focused, it will have to do.

Wren

Library: Wren-ioutil
Library: Wren-pattern
Library: Wren-str
Library: Wren-trait


The following assumes that names are only valid if they contain ASCII letters, hyphens or apostrophes. However, the first or last character of a name can't be a punctuation character and a name must be between 1 and 20 characters long. A single character name is allowed to cater for an initial where the full name is not known. People are given a chance to abbreviate their names if they are too long.

No other characters are allowed including control characters, spaces, symbols, emojis and non-English letters. Names which include them are simply rejected.

Furthermore, there is a blacklist of unacceptable names though in practice this would probably be longer or more sophisticated than the one I've used here, depending on what will be done with the records later.

import "/ioutil" for Input
import "/pattern" for Pattern
import "/str" for Str
import "/trait" for Indexed

// Record type that accepted names are copied into.
class Person {
    construct new(firstName, lastName) {
        _firstName = firstName
        _lastName = lastName
    }

    firstName { _firstName }
    lastName { _lastName }

    toString { _firstName + " " + _lastName }
}

var persons = []
var blacklist = [
    "drop", "delete", "erase", "kill", "wipe", "remove",
    "file", "files", "directory", "directories",
    "table", "tables", "record", "records", "database", "databases",
    "system", "system32", "system64", "rm", "rf", "rmdir", "format", "reformat"
]

var punct = "'-" // allowable punctuation
var i = Pattern.letter + punct
// whole name must consist of one or more letters or allowable punctuation
var p = Pattern.new("+1&i", Pattern.whole, i)

// Returns an error message, or an empty string if the name is acceptable.
var sanitizeInput = Fn.new { |name|
    // punctuation is not allowed as the first or last character
    var ok = p.isMatch(name) && !(punct.contains(name[0]) || punct.contains(name[-1]))
    if (!ok) return "Sorry, your name contains unacceptable characters."
    name = Str.lower(name)
    if (blacklist.contains(name)) return "Sorry, your name is unacceptable."
    return ""
}

for (i in 1..20) {
    var names = List.filled(2, null)
    var outer = false // set when this person is rejected
    for (se in Indexed.new(["first", "last "])) {
        var name = Input.text("Enter your %(se.value) name : ", 1, 20)
        var msg = sanitizeInput.call(name)
        if (msg != "") {
            System.print(msg + "\n")
            outer = true
            break
        }
        names[se.index] = name
    }
    if (outer) continue
    persons.add(Person.new(names[0], names[1]))
    System.print()
}
var count = persons.count
System.print("The following %(count) person(s) have been added to the database:")
for (person in persons) System.print(person)
Output:

Sample (abridged) input/output. The ninth person's name contains a tab character.

Enter your first name : Mickey_mouse
Sorry, your name contains unacceptable characters.

Enter your first name : Bobby
Enter your last  name : Tables
Sorry, your name is unacceptable.

Enter your first name : Fred
Enter your last  name : rm -rf/
Sorry, your name contains unacceptable characters.

Enter your first name : David
Enter your last  name : Wipe
Sorry, your name is unacceptable.

Enter your first name : Beyoncé
Sorry, your name contains unacceptable characters.

Enter your first name : A-12
Sorry, your name contains unacceptable characters.

Enter your first name : 'Andrew-
Sorry, your name contains unacceptable characters.

Enter your first name : 👨👨‍👩‍👦
Sorry, your name contains unacceptable characters.

Enter your first name : Don     ald
Sorry, your name contains unacceptable characters.

Enter your first name : Eric
Enter your last  name : Schäfer        
Sorry, your name contains unacceptable characters.

Enter your first name : Blaine
Enter your last  name : Wolfeschlegelsteinhausenbergerdorff
Must have a length between 1 and 20 characters, try again.
Enter your last  name : Wolfeschlegelstein'f 

Enter your first name : Marilyn
Enter your last  name : Monroe

Enter your first name : Bridget
Enter your last  name : O'Riley

... (plus another 7 acceptable people)

The following 10 person(s) have been added to the database:
Blaine Wolfeschlegelstein'f 
Marilyn Monroe
Bridget O'Riley
... (plus 7 more)