Cross-Site Scripting (XSS)

Cross-site scripting (XSS) is a vulnerability that permits an attacker to inject code (typically HTML or Javascript) into contents of a website not under the attacker's control. When a victim views such a page, the injected code executes in the victim's browser. Thus, the attacker has bypassed the browser's same origin policy and can steal the victim's private information associated with the website in question.
In a reflected XSS attack, the attack is in the request itself (frequently the URL), and the vulnerability occurs when the server inserts the attack into the response verbatim or incorrectly escaped or sanitized. The victim triggers the attack by browsing to a malicious URL created by the attacker. In a stored XSS attack, the attacker stores the attack in the application (e.g., in a snippet), and the victim triggers it by browsing to a page on the server that renders the stored data without properly escaping or sanitizing it.

More details

To understand how this could happen: suppose the URL http://www.google.com/search?q=flowers returns a page containing the HTML fragment

<p>Your search for 'flowers'
returned the following results:</p>

that is, the value of the query parameter q is inserted verbatim into the page returned by Google. If www.google.com did not do any validation or escaping of q (it does), an attacker could craft a link that looks like this:

http://www.google.com/search?q=flowers+%3Cscript%3Eevil_script()%3C/script%3E

and trick a victim into clicking on this link. When a victim loads this link, the following page gets rendered in the victim's browser:

<p>Your search for 'flowers<script>evil_script()</script>'
returned the following results:</p>

The browser executes evil_script(), and since the page comes from www.google.com, evil_script() runs in the context of www.google.com and has access to all the victim's browser state and cookies for that domain.
Note that the victim does not even need to explicitly click on the malicious link. Suppose the attacker owns www.evil.example.com, and creates a page with an <iframe> pointing to the malicious link; if the victim visits www.evil.example.com, the attack will silently be activated.

XSS Challenges

Typically, if you can get Javascript to execute on a page when it's viewed by another user, you have an XSS vulnerability. A simple Javascript function to use when hacking is the alert() function, which creates a pop-up box with whatever string you pass as an argument.
You might think that inserting an alert message isn't terribly dangerous, but if you can inject that, you can inject other scripts that are more malicious. It is not necessary to be able to inject any particular special character in order to attack. If you can inject alert(1) then you can inject arbitrary script using eval(String.fromCharCode(...)).
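To make the eval(String.fromCharCode(...)) trick concrete, here is a small Python sketch that turns any script into a payload built only from letters, digits, dots, commas and parentheses. The helper name char_code_payload is hypothetical, and it assumes the script uses characters from the Basic Multilingual Plane (String.fromCharCode takes UTF-16 code units).

```python
def char_code_payload(script: str) -> str:
    """Encode a script so it needs no quotes, < or > (hypothetical helper).

    Assumes every character is in the Basic Multilingual Plane, since
    String.fromCharCode works on 16-bit code units.
    """
    codes = ",".join(str(ord(ch)) for ch in script)
    return f"eval(String.fromCharCode({codes}))"


payload = char_code_payload("alert(1)")
print(payload)
```

Since the payload contains none of the characters a naive filter might block, blocking particular characters is not a defense on its own.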
Your challenge is to find XSS vulnerabilities in Gruyere. You should look for vulnerabilities both in URLs and in stored data. Since XSS vulnerabilities usually involve applications not properly handling untrusted user data, a common method of attack is to enter random text in input fields and look at how it gets rendered in the response page's HTML source. But before we do that, let's try something simpler.

File Upload XSS

Can you upload a file that allows you to execute arbitrary script on the google-gruyere.appspot.com domain?

Hint

You can upload HTML files and HTML files can contain script.

Exploit and Fix

To exploit, upload a .html file containing a script like this:

<script>
alert(document.cookie);
</script>

To fix, host the content on a separate domain so the script won't have access to any content from your domain. That is, instead of hosting user content on example.com/username we would host it at username.usercontent.example.com or username.example-usercontent.com. (Including something like "usercontent" in the domain name avoids attackers registering usernames that look innocent like wwww and using them for phishing attacks.)
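A minimal sketch of that URL scheme, assuming a hypothetical rule that usernames must be valid DNS labels (lowercase letters, digits and hyphens). It uses the example-usercontent.com form: a subdomain of example.com would still receive cookies set with Domain=example.com, which is one reason to prefer a separate registrable domain. The function name user_content_url is illustrative, not part of Gruyere.

```python
import re
from urllib.parse import quote

# Hypothetical rule: 1-30 chars, lowercase letters/digits/hyphens,
# no leading or trailing hyphen, so the name is a valid hostname label.
_USERNAME_RE = re.compile(r"[a-z0-9](?:[a-z0-9-]{0,28}[a-z0-9])?")


def user_content_url(username: str, filename: str) -> str:
    """Return a URL that serves a user's upload from its own origin."""
    if not _USERNAME_RE.fullmatch(username):
        raise ValueError(f"username not usable as a hostname label: {username!r}")
    # Percent-encode the filename so it cannot alter the URL structure.
    return f"https://{username}.example-usercontent.com/{quote(filename)}"


print(user_content_url("alice", "page.html"))
```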

Reflected XSS

There's an interesting problem here. Some browsers have built-in protection against reflected XSS attacks. There are also browser extensions like NoScript that provide some protection. If you're using one of those browsers or extensions, you may need to use a different browser or temporarily disable the extension to execute these attacks.
At the time this codelab was written, the two browsers which had this protection were IE and Chrome. To work around this, Gruyere automatically includes an X-XSS-Protection: 0 HTTP header in every response, which is recognized by IE and will be recognized by future versions of Chrome. (It's available in the developer channel now.) If you're using Chrome, you can try starting it with the --disable-xss-auditor flag by entering one of these commands:

Windows: "C:\Documents and Settings\USERNAME\Local Settings\Application Data\Google\Chrome\Application\chrome.exe" --disable-xss-auditor
Mac: /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --disable-xss-auditor
GNU/Linux: /opt/google/chrome/google-chrome --disable-xss-auditor

If you're using Firefox with the NoScript extension, add google-gruyere.appspot.com to the allow list. If you still can't get the XSS attacks to work, try a different browser. You may think that you don't need to worry about XSS if the browser protects against it. The truth is that the browser protection can't be perfect because it doesn't really know your application and therefore there may be ways for a clever hacker to circumvent that protection. The real protection is to not have an XSS vulnerability in your application in the first place.

Find a reflected XSS attack. What we want is a URL that when clicked on will execute a script.

Hint 1

What does this URL do?

http://google-gruyere.appspot.com/123/invalid

Hint 2

The most dangerous characters in a URL are < and >. If you can get an application to directly insert what you want in a page and can get those characters through, then you can probably get a script through. Try these:

http://google-gruyere.appspot.com/123/%3e%3c
http://google-gruyere.appspot.com/123/%253e%253c
http://google-gruyere.appspot.com/123/%c0%be%c0%bc
http://google-gruyere.appspot.com/123/%26gt;%26lt;
http://google-gruyere.appspot.com/123/%26amp;gt;%26amp;lt;
http://google-gruyere.appspot.com/123/\074\x3c\u003c\x3C\u003C\X3C\U003C
http://google-gruyere.appspot.com/123/+ADw-+AD4-

This tries > and < in many different ways that might make it through the URL and get rendered incorrectly: verbatim (URL %-encoding), double %-encoding, bad UTF-8 encoding, HTML &-encoding, double &-encoding, several variations on C-style encoding, and UTF-7 encoding. View the resulting source and see if any of those work. (Note: literally typing >< in the URL is identical to %3e%3c because the browser automatically %-encodes those characters. If you want to send a literal > or <, you will need to use a tool like curl to send those characters in the URL.)
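Several of these probe encodings can be reproduced with Python's standard library, which also shows how a server that decodes twice turns %253e back into >. This is only an illustration of the encodings; it does not talk to Gruyere.

```python
import html
from urllib.parse import quote, unquote

probe = "><"

url_once = quote(probe, safe="")       # single %-encoding: %3E%3C
url_twice = quote(url_once, safe="")   # double %-encoding: %253E%253C
entities = html.escape(probe)          # HTML &-encoding: &gt;&lt;

# A server that %-decodes twice recovers the raw characters.
decoded_twice = unquote(unquote("%253e%253c"))

# UTF-7 spells < and > as +ADw- and +AD4-.
utf7_decoded = b"+ADw-+AD4-".decode("utf-7")


def strict_utf8_rejects(raw: bytes) -> bool:
    """True if a strict UTF-8 decoder refuses the bytes (e.g. overlong forms)."""
    try:
        raw.decode("utf-8")
        return False
    except UnicodeDecodeError:
        return True


print(url_once, url_twice, entities, decoded_twice, utf7_decoded)
print(strict_utf8_rejects(b"\xc0\xbe\xc0\xbc"))  # overlong encodings of > and <
```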

Exploit and Fix

To exploit, create a URL like the following and get a victim to click on it:

http://google-gruyere.appspot.com/123/<script>alert(1)</script>

To fix, you need to escape user input that is displayed in error messages. Error messages are displayed using error.gtl, but are not escaped in the template. The part of the template that renders the message is {{_message}}, and it's missing the modifier that tells it to escape user input. Add the :text modifier to escape the user input:

<div class="message">{{_message:text}}</div>

This flaw would have been best mitigated by a design that escapes all output by default and only displays raw HTML when explicitly tagged to do so. There are also autoescaping features available in many template systems.
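Escaping by default can be sketched in a few lines of Python. This is a toy renderer built on string.Template, not GTL; the Raw marker class is a hypothetical stand-in for "explicitly tagged as HTML". Every value goes through html.escape unless it is wrapped in Raw, so a forgotten modifier fails safe instead of reintroducing the error.gtl bug.

```python
import html
import string


class Raw(str):
    """Marks a value as trusted HTML to emit unescaped (hypothetical marker)."""


def render(template: str, **values) -> str:
    """Substitute $names into template, HTML-escaping every value by default."""
    prepared = {}
    for name, value in values.items():
        if isinstance(value, Raw):
            prepared[name] = value  # explicitly tagged as safe HTML
        else:
            prepared[name] = html.escape(str(value), quote=True)
    return string.Template(template).substitute(prepared)


page = render('<div class="message">$message</div>',
              message="Invalid request: /123/<script>alert(1)</script>")
print(page)
```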

Stored XSS

Now find a stored XSS. What we want to do is put a script in a place where Gruyere will serve it back to another user.
The most obvious place that Gruyere serves back user-provided data is in a snippet (ignoring uploaded files, which we've already discussed).

Hint 1

Put this in a snippet and see what you get:

<script>alert(1)</script>

There are many different ways that script can be embedded in a document.

Hint 2

Hackers don't limit themselves to valid HTML syntax. Try some invalid HTML and see what you get. You may need to experiment a bit in order to find something that will work. There are multiple ways to do this.

Exploit and Fix

To exploit, enter any of these as your snippet (there are certainly more methods):

(1) <a onmouseover="alert(1)" href="#">read this!</a>

(2) <p <script>alert(1)</script>hello

(3) </td <script>alert(1)</script>hello

Notice that there are multiple failures in sanitizing the HTML. Snippet 1 worked because onmouseover was inadvertently omitted from the list of disallowed attributes in sanitize.py. Snippets 2 and 3 work because browsers tend to be forgiving with HTML syntax and the handling of both start and end tags is buggy.
To fix, we need to investigate and fix the sanitizing performed on the snippets. Snippets are sanitized in _SanitizeTag in the sanitize.py file. Let's block snippet 1 by adding "onmouseover" to the list of disallowed_attributes.
Oops! This doesn't completely solve the problem. Looking at the code that was just fixed, can you find a way to bypass the fix?

Hint

Take a close look at the code in _SanitizeTag that determines whether an HTML attribute is allowed.

Exploit and Fix

The fix was insufficient because the code that checks for disallowed attributes is case sensitive and HTML is not. So this still works:

(1') <a ONMOUSEOVER="alert(1)" href="#">read this!</a>
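The bypass boils down to a case-sensitive set lookup. The sketch below uses a hypothetical, abbreviated blacklist in the style of sanitize.py (not its real contents) to show the bug and the case-insensitive comparison that closes this particular hole. It is still a blacklist, which the design discussion that follows argues against.

```python
# Hypothetical, abbreviated blacklist; not the real list from sanitize.py.
DISALLOWED_ATTRIBUTES = {"onblur", "onchange", "onclick", "onmouseover"}


def attribute_allowed_buggy(name: str) -> bool:
    # Case-sensitive lookup: "ONMOUSEOVER" is not in the set, so it slips through.
    return name not in DISALLOWED_ATTRIBUTES


def attribute_allowed_case_insensitive(name: str) -> bool:
    # HTML attribute names are case-insensitive, so normalize before comparing.
    return name.lower() not in DISALLOWED_ATTRIBUTES


print(attribute_allowed_buggy("ONMOUSEOVER"))             # the bypass
print(attribute_allowed_case_insensitive("ONMOUSEOVER"))  # now rejected
```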

Correctly sanitizing HTML is a tricky problem. The _SanitizeTag function has a number of critical design flaws:

It does not validate the well-formedness of the input HTML. As we see, badly formed HTML passes through the sanitizer unchanged. Since browsers typically apply very lenient parsing, it is very hard to predict the browser's interpretation of the given HTML unless we exercise strict control on its format.
It uses blacklisting of attributes, which is a bad technique. One of our exploits got past the blacklist simply by using an uppercase version of the attribute. There could be other attributes missing from this list that are dangerous. It is always better to whitelist known good values.
The sanitizer does not do any further sanitization of attribute values. This is dangerous since URI attributes like href and src and the style attribute can all be used to inject javascript.

The right approach to HTML sanitization is to:

Parse the input into an intermediate DOM structure, then rebuild the body as well-formed output.
Use strict whitelists for allowed tags and attributes.
Apply strict sanitization of URL and CSS attributes if they are permitted.

Whenever possible it is preferable to use an already available known and proven HTML sanitizer.
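The three steps above can be sketched with Python's standard-library html.parser. This is an illustration under stated assumptions, not a vetted sanitizer. The tag and attribute whitelists and the URL rule are hypothetical. HTMLParser streams tokens rather than building a full DOM, so the sketch rebuilds well-formed output from those tokens with a stack of open tags. Anything not whitelisted is dropped, text and attribute values are re-escaped, and href values must start with http://, https://, mailto:, / or #.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelists; a real application would choose these carefully.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "p", "br", "ul", "ol", "li"}
VOID_TAGS = {"br"}                        # tags that never get a closing tag
ALLOWED_ATTRS = {"a": {"href", "title"}}  # per-tag attribute whitelist
URL_ATTRS = {"href"}
SAFE_URL_RE = re.compile(r"^(?:https?://|mailto:|/|#)", re.IGNORECASE)
DROP_CONTENT_TAGS = {"script", "style"}   # discard these elements and their text


class _WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # stack of allowed tags we have emitted
        self.dropping = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so case tricks fail.
        if tag in DROP_CONTENT_TAGS:
            self.dropping += 1
            return
        if tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, ()):
                continue
            if name in URL_ATTRS and not SAFE_URL_RE.match(value):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT_TAGS:
            self.dropping = max(0, self.dropping - 1)
            return
        if tag not in self.open_tags:
            return  # stray or disallowed end tag: ignore it
        while self.open_tags:  # close any tags left open inside this one
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        if not self.dropping:
            self.out.append(escape(data, quote=False))


def sanitize_html(fragment: str) -> str:
    parser = _WhitelistSanitizer()
    parser.feed(fragment)
    parser.close()
    # Close anything still open so the output is well formed.
    parser.out.extend(f"</{tag}>" for tag in reversed(parser.open_tags))
    return "".join(parser.out)


print(sanitize_html('<a ONMOUSEOVER="alert(1)" href="#">read this!</a>'))
```

Whatever the parser makes of malformed input such as snippets 2 and 3, only whitelisted tag names can reach the output and every piece of text is escaped, so no script tag survives.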

Stored XSS via HTML Attribute

You can also do XSS by injecting a value into an HTML attribute. Inject a script by setting the color value in a profile.

Hint 1

The color is rendered as style='color:color'. Try including a single quote character in your color name.

Hint 2

You can insert an HTML attribute that executes a script.

Exploit and Fixes

To exploit, use the following for your color preference:

red' onload='alert(1)' onmouseover='alert(2)

You may need to move the mouse over the snippet to trigger the attack. This attack works because the first quote ends the style attribute and the second quote starts the onload attribute.
But this attack shouldn't work at all. Take a look at home.gtl where it renders the color. It says style='{{color:text}}' and as we saw earlier, the :text part tells it to escape text. So why doesn't this get escaped? In gtl.py, it calls cgi.escape(str(value)) which takes an optional second parameter that indicates that the value is being used in an HTML attribute. So you can replace this with cgi.escape(str(value),True). Except that doesn't fix it! The problem is that cgi.escape assumes your HTML attributes are enclosed in double quotes and this file is using single quotes. (This should teach you to always carefully read the documentation for libraries you use and to always test that they do what you want.)
You'll note that this attack uses both onload and onmouseover. That's because even though the W3C specifies that onload events are only supported on body and frameset elements, some browsers support them on other elements. So if the victim is using one of those browsers, the attack always succeeds. Otherwise, it succeeds when the user moves the mouse. It's not uncommon for attackers to use multiple attack vectors at the same time.
To fix, we need to use a correct text escaper, that escapes single and double quotes too. Add the following function to gtl.py and call it instead of cgi.escape for the text escaper.

def _EscapeTextToHtml(var):
  """Escape HTML metacharacters.

  This function escapes characters that are dangerous to insert into
  HTML. It prevents XSS via quotes or script injected in attribute values.

  It is safer than cgi.escape, which escapes only <, >, & by default.
  cgi.escape can be told to escape double quotes, but it will never
  escape single quotes.
  """
  meta_chars = {
      '"': '&quot;',
      '\'': '&#39;',  # Not &apos;
      '&': '&amp;',
      '<': '&lt;',
      '>': '&gt;',
      }
  escaped_var = ""
  for i in var:
    if i in meta_chars:
      escaped_var = escaped_var + meta_chars[i]
    else:
      escaped_var = escaped_var + i
  return escaped_var
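For comparison, cgi.escape was removed in Python 3.8; its replacement, html.escape, escapes single quotes as well when its quote argument is true (the default). The sketch below renders the color exploit into a single-quoted attribute both ways. The template string is a simplified stand-in for home.gtl, not its actual contents.

```python
import html

payload = "red' onload='alert(1)' onmouseover='alert(2)"
template = "<div style='color:%s'>"  # simplified stand-in for home.gtl

# quote=False escapes only &, < and > (like cgi.escape's default),
# so the payload's single quotes close the style attribute.
unsafe = template % html.escape(payload, quote=False)

# quote=True also escapes " and ' (as &quot; and &#x27;).
safe = template % html.escape(payload, quote=True)

print(unsafe)
print(safe)
```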

Oops! This doesn't completely solve the problem. Even with the above fix in place, the color value is still vulnerable.

Hint 1

Some browsers allow you to include script in stylesheets.

Hint 2

The easiest browser to exploit in this way is Internet Explorer which supports dynamic CSS properties.

Another Exploit and Fix

Internet Explorer's dynamic CSS properties (aka CSS expressions) make this attack particularly easy.
To exploit, use the following for your color preference:

expression(alert(1))

While other browsers don't support CSS expressions, there are other dangerous CSS properties, such as Mozilla's -moz-binding.
To fix, we need to sanitize the color as a color. The best thing to do would be to add a new output sanitizing form to gtl, i.e., we would write {{foo:color}} which makes sure foo is safe to use as a color. This function can be used to sanitize:

import re

# \Z (rather than $) so a trailing newline cannot sneak through.
SAFE_COLOR_RE = re.compile(r"^#?[a-zA-Z0-9]*\Z")

def _SanitizeColor(color):
  """Sanitizes a color, returning 'invalid' if it's invalid.

  A valid value is either the name of a color or # followed by the
  hex code for a color (like #FEFFFF). Returning an invalid value
  allows a style sheet to specify a default value by writing
  'color:default; color:{{foo:color}}'.
  """

  if SAFE_COLOR_RE.match(color):
    return color
  return 'invalid'

Colors aren't the only values we might want to allow users to provide. You should do similar sanitizing for user-provided fonts, sizes, urls, etc. It's helpful to do input validation, so that when a user enters an invalid value, you'll reject it at that time. But only doing input validation would be a mistake: if you find an error in your validation code or a new browser exposes a new attack vector, you'd have to go back and scrub all previously entered values. Or, you could add the output validation which you should have been doing in the first place.

Stored XSS via AJAX

Find an XSS attack that uses a bug in Gruyere's AJAX code. The attack should be triggered when you click the refresh link on the page.

Hint 1

Run curl on http://google-gruyere.appspot.com/123/feed.gtl and look at the result. (Or browse to it in your browser and view source.) You'll see that it includes each user's first snippet into the response. This entire response is then evaluated on the client side which then inserts the snippets into the document. Can you put something in your snippet that will be parsed differently than expected?

Hint 2

Try putting some quotes (") in your snippet.

Exploit and Fixes

To exploit, put this in your snippet:

all <span style=display:none>"
+ (alert(1),"")
+ "</span>your base

The JSON should look like

_feed(({..., "Mallory": "snippet", ...}))

but instead looks like this:

_feed({..., "Mallory": "all <span style=display:none>"
+ (alert(1),"")
+ "</span>your base", ...})

The quotes in the snippet split Mallory's value into three separate expressions joined by +: "all <span style=display:none>", (alert(1),""), and "</span>your base". Note that this exploit is written to be invisible both in the original page rendering (because of the <span style=display:none>) and after refresh (because it inserts only an empty string). All that will appear on the screen is all your base.
There are bugs on both the server and client sides which enable this attack, so the fix has two parts. First, on the server side, the text is incorrectly escaped when it is rendered in the JSON response. The template says {{snippet.0:html}}, but that's not enough. This text is going to be inserted into the innerHTML of a DOM node, so the HTML does have to be sanitized. However, that sanitized text is then going to be inserted into Javascript, so single and double quotes have to be escaped as well. That is, adding support for {{...:js}} to GTL would not be sufficient; we would also need to support something like {{...:html:js}}.
To escape quotes, use \x27 and \x22 for single and double quote respectively. Replacing them with &#x27; and &quot; is incorrect, as those are not recognized in Javascript strings and will break quotes around HTML attributes.
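The Javascript half of a {{...:html:js}} modifier could be sketched in Python as follows, applied after HTML sanitizing. The function name and escape table are hypothetical. The table uses the \x27 and \x22 escapes recommended above, escapes backslashes and line terminators, and also escapes < and > so no literal tag can appear in the script.

```python
# Hypothetical escape table for text placed inside a quoted Javascript string.
_JS_STRING_ESCAPES = {
    "\\": "\\\\",
    '"': "\\x22",
    "'": "\\x27",
    "<": "\\x3c",
    ">": "\\x3e",
    "\n": "\\n",
    "\r": "\\r",
    "\u2028": "\\u2028",  # ended string literals before ES2019
    "\u2029": "\\u2029",
}


def escape_js_string(text: str) -> str:
    """Escape text for use inside a '...' or "..." Javascript literal."""
    return "".join(_JS_STRING_ESCAPES.get(ch, ch) for ch in text)


snippet = 'all <span style=display:none>"\n+ (alert(1),"")\n+ "</span>your base'
line = '"Mallory": "%s"' % escape_js_string(snippet)
print(line)  # the snippet now stays inside one string literal
```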
Second, in the browser, Gruyere converts the JSON by using Javascript's eval. In general, eval is very dangerous and should rarely be used. If it is used, it must be used very carefully, which is hardly the case here. We should be using a JSON parser, which accepts only data, so an embedded expression like (alert(1),"") is rejected as a syntax error instead of being executed. The JSON parser is available at json.org (modern browsers also provide JSON.parse natively).
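Python's json module behaves like a strict browser-side JSON parser here and makes the point easy to check. The response string below is a simplified version of what the vulnerable server emits for Mallory's snippet; parse_feed is a hypothetical helper.

```python
import json

# Simplified vulnerable output: the + (alert(1),"") part is an expression, not data.
response_body = '{"Mallory": "all <span style=display:none>" + (alert(1),"") + "</span>your base"}'


def parse_feed(body: str) -> dict:
    """Parse a feed strictly as JSON data; any expression is a syntax error."""
    return json.loads(body)


try:
    parse_feed(response_body)
except json.JSONDecodeError as err:
    print("rejected:", err)
```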

Reflected XSS via AJAX

Find a URL that when clicked on will execute a script using one of Gruyere's AJAX features.

Hint 1

When Gruyere refreshes a user snippets page, it uses

http://google-gruyere.appspot.com/123/feed.gtl?uid=value

and the result is the script

_feed((["user", "snippet1", ... ]))

Hint 2

This uses a different vulnerability, but the exploit is very similar to the previous reflected XSS exploit.

Exploit and Fixes

To exploit, create a URL like the following and get a victim to click on it:

http://google-gruyere.appspot.com/123/feed.gtl?uid=<script>alert(1)</script>
http://google-gruyere.appspot.com/123/feed.gtl?uid=%3Cscript%3Ealert(1)%3C/script%3E

This renders as

_feed((["<script>alert(1)</script>"]))

which, surprisingly, does execute the script. The bug is that Gruyere returns all gtl files with content type text/html, and browsers are very tolerant of what they accept as HTML. To fix, you need to make sure that your JSON content can never be interpreted as HTML. Even though literal < and > are allowed in Javascript strings, you need to make sure they don't appear literally where a browser can misinterpret them. Thus, you'd need to modify {{...:js}} to replace them with the Javascript escapes \x3c and \x3e. It is always safe to write '\x3c\x3e' in Javascript strings instead of '<>'. (And, as noted above, using the HTML escapes &lt; and &gt; is incorrect.)
You should also always set the content type of your responses, in this case serving JSON results as application/javascript. This alone doesn't solve the problem because browsers don't always respect the content type: browsers sometimes do "sniffing" to try to "fix" results from servers that don't provide the correct content type.
But wait, there's more! Gruyere doesn't set the character encoding (charset) either. Some browsers try to guess the encoding of a document, and an attacker may be able to embed content in a document that declares its encoding. So, for example, if an attacker can trick the browser into thinking a document is UTF-7, they could embed a script tag as +ADw-script+AD4-, since +ADw- and +AD4- are alternate encodings for < and >. So always set both the content type and the character encoding of your responses, e.g., for HTML:

Content-Type: text/html; charset=utf-8
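Putting these fixes together for the feed, a minimal WSGI sketch might look like this. The handler is hypothetical and is not Gruyere's code. It serializes uid with json.dumps, swaps < and > for \x3c and \x3e so no literal tag appears in the body, and declares both the content type and the charset.

```python
import json
from urllib.parse import parse_qs


def _js_string_literal(value: str) -> str:
    # json.dumps quotes the value and escapes ", \ and control characters
    # (and, with the default ensure_ascii=True, every non-ASCII character).
    # Replacing < and > with \x3c and \x3e keeps a literal tag out of the body.
    return json.dumps(value).replace("<", "\\x3c").replace(">", "\\x3e")


def feed_app(environ, start_response):
    """Hypothetical WSGI handler for feed.gtl?uid=... (not Gruyere's code)."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    uid = query.get("uid", [""])[0]
    body = ("_feed(([%s]))" % _js_string_literal(uid)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/javascript; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

With the reflected-XSS URL above, the response body is _feed((["\x3cscript\x3ealert(1)\x3c/script\x3e"])), which contains no literal < or > for a sniffing browser to treat as markup.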

More about XSS

In addition to the XSS attacks described above, there are quite a few more ways to attack Gruyere with XSS. Collect them all!
XSS is a difficult beast. On one hand, a fix to an XSS vulnerability is usually trivial and involves applying the correct sanitizing function to user input when it's displayed in a certain context. On the other hand, if history is any indication, this is extremely difficult to get right. US-CERT reports dozens of publicly disclosed XSS vulnerabilities involving multiple companies.
Though there is no magic defense against XSS vulnerabilities, here are some steps you should take to prevent these types of bugs from popping up in your products:

First, make sure you understand the problem.
Wherever possible, do sanitizing via template features instead of calling escaping functions in source code. This way, all of your escaping is done in one place and your product can benefit from security technologies designed for template systems that verify their correctness or actually do the escaping for you. Also, familiarize yourself with the other security features of your template system.
Employ good testing practices with respect to XSS.
Don't write your own template library :)
