Cross-Site Scripting (XSS)
Cross-site scripting (XSS) is a vulnerability that permits an attacker to inject code (typically HTML or Javascript) into the contents of a website not under the attacker's control. When a victim views such a page, the injected code executes in the victim's browser. Thus, the attacker has bypassed the browser's same origin policy and can steal the victim's private information associated with the website in question.
In a reflected XSS attack, the attack is in the request itself (frequently the URL), and the vulnerability occurs when the server inserts the attack in the response verbatim or incorrectly escaped or sanitized. The victim triggers the attack by browsing to a malicious URL created by the attacker. In a stored XSS attack, the attacker stores the attack in the application (e.g., in a snippet). The victim triggers it by browsing to a page on the server that renders the stored data without properly escaping or sanitizing it.
More details
To understand how this could happen, suppose the URL
http://www.google.com/search?q=flowers
returns a page containing the HTML fragment
<p>Your search for 'flowers' returned the following results:</p>
That is, the value of the query parameter q is inserted verbatim into the page returned by Google. If www.google.com did not do any validation or escaping of q (it does), an attacker could craft a link that looks like this:
http://www.google.com/search?q=flowers+%3Cscript%3Eevil_script()%3C/script%3E
and trick a victim into clicking on it. When the victim loads this link, the following page gets rendered in the victim's browser:
<p>Your search for 'flowers <script>evil_script()</script>' returned the following results:</p>
The browser then executes evil_script(). Since the page comes from www.google.com, evil_script() runs in the context of www.google.com and has access to all of the victim's browser state and cookies for that domain.
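Here is a minimal Python sketch of the search example above. The render_results_* functions are hypothetical stand-ins, not Google's code. The only difference between the vulnerable server and the fixed one is a call to the standard library's html.escape before the query is inserted into the page:

```python
import html
from urllib.parse import parse_qs, urlsplit


def query_param(url: str, name: str) -> str:
    """Extract and %-decode a query parameter the way a server would."""
    return parse_qs(urlsplit(url).query)[name][0]


def render_results_unsafe(q: str) -> str:
    # Vulnerable: the query string is inserted into the HTML verbatim.
    return f"<p>Your search for '{q}' returned the following results:</p>"


def render_results_safe(q: str) -> str:
    # Fixed: &, <, >, " and ' are replaced by HTML entities first.
    return f"<p>Your search for '{html.escape(q)}' returned the following results:</p>"


attack = "http://www.google.com/search?q=flowers+%3Cscript%3Eevil_script()%3C/script%3E"
q = query_param(attack, "q")
print(render_results_unsafe(q))  # contains a live <script> element
print(render_results_safe(q))    # contains &lt;script&gt; text instead
```

parse_qs decodes the + to a space and %3C/%3E to < and >, so the server really does see a script tag in q. Only the escaping step turns it back into inert text.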
Note that the victim does not even need to explicitly click on the malicious link. Suppose the attacker owns www.evil.example.com and creates a page with an <iframe> pointing to the malicious link. If the victim visits www.evil.example.com, the attack will silently be activated.
XSS Challenges
Typically, if you can get Javascript to execute on a page when it's viewed by another user, you have an XSS vulnerability. A simple Javascript function to use when hacking is the alert() function, which creates a pop-up box with whatever string you pass as an argument.
You might think that inserting an alert message isn't terribly dangerous, but if you can inject that, you can inject other scripts that are more malicious. It is not necessary to be able to inject any particular special character in order to attack. If you can inject alert(1), then you can inject arbitrary script using eval(String.fromCharCode(...)).
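To see why no particular special character is needed, here is a small Python sketch that rewrites a script as an eval(String.fromCharCode(...)) call. The helper name to_char_code_payload is hypothetical. The output contains only letters, digits, dots, commas and parentheses. The sketch assumes the script uses only characters in Unicode's Basic Multilingual Plane, where Python's ord() equals the UTF-16 code unit that String.fromCharCode expects.

```python
def to_char_code_payload(script: str) -> str:
    """Encode `script` as eval(String.fromCharCode(...)).

    Assumes every character is in the Basic Multilingual Plane; characters
    outside it would need to be split into UTF-16 surrogate pairs.
    """
    codes = ",".join(str(ord(ch)) for ch in script)
    return f"eval(String.fromCharCode({codes}))"


payload = to_char_code_payload("alert(1)")
print(payload)  # eval(String.fromCharCode(97,108,101,114,116,40,49,41))
```

Once an attacker can inject alert(1) into a script context, this form can carry any other script without needing quotes or angle brackets.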
Your challenge is to find XSS vulnerabilities in Gruyere. You should look for vulnerabilities both in URLs and in stored data. Since XSS vulnerabilities usually involve applications not properly handling untrusted user data, a common method of attack is to enter random text in input fields and look at how it gets rendered in the response page's HTML source. But before we do that, let's try something simpler.
File Upload XSS
Can you upload a file that allows you to execute arbitrary script on the google-gruyere.appspot.com domain?
Hint
You can upload HTML files and HTML files can contain script.
Exploit and Fix
To exploit, upload a .html file containing a script like this:
<script> alert(document.cookie); </script>
To fix, host the content on a separate domain so the script won't have access to any content from your domain. That is, instead of hosting user content on example.com/username, we would host it at username.usercontent.example.com or username.example-usercontent.com. (Including something like "usercontent" in the domain name prevents attackers from registering innocent-looking usernames like wwww and using them for phishing attacks.)
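The separate domain helps because browsers isolate scripts by origin: the (scheme, host, port) triple. Below is a minimal Python sketch of that comparison. The hostnames are the hypothetical ones from the paragraph above. The model deliberately ignores finer details such as document.domain.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that defines a web origin."""
    parts = urlsplit(url)
    # An omitted port means the scheme's default port.
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(a: str, b: str) -> bool:
    return origin(a) == origin(b)


# User content under a path shares the app's origin...
print(same_origin("https://example.com/app", "https://example.com/alice/page.html"))
# ...but user content on its own domain does not.
print(same_origin("https://example.com/app", "https://alice.example-usercontent.com/page.html"))
```

This only models script access. Cookies set with a Domain attribute of example.com are also sent to subdomains such as username.usercontent.example.com. That is one argument for the fully separate example-usercontent.com form.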
Reflected XSS
There's an interesting problem here. Some browsers have built-in protection against reflected XSS attacks. There are also browser extensions like NoScript that provide some protection. If you're using one of those browsers or extensions, you may need to use a different browser or temporarily disable the extension to execute these attacks.
At the time this codelab was written, the two browsers that had this protection were IE and Chrome. To work around this, Gruyere automatically includes an X-XSS-Protection: 0 HTTP header in every response. IE recognizes this header, and future versions of Chrome will too. (It's available in the developer channel now.) If you're using Chrome, you can try starting it with the --disable-xss-auditor flag by entering one of these commands:
- Windows: "C:\Documents and Settings\USERNAME\Local Settings\Application Data\Google\Chrome\Application\chrome.exe" --disable-xss-auditor
- Mac: /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --disable-xss-auditor
- GNU/Linux: /opt/google/chrome/google-chrome --disable-xss-auditor
Find a reflected XSS attack. What we want is a URL that when clicked on will execute a script.
Hint 1
What does this URL do?
http://google-gruyere.appspot.com/123/invalid
Hint 2
The most dangerous characters in a URL are < and >. If you can get an application to directly insert what you want in a page and can get those characters through, then you can probably get a script through. Try these:
http://google-gruyere.appspot.com/123/%3e%3c
http://google-gruyere.appspot.com/123/%253e%253c
http://google-gruyere.appspot.com/123/%c0%be%c0%bc
http://google-gruyere.appspot.com/123/%26gt;%26lt;
http://google-gruyere.appspot.com/123/%26amp;gt;%26amp;lt;
http://google-gruyere.appspot.com/123/\074\x3c\u003c\x3C\u003C\X3C\U003C
http://google-gruyere.appspot.com/123/+ADw-+AD4-
This tries > and < in many different ways that might make it through the URL and get rendered incorrectly:
- verbatim (URL %-encoding)
- double %-encoding
- bad UTF-8 encoding
- HTML &-encoding
- double &-encoding
- several different variations on C-style encoding
- UTF-7 encoding
View the resulting source and see if any of those work. (Note: literally typing >< in the URL is identical to %3e%3c because the browser automatically %-encodes those characters. If you want to send a literal > or <, you will need to use a tool like curl to send those characters in the URL.)
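Several of the variants above can be generated with Python's standard library rather than by hand. The helper probe_variants is hypothetical. This sketch covers only the %-encoded and &-encoded forms; the bad UTF-8, C-style and UTF-7 variants remain literal strings in the list above.

```python
import html
from urllib.parse import quote


def probe_variants(chars: str = "><") -> dict:
    """Build %- and &-encoded forms of `chars` for probing a URL handler."""
    pct = quote(chars, safe="")   # single %-encoding
    ent = html.escape(chars)      # HTML &-encoding
    return {
        "percent": pct,
        "double_percent": quote(pct, safe=""),  # the % signs themselves encoded
        # Entities are %-encoded for the URL path; ';' is left as-is.
        "html_entity": quote(ent, safe=";"),
        "double_html_entity": quote(html.escape(ent), safe=";"),
    }


base = "http://google-gruyere.appspot.com/123/"
for name, path in probe_variants().items():
    print(name, base + path)
```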
Exploit and Fix
To exploit, create a URL like the following and get a victim to click on it:
http://google-gruyere.appspot.com/123/<script>alert(1)</script>
To fix, you need to escape user input that is displayed in error messages. Error messages are displayed using error.gtl but are not escaped in the template. The part of the template that renders the message is {{_message}}, and it's missing the modifier that tells it to escape user input. Add the :text modifier to escape the user input:
<div class="message">{{_message:text}}</div>
This flaw would have been best mitigated by a design that escapes all output by default and only displays raw HTML when explicitly tagged to do so. There are also autoescaping features available in many template systems.
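To make the escape-by-default design concrete, here is a toy template renderer in Python. Its {{name}} and {{name:raw}} syntax is hypothetical and only loosely modeled on GTL. Every substituted value passes through html.escape unless the template explicitly opts out with :raw, so a forgotten modifier fails safe rather than open.

```python
import html
import re

# {{name}} or {{name:modifier}}: a hypothetical mini-syntax, not GTL itself.
_PLACEHOLDER = re.compile(r"\{\{(\w+)(?::(\w+))?\}\}")


def render(template: str, **values) -> str:
    """Fill placeholders, HTML-escaping every value unless marked :raw."""

    def substitute(match):
        name, modifier = match.group(1), match.group(2)
        value = str(values[name])
        if modifier == "raw":
            return value  # explicit opt-out: the caller vouches for this HTML
        return html.escape(value)  # default: treat the value as plain text

    return _PLACEHOLDER.sub(substitute, template)


page = render('<div class="message">{{_message}}</div>',
              _message="Invalid request: /123/<script>alert(1)</script>")
print(page)
```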
Stored XSS
Now find a stored XSS. What we want to do is put a script in a place where Gruyere will serve it back to another user.
The most obvious place that Gruyere serves back user-provided data is in a snippet (ignoring uploaded files, which we've already discussed).
Hint 1
Put this in a snippet and see what you get:
<script>alert(1)</script>
There are many different ways that script can be embedded in a document.
Hint 2
Hackers don't limit themselves to valid HTML syntax. Try some invalid
HTML and see what you get. You may need to experiment a bit in order
to find something that will work. There are multiple ways to do this.
Exploit and Fix
To exploit, enter any of these as your snippet (there are certainly more methods):
(1) <a onmouseover="alert(1)" href="#">read this!</a>
(2) <p <script>alert(1)</script>hello
(3) </td <script>alert(1)</script>hello
Notice that there are multiple failures in sanitizing the HTML. Snippet 1 worked because onmouseover was inadvertently omitted from the list of disallowed attributes in sanitize.py. Snippets 2 and 3 work because browsers tend to be forgiving with HTML syntax, and the sanitizer's handling of both start and end tags is buggy.
To fix, we need to investigate and fix the sanitizing performed on the snippets. Snippets are sanitized in _SanitizeTag in the sanitize.py file. Let's block snippet 1 by adding "onmouseover" to the list of disallowed_attributes.
Oops! This doesn't completely solve the problem. Looking at the code that was just fixed, can you find a way to bypass the fix?
Hint
Take a close look at the code in _SanitizeTag that determines whether an HTML attribute is allowed.
Exploit and Fix
The fix was insufficient because the code that checks for disallowed attributes is case sensitive and HTML is not. So this still works:
(1') <a ONMOUSEOVER="alert(1)" href="#">read this!</a>
Correctly sanitizing HTML is a tricky problem. The _SanitizeTag function has a number of critical design flaws:
- It does not validate the well-formedness of the input HTML. As we see, badly formed HTML passes through the sanitizer unchanged. Since browsers typically apply very lenient parsing, it is very hard to predict the browser's interpretation of the given HTML unless we exercise strict control over its format.
- It uses blacklisting of attributes, which is a bad technique. One of our exploits got past the blacklist simply by using an uppercase version of the attribute. There could be other attributes missing from this list that are dangerous. It is always better to whitelist known good values.
- The sanitizer does not do any further sanitization of attribute values. This is dangerous because URI attributes like href and src, as well as the style attribute, can all be used to inject Javascript.
A more robust sanitizer would:
- Parse the input into an intermediate DOM structure, then rebuild the body as well-formed output.
- Use strict whitelists for allowed tags and attributes.
- Apply strict sanitization of URL and CSS attributes if they are permitted.
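The three recommendations above can be sketched with Python's standard html.parser module. This is a minimal illustration under assumed policies, not Gruyere's sanitize.py and not a production sanitizer. The tag and attribute whitelists and the http/https-only rule for href are hypothetical choices. CSS is not handled at all: style is simply not whitelisted.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical policy for illustration only.
ALLOWED_TAGS = {"a", "b", "i", "em", "strong", "p", "br"}
ALLOWED_ATTRS = {"a": {"href"}}       # per-tag attribute whitelist
SAFE_URL_SCHEMES = {"http", "https"}  # relative URLs such as "#" are dropped
VOID_TAGS = {"br"}                    # tags that never get a closing tag
DROP_CONTENT = {"script", "style"}    # discard these elements' text entirely


def _is_safe_url(value):
    try:
        return urlsplit(value).scheme.lower() in SAFE_URL_SCHEMES
    except ValueError:  # e.g. malformed IPv6 brackets
        return False


class WhitelistSanitizer(HTMLParser):
    """Parse snippet HTML, then re-emit only whitelisted, well-formed markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the rebuilt document
        self.open_tags = []  # stack of tags we emitted and must close
        self.dropping = 0    # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so ONMOUSEOVER and
        # onmouseover are checked against the whitelist identically.
        if tag in DROP_CONTENT:
            self.dropping += 1
            return
        if tag not in ALLOWED_TAGS:
            return
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not _is_safe_url(value):
                continue
            kept.append(f' {name}="{escape(value)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.dropping = max(0, self.dropping - 1)
            return
        # Close only tags we opened, unwinding any still-open inner tags.
        if tag in self.open_tags:
            while self.open_tags:
                open_tag = self.open_tags.pop()
                self.out.append(f"</{open_tag}>")
                if open_tag == tag:
                    break

    def handle_data(self, data):
        if not self.dropping:
            self.out.append(escape(data))  # text is always re-escaped

    def result(self):
        self.close()  # flush any buffered trailing text
        closers = "".join(f"</{t}>" for t in reversed(self.open_tags))
        return "".join(self.out) + closers


def sanitize(snippet):
    parser = WhitelistSanitizer()
    parser.feed(snippet)
    return parser.result()
```

Because the output is rebuilt only from tags the sanitizer emits itself, with all text escaped, malformed input like snippets 2 and 3 cannot smuggle a script element through. The worst case is that its text shows up as harmless characters.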
Stored XSS via HTML Attribute
You can also do XSS by injecting a value into an HTML attribute. Inject a script by setting the color value in a profile.
Hint 1
The color is rendered as style='color:color'. Try including a single quote character in your color name.
Hint 2
You can insert an HTML attribute that executes a script.
Exploit and Fixes
To exploit, use the following for your color preference:
red' onload='alert(1)' onmouseover='alert(2)
You may need to move the mouse over the snippet to trigger the attack. This attack works because the first quote ends the style attribute and the second quote starts the value of the onload attribute.
But this attack shouldn't work at all. Take a look at home.gtl where it renders the color. It says style='{{color:text}}', and as we saw earlier, the :text part tells it to escape text. So why doesn't this get escaped?
In gtl.py, it calls cgi.escape(str(value)), which takes an optional second parameter indicating that the value is being used in an HTML attribute. So you can replace this with cgi.escape(str(value), True). Except that doesn't fix it! The problem is that cgi.escape assumes your HTML attributes are enclosed in double quotes, and this file uses single quotes. (This should teach you to always carefully read the documentation for libraries you use and to always test that they do what you want.)
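The single-quote problem is easy to reproduce in Python. cgi.escape was removed from the standard library in Python 3.8. The hypothetical escape_like_old_cgi below stands in for it and mimics cgi.escape(s, True): it escapes &, <, > and double quotes only. The modern html.escape escapes both quote characters by default.

```python
import html


def escape_like_old_cgi(s: str) -> str:
    """Mimic cgi.escape(s, True): escape &, <, > and double quotes only."""
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace('"', "&quot;"))


color = "red' onload='alert(1)' onmouseover='alert(2)"
# The single quotes survive, so the value breaks out of style='...'.
print("style='color:%s'" % escape_like_old_cgi(color))
# html.escape turns each ' into &#x27;, so the value stays inside the attribute.
print("style='color:%s'" % html.escape(color))
```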
You'll note that this attack uses both onload and onmouseover. Even though the W3C specifies that onload events are only supported on body and frameset elements, some browsers support them on other elements. If the victim is using one of those browsers, the attack always succeeds. Otherwise, it succeeds when the user moves the mouse. It's not uncommon for attackers to use multiple attack vectors at the same time.
To fix, we need to use a correct text escaper that escapes single and double quotes too. Add the following function to gtl.py and call it instead of cgi.escape for the text escaper.
def _EscapeTextToHtml(var):
    """Escape HTML metacharacters.

    This function escapes characters that are dangerous to insert into
    HTML. It prevents XSS via quotes or script injected in attribute values.

    It is safer than cgi.escape, which escapes only <, >, & by default.
    cgi.escape can be told to escape double quotes, but it will never
    escape single quotes.
    """
    meta_chars = {
        '"': '&quot;',
        '\'': '&#39;',  # Not &apos;
        '&': '&amp;',
        '<': '&lt;',
        '>': '&gt;',
    }
    # Replace each metacharacter in a single pass; leave other characters as-is.
    escaped_var = "".join(meta_chars.get(ch, ch) for ch in var)
    return escaped_var
Oops! This doesn't completely solve the problem. Even with the above fix in place, the color value is still vulnerable.
Hint 1
Some browsers allow you to include script in stylesheets.
Hint 2
The easiest browser to exploit in this way is Internet Explorer which
supports dynamic CSS properties.
Another Exploit and Fix
Internet Explorer's dynamic CSS properties (aka CSS expressions) make this attack particularly easy.
To exploit, use the following for your color preference:
expression(alert(1))
While other browsers don't support CSS expressions, there are other dangerous CSS properties, such as Mozilla's -moz-binding.
To fix, we need to sanitize the color as a color. The best thing to do would be to add a new output sanitizing form to gtl. That is, we would write {{foo:color}}, which makes sure foo is safe to use as a color. This function can be used to sanitize:
import re

SAFE_COLOR_RE = re.compile(r"^#?[a-zA-Z0-9]*$")

def _SanitizeColor(color):
    """Sanitizes a color, returning 'invalid' if it's invalid.

    A valid value is either the name of a color or # followed by the
    hex code for a color (like #FEFFFF). Returning an invalid value
    allows a style sheet to specify a default value by writing
    'color:default; color:{{foo:color}}'.
    """
    if SAFE_COLOR_RE.match(color):
        return color
    return 'invalid'
Colors aren't the only values we might want to allow users to provide. You should do similar sanitizing for user-provided fonts, sizes, URLs, etc. It's helpful to do input validation, so that you reject an invalid value at the time the user enters it. But only doing input validation would be a mistake. If you find an error in your validation code, or a new browser exposes a new attack vector, you'd have to go back and scrub all previously entered values. Or, you could add the output validation you should have been doing in the first place.
Stored XSS via AJAX
Find an XSS attack that uses a bug in Gruyere's AJAX code. The attack should be triggered when you click the refresh link on the page.
Hint 1
Run curl on http://google-gruyere.appspot.com/123/feed.gtl and look at the result. (Or browse to it in your browser and view source.) You'll see that it includes each user's first snippet in the response. The client side then evaluates this entire response and inserts the snippets into the document. Can you put something in your snippet that will be parsed differently than expected?
Hint 2
Try putting some quotes (") in your snippet.
Exploit and Fixes
To exploit, put this in your snippet:
all <span style=display:none>" + (alert(1),"") + "</span>your base
The JSON should look like
_feed(({..., "Mallory": "snippet", ...}))
but instead looks like this:
_feed(({..., "Mallory": "all <span style=display:none>" + (alert(1),"") + "</span>your base", ...}))
The injected quotes split the value into three separate expressions: the string "all <span style=display:none>", the expression (alert(1),""), and the string "</span>your base". Evaluating the middle expression calls alert(1) and yields an empty string. This exploit is written to be invisible both in the original page rendering (because of the <span style=display:none>) and after refresh (because it inserts only an empty string). All that will appear on the screen is "all your base". There are bugs on both the server and client sides that enable this attack.
First, on the server side, the text is incorrectly escaped when it is rendered in the JSON response. The template says {{snippet.0:html}}, but that's not enough. This text is going to be inserted into the innerHTML of a DOM node, so the HTML does have to be sanitized. However, that sanitized text is then inserted into Javascript, so single and double quotes also have to be escaped. That is, adding support for {{...:js}} to GTL would not be sufficient. We would also need to support something like {{...:html:js}}.
To escape quotes, use \x27 and \x22 for single and double quote respectively. Replacing them with &#39; and &quot; is incorrect: those are not recognized in Javascript strings and will break quotes around HTML attributes.
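Here is a minimal Python sketch of the quote-escaping half of {{...:html:js}}. escape_js_string is a hypothetical helper, not part of GTL. It assumes its input has already been HTML-sanitized and handles only the Javascript string layer. It also escapes backslashes and line breaks. Without that, an input ending in a backslash would escape the closing quote, and a raw newline would break the string literal.

```python
# Hypothetical helper: escape text for use inside a quoted Javascript string.
_JS_STRING_ESCAPES = {
    "\\": "\\\\",   # a trailing backslash must not escape the closing quote
    "'": "\\x27",
    '"': "\\x22",
    "\n": "\\n",
    "\r": "\\r",
}


def escape_js_string(s: str) -> str:
    # Single pass, so the backslashes we add are never escaped a second time.
    return "".join(_JS_STRING_ESCAPES.get(ch, ch) for ch in s)


snippet = 'all <span style=display:none>" + (alert(1),"") + "</span>your base'
print('_feed(({"Mallory": "%s"}))' % escape_js_string(snippet))
```

With the quotes escaped, the whole snippet stays inside one string literal. Evaluating the feed then yields the text rather than calling alert(1).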
Second, in the browser, Gruyere converts the JSON by using Javascript's eval. In general, eval is very dangerous and should rarely be used. If it is used, it must be used very carefully, which is hardly the case here. We should be using a JSON parser, which accepts only data and so ensures that the string does not include any unsafe content. A JSON parser is available at json.org (modern browsers also provide one natively as JSON.parse).
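The difference between eval and a real JSON parser can be shown with Python's json module, standing in for the browser's JSON parser. A strict parser accepts only data, never expressions. The injected (alert(1),"") is therefore rejected, not executed. The sketch assumes the server sends just the data object: the _feed((...)) wrapper is itself code and would have to go. parses_as_json is a hypothetical helper.

```python
import json


def parses_as_json(text: str) -> bool:
    """Return True if `text` is pure JSON data, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical body for Mallory's entry: valid Javascript (it contains a
# `+ (...) +` expression) but not valid JSON.
injected = ('{"Mallory": "all <span style=display:none>" + '
            '(alert(1),"") + "</span>your base"}')
print(parses_as_json(injected))                        # False
print(parses_as_json('{"Mallory": "all your base"}'))  # True
```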
Reflected XSS via AJAX
Find a URL that, when clicked on, will execute a script using one of Gruyere's AJAX features.
Hint 1
When Gruyere refreshes a user snippets page, it uses
http://google-gruyere.appspot.com/123/feed.gtl?uid=value
and the result is the script
_feed((["user", "snippet1", ... ]))
Hint 2
This uses a different vulnerability, but the exploit is very similar
to the previous reflected XSS exploit.
Exploit and Fixes
To exploit, create a URL like the following and get a victim to click on it:
http://google-gruyere.appspot.com/123/feed.gtl?uid=<script>alert(1)</script>
http://google-gruyere.appspot.com/123/feed.gtl?uid=%3Cscript%3Ealert(1)%3C/script%3E
This renders as
_feed((["<script>alert(1)</script>"]))
which, surprisingly, does execute the script. The bug is that Gruyere returns all gtl files as content type text/html, and browsers are very tolerant of what HTML files they accept.
To fix, you need to make sure that your JSON content can never be interpreted as HTML. Even though literal < and > are allowed in Javascript strings, you need to make sure they don't appear literally where a browser can misinterpret them. Thus, you'd need to modify {{...:js}} to replace them with the Javascript escapes \x3c and \x3e. It is always safe to write '\x3c\x3e' in Javascript strings instead of '<>'. (And, as noted above, using the HTML escapes &lt; and &gt; is incorrect.)
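If the feed were produced by serializing data rather than by string templating, the same rule could be applied after serialization. This Python sketch uses json.dumps and then rewrites < and >. The name to_script_safe_json is hypothetical. JSON itself has no \x escapes, so the sketch uses the equivalent \u003c and \u003e forms, which mean the same thing in JSON and in Javascript string literals. JSON only ever produces < and > inside string values, so the replacement cannot change the data's structure.

```python
import json


def to_script_safe_json(value) -> str:
    """Serialize `value` as JSON containing no literal < or > characters.

    JSON has no \\x escapes, so this uses \\u003c and \\u003e, which are
    valid in both JSON and Javascript string literals.
    """
    text = json.dumps(value)
    return text.replace("<", "\\u003c").replace(">", "\\u003e")


body = "_feed((%s))" % to_script_safe_json(["<script>alert(1)</script>"])
print(body)  # _feed((["\u003cscript\u003ealert(1)\u003c/script\u003e"]))
```

A JSON parser decodes the escaped text back to the original string, so the data is unchanged. Only the bytes a browser might sniff as HTML differ.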
You should also always set the content type of your responses, in this case serving JSON results as application/javascript. This alone doesn't solve the problem because browsers don't always respect the content type: browsers sometimes do "sniffing" to try to "fix" results from servers that don't provide the correct content type.
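Setting the header is usually a one-line change. The sketch below uses a WSGI-style handler purely for illustration. The names feed_app and call_app are hypothetical, and Gruyere's own request handling may be organized differently. It serves the feed script as application/javascript.

```python
import json


def feed_app(environ, start_response):
    """Hypothetical WSGI handler serving the feed script with an explicit type."""
    body = ("_feed((%s))" % json.dumps(["user", "snippet1"])).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/javascript"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


def call_app(app, environ=None):
    """Call a WSGI app directly and collect its status, headers, and body."""
    captured = {}

    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)

    body = b"".join(app(environ or {}, start_response))
    return captured["status"], captured["headers"], body


status, headers, body = call_app(feed_app)
print(status, headers["Content-Type"])  # 200 OK application/javascript
```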
But wait, there's more! Gruyere doesn't set the character encoding (charset) either. Some browsers try to guess the encoding of a document, and an attacker may be able to embed content in a document that declares its encoding. For example, if an attacker can trick the browser into thinking a document is UTF-7, then they could embed a script tag as +ADw-script+AD4-, since +ADw- and +AD4- are alternate encodings for < and >. So always set both the content type and the character encoding of your responses, e.g., for HTML:
Content-Type: text/html; charset=utf-8
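The UTF-7 trick can be checked with Python's built-in utf-7 codec, standing in for a browser that has guessed the encoding. The payload bytes are a hypothetical example:

```python
# Hypothetical attacker-controlled bytes: no literal < or > anywhere.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# A naive filter that only looks for angle brackets finds nothing...
print(b"<" in payload or b">" in payload)  # False

# ...but if the browser decides the page is UTF-7, it sees a script tag.
print(payload.decode("utf-7"))  # <script>alert(1)</script>
```

Declaring charset=utf-8 in the Content-Type header removes the guess. The same bytes are then just the literal text +ADw-script+AD4-...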
More about XSS
In addition to the XSS attacks described above, there are quite a few more ways to attack Gruyere with XSS. Collect them all!
XSS is a difficult beast. On one hand, a fix to an XSS vulnerability is usually trivial: apply the correct sanitizing function to user input when it's displayed in a certain context. On the other hand, if history is any indication, this is extremely difficult to get right. US-CERT reports dozens of publicly disclosed XSS vulnerabilities involving multiple companies.
Though there is no magic defense to getting rid of XSS vulnerabilities, here are some steps you should take to prevent these types of bugs from popping up in your products:
- First, make sure you understand the problem.
- Wherever possible, do sanitizing via template features instead of calling escaping functions in source code. This way, all of your escaping is done in one place, and your product can benefit from security technologies designed for template systems that verify their correctness or actually do the escaping for you. Also, familiarize yourself with the other security features of your template system.
- Employ good testing practices with respect to XSS.
- Don't write your own template library :)