Friday, December 29, 2006

A Quick Look at Cross Site Scripting

Here is one you might not have heard of: cross site scripting. With just a bit of JavaScript, a malicious attacker can use it to cause all sorts of problems. To find out more about what it is, and how to prevent your website from becoming a victim, keep reading.


Introduction

The question keeps spinning in our minds, like a ball bouncing around inside the brain: is our website really secure? Surely, that’s a very tough question to answer. But one thing is true in all cases: no website is “completely” safe from attacks. Given the uncontrolled and anonymous nature of the Internet, the concept of a bulletproof website is merely a pipe dream.

More specifically, Web servers are inherently public machines, being accessible by many people around the world, and clearly exposed to several well-known attack techniques. The value of the information stored on servers varies widely, depending on what kind of sites they are hosting, but it’s always appealing to potential attackers. However, there is a lot that we can do about securing our website.

We are well aware of many attack methods which might end up exposing, modifying, or deleting sensitive data, and we have protected our site against them. We have also updated our software accordingly, stopped unnecessary services on the server, closed unused TCP ports, encrypted data, and the like. What else could be vulnerable? One thing is often overlooked or ignored: the assumptions made by developers.

Designers and programmers need to make many assumptions. Hopefully, they will document their assumptions and usually be right. Sometimes, though, developers will make poor assumptions. These might include that input data will be valid, will not include unusual characters, or will be of a fixed length. That brings us almost immediately to the well-known “SQL Injections,” widely documented in several articles on the Web, and to Cross Site Scripting attacks. Here is where this article comes in.

In the rest of the article, I'll cover what Cross Site Scripting is, how it works and how it can be avoided, increasing our site’s security level and, hopefully, bringing an overall improvement to our security strategy.

What is Cross Site Scripting?


To understand what Cross Site Scripting is, let’s look at a typical situation, common to many sites. Let’s say we are taking some information passed in on a querystring (the string after the ? character within a URL), with the purpose of displaying the content of a variable, for example, the visitor’s name:

http://www.yourdomain.com/welcomedir/welcomepage.php?name=John

As we can see in this simple querystring, we are passing the visitor’s name as a parameter in the URL, and then displaying it on our “welcomepage.php” page with the following PHP code:


<?php

echo 'Welcome to our site ' . stripslashes($_GET['name']);

?>

The result of this snippet is shown below:

Welcome to our site John

This is pretty simple and straightforward. We’re displaying the content of the “name” variable, by using the $_GET superglobal PHP array, as we have done probably hundreds of times. Everything seems to be fine. Now, what’s wrong with this code? At first glance, nothing. But let’s modify the querystring by replacing our visitor’s name passed in the URL:

http://www.yourdomain.com/welcomedir/welcomepage.php?name=John

with something like this:

http://www.yourdomain.com/welcomedir/welcomepage.php?name=<script language=javascript>alert('Hey, you are going to be hijacked!');</script>

Do you remember the PHP code included in our “welcomepage.php” page? Yes, you’re correct. When we modify the querystring, the page effectively runs the following code:


<?php

echo 'Welcome to our site ' .
"<script language=javascript>alert('Hey, you are going to be hijacked!');</script>";

?>

The output of this code is a JavaScript alert box telling you “Hey, you are going to be hijacked!” after the “Welcome to our site” phrase.

Very ugly stuff, right? That’s a simple example of the Cross Site Scripting vulnerability. It means that any JavaScript code pasted into the URL will be executed happily, with no complaints at all.
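
To see why this happens, it helps to look at the HTML the server actually sends back. The sketch below wraps the vulnerable echo in a hypothetical helper, renderWelcomeUnsafe() (a name invented for this example, not part of the original page), so the output can be inspected as a plain string:

```php
<?php

// Hypothetical helper that mirrors the vulnerable echo in welcomepage.php:
// the visitor-supplied name is appended to the page without any escaping.
function renderWelcomeUnsafe( $name ) {
    return 'Welcome to our site ' . stripslashes( $name );
}

// The value the attacker places in the "name" parameter.
$payload = "<script language=javascript>alert('Hey, you are going to be hijacked!');</script>";

// The script tag reaches the browser intact, so the browser runs it.
echo renderWelcomeUnsafe( $payload );
```

Since nothing in the helper distinguishes a name from markup, the browser has no way to tell that the script did not come from the site itself.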

Going deeper into JavaScript

Following the same concept described above, we might build a new URL to achieve more dangerous and annoying effects. It’s just a matter of including a little bit of JavaScript.

For instance:

http://www.yourdomain.com/welcomedir/welcomepage.php?name=<script language=javascript>window.location="http://www.evilsite.com";</script>

It’s getting more complex now. As we can see, a JavaScript redirection to “www.evilsite.com” will take place just by entering the above URL in the browser location bar. At first glance, this may not seem so bad. After all, we haven’t seen anything that could significantly harm our website. But is that really true? Let’s present a new example, which might quickly change your mind.

We’ll demonstrate how easy it is to manipulate URLs and inject JavaScript into them for malicious purposes.

For example:

http://www.yourdomain.com/welcomedir/welcomepage.php?name=<script language=javascript>setInterval("window.open('http://www.yourdomain.com/','innerName')",100);</script>

Now, let’s explain in detail what’s going on here. We have inserted JavaScript code to make a request for the http://www.yourdomain.com index page every 100 milliseconds. The setInterval() method takes care of the task, but other JavaScript methods, such as setTimeout() with a recursive implementation, would do the trick too. The code could either heavily overload the Web server where our site is located or generate a Denial of Service condition by denying access to other visitors requesting the same page (or other pages), and inflict noticeable damage on server performance. It would also be harmful to our website’s reputation, simply because other users cannot get access to it. Not very good, huh?

Please note that a similar attack effect might be achieved by manipulating sockets with PHP or any other programming language, but that’s another huge subject, outside the scope of this article. Anyway, keeping a sharp eye out for unusual levels of traffic is a must. So, don’t ever forget to take a look at your site’s log files and use software for monitoring traffic and real-time statistics.

Unfortunately, there are a huge number of ways to attack websites using Cross Site Scripting, embedding JavaScript code into the URL. From relatively innocent and harmless scripts, to risky and harmful code, we have to try to prevent or avoid them.

As if this were not enough, we’ll see another common Cross Site Scripting technique: hiding JavaScript code within links.

The hidden link

Adding JavaScript code to querystrings is quite easy, and the same concept applies to regular links. This is easy to deduce, since all of the previous examples have manipulated absolute links directly from the location bar. Thus, relative and absolute links within documents or email messages can be tampered with too.

An example is useful to properly understand how this technique works:

<a href="http://www.yourdomain.com/welcomedir/welcomepage.php?name=<script language=javascript>window.location='http://www.evilsite.com';</script>">healthy food</a>

If we take a closer look at the code listed above, we can see clearly what’s going on. Within the regular link, the JavaScript code is inserted to redirect users to a completely different site. The expression looks like an innocent link, but it is in fact hiding something else: the JavaScript embedded in it.

We might send out this link to someone else. Our unsuspecting recipient would click the link to find out a little more about healthy food, and would instead be redirected to a different site, getting something he or she would never expect to see.

Our site’s reputation could be seriously damaged, as we can easily imagine, if someone sends our URL, with the JavaScript code embedded in the link, to numerous recipients. That would result in the nasty redirecting effect previously described. And recipients wouldn’t be happy about it at all!

Having presented the most commonly used Cross Site Scripting techniques, we need to find a proper solution to avoid their ugly effects and prevent ourselves from becoming victims of them.

Let’s see how the problem can be solved.

Preventing Cross Site Scripting

First off, we need to follow simple, straightforward rules, applicable to common scenarios where user input is involved.

Always, all the time, and constantly (pick your term), check what’s coming in from POST and GET requests. However obvious this may sound, you should never skip this step.

If a specific type of data is expected, check that it really is of a valid type and that it’s of the expected length. Whatever programming language you’re using will let you do that easily.

Whenever possible, use client-side validation for adding extra functionality to user input checking. Please note that JavaScript validation cannot be used on its own for checking data validity, but it may help to discourage some evil-minded visitors from entering malicious data while providing useful assistance to other well-intended users.

Remove conflicting characters from user input. Search for < and > characters and make sure they're quickly removed. Single and double quotes must be escaped properly too. Many professional websites fail when dealing with character escaping. I hope you won’t.
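
As a rough illustration of the length and character rules above, the sketch below checks the length of a value and then escapes the special characters, rather than removing them, using PHP’s built-in htmlspecialchars() function. The helper name cleanName() and the 32-character limit are assumptions made for this example only:

```php
<?php

// Hypothetical helper: returns the name made safe for HTML output,
// or null when the value is not a string, is empty, or is longer than $max.
function cleanName( $value, $max = 32 ) {
    if ( !is_string( $value ) ) {
        return null;
    }
    $length = strlen( $value );
    if ( $length < 1 || $length > $max ) {
        return null;
    }
    // ENT_QUOTES converts single and double quotes as well as <, > and &.
    return htmlspecialchars( $value, ENT_QUOTES, 'UTF-8' );
}

// The angle brackets are turned into &lt; and &gt; entities,
// so the browser displays them as text instead of treating them as markup.
$name = cleanName( '<b>John</b>' );
echo 'Welcome to our site ' . $name;
```

Escaping keeps the visitor’s text readable while making it harmless; stripping the characters outright, as the rule suggests, is the stricter alternative.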

We might go on endlessly, with numerous tips about validating user data, but you can get a lot more from just checking some other useful tutorials and articles. For the sake of this article, we’ll show an example to prevent Cross Site Scripting using PHP.

Coding for our safety

Let’s define a simple function to prevent the querystring from being tampered with by external code. The “validateQueryString()” function looks like this:


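<?php

// Returns true only if the querystring is made of name=value pairs whose
// names and values are alphanumeric and between $min and $max characters long.
function validateQueryString( $queryString, $min = 1, $max = 32 ) {
    // The D modifier makes $ match only at the very end of the string.
    $pattern = "/^([a-zA-Z0-9]{" . $min . "," . $max . "}=[a-zA-Z0-9]{" . $min . "," . $max . "}&?)+$/D";
    if ( !preg_match( $pattern, $queryString ) ) {
        return false;
    }
    return true;
}

?>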

Once we have defined this function, we call it this way:


<?php

$queryString = $_SERVER['QUERY_STRING'];
if ( !validateQueryString( $queryString ) ) {
    // Invalid querystring: send the visitor to the error page and stop.
    header( 'Location: errorpage.php' );
    exit;
}
else {
    echo 'Welcome to our site!';
}

?>

Let’s break down the code to see it in detail.

The function performs pattern matching on the querystring passed as a parameter, checking to see if it matches the standard format of a querystring, with GET variable names and values that only contain the numbers 0-9 and letters in either lowercase or uppercase. Any other characters will be considered invalid. Also, we have specified as a default that names and values can be from 1 to 32 characters long. If no match is found, the function returns false. Otherwise, it returns true.

Next, we have performed validation on the querystring by calling the function. If it returns false -- that is, the querystring contains invalid characters -- the user will be taken to an error page, or whatever you like to do. If the function returns true, we just display a welcome message.
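
To make the behavior concrete, here is a small sketch that runs the function over a few sample querystrings. The sample values are made up for illustration, and the function is repeated in a compact form so that the snippet runs on its own:

```php
<?php

// Compact copy of validateQueryString() so this snippet is self-contained.
function validateQueryString( $queryString, $min = 1, $max = 32 ) {
    $pattern = "/^([a-zA-Z0-9]{" . $min . "," . $max . "}=[a-zA-Z0-9]{" . $min . "," . $max . "}&?)+$/D";
    return preg_match( $pattern, $queryString ) === 1;
}

// Hypothetical querystrings: two well-formed ones, one carrying a script, one with an empty value.
$samples = array(
    'name=John',
    'name=John&age=30',
    'name=<script>alert(1)</script>',
    'name=',
);

foreach ( $samples as $sample ) {
    echo $sample . ' => ' . ( validateQueryString( $sample ) ? 'accepted' : 'rejected' ) . "\n";
}
```

The first two samples are accepted. The third is rejected because of the angle brackets and parentheses, and the last because the value is shorter than the minimum length.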

Of course, most of the time, we really know what variables to expect, so our validation function can be significantly simplified.

Given the previous URL,

http://www.yourdomain.com/welcomedir/welcomepage.php?name=John

where the “name” variable is expected, we might write the new “validateAlphanum()” function:


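<?php

// Returns true only if $value contains nothing but letters and digits
// and is between $min and $max characters long.
function validateAlphanum( $value, $min = 1, $max = 32 ) {
    // The D modifier makes $ match only at the very end of the string,
    // so a value with a trailing newline is rejected.
    if ( !preg_match( "/^[a-zA-Z0-9]{" . $min . "," . $max . "}$/D", $value ) ) {
        return false;
    }
    return true;
}

?>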

and finally validate the value like this:


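<?php

// Fall back to an empty string when the parameter is missing.
$name = isset( $_GET['name'] ) ? $_GET['name'] : '';
if ( !validateAlphanum( $name ) ) {
    // Invalid name: send the visitor to the error page and stop.
    header( 'Location: errorpage.php' );
    exit;
}
else {
    echo 'Welcome to our site!';
}

?>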

The concept is the same as explained above. The only noticeable difference is that we’re taking in the “name” variable as the parameter for the “validateAlphanum()” function and checking if it contains only the allowed characters 0-9, a-z and A-Z. Anything else will be considered an invalid input.

If you’re a strong advocate of object oriented programming, as I am, you might easily include this function as a new method of an object that performs user data validation. Something similar to this:


<?php

// get variable value
$name = isset( $_GET['name'] ) ? $_GET['name'] : '';
// instantiate new data validator object
$dv = new dataValidator();
// execute validation method
if ( !$dv->validateAlphanum( $name ) ) {
    header( 'Location: errorpage.php' );
    exit;
}
else {
    echo 'Welcome to our site!';
}

?>


Pretty simple, isn’t it?
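
The dataValidator class itself is not shown in this article. A minimal sketch of what it might look like, assuming it simply wraps the validateAlphanum() logic as a method (the class body here is a guess for illustration, and any further validation methods are left out):

```php
<?php

// Hypothetical validator class; only the validateAlphanum() logic comes from the article.
class dataValidator {

    // True when $value is a string of $min to $max letters and digits.
    public function validateAlphanum( $value, $min = 1, $max = 32 ) {
        // Array input such as ?name[]=x is not a string and is rejected outright.
        if ( !is_string( $value ) ) {
            return false;
        }
        $pattern = "/^[a-zA-Z0-9]{" . $min . "," . $max . "}$/D";
        return preg_match( $pattern, $value ) === 1;
    }
}

$dv = new dataValidator();
echo $dv->validateAlphanum( 'John' ) ? 'valid' : 'invalid';
```

Keeping each check in its own method makes it straightforward to add others later, for example one for numeric values, without touching the calling code.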

In order to avoid Cross Site Scripting, several approaches can be taken, whether procedural or object-oriented programming is your personal taste.

In both cases, we’ve developed specific functions to validate querystrings and avoid tampered or unexpected user input data, demonstrating that Cross Site Scripting can be prevented easily with some help coming from our favorite server-side language.

Conclusion

As usual, dealing with user input data is a very sensitive issue, and Cross Site Scripting falls under this category. It is a serious problem that can be avoided with some simple validation techniques, as we have seen throughout this article.

Building robust applications that won’t make poor assumptions about visitors’ input is definitely the correct way to prevent Cross Site Scripting attacks and other harmful techniques. Client environments must always be considered unsafe and unknown territory. So, for the sake of your website’s sanity and yours, keep your eyes open.