Web Application Vulnerability Detection with Code Review
Web application source code, independent of languages and platforms, is a major source of vulnerabilities. One of the CSI surveys on vulnerability distribution suggests that 64% of the time a vulnerability crops up due to programming errors, and 36% of the time due to configuration issues. According to IBM labs, there is a possibility of at least one security issue in every 1,500 lines of code. One of the challenges a security professional faces when assessing and auditing web applications is to identify vulnerabilities while simultaneously performing a source code review.
Several languages are popular for web applications, including Active Server Pages (ASP), PHP, and Java Server Pages (JSP). Every programmer has his own way of implementing and writing objects. Each of these languages has exposed several APIs and directives to make a programmer's life easy. Unfortunately, a programming language cannot offer any guarantee on security. It is the programmer's responsibility to ensure that his own code is secure against various attack vectors, some of which may be malicious in nature. On the other side, it is imperative to get the developed code assessed from a security standpoint, externally or in-house, prior to deploying the code on production systems.
It's impossible to use only one tool to determine vulnerabilities residing in the source code, given the customized nature of applications and the many ways in which programmers can code. Source code review requires a combination of tools and intellectual analysis to determine exposure. The source code may be voluminous, running into thousands or millions of lines in some cases. It is not possible to go through each line of code manually in a short time span. This is where tools come into play. A tool can only help in determining information; it is the intellect--with a security mindset--that must link this information together. This dual approach is the one normally advocated for a source code review.
To demonstrate automated review, I present a sample web application written in ASP.NET. I've produced a sample Python script as a tool for source code analysis. This approach can work to analyze any web application written in any language. It is also possible to write your own tool using any programming language.
I've divided my method for approaching a code review exercise into several logical steps with specific objectives:
Prior to commencing a code review exercise, you must understand the entire architecture and dependencies of the code.
This understanding provides better overview and focus. One of the key objectives of this phase is to determine clear dependencies and to link them to the next phase. Figure 1 shows the overall architecture of a web shop in the case study under review. The application has several dependencies:
With this information in place, you are in a better position to understand the code. To reiterate, the entire application is coded in C# and is hosted on a web server running IIS. This is the target. The next step is to identify entry points to the application.
The objective of this phase is to identify entry points to the web application. A web application can be accessed from various sources (Figure 2). It is important to evaluate every source; each has an associated risk. These entry points provide information to an application. These values hit the database, LDAP servers, processing engines, and other components in the application. If these values are not guarded, they can open up potential vulnerabilities in the application. The relevant entry points are:
These are the important entry points to the application in the case study. It is possible to grab certain key patterns in the submitted data using regular expressions from multiple files to trace and analyze patterns.
scancode.py is a source code-scanning utility. It is a simple Python script that automates the review process. This Python scanner has three functions with specific objectives:
Using the program is easy; it takes several switches to activate the previously described functions.
The scanner script first imports Python's regex module:
Importing this module makes it possible to run regular expressions against the target file:
This line defines a regular expression--in this case, a search for the
Now use scancode.py to scan the details.aspx file for possible entry points in the target code.
Use the
This is the entry point to the application, the place where the code stores
Here is the function that grabs this information from the code:
The code snippet shows the file being opened and the
Discovering entry points narrows the focus for threat mapping and
After locating these entry points to the application, you need to trace them and search for vulnerabilities. The previous scan found a
Running the script with the
This assigned a value from
Here's another iteration; tracing
Finally, this is the end of the trace. This example has shown multiple traces of a single page, but it is possible to traverse multiple pages across the application. Figure 3 shows the complete output.
As the source code and figure show, there is no validation of input in the source. There is a SQL injection vulnerability:
The application accepts
Similarly, another line exposes a cross-site scripting (XSS) vulnerability:
Throwing back the (unvalidated)
The scripts
This code review approach takes minimal effort by detecting entry points, vulnerabilities, and variable tracing.
After you have identified a vulnerability, the next step is to mitigate the threat. There are various ways to do this, depending on your deployment. For example, it's possible to mitigate SQL injection by adding a rule to the web application firewall to block a certain set of characters such as single and double quotes. The best way to mitigate this issue is by applying secure coding practices--providing proper input validation before consuming the variable at the code level. At the SQL level, it is important to use either prepared statements or stored procedures to avoid SQL
Code review is a very powerful tool for detecting vulnerabilities and getting to their actual source. This is the "whitebox" approach. Dependency determination, entry point identification, and threat mapping help detect vulnerabilities. All of these steps need architecture and code reviews.
The nature of code is complex, so no single tool can meet all of your needs. As a professional, you need to write tools on the fly when doing code review and put them into action when the code base is very large. It is not feasible to go through each line of code. In this scenario, one of the methods is to start with entry points, as discussed earlier in this article. You can build complex scripts or programs in any language to grab various patterns in voluminous source code and link them together. Tracing the variable or function is the key that can show up the entire traversal and greatly help in determining vulnerabilities.
http://www.oreillynet.com/pub/a/sysadmin/2006/11/02/webapp_security_scans.html?page=3
Problem Domain
Assumption
Method and Approach
Dependency determination
Figure 1. Architecture for web application [webshop.example.com]
Entry point identification
Figure 2. Web application entry points
HTTP_REFERER, etc). The ASPX application consumes this data through the Request object. During a code review exercise, look for this object's usage.
Scanning the code with Python
The scanfile function scans the entire file for specific security-related regex patterns:
".*.[Rr]equest.*[^\n]\n" # Look for request object calls
".*.select .*?[^\n]\n|.*.SqlCommand.*?[^\n]\n" # Look for SQL execution points
".*.FileStream .*?[^\n]\n|.*.StreamReader.*?[^\n]\n" # Look for file system access
".*.HttpCookie.*?[^\n]\n|.*.session.*?[^\n]\n" # Look for cookie and session information
"" # Look for dependencies in the application
".*.[Rr]esponse.*[^\n]\n" # Look for response object calls
".*.write.*[^\n]\n" # Look for information going back to browser
".*catch.*[^\n]\n" # Look for exception handling
The scan4request function scans the file for entry points to the application using the ASP.NET Request object. Essentially, it runs the pattern ".*.[Rr]equest.*[^\n]\n".
The scan4trace function helps analyze the traversal of a variable in the file. Pass the name of a variable to this function and get the list of lines where it is used. This function is the key to detecting application-level vulnerabilities.
D:\PYTHON\scancode>scancode.py
Cannot parse the option string correctly
Usage:
scancode -
flag -sG : Global match
flag -sR : Entry points
flag -t : Variable tracing
Variable is only needed for -t option
Examples:
scancode.py -sG details.aspx
scancode.py -sR details.aspx
scancode.py -t details.aspx pro_id
D:\PYTHON\scancode>
import re
p = re.compile(".*.[Rr]equest.*[^\n]\n")
Request object. With this regex, the match() method collects all possible instances of regex patterns in the file:
m = p.match(line)
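The line-by-line matching described above can be sketched in Python 3 as follows. This is an illustration rather than the original scancode.py: the `scan_lines` helper, the `PATTERNS` table, and the sample lines are assumptions made for the example. The patterns are simplified from the article's list: the leading `.*.` and the trailing `[^\n]\n` are dropped, because the original form requires at least one character before the keyword and a newline at the end of the line, so a final line without a newline would be skipped.

```python
import re

# Simplified versions of the article's patterns (an assumption, see the lead-in).
# Each label mirrors a heading printed by the scanner's global scan.
PATTERNS = {
    "Request Object Entry": re.compile(r".*[Rr]equest.*"),
    "SQL Object Entry": re.compile(r".*select .*|.*SqlCommand.*"),
    "Response Object Entry": re.compile(r".*[Rr]esponse.*"),
}


def scan_lines(lines, patterns=PATTERNS):
    """Return (label, line_number, text) for every line that matches a pattern."""
    hits = []
    for linenum, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")  # match against the line without its newline
        for label, pattern in patterns.items():
            if pattern.match(text):
                hits.append((label, linenum, text.strip()))
    return hits


# Hypothetical sample resembling the lines reported for details.aspx.
sample = [
    'String pro_id="";\n',
    "NameValueCollection nvc=Request.QueryString;\n",
    'String qry="select * from items where product_id=" + pro_id;\n',
    "response.write(pro_id);",  # last line, no trailing newline
]

for label, linenum, text in scan_lines(sample):
    print(label, "->", linenum, ":", text)
```

Keeping the patterns in a dictionary makes it straightforward to add further categories, such as file access or exception handling, without touching the loop.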
Looking for entry points
-sR switch to identify entry points. Running it on the details.aspx page produces the following results:
D:\PYTHON\scancode>scancode.py -sR details.aspx
Request Object Entry:
22 : NameValueCollection nvc=Request.QueryString;
QueryString information into the NameValue collection set.
def scan4request(file):
    # Compile once; the pattern flags any line that mentions the Request object.
    p = re.compile(".*.[Rr]equest.*[^\n]\n")
    print('Request Object Entry:')
    with open(file, "r") as infile:
        for linenum, line in enumerate(infile, start=1):
            m = p.match(line)
            if m:
                # rstrip drops the newline that the pattern itself captures
                print(linenum, ":", m.group().rstrip())
request object grabbed using a specific regex pattern. This same approach can capture all other entry points. For example, here's a snippet to identify cookie- and session-related entry points:
# Look for cookie and session management
p = re.compile(".*.HttpCookie.*?[^\n]\n|.*.session.*?[^\n]\n")
m = p.match(line)
if m:
    print('Session Object Entry:')
    print(linenum, ":", m.group())
Threat mapping and vulnerability detection
vulnerability detection. An entry point is essential to a trace. It is important to unearth where this variable goes (execution flow) and its impact on the application.
Request object entry in the application:
22 : NameValueCollection nvc=Request.QueryString;
-t option will help to trace the variables. (For full coverage, trace it right through to the end, using all possible iterations.)
D:\PYTHON\scancode>scancode.py -t details.aspx nvc
Tracing variable:nvc
NameValueCollection nvc=Request.QueryString;
String[] arr1=nvc.AllKeys;
String[] sta2=nvc.GetValues(arr1[0]);
nvc to sta2, so that also needs a trace:
D:\PYTHON\scancode>scancode.py -t details.aspx sta2
Tracing variable:sta2
String[] sta2=nvc.GetValues(arr1[0]);
pro_id=sta2[0];
pro_id:
D:\PYTHON\scancode>scancode.py -t details.aspx pro_id
Tracing variable:pro_id
String pro_id="";
pro_id=sta2[0];
String qry="select * from items where product_id=" + pro_id;
response.write(pro_id);
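The three manual iterations above (nvc, then sta2, then pro_id) can be automated. The following Python 3 sketch is not the article's scan4trace function; `trace_variable`, `trace_all`, and the `ASSIGNMENT` heuristic are hypothetical names introduced here. It assumes that data only flows through simple `name = expression` statements, which is enough for the lines shown in the traces but would miss method parameters, properties, and multi-line statements.

```python
import re


def trace_variable(lines, name):
    """Return (line_number, text) for each line that uses `name` as a whole word."""
    word = re.compile(r"\b" + re.escape(name) + r"\b")
    return [(n, line.strip()) for n, line in enumerate(lines, start=1)
            if word.search(line)]


# Hypothetical heuristic: "<identifier> = <expression>", skipping "==".
ASSIGNMENT = re.compile(r"(\w+)\s*=(?!=)\s*(.+)")


def trace_all(lines, start):
    """Follow `start` through simple assignments; return names in discovery order."""
    tainted = [start]
    for name in tainted:  # the list grows while it is being iterated
        word = re.compile(r"\b" + re.escape(name) + r"\b")
        for line in lines:
            m = ASSIGNMENT.search(line)
            if m and word.search(m.group(2)) and m.group(1) not in tainted:
                tainted.append(m.group(1))
    return tainted


# Lines taken from the trace output shown in the article.
source = [
    'String pro_id="";',
    "NameValueCollection nvc=Request.QueryString;",
    "String[] arr1=nvc.AllKeys;",
    "String[] sta2=nvc.GetValues(arr1[0]);",
    "pro_id=sta2[0];",
    'String qry="select * from items where product_id=" + pro_id;',
    "response.write(pro_id);",
]

print(trace_all(source, "nvc"))
for linenum, text in trace_variable(source, "pro_id"):
    print(linenum, ":", text)
```

Starting from `nvc`, the sketch reaches `qry`, the string that is handed to the SQL command, without the reviewer having to rerun the trace for each intermediate variable. The result is a list of candidates to inspect by hand, not a proof that a vulnerability exists.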
Figure 3. Vulnerability detection with tracing
String qry="select * from items where product_id=" + pro_id;
pro_id and passes it as is to the SELECT statement. It is possible to manipulate this statement and inject a SQL payload.
response.write(pro_id);
pro_id to the browser provides a position for an attacker to inject JavaScript to be executed in the victim's browser.
-sG option executes the global search routine. This routine looks for file objects, cookies, exceptions, etc. Each has potential vulnerabilities, and this scan can help you to identify them and map them to the respective threats:
D:\shreeraj_docs\perlCR>scancode.py -sG details.aspx
Dependencies:
13 :
Request Object Entry:
22 : NameValueCollection nvc=Request.QueryString;
SQL Object Entry:
49 : String qry="select * from items where product_id=" + pro_id;
SQL Object Entry:
50 : SqlCommand mycmd=new SqlCommand(qry,conn);
Response Object Entry:
116 : response.write(pro_id);
XSS Check:
116 : response.write(pro_id);
Exception handling:
122 : catch(Exception ex)
Mitigation and Countermeasure
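The two code-level countermeasures this article recommends, prepared statements against SQL injection and filtering of markup characters against XSS, can be illustrated with a short Python 3 sketch. It uses the standard-library sqlite3 and html modules as stand-ins; the case-study application is written in C# against a different database, so the table layout, the `find_items` and `encode_for_html` helpers, and the sample data are assumptions made only for this example.

```python
import html
import sqlite3

# In-memory database standing in for the web shop's items table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (product_id TEXT, name TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [("1", "book"), ("2", "pen")])


def find_items(conn, pro_id):
    """Look up items with the value bound as a parameter, not concatenated."""
    cursor = conn.execute(
        "SELECT product_id, name FROM items WHERE product_id = ?", (pro_id,)
    )
    return cursor.fetchall()


def encode_for_html(value):
    """Convert <, >, & and quotes to HTML entities before echoing a value."""
    return html.escape(value)


print(find_items(conn, "1"))          # a legitimate product id
print(find_items(conn, "1 OR 1=1"))   # the payload is compared as plain data
print(encode_for_html("<script>alert(1)</script>"))
```

Because the driver binds `pro_id` as a value, the injected `OR 1=1` never becomes part of the SQL statement, and the encoded output reaches the browser as text instead of executable markup.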
SELECT statement injection. For mitigation of XSS vulnerabilities, it is imperative to filter out characters such as greater than (>) and less than (<) prior to serving any content to the end-client. These steps provide threat mitigation to the overall web application.
Conclusion