As security consultants, we act as hired guns by our clients to perform black-box security testing of applications. Oftentimes we have to assess the security of applications that use their own proprietary schemes for communication, instead of relying on conventional protocols such as HTTP.
Recently we were faced with a short-term engagement that involved testing the security of a custom-built application that had its own communication protocol, wrapped around a SSL/TLS tunnel.
This post aims to describe the process of how we got around the technical limitations and time constraints imposed by this engagement and managed to successfully test and find bugs in the application via mutation-based fuzzing.
What are proprietary protocols?
Roughly speaking, a proprietary protocol is a protocol that is not defined by an open standard or controlled by an individual or company. In many cases there is no official documentation about their inner workings, making it difficult to communicate with applications that use closed protocols.
Needless to say, protocols vary in the degree of complexity and some are more complicated than others: for instance, a given protocol may be pretty straightforward — based on ASCII characters, easy to understand, and with additional fields that can be easily correlated to other elements of the message.
On the other hand, more complex protocols exist. These may involve different data representations, binary or serialized data, checksums, and value fields that are not obvious at first sight.
Despite being proprietary, some protocols have their specifications documented either by official documentation provided by the vendor (such as FInancial eXchange – FIX, widely used for stock trading) or have been reverse-engineered in the past by enthusiasts and open source advocates — a prime example of this is AOL’s OSCAR protocol for instant messaging.
Whilst reverse engineering of network protocols is possible, it is a cumbersome task that involves high doses of patience, skills, and, most importantly, time. Having all these elements in place is not always feasible, so we needed an alternative to solve the problem we had at hand.
Automated software security testing with fuzzing
Fuzzing is an area that has gained a lot of attention in the past few years and several more advanced approaches have emerged in both academic circles and industry.
Earlier fuzzing techniques can be roughly divided into mutation and generation based, although more advanced and modern approaches exist, such as evolutionary fuzzing, that taps on the feedback of code coverage and generates better inputs with the aid of genetic algorithms, and other techniques that rely on constraint solvers, concolic and symbolic execution, etc.
Mutation-based fuzzing is often referred to as “dumb fuzzing”, as what it does is to perform random mutations of the input and spit out mangled data as result. However, don’t be fooled by its name: dumb fuzzing can be very effective and has claimed responsibility for finding numerous bugs in popular software.
Another popular strategy is known as generation-based fuzzing. This technique involves prior knowledge of the protocol or file format being fuzzed and it generates test cases that mostly comply with the protocol, but with some fields containing data known to cause unexpected behavior, such as large strings, malicious input containing shell metacharacters, negative, very long or subnormal numbers and more.
Benefits and shortcomings
Mutation-based fuzzing has the advantage of being quick to set up and can actually give good results; however, it may not excel at achieving high code coverage (your test samples will play a key role here), or in situations where checksums need to be adjusted before being sent to the application for processing — the application will most likely reject the rest of the input altogether as checksums will fail, not reaching potentially vulnerable areas of the code.
Generation-based is more likely to have greater code coverage, at the expense of being more time-consuming to be created. Also, the creativity of the fuzzing oracle and how clever the mutation and data mangling heuristics are will determine its success at finding bugs.
Our case and the approach we took
We were lucky the protocol we had to deal with, albeit proprietary, was ASCII-based and did not seem to be overly complicated. However, spending time to understand it and create a generation-based fuzzer to spit out test cases was out of question given time limitations. Moreover, why spend precious time writing our own mutation engine when there are very good ones out there?
Also related to the protocol itself: as there was no need to calculate checksums or to comply with other strict protocol structures, we opted to fuzz using a mutation-based approach.
For our engagement, all we had in hand were packet captures (PCAPs) of plaintext traffic between the client and the application we wanted to test.
We took care to make sure we would get several sample PCAPs for different functionalities and usages of the application; this way we can guarantee our fuzzer will stress, with weird test cases, different areas of the available attack surface of the application, increasing code coverage and consequently our chances to find bugs.
As in this case, all traffic between the client and the application was wrapped in TLS/SSL, it was very important to ask for captures of non-encrypted traffic, as to fuzz the application we required application-level communication only, not transport or upper layer traffic.
Additionally, the protocol we had to fuzz worked as some sort of a state machine: after the handshake message (which could be fuzzed too), it would expect a few other messages, in some order, for further processing.
That means that if we sent the handshake and subsequently sent a malformed message when it was expecting, for instance, a login message, we would not be able to fuzz anything past the login, as it was rejected prior.
To get around this we added some more randomness to our fuzzer: only a certain percentage of the payloads within the PCAP would be mutated. This would give us more permutations and the ability to send clean handshake and N-1 messages, but with only the last (N message) fuzzed; or fuzzed handshake and clean messages afterward, or clean handshake and malformed login, and clean further messages, etc.
Tools of the trade
In order to put together a Python script that could perform the task described above, we used:
- Scapy: a library that serves as a swiss-knife for all things related to network and can parse, read, write and replay data from PCAPs.
- radamsa: a popular mutation-based fuzzing tool and weapon of choice of many security researchers.
Technical bits and pieces: putting it together
After outlining the general idea of what we wanted to do, which fuzzing strategy best suits our case and the tools required for the task, it’s now time to do a rough breakdown of the steps we used to perform simple dumb fuzzing of the application:
- Step 0: Define a fuzz factor — example, only 20% of the packets will be mutated
- Step 1: Parse the PCAP looking for client -> server communication only
- Step 2: Out of these captures, find the ones that contain an actual payload, as PCAPs will sometimes contain communication that isn’t relevant for our
purposes, such as TCP 3-way handshakes and other signaling
- Step 3: Enter in an infinite loop to:
- Step 3.1: Extract the payload, and randomly decide if we should fuzz it based on the fuzz factor; if so, pipe it into the mutation engine
- Step 3.2: Get the mutated data
- Step 3.3: Send the mangled payload (or clean payload, depending on if it was fuzzed or not) to the target application
- Step 3.4: Wash, rinse, and repeat. Leave it overnight.
- Step 4: Watch it for crashes and perform crash triaging, if applicable
In fact, step 4 was not within our reach: as the engagement took place from a pure black-box perspective, there was no instrumentation and not even observation for crashes on our part. The client was kind enough to verify for crashes themselves, and our fuzzing script checked connectivity to the target after sending each mangled test case. While it was slow, it gave us some degree of visibility of whether the target application crashed or not.
Later on, during the engagement we were informed by the client that the fuzzer triggered four crashes; three of them were unique. We did not have enough time to perform any sort of crash analysis, but the fuzzed test cases triggered unhandled exceptions and could severely impact the availability of a service that is expected to be always online.
An open-source implementation of the steps outlined above can be found in our Github. It is important to say it serves only to illustrate the ideas discussed in this blog post. The sample code can hardly be used out of the box against actual applications, but with some modification, it may suit different needs.
Even in its simplest form, fuzzing can be a very useful tool for uncovering vulnerabilities and should be in the repertoire of every information security engineer.
For testing applications that use proprietary or non-documented protocols, sample PCAPs of different forms of usage of the target application are very useful to test a larger attack surface and improve the chances to find bugs. A good fuzzing engine that has creative and unconventional heuristics is also necessary to mutate sample test cases that may trigger subtle bugs and reach corner cases buried deep within the code’s logic.
2. AOL’s Oscar Protocol
3. Babysitting an army of monkeys – presentation at CanSecWest 2010 by Charlie Miller