Malware/Software Analysis
From CSRRT-LU
Software analysis
Analyzing software is closely related to reverse engineering.
Description of software
Two major software categories can be defined:
- standalone
- distributed
Each category can be subdivided into:
- Straightforward code: simple machine instructions. Typical examples are COM files on DOS systems.
- Compressed code. Some binaries are compressed: during execution they first decompress data containing the code that is executed afterwards. A typical example is a binary packed with UPX [1] (http://upx.sf.net).
- Polymorphic code
- Code with junk code
- Code included in PE
- Code included in ELF
This project focuses on standalone win32 binaries wrapped in PE.
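As an illustration of what "wrapped in PE" means in practice, the following minimal sketch checks whether a byte string looks like a PE image for 32-bit x86. It relies only on well-known layout facts: the "MZ" magic at offset 0, the PE header offset stored at 0x3C, the "PE\0\0" signature, and the machine type 0x014C for i386. The function name looks_like_x86_pe is hypothetical and not part of this project.

```python
import struct

def looks_like_x86_pe(data: bytes) -> bool:
    """Return True if data looks like a PE image built for 32-bit x86."""
    # DOS header: 'MZ' magic; the offset of the PE header is stored at 0x3C.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # PE signature (4 bytes) followed by the COFF file header (20 bytes).
    if pe_offset + 24 > len(data):
        return False
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return False
    # First COFF field is the machine type; 0x014C is IMAGE_FILE_MACHINE_I386.
    (machine,) = struct.unpack_from("<H", data, pe_offset + 4)
    return machine == 0x014C
```

The check is deliberately shallow. As noted elsewhere on this page, headers can be corrupted, so a real tool would have to tolerate malformed values rather than trust them.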
Behavior of a piece of software
How can one determine what a piece of software is doing? At a given time, a piece of software can modify, create, or remove data. It may also perform some operations only when certain conditions are met.
Techniques to analyze software
- simulate
- straightforward
With simulation, what happens if the initial conditions are not fulfilled? The code is not executed and no behavior is observed. With straightforward analysis, what happens when a piece of software is encoded? It cannot be understood by simply reading the instructions. There are also obstacles to finding the useful code in executable files: headers can be corrupted, and other problems can emerge. Further questions are how to detect whether a binary is compressed or encoded, and whether the software changes.
These are only a few of the questions that arise in software analysis, so analyzing software manually is a tedious, experimental, and error-prone task. Furthermore, binary data is not very human-readable. Many tools help with software analysis, but each tool has its own specialization. To simplify the task, a process can be devised that helps the analyst understand the behavior of a piece of software. To discover a process that can be used in daily work, some software first has to be analyzed manually, discussed, and understood. The different approaches that have been explored are documented on this wiki. Finally, part of the process can be implemented.
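One common heuristic for the question of whether a binary is compressed or encoded is byte entropy: compressed or encrypted data tends to look close to random. The sketch below is only an illustration of that idea, not the method used by this project. The function names and the threshold of 7.0 bits per byte are assumptions.

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    # Sum -p * log2(p) over the relative frequency p of each byte value.
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(data).values())

def looks_packed(data: bytes, threshold: float = 7.0) -> bool:
    """Flag data whose entropy is at or above an assumed cut-off."""
    # The threshold is an illustrative assumption and needs tuning on real samples.
    return shannon_entropy(data) >= threshold
```

High entropy is only a hint. Ordinary files can embed compressed resources, and an encoder can deliberately lower the entropy of its output, so the result should be confirmed by other means.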
Which approach should be taken?
Static analysis and dynamic analysis each have advantages and disadvantages. As described at [2] (http://lida.sf.net), a disassembler can easily be fooled. In that case the disassembler shows instructions that are never executed, and the risk of misinterpretation or information loss is quite high. For packed or encoded binaries, the output and the conclusions drawn from it are completely wrong. Conversely, when code is executed only if an initial condition holds, a plain simulation gives wrong results. The two approaches can therefore be combined: static analysis for detecting the conditions, and simulation for observing the behavior.
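The combination can be pictured with a toy model. In the sketch below, a Python function stands in for the analyzed binary, and the conditions that static analysis would recover are simply listed by hand. All names, the trigger condition, and the recorded effect are hypothetical; the sketch only shows why a plain simulation observes nothing while a simulation driven by the recovered conditions does.

```python
def sample(env: dict, effects: list) -> None:
    """Hypothetical stand-in for a binary whose payload has a trigger."""
    if env.get("day") == 13:  # initial condition guarding the behavior
        effects.append("remove data.txt")

def simulate(program, env: dict) -> list:
    """Run the program in the given environment and record its effects."""
    effects = []
    program(env, effects)
    return effects

# Plain simulation: the condition is not fulfilled, so nothing is observed.
plain = simulate(sample, {"day": 1})

# Combined approach: conditions found by static analysis (listed by hand
# here, as an assumption) are used to set up the simulated environment.
conditions_from_static_analysis = [{"day": 13}]
guided = [effect
          for env in conditions_from_static_analysis
          for effect in simulate(sample, env)]

print(plain)   # []
print(guided)  # ['remove data.txt']
```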

