Automating Pikabot’s String Deobfuscation
Technical Analysis
String obfuscation
The steps for decrypting a Pikabot string are relatively simple. Each string is decrypted only when required (in other words, Pikabot does not decrypt all strings at once). Pikabot follows the steps below to decrypt a string:
- Pushes the encrypted string array onto the stack.
- Initializes the RC4 encryption algorithm. The RC4 key is different for each string (with very few exceptions).
- Takes the RC4-decrypted output, replaces all instances of the ‘_’ (underscore) character with ‘=’ (equals sign), Base64-decodes the result, and decrypts it using the AES-CBC algorithm. The AES key and initialization vector (IV) are the same for all strings.
ANALYST NOTE: Some strings are encrypted only with the RC4 algorithm.
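The decryption chain described above can be sketched in Python as follows. This is a minimal illustration, not Pikabot’s code or our plugin’s: `rc4` is textbook RC4, and `rc4_output_to_aes_ciphertext` (a name chosen for this example) restores the ‘=’ padding and Base64-decodes the RC4 output. The final AES-CBC step is deliberately left out because Python’s standard library has no AES implementation, and the key and data below are made-up values.

```python
import base64


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4: key-scheduling (KSA), then XOR with the PRGA keystream."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def rc4_output_to_aes_ciphertext(rc4_plaintext: bytes) -> bytes:
    """Undo the Base64 layer: '=' padding is stored as '_', so swap it back first."""
    return base64.b64decode(rc4_plaintext.replace(b"_", b"="))


# Round trip with made-up values (not taken from a real sample).
stage1 = rc4(b"example-key", b"SGVsbG8_")  # what would be pushed onto the stack
print(rc4_output_to_aes_ciphertext(rc4(b"example-key", stage1)))  # b'Hello'
```

For strings that are encrypted only with RC4, the output of `rc4` is already the plaintext. For the others, the returned bytes are the AES-CBC ciphertext; with a third-party library such as pycryptodome, the last step would be along the lines of `AES.new(key, AES.MODE_CBC, iv).decrypt(ciphertext)`.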
Figure 1 shows the code used to decrypt the string Kernel32.dll.
Figure 1: Example Pikabot string decryption for Kernel32.dll.
Figure 2 shows the function that first decrypts the AES key and IV. The RC4-decrypted string passed to the function is then Base64-decoded and finally decrypted using AES.
Figure 2: Pikabot Base64 decoding and AES decryption function.
Decrypting Pikabot strings
The following information is required to decrypt a Pikabot string:
- The AES key and IV of a binary sample.
- The RC4 encrypted array of each string.
- The RC4 key of each encrypted string.
- The string’s size.
Our approach relies on IDA’s microcode. This decision helped us with several problems, such as:
- IDA’s microcode converts the assignment/copy of the RC4 key into a strcpy call. At the assembly level, the same copy could be either multiple mov instructions or rep instructions, which would make detection and extraction considerably harder.
- Extracting the RC4 encrypted array. Since IDA reconstructs the stack, it is much easier to search for and extract the encrypted array.
IDA’s microcode has its own limitations (for example, a function may fail to decompile), but we encountered no such issues in the parts of the code we wanted to analyze.
In the sections below, we describe how each component was extracted.
Extracting the AES key/IV
For the extraction of the AES key and IV, we iterate over all analyzed functions and discard any function whose size is not between 600 and 1,600 bytes.
Next, we scan the functions for the following patterns:
- Existence of RC4 encryption. This is the same heuristic we use for detecting encrypted RC4 strings.
- Existence of the values 0x3D (‘=’) and 0x5F (‘_’), which are used before Base64-decoding the string, in instructions with the microcode opcodes m_stx and m_jnz, respectively.
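The function filter above can be expressed as a simple predicate. The sketch below does not use the IDA API: `MicroInsn` and `FunctionInfo` are hypothetical stand-ins for what a plugin would collect from the decompiled microcode, only the opcode names (`m_stx`, `m_jnz`) come from IDA’s microcode, and the RC4 check is assumed to have been computed separately. It also assumes that the 600 to 1,600 byte range is inclusive.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MicroInsn:
    """Hypothetical simplified microcode instruction (not IDA's minsn_t)."""
    opcode: str                # e.g. "m_stx", "m_jnz" (names borrowed from IDA's mcode_t)
    imm: Optional[int] = None  # immediate operand, if the instruction has one


@dataclass
class FunctionInfo:
    """Hypothetical per-function summary gathered from the database."""
    start_ea: int
    size: int
    has_rc4: bool              # outcome of the separate RC4 heuristic (not shown here)
    insns: List[MicroInsn] = field(default_factory=list)


def is_aes_key_iv_candidate(func: FunctionInfo,
                            min_size: int = 600, max_size: int = 1600) -> bool:
    # 1. Size filter: discard functions outside the 600 to 1,600 byte range.
    if not (min_size <= func.size <= max_size):
        return False
    # 2. The function must contain RC4 decryption.
    if not func.has_rc4:
        return False
    # 3. '=' (0x3D) used with m_stx and '_' (0x5F) used with m_jnz.
    pairs = {(insn.opcode, insn.imm) for insn in func.insns}
    return ("m_stx", 0x3D) in pairs and ("m_jnz", 0x5F) in pairs


candidate = FunctionInfo(0x401000, 900, True,
                         [MicroInsn("m_jnz", 0x5F), MicroInsn("m_stx", 0x3D)])
print(is_aes_key_iv_candidate(candidate))  # True
```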
Lastly, if all of the patterns above match, the handler for decrypting a Pikabot string is invoked. For the classification of the key and the IV, we apply the following checks:
- The identified function must yield exactly two decrypted strings. Otherwise, the identified function is incorrect.
- The longer string is marked as the AES key (by taking its first 32 bytes) and the other decrypted string as the IV (by taking its first 16 bytes).
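The two classification checks translate directly into code. A minimal sketch, where `classify_key_iv` is a hypothetical helper that receives the strings decrypted from the candidate function (the example uses placeholder bytes, not real key material):

```python
from typing import List, Tuple


def classify_key_iv(decrypted: List[bytes]) -> Tuple[bytes, bytes]:
    """Split the strings decrypted from the candidate function into AES key and IV."""
    # Check 1: exactly two strings, otherwise this is not the right function.
    if len(decrypted) != 2:
        raise ValueError(f"expected 2 decrypted strings, got {len(decrypted)}")
    # Check 2: the longer string is the key (first 32 bytes), the other is the IV (first 16 bytes).
    longer, shorter = sorted(decrypted, key=len, reverse=True)
    return longer[:32], shorter[:16]


key, iv = classify_key_iv([b"I" * 24, b"K" * 40])  # placeholder data
print(len(key), len(iv))  # 32 16
```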
Extracting the RC4 encrypted array
Pikabot constructs the RC4 encrypted array by pushing it onto the stack and then decrypting it. Our approach involves the following steps for detecting each part of the array:
- Use the detected RC4 encryption block address as a starting point.
- Search for the microcode opcode m_add in the decryption instruction. The detected microcode instruction holds the starting stack offset of the encrypted array.
- Iterate backwards and search for the microcode opcodes m_mov and m_call (the latter is used when the data is copied via a strcpy or memcpy call). If the stack offset matches, we save the data and update the stack offset. This process is repeated until the reconstructed encrypted array has the expected size.
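The backward search can be sketched as follows. `StackWrite` and `rebuild_encrypted_array` are hypothetical: a flattened view of microcode instructions (destination stack offset plus the bytes written) that a real plugin would derive from IDA’s microcode. The sketch restarts the backward scan for each chunk, so chunks are found regardless of the order in which they were written, and gives up if no write lands on the next expected offset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StackWrite:
    """Hypothetical simplified instruction that writes to the stack."""
    opcode: str     # "m_mov" for direct stores, "m_call" for strcpy/memcpy copies
    stack_off: int  # destination stack offset
    data: bytes     # bytes written at that offset


def rebuild_encrypted_array(insns: List[StackWrite], rc4_index: int,
                            start_off: int, expected_size: int) -> Optional[bytes]:
    """Reassemble the array starting at start_off (the offset taken from m_add)."""
    out = bytearray()
    off = start_off
    while len(out) < expected_size:
        # Walk backwards from the RC4 block, looking for the write that lands at `off`.
        for insn in reversed(insns[:rc4_index]):
            if insn.opcode in ("m_mov", "m_call") and insn.stack_off == off and insn.data:
                out += insn.data
                off += len(insn.data)  # the next chunk starts right after this one
                break
        else:
            return None  # no write at this offset: the array cannot be completed
    return bytes(out[:expected_size])


writes = [
    StackWrite("m_mov", 0x20, b"\x01\x02\x03\x04"),
    StackWrite("m_call", 0x24, b"\x05\x06\x07"),  # e.g. a 3-byte memcpy
    StackWrite("m_mov", 0x40, b"junk"),           # unrelated stack variable
    StackWrite("m_mov", 0x27, b"\x08"),
]
print(rebuild_encrypted_array(writes, len(writes), 0x20, 8).hex())  # 0102030405060708
```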
Extracting the RC4 encrypted array size
The length of the encrypted array is extracted in a similar way to the encrypted array itself. The detection pattern is:
- Use the detected RC4 encryption block address as a starting point.
- Search for the microcode opcodes m_jb, m_jae, and m_setb, and use the immediate constant in the instruction as the size.
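A sketch of the size lookup, using the same kind of hypothetical simplified instruction model (not IDA’s API). The search direction (forward from the RC4 block) and the 32-instruction window are assumptions made for this example; the text above only specifies the starting point and the opcodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class MicroInsn:
    """Hypothetical simplified microcode instruction (opcode name plus optional immediate)."""
    opcode: str
    imm: Optional[int] = None


SIZE_OPCODES = {"m_jb", "m_jae", "m_setb"}


def extract_array_size(insns: List[MicroInsn], rc4_index: int,
                       window: int = 32) -> Optional[int]:
    """Return the immediate of the first size comparison found near the RC4 block."""
    # Assumption: the loop-bound comparison follows the RC4 block within `window` instructions.
    for insn in insns[rc4_index:rc4_index + window]:
        if insn.opcode in SIZE_OPCODES and insn.imm is not None:
            return insn.imm
    return None


loop = [MicroInsn("m_add", 1), MicroInsn("m_mov", 0), MicroInsn("m_jb", 0x0D)]
print(extract_array_size(loop, 0))  # 13
```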
Extracting the RC4 key
Extracting the RC4 key of each string proved to be the most challenging part of creating the plugin. In our first attempt, we extracted the RC4 key after detecting the initialization of the RC4 algorithm. However, this approach had the following issues:
- Incorrect extraction of the RC4 key: in many cases, an invalid (junk) string was placed between the correct RC4 key and the RC4 algorithm initialization.
- Incorrect detection of the RC4 initialization code block: for example, if the encrypted array was 256 bytes long, an incorrect RC4 key would be detected.
Instead of trying to locate the RC4 key through the initialization of the RC4 algorithm, we decided to extract all strings from each targeted function. Then, we decrypted the RC4 encrypted array with each extracted candidate key and validated the decrypted output by applying the following checks:
- The output matches the expected string size.
- All characters of the string are readable.
ANALYST NOTE: After a successful decryption, the RC4 key is marked and not reused, in order to limit false positives (for example, a wrong key whose decrypted output happens to contain no junk characters).
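The brute-force key matching, including the rule that a matched key is never reused, can be sketched as below. Assumptions for this example: the candidate keys are the strings already extracted from the function, “readable” means printable ASCII, and the decrypted string is taken to end at the first NUL byte, with that length compared against the expected size. The helper names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def rc4(key: bytes, data: bytes) -> bytes:
    """Textbook RC4 (KSA + PRGA)."""
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    out, i, j = bytearray(), 0, 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def looks_valid(plain: bytes, expected_size: int) -> bool:
    text = plain.split(b"\x00", 1)[0]  # assumption: the string ends at the first NUL
    # Check 1: expected size. Check 2: every character is printable ASCII.
    return len(text) == expected_size and all(0x20 <= b <= 0x7E for b in text)


def match_rc4_keys(arrays: List[Tuple[bytes, int]],
                   candidate_keys: Iterable[bytes]) -> Dict[int, Tuple[bytes, bytes]]:
    """Try each candidate key on each (encrypted array, size) pair; never reuse a matched key."""
    keys = list(candidate_keys)
    used = set()
    results = {}
    for idx, (enc, size) in enumerate(arrays):
        for key in keys:
            if not key or key in used:
                continue
            plain = rc4(key, enc)
            if looks_valid(plain, size):
                results[idx] = (key, plain.split(b"\x00", 1)[0])
                used.add(key)  # marked: not tried again, limiting false positives
                break
    return results


enc1 = rc4(b"alpha", b"Kernel32.dll")
enc2 = rc4(b"beta", b"ntdll.dll")
print(match_rc4_keys([(enc1, 12), (enc2, 9)], [b"alpha", b"beta", b"junk!"]))
# {0: (b'alpha', b'Kernel32.dll'), 1: (b'beta', b'ntdll.dll')}
```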