Disclaimer: Opinions expressed are solely my own. None of the ideas expressed in this blog post are shared, supported, or endorsed in any manner by my employer.

Introduction

Python is a high-level, dynamically typed, portable, interpreted language that is often used for scripting. Python 2 was discontinued with version 2.7.18. At the time of writing, Python 3 is the current line, with version 3.10.5 being the latest release.

When Python source code is executed, it is first compiled to byte code, which is often stored in files with the .pyc extension. If Python is not able to write to the machine, the byte code is generated in memory and discarded when the program exits.
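As a small illustration of the compile step, the following sketch uses the standard py_compile module to produce a .pyc file explicitly. The file name hello.py and the helper name compile_to_pyc are made up for this example.

```python
import importlib.util
import pathlib
import py_compile
import tempfile


def compile_to_pyc(source_text: str) -> bytes:
    """Compile source text to a .pyc in a temp directory and return its bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "hello.py"  # hypothetical file name
        src.write_text(source_text)
        pyc_path = py_compile.compile(
            str(src), cfile=str(src.with_suffix(".pyc")), doraise=True
        )
        return pathlib.Path(pyc_path).read_bytes()


pyc_bytes = compile_to_pyc("print('hello')\n")
# The first four bytes are the magic number of the interpreter that compiled it.
print(pyc_bytes[:4] == importlib.util.MAGIC_NUMBER)
```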

Once the byte code is created, it is executed by the Python Virtual Machine (PVM), which is essentially a big loop that iterates through the byte code instructions and executes them one by one. The PVM is the runtime engine of Python and is always present as part of the Python system.

It is important to note that byte code is a Python-specific representation and is platform-independent. However, byte code instructions can change between Python versions. You can read about byte code instructions here: https://docs.python.org/3/library/dis.html#python-bytecode-instructions
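To see the instructions for yourself, the standard dis module can list the opcodes of any function. The function add below is only a toy example; the exact opcode names printed depend on the interpreter version, which is the point being made above.

```python
import dis
import sys


def add(a, b):
    return a + b


# Collect the opcode names that make up the function body.
opnames = [ins.opname for ins in dis.get_instructions(add)]

# The list differs between versions: for instance, Python 3.10 emits
# BINARY_ADD for the addition, while 3.11 and later emit BINARY_OP.
print(sys.version_info[:2], opnames)
```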

Implementations of Python

A few popular implementations of Python:

  • CPython 
CPython is the default and standard implementation. On most machines, the preinstalled version of Python is CPython. Do not confuse it with Cython, which allows one to write C extensions for Python.
  • Jython
Jython (formerly JPython) is the implementation of Python in Java. Python source code is compiled to Java bytecode, which is then executed by the Java Virtual Machine (JVM). Jython allows programmers to use Java class files along with Python code. Unfortunately, it only supports Python 2.7.
  • IronPython
IronPython is the implementation of Python for the .NET framework. Programmers can use C# along with Python. Python source code is compiled to Intermediate Language (IL), which is then executed by the Common Language Runtime (CLR). Optionally, IronPython can compile to assemblies, which can be saved to disk and used to make binary-only distributions of applications. Currently, it supports Python 2.7; work is under way to support Python 3.
  • PyPy
PyPy is an implementation of Python whose interpreter is written in RPython, a restricted subset of Python. It uses just-in-time compilation, which often makes it faster than the default implementation of Python (CPython).

https://www.python.org/download/alternatives/ contains the list of alternative Python implementations.
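If it is unclear which implementation is running a script, the standard library can report it. This is only a quick check, not something the rest of the post depends on.

```python
import platform
import sys

# Returns a name such as 'CPython', 'PyPy', 'Jython' or 'IronPython'.
impl = platform.python_implementation()

print(impl, platform.python_version())
print(sys.implementation.name)
```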

Creating executables

With the help of third-party tools, it is possible to create executables known as frozen binaries. These files typically bundle the byte code of the program files, the PVM, and the required Python libraries. Because of this, the executables are unusually large.

The following third-party tools can be used: 
  • PyInstaller (the most popular one)
  • py2exe (good for Python 2)
  • py2app (if the target is OS X)
  • bbfreeze
  • cx_Freeze
For more information, refer to this page: https://docs.python-guide.org/shipping/freezing/

Reversing an executable created using PyInstaller

Let's take a look at the sample (MD5: 79abb39081305740a833146200d0228c, SHA256: 4a70b909dbe668d0d2c5241dc582acb90c8820acb436a1ecbb620019e93fbda8 ) available at https://bazaar.abuse.ch/sample/4a70b909dbe668d0d2c5241dc582acb90c8820acb436a1ecbb620019e93fbda8/ 

Identifying an executable created using PyInstaller

The exe file has the PyInstaller icon and a size of 17293300 bytes, which is quite large. Running the strings command shows a few interesting strings:


The output suggests that the file uses python37.dll, crypto libraries, and PyInstaller. The presence of the string python37.dll suggests that it uses the Python 3.7 interpreter.
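The same check can be scripted. The sketch below searches raw bytes for names of the form pythonXY.dll and derives the interpreter version from them. The regular expression, the helper name find_python_dlls, and the synthetic buffer are assumptions made for illustration; the buffer stands in for the real executable's contents.

```python
import re

# Matches names such as python37.dll or python310.dll in raw bytes.
PYTHON_DLL_RE = re.compile(rb"python(\d)(\d{1,2})\.dll", re.IGNORECASE)


def find_python_dlls(data: bytes) -> set:
    """Return (major, minor) pairs for every pythonXY.dll name found in data."""
    return {
        (int(m.group(1)), int(m.group(2))) for m in PYTHON_DLL_RE.finditer(data)
    }


# Synthetic buffer used in place of the sample's bytes.
fake_exe = b"\x00\x01bpython37.dll\x00Crypto\x00pyi-windows-manifest"
versions = find_python_dlls(fake_exe)
print(versions)
```

For a real file, the bytes could be read with pathlib.Path(path).read_bytes() and passed to the same function.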

The section https://pyinstaller.org/en/stable/operating-mode.html#how-the-one-file-program-works states that PyInstaller creates a temporary folder named _MEIxxxxx, where xxxxx is a random number. Indeed, when the program is executed, it creates a temp folder named _MEIxxxxx. Also, in IDA, the following code snippet uses _MEIxxxxx:


This confirms that the exe was created using PyInstaller, which means that it should be possible to recover the Python byte code and, from it, the source code. Refer to: https://pyinstaller.org/en/stable/operating-mode.html#hiding-the-source-code
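A complementary check, not part of the analysis above, is to look for the magic cookie that marks PyInstaller's embedded archive. The cookie value below is the one extraction tools such as pyinstxtractor search for; treat it as an assumption to verify against the PyInstaller version in question. The buffer is synthetic.

```python
# Cookie that marks PyInstaller's embedded archive (assumption: taken from
# extraction tooling, not from the analysis in this post).
PYINSTALLER_MAGIC = b"MEI\x0c\x0b\x0a\x0b\x0e"


def find_pyinstaller_cookie(data: bytes) -> int:
    """Return the offset of the last cookie occurrence, or -1 if absent."""
    return data.rfind(PYINSTALLER_MAGIC)


# Synthetic buffer standing in for an executable with an appended archive.
fake_exe = b"MZ" + b"\x00" * 32 + PYINSTALLER_MAGIC + b"\x00" * 8
offset = find_pyinstaller_cookie(fake_exe)
print("cookie found" if offset != -1 else "cookie not found", offset)
```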

Extracting files from executables

There is a frequently updated GitHub project that can be used to extract the contents of the exe file: https://github.com/extremecoders-re/pyinstxtractor

Upon execution, the script extracts the following items:



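Most of the files are compiled byte code of Python libraries. The folder PYZ-00.pyz_extracted contains compressed and encrypted byte code.

A few words about pyc files

The header of a pyc file differs based on the Python version. The first four bytes are the magic number; the remaining bytes (shown zeroed here) hold metadata such as the timestamp and source size, so the header is 8, 12, or 16 bytes long depending on the version: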


Python 2.7: \x03\xf3\x0d\x0a\0\0\0\0
Python 3.0: \x3b\x0c\x0d\x0a\0\0\0\0
Python 3.1: \x4f\x0c\x0d\x0a\0\0\0\0
Python 3.2: \x6c\x0c\x0d\x0a\0\0\0\0
Python 3.3: \x9e\x0c\x0d\x0a\0\0\0\0\0\0\0\0
Python 3.4: \xee\x0c\x0d\x0a\0\0\0\0\0\0\0\0
Python 3.5: \x17\x0d\x0d\x0a\0\0\0\0\0\0\0\0
Python 3.6: \x33\x0d\x0d\x0a\0\0\0\0\0\0\0\0
Python 3.7: \x42\x0d\x0d\x0a\0\0\0\0\0\0\0\0\0\0\0\0
Python 3.8: \x55\x0d\x0d\x0a\0\0\0\0\0\0\0\0\0\0\0\0
Python 3.9: \x61\x0d\x0d\x0a\0\0\0\0\0\0\0\0\0\0\0\0
Python 3.10: \x6f\x0d\x0d\x0a\0\0\0\0\0\0\0\0\0\0\0\0
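The table above can be turned into a small lookup. In the sketch below, the magic bytes and header lengths are copied from the table; the function names pyc_version and header_size are made up for this example.

```python
# First two header bytes for each version, copied from the table above.
MAGIC_TO_VERSION = {
    b"\x03\xf3": "2.7", b"\x3b\x0c": "3.0", b"\x4f\x0c": "3.1",
    b"\x6c\x0c": "3.2", b"\x9e\x0c": "3.3", b"\xee\x0c": "3.4",
    b"\x17\x0d": "3.5", b"\x33\x0d": "3.6", b"\x42\x0d": "3.7",
    b"\x55\x0d": "3.8", b"\x61\x0d": "3.9", b"\x6f\x0d": "3.10",
}


def pyc_version(header: bytes):
    """Return the Python version for a pyc header, or None if unrecognised."""
    if len(header) < 4 or header[2:4] != b"\r\n":
        return None
    return MAGIC_TO_VERSION.get(header[:2])


def header_size(version: str) -> int:
    """Return the header length in bytes, following the table above."""
    major, minor = (int(part) for part in version.split("."))
    if (major, minor) >= (3, 7):
        return 16
    if (major, minor) >= (3, 3):
        return 12
    return 8


version = pyc_version(b"\x42\x0d\x0d\x0a" + b"\x00" * 12)
print(version, header_size(version))
```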

Compiled byte code is stored in a pyc file using the marshal module. Read more here: https://docs.python.org/3/library/marshal.html

Reading and disassembling .pyc files

A pyc file can be read and disassembled using the dis and marshal modules in Python:
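A minimal sketch of that approach follows. It assumes a 16-byte header (Python 3.7 or later) and that the script runs under the same Python version that produced the pyc, because the marshal format is version-specific. The helper name load_code_from_pyc is made up, and a small pyc image is built in memory so that the example does not need a file on disk.

```python
import dis
import importlib.util
import marshal


def load_code_from_pyc(data: bytes, header_size: int = 16):
    """Skip the header and unmarshal the code object that follows it."""
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("pyc was produced by a different Python version")
    return marshal.loads(data[header_size:])


# Build a pyc image in memory: magic number, 12 zeroed metadata bytes,
# then the marshalled code object.
code = compile("x = 1 + 2\nprint(x)\n", "<demo>", "exec")
pyc_image = importlib.util.MAGIC_NUMBER + b"\x00" * 12 + marshal.dumps(code)

loaded = load_code_from_pyc(pyc_image)
dis.dis(loaded)  # prints the disassembly to stdout
```

For an extracted file, the bytes would instead be read from disk and passed to the same function, using an interpreter that matches the pyc's magic number.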
 

To further investigate the disassembled byte code, refer to the following articles: 

There is a tool called Decompyle++ that can print the disassembled byte code with more details.

Using the pycdas command, disassembled byte code with additional details can be obtained:

 

Decompiling .pyc file

Looking at the files extracted from the sample, the main file is extrack.pyc.

extrack.pyc can be decompiled using any one of the following tools:
  • uncompyle6
  • decompyle3
  • Decompyle++
A few caveats:
  • These tools will not work 100% of the time.
  • At the time of writing, uncompyle6 and decompyle3 do not support Python 3.9 and 3.10; Decompyle++ has partial support.
  • Output from uncompyle6 and decompyle3 will be similar, if not identical, in most cases. However, feel free to try both in case of errors.

Output from Decompyle++

The Decompyle++ project contains pycdas, a Python disassembler, and pycdc, a Python decompiler. Running pycdc on the file extrack.pyc shows that decompilation is incomplete for almost all the functions:

Output from uncompyle6/decompyle3

Using uncompyle6 or decompyle3, the following decompiled code can be obtained, with some decompilation errors:


One important thing to note is the use of a module called passwordstealer. It does not seem to be a standard Python library. PYZ-00.pyz_extracted contains the passwordstealer byte code file, but it is compressed and encrypted.

Extracting compressed and encrypted bytecodes

To decrypt the file, a key is needed, which is stored in the file named pyimod00_crypto_key.pyc. Decompile the file using uncompyle6 to find the key for decryption.

Next, the decryption and decompression routine is required, which is present in the file named pyimod02_archive.pyc. Decompile the file using uncompyle6 to get the routine shown in the following code snippet:
 

The class ZlibArchiveReader has an extract method which, when reimplemented in a Python script, can be used with the key found earlier to obtain the decrypted and decompressed passwordstealer file.
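The shape of such a script can be sketched as follows, under several assumptions that should be checked against the decompiled pyimod02_archive.pyc: that each encrypted entry starts with a 16-byte IV followed by the ciphertext, that the plaintext is zlib-compressed marshalled code, and that the resulting file needs a Python 3.7 header before a decompiler will accept it. The actual AES mode and library depend on the PyInstaller version, so decryption is passed in as a callable. All function names here are hypothetical, and demo_decrypt is a stand-in XOR used only to keep the example self-contained; it is not a real cipher.

```python
import marshal
import zlib
from typing import Callable

CRYPT_BLOCK_SIZE = 16  # assumption: length of the IV prefixed to each entry


def recover_module(blob: bytes, decrypt: Callable[[bytes, bytes], bytes]) -> bytes:
    """Decrypt and decompress one encrypted PYZ entry into marshalled code bytes.

    decrypt(iv, ciphertext) must be supplied by the caller and should wrap the
    AES routine and key recovered from the sample.
    """
    iv, ciphertext = blob[:CRYPT_BLOCK_SIZE], blob[CRYPT_BLOCK_SIZE:]
    return zlib.decompress(decrypt(iv, ciphertext))


def to_pyc(marshalled: bytes, magic: bytes = b"\x42\x0d\x0d\x0a") -> bytes:
    """Prepend a 16-byte header (default magic: Python 3.7) to marshalled code."""
    return magic + b"\x00" * 12 + marshalled


def demo_decrypt(iv: bytes, ciphertext: bytes) -> bytes:
    # Stand-in only: XOR with the first IV byte. NOT the real cipher.
    return bytes(b ^ iv[0] for b in ciphertext)


# Build a fake encrypted entry to exercise the two helpers.
code_bytes = marshal.dumps(compile("secret = 42\n", "<demo>", "exec"))
iv = b"\x5a" * CRYPT_BLOCK_SIZE
fake_entry = iv + bytes(b ^ iv[0] for b in zlib.compress(code_bytes))

recovered = recover_module(fake_entry, demo_decrypt)
pyc_image = to_pyc(recovered)
print(len(pyc_image), recovered == code_bytes)
```

Passing the cipher in as a callable keeps the sketch independent of any particular crypto library, which is useful because the routine differs between PyInstaller releases.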

Finally, using uncompyle6, the decompiled passwordstealer.pyc is obtained:


From reading the decompiled code, it becomes clear that the sample is a stealer malware. 


Final points:

  • This blog post does not cover Nuitka, a source-to-source compiler that compiles Python source code to C. It also has a commercial offering that claims to be effective against reverse engineering. Perhaps executables created using this project will be examined in a future blog post.
  • Similarly, Pyarmor is a project used for creating obfuscated Python scripts, which will be examined in a future blog post.
  • Currently, decompilation tools for Python byte code are not fully mature. Deducing the nature and functionality of the sample from the disassembled byte code is often the most reliable option.

Have a good day!