Saturday, 17 October 2015

In Depth Windows: What is PE File Format?


Windows is used by most computer users out there, but few of them know how it works. Recently, I started digging deeper and deeper into Windows internals for thorough malware analysis. One thing that holds the utmost importance when it comes to digging into Windows is the Portable Executable file format, commonly known as the PE file format, which is derived from the Common Object File Format (COFF).

PE is the native file format of the Windows operating system. Binaries (EXE, DLL, SYS, SCR) and DLL-style files such as BPL, DPL, and CPL use this format, and even NT's kernel-mode drivers use it. It is called "Portable Executable" because the format is universal across all Win32 platforms: the loader on every Win32 platform recognizes it and maps the executable into memory, and the format itself stays the same whether the target CPU is Intel, ARM, or any other architecture.

Knowledge of the PE file format is useful to anyone trying to become a software reverse engineer, a better programmer, or simply someone who wants to know how things work. It is also necessary knowledge for people who write malicious software (those evil people).

As Xeno Kovah put it in his class: "The more you know about forward engineering, the more you know about reverse engineering."

In this guide, I will cover only the things that matter to programmers, reverse engineers, and security people. We will limit our study to PE32 specifically, but the concepts learned here can easily be applied to 64-bit PE32+ binaries as well.

The following free tools can be used to view the PE structure of a Win32 binary:
  • PEView
  • CFF-Explorer
  • FileAlyzer
  • PEStudio
Enough with the introduction; let us move on to the real thing.

Basic PE Structure

The structure of a PE file is the same on disk and in memory: a loaded image keeps the same layout of headers and sections as the file. Make no mistake, though, the file is not copied byte for byte into memory; the OS loader maps each part of the file to where it needs to go. The best way to describe the structure of PE is with a picture, so I'm going to borrow one and put it here (shout-out to Code Breaker's Magazine).

Here is the PE file format of PEView.exe viewed through the tool PEview:


Every PE file starts with the DOS header, which is followed by a DOS stub program. The stub is a valid MS-DOS application that runs whenever the file is executed under DOS and prints the message "This program cannot be run in DOS mode". IMAGE_DOS_HEADER has the following structure:
typedef struct _IMAGE_DOS_HEADER {
     WORD e_magic;        // Magic number: "MZ" (0x5A4D)
     WORD e_cblp;
     WORD e_cp;
     WORD e_crlc;
     WORD e_cparhdr;
     WORD e_minalloc;
     WORD e_maxalloc;
     WORD e_ss;
     WORD e_sp;
     WORD e_csum;
     WORD e_ip;
     WORD e_cs;
     WORD e_lfarlc;
     WORD e_ovno;
     WORD e_res[4];
     WORD e_oemid;
     WORD e_oeminfo;
     WORD e_res2[10];
     LONG e_lfanew;       // File offset of the NT headers (PE header)
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

There are only two entries that are of interest to us and those are:
  1. e_magic
  2. e_lfanew
  • e_magic contains the bytes 4Dh and 5Ah, which read as a little-endian WORD give 0x5A4D. These are the ASCII characters 'MZ', the initials of Mark Zbikowski, one of the MS-DOS developers.
  • e_lfanew is the crucial one because it holds the file offset of the PE header. In other words, it points to the start of the NT Headers.
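To make these two fields concrete, here is a minimal C sketch that checks e_magic and reads e_lfanew from a file that has already been read into a byte buffer. It assumes the buffer starts at the beginning of the file, and it reads fields byte by byte in little-endian order instead of casting to the winnt.h structures, so it also compiles outside Windows. The names read_u16, read_u32, and dos_get_lfanew are hypothetical helpers, not part of any Windows API.

```c
#include <stddef.h>
#include <stdint.h>

#define DOS_MAGIC_MZ    0x5A4D  /* bytes 'M' (0x4D), 'Z' (0x5A) read as a little-endian WORD */
#define DOS_HEADER_SIZE 64      /* sizeof(IMAGE_DOS_HEADER) */
#define E_LFANEW_OFFSET 0x3C    /* offset of e_lfanew inside IMAGE_DOS_HEADER */

/* Hypothetical helper: read a little-endian 16-bit value (PE fields are little-endian). */
static uint16_t read_u16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical helper: read a little-endian 32-bit value. */
static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return e_lfanew (file offset of the NT headers), or -1 if the buffer is
   shorter than a DOS header or does not start with the "MZ" signature. */
static int64_t dos_get_lfanew(const uint8_t *buf, size_t len) {
    if (len < DOS_HEADER_SIZE || read_u16(buf) != DOS_MAGIC_MZ)
        return -1;
    return (int64_t)read_u32(buf + E_LFANEW_OFFSET);
}
```

A return value of -1 means the file is not an MZ image at all; otherwise the result is the offset where the "PE" signature should be found.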


The e_lfanew value in the DOS header points to the start of the NT Headers (IMAGE_NT_HEADERS), which begin with the signature "PE". This structure embeds two more structures (or headers, if you prefer). Without further ado, let us look at the structure of this header:
typedef struct _IMAGE_NT_HEADERS {
     DWORD Signature;                          // "PE\0\0" (0x00004550)
     IMAGE_FILE_HEADER FileHeader;
     IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;


So, what is of interest here? Everything in this structure.

The Signature DWORD holds the value 0x00004550. On disk these are the bytes 0x50 ('P'), 0x45 ('E'), 0x00, 0x00, i.e. the string "PE\0\0"; because the value is stored in little-endian order, reading those four bytes as a DWORD gives 0x00004550.
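Because of the little-endian storage, comparing the DWORD against 0x00004550 is the same as checking for the bytes 'P', 'E', 0, 0. A minimal sketch, assuming the whole file is in a byte buffer and e_lfanew has already been read from the DOS header; has_pe_signature is a hypothetical helper name:

```c
#include <stddef.h>
#include <stdint.h>

#define PE_SIGNATURE 0x00004550u  /* bytes 'P','E',0,0 read as a little-endian DWORD */

/* Hypothetical helper: read a little-endian 32-bit value. */
static uint32_t read_u32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return 1 if "PE\0\0" is found at offset e_lfanew, 0 otherwise. */
static int has_pe_signature(const uint8_t *buf, size_t len, uint32_t e_lfanew) {
    /* The 4-byte signature must lie entirely inside the buffer. */
    if (e_lfanew > len || len - e_lfanew < 4)
        return 0;
    return read_u32(buf + e_lfanew) == PE_SIGNATURE;
}
```

The bounds check matters in practice: malware and fuzzed files sometimes carry a bogus e_lfanew that points past the end of the file.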


The File Header (IMAGE_FILE_HEADER) immediately follows the signature. Its structure looks like this:

typedef struct _IMAGE_FILE_HEADER {
       WORD  Machine;
       WORD  NumberOfSections;
       DWORD TimeDateStamp;
       DWORD PointerToSymbolTable;
       DWORD NumberOfSymbols;
       WORD  SizeOfOptionalHeader;
       WORD  Characteristics;
} IMAGE_FILE_HEADER, *PIMAGE_FILE_HEADER;

Following are the things that we care about from this structure:
  1. Machine
  2. NumberOfSections
  3. TimeDateStamp
  4. Characteristics
  • Machine specifies the CPU architecture this executable is meant to run on. The two values you will most commonly see are:
    • 0x14C  << This value indicates x86 / PE32 / 32 bit
    • 0x8664 << This value indicates x86_64 / PE32+ / 64 bit
  • NumberOfSections contains the total number of sections in the file. We will talk about it again when we get to the sections part.
  • TimeDateStamp specifies when the file was created by the linker, as a Unix timestamp: the number of seconds since 00:00:00 UTC on 1 January 1970 (the epoch). It is pretty handy when you want to find out when a file was built (unless bad guys have modified this field).
  • Characteristics specifies things such as whether the file is an executable, whether it is a DLL, whether it is a system file, and whether it can handle addresses above 2 GB.
    Following are a few of the characteristics that can be set on a PE file:

    • IMAGE_FILE_EXECUTABLE_IMAGE (0x0002)
    • IMAGE_FILE_LARGE_ADDRESS_AWARE (0x0020)
    • IMAGE_FILE_SYSTEM (0x1000)
    • IMAGE_FILE_DLL (0x2000)
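The following sketch decodes the File Header fields discussed above. It assumes p points at the IMAGE_FILE_HEADER, which starts right after the 4-byte signature (e_lfanew + 4), and uses that structure's field offsets (Machine at 0, NumberOfSections at 2, TimeDateStamp at 4, SizeOfOptionalHeader at 16, Characteristics at 18). The helper functions and the file_header_info struct are hypothetical; the flag values are the ones listed above.

```c
#include <stdint.h>
#include <time.h>

#define IMAGE_FILE_MACHINE_I386        0x014C
#define IMAGE_FILE_MACHINE_AMD64       0x8664
#define IMAGE_FILE_EXECUTABLE_IMAGE    0x0002
#define IMAGE_FILE_LARGE_ADDRESS_AWARE 0x0020
#define IMAGE_FILE_SYSTEM              0x1000
#define IMAGE_FILE_DLL                 0x2000

/* Hypothetical holder for the File Header fields this post cares about. */
typedef struct {
    uint16_t machine;
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint16_t size_of_optional_header;
    uint16_t characteristics;
} file_header_info;

/* Little-endian readers (hypothetical helpers). */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* p points at IMAGE_FILE_HEADER, i.e. e_lfanew + 4 (just past "PE\0\0"). */
static file_header_info parse_file_header(const uint8_t *p) {
    file_header_info fh;
    fh.machine                 = rd16(p + 0);
    fh.number_of_sections      = rd16(p + 2);
    fh.time_date_stamp         = rd32(p + 4);
    fh.size_of_optional_header = rd16(p + 16);
    fh.characteristics         = rd16(p + 18);
    return fh;
}

/* Map the two common Machine values to a short name. */
static const char *machine_name(uint16_t machine) {
    switch (machine) {
    case IMAGE_FILE_MACHINE_I386:  return "x86";
    case IMAGE_FILE_MACHINE_AMD64: return "x64";
    default:                       return "other";
    }
}

/* Format TimeDateStamp (seconds since 1970-01-01 00:00:00 UTC) as
   "YYYY-MM-DD HH:MM:SS"; out must hold at least 20 bytes. */
static void format_timestamp(uint32_t stamp, char out[20]) {
    time_t t = (time_t)stamp;
    struct tm *utc = gmtime(&t);
    if (utc == NULL || strftime(out, 20, "%Y-%m-%d %H:%M:%S", utc) == 0)
        out[0] = '\0';
}
```

Characteristics is a bit field, so each flag is tested with a bitwise AND (for example `fh.characteristics & IMAGE_FILE_DLL`) rather than compared for equality.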


Every image file has an optional header that provides information to the loader. Despite its name, it is not at all optional for images: it is "optional" only in the sense that some files, object files to be specific, don't have it. The OS loader very much requires it to load an image file. This header has the following structure:
typedef struct _IMAGE_OPTIONAL_HEADER {
         WORD  Magic;
         BYTE  MajorLinkerVersion;
         BYTE  MinorLinkerVersion;
         DWORD  SizeOfCode;
         DWORD  SizeOfInitializedData;
         DWORD  SizeOfUninitializedData;
         DWORD  AddressOfEntryPoint;
         DWORD  BaseOfCode;
         DWORD  BaseOfData;
         DWORD  ImageBase;
         DWORD  SectionAlignment;
         DWORD  FileAlignment;
         WORD  MajorOperatingSystemVersion;
         WORD  MinorOperatingSystemVersion;
         WORD  MajorImageVersion;
         WORD  MinorImageVersion;
         WORD  MajorSubsystemVersion;
         WORD  MinorSubsystemVersion;
         DWORD  Win32VersionValue;
         DWORD  SizeOfImage;
         DWORD  SizeOfHeaders;
         DWORD  CheckSum;
         WORD  Subsystem;
         WORD  DllCharacteristics;
         DWORD  SizeOfStackReserve;
         DWORD  SizeOfStackCommit;
         DWORD  SizeOfHeapReserve;
         DWORD  SizeOfHeapCommit;
         DWORD  LoaderFlags;
         DWORD  NumberOfRvaAndSizes;
         IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];  // 16 entries
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;


 Things that we care about in this header are:
  1. Magic
  2. AddressOfEntryPoint
  3. ImageBase
  4. SectionAlignment
  5. FileAlignment
  6. SizeOfImage
  7. DllCharacteristics
  8. DataDirectory
  • Magic is what the OS loader really uses to determine whether the binary is PE32 or PE32+. The File Header has a similar field, Machine, but it only serves as a first indication, and the two fields can differ from each other. Magic holds one of these two values:

    • 0x10B = 32 bit, PE32
    • 0x20B = 64 bit, PE32+

  • AddressOfEntryPoint holds an RVA (Relative Virtual Address, i.e. an offset from the address where the image is loaded) of the first instruction to execute. When the OS loader is done loading the binary into memory, execution starts at that address.
  • ImageBase contains the preferred address at which the file wants to be mapped into memory, if that address is available. It matters much less since ASLR came along; if you do not know what ASLR is, feel free to look at another post on my blog covering ASLR.
  • SectionAlignment specifies the alignment of sections in memory: every section starts at a multiple of this value. It is usually set to 0x1000, so we can expect sections to start at 0x1000, then 0x2000, and so on.
  • FileAlignment is the same idea as SectionAlignment, but it applies to the section data on disk: the raw data of each section starts at a file offset that is a multiple of this value (commonly 0x200).
  • SizeOfImage contains the total amount of contiguous memory that the OS loader must reserve to load the binary.
  • DllCharacteristics holds flags that enable security features such as ASLR, DEP, and SEH-related options. A few of these flags are:

    • IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE (0x0040): the image can be relocated at load time (ASLR)
    • IMAGE_DLLCHARACTERISTICS_NX_COMPAT (0x0100): the image is compatible with DEP
    • IMAGE_DLLCHARACTERISTICS_NO_SEH (0x0400): the image does not use structured exception handling
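  • DataDirectory is the final entry in the Optional Header, and it is actually an array of 16 IMAGE_DATA_DIRECTORY structures. Each entry refers to a predefined item, such as the import table or the export table. The structure has 2 members, which hold the location and size of that data:
    typedef struct _IMAGE_DATA_DIRECTORY {
              DWORD VirtualAddress;
              DWORD Size;
    } IMAGE_DATA_DIRECTORY, *PIMAGE_DATA_DIRECTORY;
    • VirtualAddress is the RVA (Relative Virtual Address) of the data structure in question.

    • Size is the size, in bytes, of the structure that VirtualAddress points to.
    The 16 directories to which these structures refer are:

      • 0: Export table (IMAGE_DIRECTORY_ENTRY_EXPORT)
      • 1: Import table (IMAGE_DIRECTORY_ENTRY_IMPORT)
      • 2: Resource table (IMAGE_DIRECTORY_ENTRY_RESOURCE)
      • 3: Exception table (IMAGE_DIRECTORY_ENTRY_EXCEPTION)
      • 4: Certificate (security) table (IMAGE_DIRECTORY_ENTRY_SECURITY)
      • 5: Base relocation table (IMAGE_DIRECTORY_ENTRY_BASERELOC)
      • 6: Debug data (IMAGE_DIRECTORY_ENTRY_DEBUG)
      • 7: Architecture-specific data, reserved (IMAGE_DIRECTORY_ENTRY_ARCHITECTURE)
      • 8: Global pointer (IMAGE_DIRECTORY_ENTRY_GLOBALPTR)
      • 9: TLS table (IMAGE_DIRECTORY_ENTRY_TLS)
      • 10: Load configuration table (IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG)
      • 11: Bound import table (IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT)
      • 12: Import Address Table (IMAGE_DIRECTORY_ENTRY_IAT)
      • 13: Delay import descriptor (IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT)
      • 14: CLR runtime header for .NET (IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR)
      • 15: Reserved
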
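Here is a rough sketch of reading the Optional Header fields discussed above for a PE32 image. It assumes p points at the start of the Optional Header (e_lfanew + 4 + 20) and hard-codes the PE32 field offsets (AddressOfEntryPoint at 16, ImageBase at 28, SectionAlignment at 32, FileAlignment at 36, DllCharacteristics at 70, NumberOfRvaAndSizes at 92, DataDirectory at 96). PE32+ moves ImageBase and the data directories, so the sketch rejects it rather than guessing. It also shows the round-up arithmetic that SectionAlignment and FileAlignment imply. All helper and struct names are hypothetical.

```c
#include <stdint.h>

#define OPT_MAGIC_PE32      0x10B
#define OPT_MAGIC_PE32PLUS  0x20B
#define IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE 0x0040  /* ASLR */
#define IMAGE_DLLCHARACTERISTICS_NX_COMPAT    0x0100  /* DEP */
#define IMAGE_DLLCHARACTERISTICS_NO_SEH       0x0400  /* no SEH handlers */
#define DIR_IMPORT 1  /* index of the import directory in DataDirectory */

/* Little-endian readers (hypothetical helpers). */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Round value up to a multiple of align (align must be a power of two),
   e.g. to see where a section's data ends up given FileAlignment. */
static uint32_t align_up(uint32_t value, uint32_t align) {
    return (value + align - 1) & ~(align - 1);
}

/* Hypothetical holder for the fields discussed above. */
typedef struct {
    uint16_t magic;
    uint32_t entry_point_va;    /* ImageBase + AddressOfEntryPoint (preferred base) */
    uint32_t section_alignment;
    uint32_t file_alignment;
    uint16_t dll_characteristics;
    uint32_t import_rva;        /* DataDirectory[1].VirtualAddress */
    uint32_t import_size;       /* DataDirectory[1].Size */
} opt_info;

/* p points at the Optional Header. Returns 0 on success, -1 if it is not PE32
   (PE32+ images, Magic 0x20B, use a different layout and are not handled). */
static int parse_optional_header32(const uint8_t *p, opt_info *out) {
    out->magic = rd16(p);
    if (out->magic != OPT_MAGIC_PE32)
        return -1;
    out->entry_point_va      = rd32(p + 28) + rd32(p + 16);
    out->section_alignment   = rd32(p + 32);
    out->file_alignment      = rd32(p + 36);
    out->dll_characteristics = rd16(p + 70);
    out->import_rva = out->import_size = 0;
    if (rd32(p + 92) > DIR_IMPORT) {                    /* NumberOfRvaAndSizes */
        const uint8_t *dir = p + 96 + DIR_IMPORT * 8;   /* 8 bytes per entry */
        out->import_rva  = rd32(dir);
        out->import_size = rd32(dir + 4);
    }
    return 0;
}
```

The entry point computed here is relative to the preferred ImageBase; if ASLR relocates the image, the real address is the actual load base plus AddressOfEntryPoint.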
    Let us continue our discussion of the PE file format. Now that we are done with the DOS Header, NT Headers, File Header, and Optional Header, it's time to talk about Section Headers as well.
    So, what are sections?


    Sections in a PE file group together code or data that has similar functionality or the same memory permissions (read/write/execute).

    Some of the common section names found in most PE binaries are:
  • .text   = Code section (in kernel-mode drivers, code here is not paged out to disk).
  • .data   = Read/write section containing global data.
  • .rdata  = Read-only data, such as strings and constants.
  • .bss    = Uninitialized data (usually merged into .data).
  • .idata  = Import information, including the Import Address Table.
  • .edata  = Export information.
  • .pdata  = Exception-handling data (function table entries used for stack unwinding on x64).
  • .reloc  = Base relocations, used to fix up hard-coded addresses when the image is not loaded at its preferred ImageBase.
  • .rsrc   = Resources for the binary, such as icons.
  • PAGE    = In kernel-mode drivers, code/data that is allowed to be paged to disk.

The sections listed above are not necessarily found in every PE binary. Their names are largely irrelevant to the loader, which locates things like the import table through the data directories rather than by section name; the names are there mostly for the programmer's convenience. As we discussed earlier, the total number of sections in a PE file is stored in the File Header's second member, NumberOfSections, but the section names do not have to be the ones above; they can change, and they do.

Look here:

The Section Table / Section Headers

The section table is a sequence of consecutive section headers, i.e. an array of IMAGE_SECTION_HEADER structures. It starts immediately after the Optional Header (whose size is given by SizeOfOptionalHeader in IMAGE_FILE_HEADER), and the number of entries equals NumberOfSections from IMAGE_FILE_HEADER. If a PE file has 6 sections, the table contains 6 of these structures.
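Since the section table follows the Optional Header directly, its file offset is e_lfanew + 4 (signature) + 20 (File Header) + SizeOfOptionalHeader, and each IMAGE_SECTION_HEADER is 40 bytes long. A minimal sketch, assuming the whole file sits in a byte buffer and e_lfanew has already been validated; get_section_header is a hypothetical helper:

```c
#include <stddef.h>
#include <stdint.h>

#define SIZEOF_FILE_HEADER    20
#define SIZEOF_SECTION_HEADER 40

/* Hypothetical helper: read a little-endian 16-bit value. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }

/* Return a pointer to the i-th IMAGE_SECTION_HEADER, or NULL if i is not
   below NumberOfSections or the header would run past the buffer. */
static const uint8_t *get_section_header(const uint8_t *buf, size_t len,
                                         uint32_t e_lfanew, unsigned i) {
    size_t fh_off = (size_t)e_lfanew + 4;          /* skip "PE\0\0" */
    if (fh_off + SIZEOF_FILE_HEADER > len)
        return NULL;
    uint16_t nsec  = rd16(buf + fh_off + 2);       /* NumberOfSections */
    uint16_t optsz = rd16(buf + fh_off + 16);      /* SizeOfOptionalHeader */
    size_t off = fh_off + SIZEOF_FILE_HEADER + optsz
                 + (size_t)i * SIZEOF_SECTION_HEADER;
    if (i >= nsec || off + SIZEOF_SECTION_HEADER > len)
        return NULL;
    return buf + off;
}
```

Using SizeOfOptionalHeader instead of a hard-coded 0xE0 keeps the same code working for PE32+ files, whose Optional Header is larger.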


The structure of this header looks like this:

typedef struct _IMAGE_SECTION_HEADER {
       BYTE   Name[IMAGE_SIZEOF_SHORT_NAME];   // 8 bytes
       union {
                   DWORD  PhysicalAddress;
                   DWORD  VirtualSize;
       } Misc;
       DWORD  VirtualAddress;
       DWORD  SizeOfRawData;
       DWORD  PointerToRawData;
       DWORD  PointerToRelocations;
       DWORD  PointerToLinenumbers;
       WORD     NumberOfRelocations;
       WORD     NumberOfLinenumbers;
       DWORD  Characteristics;
} IMAGE_SECTION_HEADER, *PIMAGE_SECTION_HEADER;


Things that we care about from this structure are:
  1.  Name
  2.  VirtualAddress
  3.  SizeOfRawData
  4.  PointerToRawData
  5.  Characteristics
  • Name is an 8-byte array used as a label, and it can be left blank. It is padded with null bytes, but if the name is exactly 8 characters long there is no terminating null, so it cannot safely be treated as a C string.
  • VirtualAddress is the RVA of the section, i.e. its offset from the ImageBase in the Optional Header: once loaded, the section lives at ImageBase + VirtualAddress (or at the actual load base plus VirtualAddress if the image was relocated).
  • SizeOfRawData is the size of the section's data in the file on disk, rounded up to a multiple of the FileAlignment value we talked about above.
  • PointerToRawData is the file offset, counted from the beginning of the file, where the section's data is stored.
  • Characteristics tells you whether the section is readable, writable, executable, or shareable, and whether it contains code, initialized data, or uninitialized data.
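These fields are what a tool uses to turn an RVA (for example a DataDirectory entry or AddressOfEntryPoint) into a file offset: find the section whose VirtualAddress range contains the RVA, then add the distance into that section to PointerToRawData. The sketch below is a simplified version of that idea, assuming sections points at the first section header in a buffer holding the file. It bounds each section by SizeOfRawData only and ignores edge cases such as RVAs inside the headers or malformed sizes, which real parsers handle. The function names are hypothetical; the IMAGE_SCN_MEM_* values are the standard winnt.h flags for the Characteristics field.

```c
#include <stdint.h>
#include <string.h>

#define SIZEOF_SECTION_HEADER 40
#define IMAGE_SCN_MEM_EXECUTE 0x20000000u
#define IMAGE_SCN_MEM_READ    0x40000000u
#define IMAGE_SCN_MEM_WRITE   0x80000000u

/* Hypothetical helper: read a little-endian 32-bit value. */
static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Copy the 8-byte Name into a NUL-terminated string; the field is
   NUL-padded but has no terminator when the name is exactly 8 chars. */
static void section_name(const uint8_t *sh, char out[9]) {
    memcpy(out, sh, 8);
    out[8] = '\0';
}

/* Map an RVA to a file offset using the section headers.
   Returns -1 if no section's raw data contains the RVA. */
static int64_t rva_to_offset(const uint8_t *sections, unsigned nsec, uint32_t rva) {
    for (unsigned i = 0; i < nsec; i++) {
        const uint8_t *sh = sections + (size_t)i * SIZEOF_SECTION_HEADER;
        uint32_t va       = rd32(sh + 12);  /* VirtualAddress */
        uint32_t raw_size = rd32(sh + 16);  /* SizeOfRawData */
        uint32_t raw_ptr  = rd32(sh + 20);  /* PointerToRawData */
        if (rva >= va && rva - va < raw_size)
            return (int64_t)raw_ptr + (rva - va);
    }
    return -1;
}
```

For a .text section at VirtualAddress 0x1000 whose raw data starts at file offset 0x400, the RVA 0x1010 maps to file offset 0x410; an RVA past the end of a section's raw data (for example uninitialized data) has no file offset at all.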

I think this covers the basics of the PE format. I have not covered everything in detail, but this should get you started. Head over to Microsoft's MSDN for further documentation on each of the headers.
