x86 Disassembly/Windows Executable Files

< X86 Disassembly

MS-DOS COM Files

COM files are loaded into RAM exactly as they appear; no change is made at all from the harddisk image to RAM. This is possible due to the segmented memory model of the early x86 line. Two 16-bit registers determine the actual address used for a memory access, a “segment” register specifying a 64K byte window into the 1M+64K byte space (in 16-byte increments) and an “offset” specifying an offset into that window. The segment register would be set by DOS and the COM file would be expected to respect this setting and not ever change the segment registers. The offset registers, however, were fair game and served (for COM files) the same purpose as a modern 32-bit register. The downside was that the offset registers were only 16-bit and, therefore, since COM files could not change the segment registers, COM files were limited to using 64K of RAM. The good thing about this approach, however, was that no extra work was needed by DOS to load and run a COM file: just load the file, set the segment register, and jump to it. (The programs could perform 'near' jumps by just giving an offset to jump to.)

COM files are loaded into RAM at offset $100. The space before that would be used for passing data to and from DOS (for example, the contents of the command line used to invoke the program).

Note that COM files, by definition, cannot be 32-bit. Windows provides support for COM files via a special CPU mode.

MS-DOS EXE Files

One way MS-DOS compilers got around the 64K memory limitation was with the introduction of memory models. The basic concept is to cleverly set different segment registers in the x86 CPU (CS, DS, ES, SS) to point to the same or different segments, thus allowing varying degrees of access to memory. Typical memory models were:

tiny
All memory accesses are 16-bit (segment registers unchanged). Produces a .COM file instead of an .EXE file.
small
All memory accesses are 16-bit (segment registers unchanged).
compact
Data addresses include both segment and offset, reloading the DS or ES registers on access and allowing up to 1M of data. Code accesses don't change the CS register, allowing 64K of code.
medium
Code addresses include the segment address, reloading CS on access and allowing up to 1M of code. Data accesses don't change the DS and ES registers, allowing 64K of data.
large
Both code and data addresses are (segment, offset) pairs, always reloading the segment addresses. The whole 1M byte memory space is available for both code and data.
huge
Same as the large model, with additional arithmetic being generated by the compiler to allow access to arrays larger than 64K.

When looking at an EXE file, one has to decide which memory model was used to build that file.

PE Files

A Portable Executable (PE) file is the standard binary file format for an Executable or DLL under Windows NT, Windows 95, and Win32. The Win32 SDK contains a file, winnt.h, which declares various structs and variables used in the PE files. Some functions for manipulating PE files are also included in imagehlp.dll. PE files are broken down into various sections which can be examined.

Relative Virtual Addressing (RVA)

In a Windows environment, executable modules can be loaded at any point in memory, and are expected to run without problem. To allow multiple programs to be loaded at seemingly random locations in memory, PE files have adopted a tool called RVA: Relative Virtual Addresses. RVAs assume that the "base address" of where a module is loaded into memory is not known at compile time. So, PE files describe the location of data in memory as an offset from the base address, wherever that may be in memory.

Some processor instructions require the code itself to directly identify where in memory some data is. This is not possible when the location of the module in memory is not known at compile time. The solution to this problem is described in the section on "Relocations".

It is important to remember that the addresses obtained from a disassembly of a module will not always match up to the addresses seen in a debugger as the program is running.

File Format

The PE portable executable file format includes a number of informational headers, and is arranged in the following format:

The basic format of a Microsoft PE file

MS-DOS header

Open any Win32 binary executable in a hex editor, and note what you see: The first 2 letters are always the letters "MZ", the initials of Mark Zbikowski, who created the first linker for DOS. To some people, the first few bytes in a file that determine the type of file are called the "magic number," although this book will not use that term, because there is no rule that states that the "magic number" needs to be a single number. Instead, we will use the term "File ID Tag", or simply, File ID. Sometimes this is also known as File Signature.

After the File ID, the hex editor will show several bytes of either random-looking symbols, or whitespace, before the human-readable string "This program cannot be run in DOS mode".

What is this?

Hex Listing of an MS-DOS file header

What you are looking at is the MS-DOS header of the Win32 PE file. To ensure either a) backwards compatibility, or b) graceful decline of new file types, Microsoft has written a series of machine instructions(an example program is listed below the DOS header structure) into the head of each PE file. When a 32-bit Windows file is run in a 16-bit DOS environment, the program will display the error message: "This program cannot be run in DOS mode.", then terminate.

The DOS header is also known by some as the EXE header. Here is the DOS header presented as a C data structure:

 struct DOS_Header 
 {
// short is 2 bytes, long is 4 bytes
     char signature[2] = "MZ";
     short lastsize;
     short nblocks;
     short nreloc;
     short hdrsize;
     short minalloc;
     short maxalloc;
     void *ss;
     void *sp;
     short checksum;
     void *ip;
     void *cs;
     short relocpos;
     short noverlay;
     short reserved1[4];
     short oem_id;
     short oem_info;
     short reserved2[10];
     long  e_lfanew;
 }

After the DOS header there is a stub program mentioned in the paragraph above the DOS header structure. Listed below is a commented example of that program, it was taken from a program compiled with GCC.

;# Using NASM with Intel syntax

push cs ;# Push CS onto the stack
pop ds  ;# Set DS to CS

mov dx,message ; point to our message "This program cannot be run in DOS mode.", 0x0d, 0x0d, 0x0a, '$'
mov ah, 09  
int 0x21 ;# when AH = 9, DOS interrupt to write a string

;# terminate the program
mov ax,0x4c01 
int 0x21

message db "This program cannot be run in DOS mode.", 0x0d, 0x0d, 0x0a, '$'

PE Header

At offset 60 (0x3C) from the beginning of the DOS header is a pointer to the Portable Executable (PE) File header (e_lfanew in MZ structure). DOS will print the error message and terminate, but Windows will follow this pointer to the next batch of information.

Hex Listing of a PE signature, and the pointer to it

The PE header consists only of a File ID signature, with the value "PE\0\0" where each '\0' character is an ASCII NUL character. This signature shows that a) this file is a legitimate PE file, and b) the byte order of the file. Byte order will not be considered in this chapter, and all PE files are assumed to be in "little endian" format.

The first big chunk of information lies in the COFF header, directly after the PE signature.

COFF Header

The COFF header is present in both COFF object files (before they are linked) and in PE files where it is known as the "File header". The COFF header has some information that is useful to an executable, and some information that is more useful to an object file.

Here is the COFF header, presented as a C data structure:

 struct COFFHeader
 {
    short Machine;
    short NumberOfSections;
    long TimeDateStamp;
    long PointerToSymbolTable;
    long NumberOfSymbols;
    short SizeOfOptionalHeader;
    short Characteristics;
 }
Machine 
This field determines what machine the file was compiled for. A hex value of 0x14C (332 in decimal) is the code for an Intel 80386.

Here's a list of possible values it can have.

Value Description
0x14c Intel 386
0x8664 x64
0x162 MIPS R3000
0x168 MIPS R10000
0x169 MIPS little endian WCI v2
0x183 old Alpha AXP
0x184 Alpha AXP
0x1a2 Hitachi SH3
0x1a3 Hitachi SH3 DSP
0x1a6 Hitachi SH4
0x1a8 Hitachi SH5
0x1c0 ARM little endian
0x1c2 Thumb
0x1d3 Matsushita AM33
0x1f0 PowerPC little endian
0x1f1 PowerPC with floating point support
0x200 Intel IA64
0x266 MIPS16
0x268 Motorola 68000 series
0x284 Alpha AXP 64-bit
0x366 MIPS with FPU
0x466 MIPS16 with FPU
0xebc EFI Byte Code
0x8664 AMD AMD64
0x9041 Mitsubishi M32R little endian
0xc0ee clr pure MSIL
NumberOfSections 
The number of sections that are described at the end of the PE headers.
TimeDateStamp 
32 bit time at which this header was generated: is used in the process of "Binding", see below.
SizeOfOptionalHeader 
this field shows how long the "PE Optional Header" is that follows the COFF header.
Characteristics 
This is a field of bit flags, that show some characteristics of the file.
Value Description
0x02 Executable file
0x200 file is non-relocatable (addresses are absolute, not RVA)
0x2000 File is a DLL Library, not an EXE

PE Optional Header

The "PE Optional Header" is not "optional" per se, because it is required in Executable files, but not in COFF object files. The Optional header includes lots and lots of information that can be used to pick apart the file structure, and obtain some useful information about it.

The PE Optional Header occurs directly after the COFF header, and some sources even show the two headers as being part of the same structure. This wikibook separates them out for convenience.

Here is the PE Optional Header presented as a C data structure:

 struct PEOptHeader
 {
/*
char is 1 byte
short is 2 bytes
long is 4 bytes
*/
    short signature; //decimal number 267 for 32 bit, 523 for 64 bit, and 263 for a ROM image. 
    char MajorLinkerVersion; 
    char MinorLinkerVersion;
    long SizeOfCode;
    long SizeOfInitializedData;
    long SizeOfUninitializedData;
    long AddressOfEntryPoint;  //The RVA of the code entry point
    long BaseOfCode;
    long BaseOfData;
    /*The next 21 fields are an extension to the COFF optional header format*/
    long ImageBase;
    long SectionAlignment;
    long FileAlignment;
    short MajorOSVersion;
    short MinorOSVersion;
    short MajorImageVersion;
    short MinorImageVersion;
    short MajorSubsystemVersion;
    short MinorSubsystemVersion;
    long Win32VersionValue;
    long SizeOfImage;
    long SizeOfHeaders;
    long Checksum;
    short Subsystem;
    short DLLCharacteristics;
    long SizeOfStackReserve;
    long SizeOfStackCommit;
    long SizeOfHeapReserve;
    long SizeOfHeapCommit;
    long LoaderFlags;
    long NumberOfRvaAndSizes;
    data_directory DataDirectory[NumberOfRvaAndSizes];     //Can have any number of elements, matching the number in NumberOfRvaAndSizes.
 }                                        //However, it is always 16 in PE files.
/*
long is 4 bytes
*/
 struct data_directory
 { 
    long VirtualAddress;
    long Size;
 }
Signature 
Contains a signature that identifies the image.
Constant Name Value Description
IMAGE_NT_OPTIONAL_HDR32_MAGIC 0x10b 32 bit executable image.
IMAGE_NT_OPTIONAL_HDR64_MAGIC 0x20b 64 bit executable image
IMAGE_ROM_OPTIONAL_HDR_MAGIC 0x107 ROM image
MajorLinkerVersion 
The major version number of the linker.
MinorLinkerVersion 
The minor version number of the linker.
SizeOfCode 
The size of the code section, in bytes, or the sum of all such sections if there are multiple code sections.
SizeOfInitializedData 
The size of the initialized data section, in bytes, or the sum of all such sections if there are multiple initialized data sections.
SizeOfUninitializedData 
The size of the uninitialized data section, in bytes, or the sum of all such sections if there are multiple uninitialized data sections.
AddressOfEntryPoint 
A pointer to the entry point function, relative to the image base address. For executable files, this is the starting address. For device drivers, this is the address of the initialization function. The entry point function is optional for DLLs. When no entry point is present, this member is zero.
BaseOfCode 
A pointer to the beginning of the code section, relative to the image base.
BaseOfData 
A pointer to the beginning of the data section, relative to the image base.
ImageBase 
The preferred address of the first byte of the image when it is loaded in memory. This value is a multiple of 64K bytes. The default value for DLLs is 0x10000000. The default value for applications is 0x00400000, except on Windows CE where it is 0x00010000.
SectionAlignment 
The alignment of sections loaded in memory, in bytes. This value must be greater than or equal to the FileAlignment member. The default value is the page size for the system.
FileAlignment 
The alignment of the raw data of sections in the image file, in bytes. The value should be a power of 2 between 512 and 64K (inclusive). The default is 512. If the SectionAlignment member is less than the system page size, this member must be the same as SectionAlignment.
MajorOSVersion 
The major version number of the required operating system.
MinorOSVersion 
The minor version number of the required operating system.
MajorImageVersion 
The major version number of the image.
MinorImageVersion 
The minor version number of the image.
MajorSubsystemVersion 
The major version number of the subsystem.
MinorSubsystemVersion 
The minor version number of the subsystem.
Win32VersionValue 
This member is reserved and must be 0.
SizeOfImage 
The size of the image, in bytes, including all headers. Must be a multiple of SectionAlignment.
SizeOfHeaders 
The combined size of the following items, rounded to a multiple of the value specified in the FileAlignment member.
CheckSum 
The image file checksum. The following files are validated at load time: all drivers, any DLL loaded at boot time, and any DLL loaded into a critical system process.
Subsystem 
The Subsystem that will be invoked to run the executable
Constant Name Value Description
IMAGE_SUBSYSTEM_UNKNOWN 0 Unknown subsystem
IMAGE_SUBSYSTEM_NATIVE 1 No subsystem required (device drivers and native system processes)
IMAGE_SUBSYSTEM_WINDOWS_GUI 2 Windows graphical user interface (GUI) subsystem
IMAGE_SUBSYSTEM_WINDOWS_CUI 3 Windows character-mode user interface (CUI) subsystem
IMAGE_SUBSYSTEM_OS2_CUI 5 OS/2 CUI subsystem
IMAGE_SUBSYSTEM_POSIX_CUI 7 POSIX CUI subsystem
IMAGE_SUBSYSTEM_WINDOWS_CE_GUI 9 Windows CE system
IMAGE_SUBSYSTEM_EFI_APPLICATION 10 Extensible Firmware Interface (EFI) application
IMAGE_SUBSYSTEM_EFI_BOOT_SERVICE_DRIVER 11 EFI driver with boot services
IMAGE_SUBSYSTEM_EFI_RUNTIME_DRIVER 12 EFI driver with run-time services
IMAGE_SUBSYSTEM_EFI_ROM 13 EFI ROM image
IMAGE_SUBSYSTEM_XBOX 14 Xbox system
IMAGE_SUBSYSTEM_WINDOWS_BOOT_APPLICATION 16 Boot application
DLLCharacteristics 
The DLL characteristics of the image
Constant Name Value Description
No constant name 0x0001 Reserved
No constant name 0x0002 Reserved
No constant name 0x0004 Reserved
No constant name 0x0008 Reserved
IMAGE_DLLCHARACTERISTICS_DYNAMIC_BASE 0x0040 The DLL can be relocated at load time
IMAGE_DLLCHARACTERISTICS_FORCE_INTEGRITY 0x0080 Code integrity checks are forced
IMAGE_DLLCHARACTERISTICS_NX_COMPAT 0x0100 The image is compatible with data execution prevention (DEP)
IMAGE_DLLCHARACTERISTICS_NO_ISOLATION 0x0200 The image is isolation aware, but should not be isolated
IMAGE_DLLCHARACTERISTICS_NO_SEH 0x0400 The image does not use structured exception handling (SEH). No handlers can be called in this image
IMAGE_DLLCHARACTERISTICS_NO_BIND 0x0800 Do not bind the image
No constant name 0x1000 Reserved
IMAGE_DLLCHARACTERISTICS_WDM_DRIVER 0x2000 A WDM driver
No constant name 0x4000 Reserved
IMAGE_DLLCHARACTERISTICS_TERMINAL_SERVER_AWARE 0x8000 The image is terminal server aware
SizeOfStackReserve 
The number of bytes to reserve for the stack. Only the memory specified by the SizeOfStackCommit member is committed at load time; the rest is made available one page at a time until this reserve size is reached.
SizeOfStackCommit 
The number of bytes to commit for the stack.
SizeOfHeapReserve 
The number of bytes to reserve for the local heap. Only the memory specified by the SizeOfHeapCommit member is committed at load time; the rest is made available one page at a time until this reserve size is reached.
SizeOfHeapCommit 
The number of bytes to commit for the local heap.
LoaderFlags 
This member is obsolete.
NumberOfRvaAndSizes 
The number of directory entries in the remainder of the optional header. Each entry describes a location and size.
DataDirectory 
Possibly the most interesting member of this structure. Provides RVAs and sizes which locate various data structures, which are used for setting up the execution environment of a module. The details of what these structures do exist in other sections of this page. The most interesting entries in DataDirectory are as follows, Export Directory, Import Directory, Resource Directory, and the Bound Import directory. Note that the offsets in bytes are relative to the beginning of the optional header.
Constant Name Value Description Offset PE(32 bit) Offset PE32+(64 bit)
IMAGE_DIRECTORY_ENTRY_EXPORT 0 Export Directory 96 112
IMAGE_DIRECTORY_ENTRY_IMPORT 1 Import Directory 104 120
IMAGE_DIRECTORY_ENTRY_RESOURCE 2 Resource Directory 112 128
IMAGE_DIRECTORY_ENTRY_EXCEPTION 3 Exception Directory 120 136
IMAGE_DIRECTORY_ENTRY_SECURITY 4 Security Directory 128 144
IMAGE_DIRECTORY_ENTRY_BASERELOC 5 Base Relocation Table 136 152
IMAGE_DIRECTORY_ENTRY_DEBUG 6 Debug Directory 144 160
IMAGE_DIRECTORY_ENTRY_ARCHITECTURE 7 Architecture specific data 152 168
IMAGE_DIRECTORY_ENTRY_GLOBALPTR 8 Global pointer register relative virtual address 160 176
IMAGE_DIRECTORY_ENTRY_TLS 9 Thread Local Storage directory 168 184
IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG 10 Load Configuration directory 176 192
IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT 11 Bound Import directory 184 200
IMAGE_DIRECTORY_ENTRY_IAT 12 Import Address Table 192 208
IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT 13 Delay Import table 200 216
IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR 14 COM descriptor table 208 224
No constant name 15 Reserved 216 232

Code Sections

The PE Header defines the number of sections in the executable file. Each section definition is 40 bytes in length. Below is an example hex from a program I am writing:

2E746578_74000000_00100000_00100000_A8050000 .text
00040000_00000000_00000000_00000000_20000000
2E646174_61000000_00100000_00200000_86050000 .data
000A0000_00000000_00000000_00000000_40000000
2E627373_00000000_00200000_00300000_00000000 .bss
00000000_00000000_00000000_00000000_80000000

The structure of the section descriptor is as follows:

Offset Length   Purpose
------ -------  ------------------------------------------------------------------
 0x00   8 bytes Section Name - in the above example the names are .text .data .bss
 0x08   4 bytes Size of the section once it is loaded to memory
 0x0C   4 bytes RVA (location) of section once it is loaded to memory
 0x10   4 bytes Physical size of section on disk
 0x14   4 bytes Physical location of section on disk (from start of disk image)
 0x18  12 bytes Reserved (usually zero) (used in object formats)
 0x24   4 bytes Section flags

A PE loader will place the sections of the executable image at the locations specified by these section descriptors (relative to the base address) and usually the alignment is 0x1000, which matches the size of pages on the x86.

Common sections are:

  1. .text/.code/CODE/TEXT - Contains executable code (machine instructions)
  2. .textbss/TEXTBSS - Present if Incremental Linking is enabled
  3. .data/.idata/DATA/IDATA - Contains initialised data
  4. .bss/BSS - Contains uninitialised data

Section Flags

The section flags is a 32-bit bit field (each bit in the value represents a certain thing). Here are the constants defined in the WINNT.H file for the meaning of the flags:

#define IMAGE_SCN_TYPE_NO_PAD                0x00000008  // Reserved.
#define IMAGE_SCN_CNT_CODE                   0x00000020  // Section contains code.
#define IMAGE_SCN_CNT_INITIALIZED_DATA       0x00000040  // Section contains initialized data.
#define IMAGE_SCN_CNT_UNINITIALIZED_DATA     0x00000080  // Section contains uninitialized data.
#define IMAGE_SCN_LNK_OTHER                  0x00000100  // Reserved.
#define IMAGE_SCN_LNK_INFO                   0x00000200  // Section contains comments or some  other type of information.
#define IMAGE_SCN_LNK_REMOVE                 0x00000800  // Section contents will not become part of image.
#define IMAGE_SCN_LNK_COMDAT                 0x00001000  // Section contents comdat.
#define IMAGE_SCN_NO_DEFER_SPEC_EXC          0x00004000  // Reset speculative exceptions handling bits in the TLB entries for this section.
#define IMAGE_SCN_GPREL                      0x00008000  // Section content can be accessed relative to GP
#define IMAGE_SCN_MEM_FARDATA                0x00008000
#define IMAGE_SCN_MEM_PURGEABLE              0x00020000
#define IMAGE_SCN_MEM_16BIT                  0x00020000
#define IMAGE_SCN_MEM_LOCKED                 0x00040000
#define IMAGE_SCN_MEM_PRELOAD                0x00080000
#define IMAGE_SCN_ALIGN_1BYTES               0x00100000  //
#define IMAGE_SCN_ALIGN_2BYTES               0x00200000  //
#define IMAGE_SCN_ALIGN_4BYTES               0x00300000  //
#define IMAGE_SCN_ALIGN_8BYTES               0x00400000  //
#define IMAGE_SCN_ALIGN_16BYTES              0x00500000  // Default alignment if no others are specified.
#define IMAGE_SCN_ALIGN_32BYTES              0x00600000  //
#define IMAGE_SCN_ALIGN_64BYTES              0x00700000  //
#define IMAGE_SCN_ALIGN_128BYTES             0x00800000  //
#define IMAGE_SCN_ALIGN_256BYTES             0x00900000  //
#define IMAGE_SCN_ALIGN_512BYTES             0x00A00000  //
#define IMAGE_SCN_ALIGN_1024BYTES            0x00B00000  //
#define IMAGE_SCN_ALIGN_2048BYTES            0x00C00000  //
#define IMAGE_SCN_ALIGN_4096BYTES            0x00D00000  //
#define IMAGE_SCN_ALIGN_8192BYTES            0x00E00000  //
#define IMAGE_SCN_ALIGN_MASK                 0x00F00000
#define IMAGE_SCN_LNK_NRELOC_OVFL            0x01000000  // Section contains extended relocations.
#define IMAGE_SCN_MEM_DISCARDABLE            0x02000000  // Section can be discarded.
#define IMAGE_SCN_MEM_NOT_CACHED             0x04000000  // Section is not cachable.
#define IMAGE_SCN_MEM_NOT_PAGED              0x08000000  // Section is not pageable.
#define IMAGE_SCN_MEM_SHARED                 0x10000000  // Section is shareable.
#define IMAGE_SCN_MEM_EXECUTE                0x20000000  // Section is executable.
#define IMAGE_SCN_MEM_READ                   0x40000000  // Section is readable.
#define IMAGE_SCN_MEM_WRITE                  0x80000000  // Section is writeable.

Imports and Exports - Linking to other modules

What is linking?

Whenever a developer writes a program, there are a number of subroutines and functions which are expected to be implemented already, saving the writer the hassle of having to write out more code or work with complex data structures. Instead, the coder need only declare one call to the subroutine, and the linker will decide what happens next.

There are two types of linking that can be used: static and dynamic. Static uses a library of precompiled functions. This precompiled code can be inserted into the final executable to implement a function, saving the programmer a lot of time. In contrast, dynamic linking allows subroutine code to reside in a different file (or module), which is loaded at runtime by the operating system. This is also known as a "Dynamically linked library", or DLL. A library is a module containing a series of functions or values that can be exported. This is different from the term executable, which imports things from libraries to do what it wants. From here on, "module" means any file of PE format, and a "Library" is any module which exports and imports functions and values.

Dynamically linking has the following benefits:

This section discusses how this is achieved using the PE file format. An important point to note at this point is that anything can be imported or exported between modules, including variables as well as subroutines.

Loading

The downside of dynamically linking modules together is that, at runtime, the software which is initialising an executable must link these modules together. For various reasons, you cannot declare that "The function in this dynamic library will always exist in memory here". If that memory address is unavailable or the library is updated, the function will no longer exist there, and the application trying to use it will break. Instead, each module (library or executable) must declare what functions or values it exports to other modules, and also what it wishes to import from other modules.

As said above, a module cannot declare where in memory it expects a function or value to be. Instead, it declares where in its own memory it expects to find a pointer to the value it wishes to import. This permits the module to address any imported value, wherever it turns up in memory.

Exports

Exports are functions and values in one module that have been declared to be shared with other modules. This is done through the use of the "Export Directory", which is used to translate between the name of an export (or "Ordinal", see below), and a location in memory where the code or data can be found. The start of the export directory is identified by the IMAGE_DIRECTORY_ENTRY_EXPORT entry of the resource directory. All export data must exist in the same section. The directory is headed by the following structure:

struct IMAGE_EXPORT_DIRECTORY {
	long Characteristics;
	long TimeDateStamp;
	short MajorVersion;
	short MinorVersion;
	long Name;
	long Base;
	long NumberOfFunctions;
	long NumberOfNames;
	long *AddressOfFunctions;
	long *AddressOfNames;
	long *AddressOfNameOrdinals;
}

The "Characteristics" value is generally unused, TimeDateStamp describes the time the export directory was generated, MajorVersion and MinorVersion should describe the version details of the directory, but their nature is undefined. These values have little or no impact on the actual exports themselves. The "Name" value is an RVA to a zero terminated ASCII string, the name of this library name, or module.

Names and Ordinals

Each exported value has both a name and an "ordinal" (a kind of index). The actual exports themselves are described through AddressOfFunctions, which is an RVA to an array of RVAs, each pointing to a different function or value to be exported. The size of this array is in the value NumberOfFunctions. Each of these functions has an ordinal. The "Base" value is used as the ordinal of the first export, and the next RVA in the array is Base+1, and so forth.

Each entry in the AddressOfFunctions array is identified by a name, found through the RVA AddressOfNames. The data where AddressOfNames points to is an array of RVAs, of the size NumberOfNames. Each RVA points to a zero terminated ASCII string, each being the name of an export. There is also a second array, pointed to by the RVA in AddressOfNameOrdinals. This is also of size NumberOfNames, but each value is a 16 bit word, each value being an ordinal. These two arrays are parallel and are used to get an export value from AddressOfFunctions. To find an export by name, search the AddressOfNames array for the correct string and then take the corresponding value from the AddressOfNameOrdinals array. This value is then used as index to AddressOfFunctions (yes, it's 0-based index actually, NOT base-biased ordinal, as the official documentation suggests!).

Forwarding

As well as being able to export functions and values in a module, the export directory can forward an export to another library. This allows more flexibility when re-organising libraries: perhaps some functionality has branched into another module. If so, an export can be forwarded to that library, instead of messy reorganising inside the original module.

Forwarding is achieved by making an RVA in the AddressOfFunctions array point into the section which contains the export directory, something that normal exports should not do. At that location, there should be a zero terminated ASCII string of format "LibraryName.ExportName" for the appropriate place to forward this export to.

Imports

The other half of dynamic linking is importing functions and values into an executable or other module. Before runtime, compilers and linkers do not know where in memory a value that needs to be imported could exist. The import table solves this by creating an array of pointers at runtime, each one pointing to the memory location of an imported value. This array of pointers exists inside of the module at a defined RVA location. In this way, the linker can use addresses inside of the module to access values outside of it.

The Import directory

The start of the import directory is pointed to by both the IMAGE_DIRECTORY_ENTRY_IAT and IMAGE_DIRECTORY_ENTRY_IMPORT entries of the resource directory (the reason for this is uncertain). At that location, there is an array of IMAGE_IMPORT_DESCRIPTOR structures. Each of these identify a library or module that has a value we need to import. The array continues until an entry where all the values are zero. The structure is as follows:

struct IMAGE_IMPORT_DESCRIPTOR {
	long *OriginalFirstThunk;
	long TimeDateStamp;
	long ForwarderChain;
	long Name;
	long *FirstThunk;
}

The TimeDateStamp is relevant to the act of "Binding", see below. The Name value is an RVA to an ASCII string, naming the library to import. ForwarderChain will be explained later. The only thing of interest at this point, are the RVAs OriginalFirstThunk and FirstThunk. Both these values point to arrays of RVAs, each of which point to a IMAGE_IMPORT_BY_NAMES struct. The arrays are terminated with an entry that is equal to zero. These two arrays are parallel and point to the same structure, in the same order. The reason for this will become apparent shortly.

Each of these IMAGE_IMPORT_BY_NAMES structs has the following form:

struct IMAGE_IMPORT_BY_NAME {
	short Hint;
	char Name[1];
}

"Name" is an ASCII string of any size that names the value to be imported. This is used when looking up a value in the export directory (see above) through the AddressOfNames array. The "Hint" value is an index into the AddressOfNames array; to save searching for a string, the loader first checks the AddressOfNames entry corresponding to "Hint".


To summarise: The import table consists of a large array of IMAGE_IMPORT_DESCRIPTORs, terminated by an all-zero entry. These descriptors identify a library to import things from. There are then two parallel RVA arrays, each pointing at IMAGE_IMPORT_BY_NAME structures, which identify a specific value to be imported.

Imports at runtime

Using the above import directory at runtime, the loader finds the appropriate modules, loads them into memory, and seeks the correct export. However, to be able to use the export, a pointer to it must be stored somewhere in the importing module's memory. This is why there are two parallel arrays, OriginalFirstThunk and FirstThunk, identifying IMAGE_IMPORT_BY_NAME structures. Once an imported value has been resolved, the pointer to it is stored in the FirstThunk array. It can then be used at runtime to address imported values.

Bound imports

The PE file format also supports a peculiar feature known as "binding". The process of loading and resolving import addresses can be time consuming, and in some situations this is to be avoided. If a developer is fairly certain that a library is not going to be updated or changed, then the addresses in memory of imported values will not change each time the application is loaded. So, the import address can be precomputed and stored in the FirstThunk array before runtime, allowing the loader to skip resolving the imports - the imports are "bound" to a particular memory location. However, if the versions numbers between modules do not match, or the imported library needs to be relocated, the loader will assume the bound addresses are invalid, and resolve the imports anyway.

The "TimeDateStamp" member of the import directory entry for a module controls binding; if it is set to zero, then the import directory is not bound. If it is non-zero, then it is bound to another module. However, the TimeDateStamp in the import table must match the TimeDateStamp in the bound module's FileHeader, otherwise the bound values will be discarded by the loader.

Forwarding and binding

Binding can of course be a problem if the bound library / module forwards its exports to another module. In these cases, the non-forwarded imports can be bound, but the values which get forwarded must be identified so the loader can resolve them. This is done through the ForwarderChain member of the import descriptor. The value of "ForwarderChain" is an index into the FirstThunk and OriginalFirstThunk arrays. The OriginalFirstThunk for that index identifies the IMAGE_IMPORT_BY_NAME structure for a import that needs to be resolved, and the FirstThunk for that index is the index of another entry that needs to be resolved. This continues until the FirstThunk value is -1, indicating no more forwarded values to import.

Resources

Resource structures

Resources are data items in modules which are difficult to be stored or described using the chosen programming language. This requires a separate compiler or resource builder, allowing insertion of dialog boxes, icons, menus, images, and other types of resources, including arbitrary binary data. A number of API calls can then be used to retrieve resources from the module. The base of resource data is pointed to by the IMAGE_DIRECTORY_ENTRY_RESOURCE entry of the data directory, and at that location there is an IMAGE_RESOURCE_DIRECTORY structure:

struct IMAGE_RESOURCE_DIRECTORY
{
	long Characteristics;
	long TimeDateStamp;
	short MajorVersion;
	short MinorVersion;
	short NumberOfNamedEntries;
	short NumberOfIdEntries;
}

Characteristics is unused, and TimeDateStamp is normally the time of creation, although it doesn't matter if it's set or not. MajorVersion and MinorVersion relate to the versioning info of the resources: the fields have no defined values. Immediately following the IMAGE_RESOURCE_DIRECTORY structure is a series of IMAGE_RESOURCE_DIRECTORY_ENTRYs, the number of which are defined by the total of NumberOfNamedEntries and NumberOfIdEntries. The first portion of these entries are for named resources, the latter for ID resources, depending on the values in the IMAGE_RESOURCE_DIRECTORY struct. The actual shape of the resource entry structure is as follows:

struct IMAGE_RESOURCE_DIRECTORY_ENTRY
{
	long NameId;
	long *Data;
}

The NameId value has dual purpose: if the most significant bit (or sign bit) is clear, then the lower 16 bits are an ID number of the resource. Alternatly, if the top bit is set, then the lower 31 bits make up an offset from the start of the resource data to the name string of this particular resource. The Data value also has a dual purpose: if the most significant bit is set, the remaining 31 bits form an offset from the start of the resource data to another IMAGE_RESOURCE_DIRECTORY (i.e. this entry is an interior node of the resource tree). Otherwise, this is a leaf node, and Data contains the offset from the start of the resource data to a structure which describes the specifics of the resource data itself (which can be considered to be an ordered stream of bytes):

struct IMAGE_RESOURCE_DATA_ENTRY
{
        long *Data;
        long Size;
        long CodePage;
        long Reserved;
}

The Data value contains an RVA to the actual resource data, Size is self-explanatory, and CodePage contains the Unicode codepage to be used for decoding Unicode-encoded strings in the resource (if any). Reserved should be set to 0.

Layout

The above system of resource directory and entries allows simple storage of resources, by name or ID number. However, this can get very complicated very quickly. Different types of resources, the resources themselves, and instances of resources in other languages can become muddled in just one directory of resources. For this reason, the resource directory has been given a structure to work by, allowing separation of the different resources.

For this purpose, the "Data" value of resource entries points at another IMAGE_RESOURCE_DIRECTORY structure, forming a tree-diagram like organisation of resources. The first level of resource entries identifies the type of the resource: cursors, bitmaps, icons and similar. They use the ID method of identifying the resource entries, of which there are twelve defined values in total. More user defined resource types can be added. Each of these resource entries points at a resource directory, naming the actual resources themselves. These can be of any name or value. These point at yet another resource directory, which uses ID numbers to distinguish languages, allowing different specific resources for systems using a different language. Finally, the entries in the language directory actually provide the offset to the resource data itself, the format of which is not defined by the PE specification and can be treated as an arbitrary stream of bytes.

Windows DLL Files

Windows DLL files are a brand of PE file with a few key differences:

DLLs may be loaded in one of two ways, a) at load-time, or b) by calling the LoadModule() Win32 API function.

Function Exports

Functions are exported from a DLL file by using the following syntax:

 __declspec(dllexport) void MyFunction() ...

The "__declspec" keyword here is not a C language standard, but is implemented by many compilers to set extendable, compiler-specific options for functions and variables. Microsoft C Compiler and GCC versions that run on windows allow for the __declspec keyword, and the dllexport property.

Functions may also be exported from regular .exe files, and .exe files with exported functions may be called dynamically in a similar manner to .dll files. This is a rare occurrence, however.

Identifying DLL Exports

There are several ways to determine which functions are exported by a DLL. The method that this book will use (often implicitly) is to use dumpbin in the following manner:

dumpbin /EXPORTS <dll file>

This will post a list of the function exports, along with their ordinal and RVA to the console.

Function Imports

In a similar manner to function exports, a program may import a function from an external DLL file. The dll file will load into the process memory when the program is started, and the function will be used like a local function. DLL imports need to be prototyped in the following manner, for the compiler and linker to recognize that the function is coming from an external library:

 __declspec(dllimport) void MyFunction();

Identifying DLL Imports

If is often useful to determine which functions are imported from external libraries when examining a program. To list import files to the console, use dumpbin in the following manner:

dumpbin /IMPORTS <dll file>

You can also use depends.exe to list imported and exported functions. Depends is a a GUI tool and comes with Microsoft Platform SDK.

This article is issued from Wikibooks. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.