Sunday, May 3, 2015

Easy anti-trace with Syscall instruction

When syscall/sysret were introduced with the AMD64 architecture it addressed a few needed improvements. Firstly, sysenter did not save a return address, to which 32 bit Windows always returned to a predefined location. Secondly, IF was cleared on a sysenter to allow an uninterrupted trap frame saving, but TF was not. Much like how a CPL 3 -> CPL 0 call gate transfer with TF set would introduce an INT 01 originating from privilege level 0, sysenter had the same issue.

Now contrary to some of the nonsense I've seen online, Windows does not use call gates. Therefore in 32 bit Windows the only instance this could happen was on a sysenter. Nonetheless however, Windows needed a hack for user-mode single-stepping through sysenter, so the INT 01 handler has a check to see if the instruction pointer on the interrupt stack is that of KiFastCallEntry. Secondly, KiServiceExit will check to see if TF is set in the frame and then does an IRET to the return address, with TF set. A trap is raised after the next instruction following the IRET, this is why if you step over sysenter in 32 bit Windows, it appears to skip an instruction though it really does not.

When syscall came along, this issue was also fixed by adding a flag mask which could be used to mask off the trap flag on a system call entry. RFlags is saved in the r11 register, and then reloaded again by sysret. In fact, sysret is special, if TF is set in r11, sysret will raise a #DB after its instruction boundary unlike loading an RFlags image from the stack with TF set using IRET. This makes for a smooth single-step operation over a syscall instruction.

The magic happens because r11 is volatile in this case and shows the state of RFlags after sysret branches back to user-mode.

Pretty simple eh?

mov eax, 0ffffffffh    //invalid syscall
test r11d, 0100h
jnz single_step_detected

Monday, November 11, 2013

Simple debugger detection with GetProcessIoCounters

This higher level API is provided to application developers in order to count IO transactions for a process, or a job object (group of processes). Even with such an innocent face, it can easily be used to determine if the process has an active debug port.

The IO_COUNTERS structure, which is filled as a result of the call, tells us operation counts, and byte transfer counts. If you don't already know, it's pretty simple:

Read operation count - Pending or completed IO with NtReadFile

Write operation count - Pending or completed IO with NtWriteFile

Other operation count - IO with NtCreateFile/NtDeviceIoControlFile/NtFsControlFile (not limited to these, the list goes on NtCancelIoFileEx, NtQueryDirectoryFile, etc).

When a debugger is attached and it's target calls NtMapViewOfSection (hint, mapping a dll image) for a section object that is an image, it will queue a debug event. Included in this debug event, is a file handle to the image, the debugger thread waiting on the port then calls ObDuplicateObject to provide a file handle as part of it's debug message to the application.

In Peter Ferrie's anti-debug paper, he describes how to deduce that a debugger is attached due to the debugger end not closing it's duplicated handle thereby preventing exclusive access to the file.

This method however is not based off whether the debugger code forgets to close the handle, or uses it (either way preventing exclusive file access) but instead will work regardless, even if the debugger does not use the file handle and closes it upon reception. This is because the initial handle is opened within the context of the target via NtOpenFile (therefore increasing OtherOperationCount by 1), and although closed before NtMapViewOfSection returns, the fact that it incremented means the process has a debug port to dispatch messages to. Otherwise NtOpenFile would never be called, and the other operation count would not increment.

So detection can be as a simple as:


//store other operation count somewhere

MapViewOfFile();    //remember, only builds a debug event if it's an image


//check otheroperationcount, if incremented, asplode.

Sunday, October 20, 2013

Tricky and powerful anti-tracing mechanisms with BTF and LBR

If you haven't already read this, you probably should. It covers the fundamentals of what will be discussed here. That way, I can assume you already know what is going on and I don't have to cover all the miniscule details in this post :)

Back already eh?

Simply setting the trap flag with an iret/popf variant has always been a common technique to thwart single-stepping. There are also API's to offer similar functionality, we wont cover them today because that isn't really the scope here.

One of the most common is something similar to this:

or word ptr [sp], 0100h
xor eax, eax
xor ebx, ebx

As you know,  when the boundary of xor eax, eax is reached, we will have an int 01 trap with a saved IP of whatever follows it. Again as you should hopefully know, this is common method to trick a debugger that is already single stepping this sequence into thinking that it caused the exception and to continue right along. Now any debugger worth its weight in (bytes? gold? plugins?), or a user who isn't just auto-tracing and looking manually, should catch this.

There are a few plugins already for various debuggers that check the trap flag status prior to popf/iret/syscall/ints and attempt to act accordingly, like resuming the trace operations at KiUserExceptionDispatcher.

Now lets look at this sequence again, but imagine that BTF is enabled.

or word ptr [sp], 0100h
xor eax, eax
xor ebx, ebx

 Now lets just assume for a minute that no debugger is attached. Execution will continue right along after popf/popfd and no trap will be recognized. This as you know is because even though TF is set, we haven't hit a taken branch. Thus no trap. We could then modify our sequence a bit into something like this:

or word ptr [sp], 0100h
xor eax, eax
xor ebx, ebx

jmp 02h
xor eax, eax

The trap will occur after the boundary of the unconditional jump is reached. The application can then handle accordingly.

Now lets throw OllyDbg into the mix and step through this sequence. You will notice how Olly will single step normally normally over the sequence. Olly will mask off Dr7.BTF after debug event, even if it passes the event back to user code. This means the following situations could easily happen:

-A user or a plugin unaware of this during a trace could mistakenly let the application process a single step exception which followed an instruction that set EFLAGS.TF. The application would see this and act accordingly (like.. explode or something.)

-Ollydbg AND WinDbg both mask off Dr7.BTF when sending an exception back to KiUserExceptionDispatcher. This means that for the duration of the exception chain dispatching, BTF will have no effect.

So the following scenario would ensue:

The following is executed while debugger is attached.

 mov rcx, hThread //thread handle
mov rdx, context //context setting LBR and BTF
call SetThreadContext
//random crap
xor ebx, ebx
mov eax, 0x1
shl rax, 0x10
//cause some kind of event
int 3
  The application must have wanted this, so pass it back. But since the debugger masked Dr7.BTF, setting the trap flag in your exception handler with popf/iret will cause a trap at following instruction boundary. Otherwise nothing would happen until you either A. reset the flag, or b, hit a taken branch. This is ample evidence that a debugger is involved.

IDA's win32 debugger and Cheat Engine do not have this problem, but don't worry, we have something up our sleeve for them. Also a quick side-note here; a year or so ago, a colleague of mine made some real fun of me for using Cheat Engine as a dynamic analysis and debugging tool. Contrary to whatever he thinks, anyone who does this as a passion loves Cheat Engine. The arsenal just isn't complete without it.

Here is how we can fool them all.

Reminder: LBR data will only be written to the ExceptionInformation structure if the trap flag is set when a #DB exception occurs. In this case we use ICEBP for our #DB. ICEBP for all intents and purposes is a #DB exception.

So if we single step OR branch step over the following magical sequence, it will easily be detectable:

//LBR and BTF already set

 inc eax
cmp eax, 0x5
je 02h
xor ebx, ebx
mov ecx, edx
popfd                             //sets trap flag
 Our first assumption is that the debugger is smart enough to detect ICEBP, whether it be by decoding the instruction stream or checking Dr6, and then passing the exception back to the application. If this isn't happening then the application already wins this round because the exception chain was never dispatched.

If no debugger is tracing this sequence, the ExceptionInformation fields rendered to our application via the EXCEPTION_RECORD structure will contain the linear address of the 'je 02h' instruction, and the second field will contain the linear address of the 'mov ecx, edx' instruction.

If a debugger were single stepping over this sequence, it's implied that it masked Dr7.BTF, and maybe even Dr7.LBR. In either case, even if it only masked one, the ExceptionInformation fields will have a null index, and no data.

Furthermore, if the debugger were branch tracing instead of single stepping over this sequence meaning it left BTF and LBR on, the ExceptionInformation data would contain the linear address of KiDebugTrapOrFault's IRET instruction, followed by the linear address of 'mov ecx, edx. If the debugger for some reason decided to mask off LBR but leave BTF enabled, ExceptionInformation index would be null and the fields would be empty.

In either of the above case, if the debugger didn't preserve LBR or BTF, the improper values would be stored in the ExceptionInformation fields, and we could assume a debugger is attached.

The BTF and LBR Dr7 backdoors exist from XP to Windows 8 in both 32 and 64 bit editions of Windows making this a highly portable anti-debug/trace technique.

Tuesday, October 15, 2013

User/kernel shared page continued...

 This is a continuation of the original post
Finally had some time to look this one over. As you hopefully recall in the previous installment I mentioned how I noticed data fluctuation in the same area of the page for 32 bit builds of Windows 7 (haven't checked 8 for either build yet).

As I guessed it's pretty much the same functionality (garbage stack portion) and can be used to infer /debug. This is the mode where a kernel debugger is not necessarily attached, but can be at anytime. Other indicators such as KdDebuggerEnabled at 0x2D4 or KdDebuggerNotPresent which as you know can be queried with NtQuerySystemInformation will not be of any value.

Anyways in this case, it's close to the same but not entirely. KdInitSystem parses the load options, if /debug is set, we expand our stack further than anticipated for a normal boot phase and land at DbgLoadImageSymbols which uses int 2D (debugger services, like symbols ;p) regardless of whether or not a KD is actually present, if not it's just caught by exception handlers in this case.

Now since we grew the stack quite a bit, and the stack pages were zeroed to begin with, we find ourselves at KiInitializeXStatePolicy. This function writes vendor specific extended processor feature bits into the shared page. It allocates a good 0x450 bytes, which then uncovers the garbage left behind (or is it?) from the DbgLoadImageSymbols interrupt control transfer and exception dispatch.

If the value at 0x4C0 is non-zero, this is enough to indicate. It is highly improbable that the Xsave features will extend that far, but starting at Xsave and searching at a 4 byte boundary for 0xFFFFFD34 would be a more appropriate solution. Similar to the 4 byte 'DBGP' signature for 64 bit builds.

This applies to an original deployed 32 bit copy, all the way to the most recent Windows updates.

Keep in mind this is only for 32 bit builds of Windows 7. The same deal exists in x86/64 targets but is a slightly different story.

Tuesday, July 23, 2013

Kernel/user shared page kernel debugger detection (x64)

No no, this isn't the single byte indicator at 0x2D4. Just in case you had maybe thought I lost my mind or something. I did however lose my mind over dictating whether or not they did this on purpose. Read on and post your thoughts.

Lets imagine an operating instance with no outstanding boot flags used to enable the kernel debugger. The data beyond the xsave features area (fpu xstor features etc) may look something like this:

Nothing out of the ordinary eh?

Alright. Lets boot with /debug and com port 1

Wow would you look at all this extra data. Hey I even see a string 'DBGP'! Lets analyze what is really going on here to see if this is on purpose or just simply some kind of accident. After KiSystemStartup passes the loader parameter block to KdInitSystem, KdInitSystem dictates whether or not to initialize the kernel debugger based off of the boot parameters. It is at this point of  deciding where our kernel stack is in the current state. You'll have to excuse my art skills though, no fancy crayon drawings today:

data higher
then SP. in use.


data lower
then SP. not
allocated (garbage)

As KdInitializeDebugger goes through it's layers of execution, needless to say it expands SP as it goes. DBGP is actually an ACPI table in which HAL determines if existing and capable debug ports do exist. For example it ensures that the com port is an actual 16550 UART. This isn't limited to just serial ports, as you know, debugging over USB/network/IEEE is also available. ACPI simply states whether or not these interfaces abide by the Microsoft debugging standard. For instance the USB host controllers must have a debug interface, or it cannot be used for this purpose.

It just so happens that during this process, the table identifier 'DBGP' is saved to the stack prior to asking HAL to look up the table ;p

Thus when KdInitializeDebugger unravels itself, this extra data along with our lovely friend DBGP still exist in the garbage portion of the stack. Ok you are with me so far, that is good, lets continue.

A short time later, KiComputeEnabledFeatures allocates itself a structure to fill for xsave features. It just so happens that this structure overlaps the garbage left behind from KdInitializeDebugger. Otherwise the structure would in fact be zeroed out because it has not been used prior. This structure is then written to the xsave features portion of the kernel/user shared page, and contains this extra information. This extra information is enough to infer presence of a kernel debugger because without /DEBUG KdInitializeDebugger is never called.

This heading is also labeled as (x64). I did look at windows in legacy operating mode but didn't notice the same results however there was some fluctuation, perhaps enough to detect the same flags. When I get more time I will have a look.

Now whether or not this is on purpose, you can decide :)

Tuesday, July 2, 2013

Time slip DPC kernel debugger detection

Been quite awhile since my last entry. Spent some time in Key West, FL and spent some more time moving to the other side of town. I have a some fun things to post about over the next month or so. So stay tuned ;p

When a kernel debugger can attach to the system (KdPitchDebugger == 0) the possibility exists for software (usermode included) to implement an event object type to be set to the signaled state when a time slip occurs. In this context, a time slip occurs because an exception that is passed to the kernel debugger puts all logical processors in a wait state with interrupts masked off.

No external interrupts from timing chips (pit, hpet) can occur. Thus when the logical processor(s) are continued, the machine is living in the past so to speak. Time keeps on slippin slippin slippin...


Prior to exiting the debugger, KdExitDebugger will insert the KdpTimeSlipDpc DPC object into the processor's DPC queue. This DPC will queue a passive level work item routine (KdpTimeSlipWork) which will set a provided event object to the signaled state, if one is provided. User level software can set this field with NtSetSystemInformation with an infoclass of 0x2E. The windows time service
in particular sets this field when it starts up, that is, if the service is running. However it can still be reset. I haven't really looked over the windows time service but my guess is that when and if it is notified of a time slip, that it probably attempts to synchronize the system back over NTP, but who knows.. haven't looked.

We can be sure that if this DPC is fired that a kernel debugger is attached to the system because the only way the initial DPC can be queued is via KdExitDebugger. Control flow cannot reach that point unless an exception occured which was forwarded to the debugger.

The passive level work routine will queue another timer based DPC object with KiSetTimer with a hardcoded duetime of 94B62E00. This value is relative to the system clock at 179999999900 nanoseconds, or every 180 seconds (3 minutes ;p) that it will attempt to set your provided event
object to the signaled state.

Please note this requires the SeSystemtimePrivilege privilege.

Quick example for clarity:



if(WaitForSingleObject(a1,1)==WAIT_OBJECT_0)  //kernel debugger attached

Wednesday, March 13, 2013

KeLoaderBlock and you

My goal of this blog is to generally post undocumented details of the Windows operating system. By details I mean topics that would interest both software reverse-engineers and malware analysts alike. One of those topics to me is a lot more prominent then the rest, and that is mechanisms that attempt to detect or evade debugging. Whether it be DRM or actual malware, I'd have to say it's my favorite topic.

What were going to discuss today has probably already been discussed elsewhere, however out of all the methods used to detect if a kernel debugger is attached to the system, I think this one is hardly used or mentioned. Therefore I think it warrants a quick discussion today.

As you probably already know, KeLoaderBlock is the first argument to KiSystemStartup. Among a plethora of other details this structure contains the boot flags from the current BCD entries corresponding our current boot. For instance boot option selection timeout, test-signing, NX opt in or opt out, /debug flags for the kernel debugger etc.

KeLoaderBlock is not accessible from user-mode, but I'm always surprised that many are unaware that during initialization, the startup flags are written to the following registry fields.

HKLM\System\CurrentControlSet\Control - SystemStartOptions

From these flags the software can easily find out if the system was booted with /TESTSIGNING or /DEBUG ON 

This method we discussed as you can see is very simple. So simple that it's often overlooked.