During each thread quantum (in this case, the time that a single logical thread gets on a physical processor), Windows keeps track of each time KiSwapContext is called and returns to the saved thread state (stack, registers) for that thread. Each time this happens, SwapContext increments the ContextSwitchCount member of the KTHREAD structure. We will be using the following native APIs:
NtQuerySystemInformation
NtQueryInformationThread
I recommend using two threads when probing ContextSwitchCount as an anti-debug mechanism. It's not required, but with a single thread you have to ensure the current thread is near the beginning of its cycle time; otherwise a context switch could occur at the next DPC interrupt. Probing cycle time itself as an anti-debug mechanism does require two threads.
First I will explain probing ContextSwitchCount, then the thread cycle time.
Step 1 is to create an additional thread in our application. These will be extremely simple and vague examples ;p
All this thread will do is wait on a synchronization object.
DWORD WINAPI Waiter(LPVOID param)
{
    //Block on the event; the thread sits in a wait state until it is signaled.
    WaitForSingleObject((HANDLE)param,INFINITE);
    return 0;
}
int main()
{
    HANDLE event1=CreateEvent(NULL,FALSE,FALSE,NULL);
    CreateThread(NULL,0,Waiter,(LPVOID)event1,0,NULL);
    //...
    //...
}
Step 2. We call NtQuerySystemInformation and locate our SYSTEM_PROCESS_INFORMATION structure, then navigate to the SYSTEM_THREAD_INFORMATION structure for the thread we have just created. We poll until that thread has entered a waiting state (0x5). Once we have established that the thread is waiting, we store its ContextSwitchCount.
int main()
{
    //...
    //...
    SYSTEM_PROCESS_INFORMATION *pi;
    SYSTEM_THREAD_INFORMATION *ti;
    ULONG SwitchCount;
    //Call NtQuerySystemInformation. Assign a structure pointer.
    do
    {
        NtQuerySystemInformation(SystemProcessandThreadInformation,heapbuffer,heapbuffersize,&len);
        //Walk the buffer here so pi points at our process and ti at our waiting thread.
    } while(ti->ThreadState!=0x5);
    SwitchCount=ti->ContextSwitches;
}
Like I said, vague examples ;p
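To make the "walk the buffer" part a little less vague, here is a minimal sketch in portable C. It assumes a simplified layout: MockProcessInfo and MockThreadInfo are hypothetical stand-ins that only borrow a few field names from SYSTEM_PROCESS_INFORMATION and SYSTEM_THREAD_INFORMATION, and find_thread is a made-up helper. The real structures are larger, so treat this as an illustration of the chaining by NextEntryOffset, not as the actual layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down stand-ins for the real structures. */
typedef struct {
    uint32_t ThreadId;
    uint32_t ThreadState;      /* 0x5 means waiting */
    uint32_t ContextSwitches;
} MockThreadInfo;

typedef struct {
    uint32_t NextEntryOffset;  /* bytes to the next process entry, 0 on the last */
    uint32_t NumberOfThreads;  /* number of MockThreadInfo records that follow */
    uint32_t ProcessId;
} MockProcessInfo;

/* Walk the snapshot, copy the matching thread record into *out and
   return 1. Returns 0 if the process or thread is not found, or if the
   buffer is truncated. */
static int find_thread(const void *snapshot, size_t len,
                       uint32_t pid, uint32_t tid, MockThreadInfo *out)
{
    const unsigned char *buf = (const unsigned char *)snapshot;
    size_t off = 0;

    for (;;) {
        MockProcessInfo pi;
        if (len < sizeof pi || off > len - sizeof pi)
            return 0;
        memcpy(&pi, buf + off, sizeof pi);

        if (pi.ProcessId == pid) {
            /* The thread records sit directly after the process header. */
            size_t toff = off + sizeof pi;
            for (uint32_t i = 0; i < pi.NumberOfThreads; i++) {
                MockThreadInfo ti;
                if (len < sizeof ti || toff > len - sizeof ti)
                    return 0;
                memcpy(&ti, buf + toff, sizeof ti);
                if (ti.ThreadId == tid) {
                    *out = ti;
                    return 1;
                }
                toff += sizeof ti;
            }
            return 0;
        }

        if (pi.NextEntryOffset == 0)
            return 0;               /* reached the last process entry */
        off += pi.NextEntryOffset;
    }
}
```

In the polling loop, a helper like this would be called after every query, since the thread's state and counters are only as fresh as the last snapshot.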
Step 3. At this point we have established that our secondary thread is waiting on our synchronization object, and we have saved its last ContextSwitchCount. When a thread is waiting on a synchronization object, it is not added to the ready queue until either a kernel APC is queued to the thread or the sync object is signaled.
In our main thread we will trigger an exception. This can be anything; for the sake of simplicity we will just use int3.
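int main()
{
    //...
    //...
    _asm
    {
        push handler        //our exception handler
        push fs:[0x0]       //previous head of the SEH chain
        mov fs:[0x0], esp   //install the new registration record
        int 3               //raise the breakpoint exception
    }
}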
I don't really know why I'm putting a code example for that one, but there it is. At this point, let's assume our handler advances the instruction pointer by one byte and then resumes execution. (int3 is a trap exception, but SEH uses ExceptionAddress, which is EIP-1.)
Step 4. We once again call NtQuerySystemInformation and walk through the SystemProcessandThreadInformation buffer to locate our process and our waiting thread, and probe its context switch count.
int main()
{
//...
//...
//...
//Call NtQuerySystemInformation, walk buffer to our thread data
if(ti->ContextSwitches>SwitchCount)
{
//debugger detected, do something
}
}
As you can see, we compare our waiting thread's current context switch count to the value we probed earlier. If it is higher, a debugger was attached to the process when we generated our exception, and here is why:
When a thread generates an exception and a debug port is present for the process, it calls DbgkSuspendProcess to suspend all remaining threads in the process, while the thread that generated the exception will go on to wait on the debug object's synchronization mutex until the debugger continues the exception.
The context switch count is incremented because thread suspension is done via kernel APCs. As stated earlier, the waiting thread is entered into the ready queue in one of two cases: a kernel APC is queued to it, or the object is signaled. The same goes for cycle time. Using the above logic, we can probe the thread's cycle time, generate an exception, and then probe it again. If it has increased, a debugger is present. To probe cycle time we use NtQueryInformationThread with an info class of 0x17.
If no debugger is present, the faulting thread does not suspend the remaining threads to wait for a debugger. Instead it resumes execution at KiUserExceptionDispatcher, and the thread we probed, which is waiting on the synchronization object, will have its context switch count and cycle time unchanged.
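The before/after comparison for both counters can be sketched as below. This is only the decision logic in portable C: ThreadProbe and waiter_was_disturbed are hypothetical names, and it assumes you have already filled the two probes yourself (context switches from NtQuerySystemInformation, cycle time from NtQueryInformationThread with info class 0x17).

```c
#include <stdint.h>

/* One probe of the waiting thread (hypothetical container). */
typedef struct {
    uint32_t ContextSwitches;  /* from the thread's system information entry */
    uint64_t CycleTime;        /* from the 0x17 thread info class */
} ThreadProbe;

/* Returns 1 if the waiting thread ran between the two probes, i.e. either
   counter went up while it should have stayed parked on the event. */
static int waiter_was_disturbed(const ThreadProbe *before,
                                const ThreadProbe *after)
{
    return after->ContextSwitches > before->ContextSwitches ||
           after->CycleTime > before->CycleTime;
}
```

Take one probe once the waiter is in the waiting state, raise the exception, take the second probe, and treat a result of 1 as "debugger detected".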
Monday, February 25, 2013
Tuesday, February 19, 2013
RTL_USER_PROCESS_PARAMETERS anti-debug mechanism
While trying to figure out why WinDbg was modifying the current directory in this structure under certain scenarios, I took note of the flags members and wondered if they were based on certain process creation flags. Sure enough, they are, and this doesn't seem to be documented anywhere.
The flags member is at offset 0x8 in the RTL_USER_PROCESS_PARAMETERS structure. Note that it is not the DebugFlags member.
If the process is started using either the DEBUG_PROCESS or DEBUG_ONLY_THIS_PROCESS flags with CreateProcess(), bit 14 of this value will not be set. If these flags aren't used then bit 14 of this value will be set.
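As a rough sketch, the check boils down to reading the value at offset 0x8 and testing one bit. The snippet below is portable C that works on a raw copy of the structure; it assumes the flags member is 32 bits wide and that "bit 14" is counted from zero (mask 0x4000). The macro and function names are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

#define PARAMS_FLAGS_OFFSET 0x8u        /* offset of the flags member, per the text */
#define PARAMS_BIT14_MASK   (1u << 14)  /* assumption: zero-based bit 14, i.e. 0x4000 */

/* Read the flags member out of a raw copy of the structure.
   Assumes the member is 32 bits wide. */
static uint32_t read_params_flags(const void *params)
{
    uint32_t flags;
    memcpy(&flags, (const unsigned char *)params + PARAMS_FLAGS_OFFSET,
           sizeof flags);
    return flags;
}

/* Returns 1 when bit 14 is clear, which the text associates with
   DEBUG_PROCESS or DEBUG_ONLY_THIS_PROCESS being passed to CreateProcess(). */
static int created_with_debug_flag(uint32_t flags)
{
    return (flags & PARAMS_BIT14_MASK) == 0;
}
```

In a real process, the pointer passed to read_params_flags would be the process's own RTL_USER_PROCESS_PARAMETERS.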
Branch tracing and LBR's
A few months ago I wrote an article on how Windows provides user-mode access to the debug_ctl MSRs for branch tracing and LBR stack records. It also explains how this method can be used to catch the last branch to a function when the caller nukes the call stack in an attempt to obscure where the call originated.
The article then goes on to explain how this method can be used for VM detection, since most hypervisors do not make use of LBR virtualization.
You can find the article here
InstrumentationCallback and advanced debugging
This entry was updated on March 14, 2015
A week or so ago I posted an article on CodeProject about InstrumentationCallback and how this feature facilitates code instrumentation for important transitions, and also works as an interesting anti-debug and analysis mechanism.
You can find the article here.
What I failed to mention in the article is that while 32-bit processes running in the WOW64 layer can also make use of this functionality, they do not get the KiRaiseUserExceptionDispatcher and system call transitions. This is not a major problem, because under WOW64 system calls can still be instrumented in a number of interesting ways without kernel code. One of them is the wow64log library. You can read more about that here.
Under WOW64, you can still instrument:
LdrInitializeThunk
KiUserExceptionDispatcher
KiUserApcDispatcher
KiUserCallbackDispatcher