In this context, the term "anti-trace" refers to detecting the mode in which the processor causes a #DB exception after every instruction boundary or in the case of branch tracing, a #DB exception after each successful branch is taken, due to RFLAGS.TF being set.
Please note that under long-mode and windows x64 that the first method I am about to describe will in fact work under the wow64 subsystem, however it only works because of the long-mode implementation. It's best to leave these methods in 64 bit mode code only, because they will not function without long-mode and windows x64.
Nested task bit
The nested task bit of RFLAGS was used and set by the legacy task switching system when the processor would transfer control through a task-gate, a task segment or the actual TSS descriptor in the GDT itself. When this happened, the processor would set EFLAGS.NT. This would enable a subsequent IRET to use the TSS backlink selector to return to the previous task. However like the segmented memory model, the hardware task switching mechanism was hardly used. The only purpose it served in most cases was for storing stack pointers for different privilege levels.
In 64-bit mode, most of the segmented memory model was done away with. Except for GS, FS and system descriptors, the base is always treated as zero, and the displacement offset usually found in a general purpose register is treated as the actual linear address.
The same goes for the hardware task switching model. In long-mode it's purpose is to hold stack pointers for each CPL change and an IST which can be used for secure stacks when needed for NMI's etc.
Thus since there is no TSS backlink for an IRET to dispatch to, nor is the hardware task switching mechanism even available in long-mode, an IRET with RFLAGS.NT=1 will cause a general protection exception. In user-mode, depending on the scenario, these are usually dispatched as STATUS_ACCESS_VIOLATION (0xC0000005).
Now when the trap flag is set for a task we know that a debug exception will occur at the next instruction boundary, the trap flag is also set on the interrupt handler stack RFLAGS image, however... prior to dispatching the exception the kernel will mask the trap flag in the RFLAGS image to 0. As you already know then, this requires the user level debugger code to call SetThreadContext to reenable the trap flag to continue single-stepping (or branch tracing). However an interesting thing occurs in x64 kernels as we have a look at PspSetContext. This function is part of the APC routine used to modify a thread's context on it's saved trap-frame.
If the CONTEXT_CONTROL flag is specified in the ContextFlags member (which it needs to be in order to mask on RFLAGS.TF to continue single-stepping), PspSetContext will mask off RFLAGS.NT each time it's called. This means that if we are single-stepping over an IRET which has RFLAGS.NT=1 no general protection fault will be generated, otherwise it will be.
Here is another interesting scenario, this isn't just limited to detecting tracing. Notice how PspSetContext will mask off RFLAGS.NT each time the APC is queued to the thread and the CONTEXT_CONTROL flag is set? CONTEXT_CONTROL is not only used for RFLAGS it is used for the instruction pointer as well as other general purpose registers. Lets say somewhere during the initialization of our program we set RFLAGS.NT. Then somewhere down the road we use the IRET gp fault mechanism to cause some indirection. If at any time a debugger has re-adjusted the context of our thread with CONTEXT_CONTROL (which it would need to do for int3 ;p), we can assume a debugger is attached because RFLAGS.NT will no longer be set and therefore no GP fault will be generated.
Hopefully you see how this goes beyond just a simple anti-tracing mechanism to a pretty powerful anti-debugging trick altogether.
The second is based off of the exact same logic we just discussed, except in this case it is applied to RFLAGS.AC. When this flag is set it causes an alignment check fault when the task attempts to access data that is not a multiple of the operand offset. For example the following instruction would cause an alignment check fault if RFLAGS.AC was masked on:
mov rax, qword ptr [rsp+04h]
However following the same logic with our above discussion, this flag is also masked off each time PspSetContext is called. Thus if we were stepping over it, it would not generate an alignment check fault. The same logic also applies if PspSetContext is called at any point after RFLAGS.AC is set, it will be unmasked, and not cause a fault at the desired location.
An important thing to note however is that the first mechanism we described today (Nested task bit) will work within wow64. However the x64 kernel will not dispatch alignment faults that are generated in user-mode within the context of a wow64 process. Instead it will simply mask off RFLAGS.AC and IRET to the faulted instruction. This is why these methods should be left strictly to code that runs in a 64 bit process.