When syscall/sysret were introduced with the AMD64 architecture, they addressed a few needed improvements. Firstly, sysenter did not save a return address, so 32-bit Windows always returned to a predefined location. Secondly, IF was cleared on a sysenter to allow the trap frame to be saved without interruption, but TF was not. A CPL 3 -> CPL 0 call gate transfer with TF set raises an INT 01 originating from privilege level 0, and sysenter had the same issue.
Now, contrary to some of the nonsense I've seen online, Windows does not use call gates, so in 32-bit Windows the only place this could happen was on a sysenter. Nonetheless, Windows needed a hack for user-mode single-stepping through sysenter. First, the INT 01 handler checks whether the instruction pointer on the interrupt stack is that of KiFastCallEntry. Second, KiServiceExit checks whether TF is set in the trap frame and, if so, does an IRET to the return address with TF set. The trap is raised after the instruction following the IRET, which is why stepping over sysenter in 32-bit Windows appears to skip an instruction, though it really does not.
When syscall came along, this issue was also fixed by adding a flag mask (the SFMASK MSR, IA32_FMASK in Intel's naming) which can be used to mask off the trap flag on system call entry. On syscall, RFlags is saved in the r11 register, and sysret reloads it from there. In fact, sysret is special: if TF is set in r11, sysret raises a #DB at its own instruction boundary, unlike an IRET that loads an RFlags image with TF set from the stack, where the trap only arrives one instruction later. This makes for a smooth single-step operation over a syscall instruction.
The magic happens because r11 is volatile in this case: once sysret branches back to user mode, r11 still holds the RFlags image that sysret just loaded, TF included.
Pretty simple eh?
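mov eax, 0ffffffffh        ; invalid system call number
syscall                    ; on return, r11 holds the RFlags image loaded by sysret
test r11d, 0100h           ; bit 8 of RFlags is TF
jnz single_step_detected   ; TF set means someone is single-stepping us
ret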
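
For anyone who wants to reason about the flag handling without a debugger attached, here is a rough C model of what syscall and sysret do to RFlags and r11, as described above. It is only a sketch: the struct and function names are made up for illustration, the mask value used in the usage note is an assumption rather than what Windows actually programs into the MSR, and real hardware does more on sysret (reserved bits, RF and VM handling) than this shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RFLAGS_TF (1ull << 8)   /* trap flag, the 0100h tested in the snippet */
#define RFLAGS_IF (1ull << 9)   /* interrupt flag */

/* Hypothetical, minimal slice of CPU state: just what this model needs. */
typedef struct {
    uint64_t rflags;
    uint64_t r11;
} cpu_model;

/* syscall: save RFlags in r11, then clear every bit set in the flag mask. */
static void model_syscall(cpu_model *cpu, uint64_t fmask)
{
    assert(cpu != NULL);
    cpu->r11 = cpu->rflags;
    cpu->rflags &= ~fmask;
}

/* sysret: reload RFlags from r11 (simplified, no reserved-bit fixups).
 * r11 itself is left untouched, so user mode can still read it. */
static void model_sysret(cpu_model *cpu)
{
    assert(cpu != NULL);
    cpu->rflags = cpu->r11;
}

/* Same check as "test r11d, 0100h" in the snippet above. */
static int single_step_detected(const cpu_model *cpu)
{
    assert(cpu != NULL);
    return (cpu->r11 & RFLAGS_TF) != 0;
}
```

To try it, start with rflags = 0x202 | RFLAGS_TF (a user thread being single-stepped), call model_syscall with an assumed mask of RFLAGS_TF | RFLAGS_IF, then model_sysret. While "in the kernel" both TF and IF are clear, and after the return single_step_detected reports the step because r11 still carries TF.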