A couple of weeks ago I went hunting for a better way to compute x!=0 on x86. Eventually, I came up with a cute carry-flag trick and blogged about it.
(Note: I’m not branching on this comparison — that would be easy. Instead I want the value of the comparison in a general-purpose register. I should have made this explicitly clear in my original post. Alas, I did not. Doh.)
My goal was to avoid using setcc, because partial-register writes are the devil.
Try as I might, I couldn’t imagine a way to generalize my solution so that it would also work for x==0. Someone suggested that I try the GNU superoptimizer (PDF, code), so I did.
At first I was a bit disappointed that the superoptimizer didn’t discover my sequence for x!=0. I think, maybe, the cost heuristics are outdated. (It should model xor reg,reg as being really cheap†.)
Turns out that the superoptimizer is still really clever anyway. It was a source of some great ideas. I’m delighted with what “we” came up with for x==0:
Old method; naive and literal:
85 c9 test ecx, ecx
0f 94 c0 sete al
0f b6 c0 movzx eax, al
New method:
31 c0 xor eax, eax
83 f9 01 cmp ecx, 1
11 c0 adc eax, eax
Once again the new method avoids the setcc and thus avoids insert semantics. As a nice bonus, we save a byte of code.
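If the flag juggling is hard to follow, here is a rough C model of the new sequence. It is only a sketch for checking the logic, not what a compiler would emit: the names is_zero_carry, is_zero_naive and check_is_zero are made up for this illustration, and the carry variable stands in for the CPU's carry flag. The key step is that cmp ecx, 1 computes ecx - 1 as an unsigned subtraction, which borrows (sets carry) only when ecx is 0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models xor eax,eax / cmp ecx,1 / adc eax,eax. */
uint32_t is_zero_carry(uint32_t ecx)
{
    uint32_t eax = 0;              /* xor eax, eax: eax = 0 */
    uint32_t carry = (ecx < 1u);   /* cmp ecx, 1: unsigned borrow, so carry = 1 only if ecx == 0 */
    eax = eax + eax + carry;       /* adc eax, eax: 0 + 0 + carry */
    return eax;
}

/* Hypothetical helper: the value the test / sete / movzx version produces. */
uint32_t is_zero_naive(uint32_t ecx)
{
    return ecx == 0;
}

/* Spot-check that both versions agree on a few interesting inputs. */
void check_is_zero(void)
{
    const uint32_t samples[] = { 0u, 1u, 2u, 0x7fffffffu, 0x80000000u, 0xffffffffu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(is_zero_carry(samples[i]) == is_zero_naive(samples[i]));
    }
}
```

Calling check_is_zero() should pass silently. The point of the model is just that the whole result falls out of a single borrow, so nothing ever has to write a byte-sized register.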
†xor reg,reg doesn’t actually depend on the input register at all. It’s essentially a “load-zero” instruction. Modern processors understand this and schedule accordingly.