The Entire World of Bit Twiddling Hacks
Several months back, we posted link to Stanford bit twiddling website in Fastest Way to Reverse Complement a Sequence. Now thanks to ?@maxal6, we found many other useful ones. Links to all are posted below.
1. Stanford bit twiddling hacks  Sean Eron Anderson
Contents
About the operation counting methodology
Compute the sign of an integer
Detect if two integers have opposite signs
Compute the integer absolute value (abs) without branching
Compute the minimum (min) or maximum (max) of two integers without branching
Determining if an integer is a power of 2
Sign extending
Sign extending from a constant bitwidth
Sign extending from a variable bitwidth
Sign extending from a variable bitwidth in 3 operations
Conditionally set or clear bits without branching
Conditionally negate a value without branching
Merge bits from two values according to a mask
Counting bits set
Counting bits set, naive way
Counting bits set by lookup table
Counting bits set, Brian Kernighan’s way
Counting bits set in 14, 24, or 32bit words using 64bit instructions
Counting bits set, in parallel
Count bits set (rank) from the mostsignificant bit upto a given position
Select the bit position (from the mostsignificant bit) with the given count (rank)
Computing parity (1 if an odd number of bits set, 0 otherwise)
Compute parity of a word the naive way
Compute parity by lookup table
Compute parity of a byte using 64bit multiply and modulus division
Compute parity of word with a multiply
Compute parity in parallel
Swapping Values
Swapping values with subtraction and addition
Swapping values with XOR
Swapping individual bits with XOR
Reversing bit sequences
Reverse bits the obvious way
Reverse bits in word by lookup table
Reverse the bits in a byte with 3 operations (64bit multiply and modulus division)
Reverse the bits in a byte with 4 operations (64bit multiply, no division)
Reverse the bits in a byte with 7 operations (no 64bit, only 32)
Reverse an Nbit quantity in parallel with 5 * lg(N) operations
Modulus division (aka computing remainders)
Computing modulus division by 1 « s without a division operation (obvious)
Computing modulus division by (1 « s)  1 without a division operation
Computing modulus division by (1 « s)  1 in parallel without a division operation
Finding integer log base 2 of an integer (aka the position of the highest bit set)
Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)
Find the integer log base 2 of an integer with an 64bit IEEE float
Find the log base 2 of an integer with a lookup table
Find the log base 2 of an Nbit integer in O(lg(N)) operations
Find the log base 2 of an Nbit integer in O(lg(N)) operations with multiply and lookup
Find integer log base 10 of an integer
Find integer log base 10 of an integer the obvious way
Find integer log base 2 of a 32bit IEEE float
Find integer log base 2 of the pow(2, r)root of a 32bit IEEE float (for unsigned integer r)
Counting consecutive trailing zero bits (or finding bit indices)
Count the consecutive zero bits (trailing) on the right linearly
Count the consecutive zero bits (trailing) on the right in parallel
Count the consecutive zero bits (trailing) on the right by binary search
Count the consecutive zero bits (trailing) on the right by casting to a float
Count the consecutive zero bits (trailing) on the right with modulus division and lookup
Count the consecutive zero bits (trailing) on the right with multiply and lookup
Round up to the next highest power of 2 by float casting
Round up to the next highest power of 2
Interleaving bits (aka computing Morton Numbers)
Interleave bits the obvious way
Interleave bits by table lookup
Interleave bits with 64bit multiply
Interleave bits by Binary Magic Numbers
Testing for ranges of bytes in a word (and counting occurances found)
Determine if a word has a zero byte
Determine if a word has a byte equal to n
Determine if a word has byte less than n
Determine if a word has a byte greater than n
Determine if a word has a byte between m and n
Compute the lexicographically next bit permutation
2. The Aggregate Magic Algorithms
Absolute Value of a Float
Alignment of Pointers
Average of Integers
Bit Reversal
Comparison of Float Values
Comparison to Mask Conversion
Divide Rounding
DualLinked List with One Pointer Field
GPU Any
GPU SyncBlocks
Gray Code Conversion
Integer Constant Multiply
Integer Minimum or Maximum
Integer Power
Integer Selection
Is Power of 2
Leading Zero Count
Least Significant 1 Bit
Log2 of an Integer
Next Largest Power of 2
Most Significant 1 Bit
Natural Data Type Precision Conversions
Polynomials
Population Count (Ones Count)
Same Within Tolerance
ShiftandAdd Optimization
Sign Extension
Swap Values Without a Temporary
SIMD Within A Register (SWAR) Operations
Trailing Zero Count
x86 gems
4x4 by 4x1 matrix multiply
63 bit CRC
Absoulte Value of
Add with 255 saturation
Aligned Fill
Bit Block Inversion
Bit Transposition algorithm
BSWAP with 16bit registers
CodeAlign  Align your code
Convert hex to ASCII
Converting DX:AX into a base 10 string
Copy Zero Flag to Carry Flag
Copying data using the FPU
Disable the Cache(s)
Dividing 64bit unsigned integers
Dividing signed integers by powers of 2
Doing a NOP
Exchange two values
Fast Float to Int
Fast Nibble swap
Fast strlen()
Filling a register with the Carry Flag
Find common divisor
Find minimum of two unsigned values
Get Parity of 32bit word
Halt  the loop
Hadamard Transform in MMX
IF using AND
Inverse squareroot
LEA a power command
MOVS with SI=X, DI=+X
Negate multipleword numbers
Penalties on the 486
Performance monitoring
Population count of a 32bit register
Pseudo Random number generator
Quadprecision shift
Quad Adders
Reversing bit order
Round signed integers
SALC  Set AL on Carry
Shifting + rounding upwards
Shifting bit arrays
Sinus Generator  40 bytes
Small bubble sort
Smaller loops
Snippets from Ramblings in Realtime
Square root < 20 cycles
Testing for zero
The ADCGem
ulong2ascii
x86 Mnemonic Replacements
Emulate Bit scan instructions
Emulate BSF instruction
Fast BSF replacement
Remove shift by one latency
Replace Bit scan instructions
Replace MOVSX/MOVZX
Usage of JCXZ/JCXNZ
PC Platform Gems
A BIOS delay
Flat Real Mode
Flat Real Mode with EMM
Writing a string
Writing a twodigit number
PC VGA Gems
DRAMs Speed Up/Booster
Drawing pixels
Palette Programming
PC Tiny Programs
15byte Dump Meg to STDOUT
20byte Starfield
56byte Sierpinski’s Triangle Generator
61byte Mandelbrot generator
65byte Life Simulator
78byte Fade screen
79byte Copper bar x 3
80byte Pi calculator
84byte Sample player
122byte Quality fire
Motorola Gems
Absoulte Value of
Clear / Fill memory fast
Insert sort
Jump tables
Optimize for MC68k CPUs
4. JJ’s Source code for low level binary functions
bitasm.h: conditionally include inline asm versions of various functions.
Define BITS_PER_LONG:
Many of the files #include
bitsperlong.h which #defines BITS_PER_LONG according to sizeof(long).
The type ‘unsigned long’ is abbreviated ‘ulong’ throughout, this is done in fxttypes.h via
typedef unsigned long ulong;
Inline assembler (for i386 and AMD64):
bitasmi386.h: i386 inline asm.
bitasmamd64.h: AMD64 inline asm.
Combinatorial routines:
bitsubset.h: generate all bitsubsets of a word.
bitsubsetgray.h: all bitsubsets of a word, Gray code order.
bitlex.h: generate all bit sets in (subset)lex order.
bitslgray.h: Gray code corresponding to subsetlex order (SLGray order).
bintoslgray.h: conversion from binary to SLGray order.
bitcombcolex.h: generate all bitcombinations in colex order.
bitcombcolex.h: generate all bitcombinations in colex order.
bitcombminchange.h: generate all bitcombinations in minimalchange order (Gray code).
fibrepsubsetlexrev.h: Fibonacci representations in subsetlex order.
bitfibgray: Gray code for the Fibonacci words.
bitrll2.h: run length limited (RLL) words and Gray code for the Fibonacci words.
parenwords.h: determine whether binary word corresponds to a wellformed parenthesizing.
bitnecklace.h: generate all binary necklaces.
bitcombshifts.h: generate all combinations in shifts order.
thuemorse.h: ThueMorse sequence.
grsnegative.h: whether GolayRudinShapiro sequence negative at an index.
Arithmetic and representation in nonstandard bases:
average.h: average of two integers avoiding overflow.
evenodd.h: next_even(), next_odd() etc.
negbin.h: conversion from and to radix 2.
radixm4.h: conversion from and to radix 4.
radix2i.h: conversion from and to the complex radix (2*i).
radixm1pi.h: conversion from and to the complex radix (1+i).
bin2naf.h: nonadjacent form: sparse signed binary representation.
fibrep.h: Fibonacci representations.
bitsequency.h: generate all words with given number of edges.
bit2adic.h: 2adic inverse and square root.
Spacefilling curves:
hilbert.h: 1dim coord. to 2dim Hilbert coord.
zorder.h: 1dim coord. to 2dim/3dim space filling curve, Zorder.
bitpaperfold.h: paperfolding sequence, also turns of the dragon curve.
bitdragon3.h: turns of the terdragon curve.
bitdragonr4.h: turns of the R4dragon curve.
bitdragonr5.h: turns of the R5dragon curve.
bitdragonr7.h: turns of the R7dragon curve.
bitdragonr9.h: turns of the R9dragon curve.
bitdragonr13.h: turns of the R13dragon curve.
Functions about cyclic rotations, necklaces, and Lyndon words:
bitrotate.h: rotating words.
bitnecklace.h: generate all binary necklaces.
bitcyclicminmax.h: min and max among bitwise rotations.
bitcyclicperiod.h: periodicity of bitwise rotations.
bitcyclicdist.h: distance among bitwise rotations.
bitcyclicmatch.h: matching bitwise rotations.
bitcyclicxor.h: rotatexor operation and its inverse.
bitperiodic.h: creating a periodic word.
Gray code, parity, and invertible transformations:
graycode.h: graycode and its inverse.
parity.h: parity.
revgraycode.h: reversed Gray code and its inverse.
graypower.h: Gray code powered.
bittransforms.h: invertible transforms of binary words: blue, yellow, cyan, and red code.
bluefixedpoints.h: fixed points fo the blue code.
bitxtransforms.h: invertible transforms Kroneckerpowered.
Cyclic Redundancy Check (CRC):
crc64.h: 64 bit CRC .
crc32.h: 32 bit CRC.
tcrc64.h: CRC with lookup table, 64 bit.
pcrc64.h: parallel CRC, 64 bit.
Mixing colors (8bit RGB channels):
colormix.h: functions that add/average/multiply/mix color channels.
colormixp.h: versions of some of the above funcs that are correct to the last bit.
colormixfl.h: color manipulation with floating point scaling.
Permutations of bits:
bitzip.h: even/odd bits to/from low/high bits.
bitzippairs.h: even/odd bits to/from low/high bits.
bitbutterfly.h: auxiliary routine for bitzip.
bitswap.h: swapping groups of bits.
revbin.h: reversing the order of bits in a word.
revbinupd.h: counting in bitreversed order.
bitgather.h: bit gather/scatter.
bitseparate.h: separate bits according to a mask.
Isolation of low or high bits and blocks:
bitlow.h: functions that isolate/find the lowest set bit.
bitlowedge.h: create 10 or 01 edge at the lowest set bit.
bithigh.h: functions that isolate/find the highest set bit.
bithighedge.h: create 10 or 01 edge at the highest set bit.
bitisolate.h: isolation of bits/blocks at low and high end, and at 0/1 and 1/0 transitions.
bit2pow.h: functions like floor(log2()).
bitldeq.h: comparing base2 logarithms of two words.
All the rest (low level functions):
bitcount.h: population count: counting the bits in a word.
bitblock.h: generating a bit block.
bitsubsetq.h: return whether a word is a bitsubset of another word.
ithoneidx.h: index of the ith set bit in a word.
bittest.h: testing and changing bits of a word.
bitcopy.h: copying bits of a word.
branchless.h: code that avoids branches.
zerobyte.h: finding zero bytes in a word.
tinyfactors.h: determining whether d divides x for d,x<BITS_PER_LONG.
cswap.h: conditional swap.
5. Other sources shared by University of Kentucky site 
_a collection of mostly lowlevel tricks and
otherwise interesting ideas, some quite old_
a page full of tricks very similar in spirit to those here
_an exceptionally complete coverage of Athlonoriented
magic at both the C and x86 assembly levels_
Intel Pentium 4 Processor Optimization Reference Manual
_Intel’s equivalents to the AMD document above.
very x86oriented (including MMX, SSE, and SSE2)_

Technical Site Index for azillionmonkeys
lots of stuff, including hacks in x86 and C
a really nice piece of work by Doug Jones…
book online
is a book that seems to talk about many of the same issues discussed
on this page
by Jasper Neumann, with very nice diagrams explaining them