The Entire World of Bit Twiddling Hacks

The Entire World of Bit Twiddling Hacks


Several months back, we posted link to Stanford bit twiddling website in Fastest Way to Reverse Complement a Sequence. Now thanks to ?@maxal6, we found many other useful ones. Links to all are posted below.

1. Stanford bit twiddling hacks - Sean Eron Anderson

Contents

About the operation counting methodology

Compute the sign of an integer

Detect if two integers have opposite signs

Compute the integer absolute value (abs) without branching

Compute the minimum (min) or maximum (max) of two integers without branching

Determining if an integer is a power of 2

Sign extending

Sign extending from a constant bit-width

Sign extending from a variable bit-width

Sign extending from a variable bit-width in 3 operations

Conditionally set or clear bits without branching

Conditionally negate a value without branching

Merge bits from two values according to a mask

Counting bits set

Counting bits set, naive way

Counting bits set by lookup table

Counting bits set, Brian Kernighan’s way

Counting bits set in 14, 24, or 32-bit words using 64-bit instructions

Counting bits set, in parallel

Count bits set (rank) from the most-significant bit upto a given position

Select the bit position (from the most-significant bit) with the given count (rank)

Computing parity (1 if an odd number of bits set, 0 otherwise)

Compute parity of a word the naive way

Compute parity by lookup table

Compute parity of a byte using 64-bit multiply and modulus division

Compute parity of word with a multiply

Compute parity in parallel

Swapping Values

Swapping values with subtraction and addition

Swapping values with XOR

Swapping individual bits with XOR

Reversing bit sequences

Reverse bits the obvious way

Reverse bits in word by lookup table

Reverse the bits in a byte with 3 operations (64-bit multiply and modulus division)

Reverse the bits in a byte with 4 operations (64-bit multiply, no division)

Reverse the bits in a byte with 7 operations (no 64-bit, only 32)

Reverse an N-bit quantity in parallel with 5 * lg(N) operations

Modulus division (aka computing remainders)

Computing modulus division by 1 « s without a division operation (obvious)

Computing modulus division by (1 « s) - 1 without a division operation

Computing modulus division by (1 « s) - 1 in parallel without a division operation

Finding integer log base 2 of an integer (aka the position of the highest bit set)

Find the log base 2 of an integer with the MSB N set in O(N) operations (the obvious way)

Find the integer log base 2 of an integer with an 64-bit IEEE float

Find the log base 2 of an integer with a lookup table

Find the log base 2 of an N-bit integer in O(lg(N)) operations

Find the log base 2 of an N-bit integer in O(lg(N)) operations with multiply and lookup

Find integer log base 10 of an integer

Find integer log base 10 of an integer the obvious way

Find integer log base 2 of a 32-bit IEEE float

Find integer log base 2 of the pow(2, r)-root of a 32-bit IEEE float (for unsigned integer r)

Counting consecutive trailing zero bits (or finding bit indices)

Count the consecutive zero bits (trailing) on the right linearly

Count the consecutive zero bits (trailing) on the right in parallel

Count the consecutive zero bits (trailing) on the right by binary search

Count the consecutive zero bits (trailing) on the right by casting to a float

Count the consecutive zero bits (trailing) on the right with modulus division and lookup

Count the consecutive zero bits (trailing) on the right with multiply and lookup

Round up to the next highest power of 2 by float casting

Round up to the next highest power of 2

Interleaving bits (aka computing Morton Numbers)

Interleave bits the obvious way

Interleave bits by table lookup

Interleave bits with 64-bit multiply

Interleave bits by Binary Magic Numbers

Testing for ranges of bytes in a word (and counting occurances found)

Determine if a word has a zero byte

Determine if a word has a byte equal to n

Determine if a word has byte less than n

Determine if a word has a byte greater than n

Determine if a word has a byte between m and n

Compute the lexicographically next bit permutation

2. The Aggregate Magic Algorithms

Absolute Value of a Float

Alignment of Pointers

Average of Integers

Bit Reversal

Comparison of Float Values

Comparison to Mask Conversion

Divide Rounding

Dual-Linked List with One Pointer Field

GPU Any

GPU SyncBlocks

Gray Code Conversion

Integer Constant Multiply

Integer Minimum or Maximum

Integer Power

Integer Selection

Is Power of 2

Leading Zero Count

Least Significant 1 Bit

Log2 of an Integer

Next Largest Power of 2

Most Significant 1 Bit

Natural Data Type Precision Conversions

Polynomials

Population Count (Ones Count)

Same Within Tolerance

Shift-and-Add Optimization

Sign Extension

Swap Values Without a Temporary

SIMD Within A Register (SWAR) Operations

Trailing Zero Count

3. The Assembly gems page

x86 gems

4x4 by 4x1 matrix multiply

63 bit CRC

Absoulte Value of

Add with 255 saturation

Aligned Fill

Bit Block Inversion

Bit Transposition algorithm

BSWAP with 16-bit registers

CodeAlign - Align your code

Convert hex to ASCII

Converting DX:AX into a base 10 string

Copy Zero Flag to Carry Flag

Copying data using the FPU

Disable the Cache(s)

Dividing 64-bit unsigned integers

Dividing signed integers by powers of 2

Doing a NOP

Exchange two values

Fast Float to Int

Fast Nibble swap

Fast strlen()

Filling a register with the Carry Flag

Find common divisor

Find minimum of two unsigned values

Get Parity of 32-bit word

Halt - the loop

Hadamard Transform in MMX

IF using AND

Inverse squareroot

LEA a power command

MOVS with SI=-X, DI=+X

Negate multiple-word numbers

Penalties on the 486

Performance monitoring

Population count of a 32-bit register

Pseudo Random number generator

Quad-precision shift

Quad Adders

Reversing bit order

Round signed integers

SALC - Set AL on Carry

Shifting + rounding upwards

Shifting bit arrays

Sinus Generator - 40 bytes

Small bubble sort

Smaller loops

Snippets from Ramblings in Realtime

Square root < 20 cycles

Testing for zero

The ADC-Gem

ulong2ascii

x86 Mnemonic Replacements

Emulate Bit scan instructions

Emulate BSF instruction

Fast BSF replacement

Remove shift by one latency

Replace Bit scan instructions

Replace MOVSX/MOVZX

Usage of JCXZ/JCXNZ

PC Platform Gems

A BIOS delay

Flat Real Mode

Flat Real Mode with EMM

Writing a string

Writing a two-digit number

PC VGA Gems

DRAMs Speed Up/Booster

Drawing pixels

Palette Programming

PC Tiny Programs

15-byte Dump Meg to STDOUT

20-byte Starfield

56-byte Sierpinski’s Triangle Generator

61-byte Mandelbrot generator

65-byte Life Simulator

78-byte Fade screen

79-byte Copper bar x 3

80-byte Pi calculator

84-byte Sample player

122-byte Quality fire

Motorola Gems

Absoulte Value of

Clear / Fill memory fast

Insert sort

Jump tables

Optimize for MC68k CPUs

4. JJ’s Source code for low level binary functions

bitasm.h: conditionally include inline asm versions of various functions.

Define BITS_PER_LONG:

Many of the files #include

bitsperlong.h which #defines BITS_PER_LONG according to sizeof(long).

The type ‘unsigned long’ is abbreviated ‘ulong’ throughout, this is done in fxttypes.h via

typedef unsigned long ulong;

Inline assembler (for i386 and AMD64):

bitasm-i386.h: i386 inline asm.

bitasm-amd64.h: AMD64 inline asm.

Combinatorial routines:

bitsubset.h: generate all bit-subsets of a word.

bitsubset-gray.h: all bit-subsets of a word, Gray code order.

bitlex.h: generate all bit sets in (subset-)lex order.

bit-sl-gray.h: Gray code corresponding to subset-lex order (SL-Gray order).

bin-to-sl-gray.h: conversion from binary to SL-Gray order.

bitcombcolex.h: generate all bit-combinations in colex order.

bitcombcolex.h: generate all bit-combinations in colex order.

bitcombminchange.h: generate all bit-combinations in minimal-change order (Gray code).

fibrep-subset-lexrev.h: Fibonacci representations in subset-lex order.

bitfibgray: Gray code for the Fibonacci words.

bit-rll2.h: run length limited (RLL) words and Gray code for the Fibonacci words.

parenwords.h: determine whether binary word corresponds to a well-formed parenthesizing.

bit-necklace.h: generate all binary necklaces.

bitcombshifts.h: generate all combinations in shifts order.

thue-morse.h: Thue-Morse sequence.

grsnegative.h: whether Golay-Rudin-Shapiro sequence negative at an index.

Arithmetic and representation in non-standard bases:

average.h: average of two integers avoiding overflow.

evenodd.h: next_even(), next_odd() etc.

negbin.h: conversion from and to radix -2.

radix-m4.h: conversion from and to radix -4.

radix-2i.h: conversion from and to the complex radix (2*i).

radix-m1pi.h: conversion from and to the complex radix (-1+i).

bin2naf.h: non-adjacent form: sparse signed binary representation.

fibrep.h: Fibonacci representations.

bitsequency.h: generate all words with given number of edges.

bit2adic.h: 2-adic inverse and square root.

Space-filling curves:

hilbert.h: 1dim coord. to 2dim Hilbert coord.

zorder.h: 1dim coord. to 2dim/3dim space filling curve, Z-order.

bit-paper-fold.h: paper-folding sequence, also turns of the dragon curve.

bit-dragon3.h: turns of the terdragon curve.

bit-dragon-r4.h: turns of the R4-dragon curve.

bit-dragon-r5.h: turns of the R5-dragon curve.

bit-dragon-r7.h: turns of the R7-dragon curve.

bit-dragon-r9.h: turns of the R9-dragon curve.

bit-dragon-r13.h: turns of the R13-dragon curve.

Functions about cyclic rotations, necklaces, and Lyndon words:

bitrotate.h: rotating words.

bit-necklace.h: generate all binary necklaces.

bitcyclic-minmax.h: min and max among bitwise rotations.

bitcyclic-period.h: periodicity of bitwise rotations.

bitcyclic-dist.h: distance among bitwise rotations.

bitcyclic-match.h: matching bitwise rotations.

bitcyclic-xor.h: rotate-xor operation and its inverse.

bitperiodic.h: creating a periodic word.

Gray code, parity, and invertible transformations:

graycode.h: graycode and its inverse.

parity.h: parity.

revgraycode.h: reversed Gray code and its inverse.

graypower.h: Gray code powered.

bittransforms.h: invertible transforms of binary words: blue-, yellow-, cyan-, and red code.

blue-fixed-points.h: fixed points fo the blue code.

bitxtransforms.h: invertible transforms Kronecker-powered.

Cyclic Redundancy Check (CRC):

crc64.h: 64 bit CRC .

crc32.h: 32 bit CRC.

tcrc64.h: CRC with lookup table, 64 bit.

pcrc64.h: parallel CRC, 64 bit.

Mixing colors (8-bit RGB channels):

colormix.h: functions that add/average/multiply/mix color channels.

colormixp.h: versions of some of the above funcs that are correct to the last bit.

colormix-fl.h: color manipulation with floating point scaling.

Permutations of bits:

bitzip.h: even/odd bits to/from low/high bits.

bitzip-pairs.h: even/odd bits to/from low/high bits.

bitbutterfly.h: auxiliary routine for bit-zip.

bitswap.h: swapping groups of bits.

revbin.h: reversing the order of bits in a word.

revbin-upd.h: counting in bit-reversed order.

bitgather.h: bit gather/scatter.

bitseparate.h: separate bits according to a mask.

Isolation of low or high bits and blocks:

bitlow.h: functions that isolate/find the lowest set bit.

bitlow-edge.h: create 10 or 01 edge at the lowest set bit.

bithigh.h: functions that isolate/find the highest set bit.

bithigh-edge.h: create 10 or 01 edge at the highest set bit.

bit-isolate.h: isolation of bits/blocks at low and high end, and at 0/1 and 1/0 transitions.

bit2pow.h: functions like floor(log2()).

bitldeq.h: comparing base-2 logarithms of two words.

All the rest (low level functions):

bitcount.h: population count: counting the bits in a word.

bitblock.h: generating a bit block.

bitsubsetq.h: return whether a word is a bit-subset of another word.

ith-one-idx.h: index of the i-th set bit in a word.

bittest.h: testing and changing bits of a word.

bitcopy.h: copying bits of a word.

branchless.h: code that avoids branches.

zerobyte.h: finding zero bytes in a word.

tinyfactors.h: determining whether d divides x for d,x<BITS_PER_LONG.

cswap.h: conditional swap.

5. Other sources shared by University of Kentucky site -

_a collection of mostly low-level tricks and

otherwise interesting ideas, some quite old_

a page full of tricks very similar in spirit to those here

_an exceptionally complete coverage of Athlon-oriented

magic at both the C and x86 assembly levels_

Intel Pentium 4 Processor Optimization Reference Manual

_Intel’s equivalents to the AMD document above.

very x86-oriented (including MMX, SSE, and SSE2)_

lots of stuff, including hacks in x86 and C

a really nice piece of work by Doug Jones…

book online

is a book that seems to talk about many of the same issues discussed

on this page

by Jasper Neumann, with very nice diagrams explaining them



Written by M. //