Hamming weight

Article on other languages:

del.icio.us del.icio.us
Digg Digg
Furl Furl
Reddit Reddit
Rojo Rojo
Add to OnlyWire

The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used.

It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string. In this binary case, it is also called the population count, or popcount.

Contents

Examples

alphabet string hamming weight
0,1 11101 4
0,1 11101000 4
0,1 00000000 0
' ',a-z hello world 10

History and usage

The Hamming weight is named after Richard Hamming. It is used in several disciplines including information theory, coding theory, and cryptography.

Efficient implementation

The population count of a bitstring is often needed in cryptography and other applications. The problem of how to implement it efficiently has been widely studied. Some processors have a single command to calculate it, and some have parallel operations on bit vectors. For processors lacking those features, the best solutions known are based on adding counts in a tree pattern. For example, to count the number of 1 bits in the 16-bit binary number A=0110110010111010, these operations can be done:

Expression Binary Decimal Comment
A 0110110010111010 The original number
B = A & 01 01 01 01 01 01 01 01 01 00 01 00 00 01 00 00 1,0,1,0,0,1,0,0 every other bit from A
C = (A >> 1) & 01 01 01 01 01 01 01 01 00 01 01 00 01 01 01 01 0,1,1,0,1,1,1,1 the remaining bits from A
D = B + C 01 01 10 00 01 10 01 01 1,1,2,0,1,2,1,1 list giving # of 1s in each 2-bit piece of A
E = D & 0011 0011 0011 0011 0001 0000 0010 0001 1,0,2,1 every other count from D
F = (D >> 2) & 0011 0011 0011 0011 0001 0010 0001 0001 1,2,1,1 the remaining counts from D
G = E + F 0010 0010 0011 0010 2,2,3,2 list giving # of 1s in each 4-bit piece of A
H = G & 00001111 00001111 00000010 00000010 2,2 every other count from G
I = (G >> 4) & 00001111 00001111 00000010 00000011 2,3 the remaining counts from G
J = H + I 00000100 00000101 4,5 list giving # of 1s in each 8-bit piece of A
K = J & 0000000011111111 0000000000000101 5 every other count from J
L = (J >> 8) & 0000000011111111 0000000000000100 4 the remaining counts from J
M = K + L 0000000000001001 9 the final answer

Here, the operations are as in C, so X >> Y means to shift X right by Y bits, X & Y means the bitwise AND of X and Y, and + is ordinary addition. The best algorithms known for this problem are based on the concept illustrated above and are given here:

//types and constants used in the functions below

typedef unsigned __int64 uint64;  //assume this gives 64-bits
const uint64 m1  = 0x5555555555555555; //binary: 0101...
const uint64 m2  = 0x3333333333333333; //binary: 00110011..
const uint64 m4  = 0x0f0f0f0f0f0f0f0f; //binary:  4 zeros,  4 ones ...
const uint64 m8  = 0x00ff00ff00ff00ff; //binary:  8 zeros,  8 ones ...
const uint64 m16 = 0x0000ffff0000ffff; //binary: 16 zeros, 16 ones ...
const uint64 m32 = 0x00000000ffffffff; //binary: 32 zeros, 32 ones
const uint64 hff = 0xffffffffffffffff; //binary: all ones
const uint64 h01 = 0x0101010101010101; //the sum of 256 to the power of 0,1,2,3...

//This is a naive implementation, shown for comparison,
//and to help in understanding the better functions.
//It uses 24 arithmetic operations (shift, add, and).
int popcount_1(uint64 x) {
    x = (x & m1 ) + ((x >>  1) & m1 ); //put count of each  2 bits into those  2 bits 
    x = (x & m2 ) + ((x >>  2) & m2 ); //put count of each  4 bits into those  4 bits 
    x = (x & m4 ) + ((x >>  4) & m4 ); //put count of each  8 bits into those  8 bits 
    x = (x & m8 ) + ((x >>  8) & m8 ); //put count of each 16 bits into those 16 bits 
    x = (x & m16) + ((x >> 16) & m16); //put count of each 32 bits into those 32 bits 
    x = (x & m32) + ((x >> 32) & m32); //put count of each 64 bits into those 64 bits 
    return x;
}

//This uses fewer arithmetic operations than any other known  
//implementation on machines with slow multiplication.
//It uses 17 arithmetic operations.
int popcount_2(uint64 x) {
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits 
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits 
    x += x >>  8;  //put count of each 16 bits into their lowest 8 bits
    x += x >> 16;  //put count of each 32 bits into their lowest 8 bits
    x += x >> 32;  //put count of each 64 bits into their lowest 8 bits
    return x & 0x7f;
}

//This uses fewer arithmetic operations than any other known  
//implementation on machines with fast multiplication.
//It uses 12 arithmetic operations, one of which is a multiply.
int popcount_3(uint64 x) {
    x -= (x >> 1) & m1;             //put count of each 2 bits into those 2 bits
    x = (x & m2) + ((x >> 2) & m2); //put count of each 4 bits into those 4 bits 
    x = (x + (x >> 4)) & m4;        //put count of each 8 bits into those 8 bits 
    return (x * h01)>>56;  //returns left 8 bits of x + (x<<8) + (x<<16) + (x<<24) + ... 
}

The above implementations have the best worst-case behavior of any known algorithm. However, if a number is known to have most of its bits 0 (or most of its bits 1), then there are faster algorithms. They are based on the fact that the bitwise AND of a number X and the number X-1 will be the same as X with the rightmost 1 bit set to 0. For example:

Expression Value
X 0 1 0 0 0 1 0 0 0 1 0 0 0 0
X - 1 0 1 0 0 0 1 0 0 0 0 1 1 1 1
X & (X - 1) 0 1 0 0 0 1 0 0 0 0 0 0 0 0

Subtracting 1 changes the rightmost string of 0s to 1s, and changes the rightmost 1 to a 0. The AND then removes that rightmost 1. If X originally had N bits that were 1, then after only N iterations of this operation, X will be reduced to zero. The following are based on this principle.

//This is better when most bits in x are 0
//It uses 3 arithmetic operations and one comparison/branch per "1" bit in x.
int popcount_4(uint64 x) {
    uint64 count;
    for (count=0; x; count++)
        x &= x-1;
    return count;
}

//This is better if most bits in x are 0.
//It uses 2 arithmetic operations and one comparison/branch  per "1" bit in x.
//It is the same as the previous function, but with the loop unrolled.
#define f(y) if ((x &= x-1) == 0) return y;
int popcount_5(uint64 x) {
    if (x == 0) return 0;
    f( 1) f( 2) f( 3) f( 4) f( 5) f( 6) f( 7) f( 8)
    f( 9) f(10) f(11) f(12) f(13) f(14) f(15) f(16)
    f(17) f(18) f(19) f(20) f(21) f(22) f(23) f(24)
    f(25) f(26) f(27) f(28) f(29) f(30) f(31) f(32)
    f(33) f(34) f(35) f(36) f(37) f(38) f(39) f(40)
    f(41) f(42) f(43) f(44) f(45) f(46) f(47) f(48)
    f(49) f(50) f(51) f(52) f(53) f(54) f(55) f(56)
    f(57) f(58) f(59) f(60) f(61) f(62) f(63)
    return 64;
}

//Use this instead if most bits in x are 1 instead of 0
#define f(y) if ((x |= x+1) == hff) return 64-y;

Language Support

In C++ STL, the bit-array data structure bitset has a count() method that counts the number of bits that are set.

In Java, the growable bit-array data structure BitSet has a BitSet.cardinality() method that counts the number of bits that are set. In addition, there are Integer.bitCount(int) and Long.bitCount(long) functions to count bits in primitive 32-bit and 64-bit integers, respectively. Also, the BigInteger arbitrary-precision integer class also has a BigInteger.bitCount() method that counts bits.

See also

External links

This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.


Giant Panda

Mercedes Car
James Bond Guide
This site monitored by SitePinger.net