Binary Indexed Tree

The structure was first used for data compression. Now it is often used for:

  • storing frequencies
  • manipulatin cumulative frequency tables

Basic problem

we have n boxes, possible queries are

  • add marble to box i
  • sum marbles from box k to box l

Naive array would have O(1) and O(n).

Using Binary indexed trees the query 2 can be O(log n)

Notation

  • BIT
  • MaxVal
  • f[i] - freq for index i
  • c[i] - cumulative freq for index i (f[1] + f[2] + ... + f[i])
  • tree[i] - sum of frequencies stored in BIT, aka tree frequency

Basic idea

Each integer can be represented as sum of powers of two.(二进制)

idx is some index, Let r be the position of the last digit 1 in binary notation.

tree[idx] = f[idx - 2^r + 1] + ..... + f[idx]

suppose we are looking for cumulative frequency of index 13(1101).

c[1101] = tree[1101] + tree[1100] + tree[1000]

isolate last digit

x & ~(x-1)

Operations

Read cumulative frequency

def read(idx):
    sum = 0
    while idx:
        sum += tree[idx]
        idx -= (idx & ~(idx-1))
    return sum

time: O(log MaxVal)

change single frequency at some position and update tree

update tree frequency at all indexes that are responsible for frequency whose value are changing.

In changing, we should increment value at the current index, add the last digit to index and go on while the index <= maxVal

def update(idx, val):
    while idx <= MaxVal:
        tree[idx] += val
        idx += (idx & ~(idx-1))

time: O(log MaxVal)

Read the actual frequency at a position

  1. One approach is to have one additional array.

  2. f[idx] = read[idx] - read[idx-1] -> 2 * O(log n)

def readSingle(idx):
    sum = tree[idx]
    while idx: # special case 0
        stop_idx = idx - (idx & ~(idx-1))
        idx -= 1 # get idx - 1
        while idx != stop_idx:
            sum -= tree[idx]
            idx -= (idx & ~(idx-1))

scaling the entire tree by a constant factor

def scale(c):
    for i in range(1, MaxVal + 1):
        tree[i] = tree[i] / c

find index with given cumulative frequency

Similar as binary search

def find(target):
    idx = 0
    bitMask = 1 << 31
    while bitMask and idx < MaxVal:
        mid = idx + bitMask # mid point
        if target == tree[mid]:
            return mid
        elif target > tree[mid]: # mid should be included in the cumulative
            idx = mid # higher half
            target -= tree[mid]
        bitMask = bitMask >> 1 # half the range
    if target != 0:
        return -1
    else:
        return idx

2D BIT

Suppose we have a plane with dots, you can make 3 queries:

  1. set dot at (x, y)
  2. remove dot from (x, y)
  3. count number of dots in rectangle(0,0), (x, y)

In this case, each element of the tree will contain array - tree[max_x][max_y]

Updating indexes of x-coordinates is the same as before. For example, suppose we are setting/removing dot (a, b), we will call update(a, b, 1)/update(a, b, -1).

def update(x, y, val):
    while x <= max_x:
        updatey(x, y, val)
        x += (x & ~(x-1))

def updatey(x, y, val):
    while y <= max_y:
        tree[x][y] += val
        y += (y & ~(y-1))

can be written as one function

def update(x, y, val):
    while x <= max_x:
        y1 = y
        while y1 <= max_y:
            tree[x][y] += val
            y1 += (y1 & ~(y1-1))
        x += (x & ~(x-1))

read function is changed similarly

Conclusion

  • easy to code
  • each query takes constant or logarithmic time
  • require linear memory space
  • can use it as an n-dimensional data structure