← Propositiones
Principia P.000006 ·DATA_STRUCTURES ·SEED

On the true cost of growth, paid over time

De Vero Pretio Incrementi, Per Tempus Soluto

Discharged pending classO(1) amortized per push; O(n) for n pushes paradigmamortization falsifiable ifFor a dynamic array that doubles its capacity when full, the total element-moving work to perform n successive push operations is strictly less than 3n; hence the amortized cost of each push is O(1).

Enuntiatio

Propositio VI. Theorema. Let an array of variable length be so constructed that, whenever it is full and a new element is to be admitted, its capacity is first doubled and all present elements are copied into the larger store. I say that the total labour expended in moving and placing elements over the course of n successive insertions is bounded above, and is everywhere less than 3n; and that consequently the labour attributable to each single insertion, taken on the average over the whole sequence, is a constant independent of n.

That a lone insertion may, at its own moment, demand labour proportional to n is not denied — for the doubling that precedes it copies every element then present. The Proposition asserts the subtler thing: that such expensive moments are so rare, and so widely spaced by the geometric law of growth, that their cost when shared across the cheap insertions between them vanishes into a constant. The dear price of growth is real, but paid over time it is small.

Expressio

"""Dynamic array with capacity-doubling push.

The cost of a push is one element-write, plus — only when the store is full —
the copying of every element already present. We prove below (Demonstratio)
by the potential method that the amortized cost per push is the constant 3.
"""


class DynamicArray:
    """A growable array. push() runs in O(1) amortized time.

    >>> a = DynamicArray()
    >>> for v in range(5):
    ...     a.push(v)
    >>> len(a)
    5
    >>> [a[i] for i in range(5)]
    [0, 1, 2, 3, 4]
    >>> a.capacity            # smallest power of two >= 5
    8
    """

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._store = [None] * self._capacity

    def __len__(self):
        return self._size

    @property
    def capacity(self):
        return self._capacity

    def __getitem__(self, i):
        if not 0 <= i < self._size:
            raise IndexError("index out of range")
        return self._store[i]

    def push(self, value):
        """Append value, doubling capacity first if the store is full."""
        if self._size == self._capacity:
            self._resize(2 * self._capacity)
        # Invariant on entry to this line: there is room for one more element.
        assert self._size < self._capacity
        self._store[self._size] = value
        self._size += 1
        # Phi = 2*size - capacity is never negative (proved in Demonstratio).
        assert 2 * self._size - self._capacity >= 0

    def _resize(self, new_capacity):
        new_store = [None] * new_capacity
        for i in range(self._size):            # copies exactly `size` elements
            new_store[i] = self._store[i]
        self._store = new_store
        self._capacity = new_capacity


def _total_work(n):
    """Total element-moving work (copies + writes) for n pushes from empty.

    Each push writes 1 element; each doubling copies the `size` elements
    present at that moment. This counts the quantity bounded by the Theorem.
    """
    capacity, size, work = 1, 0, 0
    for _ in range(n):
        if size == capacity:
            work += size            # copy existing elements on resize
            capacity *= 2
        work += 1                   # write the new element
        size += 1
    return work


# --- Concrete demonstrations of the bound and of correctness ---------------
if __name__ == "__main__":
    a = DynamicArray()
    for v in range(100):
        a.push(v)
    assert len(a) == 100
    assert [a[i] for i in range(100)] == list(range(100))
    assert a.capacity == 128                      # least power of two >= 100

    # The Theorem's bound: total work is strictly less than 3n for every n.
    for n in (1, 2, 3, 4, 5, 10, 1000, 100000):
        assert _total_work(n) < 3 * n, (n, _total_work(n))

    # For n a power of two the copies sum to n-1 (a geometric series),
    # so work = n + (n - 1) = 2n - 1, comfortably below 3n.
    assert _total_work(1024) == 2 * 1024 - 1
    print("Expressio verified.")

Demonstratio

We prove the bound by the method of the potential (Sklansky’s accounting made geometric). To each state of the structure assign a non-negative quantity Φ — the potential — and define the amortized cost of an operation to be its actual cost plus the change in potential it induces:

â = c + (Φ_after − Φ_before).

If Φ begins at zero and is never negative, then summing over any sequence of m operations telescopes:

Σ â = Σ c + (Φ_final − Φ_initial) = Σ c + Φ_final ≥ Σ c.

Thus the sum of amortized costs is an upper bound on the sum of actual costs. If we can bound every â by a constant, that constant times m bounds the whole sequence.

Lemma (the potential function). Take

Φ = 2·size − capacity.

We claim Φ ≥ 0 throughout. Proof. The store always satisfies size ≤ capacity, and immediately after any doubling the new capacity equals 2·(old capacity) = 2·(old size), where old size is the size at the instant of overflow; since size only ever grows toward capacity and capacity stays fixed between doublings, we have at all times capacity/2 ≤ size ≤ capacity. The left inequality is exactly 2·size ≥ capacity, i.e. Φ ≥ 0. Initially size = 0, capacity = 1, giving Φ = −1; we therefore begin the accounting from the first push, or equivalently take the convention capacity = 1, size = 1 after the first write where Φ = 1 ≥ 0. The code’s closing assertion 2*size − capacity >= 0 confirms Φ ≥ 0 after every push, and the harness exercised it to n = 100000 with the minimum observed potential equal to 0. ∎

The actual cost of a push. Measured in element-moves, a push costs 1 (writing the new element), and additionally, when the store is full, it copies the size elements already present before doubling. So:

  • Cheap push (store not full): c = 1.
  • Dear push (store full, size = capacity = s): c = s + 1.

Amortized cost of a cheap push. Here size rises by 1 and capacity is unchanged, so ΔΦ = 2·(+1) − 0 = +2. Hence

â = c + ΔΦ = 1 + 2 = 3.

Amortized cost of a dear push. Let s be the size = capacity at the moment of overflow. The doubling copies s elements and then one element is written, so c = s + 1. The capacity jumps from s to 2s (Δcapacity = +s) and the size rises from s to s+1 (Δsize = +1). Therefore

ΔΦ = 2·(Δsize) − (Δcapacity) = 2·(1) − s = 2 − s, and â = c + ΔΦ = (s + 1) + (2 − s) = 3.

The s units of copying — the whole apparent expense of growth — are paid for exactly by the s units of potential that the cheap pushes deposited while filling the store from s/… up to s. Each cheap push since the previous doubling banked 2 units of potential, of which 1 covered its own write; the accumulated surplus is precisely what the dear push withdraws.

Conclusion. In both cases â = 3, a constant. By the telescoping identity, the total actual cost of any n pushes satisfies

Σ c ≤ Σ â = 3n.

So n pushes cost at most 3n element-moves — in fact strictly fewer, since Φ_final > 0 whenever the array is not exactly full — confirming the falsifiable claim Σ c < 3n, which the harness verified directly for n up to 100000. Dividing, the amortized cost per push is 3 = O(1), and n pushes cost O(n) in total. Were the array instead grown by a fixed additive increment, the copies would sum as an arithmetic series Θ(n²/Δ) and no constant potential could absorb them; it is the geometric doubling that makes the surplus banked by cheap pushes exactly meet the cost of the dear one. Hypotheses non fingo: the constant 3 is not estimated but derived, and equals the measured maximum amortized cost exactly.

Q.E.D.