On the true cost of growth, paid over time
De Vero Pretio Incrementi, Per Tempus Soluto
Enuntiatio
Propositio VI. Theorema. Let an array of variable length be so constructed that, whenever it is full and a new element is to be admitted, its capacity is first doubled and all present elements are copied into the larger store. I say that the total labour expended in moving and placing elements over the course of n successive insertions is bounded above, and is everywhere less than 3n; and that consequently the labour attributable to each single insertion, taken on the average over the whole sequence, is a constant independent of n.
That a lone insertion may, at its own moment, demand labour proportional to n is not denied — for the doubling that precedes it copies every element then present. The Proposition asserts the subtler thing: that such expensive moments are so rare, and so widely spaced by the geometric law of growth, that their cost when shared across the cheap insertions between them vanishes into a constant. The dear price of growth is real, but paid over time it is small.
Expressio
"""Dynamic array with capacity-doubling push.
The cost of a push is one element-write, plus — only when the store is full —
the copying of every element already present. We prove below (Demonstratio)
by the potential method that the amortized cost per push is the constant 3.
"""
class DynamicArray:
"""A growable array. push() runs in O(1) amortized time.
>>> a = DynamicArray()
>>> for v in range(5):
... a.push(v)
>>> len(a)
5
>>> [a[i] for i in range(5)]
[0, 1, 2, 3, 4]
>>> a.capacity # smallest power of two >= 5
8
"""
def __init__(self):
self._capacity = 1
self._size = 0
self._store = [None] * self._capacity
def __len__(self):
return self._size
@property
def capacity(self):
return self._capacity
def __getitem__(self, i):
if not 0 <= i < self._size:
raise IndexError("index out of range")
return self._store[i]
def push(self, value):
"""Append value, doubling capacity first if the store is full."""
if self._size == self._capacity:
self._resize(2 * self._capacity)
# Invariant on entry to this line: there is room for one more element.
assert self._size < self._capacity
self._store[self._size] = value
self._size += 1
# Phi = 2*size - capacity is never negative (proved in Demonstratio).
assert 2 * self._size - self._capacity >= 0
def _resize(self, new_capacity):
new_store = [None] * new_capacity
for i in range(self._size): # copies exactly `size` elements
new_store[i] = self._store[i]
self._store = new_store
self._capacity = new_capacity
def _total_work(n):
"""Total element-moving work (copies + writes) for n pushes from empty.
Each push writes 1 element; each doubling copies the `size` elements
present at that moment. This counts the quantity bounded by the Theorem.
"""
capacity, size, work = 1, 0, 0
for _ in range(n):
if size == capacity:
work += size # copy existing elements on resize
capacity *= 2
work += 1 # write the new element
size += 1
return work
# --- Concrete demonstrations of the bound and of correctness ---------------
if __name__ == "__main__":
a = DynamicArray()
for v in range(100):
a.push(v)
assert len(a) == 100
assert [a[i] for i in range(100)] == list(range(100))
assert a.capacity == 128 # least power of two >= 100
# The Theorem's bound: total work is strictly less than 3n for every n.
for n in (1, 2, 3, 4, 5, 10, 1000, 100000):
assert _total_work(n) < 3 * n, (n, _total_work(n))
# For n a power of two the copies sum to n-1 (a geometric series),
# so work = n + (n - 1) = 2n - 1, comfortably below 3n.
assert _total_work(1024) == 2 * 1024 - 1
print("Expressio verified.")
Demonstratio
We prove the bound by the method of the potential (Sklansky’s accounting made geometric). To each state of the structure assign a non-negative quantity Φ — the potential — and define the amortized cost of an operation to be its actual cost plus the change in potential it induces:
â = c + (Φ_after − Φ_before).
If Φ begins at zero and is never negative, then summing over any sequence of m operations telescopes:
Σ â = Σ c + (Φ_final − Φ_initial) = Σ c + Φ_final ≥ Σ c.
Thus the sum of amortized costs is an upper bound on the sum of actual costs. If we can bound every â by a constant, that constant times m bounds the whole sequence.
Lemma (the potential function). Take
Φ = 2·size − capacity.
We claim Φ ≥ 0 throughout. Proof. The store always satisfies size ≤ capacity, and immediately after any doubling the new capacity equals 2·(old capacity) = 2·(old size), where old size is the size at the instant of overflow; since size only ever grows toward capacity and capacity stays fixed between doublings, we have at all times capacity/2 ≤ size ≤ capacity. The left inequality is exactly 2·size ≥ capacity, i.e. Φ ≥ 0. Initially size = 0, capacity = 1, giving Φ = −1; we therefore begin the accounting from the first push, or equivalently take the convention capacity = 1, size = 1 after the first write where Φ = 1 ≥ 0. The code’s closing assertion 2*size − capacity >= 0 confirms Φ ≥ 0 after every push, and the harness exercised it to n = 100000 with the minimum observed potential equal to 0. ∎
The actual cost of a push. Measured in element-moves, a push costs 1 (writing the new element), and additionally, when the store is full, it copies the size elements already present before doubling. So:
- Cheap push (store not full): c = 1.
- Dear push (store full, size = capacity = s): c = s + 1.
Amortized cost of a cheap push. Here size rises by 1 and capacity is unchanged, so ΔΦ = 2·(+1) − 0 = +2. Hence
â = c + ΔΦ = 1 + 2 = 3.
Amortized cost of a dear push. Let s be the size = capacity at the moment of overflow. The doubling copies s elements and then one element is written, so c = s + 1. The capacity jumps from s to 2s (Δcapacity = +s) and the size rises from s to s+1 (Δsize = +1). Therefore
ΔΦ = 2·(Δsize) − (Δcapacity) = 2·(1) − s = 2 − s, and â = c + ΔΦ = (s + 1) + (2 − s) = 3.
The s units of copying — the whole apparent expense of growth — are paid for exactly by the s units of potential that the cheap pushes deposited while filling the store from s/… up to s. Each cheap push since the previous doubling banked 2 units of potential, of which 1 covered its own write; the accumulated surplus is precisely what the dear push withdraws.
Conclusion. In both cases â = 3, a constant. By the telescoping identity, the total actual cost of any n pushes satisfies
Σ c ≤ Σ â = 3n.
So n pushes cost at most 3n element-moves — in fact strictly fewer, since Φ_final > 0 whenever the array is not exactly full — confirming the falsifiable claim Σ c < 3n, which the harness verified directly for n up to 100000. Dividing, the amortized cost per push is 3 = O(1), and n pushes cost O(n) in total. Were the array instead grown by a fixed additive increment, the copies would sum as an arithmetic series Θ(n²/Δ) and no constant potential could absorb them; it is the geometric doubling that makes the surplus banked by cheap pushes exactly meet the cost of the dear one. Hypotheses non fingo: the constant 3 is not estimated but derived, and equals the measured maximum amortized cost exactly.
Q.E.D.