Most implementations of ArrayList use an array internally, and when its capacity is exhausted upon adding an element to the list, it resizes or “grows” by essentially doing the following (a rough code sketch follows the list):
- creating a new array with a batch of freshly allocated memory.
- copying all the elements of the internal array into the new array.
- setting the internal array to the new array.
- setting index N - 1 of the internal array to the element object, where N is the new size of the list.
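Something like the following simplified sketch (this is not the actual java.util.ArrayList source; the class name, initial capacity, and growth policy are just placeholders):

```java
import java.util.Arrays;

// Simplified dynamic list following the steps described above.
class SimpleList<E> {
    private Object[] elements = new Object[10]; // internal array (placeholder initial capacity)
    private int size = 0;

    public void add(E element) {
        if (size == elements.length) {
            // 1. allocate a new, larger array
            // 2. copy all existing elements into it
            // 3. make it the new internal array
            elements = Arrays.copyOf(elements, newCapacity());
        }
        // 4. write the element at index N - 1, where N is the new size
        elements[size] = element;
        size++;
    }

    // The growth policy is the crux of this question. OpenJDK's ArrayList grows
    // geometrically, by roughly a factor of 1.5; a fixed increment would also "work".
    private int newCapacity() {
        return elements.length + (elements.length >> 1);
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException();
        return (E) elements[index];
    }

    public int size() {
        return size;
    }
}
```

The only interesting design decision here is newCapacity(): whether it adds a fixed amount or multiplies by a factor is exactly what the rest of this question is about.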
The explanation provided is that growing the list is a rare necessity for your average add operation, so the time complexity of the average add is O(1), hence amortized constant time.
I am confused as to how this makes sense. Say the list grows by a fixed increment Q. Simple arithmetic series will show you that if I were to add x elements to an ArrayList, the total number of element copies done internally is (x^2 + Qx) / 2Q, assuming x is several times larger than Q.
Sure, for the first few values added the time may very well be constant, but for a large enough number of added elements the average time complexity of each add operation becomes linear, or O(n). Hence, adding a large number of elements to the list takes quadratic time. I don’t understand how even the amortized time complexity of a single add operation could be constant. Is there something I’m missing?
EDIT: I did not realize that the list growth is in fact geometric, and that this is what makes the amortized time complexity constant.
Conclusion:
Linear Growth of Dynamic List
Let N = kQ. For N + 1 insertions:
Copies:
Q + 2Q + 3Q + ... + kQ = (k/2)(2Q + (k - 1)Q) = (k/2)(Q + kQ) = (kQ + k^2 Q)/2 -> kQ + k^2 Q
Element Initializations:
Q + 2Q + 3Q + ... + (k + 1)Q = ((k + 1)/2)(2Q + kQ) = (k^2 Q + 3kQ + 2Q)/2 -> k^2 Q + 3kQ + 2Q
Cheap Insertions:
kQ + 1 -> kQ
Total Cost: 2k^2 Q + 5kQ + 2Q
Amortized Cost of Each Insertion:
(2k^2 Q + 5kQ + 2Q)/(kQ) = 2k + 5 + 2/k -> O(k) = O(N/Q) -> O(N) for constant Q
Geometric Growth of Dynamic List
Let N = Q^k. For N + 1 insertions:
Copies:
1 + Q + Q^2 + ... + Q^k = (Q^(k + 1) - 1)/(Q - 1) -> Q^k
Element Initializations:
1 + Q + Q^2 + ... + Q^(k + 1) = (Q^(k + 2) - 1)/(Q - 1) -> Q^(k + 1)
Cheap Insertions:
Q^k + 1 -> Q^k
Total Cost: 2Q^k + Q^(k + 1)
Amortized Cost of Each Insertion:
(2Q^k + Q^(k + 1))/Q^k = 2 + Q -> O(1)
Comparison
Geometric resizing of the array gives constant amortized time per insertion, while linear resizing gives amortized time that is linear in the list size. It is interesting to compare the two methods of growth to see the performance difference and why ArrayLists are chosen to grow geometrically.
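A quick simulation makes the difference concrete by counting element copies under both policies; the class name, the starting capacity of 1, and the choice Q = 10 below are arbitrary:

```java
import java.util.function.IntUnaryOperator;

public class GrowthComparison {

    // Counts the element copies performed while inserting n elements,
    // given a rule that maps the old capacity to the new one.
    static long totalCopies(int n, IntUnaryOperator growRule) {
        long copies = 0;
        int capacity = 1;
        int size = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size;                          // copy every existing element over
                capacity = growRule.applyAsInt(capacity);
            }
            size++;                                      // the cheap write itself
        }
        return copies;
    }

    public static void main(String[] args) {
        final int q = 10; // growth increment (linear) or growth factor (geometric)
        for (int n = 1_000; n <= 1_000_000; n *= 10) {
            long linear = totalCopies(n, cap -> cap + q);    // grow by a constant Q
            long geometric = totalCopies(n, cap -> cap * q); // grow by a factor of Q
            System.out.printf("n=%,d  linear: %.1f copies/insert  geometric: %.3f copies/insert%n",
                    n, (double) linear / n, (double) geometric / n);
        }
    }
}
```

With linear growth the number of copies per insertion keeps climbing with n, while with geometric growth it settles to a small constant.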
Answer
Without loss of generality, suppose the initial capacity of the list is 1, and that the capacity is doubled each time it is exceeded by an insertion. Now consider inserting 2^k + 1 elements (which is the general worst case, since the last operation triggers the dynamic growing).
There are k + 1 insertions that trigger dynamic growing, and their accumulated cost is
1 + 2 + 4 + 8 + ... + 2^k = 2^(k+1) - 1
The accumulated cost of the other “cheap” insertions, at one unit each, is 2^k - k.
But we are interested in the amortized complexity, hence we have to average over all 2^k + 1 operations:
(2^(k+1) + 2^k - k - 1) / (2^k + 1) < (2^(k+1) + 2^k - k - 1) / 2^k = 2 + 1 - (k + 1)/2^k = O(1)
Therefore, inserting 2^k + 1 elements into the list has an amortized time complexity of O(1) per insertion, and the constant factor approaches 3. Inserting any other number of elements into the list cannot be worse, so the amortized time complexity per insertion is O(1) in general.
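As a sanity check on the arithmetic, a short simulation along the following lines (the class name and the range of k are arbitrary) reproduces the total cost 2^(k+1) + 2^k - k - 1 and shows the amortized cost approaching 3:

```java
public class DoublingCheck {
    public static void main(String[] args) {
        for (int k = 1; k <= 24; k++) {
            long n = (1L << k) + 1;    // 2^k + 1 insertions
            long capacity = 1, size = 0, cost = 0;
            for (long i = 0; i < n; i++) {
                if (size == capacity) {
                    cost += size;      // expensive insertion: count only the copies, as in the sum above
                    capacity *= 2;     // double the capacity
                } else {
                    cost += 1;         // cheap insertion: one unit
                }
                size++;
            }
            System.out.printf("k=%2d  total cost=%,d  amortized=%.4f%n",
                    k, cost, (double) cost / n);
        }
    }
}
```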