码农pilot's Personal Blog

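Reading the Java Source - ArrayList

In engineering, it is not enough to know that something works without knowing why. Once you understand how a tool works, you can use it far more effectively. In the world of programs, source code holds no secrets: understand the source and you understand the principle.

This time, let's read through the source code of ArrayList. The excerpts below match the JDK 8 implementation; later releases differ in some details.

Class declaration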

public class ArrayList<E>
        extends AbstractList<E>
        implements List<E>, RandomAccess, Cloneable, java.io.Serializable { ... }

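The code above declares a generic class named ArrayList that extends AbstractList and implements the List, RandomAccess, Cloneable, and Serializable interfaces.

The abstract class AbstractList provides a skeletal implementation of the List interface, which reduces the work needed to implement a List backed by a random-access data store.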

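RandomAccess declares no methods. It is a marker interface indicating that the class supports fast (usually O(1)) random access. Before traversing a collection, you can use instanceof to check whether it implements RandomAccess and pick a suitable traversal strategy accordingly.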
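As a rough illustration of that instanceof check, here is a minimal sketch. The class RandomAccessDemo and its sum() helper are made up for this example and are not part of the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

class RandomAccessDemo {
    // Hypothetical helper: sums a list, choosing the loop style by RandomAccess.
    static long sum(List<Integer> list) {
        long total = 0;
        if (list instanceof RandomAccess) {
            // Indexed access is cheap here, so a plain indexed loop is fine.
            for (int i = 0; i < list.size(); i++) {
                total += list.get(i);
            }
        } else {
            // get(i) may be O(n) (e.g. LinkedList), so walk the iterator instead.
            for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
                total += it.next();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> arrayList = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> linkedList = new LinkedList<>(Arrays.asList(1, 2, 3));
        System.out.println(arrayList instanceof RandomAccess);  // true
        System.out.println(linkedList instanceof RandomAccess); // false
        System.out.println(sum(arrayList) + " " + sum(linkedList)); // 6 6
    }
}
```

Both branches return the same result; only the cost differs, which is the whole point of the marker.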

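Cloneable is also a marker interface. It indicates that Object.clone() is allowed to make a field-for-field copy of instances of this class.

Serializable is a marker interface as well. It enables Java serialization for this class.

How the data is stored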

/**
 * The array buffer into which the elements of the ArrayList are stored.
 * The capacity of the ArrayList is the length of this array buffer. Any
 * empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
 * will be expanded to DEFAULT_CAPACITY when the first element is added.
 */
transient Object[] elementData; // non-private to simplify nested class access

/**
 * The size of the ArrayList (the number of elements it contains).
 *
 * @serial
 */
private int size;

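The elementData array is where the elements are actually stored. The capacity of the ArrayList is the length of this array (elementData.length), which is not the same thing as size. ArrayList implements its own serialization (ArrayList#writeObject()) and deserialization (ArrayList#readObject()) methods, so elementData is marked transient to exclude it from Java's default serialization and deserialization.

The size field records the number of elements currently in the ArrayList.

Constructors

ArrayList has three constructors:

  • ArrayList(), which uses the default capacity
  • ArrayList(int initialCapacity), which takes an initial capacity
  • ArrayList(Collection<? extends E> c), which initializes the list from a given collection

Using the default capacity

The class first defines the default capacity: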

/**
 * Default initial capacity.
 */
private static final int DEFAULT_CAPACITY = 10;

Right below it, however, there is also this:

/**
 * Shared empty array instance used for default sized empty instances. We
 * distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
 * first element is added.
 */
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

When first constructed, elementData points to DEFAULTCAPACITY_EMPTY_ELEMENTDATA rather than to a newly created array of capacity 10.

/**
 * Constructs an empty list with an initial capacity of ten.
 */
public ArrayList() {
    this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

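The benefit is better use of memory. Suppose some scenario creates five ArrayLists up front for later use. If the backing arrays were allocated immediately, that would cost the space of at least 50 element slots, even if some of the lists never receive an element. So Java first points elementData at a shared empty array and only creates an array of suitable size when data is added to the ArrayList.

Specifying an initial capacity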

/**
 * Constructs an empty list with the specified initial capacity.
 *
 * @param  initialCapacity  the initial capacity of the list
 * @throws IllegalArgumentException if the specified initial capacity
 *         is negative
 */
public ArrayList(int initialCapacity) {
    if (initialCapacity > 0) {
        this.elementData = new Object[initialCapacity];
    } else if (initialCapacity == 0) {
        this.elementData = EMPTY_ELEMENTDATA;
    } else {
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    }
}

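When the given capacity is a positive integer, Java creates an array of that size and points elementData at it. If the capacity is zero, Java points elementData at the shared empty array EMPTY_ELEMENTDATA; note that this is not the same empty array as the one mentioned above. If the capacity is negative, an IllegalArgumentException is thrown.

So why keep EMPTY_ELEMENTDATA and DEFAULTCAPACITY_EMPTY_ELEMENTDATA separate? The JavaDoc of DEFAULTCAPACITY_EMPTY_ELEMENTDATA says:

We distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when first element is added.
In other words, the first add can tell a default-constructed list, which jumps straight to DEFAULT_CAPACITY, from a list explicitly created with capacity 0, which grows only as far as needed.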
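To make the effect of the two sentinels concrete, here is a deliberately simplified model. SentinelSketch is a made-up class that mimics only the relevant fields and the first-add logic; it is not the JDK code and leaves out modCount, overflow handling, and everything else.

```java
import java.util.Arrays;

// A simplified model of the two sentinels; NOT the real java.util.ArrayList.
class SentinelSketch {
    static final int DEFAULT_CAPACITY = 10;
    static final Object[] EMPTY_ELEMENTDATA = {};
    static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};

    Object[] elementData;
    int size;

    SentinelSketch() {
        elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
    }

    SentinelSketch(int initialCapacity) {
        if (initialCapacity < 0) {
            throw new IllegalArgumentException("Illegal Capacity: " + initialCapacity);
        }
        elementData = (initialCapacity == 0) ? EMPTY_ELEMENTDATA : new Object[initialCapacity];
    }

    void add(Object e) {
        int minCapacity = size + 1;
        // Identity comparison: only the default-constructed sentinel inflates to 10.
        if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
            minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
        }
        if (minCapacity > elementData.length) {
            int newCapacity = elementData.length + (elementData.length >> 1);
            if (newCapacity < minCapacity) {
                newCapacity = minCapacity;
            }
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
        elementData[size++] = e;
    }

    int capacity() {
        return elementData.length;
    }

    public static void main(String[] args) {
        SentinelSketch byDefault = new SentinelSketch();
        SentinelSketch zero = new SentinelSketch(0);
        byDefault.add("x");
        zero.add("x");
        System.out.println(byDefault.capacity()); // 10
        System.out.println(zero.capacity());      // 1
    }
}
```

Both lists start with an empty array of length 0; which sentinel they point at is the only thing that records how they were constructed.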

Initializing from a given collection

/**
 * Constructs a list containing the elements of the specified
 * collection, in the order they are returned by the collection's
 * iterator.
 *
 * @param c the collection whose elements are to be placed into this list
 * @throws NullPointerException if the specified collection is null
 */
public ArrayList(Collection<? extends E> c) {
    elementData = c.toArray();
    if ((size = elementData.length) != 0) {
        // c.toArray might (incorrectly) not return Object[] (see 6260652)
        if (elementData.getClass() != Object[].class)
            elementData = Arrays.copyOf(elementData, size, Object[].class);
    } else {
        // replace with empty array.
        this.elementData = EMPTY_ELEMENTDATA;
    }
}

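The constructor first calls Collection#toArray() on the given collection to convert it into an Object[] array.

If the array contains elements, the constructor checks whether the runtime type of elementData is exactly Object[]; if not, it copies the elements into a real Object[] with Arrays.copyOf(). If the array is empty, elementData is pointed at EMPTY_ELEMENTDATA instead.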
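Why the runtime type matters can be shown with a small sketch. ToArrayTypeDemo and normalize() are hypothetical names for this example; the sketch mirrors the constructor's check but is not the JDK code. It relies on Java arrays being covariant: a String[] can be assigned to an Object[] variable, yet still rejects non-String elements at run time.

```java
import java.util.Arrays;

class ToArrayTypeDemo {
    // Mirrors the constructor's check: copy into a true Object[] when the
    // runtime type of the array is something narrower.
    static Object[] normalize(Object[] source) {
        if (source.getClass() != Object[].class) {
            return Arrays.copyOf(source, source.length, Object[].class);
        }
        return source;
    }

    public static void main(String[] args) {
        Object[] narrow = new String[] {"a", "b"}; // legal: arrays are covariant
        try {
            narrow[0] = Integer.valueOf(1);        // runtime type is still String[]
        } catch (ArrayStoreException ex) {
            System.out.println("ArrayStoreException");
        }

        Object[] safe = normalize(narrow);
        safe[0] = Integer.valueOf(1);              // fine: safe is a real Object[]
        System.out.println(safe.getClass() == Object[].class); // true
    }
}
```

Without the copy, a later add() of an unrelated element type into such a backing array could fail with ArrayStoreException.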

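Adding elements

When an element is added, ensureCapacityInternal() is called first to make sure there is enough room. Once capacity is guaranteed, the new element is stored at elementData[size] and size is incremented by one.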

/**
 * Appends the specified element to the end of this list.
 *
 * @param e element to be appended to this list
 * @return <tt>true</tt> (as specified by {@link Collection#add})
 */
public boolean add(E e) {
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}

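Growing the capacity

ensureCapacityInternal() makes sure there is enough room when an element is added. If there is not, it calls grow() to expand the array.

grow() expands elementData to roughly 1.5 times its current capacity (or to minCapacity, if that is larger) and uses Arrays.copyOf() to move the elements into the new array.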

/**
 * Ensure capacity
 */
private void ensureCapacityInternal(int minCapacity) {
    ensureExplicitCapacity(calculateCapacity(elementData, minCapacity));
}

/**
 * Compute the target capacity
 */
private static int calculateCapacity(Object[] elementData, int minCapacity) {
    if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
        return Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    return minCapacity;
}

private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // overflow-conscious code
    // Check whether the target capacity exceeds the current capacity
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

/**
 * Increases the capacity to ensure that it can hold at least the
 * number of elements specified by the minimum capacity argument.
 *
 * @param minCapacity the desired minimum capacity
 */
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;

    // newCapacity = oldCapacity + (oldCapacity / 2)
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    // minCapacity is usually close to size, so this is a win:
    elementData = Arrays.copyOf(elementData, newCapacity);
}
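The growth arithmetic is easier to see as a sequence of numbers. The sketch below replays it for a default-constructed list without touching a real ArrayList. GrowthTrace, nextCapacity() and trace() are names invented for this example, and the MAX_ARRAY_SIZE / hugeCapacity() branch and the overflow-conscious comparisons are intentionally left out.

```java
import java.util.ArrayList;
import java.util.List;

class GrowthTrace {
    // Same arithmetic as grow(): old + old/2, but never less than minCapacity.
    // Simplification: no MAX_ARRAY_SIZE handling and no overflow protection.
    static int nextCapacity(int oldCapacity, int minCapacity) {
        int newCapacity = oldCapacity + (oldCapacity >> 1);
        return Math.max(newCapacity, minCapacity);
    }

    // Capacities reached while appending `count` elements to a list
    // created with the no-argument constructor.
    static List<Integer> trace(int count) {
        List<Integer> capacities = new ArrayList<>();
        int capacity = 0;
        for (int size = 0; size < count; size++) {
            int minCapacity = size + 1;
            if (capacity == 0) {
                // Stands in for the DEFAULTCAPACITY_EMPTY_ELEMENTDATA check.
                minCapacity = Math.max(10, minCapacity);
            }
            if (minCapacity > capacity) {
                capacity = nextCapacity(capacity, minCapacity);
                capacities.add(capacity);
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        System.out.println(trace(100)); // [10, 15, 22, 33, 49, 73, 109]
    }
}
```

Appending 100 elements one at a time therefore triggers seven array copies. When the final size is known in advance, passing it to ArrayList(int initialCapacity) avoids them.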

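Removing elements

ArrayList offers two ways to remove an element: by position (index), and by matching an element.

Removing by position

When removing by position, the method first checks whether the given index is out of bounds. If it is not, the element being removed is read first so that it can be returned to the caller.

The removal itself shifts every element from index+1 onward one slot to the left, so that they start at index. As you can see, removal is fairly expensive: its cost is proportional to the number of elements after index.

After the shift, the last occupied slot still holds a reference that duplicates the one now sitting in the slot before it. That slot therefore has to be set to null, and size is decremented by one.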

/**
 * Removes the element at the specified position in this list.
 * Shifts any subsequent elements to the left (subtracts one from their
 * indices).
 *
 * @param index the index of the element to be removed
 * @return the element that was removed from the list
 * @throws IndexOutOfBoundsException {@inheritDoc}
 */
public E remove(int index) {
    rangeCheck(index);
    modCount++;
    E oldValue = elementData(index);

    // Number of elements that need to move
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(
                // source
                elementData,
                // source position
                index+1,
                // destination
                elementData,
                // destination position
                index,
                // number of elements to copy
                numMoved);
    elementData[--size] = null; // clear to let GC do its work
    return oldValue;
}
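The shift-then-clear step can be tried on a plain array. This is only a sketch of the same idea: ShiftDemo and removeAt() are hypothetical names, and the caller tracks size by hand because there is no list object here.

```java
import java.util.Arrays;

class ShiftDemo {
    // Removes the entry at `index` from the first `size` slots of `data`
    // and returns the new size. No bounds checking, like fastRemove().
    static int removeAt(Object[] data, int size, int index) {
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            // Source and destination overlap; System.arraycopy handles that.
            System.arraycopy(data, index + 1, data, index, numMoved);
        }
        data[--size] = null; // drop the stale duplicate in the last used slot
        return size;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d", null}; // capacity 5, size 4
        int size = removeAt(data, 4, 1);
        System.out.println(Arrays.toString(data)); // [a, c, d, null, null]
        System.out.println(size);                  // 3
    }
}
```

Without the final assignment, slot 3 would still reference "d", which would keep that object reachable for as long as the array lives.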

Removing by matching an element

If remove() is given an object, ArrayList walks through elementData and removes the first element that matches the given object.

/**
 * Removes the first occurrence of the specified element from this list,
 * if it is present.  If the list does not contain the element, it is
 * unchanged.  More formally, removes the element with the lowest index
 * <tt>i</tt> such that
 * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>
 * (if such an element exists).  Returns <tt>true</tt> if this list
 * contained the specified element (or equivalently, if this list
 * changed as a result of the call).
 *
 * @param o element to be removed from this list, if present
 * @return <tt>true</tt> if this list contained the specified element
 */
public boolean remove(Object o) {
    if (o == null) {
        for (int index = 0; index < size; index++)
            if (elementData[index] == null) {
                fastRemove(index);
                return true;
            }
    } else {
        for (int index = 0; index < size; index++)
            if (o.equals(elementData[index])) {
                fastRemove(index);
                return true;
            }
    }
    return false;
}

/*
 * Private remove method that skips bounds checking and does not
 * return the value removed.
 */
private void fastRemove(int index) {
    modCount++;
    int numMoved = size - index - 1;
    if (numMoved > 0)
        System.arraycopy(elementData, index+1, elementData, index,
                         numMoved);
    elementData[--size] = null; // clear to let GC do its work
}
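One practical consequence of having both remove(int) and remove(Object) shows up with a List<Integer>: an int argument selects the index overload, and only a boxed Integer selects the matching overload. A small sketch follows; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RemoveOverloadDemo {
    static List<Integer> removeByIndexThenByValue() {
        List<Integer> list = new ArrayList<>(Arrays.asList(10, 20, 30, 20));
        list.remove(1);                   // remove(int): drops index 1 -> [10, 30, 20]
        list.remove(Integer.valueOf(20)); // remove(Object): drops first 20 -> [10, 30]
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeByIndexThenByValue()); // [10, 30]
    }
}
```

Note that remove(Object) stops at the first match, so a list holding the same value several times needs repeated calls (or removeAll) to clear them all.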

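Trimming the capacity

ArrayList#trimToSize() shrinks the capacity of the ArrayList to the current number of elements. This goes through Arrays.copyOf(), which allocates a new array and copies every element, so it is also fairly expensive.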

/**
 * Trims the capacity of this <tt>ArrayList</tt> instance to be the
 * list's current size.  An application can use this operation to minimize
 * the storage of an <tt>ArrayList</tt> instance.
 */
public void trimToSize() {
    modCount++;
    if (size < elementData.length) {
        elementData = (size == 0)
          ? EMPTY_ELEMENTDATA
          : Arrays.copyOf(elementData, size);
    }
}

Fail fast

Methods that structurally modify the list often contain an operation like modCount++. What is it for?

Let's first see what the JavaDoc of the modCount field says.

/**
 * The number of times this list has been <i>structurally modified</i>.
 * Structural modifications are those that change the size of the
 * list, or otherwise perturb it in such a fashion that iterations in
 * progress may yield incorrect results.
 *
 * <p>This field is used by the iterator and list iterator implementation
 * returned by the {@code iterator} and {@code listIterator} methods.
 * If the value of this field changes unexpectedly, the iterator (or list
 * iterator) will throw a {@code ConcurrentModificationException} in
 * response to the {@code next}, {@code remove}, {@code previous},
 * {@code set} or {@code add} operations.  This provides
 * <i>fail-fast</i> behavior, rather than non-deterministic behavior in
 * the face of concurrent modification during iteration.
 *
 * <p><b>Use of this field by subclasses is optional.</b> If a subclass
 * wishes to provide fail-fast iterators (and list iterators), then it
 * merely has to increment this field in its {@code add(int, E)} and
 * {@code remove(int)} methods (and any other methods that it overrides
 * that result in structural modifications to the list).  A single call to
 * {@code add(int, E)} or {@code remove(int)} must add no more than
 * one to this field, or the iterators (and list iterators) will throw
 * bogus {@code ConcurrentModificationExceptions}.  If an implementation
 * does not wish to provide fail-fast iterators, this field may be
 * ignored.
 */
protected transient int modCount = 0;

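In other words, modCount records how many times the structure of a List has been modified, and the JavaDoc notes that modifying the structure of a List during iteration may produce incorrect results.

During iteration or serialization, the code checks whether modCount has changed; if it has, a ConcurrentModificationException is thrown. Take ArrayList.Itr#next() as an example: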

@SuppressWarnings("unchecked")
public E next() {
    checkForComodification();
    int i = cursor;
    if (i >= size)
        throw new NoSuchElementException();
    Object[] elementData = ArrayList.this.elementData;
    if (i >= elementData.length)
        throw new ConcurrentModificationException();
    cursor = i + 1;
    return (E) elementData[lastRet = i];
}

final void checkForComodification() {
    if (modCount != expectedModCount)
        throw new ConcurrentModificationException();
}
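The check can be observed from the outside with a short experiment. FailFastDemo and its two methods are illustrative names only. The first method removes an element through the list while a for-each loop (which uses an iterator under the hood) is running; the second removes through the iterator itself, which keeps expectedModCount in sync.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class FailFastDemo {
    // Returns true if modifying the list inside a for-each loop fails fast.
    static boolean removeInForEachThrows() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        try {
            for (String s : list) {
                if (s.equals("a")) {
                    list.remove(s); // structural change behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException ex) {
            return true; // thrown by the next call to Itr#next()
        }
    }

    // Removing through the iterator is the supported way to do this.
    static List<String> removeViaIterator() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            if (it.next().equals("a")) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(removeInForEachThrows()); // true
        System.out.println(removeViaIterator());     // [b, c]
    }
}
```

The detection is best effort: the check only runs inside iterator methods such as next(), so a modification that happens to make hasNext() return false ends the loop without any exception.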

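Serialization and deserialization

As mentioned above, ArrayList implements its own serialization and deserialization methods, which is why elementData is marked transient.

During serialization, the elementData array is not serialized as a whole. Only the first size slots, i.e. the elements actually in the list (including null elements), are written out, one object at a time.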

/**
 * Save the state of the <tt>ArrayList</tt> instance to a stream (that
 * is, serialize it).
 *
 * @serialData The length of the array backing the <tt>ArrayList</tt>
 *             instance is emitted (int), followed by all of its elements
 *             (each an <tt>Object</tt>) in the proper order.
 */
private void writeObject(java.io.ObjectOutputStream s)
    throws java.io.IOException{
    // Write out element count, and any hidden stuff
    int expectedModCount = modCount;
    s.defaultWriteObject();

    // Write out size as capacity for behavioural compatibility with clone()
    s.writeInt(size);

    // Write out all elements in the proper order.
    for (int i=0; i<size; i++) {
        s.writeObject(elementData[i]);
    }

    if (modCount != expectedModCount) {
        throw new ConcurrentModificationException();
    }
}

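During deserialization, elementData is first pointed at EMPTY_ELEMENTDATA. Only if there are elements to read is elementData grown, after which the objects are deserialized one by one.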

/**
 * Reconstitute the <tt>ArrayList</tt> instance from a stream (that is,
 * deserialize it).
 */
private void readObject(java.io.ObjectInputStream s)
    throws java.io.IOException, ClassNotFoundException {
    elementData = EMPTY_ELEMENTDATA;

    // Read in size, and any hidden stuff
    s.defaultReadObject();

    // Read in capacity
    s.readInt(); // ignored

    if (size > 0) {
        // be like clone(), allocate array based upon size not capacity
        int capacity = calculateCapacity(elementData, size);
        SharedSecrets.getJavaOISAccess().checkArray(s, Object[].class, capacity);
        ensureCapacityInternal(size);

        Object[] a = elementData;
        // Read in all elements in the proper order.
        for (int i=0; i<size; i++) {
            a[i] = s.readObject();
        }
    }
}
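From the caller's side, all of this is hidden behind the standard object streams. The round trip below is a minimal sketch; RoundTripDemo and roundTrip() are names chosen for this example. It serializes a list with a large capacity but only three elements (one of them null) into a byte array and reads it back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

class RoundTripDemo {
    // Serializes the list to memory and deserializes a fresh copy.
    static ArrayList<String> roundTrip(ArrayList<String> list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(list); // ends up in ArrayList#writeObject
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                ArrayList<String> copy = (ArrayList<String>) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            // Wrapped so that the sketch stays free of checked exceptions.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ArrayList<String> original = new ArrayList<>(100); // capacity 100, 3 slots used
        original.add("a");
        original.add(null);
        original.add("b");

        ArrayList<String> copy = roundTrip(original);
        System.out.println(copy);                  // [a, null, b]
        System.out.println(copy.equals(original)); // true
        System.out.println(copy == original);      // false
    }
}
```

The copy is equal to the original element by element, but the unused capacity of the original is not carried over, since only size elements were ever written to the stream.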