Constant time subtype checks
From JopWiki
Contents |
[edit] Efficient subtype checks and compressed interface tables
As JOP is a real-time architecture, one should try to minimize the worst-case time for operations. Subtype checks in particular are unfortunate, because assuming the longest possible search is rather pessimistic. Taking the type hierarchy into account for execution time analysis is possible, but cumbersome. Therefore, it would be interesting to use techniques for constant time subtype checks, which should also be applicable to interface table compression.
[edit] Reading
A nice article on constant time subtype checks is [Efficient subtyping tests with PQ-encoding].
[edit] Preliminaries
We write S <: T to denote that S is a subtype of T, and S != T.
In Java and following [click], the primary types form a single-inheritance type tree, while adding the secondary types leads to a multiple-inheritance type DAG. Primary types are classes, arrays of primitive types and arrays of classes. Secondary types are interfaces and arrays of interfaces.
Note that
int[] <: Object Object[] <: Object Object[][] <: Object[] int[][] <: Object[] S <: T ==> S[] <: T[] S <: T ==> S[][] <: T[][]
where S, T are arbitrary reference types (also interfaces).
[edit] Single inheritance checks
Without dynamic class loading, constant time subtype checks for single inheritance are amazingly simple.
One caveat in Java is that the set of types does not correspond to the set of classes and interfaces, because arrays enjoy (unfortunate) subtype relations.
[edit] Cohen's algorithm
One of the first useful constant time subtype checks [cohen]. The idea is to assign each type an id and a level, the latter being the depth of the type in the type tree. Different types on the same level have different ids. Then it is sufficient to keep an array associating levels with the id of the supertype, for each class.
[edit] Relative numbering
If you have a type tree it is easy to assign an interval to each node, such that the intervals of siblings are disjoints, those of children are contained in those of the parents, but those of parents are strictly larger than those of their children. Now if intervals have this property, then
S <: T if and only if [S.low,S.high] is in [T.low,T.high] S <: T if and only if T.low <= S.low <= T.high
This is called relative numbering, or Schubert's numbering. It needs constant space (2 words) and no compression techniques.
The subtype test is
subtype(S,T) return(T.low <= S.low && S.low <= T.high)
As noted in [zibin], when testing whether S <: T, and T is known at compiler time, only the low bound needs to be fetched from the subtype, and high bounds can be completely inlined.
[edit] Multiple inheritance checks
[edit] Compressed subtype matrices
For N types, a N x N subtype matrix A is a 0-1 matrix, where we have
A[S,T] == 1 <==> S <: T
In other words, the rows list relation to supertypes, and the columns relation to subtypes. The idea of a sparse matrix is that if T1 and T2 do not share a common subtype, we either have A[i,T1] = 0 or A[i,T2] = 0. So we can combine both columns into column T, and set
A[i,T] = 1 if A[i,T1] == 1 A[i,T] = 2 if A[i,T2] == 1 A[i,T] = 0 otherwise
If two types T1, T2 have a common subtype, we say they conflict. Now to compress the matrix, give each type a bucket, s.t. conflicting types have different buckets (graph coloring). This is a hard optimization problem, so a good heuristic is essential.
Again, different types in the same bucket are assigned different IDs. The subtype test becomes
subtype(S,T) S.matrix[bucket[T]] == T.bucketId
We can also encode bucket indices in the matrix, reducing the test to
subtype (S,T) S.matrix[bucket[T]] = T.matrix[bucket[T]]
[edit] Other techniques
[edit] PQE
PQE [zibin] combines relative numbering with the packed bucket encoding and has very low space requirements (at most 80 bit for his benchmarks). It is hard to implement though, requiring a PQ-tree library (Demo Applet).
Along with PVE [barach] another option to consider.
[edit] Packed Encodings
Some work ([vitek],[zibin]), but not so interesting for JOP.
[edit] Bit Vector Tests
Subtype test reduces to a masking test:
S <: T if and only if S.bv & T.bv == S.bv
This test is fast but not constant time, as the size of the bitvector cannot be bounded easily.
[edit] Implementation Options and Considerations
[edit] Work list
Tool prototype for R&B on top of TypeGraph is finished. The bucket selection uses a very simple max-degree first heuristic.
Buckets (for interfaces and classes):
JDK5 java.* namepsace without java.awt and java.beans
Bucket Sizes: {0=245, 1=150, 2=71, 3=107, 4=127, 5=95, 6=26, 7=10, 8=7, 9=3, 10=1, 11=2}
JDK5 Full
Bucket Sizes: {0=256, 1=256, 2=256, 3=256, 4=256, 5=256, 6=256, 7=256, 8=256, 9=256, 10=256, 11=256, 12=256, 13=181, 14=9, 15=6, 16=3, 17=1, 18=1}
Relative numbering + Interface Buckets
JDK5 java.* namespace without java.awt and java.beans
Relative numbering ranges: (0, 669)
Bucket Sizes: {0=63, 1=63, 2=25, 3=12, 4=6, 5=3, 6=2, 7=1}
JDK5 Full:
Relative numbering ranges: (0, 2734)
Bucket Sizes: {0=255, 1=255, 2=160, 3=14, 4=45, 5=13, 6=19, 7=12, 8=7, 9=3, 10=3, 11=3, 12=1, 13=1, 14=1, 15=1, 16=1, 17=1}
- Evaluate space requirements for buckets
- Implement
- Decide whether combining relative numbering and bucket test pays off (space and time) and implement either bucket test or R&B.
- Implement interface table compression
- Think about array subtyping
[edit] Dealing with Arrays
As noted in the introduction, we need to find all array types used in the program in order to provide a array subtype check. We also need to encode the subtype information for array types.
[edit] R & B subtype check
From my feeling, compression is expensive, and classes do not fit well into the bucket algorithm. Therefore, a good option is combine both algorithms as in [palacz], reserving the first or the first two words for single inheritance checks and assigning bucket 0 to classes.
Possible memory layout (restricting to 65536 primary types and as many buckets):
class iface word 0 low | 0 0 2 high | bucket 4 bucket[0] 1 5 bucket[1] 6 bucket[2] 7 bucket[3] 8 ... 2
With a reflexive Schubert's numbering (see Appendix), this leads to the following subtype check algorithm:
subtype(S,T)
if(T.low == 0)
return (T.low <= S.low && S.low <= T.high)
else
return (S.buckets[T.bucket] == T.buckets[T.bucket])
For the full Java JDK, the memory layout takes at most 6 additional words per class, for the JDK without beans, javax and awt at most 3 words, and for our embedded examples 2 words.
[edit] Compressing Interface Tables
The bucket algorithm can be directly used to compress interface tables. We thave
(S obj).invokeinterface IReceiver offs ==> invoke S.itabs[ bucket[IReceiver] ] + offs
[edit] Misc Stuff
[edit] Compressed Subtype Matrices: Example
Consider the following subtype matrix generated from
A <: E, F B <: F, G C <: G, H D <: H, I E <: F
A B C D E F G H I A 0 0 0 0 1 0 0 0 0 B 0 0 0 0 0 1 1 0 0 C 0 0 0 0 0 0 1 1 0 D 0 0 0 0 0 0 0 1 1 E 0 0 0 0 0 1 0 0 0 F 0 0 0 0 0 0 0 0 0 G 0 0 0 0 0 0 0 0 0 H 0 0 0 0 0 0 0 0 0 I 0 0 0 0 0 0 0 0 0
We assign the same bucket to E, G and I, and another one to F and H.
E G I F H A B C D bucket 0 0 0 1 1 2 2 2 2 id 1 2 3 1 2 1 2 3 4
Now we replace 1 by the ID of the corresponding bucket and then merge the columns of each bucket
A B C D E F G H I B0 B1 B2 A 0 0 0 0 1 1 0 0 0 1 1 0 B 0 0 0 0 0 1 2 0 0 2 1 0 C 0 0 0 0 0 0 2 2 0 2 2 0 D 0 0 0 0 0 0 0 2 3 3 2 0 E 0 0 0 0 0 1 0 0 0 0 1 0 F 0 0 0 0 0 0 0 0 0 0 0 0 G 0 0 0 0 0 0 0 0 0 0 0 0 H 0 0 0 0 0 0 0 0 0 0 0 0 I 0 0 0 0 0 0 0 0 0 0 0 0
Note that in this matrix, the ids of types without subtypes do not appear. Optionally we can finally encode the tid[i] on A[i,bucket[i]]
B0 B1 B2 A 1 1 1 B 2 1 2 C 2 2 3 D 3 2 4 E 1 1 0 F 0 1 0 G 2 0 0 H 0 2 0 I 3 0 0
[edit] Reflexive relative numbering algorithm for single inheritance
Here we deviate a little bit from [palacz], in order to get a simple, easy to prove algorithm, optimized for the worst-case.
We actually only have to fetch three words to perform the subtype check. It should be sufficient to compute C.low and C.high for each type C, such that
(1) S <=: T if and only if T.low <= S.low <= T.high
To simplify the task, we additionally require (by construction)
(2) S <: T implies T.low < S.low (3) S <: T implies T.low <= S.low && S.high <= T.high
Now we have some nice properties by design
a) Reflexivity : because of T.low <= T.low <= T.high
b) Antisymmetry: If S <: T and T <: S, we have T.low < S.low < T.low, a contradiction
c) Transitivity of <:
S.low < U.low <= S.high, U.high <= S.high
and T.low < S.low <= T.high, S.high <= T.high
imply T.low < U.low <= T.high and U.high <= T.high
Now 3 simple construction rules allow to derive an algorithm and prove its correctness
(a) S.low <= S.high (b) If S is a direct subtype of T, T.low <= S.low - 1 and T.high <= S.high (c) If U and V are siblings, i.e., neither U <: V nor V <: U, their intervals are disjoint
The straightforward algorithm:
main = assign(root, 0)
assign(T, low)
next = low
foreach direct subtype S of T
assign(S, next+1) // assures disjoint
next = S.high // intervals of siblings
T.low = low // < S.low for all subtypes S
T.high = next // >= S.high for all subtypes S
[edit] References
[vitek] Vitek, J., Horspool, R. N., and Krall, A. 1997. Efficient type inclusion tests. SIGPLAN Not. 32, 10 (Oct. 1997), 142-157. DOI= http://doi.acm.org/10.1145/263700.263730
[palacz] Java Subtype Tests in Real-Time. Krzysztof Palacz and Jan Vitek.
[barach] A Simple and Efficient Subclass Test. http://people.csail.mit.edu/jrb/pve/index.htm . Submitted to ECOOP-02
[click] Click, C. and Rose, J. 2002. Fast subtype checking in the HotSpot JVM. In Proceedings of the 2002 Joint ACM-ISCOPE Conference on Java Grande (Seattle, Washington, USA, November 03 - 05, 2002). JGI '02. ACM, New York, NY, 96-107. DOI= http://doi.acm.org/10.1145/583810.583821
[zibin] Zibin, Y. and Gil, J. Y. 2001. Efficient subtyping tests with PQ-encoding. SIGPLAN Not. 36, 11 (Nov. 2001), 96-107. DOI= http://doi.acm.org/10.1145/504311.504290
