
    Inlining Java Bytecode

    Fabio Pellacini and Ioannis Vetsikas

    CS612 final project report

1 Problem statement

    Given Java's object-oriented design, small method calls are pervasive in Java programs. For programs that make heavy use of small methods (such as arithmetic operations on complex numbers, or accessor methods), execution time can be improved substantially by inlining at the bytecode level. This is true in all OO languages, but especially so in Java bytecode, since the overhead of invoking a method is extremely high compared to native code.

    Our project aims at speeding up Java bytecode by inlining method calls. To do that, we first analyze the bytecode to determine the set of loaded and instantiated classes. Using the latter, we prepare the bytecode for the inlining phase with a type hierarchy analysis. Finally, we inline the bytecode.

    Using this very simple system, we obtained good results on programs such as a complex matrix multiplication.

    2 Bytecode details

    This chapter is a short introduction to the structure of the bytecode and to some details of JVM execution, meant to be helpful in understanding the rest of the implementation (more information can be found in [1]). In the following discussion we will assume that no reflection is used.

    2.1 The class file format

    For each class, the JVM requires a file containing the bytecode of that class. A class file, schematically, contains information about the class hierarchy, an area called the constant pool, information about the fields declared in the Java class, information about the methods, and a series of attributes.

    The class info section stores the name of the class defined, the name of its parent class, the access flags for the current class, and an array of the interfaces that the class implements. As can easily be seen, this preserves all the information about the type structure that is present in the Java source code.

    The constant pool section stores all the constants needed in the class file. Whenever a constant is needed in the bytecode, the constant is replaced by an index into the constant pool. The constants are also typed, so we know whether we are referring to an integer, a class reference, a string, and so on.
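    The layout described above can be inspected directly. The following minimal sketch (our own illustration, not part of the project) reads the fixed header that precedes the constant pool, following the class file layout in [1]; it assumes its own .class file is reachable on the classpath:

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ClassFileHeader {
    // Reads the fixed-size header that starts every class file:
    // u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count.
    static int[] readHeader(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        int magic = in.readInt();              // always 0xCAFEBABE
        int minor = in.readUnsignedShort();
        int major = in.readUnsignedShort();
        int cpCount = in.readUnsignedShort();  // number of constant pool entries + 1
        return new int[] { magic, minor, major, cpCount };
    }

    public static void main(String[] args) throws IOException {
        // Parse this class's own .class file, located via the class loader.
        try (InputStream in =
                 ClassFileHeader.class.getResourceAsStream("/ClassFileHeader.class")) {
            int[] h = readHeader(in);
            System.out.printf("magic=%08X version=%d.%d constant_pool_count=%d%n",
                              h[0], h[2], h[1], h[3]);
        }
    }
}
```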

    The field info section is an array of per-field information. Each entry basically contains a Java field declaration, i.e. field name, type and modifiers, plus an array of attributes relative to that field. As with all attributes in Java class files, there are standard attributes declared in the specification, and one can also define one's own attributes (so that one can easily annotate the bytecode).

    The method section is very similar to the field one. It contains the Java declaration of the method (signature and modifiers) and a set of attributes. One of the standard attributes that is very important for us is the Code attribute: it contains the compiled code of the method, if the method is not abstract or native.

    2.2 Class initialization, object instantiation and finalization

    In order to perform type hierarchy analysis, we need to know when classes get loaded and when they are instantiated. We will assume no specific order for the loading and verification of Java bytecode (except where required by the initialization process).


    Initialization of a class consists of executing its static initializers;1 initialization of an interface consists of executing the initializers for the fields declared in the interface [1, section 2.17.4]. Before a class or interface is initialized, its direct superclass must be initialized, but interfaces implemented by the class need not be initialized. Similarly, the superinterfaces of an interface need not be initialized before the interface is initialized.

    A class or interface type T will be initialized immediately before one of the following occurs:

    - T is a class and an instance of T is created.
    - T is a class and a static method of T is invoked.
    - A nonconstant static field of T is used or assigned. A constant field is one that is (explicitly or implicitly) both final and static, and that is initialized with the value of a compile-time constant expression. A reference to such a field must be resolved at compile time to a copy of the compile-time constant value, so uses of such a field never cause initialization.
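    These rules can be observed directly from Java source. In the sketch below (our own illustrative example, not from the report), reading a compile-time constant does not trigger initialization, while touching a nonconstant static field does:

```java
public class InitDemo {
    static boolean initialized = false;

    static class K {
        static final int CONSTANT = 7;  // compile-time constant: the value is
                                        // copied into the use site, so using it
                                        // never initializes K
        static int counter = 0;         // nonconstant static field
        static { initialized = true; }  // K's static initializer (<clinit>)
    }

    public static void main(String[] args) {
        int c = K.CONSTANT;               // does NOT initialize K
        System.out.println(initialized);  // false
        int d = K.counter;                // DOES initialize K
        System.out.println(initialized);  // true
    }
}
```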

    An object of a class is instantiated only when a NEW instruction is executed in the bytecode [1, section 2.17.6]. This instruction is followed by a call to the appropriate constructor (named in the bytecode),2 so we do not need to deal explicitly with constructors. For the purpose of our analysis we also need to take finalization into account. Objects are finalized before they are collected by the garbage collector; to do this, Java executes the finalize method. Classes are finalized before they are unloaded by the JVM, and the finalization process is a call to the finalizeClass method [1, section 2.17.7].

    2.3 Bytecode method code

    Every method call in the JVM has its own stack frame (the stack is not shared between method like in

    standard native compilation). Also local variables are not stored on the method stack as far as the bytecode

    instruction are concerned. Every method declares the maximum size of the stack and the maximum number

    of locals.

    2.4 Bytecode method invocation

    The procedure for calling a method is similar to the usual calling convention when compiling to native code. All arguments are put on the stack of the caller, and then we invoke an instruction to jump to the subroutine. After the execution of the callee is finished, the result is put on the stack.

    The callee receives the parameters in the first n locals (where n is the number of arguments of the callee method). In the case of non-static methods, the first parameter is the reference to the object on which the method is called.

    There are four different instructions in the bytecode to jump to a new method. They all need a method reference, which is an index into the constant pool. The method reference contains the signature (which also contains the class name) of the method that must be executed. The instructions for calling a method are:

    - invokestatic: used to invoke static methods (statically linked, so it can be inlined)
    - invokespecial: used to call private methods, superclass methods, and constructors (statically linked, so it can be inlined)
    - invokevirtual: used for virtual calls (dynamically linked, so it cannot be inlined)
    - invokeinterface: used to call a method defined in an interface (dynamically linked, so it cannot be inlined)

    For virtual method calls, the class in the method reference will be the root of the subtree of possibly overriding classes. For interface calls, the reference will be to the method in the interface.
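    As an illustration (our own example, not from the report), here is the kind of source construct that compiles to each invocation instruction; the comments show the instruction javac of the report's era would emit at each call site:

```java
// Illustrative example: which invocation instruction each call compiles to.
interface Greeter { String greet(); }

class Base {
    static int twice(int x) { return 2 * x; }  // callee of an invokestatic
    private int secret() { return 42; }        // callee of a statically linked call
    int virt() { return secret(); }            // private call: statically linked
}

public class InvokeKinds implements Greeter {
    public String greet() { return "hi"; }

    public static void main(String[] args) {
        int a = Base.twice(21);      // invokestatic
        Base b = new Base();         // new + invokespecial (constructor <init>)
        int c = b.virt();            // invokevirtual
        Greeter g = new InvokeKinds();
        String s = g.greet();        // invokeinterface (interface-typed receiver)
        System.out.println(a + " " + c + " " + s);
    }
}
```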

    1 In our case, we are only interested in being sure that we execute the code in the static initializer of the class, i.e. the <clinit> method.
    2 Actually this is not true for String, but we assume that String is always considered loaded in the JVM, so for our specific application this doesn't matter.


    3 Description of solution

    3.1 Overall structure

    Our system takes bytecode as input and outputs optimized bytecode; it does not require Java source code. We assume that all the class files needed are provided to the system, including all the libraries needed for the program to run.

    The work is done in several phases, as portrayed in the following schema (the JAX stages are optional):

        bytecode + system libraries -> [JAX] -> Class Hierarchy Analyzer -> Inliner -> Fixer -> [JAX] -> optimized bytecode

    The first stage takes the bytecode and the system libraries and feeds them into the class hierarchy analyzer, which analyzes the bytecode, reconstructs the class hierarchy, finds which classes are instantiated, builds the caller graph and resolves as many invocations as it can.

    The inliner is the next stage; it inlines the bytecode where deemed necessary and possible. Then the resulting bytecode is passed through a "fixing" stage to correct any problems related to the security mechanism.3

    We have also found a tool from IBM called JAX [2], which can optionally be used before the analyzer to compress the class hierarchy and, after the final bytecode is produced, to strip the code of the methods which are not used and to clean up the constant pool.

    3.2 Class hierarchy analysis

    3.2.1 General

    This stage is aimed at extracting enough information from the program being optimized to determine whether some method invocations which appear to be polymorphic4 are in fact monomorphic.5

    According to the Java Virtual Machine specification, the invokestatic and invokespecial instructions are monomorphic invocations, whereas the invokevirtual and invokeinterface instructions are polymorphic. The inliner can only safely inline monomorphic calls. Therefore it is desirable to figure out whether a 'virtual' or 'interface' call is in fact monomorphic, in which case the instruction is changed to a 'special' call to indicate to the inliner that the call is monomorphic.

    3.2.2 Main idea

    We use a simple data flow analysis. What we need to determine is which classes get loaded and which of those get instantiated. We also need an approximation of the caller graph.6 The information that needs to be propagated in the data flow analysis is therefore the class hierarchy tree and the set of instantiated classes. What needs to be done is to start from the main method and follow the edges in the caller graph. We therefore need to keep track of classes that get instantiated, of accesses to static fields and of method invocations.

    3 Its use is briefly described in a later part of this report.
    4 An invocation site is called polymorphic if the invocation at that site can call more than one overridden method.
    5 An invocation site is called monomorphic if only one particular method can get called.
    6 The caller graph is a graph with a directed edge between two methods of the program if and only if the first method calls the second one. In our approximation we can have more than one edge per call site, in case the call site is polymorphic.


    3.2.3 Implementation

    3.2.3.1 Input

    Our program only needs the name of the class where the 'main' method is located. No a priori information about which classes and methods are loaded and used is necessary, since the analyzer finds all that information by itself.

    3.2.3.2 How it works

    - It starts from the 'main' method.
    - For each method which gets analyzed, the following are done in this order:
      - If the corresponding class has not been loaded yet, all its superclasses and superinterfaces must be loaded as well (when loading a class, the methods <clinit>, finalize and classFinalize must be called).7
      - A node is inserted in the caller graph.
      - If the method is used for the first time:
        - it must be checked whether the method is a "dangerous reflection" method (e.g. forName, newInstance, loadClass etc.);
        - we must check all the instructions in the code of the method for the presence of the NEW instruction, in which case the set of instantiated classes must be updated with the newly instantiated classes.
      - If the method is used for the first time, or the set of instantiated classes has grown (some more classes got instantiated) since the last use of the method, we need to check the invocation sites to make sure that all the methods that could potentially get called are indeed marked as called. This means, of course, that we need to check the whole hierarchy again, which could have changed since the last use of the method, and for each instantiated class in the subtree consider that the appropriate method gets used.
    - The set of loaded classes, the set of instantiated classes and the caller graph that the analyzer computes are in fact "conservative approximations" of the real ones. By that we mean that we might estimate that more classes are loaded or instantiated, and that more methods can get called, than is really the case. However, we will never underestimate the number of methods that can get called from a call site or the number of classes that get loaded; therefore, when we devirtualize, we preserve the equivalence of the resulting program to the original one.
    - It should be noted that when a class or an interface is loaded, its superclasses or superinterfaces are "updated" to know which their "children" are in the hierarchy. Also, when a method gets used, it "remembers" a few pieces of information (e.g. which methods it calls, or how big the instantiated class set was the last time it was used, etc.).
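    The steps above amount to a worklist algorithm. The following is a simplified reconstruction with invented names (the real analyzer operates on bytecode via the JavaClass library and tracks the full hierarchy, not just set sizes):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class AnalyzerSketch {
    // caller graph edges: "Class.method" -> methods it may call
    final Map<String, List<String>> callSites = new HashMap<>();
    // classes NEWed inside each method body
    final Map<String, Set<String>> instantiates = new HashMap<>();

    final Set<String> usedMethods = new LinkedHashSet<>();
    final Set<String> instantiated = new LinkedHashSet<>();
    // size of the instantiated set the last time each method was examined
    private final Map<String, Integer> lastSeenSize = new HashMap<>();

    void analyze(String mainMethod) {
        Deque<String> work = new ArrayDeque<>();
        work.push(mainMethod);
        while (!work.isEmpty()) {
            String m = work.pop();
            boolean firstUse = usedMethods.add(m);
            if (firstUse)  // scan the body once for NEW instructions
                instantiated.addAll(instantiates.getOrDefault(m, Set.of()));
            // re-examine call sites on first use, or whenever the set of
            // instantiated classes has grown since the last visit
            int size = instantiated.size();
            if (firstUse || lastSeenSize.getOrDefault(m, -1) != size) {
                lastSeenSize.put(m, size);
                for (String callee : callSites.getOrDefault(m, List.of()))
                    work.push(callee);  // follow the approximate caller graph
            }
        }
    }
}
```

    The monotone growth of the instantiated set bounds the number of times any method re-enters the worklist, which is why the loop terminates even on cyclic caller graphs.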

    3.2.4 Running Time Complexity

    Assume that:

    - M = total number of methods which are used
    - C = total number of classes which are instantiated
    - L_i = length of the code of method i
    - H_i = cost to traverse the hierarchical subtrees from each invocation site in the code of method i and determine which methods get called

    For each of the M methods:

    - Search its code for NEW instructions. This takes O(L_i) time.
    - The analysis of which methods get invoked can be run C times in the worst case, since the analysis is run only if the set of instantiated classes has grown, C is the size the set reaches at the end, and obviously that size cannot decrease. Each run costs O(H_i + L_i): the cost of checking all the method code plus the cost of traversing the hierarchical subtrees from each invocation site in the code of method i and determining which methods get called. Thus this takes O(C * (H_i + L_i)) time.
    - So the total cost is O( sum_{i=1..M} [ L_i + C * (H_i + L_i) ] ), which, since C >= 1 in an OO program, gives O( C * sum_{i=1..M} (H_i + L_i) ).
    - But this is a worst-case analysis; in practice it is usually not necessary to run the analysis C times for each method, but fewer.

    7 Since the finalizers are called before the objects are collected, and we don't know when this is going to happen, we call them at the beginning of the analysis, since it does not really matter.

3.2.5 Type hierarchy analysis and devirtualization

    This is the aftermath of the class hierarchy analysis. For each polymorphic call site, we go through the class hierarchy tree and determine which methods can get called; if their number is one, then the call is really monomorphic, and the call is changed to invokespecial so that the inliner will know that this is so.8
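    For instance (our own illustrative example), the following call site compiles to invokevirtual but is monomorphic under this analysis, since Circle is the only Shape subclass ever instantiated:

```java
abstract class Shape { abstract double area(); }

class Circle extends Shape {
    final double r;
    Circle(double r) { this.r = r; }
    double area() { return Math.PI * r * r; }
}

// Square exists in the hierarchy but is never instantiated anywhere
class Square extends Shape {
    final double s;
    Square(double s) { this.s = s; }
    double area() { return s * s; }
}

public class Devirt {
    public static void main(String[] args) {
        Shape sh = new Circle(1.0);
        // javac emits invokevirtual Shape.area here, but since Circle is the
        // only Shape subclass the analyzer ever sees instantiated, the call is
        // monomorphic and can be rewritten as a 'special' call to Circle.area.
        System.out.println(sh.area());
    }
}
```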

    3.2.6 Caveats

    As the results of the analyzer on several programs have pointed out, it is unfortunately quite often the

    case that it is impossible to know exactly what happens in each part of the program (every class and every

    method). This is due to the use of two Java platform features:

    3.2.6.1 Reflection

    The reflection API allows the programmer to perform introspection of objects and classes in the current JVM. By means of it, the programmer obtains field and method references. Using the latter, the programmer can call a method in a way that the given analysis cannot track. It is also possible to dynamically load new classes, which can even replace a class that is in memory. This breaks the analysis down even worse, since there is no way to detect it.9

    The only way in which we deal with this is to at least detect that reflection has been used. If so, we cannot guarantee any part of the analysis, so we cannot devirtualize safely.10
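    To see why such calls are untrackable, consider this sketch (our own example): the target class and method are ordinary strings, possibly read from a file or the network, so no static inspection of the bytecode can resolve the call site.

```java
import java.lang.reflect.Method;

public class ReflectionCall {
    public static void main(String[] args) throws Exception {
        // These strings could come from anywhere at run time; the analyzer
        // only sees calls into the reflection API, never the real target.
        Class<?> c = Class.forName("java.lang.String");
        Method m = c.getMethod("toUpperCase");
        Object result = m.invoke("inline me");  // invokes String.toUpperCase
        System.out.println(result);             // INLINE ME
    }
}
```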

    3.2.6.2 Java Native Interface (JNI)

    The Java Native Interface (JNI) is the standard Java interface to native libraries. A native method can create objects, modify fields, call Java methods and load new classes. All these possibilities break our analysis completely. Since we don't have the code for native methods, they are just black boxes that we cannot analyze.

    Unfortunately, native methods are used extensively (just printing a string uses one). We assume that a native method will not give us problems; this is true, at least, for all native methods in the system libraries. If we find a native method call in the user code, then we cannot devirtualize safely.11

    8 We can define a set of classes that we are not going to change during the inliner phase, and we do not change any call site to a method in that set of classes either. From now on these will be called system classes, since normally we include the system libraries in this set.
    9 To our knowledge, there are no compilers or tools doing static analysis that are able to do anything about reflection.
    10 We can always inline static and private methods, and superclass calls.
    11 We can always inline static and private methods, and superclass calls.

    It should be noted, however, that since the presence of native methods is the case in almost every program, we have added an option to the analyzer for different levels of security. This means that, depending on the security level given by the user, the analyzer will try to devirtualize anyway, but it will output a warning message to inform the user that the optimization might not work and the reason for it (existence of native methods, use of reflection etc.).


    3.3 Inliner

    3.3.1 Inlining heuristic

    In the current implementation the inliner inlines every static and special call; there is no heuristic for inlining. In the test cases we ran, inlining was always better than calling a method. We do not really have the code-growth problem of standard inlining procedures, since the cost of a method call is quite high in Java bytecode execution.

    3.3.2 Support for static/special calls

    Given a statically linked method, we can easily inline the callee code. To do this we follow this scheme:12

    1. Grow maxstack and maxlocals to the sum of the caller and callee maximum values.
    2. Pop the arguments from the stack and store them into the new local variables.
    3. Insert code for a NullPointerException check, in case we are inlining a non-static method (the code we insert is the same as the compiled code for the statement if (ref == null) throw new NullPointerException();).
    4. Insert code for synchronization, in the case of synchronized methods (MONITORENTER instruction).
    5. Insert the callee code, renumbering all references to local variables; if we are inlining a method from another class, we also need to fix the constant pool.
    6. Change every return into an unconditional jump to the first instruction in the caller code after the call.
    7. Insert code for synchronization, in the case of synchronized methods (MONITOREXIT instruction).

    After inlining all the calls in a given caller, we also run a small optimization procedure. This cleans up the resulting code in two ways: we remove jump instructions that refer to the next bytecode instruction, and we remove the NOP instructions that we used as placeholders to make jumps easier to deal with.
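    At the source level, the effect of the scheme on a small accessor looks roughly like this (hypothetical names; the real transformation is performed on bytecode, not source):

```java
public class InlineSketch {
    static final class Vec {
        double x;
        Vec(double x) { this.x = x; }
        double getX() { return x; }  // the small callee
    }

    // before inlining: one invoke per iteration
    static double sumCalls(Vec v, int n) {
        double s = 0;
        for (int i = 0; i < n; i++) s += v.getX();
        return s;
    }

    // after inlining: callee body substituted, plus the null check that the
    // invoke instruction would have performed implicitly (step 3 of the scheme)
    static double sumInlined(Vec v, int n) {
        double s = 0;
        for (int i = 0; i < n; i++) {
            if (v == null) throw new NullPointerException();
            s += v.x;  // body of getX, with locals renumbered
        }
        return s;
    }
}
```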

    3.3.3 Note about inlining by jump

    We also tried to inline the code in the caller by using the jump-to-subroutine instruction normally used to compile the finally statement. We did this to avoid an unwanted growth in code size when we need to inline the same method several times. The other reason we were interested in such an inlining procedure was that, by doing so, we basically compile the method call with a shared stack, so it would have been useful to understand the behavior of this kind of call.

    The code we obtained this way runs incredibly slower than the non-inlined code. The reason seems to be that the jsr and ret instructions we use are otherwise only used to compile the finally statement, and since the JVM is very slow in dealing with exceptions, this is much slower than a normal call.

    Unfortunately, there is no other way in the bytecode instruction set to implement a subroutine call and return. We need an instruction that jumps to a bytecode address found on the stack, and the only one is the ret instruction. This expects to find on the stack a return address put there by the jsr instruction (remember that the JVM verifies the type of every element on the stack before using it, so we cannot just push an integer on the stack ourselves, because its type would be "int", not "return address").

    3.3.4 Caveats

    There are some caveats we have to deal with in order to inline correctly: cycles in the caller graph, exception handling, and problems due to the way Java enforces correctness in the bytecode.

    3.3.4.1 Cycles in caller graph

    The way we deal with cycles in the caller graph is simply to inline recursively a certain number of times, and then stop. This is surely a correct solution, and it works well in practice.

     12 If we inline more than one call, we share locals and stack slots.


    3.3.4.2 Dealing with exceptions

    Exceptions in the JVM are handled by means of exception tables. There is no explicit code for a jump when an exception occurs; instead, there is a table that, for each try/catch block, reports the bytecode address range of the try block, the type of exception caught, and the address of the catch code for it (finally is handled with jsr and ret).

    If the callee throws an exception, its stack does not need to be empty when we exit the method; in this case, we can accidentally "corrupt" the stack of the caller. Given that in the bytecode there is no instruction that lets us know the height of the stack (or anything similar), there is no way to check that the stack has not been corrupted. This prevents us from inlining when the call is inside a try/catch block.

    This conservative solution is not so bad, since exception handling is the dominant factor in the execution of a try/catch block (jumping to an exception handler is about 10 times slower than a method call).

    In all the compiled code we examined, there was no evidence of the corrupted-stack problem even when we inlined the code. Unfortunately, we couldn't find any reference to clarify this point in the JVM specification, so we decided not to inline in that case, just to be on the safe side.

    3.3.4.3 Violation of Java semantics

    After the inlining procedure, we can be left with instructions that violate the Java semantics in the optimized bytecode. There are two different problems that we must solve. The first one is a violation of access privileges.

    Consider, for example, the code in figure 1. We can safely inline A.f() in B.l(), but after we inline it, we will reference the private field i of A from class B. This is not allowed by the Java semantics, and the code will not be executed by the JVM, since class B will be verified before execution. This problem has a simple solution: we can easily change the access flags of the fields we need in the various class files.

    class A {
        private int i;
        public final void f() { i++; }
        public final void h() { g(); }
        private void g() { /* ... */ g(); /* ... */ }
    }

    class A1 extends A {
        public void g() { /* ... */ }
    }

    class B {
        A1 a = new A1();
        public void l() { a.f(); a.h(); }
    }

    Figure 1

    A subtler problem is a change in invocation type. Suppose we want to inline method A.h() in B.l(). We will then have a call to method A.g(), which is private in A. Since the method is recursive, we cannot avoid calling it at least once. To avoid the access violation we could change the access from private to public (or from private to package). But since we are now using a special call on a non-private method, some JVMs will not execute the code.13 And if we instead change the linking to a virtual call, we will call the method A1.g() instead of A.g() (a similar example can be constructed for super calls). The only way to fix this is to change the access privileges and rename the method so that we maintain the Java semantics.

    13 We could not find any specification that rules this out. The updated JVM specification for Java 2 makes no mention of this in the verifier, but JDK 1.2.1 throws a VerifierException if we try to execute the code.

    An easier way to deal with these problems is to switch off the Java verifier; rewriting the class loader can do this.

    3.4 Library JavaClass

    The whole project has been implemented using the JavaClass library [3]. The library lets us read bytecode files and modify them quite easily. It has a completely abstract way of dealing with bytecode that lets us build prototypes much faster. Of course this abstraction is a little slower than coding in another style, but it is worth it, since we do not have to write code for manipulating bytecode ourselves.

    The most important features for our project were the management of the constant pool and of references into it, and the management of jump instructions in the code of the methods. Not having to deal with bytecode indexes and jump addresses made our life easier, especially because we were modifying existing bytecode, not writing new bytecode from scratch.

    4 Experimental results

    4.1 Class hierarchy analysis

    Running the analyzer, we get some very interesting results.14

    When the analyzer is run even on small programs, it turns out that a large number of classes are being used. For example, even in an extremely small program that consists only of a 'main' method that just prints a line on the screen, 10 system classes are loaded (1 of those is instantiated), and 13 system methods in those 10 classes are used whereas 93 are not used. It should also be noted that one of those methods is a native method.

    When the analyzer was run on a slightly bigger program, which finds all the divisors of given numbers,

    the results were more interesting:

                         Libraries   Other classes   Total
    Classes loaded          26             2           28
                           93%            7%
    Methods used            39             5           44
                           8.6%         62.5%         9.5%
    Methods not used       417             3          420
                          91.4%         37.5%        90.5%

    What can easily be seen is that most of the system methods are unused, whereas most of the user program methods are in fact used. The 3 methods in the divisor program that were not used were prototyping methods. Almost all of the classes loaded are system classes. Again a native method is found and, since we print the result to the screen, it is the same native method as in the trivial program case. The devirtualizer was able to resolve 3 polymorphic calls. In fact, these 3 calls were all the invokevirtual calls that the analyzer attempted to resolve, which means that all the 'virtual' calls in the program are in fact monomorphic.15 There were no invokeinterface calls.

    In order to get a better feeling for what a "real" program would produce, we ran the analyzer taking as input the analyzer itself, which is a sizable program.

                         Libraries   Other classes   Total
    Classes loaded         125            76           201
                           62%           38%
    Methods used           286           315           601
                           19%           36%           25%
    Methods not used      1254           557          1811
                           81%           64%           75%

    14 Almost all the results presented in this part have been obtained using the JDK 1.2.1 system library classes. The results using the Microsoft Visual J++ library classes differ in some cases, but only in the actual numbers and not in the general behavior observed by the analysis.
    15 It should be noted that the analyzer does not try to resolve calls in system libraries or calls that invoke a system library method.

    As it turns out, most of the classes loaded are system classes (libraries), but to a smaller percentage than in the previous two cases. There is a low usage of methods in the system libraries (only 19%), but there is actually a rather low usage of methods in the remaining classes as well (36%). The latter might seem strange, but it is easily explained by the fact that we count among the other classes the JavaClass library, which is used for manipulating the bytecode files.16 What should be noted, however, is that the program realizes that only 69 of the over 250 classes found in the JavaClass library are used (the remaining 7 non-system classes are the ones we wrote for the analyzer). Also, only 125 system classes are loaded out of the over 2000 classes in JDK 1.2.1. Native methods are again found, but this time dynamic class and method loading methods are also invoked (use of reflection), and thus it is uncertain whether the program can safely devirtualize. However, by setting the security level of the program to 'Optimize Always', we force the devirtualizer to work. The results are really stunning and show why this kind of analysis is very useful:

                       Calls attempted    Calls      Percentage
                        to be resolved   resolved     resolved
    Virtual calls            337           329          97.6%
    Interface calls            0             0            -

                       Initial number   Final number   Change   Percentage change
    Virtual calls           1231             902         -329        -26.7%
    Interface calls            8               8            0           0%
    Special calls            374             703         +329        +88.0%
    Static calls             162             162            0            -

    The impressive result is that, out of the 337 calls that the program attempts to resolve, 329 are monomorphic and only 8 are polymorphic, which means that 97.6% of the calls are successfully resolved and thus inlining can further optimize those calls. Overall, the number of special calls has nearly doubled, whereas the number of virtual calls has decreased by more than 25%.

    4.2 Inliner

    In this section we give some results to show how inlining speeds up Java bytecode. We ran the inliner on programs that execute the same short method a large number of times. All the test cases were run using the JDK 1.2.1 JVM and the latest MS JVM on Windows NT systems with the JIT active. The reason we use the JIT for our tests is that we want an overall speedup of Java bytecode, so our solution should work together with the JIT.

    4.2.1 Trivial programs

    The test cases were built to execute short methods that do some computation on data local to the referenced object. The results can be found in the following table (the results are taken using the MS JVM).

               Devirtualization   Inlined   Inlined recursively
    Prog. 1          2%             31%            62%
    Prog. 2          1%             30%             -
    Prog. 3          1%             26%             -

    As can easily be seen, there is almost no speedup from changing the call type from virtual to special, a result also found in [2]. For Java bytecode, the reason is clearly that the work needed to call a method is dominated by creating a new stack frame and verifying the correctness of the call (and maybe something else); the difference in the resolution of the method call is negligible compared to the rest of the work.

    After inlining once, we get clearly better running times, up to 30% faster; the speedup is even better when we inline more than once.

    We ran another test in which we compare the execution times of two different JVMs under the inlining optimization. The results are in the following table.

                 JDK 1.2.1    MS
    1 call          0%        42%
    No calls        0%        42%

    The row labeled "1 call" contains the speedup obtained after inlining; the row labeled "No calls" contains the speedup of a Java source program that has no method calls, which is the theoretical limit for the inlined code. As shown, the JDK JVM seems not to be affected by the optimization, but, more importantly, it is also not affected in the code that uses no method calls. This means that in certain cases (and we stress certain limited cases) this JVM is able to inline the code itself. On the MS JVM the speedup is significant, and, more importantly, we also reach the theoretical limit by inlining the code. This means that the work we do on the stack, which is needed to inline, is negligible compared to the actual execution time of a method call, and thus no further analysis is necessary to try to minimize that cost.

    4.2.2 Complex matrix multiplication

    We ran the inliner on a naïve complex matrix multiplication program, which multiplies two 256x256 matrices.

                            JDK 1.2.1   MS JVM    JDK 1.2.1      MS JVM      JDK 1.2.1   MS JVM
                              (ms)       (ms)    inlined (ms)  inlined (ms)   speedup    speedup
    2 calls                  12500      22100       12000          9300          4%        58%
    1 call                   12200      11200       11200          8900          8%        19%
    No calls                 11300      10000         -              -
    No calls, no objects      4800       5000         -              -

    We wrote 4 versions of the code. The first three use matrices of complex objects and differ in the number of methods called in the inner for loop. As can be seen, we actually get speedups for the two JVMs that are even better than the theoretical limit. We have no explanation for this effect, since we are running the code under the JIT and we have no control over what it is doing.

    Another important aspect is that, if we use two matrices of real numbers instead of one matrix of complex numbers, the speed increases by a factor of two. This could be an effect of the null pointer check performed on every element access in the matrix, or simply of the way Java refers to an object on the heap.
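    The benchmark kernel presumably resembled the following (a hypothetical reconstruction, since the original sources are not included in this report); the two small method calls per inner-loop iteration are exactly what the inliner removes:

```java
public class ComplexMatMul {
    static final class Complex {
        double re, im;
        Complex(double re, double im) { this.re = re; this.im = im; }
        Complex mul(Complex o) {                            // call #1 per iteration
            return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
        }
        void addInPlace(Complex o) { re += o.re; im += o.im; }  // call #2
    }

    // Naive O(n^3) multiply over matrices of Complex objects.
    static Complex[][] multiply(Complex[][] a, Complex[][] b) {
        int n = a.length;
        Complex[][] c = new Complex[n][n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                Complex acc = new Complex(0, 0);
                for (int k = 0; k < n; k++)
                    acc.addInPlace(a[i][k].mul(b[k][j]));
                c[i][j] = acc;
            }
        return c;
    }
}
```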

    5 Related works

    There is a lot of work related to OOP optimization using inlining. Some of it can be found in the papers available on the course Web page [4]. As far as we know, little work has been done on optimizing Java bytecode. One of the reasons is that Java bytecode is in any case slower than a native compilation.

    In this section we will cover two of the most significant examples of work in this area.
