Reliability and Monitoring

Session Chair: Klaus Ostermann, Aarhus University
How Java VM Can Get More from a Hardware Performance Monitor
Hiroshi Inoue, IBM Tokyo Research Laboratory
Toshio Nakatani, IBM Tokyo Research Laboratory

This paper describes our sampling-based profiler, which exploits the hardware performance monitor (HPM) available in the processor to collect information on running Java applications for use by the Java VM. Our profiler provides two novel features: Java-level event profiling and lightweight context-sensitive event profiling. For the former, we propose new techniques that leverage the sampling facility of the HPM to generate an object creation profile and a lock activity profile. Exploiting the HPM sampling facility is the key to achieving much lower overhead than existing JVMTI-based profilers. To sample object creations with the HPM, which can sample only hardware events such as executed instructions or cache misses, we correlate object creations with the store instructions that write the Java object headers. For the lock activity profile, we introduce an instrumentation-based technique, called ProbeNOP, which uses a special NOP instruction whose executions are counted by the HPM. For the latter feature, we propose a new technique called CallerChaining, which detects the calling context of HPM events based on stack frame pointer values. In contrast to existing techniques, our approach imposes no additional runtime overhead to identify calling contexts. We show that it can detect the calling contexts in many programs, including a large commercial application. Our proposed techniques enable both programmers and runtime systems to extract more valuable information from the HPM to understand and optimize programs without adding significant runtime overhead.
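
The attribution step the abstract describes — mapping sampled store instructions back to object creations — can be illustrated with a toy offline lookup. The addresses, site names, and table below are invented for illustration only; in the real system the JIT compiler would know where each object-header store was emitted in compiled code.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of attributing HPM samples of store instructions to
// allocation sites. All addresses and names here are hypothetical.
final class AllocSiteProfile {
    // Compiled-code address of a header store -> allocation site.
    // A real JIT would record this table at compile time.
    static final Map<Long, String> headerStoreSites = Map.of(
            0x1000L, "Foo.<init>",
            0x2040L, "Bar.make");

    // Count sampled instruction addresses per allocation site,
    // ignoring samples that hit non-header-store instructions.
    static Map<String, Integer> attribute(long[] sampledAddresses) {
        Map<String, Integer> counts = new HashMap<>();
        for (long addr : sampledAddresses) {
            String site = headerStoreSites.get(addr);
            if (site != null) counts.merge(site, 1, Integer::sum);
        }
        return counts;
    }
}
```

Only samples that land on a known header-store address are counted, which is how sampling hardware events can stand in for an object-creation profile without per-allocation instrumentation.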

A Concurrent Dynamic Analysis Framework for Multicore Hardware
Jungwoo Ha, The University of Texas at Austin
Matthew Arnold, IBM Research
Stephen M Blackburn, Australian National University
Kathryn S McKinley, The University of Texas at Austin

Software has spent the bounty of Moore's law by solving harder problems and exploiting abstractions, such as high-level languages, virtual machine technology, binary rewriting, and dynamic analysis. Abstractions make programmers more productive and programs more portable, but they also slow programs down. Since Moore's law is now delivering multiple cores instead of faster processors, future systems must either bear a relatively higher cost for abstractions or use some cores to help tolerate abstraction costs.

This paper presents the design, implementation, and evaluation of a novel concurrent, configurable dynamic analysis framework that efficiently utilizes multicore cache architectures. It introduces Cache-friendly Asymmetric Buffering (CAB), a lock-free ring buffer that implements efficient communication between application and analysis threads. We guide the design and implementation of our framework with a model of dynamic analysis overheads. Our framework implements precise and sampled event processing and is analysis-neutral. We evaluate our framework with five popular and diverse analyses, and show performance improvements even for analyses with very modest overhead.

Efficient inter-core communication is central to high-performance parallel systems, and we believe the CAB design gives insight into the subtleties and difficulties of attaining it for dynamic analysis and for other parallel software.
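
The general idea behind such asymmetric buffering can be sketched as a minimal single-producer/single-consumer lock-free ring buffer between an application (producer) thread and an analysis (consumer) thread. This is an illustrative sketch only, not the paper's CAB code, and it omits CAB's cache-line-conscious layout details.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal SPSC lock-free ring buffer sketch. Each index is written by
// exactly one thread, so no locks or compare-and-swap are required.
final class SpscRingBuffer {
    private final long[] events;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    SpscRingBuffer(int capacityPowerOfTwo) {
        events = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Application thread: drop the event when full, so the application
    // never blocks waiting on the analysis thread.
    boolean offer(long event) {
        long t = tail.get();
        if (t - head.get() == events.length) return false; // full
        events[(int) (t & mask)] = event;
        tail.lazySet(t + 1); // ordered store: publish after the write
        return true;
    }

    // Analysis thread: returns -1 when empty (assumes -1 is not a
    // valid event value in this sketch).
    long poll() {
        long h = head.get();
        if (h == tail.get()) return -1; // empty
        long event = events[(int) (h & mask)];
        head.lazySet(h + 1);
        return event;
    }
}
```

The non-blocking `offer` reflects the asymmetry in the design: slowing the application is worse than occasionally dropping an event on the analysis side.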

Inferred Call Path Profiling
Todd Mytkowicz, Department of Computer Science, University of Colorado at Boulder
Devin Coughlin, Department of Computer Science, University of Colorado at Boulder
Amer Diwan, University of Colorado at Boulder

Prior work has found call-path profiles to be useful for optimizers and programmer-productivity tools. Unfortunately, previous approaches for collecting call-path profiles are expensive: they either execute additional instructions (to track calls and returns) or walk the stack. The state-of-the-art techniques for call-path profiling slow down the program by 7% (for C programs) and 20% (for Java programs). This paper describes an innovative technique that collects minimal information from the running program and later (offline) infers the full call paths from this information.

The key insight behind our approach is that two pieces of information readily available during program execution, the height of the call stack and the currently executing function, together are a good indicator of calling context. We call this pair a context identifier. Because more than one call path may have the same context identifier, we show how to disambiguate context identifiers by changing the sizes of function activation records. This disambiguation has no overhead in terms of executed instructions.

We evaluate our approach on the SPEC CPU 2006 C++ and C benchmarks. We show that collecting context identifiers slows down programs by 0.2% (geometric mean). We can map these context identifiers to the correct unique call path 89% of the time for C++ programs and 97% of the time for C programs.
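The context-identifier idea, and how resizing activation records disambiguates colliding paths, can be modeled offline with a toy example. The frame sizes below are hypothetical; the real system measures actual activation-record sizes and perturbs them at compile time.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model: a context identifier is (currently executing function,
// stack height), with stack height taken as the sum of hypothetical
// activation-record sizes along the call path.
final class ContextId {
    // Invented frame sizes in bytes, mutable so a "recompilation"
    // that resizes a frame can be simulated.
    static final Map<String, Integer> frameSize = new HashMap<>(Map.of(
            "main", 64, "a", 32, "b", 32, "c", 48, "leaf", 16));

    static String idOf(List<String> path) {
        int height = 0;
        for (String f : path) height += frameSize.get(f);
        // Identifier pairs the top-of-stack function with the height.
        return path.get(path.size() - 1) + "@" + height;
    }
}
```

With these sizes, the paths main→a→leaf and main→b→leaf collide because a and b have equal frame sizes; growing b's activation record separates the two identifiers, so the offline mapping from identifier to call path becomes unique, at no cost in executed instructions.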

Please email any questions to .