Lab-5538: The Real-Time Java™ Platform Programming Challenge: Taming Timing Troubles

Expected Duration: 120 minutes

Exercise 4: More Debugging of Deadline-Misses using the Thread Scheduling Visualizer

Expected duration: 15 minutes

Important note

The tracing tool used by this exercise only works on Solaris OS as it is based on the DTrace tool. If running on Linux you can still perform the visualization part of the exercise by using the pre-existing log files from a failed run that are in the exercises directory:

  • /export/home/lab5538/realtimejava/exercises/Ex4/log25145.jrt3
  • /export/home/lab5538/realtimejava/exercises/Ex4/log25278.jrt3

The goal of this exercise is to use the "Thread Scheduling Visualizer" (TSV) to examine the scheduling behavior of your application from Exercise 3, to determine where, and why, a deadline miss is being encountered by your periodic asynchronous event handler.


Background Information

 

Read the material in the presentation, up to the slide titled "Exercise 4".

The Sun Java RTS Netbeans module provides a simple interface to execute an application with tracing enabled and then launch TSV to display the generated log file. If not using this module you will need to launch the application manually under the control of the crecord shell script, using the simple.d and contentions.d DTrace scripts. You will then need to launch TSV itself. Consult the TSV documentation for details of this process.


Steps to Follow

 
  1. Start the Netbeans IDE if not already started.
  2. There are three ways to proceed with this exercise:
    1. If you completed Exercise 3 and wish to use your solution to that then select the project Ex3 and make it the main project by right-clicking on it and selecting Set as Main Project.
    2. If you wish to use a provided solution to Exercise 3 then select the project Ex4 and make it the main project by right-clicking on it and selecting Set as Main Project.
    3. Otherwise, if you are on Linux and unable to perform the tracing part of the exercise, then go to the File menu, select Open and navigate to the pre-defined log file: /export/home/lab5538/realtimejava/exercises/Ex4/log25145.jrt3 and open it. This launches TSV and you can skip to step 4.
    As the example we will be using project Ex4.
  3. From the Debug menu select TSV Recording which displays the TSV configuration dialog:

    <Missing graphic: TSV Configuration Dialog>

    Ensure that the correct application is selected (in this case Ex4.jar) and that the TSV script is simple.d, and click Run. The output window will show something like the following at the end:

    Handler read 0.117643
    Execution completed, post-processing in progress
    Log generated in file log25145.jrt3
    
    Note:If no deadline was missed then close the TSV application and repeat this step. If you can't get a deadline to miss refer back to step 1 and use the existing log file.
  4. The TSV application is now displaying your log file:

    <Missing graphic: TSV Main Window>

    TSV initially displays threads in the order that they were created, with the oldest at the bottom and the most recent at the top. The bottom thread is always the main thread. The top-most threads are those created by your application. In between are all the threads created by the VM, and the core class libraries. If you place the mouse over a section of thread activity the thread details, such as the name, are shown in the lower-right section of the screen. For example:

    <Missing graphic: TSV thread info>

  5. Identify the different threads that you see. The following image identifies the main threads of general interest here:

    <Missing graphic: TSV thread labels>

    Starting from the bottom we have:
    • Two real-time RTGC threads. You can see these threads execute approximately periodically and mostly in unison.
    • In the middle of the screen we have another real-time thread (all in red) and this is the NoHeapRealtimeServerThread. This thread is one of a number of threads that can be created by the VM to execute asynchronous event handlers — in this case no-heap ones. This is the thread that is executing our handler and as we can see the execution pattern is somewhat strange: there are regular bursts of periodic activity followed by long periods of no activity.
    • The green thread is the WatcherThread. It's a non-realtime (hence all green) thread used by the VM for its own periodic activities, and it isn't of interest to us here.
    • Next we have another, highly-periodic, real-time thread. This is the HighResTimerThread, and as it's name suggests it is associated with high-resolution timers — in fact this is the thread that executes when your Timer expires and causes your handler to be released.
    • Finally, at the top we have the ConsumerThread: our non-real-time thread that takes the data from the DataBuffer. It is periodically executing as well, but the interesting thing about its execution is the color! On each execution it starts out green, goes to red, then back to green — which means that it is executing with real-time priority most of the time!
  6. Zoom in to cover the start of two executions of the ConsumerThread, so that you can see in more detail how the threads are being scheduled. It should look something like this:

    <Missing graphic: TSV consumer zoom>

    Note how the NoHeapRealtimeServerThread stops executing around the same time that the ConsumerThread starts executing with real-time priority. As you know from the presentation non-real-time threads can execute at real-time priority due to priority inheritance. Which means that the code in our event handler must be trying to acquire a monitor lock that is owner by the ConsumerThread. You can verify that this is the case by using TSV with a different monitoring script.

    Note: Running multiple instances of TSV can consume a lot of memory and possibly lead to errors when processing a log file. It is recommended that you close any open instances of TSV before launching a new instance.

  7. From the Debug menu select TSV Recording which displays the TSV configuration dialog:

    <Missing graphic: TSV Configuration Dialog - contention>

    This time select the contentions.d TSV script, and click Run.

    If you are unable to see deadline misses in the application, or if you are on Linux and unable to perform the tracing part of the exercise, then go to the File menu, select Open and navigate to the pre-defined log file: /export/home/lab5538/realtimejava/exercises/Ex4/log25278.jrt3 and open it. This launches TSV showing the output generated by the contentions.d script.

  8. The TSV application is now displaying the contentions log file:

    <Missing graphic: TSV Contentions>

    This looks very similar to the previous TSV display, but notice that there's an extra row of green at the bottom (this may show up as blue/purple on some systems). This isn't an extra thread but a graphical display of contention — the green sections indicate the periods of time when there was contention on a lock. If you look closely you will see that the periods of contention seem to line up with the times that the ConsumerThread is executing with real-time priority, and the NoHeapRealtimeServerThread is not executing — the following image shows them next to each other for clarity:

    <Missing graphic: TSV contention zoom>

    This confirms our theory that there is lock contention causing the ConsumerThread to inherit a real-time priority, and preventing our handler from executing. But where is the contention?

  9. If you hover the mouse over a contention indicator a popup window tells you which object the contention occurred on:

    <Missing graphic: TSV Contention Detail>

    So, as you can see, the problem is contention that is occurring on an instance of java.util.Vector. If we wanted to probe deeper with TSV we could use the contentionstacks.d script to actually capture the stack trace at the time the contended lock comes under contention, and see where in the code the problem was arising. If you have time feel free to try this for yourself. But for the purposes of this exercise we can simply turn our attention to the source code as we know that there's one obvious object used by both our handler and the consumer thread: the DataBuffer.
  10. Open the DataBuffer class in an editing pane by navigating to it in the project source tree, and double-clicking it. Here's the code for that class:
    public class DataBuffer {
        final Vector dataset;
    
    
         public void add(Double d) {
            dataset.add(d);
         }
    
    
         public Double take() {
             synchronized(dataset) {
                 if (dataset.size() > 0) {
                     process(); // simulate data processing activity
                     return dataset.remove(0);
                 }
             }
             return null;
         }
    
         /**
          * Simulates a data-processing operation - in this case a simple and
          * slow bubble-sort
          */
         private void process() {
    

    As you can see a DataBuffer contains a Vector, and Vector is a synchronized class: one where all methods synchronize on the current instance. The DataBuffer.add method simply calls the Vector.add method — so the handler will need to acquire the lock of the Vector. Meanwhile the consumer thread calls DataBuffer.take which directly synchronizes on the Vector instance. That in itself need not be a problem, after all that is why we have priority-inheritance to assist with these situations where different priority threads can synchronize on the same object. The main problem is that take also calls process which, as per the comments, performs a slow bubble-sort of all the data. Consequently, the consumer thread is holding the lock for a very long time — over a second! — and this causes the handler to miss many deadlines as it tries to execute every 50ms.

    So how can we fix this? That's the subject of the next exercise.

Solution

 

There is no "solution" for this exercise as it involved analysis only.


Summary

 

In this exercise you have explored further the capabilities of the TSV and seen how you can use the basic TSV scheduling information to theorize about potential timing problems. You've then seen how more sophisticated TSV scripts can be used to explore these theories, for example by showing when contention on monitor locks occurs.  

Back to top
Next exercise