New Visualizations to Help Better Understand Spark Streaming Applications
Earlier, we presented the new visualizations introduced in Spark 1.4.0 (see "Release 1.4: SparkR released, Project Tungsten unveiled" [in Chinese]) for better understanding the behavior of Spark applications. Continuing that theme, this post highlights the new visualizations introduced for understanding Spark Streaming applications. We have updated the Streaming tab of the Spark UI to display the following information:
- Timelines and statistics for event rates, scheduling delays, and processing times of past batches
- Details of all the jobs in each batch
In addition, the execution DAG (directed acyclic graph) visualization now includes streaming information, so that job executions can be understood in the context of the streaming operations.
Let's look at these new features in detail by analyzing an example streaming application from beginning to end.
Timelines and Histograms for Processing Trends
When debugging a Spark Streaming application, we usually want to know the rate at which data is being received and how long each batch takes to process. The new UI in the Streaming tab makes it easy to see both the current values and the trends over the previous 1000 batches. While a streaming application is running, visiting the Streaming tab in the Spark UI shows something similar to Figure 1 below (the red letters, such as [A], are our annotations and are not part of the UI).
Figure 1: The Streaming tab in the Spark UI
The first row (marked [A]) shows the current status of the streaming application. In this example, the application has been running for nearly 40 minutes with a 1-second batch interval. Below it, the Input Rate timeline (marked [B]) shows that the streaming application is receiving data from all of its sources at about 49 events per second. In this example, the timeline shows a noticeable drop in the average rate in the middle (marked [C]), from which the application recovered toward the end of the timeline. For more detail, you can click the drop-down next to Input Rate (near [B]) to show a timeline for each source, as shown in Figure 2 below:
Figure 2 shows that the application has two sources (SocketReceiver-0 and SocketReceiver-1). One of them caused the overall receive rate to drop because it stopped receiving data for a period of time.
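The relationship between the per-source timelines and the overall Input Rate can be sketched in a few lines of Python. This is only an illustration, not how the Spark UI computes its charts: the function name `input_rates` is hypothetical, and the event counts are made-up numbers chosen so that the total is about 49 events per second when both receivers are active.

```python
def input_rates(events_per_receiver, batch_interval_s):
    """Compute per-receiver and total input rates (events/second) per batch.

    events_per_receiver maps a receiver name to a list of event counts,
    one count per batch. All lists are assumed to have the same length.
    """
    per_source = {
        name: [count / batch_interval_s for count in counts]
        for name, counts in events_per_receiver.items()
    }
    # The overall rate of a batch is the sum of the rates of all receivers.
    total = [sum(rates) for rates in zip(*per_source.values())]
    return per_source, total


# Illustrative (made-up) counts: the second receiver stalls for two batches.
counts = {
    "SocketReceiver-0": [25, 25, 25, 25],
    "SocketReceiver-1": [24, 0, 0, 24],
}
per_source, total = input_rates(counts, batch_interval_s=1.0)
print(total)
```

When one receiver stops delivering events, its own rate falls to zero and the total drops by exactly that receiver's share, which is the kind of dip the per-source view makes easy to attribute.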
Further down the page (marked [D] in Figure 1), the Processing Time timeline shows that batches are processed in about 20 milliseconds on average. Because the processing time is shorter than the batch interval (1 second in this example), the Scheduling Delay (defined as the time a batch waits for previous batches to finish processing, marked [E]) is almost zero: batches are processed as fast as they are created. Scheduling delay is the key indicator of whether a streaming application is stable, and the new UI makes it easier to monitor.
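To make the link between processing time, batch interval, and scheduling delay concrete, here is a minimal simulation. It is a simplified model under stated assumptions, not Spark's scheduler: batches are assumed to be created exactly once per batch interval and processed strictly one at a time, and the function name `scheduling_delays` is hypothetical.

```python
def scheduling_delays(batch_interval_ms, processing_times_ms):
    """Return the scheduling delay (ms) of each batch in a simplified model.

    Batch i is created at i * batch_interval_ms and can start only after
    every earlier batch has finished processing.
    """
    delays = []
    previous_finish = 0
    for i, processing_time in enumerate(processing_times_ms):
        created_at = i * batch_interval_ms
        # A batch starts when it exists and the previous batch is done.
        start = max(created_at, previous_finish)
        delays.append(start - created_at)
        previous_finish = start + processing_time
    return delays


# Processing time (20 ms) well below the batch interval (1000 ms): no delay.
print(scheduling_delays(1000, [20, 20, 20, 20]))

# Processing time (1500 ms) above the batch interval: the delay keeps growing.
print(scheduling_delays(1000, [1500, 1500, 1500]))
```

In this model the delay stays at zero as long as each batch finishes before the next one is created, and it grows without bound once processing consistently takes longer than the batch interval, which is why a rising Scheduling Delay timeline signals an unstable application.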
Looking at Figure 1 again, you may wonder why some of the batches toward the right took longer to complete (note [F] in Figure 1). You can analyze this easily through the UI. First, click the points in the timeline that have longer batch processing times; this takes you to the list of completed batches further down the page, which shows detailed information about each batch.