Using Performance Charts


Standard Charts

A good place to start troubleshooting issues is with Catchpoint's standard performance charts, since each chart combines related metrics gathered during testing. The following standard charts are available (availability varies depending on the type of test evaluated):

Response Times

This chart includes Response and Webpage Response times, as well as the Availability percentage. It offers a quick way to see whether there are any major issues. For instance, high values for both Response and Webpage Response time could result from a problem with the DNS server, connectivity to the host server, or the application itself. In this case, your next step might be to request the DNS chart (to see if the problem is DNS related), the Network chart (to see if it is connectivity related), or the Components chart (to review all components together). An expected value for Response time but a high value for Webpage Response time could indicate an issue with the elements referenced in the HTML of the test, so your next step might be to request the Webpage chart to examine the webpage metrics in closer detail.

DNS Performance

The DNS chart illustrates the time it took to establish a connection to the DNS server and compares it to the Server Response time for the webpage. A higher than expected DNS time with no increase in Server Response indicates an issue with the DNS server or between the Catchpoint node and the DNS server. If there is no change in DNS time, but the Server Response and TTFB are high, the problem may have been with the network or application. An appropriate next step would be to draw a Network chart to see if it is connectivity related, or a Components chart to examine all component-related metrics together, which allows you to measure DNS times against the other components.

Network Connectivity

The Network chart can help you confirm or rule out a problem with the network between the Node and the Server of the Primary URL. It is most useful when the Enable Ping Monitor option is turned on for a test; otherwise, the chart simply displays a subset of the metrics from the Components chart. The Network chart includes the Connect, Wait and Load times from the HTTP request to the Test's URL, as well as Ping Round Trip and % Ping Packet Loss. Ping metrics are particularly helpful in confirming or ruling out a network issue. An increase in packet loss or ping round trip time is a strong indicator that your test is experiencing network issues.

Request Components

The Components chart includes metrics that make up the HTTP transaction (DNS, Connect, SSL, Wait, Load, Response and # Test Errors). This view provides a quick way to troubleshoot DNS, Connectivity, and Application issues at the same time. A rise in all values usually indicates an overall network connectivity issue (where moving to the Network chart to see ping-related metrics is a logical next step), while rises in individual metrics would point to other issues.

Webpage Performance

This chart can quickly reveal problems related to webpage content and includes the following metrics: Response, Webpage Response, DOM Load Time, # of External Hosts, % Availability, % Content Availability, and # JS Errors per Page. # of External Hosts will be higher if the page includes tags that deliver dynamic content. If the chart shows that Content Availability is dropping, you should review the Waterfall data to identify which element is failing. # JS Errors per Page shows the average number of JavaScript errors, which could be an indicator of coding issues if the value is high.

Customizing Charts

While standard charts provide a great place to start with troubleshooting issues, it is often necessary to tailor charts to pinpoint a problem. There are two additional chart types to go along with the standard charts:

  • Scatterplot: The scatterplot chart displays the exact time measured at each node for a test during a given time period for a single metric. This shows you the raw data for each metric, instead of averaging the data into a single line. Each successful test is displayed as a blue dot, and each unsuccessful test is displayed as a red diamond.
  • Statistical Chart: The statistical chart analyzes a single metric using multiple statistical calculations. The statistical chart displays the following values: Average, Median, 99th percentile, 95th percentile, 85th percentile, 75th percentile, Standard Deviation, IQR, Geometric Mean and Geometric Standard Deviation. This allows you to analyze trends using different calculations, some of which may react more quickly to changes in performance.
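
To make the statistical values concrete, the following minimal Python sketch computes the same set of values from the raw samples of one metric, such as the points a scatterplot would show. It is an illustration only: the function name `summarize` is hypothetical, and the percentile interpolation method and the use of the sample (rather than population) standard deviation are assumptions, so results may differ slightly from what Catchpoint displays.

```python
import math
import statistics

def summarize(samples_ms):
    """Return the statistical values listed above for one metric's raw samples."""
    if len(samples_ms) < 2 or any(x <= 0 for x in samples_ms):
        raise ValueError("need at least two positive samples")
    data = sorted(samples_ms)
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    # The "inclusive" interpolation method is an assumption.
    pct = statistics.quantiles(data, n=100, method="inclusive")
    logs = [math.log(x) for x in data]
    return {
        "average": statistics.fmean(data),
        "median": statistics.median(data),
        "p75": pct[74],
        "p85": pct[84],
        "p95": pct[94],
        "p99": pct[98],
        "stdev": statistics.stdev(data),  # sample standard deviation (assumption)
        "iqr": pct[74] - pct[24],         # 75th percentile minus 25th percentile
        "geo_mean": statistics.geometric_mean(data),
        # Geometric standard deviation: exp of the standard deviation of the logs.
        "geo_stdev": math.exp(statistics.stdev(logs)),
    }

if __name__ == "__main__":
    # One slow outlier pulls the average far above the median and geometric mean.
    for name, value in summarize([100, 110, 120, 130, 2000]).items():
        print(f"{name}: {value:.1f}")
```

With a single slow outlier in the samples, the average rises sharply while the median and geometric mean barely move, which is why comparing several calculations can be more informative than the average alone.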

Once a chart is displayed, you can use the settings sections to refine the data. Performance chart settings allow you to adjust certain parameters like:

  • Time Frame: This defaults to the last hour, but you can view up to the last month of data, or use the calendar selectors to choose a range that does not end at the present. You can also specify the interval at which the data is represented.
  • Chart Dimension: Used for changing what parameters are represented by the x- and y-axes.
  • Breakdown: Allows you to split the chart into multiple graphs or split data into multiple lines on the graph. The breakdown has an additional option available when viewing a single metric that splits the first breakdown parameter into columns.
  • Statistical Values: Lets you analyze trends using different calculations, which may react more quickly to changes in performance.
  • Trim Timing Metrics: With this setting, you can view values in defined ranges, and you can choose to view "Only Failures" or "Only Successes".
  • Time Filter: Can be used to ignore certain times that may not be of interest.
  • IP Filter: Lets you focus on specific IP addresses.
  • Tracepoint Filter*: With this filter, you can choose which Insight tracepoints to leave out.
  • Location Filter: Filters out selected nodes to help identify problems in specific regions or ISPs.

Next to the test type drop-down selector, there are two buttons labeled "Host" and "Zone". When a chart has been drawn, you can use "Host" as a filter to select which hosts to display data from. Zone works similarly to Host, and the following metrics are available for Zone charts:

Webpage

The metrics on the Webpage menu examine the data collected for the zone during the test run.

  • WEBPAGE RESPONSE (MS) — the time of loading the requests of the zone, from start to end - not counting gaps where no requests occurred.
  • # CONNECTIONS — the number of TCP connections created by the requests in the zone.
  • BOTTLENECK TIME (MS) — number of milliseconds the requests of the zone were the only requests occurring.
  • % BOTTLENECK — the percentage of the page's Document Complete time for which the zone was a bottleneck.
  • # ITEMS (TOTAL) — the number of requests matching the zone.
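
The bottleneck metrics can be illustrated with a short Python sketch. It assumes each request is available as a hypothetical `(zone, start_ms, end_ms)` tuple taken from the waterfall, and it reads "the only requests occurring" as meaning that every request in flight during an interval belongs to the zone. This is an interpretation of the definitions above, not Catchpoint's actual implementation.

```python
def bottleneck_ms(requests, zone):
    """Milliseconds during which the zone's requests were the only ones in flight.

    requests: list of (zone_name, start_ms, end_ms) tuples for one test run.
    """
    # Every start or end time is a boundary; the set of active requests is
    # constant between two consecutive boundaries.
    bounds = sorted({t for _, start, end in requests for t in (start, end)})
    total = 0
    for lo, hi in zip(bounds, bounds[1:]):
        active = [z for z, start, end in requests if start <= lo and end >= hi]
        if active and all(z == zone for z in active):
            total += hi - lo
    return total

def bottleneck_pct(requests, zone, document_complete_ms):
    """Bottleneck time as a percentage of the page's Document Complete time."""
    return 100.0 * bottleneck_ms(requests, zone) / document_complete_ms

if __name__ == "__main__":
    run = [("cdn", 0, 100), ("ads", 50, 200), ("cdn", 250, 300)]
    print(bottleneck_ms(run, "cdn"))
    print(round(bottleneck_pct(run, "cdn", 300), 1))
```

In the sample run, the "cdn" zone overlaps with "ads" between 50 ms and 100 ms, so that interval does not count toward the bottleneck time of either zone.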

Availability

The metrics on the Availability menu examine any errors for the zone encountered during the test run.

  • % AVAILABILITY — percentage of the time where at least one request of the zone loaded successfully.
  • % CONTENT AVAILABILITY — percentage of the time where all of the requests for the zone loaded successfully.
  • # CONTENT LOAD ERRORS — the number of requests for the zone that failed.
  • # ZONE FAILURES — the number of times a host failed.
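
The difference between % Availability and % Content Availability is easiest to see with numbers. The Python sketch below is a rough illustration under stated assumptions: each test run is represented as a hypothetical list of booleans (one per zone request, `True` meaning the request loaded), "percentage of the time" is read as "percentage of test runs", and runs with no requests for the zone are ignored.

```python
def zone_availability(runs):
    """Compute zone availability figures from per-run request outcomes.

    runs: one list per test run; each list holds True/False per zone request.
    """
    runs = [r for r in runs if r]  # ignore runs in which the zone had no requests
    if not runs:
        raise ValueError("no runs contain requests for this zone")
    n = len(runs)
    return {
        # At least one request of the zone loaded successfully.
        "availability_pct": 100.0 * sum(any(r) for r in runs) / n,
        # All requests of the zone loaded successfully.
        "content_availability_pct": 100.0 * sum(all(r) for r in runs) / n,
        # Total number of failed requests across all runs.
        "content_load_errors": sum(r.count(False) for r in runs),
    }

if __name__ == "__main__":
    runs = [[True, True], [True, False], [False, False], [True, True]]
    print(zone_availability(runs))
```

A run with one failed and one successful request lowers % Content Availability but not % Availability, which is why the two values can diverge.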

Standard Visualizations

Visualizations provide different views and customization options for charts. They can be found directly to the right of the heat map chart icon. Below are the standard visualizations:

  • Data Bars — Provides a bar graph of the average value of the chosen metrics for each test
  • Historical KPIs — Provides a historical comparison of the average for the chosen metrics
  • Sparklines — Provides a line chart of the selected metrics over the chosen time frame
  • Table — Creates a table of time, test, and the chosen metrics

Custom Visualizations

Custom visualizations can be used to tailor standard visualizations or to create a brand new visualization. To create a custom visualization, follow the steps below:

  • Hover over the visualization chart tab
  • Select "Create Visualization" at the bottom
  • Name the custom visualization and enter a description
  • Select a chart icon
  • (Optional) Check off Draft, Raw Data, and/or Hide Summary Data
  • Select the preferred Dimension and Breakdowns
  • (Optional) On the right, select the library to include
  • Add script for the custom visualization
  • On the lower right corner, save the custom visualization

Troubleshooting

DNS Issues

If DNS values increase, but other metrics on the Components chart have not changed, this means that the issue lies with the DNS server. The first step is to identify whether the problem affects all nodes or only specific nodes.

  • If all nodes are experiencing a DNS increase, and your DNS servers are not distributed, you might be experiencing network problems or performance/load problems with your DNS servers.
  • If only certain nodes are experiencing problems, and your DNS servers are not distributed, there might be network issues between the nodes and your DNS servers.
  • If only certain nodes are experiencing problems and your DNS servers are distributed in different cities and data centers, there might be network issues between the nodes and certain DNS servers or there might be performance/load issues at certain DNS servers.
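
The three cases above amount to a small decision table. The following Python sketch encodes them under stated assumptions: the function names are hypothetical, per-node DNS times and baselines are supplied by you (for example, from a chart broken down by node), and the `factor` threshold of 2.0 is an arbitrary illustrative value rather than a Catchpoint setting.

```python
def affected_nodes(dns_ms_by_node, baseline_ms_by_node, factor=2.0):
    """Nodes whose DNS time exceeds `factor` times their own baseline."""
    return sorted(
        node for node, ms in dns_ms_by_node.items()
        if ms > factor * baseline_ms_by_node[node]
    )

def dns_hypothesis(dns_ms_by_node, baseline_ms_by_node, dns_distributed, factor=2.0):
    """Map the pattern of affected nodes to the likely causes listed above."""
    hit = affected_nodes(dns_ms_by_node, baseline_ms_by_node, factor)
    if not hit:
        return "no DNS increase detected"
    all_hit = len(hit) == len(dns_ms_by_node)
    if all_hit and not dns_distributed:
        return "network problems or performance/load problems with the DNS servers"
    if not all_hit and not dns_distributed:
        return "network issues between the affected nodes and the DNS servers"
    if not all_hit and dns_distributed:
        return "network issues to, or performance/load issues at, certain DNS servers"
    return "not covered by the cases above; investigate further"

if __name__ == "__main__":
    baseline = {"NYC": 20, "LON": 25, "TYO": 30}
    current = {"NYC": 22, "LON": 140, "TYO": 31}
    print(affected_nodes(current, baseline))
    print(dns_hypothesis(current, baseline, dns_distributed=False))
```

The output is only a starting hypothesis; the follow-up tasks in this section are still needed to confirm where the problem lies.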

In all of the above situations, you should perform the following tasks:

  1. View the Debug on Error data in the waterfall if the On Failure: Debug option is enabled for the affected tests. When this option is enabled, the system automatically runs the Ping, Trace Route and DNS Traversal monitors when the test experiences a failure or when Alerts are set up for the test using the Timing: DNS subtype.
  2. Run a DNS Traversal targeting the nodes experiencing the problem using the Instant Test tool. This will show you the location from which DNS is resolved, and the performance of the authoritative DNS name servers. Test results also display the results of ping packets sent to each of the servers, which can help you identify if there are network issues to the DNS.
  3. Run a traceroute to the DNS Servers targeting the nodes experiencing the problem using the Instant Test tool to see if there are network issues between the node and the DNS server.

Connectivity Issues

If the test is experiencing connectivity issues, the following metrics will show an increase in value above their normal averages: Connect, Wait, Load, Ping Round Trip, Throughput and/or % Ping Packet Loss. Connectivity issues can be caused by problems at the Host side, the Internet or the network route between the node and the host, or at the node. Once you have determined a network issue is occurring, you need to determine to which of the following the issue is related:

  • The host of the test URL
  • A general internet networking issue affecting the communication between the node and the host
  • Node connectivity issues

To investigate further, use the Breakdown by Location chart setting to determine which nodes are experiencing problems. Once you have narrowed down the list of nodes with connectivity problems, use the Instant Test tool to perform a traceroute to the test URL, targeting the nodes experiencing connectivity problems. You can also look at the waterfall data for one of the intervals experiencing the problem and review the Debug on Error data if available. This helps in situations where the problem occurred in the past but has since stopped. The data from the traceroute can help you identify where the connectivity problem resides. You can also perform a traceroute from the host server to the targeted node’s IP address to determine if there are any issues in that direction.

Application Issues

If your Components chart shows high values only for Wait and/or Load times, this may indicate a problem with the application handling the request. If your application, or a system the application communicates with in order to handle the request, is experiencing a problem, the first metric to be affected will be Wait time. Wait measures the time the agent waited to get a response back from the server after it connected to the server and sent the request. The main factor that affects this metric is the application that is processing the request. It is very unlikely that your tests would experience increased Load times without other results being affected. Generally, a spike in Load time indicates that the application is experiencing heavy volume and is not handling it properly. There can be several reasons for high Wait and Load times:

  • The application server is overloaded
  • The server is not handling request volume efficiently
  • The application is relying on a database, file server or other resource that is having performance issues
  • There is a bug in the code
  • There was a new application release, and the new release is slower in processing the request
  • The application or server is configured incorrectly

If Wait and Load times are increasing, you may want to look at a Waterfall chart to further investigate the metrics that have an impact on how the request is retrieved. You could also set up additional metrics using Catchpoint Insight to troubleshoot the issue further.

Investigating Application Issues with Catchpoint® Insight

If something is wrong with the application, you will need to investigate through your internal monitoring system. However, you could use Catchpoint Insight to retrieve KPI metrics about key factors that affect the response of your application to a request, such as CPU utilization, I/O, request volume, whether elements were cached or retrieved, or the time to get data from a resource (for example, a database, a file server or an API). For example, if CPU utilization is high, this may indicate an overloaded server. If the application issue is caused by an external application or server, or your application uses a distributed system, it may be difficult to identify which server or application is providing the data. You could use another feature of Catchpoint Insight, Tracepoints, to retrieve trace information about the applications and systems involved in handling the request, and review this data in the Waterfall chart. For example, you could retrieve the hostname of the web server handling the request and the name of the database to which that server connected to handle the request.

Webpage Content Issues

If you have determined that the performance problems are caused by the content of the webpage, you should look at the waterfall data for the time interval in which the issue occurred to examine each element individually. This can help identify whether the issue is caused by one of the following:

  • Certain requests or hosts are failing
  • Certain requests or hosts have poor performance
  • JavaScript errors are affecting performance

When certain requests or hosts are failing, or have slow response times, this can affect the webpage performance. This is especially problematic when the requests are external script calls. External script calls are considered blocking calls by most browsers, which causes the browser to only load this request. As a result, the webpage response is delayed by the amount of time that this request took. You can troubleshoot the request performance issue using the same tips described above for troubleshooting the main test URL:

  • If only DNS values are high, this could indicate an issue with the DNS server of the URL in question or between the Catchpoint node and the DNS server.
  • If only Connect values are high, this could indicate an issue with the host server of the particular URL.
  • If the Wait and/or Load values are high, this could indicate an application issue.
  • If all components spiked, this could indicate a network issue. To locate the issue, run a traceroute to the host in question using the Instant Test tool.
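
The rules above can be sketched as a simple classifier over the component timings of a single request. This Python example is illustrative only: the function name, the dictionary keys, and the `factor` threshold of 1.5 are assumptions, and the baseline values would come from your own historical data.

```python
COMPONENTS = ("dns", "connect", "wait", "load")

def classify_request(current_ms, baseline_ms, factor=1.5):
    """Suggest the likely problem area from which component timings are elevated."""
    high = {c for c in COMPONENTS if current_ms[c] > factor * baseline_ms[c]}
    if not high:
        return "no issue detected"
    if high == set(COMPONENTS):
        return "network issue: run a traceroute to the host"
    if high == {"dns"}:
        return "DNS issue: DNS server, or path between node and DNS server"
    if high == {"connect"}:
        return "host server issue"
    if high <= {"wait", "load"}:
        return "application issue"
    # Any other combination is not covered by the rules above.
    return "mixed: review the Waterfall data for this request"

if __name__ == "__main__":
    baseline = {"dns": 20, "connect": 30, "wait": 100, "load": 50}
    slow_app = {"dns": 21, "connect": 29, "wait": 900, "load": 55}
    print(classify_request(slow_app, baseline))
```

Combinations that match none of the rules are reported as mixed rather than forced into a category, since the guidance above does not cover them.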

Failures

When the % Availability metric is below 100%, this means that Catchpoint was unable to load the test properly. In this situation, you could use the Components chart to identify the type of failure or review the Waterfall chart to find the specific element on the page that failed. In addition to the component metrics, the Components chart includes a graph that displays the number of times that the test encountered the following failures: DNS, Connection, SSL, Response (indicated by 400 and 500 Response Codes) and invalid tests. This graph shows you the relationship between the different failures, which helps you identify the location and severity of the problem.

  • If your chart shows DNS and Connect failures, follow the steps for troubleshooting DNS and Connect performance issues as listed above. See Troubleshooting DNS Issues and Troubleshooting Connectivity Issues.
  • If your chart shows SSL failures, examine the Waterfall chart to identify the type of failure and the host server on which it occurred. Once you have identified the element(s) experiencing a failure, validate that everything is set up properly on the server(s) in question.
  • If your chart shows Response failures, examine the Waterfall chart to identify the element(s) experiencing a failure and the response code retrieved from the server for that element (see Failure Codes for a list of the possible response codes).
  • If your chart shows invalid tests, this means that the URL did not meet the limits set up by Catchpoint. For example, Catchpoint does not allow URLs that redirect more than 5 times in a row.

Footnote

*Note that filtering tracepoints that contain commas in their names is not supported. Catchpoint uses a comma-delimited list to filter tracepoints, so a comma in a name splits that tracepoint into two separate tracepoints.
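
The effect of the comma restriction can be shown with a few lines of Python. The function below is a hypothetical stand-in for any comma-delimited parser, not Catchpoint's actual code, and the tracepoint names are made up for illustration.

```python
def parse_tracepoint_filter(value):
    """Split a comma-delimited filter string into individual tracepoint names."""
    return [name.strip() for name in value.split(",") if name.strip()]

if __name__ == "__main__":
    # Two ordinary names are parsed as intended.
    print(parse_tracepoint_filter("db_host, cache_status"))
    # A single tracepoint named "Region,EU" is indistinguishable from two names.
    print(parse_tracepoint_filter("Region,EU"))
```

Because the parser cannot tell a comma inside a name from a separator, a tracepoint whose name contains a comma cannot be matched by the filter.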