To have the sampling driver installed, you need to restart the install process under the root account or contact your administrator. You can also create a custom analysis type based on the hardware event-based sampling collection. You may need kernel header sources and other additional software to build and load the kernel drivers on Linux. Unless data is being collected by the VTune Amplifier, there will be no latency impact on system performance. When the VTune Amplifier collects an event, it attributes not only that event but the entire sampling interval prior to it (often 10,000 to 2,000,000 events) to the current code context. Results may vary depending on the nature of the analysis and the code to which it is applied.
|Published (Last):||17 August 2015|
You can also purchase the full version there. In this VTune Amplifier tutorial, we examine only Algorithm Analysis, which is intended for analyzing how an application works at the algorithm level.
There is also the power efficiency analysis. You can find more details on the official Intel website.
Usage Example
Hotspots
This type of analysis is intended to identify the most time-consuming parts of the source code. We used the Hotspots analysis to find the cause of this behavior. After launching the server under the built-in Hotspots analyzer, the results clearly show that the time is mostly consumed by the srv::CSocketTransport::Receive functions. On the Tasks and Frames tab we can see the distribution of the CPU load across threads: the load was mostly caused by the first three threads.
VTune allows filtering by thread so that a single thread can be examined in detail. In our case, we examine threadstartex 0xb90. VTune also allows selecting a part of the graph and provides the stack information for that part.
Thus only this part is displayed in the graph. The stack for this part is shown to the right of the graph. By default, VTune displays the part of the code whose execution consumes the most time. In our case, the time is spent in the WSARecv function, waiting for a response from the client.
Thus we can conclude that the problem is on the client side. For the client-side analysis, we use the remote analysis tools provided by VTune. First we need to create an installation package for the remote computer. To see an example of launching an analysis from the command line, select the required analysis type on the local machine (Hotspots in our case) and click the CommandLine button in the bottom right. A dialog window opens that contains the command for launching this type of analysis with the current settings.
You can have a look at the command line interface by following this link. To examine a function in detail, open the Bottom-up tab, which lists the most time-consuming functions. While a big file is being written, if its parts do not follow each other in sequence, then according to the application logic the whole cache is discarded.
But in our case, it turned out that almost none of the parts followed each other in sequence, and the cache size was 8 MB. Because of this, on each attempt by the system to write 64 KB, the client was sending 8 MB. Most of the time was consumed by the allocation of the buffer, which became clear from the results of the VTune analysis. Thus, with the help of the Hotspots analysis, we were able to locate an error in the logic of our application.
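The effect described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the code of the application: the `WriteCache` class, its methods, and the rule that an out-of-sequence part makes the client send the whole cache buffer are hypothetical and inferred from the description. Only the 64 KB write size and the 8 MB cache size come from the text.

```python
CHUNK = 64 * 1024              # size of one write request (from the text)
CACHE_SIZE = 8 * 1024 * 1024   # client cache size (from the text)


class WriteCache:
    """Hypothetical model of the client-side write cache."""

    def __init__(self, cache_size=CACHE_SIZE):
        self.cache_size = cache_size
        self.next_offset = None  # offset that would continue the sequence
        self.buffered = 0        # bytes waiting in the cache
        self.bytes_sent = 0      # total bytes sent to the server

    def _send(self, nbytes):
        self.bytes_sent += nbytes
        self.buffered = 0

    def write(self, offset, size=CHUNK):
        if self.next_offset is not None and offset != self.next_offset:
            # Assumption: an out-of-sequence part makes the client send
            # the whole cache buffer, however little data it holds.
            self._send(self.cache_size)
        self.buffered += size
        if self.buffered >= self.cache_size:
            self._send(self.buffered)   # cache is full: send it normally
        self.next_offset = offset + size

    def close(self):
        if self.buffered:
            self._send(self.buffered)   # send whatever is left


def bytes_sent_for(offsets):
    """Return the total bytes sent when parts are written at these offsets."""
    cache = WriteCache()
    for offset in offsets:
        cache.write(offset)
    cache.close()
    return cache.bytes_sent


parts = 128  # 128 parts of 64 KB make up an 8 MB file
sequential = [i * CHUNK for i in range(parts)]
scattered = list(reversed(sequential))  # no part follows the previous one

payload = parts * CHUNK
print(bytes_sent_for(sequential) / payload)  # amplification for ordered parts
print(bytes_sent_for(scattered) / payload)   # amplification for scattered parts
```

In this model, ordered parts are sent once, while scattered parts trigger a full 8 MB transfer for nearly every 64 KB write, which is the kind of imbalance a Hotspots run would surface as time spent preparing and sending the buffer.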
Lightweight Hotspots
This type of analysis is the most effective for code with a great number of small but frequently called functions. Unlike the Hotspots analysis, it is less resource-consuming. The reduced overhead allows a shorter sampling interval than with the Hotspots analysis (1 ms, for example). We will use this type of analysis on a small test application. This application calculates the Nth Fibonacci number.
But it was slightly altered for test purposes. After making all the preparatory operations and launching the Lightweight Hotspots analysis for the application, we can see that VTune found the modification easily.
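The test application itself is not shown, so the following is only an illustrative sketch of the kind of program this analysis targets. The names `fib`, `fib_altered`, `normalize`, and `timed` are hypothetical, and the redundant `normalize` call is an assumed stand-in for the alteration, which the article does not describe.

```python
import time


def normalize(x):
    """Hypothetical stand-in for a small helper that is called very often.

    For non-negative x it returns x unchanged, but only after doing
    work proportional to the size of the number.
    """
    return x & ((1 << x.bit_length()) - 1)


def fib(n):
    """Return the Nth Fibonacci number (fib(0) == 0, fib(1) == 1)."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def fib_altered(n):
    """Same result as fib, with redundant normalize calls on every step."""
    a, b = 0, 1
    for _ in range(n):
        a, b = normalize(b), normalize(a + b)
    return a


def timed(func, n):
    """Return (result, elapsed seconds) for one call of func(n)."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start


n = 20000
base_result, base_time = timed(fib, n)
altered_result, altered_time = timed(fib_altered, n)
print(f"fib: {base_time:.4f} s, fib_altered: {altered_time:.4f} s")
```

Both functions return the same value; only the time differs. A sampling profiler with a short interval is well suited to attributing that difference to a small helper that is cheap per call but invoked on every iteration.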
But there is one more frequently called function: the BigInteger::Normalize class method. But in order to get exact information on the difference in execution time, we will use the result comparison tool integrated into VTune.
We received the following results.
Concurrency Analysis
This type of analysis shows how the application uses the available logical processors and helps find potential candidates for parallelization.
As an example, we used an application that creates a certain number of files with a set size in a folder. This type of analysis is launched just like the previous ones, so we will get right to the VTune results. VTune distributes the CPU load according to the following modes:
- Idle: all the processors are in standby mode, and no process is executed.
The overall results show that the application works ineffectively. To improve the processor load, we add another thread to the application. After launching the analysis one more time, we can see that the application uses the available resources more effectively after the modification. Thus, using the Concurrency analysis, you can see how effectively your application uses the available resources.
Locks and Waits
This type of analysis is meant for revealing one of the most widespread reasons for ineffective use of parallelism in an application: the incorrect use of synchronization objects.
This type of analysis was used to solve the problem with the slowdown of the upload speed to the server (the server is the one used in the Hotspots example). The slowdown occurred after a thread pool was added for the operation of writing to the server.
After launching the Locks and Waits analysis, the only unknown object in the Top Waiting Objects list was CriticalSection 0xac. The objects listed above it are the sync objects for the threads that were used in the server before the new feature was introduced.
To deal with the critical section, we move to the Bottom-up tab. The results show that the idle time in this critical section corresponds to the writing operation. After searching the stack, we found out that this section was locked right at the start of a new thread's execution. After analyzing the whole stack of the thread, we found out that the locking of the section could be moved to the top of the stack. After launching the Locks and Waits analysis for the modified version once more, we can see that CriticalSection 0xac is absent from the Top Waiting Objects list, and the upload to the server became much faster.
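One reading of this fix is that the lock was narrowed from the thread's entry point to the innermost call that actually needs it. The sketch below illustrates that reading only; it is not the server code. The `run` function, the `WRITE_TIME` constant, and the use of `time.sleep` to stand in for a write are all assumptions made for illustration.

```python
import threading
import time

WRITE_TIME = 0.01  # simulated duration of one write in seconds (assumed)


def run(workers, writes_per_worker, narrow_lock):
    """Run the simulated writers; return (completed writes, elapsed seconds)."""
    lock = threading.Lock()
    state = {"written": 0}

    def worker():
        for _ in range(writes_per_worker):
            if narrow_lock:
                time.sleep(WRITE_TIME)      # slow write runs outside the lock
                with lock:                  # lock guards only the shared counter
                    state["written"] += 1
            else:
                with lock:                  # lock held for the whole operation
                    time.sleep(WRITE_TIME)
                    state["written"] += 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["written"], time.perf_counter() - start


wide_count, wide_time = run(4, 5, narrow_lock=False)
narrow_count, narrow_time = run(4, 5, narrow_lock=True)
print(f"wide lock: {wide_time:.3f} s, narrow lock: {narrow_time:.3f} s")
```

With the wide lock the threads serialize on every write, so a thread pool adds waiting instead of throughput. With the narrow lock the slow part overlaps across threads, and the synchronization object no longer dominates the wait time.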
Thus the Locks and Waits analysis allows effectively tuning a multithreaded application.
Intel VTune Amplifier alternatives
And now for a part that is unusual for a VTune tutorial. While the software is a good profiler, not everyone can afford it, which is why we did some research to find other utilities for application performance analysis. As VTune integrates into Visual Studio, we examined the profilers that work in this environment. Some alternatives were found.
The first and, perhaps, the best alternative is the profiler integrated into Visual Studio. It provides 4 types of analysis:
- CPU sampling: estimates the CPU load;
- Instrumentation: counts the number of function calls and their execution time;
- .NET memory allocation: tracks the allocation of memory;
- Concurrency: examines the sync objects.
You can find more details on this profiler here.
The second option we found is CodeAnalyst Performance Analyzer.
Conclusion
After analyzing the main features of VTune, we can say that it has an easy-to-use interface and very rich functionality.
This article is just a short VTune tutorial, and not all features were considered. To study the product in detail, we recommend reading the information on the official website. We can conclude that with VTune you can easily create an optimal configuration for your application.