
Introduction to performance optimization using Intel SW tools - Intuit test answers

Correct answers are highlighted in green.
All answers: The course concentrates mostly on application performance improvements with Intel Compiler and VTune Amplifier.
The Control Unit functions are
(1) instruction decoding
(2) ALU operating
(3) data transfer
(4) instruction execute
(5) ALU startup
What is VTune™ Performance Analyzer for?
(1) optimizing applications
(2) analyzing application performance
(3) decreasing application size
(4) speeding up application compilation
What compilers does Intel® provide?
(1) C/C++
(2) Java
(3) Oberon
(4) Forth
(5) Fortran
(6) C#
What is the Loop Stream Detector for?
(1) for dividing big loops into a set of small ones
(2) for eliminating the loops
(3) for making small loops run faster
(4) for eliminating instruction fetching and decoding for small loops
Loop vectorization is
(1) changing scalar operations to vector ones
(2) raster to vector image conversion
(3) internal processor function
(4) compiler optimization
What is the processor core?
(1) a part of the processor fetching and executing instructions
(2) a part of the processor working with shared cache
(3) a part of the processor executing arithmetical operations
OpenMP is:
(1) compiler option linked to output binary
(2) technology to execute code on multiple computers
(3) technology for parallel computing on the shared memory systems
(4) technology for parallel computing on the distributed memory systems
Which of the following could be considered good programming style?
(1) unconditional jumps
(2) loop exit operators
(3) modularity
(4) labels
(5) short variable names
(6) none of the answers
What disadvantage does a static profiler have?
(1) working only with static variables
(2) ignores pointer variables
(3) does not support the full type system
(4) imprecise branch prediction
(5) it is necessary to run static profiler for the second time before the program execution
The system bus is used for
(1) data transfer
(2) CPU parts interconnection
(3) command execution
(4) data storage
VTune supports:
(1) only C/C++
(2) only languages supported by the Intel compiler
(3) only languages supported by Microsoft Visual Studio
(4) none of the answers
Internal representation is
(1) assembler code
(2) a graph in most cases
(3) object-file code
Loop invariant code motion
(1) finding and deleting expressions independent of the iteration variables
(2) finding expressions independent of the loop iteration variables and moving them outside the loop
(3) moves loop invariants to another loop
(4) none of the answers
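As an illustration of loop invariant code motion, here is a minimal C sketch of what a compiler may do automatically. The function names `fill_naive` and `fill_hoisted` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Before: scale * offset is recomputed on every iteration,
   although neither operand changes inside the loop. */
void fill_naive(double *out, size_t n, double scale, double offset)
{
    assert(out != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        out[i] = (double)i + scale * offset;
}

/* After: the invariant product is evaluated once, before the loop. */
void fill_hoisted(double *out, size_t n, double scale, double offset)
{
    assert(out != NULL || n == 0);
    const double invariant = scale * offset;
    for (size_t i = 0; i < n; ++i)
        out[i] = (double)i + invariant;
}
```

Both functions are meant to fill `out` with the same values; the second one simply performs the multiplication once instead of `n` times.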
SIMD is:
(1) a computation principle that provides data parallelism
(2) instruction system for multiuser access
(3) "simple data multiple instructions"
(4) a type of computer memory
(5) "Single instruction multiple data"
Choose the characteristic corresponding to distributed memory systems:
(1) Each processor is completely autonomous. There is a communications medium.
(2) All processors are equidistant from the memory. Communication with the memory via a common data bus.
(3) Memory is physically distributed among processors. Single address space is supported at the hardware level
OpenMP uses the following model of parallel execution:
(1) refork model: each time a parallel loop starts, new threads are created
(2) fork-join model: all threads are created at the first parallel region and then used
(3) queue model: each parallel task is queued; different executors are taking tasks from this queue concurrently
What is variable scope?
(1) part of the program where no such name can be defined
(2) memory range that can be used as a pointer value
(3) variable value range
(4) context where this variable is defined
(5) basic block where this variable is defined
Dynamic profiler benefits are
(1) statistics are not used
(2) more precise branch prediction
(3) basic block dynamic estimation
(4) more optimizations are available
CPU timer speed is
(1) frequency of the timer which provides the synchronization inside the processor
(2) value of the processor clock frequency
(3) speed of the shortest instruction
(4) minimal quantum of computation
(5) frequency of the synchronization pulses sent by the timer
What analysis types are included in VTune?
(1) Hotspots
(2) Locks and Waits
(3) Valgrind
(4) Concurrency
Data flow analysis is
(1) analysis of bus data transfer
(2) method to avoid data loading to cache memory
(3) collecting information on variable's values
What optimization is inverse for loop fusion?
(1) loop unswitching
(2) loop unfolding
(3) loop interchange
(4) loop unrolling
(5) loop distribution
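To make the relation between loop fusion and loop distribution concrete, here is a small C sketch. The function names are hypothetical, and the arrays `a`, `b` and `c` are assumed not to overlap, which is what makes both forms equivalent here.

```c
#include <assert.h>

/* Fused form: one loop performs both updates. */
void scale_and_shift_fused(const double *c, double *a, double *b, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        a[i] = c[i] * 2.0;
        b[i] = c[i] + 1.0;
    }
}

/* Distributed form: the loop body is split into two separate loops. */
void scale_and_shift_distributed(const double *c, double *a, double *b, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        a[i] = c[i] * 2.0;
    for (int i = 0; i < n; ++i)
        b[i] = c[i] + 1.0;
}
```

Going from the first function to the second is distribution; going back is fusion.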
What is vector instruction for the compiler?
(1) adding vectors
(2) vector multiply by matrix
(3) vector folding
(4) vector subtraction
(5) vector power
What qualities do distributed memory systems have?
(1) good scalability
(2) different latency for the different parts of memory
(3) good inter processor communication
(4) the high cost of cache subsystem synchronization
What can be done to preserve the last value of a variable in the master thread after the parallel block?
(1) add this variable to the private set
(2) add this variable to the lastprivate set
(3) add this variable to the lastshared set
(4) this kind of behavior is not allowed by the technology
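A minimal sketch of the `lastprivate` clause in C with OpenMP follows. The function name `last_square` is hypothetical. If the code is compiled without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Returns the value assigned to `last` by the sequentially last iteration.
   lastprivate copies that iteration's private value back to the original
   variable after the parallel loop ends. */
int last_square(int n)
{
    assert(n > 0);   /* with zero iterations the value would be unspecified */
    int last = -1;
    #pragma omp parallel for lastprivate(last)
    for (int i = 0; i < n; ++i)
        last = i * i;
    return last;
}
```

With a plain `private(last)` clause instead, the value of `last` after the loop would not be defined.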
What are the disadvantages of procedure-level optimizations?
(1) they could be used for procedures and not for functions
(2) only constant length arrays are used
(3) can't be used in highly loaded functions
(4) every function call is a "black box" for them
(5) can't work with record types
Dynamic data is useful when
(1) loop iterator changes frequently
(2) loop iterator value can be out of data type range
(3) you need to create a large data structure whose size is unknown at compile time
(4) you need to store an integer and a string in the same variable
(5) you have no run statistics
Which of the following will not cause any change in processor performance?
(1) CPU clock
(2) branch prediction quality
(3) operating system
What are locks and waits for?
(1) detecting critical memory parts and corresponding critical code parts for each piece of memory
(2) collecting thread wait count and time
(3) collecting function call sequence
SSA-form is
(1) representation where each function is executed only once
(2) representation where each variable is used only once
(3) representation where each variable is set only once
(4) final representation after whole set of optimizations
What is loop peeling?
(1) optimization transforming a loop into two or more loops
(2) optimization that tries to simplify a loop by detaching useless iterations
(3) optimization that tries to simplify a loop by detaching extreme iterations
What size do xmm registers have?
(1) 16 bit
(2) 32 bit
(3) 64 bit
(4) 128 bit
(5) 256 bit
What disadvantages do distributed memory systems have?
(1) slow inter processor communication
(2) poor scalability
(3) the high cost of cache subsystem synchronization
(4) different latency for the different parts of memory
nowait directive is used for:
(1) to avoid "Press any key to exit" at the end of the program
(2) to tell the compiler to start executing before init ends
(3) to disable synchronization at the end of the loop
(4) to disable additional delays in the threads
What is a node in a call graph?
(1) program and the time of the call
(2) program without a time of the call
(3) variable of the program
(4) function of the program
(5) basic block
Choose the correct statement(s)
(1) Register count is usually much greater than the number of variables in the program
(2) Brush painting method is used for register allocation
(3) One of the basic code generator's tasks is register allocation
Superscalar is
(1) processor, specialized for scalar operations
(2) a processor that executes more than one operation per clock tick
(3) none of the answers
What may cause ineffective resource utilization?
(1) resource contention
(2) bottlenecks
(3) a large number of data dependencies
(4) sequential command execution
May one compiler have two different Front Ends?
(1) no
(2) only when both of them are for the same language
(3) only when Petrov criteria is satisfied
(4) only for two, not for three
(5) yes
How is loop unrolling performed?
(1) a big loop is divided into a large number of small sequential loops
(2) by grouping small iterations into one big iteration
(3) by substituting functions at the call sites
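The idea of grouping several small iterations into one larger iteration can be sketched in C as follows. The function names and the unroll factor of 4 are assumptions made for this example; a compiler chooses its own factor.

```c
#include <assert.h>

/* Straightforward loop: one element per iteration. */
long sum_rolled(const int *x, int n)
{
    assert(n >= 0);
    long s = 0;
    for (int i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Unrolled by 4: each iteration of the main loop handles four elements,
   so the loop counter is updated and tested four times less often.
   A remainder loop handles the last n % 4 elements. */
long sum_unrolled4(const int *x, int n)
{
    assert(n >= 0);
    long s = 0;
    int i = 0;
    for (; i + 3 < n; i += 4)
        s += (long)x[i] + x[i + 1] + x[i + 2] + x[i + 3];
    for (; i < n; ++i)
        s += x[i];
    return s;
}
```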
What is packed data type?
(1) data type without zero bits
(2) data packed by Huffman
(3) a special type used for archiving
(4) vector component data type
(5) scalar forming data type
What are the advantages of multithreaded applications?
(1) memory amount is decreased in proportion to the core count
(2) computational resources are increased in proportion to the core count
(3) processor instruction set is decreased
What directive is used to avoid incorrect concurrent usage of the lval variable?
(1) SEMAPHORE
(2) TRANSACTION
(3) ATOMIC
(4) CHECK SHARED
(5) CONTROL SHARED
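As a sketch of how a shared variable can be protected from concurrent updates in OpenMP, the following C function counts even numbers in parallel. The function name `count_even` is hypothetical; `lval` matches the variable name used in the question. Without OpenMP support the pragmas are ignored and the code runs serially.

```c
#include <assert.h>

/* Counts the even elements of x. The increment of the shared counter
   is marked atomic so that two threads cannot update it at once. */
long count_even(const int *x, int n)
{
    assert(n >= 0);
    long lval = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        if (x[i] % 2 == 0) {
            #pragma omp atomic
            lval += 1;
        }
    }
    return lval;
}
```

For a simple counter like this, a `reduction(+:lval)` clause would be an alternative design that avoids the per-update synchronization.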
Static call graph is
(1) program calls on representative data set
(2) call graph without dynamical variables
(3) call graph, built at compilation of the program
(4) call graph, built statically
(5) no such notion exists in this course
Register allocation includes
(1) identification of variable live ranges
(2) mark the most notable registers
(3) sort registers by size
(4) allocate memory for register block
What is used to send data between the processor and the memory or between the processor and the devices?
(1) system registers
(2) ALU
(3) system bus
(4) RAM
What is a part of syntax analysis in the compiler?
(1) grammar analysis
(2) puncting analysis
(3) polymorphical analysis
(4) lexical analysis
(5) protosyntax analysis
Choose the correct statements for the code:
DO I=1,N
S1  A(I) = B(I) + 1
S2  B(I+1) = A(I) - 5
END DO
(1) there is the dependency - <S1,S2>
(2) there is loop dependency - <S2,S1>
(3) there is data dependency inside the loop
(4) there is control dependency inside the loop
(5) there are no dependencies inside the loop
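One way to see the dependencies in the loop above is to translate it to C and run the iterations in two different orders. This is only an illustrative sketch: the function names are hypothetical, indices are shifted to start at 0, and `b` is assumed to hold `n + 1` elements.

```c
#include <assert.h>

/* Iterations in the original order. S2 writes b[i + 1],
   which S1 reads on the next iteration. */
void run_forward(int *a, int *b, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        a[i] = b[i] + 1;       /* S1 */
        b[i + 1] = a[i] - 5;   /* S2 */
    }
}

/* Same statements, iterations in reverse order. */
void run_backward(int *a, int *b, int n)
{
    assert(n >= 0);
    for (int i = n - 1; i >= 0; --i) {
        a[i] = b[i] + 1;       /* S1 */
        b[i + 1] = a[i] - 5;   /* S2 */
    }
}
```

Starting from a zero-filled `b` with `n = 3`, the forward order gives `a = {1, -3, -7}` while the reverse order gives `a = {1, 1, 1}`. The results differ because S1 depends on the value S2 produced in the previous iteration, so the iteration order cannot be changed freely.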
What is /Qvec-report used for?
(1) to drive vectorization during the compilation
(2) to report vectorization at the execution time
(3) to report vectorization at the compile time
(4) to drive vectorization heuristics
What is the purpose of automatic parallelization?
(1) to utilize multi-processor resources without rewriting code
(2) to do a better job than manual parallelization
(3) to free programmer from the difficult and tedious manual parallelization
What option is used to control how iterations are distributed among threads?
(1) SCHEDULE
(2) TT
(3) FORALL
(4) THREADS
This command line parameter is used to enable inter-file optimization
(1) /Qipf
(2) /Qmulti-file
(3) /Qipo
(4) /Om
How are data dependencies used in code generation?
(1) to create missing parts of the code
(2) to determine whether registers can be reused
(3) to take interference functions to a separate graph
(4) to find code that does not match the representative data set
The ability to perform multiple operations per clock tick is
(1) vectorization
(2) hardware prefetch
(3) superscalarity
(4) pipeline
(5) hyperthreading
By what criterion are statements linked into a list inside the Intel compiler?
(1) previous and next
(2) minimal work principle
(3) next lexeme principle
(4) equivalence principle
(5) principle of non-equivalence
When is the dependency <S1,S2> an anti-dependence?
(1) S1 X=… S2 …=X
(2) S1 …=X S2 X=…
(3) S1 X=… S2 X=…
What are necessary conditions for auto-parallelization?
(1) the absence of dependencies within the loop
(2) the absence of the loop nesting
(3) special manual preparation is always required
What is alias analysis?
(1) analysis when the information from different procedures is aliased
(2) analysis aliased to build
(3) search for the variables that could point to the same memory cell
(4) search for the functions aliased temporary
(5) search and analysis of the functions, executed in parallel
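The following C sketch shows why a compiler needs alias analysis before it can simplify code that works through pointers. The function name `add_twice` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Adds *b to *a twice. If a and b may point to the same memory cell,
   the compiler cannot rewrite this as *a += 2 * (*b), because the
   first addition may change the value that the second one reads. */
void add_twice(int *a, const int *b)
{
    assert(a != NULL && b != NULL);
    *a += *b;
    *a += *b;
}
```

With distinct objects `x = 1` and `y = 2`, the call `add_twice(&x, &y)` leaves `x == 5`. With one object `z = 1`, the call `add_twice(&z, &z)` leaves `z == 4`, not the `3` that the simplified formula would produce.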
How can structure field reordering affect application performance?
(1) by simplifying the loops
(2) by making results less precise
(3) by decreasing cache misses
(4) by reducing the number of conditional branches
Hardware prefetching is used for
(1) parallel calculations
(2) predicting the address of required data and loading it into the cache
(3) increasing the bandwidth of the processor
Basic blocks are
(1) blocks of the visual program constructor
(2) code that performs most of the computations
(3) code without jumps and labels
(4) entrance program code
(5) entrance function code
What is an iteration vector?
(1) an integer vector whose components represent the iteration variable values in order of loop nesting
(2) an integer vector whose components represent the iteration variable values in increasing order
(3) an integer vector whose components represent the iteration variable values after the iterations
What directive will force the compiler to parallelize the following loop?
(1) #pragma concurrent call
(2) #pragma concurrentize
(3) #pragma prefer serial
(4) #pragma serial
What is inlining?
(1) variable substitution
(2) common expression substitution
(3) global equation substitution
(4) function body substitution
When a linked list is stored in memory
(1) objects may be placed non-sequentially
(2) elements can be destroyed at any time by the garbage collector
(3) list cannot contain more than five elements
In out-of-order execution, instructions are scheduled according
(1) to their order in pipeline
(2) to the evaluation of their operands
(3) to the branch prediction
Choose the correct statements
(1) a statement is a minimal independent unit of the programming language
(2) a program is a sequence of statements
(3) a variable is a minimal statement
(4) statements can be arranged lexically and by the data flow graph
(5) a statement is a tree of statements
(6) a statement consists of expressions
What is constant folding?
(1) replacing a vector with a scalar
(2) one of the scalar optimizations
(3) the inverse operation of vectorizing unfolding
(4) decreasing the iteration dimension
What is OpenMP?
(1) is a software interface that supports multi-platform programming for multiprocessor computation systems with shared memory
(2) is a software interface that checks parallelization correctness
(3) is a software interface to estimate parallelization profitability
What is memory disambiguation?
(1) search for objects that may overlap in memory
(2) search for objects that take too much memory
(3) search for objects that may be uninitialized
The number of ticks required to transfer one unit of data from memory is
(1) latency
(2) system clock
(3) bandwidth
Control flow graph
(1) determines the order of the statements in the source program
(2) determines all paths that would be taken during computation
(3) determines possible ways of control passing from one block to another
(4) determines all possible ways of control passing
What is FLOW dependency?
(1) READ after WRITE
(2) WRITE after WRITE
(3) READ after READ
(4) WRITE after READ
How is parallelization implemented in the Intel compiler?
(1) Multiple instances of the loop thread function are executed in different threads with different boundary values
(2) The loop iteration space is divided into several parts and each part is given to a separate thread
(3) Multiple function instances are run in the same thread with different boundary values
(4) The iteration space is divided into parts, and these parts are processed sequentially
What is -ansi-alias for?
(1) to enable ANSI aliasing in optimization
(2) to disable ANSI aliasing in optimization
(3) to allow some more aggressive optimizations
Tree of expressions is
(1) paragraph in language manual
(2) a short way to define a language syntax
(3) short identifier to remind the meaning of the statement
(4) computation notation
(5) a tree with exact lexemes at the leaves
How can prefetch be invoked?
(1) with a compiler directive
(2) with a compiler option for automatic instruction placement
(3) using an intrinsic
(4) by buying and installing a special prefetch program
How does the compiler determine when it is better to perform inlining?
(1) only by compiler directives and source code pragmas
(2) only by directives
(3) it is always performed unless disabled by options
(4) there is a heuristic method for that
(5) inlining is not performed unless requested by pragmas
Operations in an expression tree
(1) do not exist
(2) cannot be placed at leaves
(3) must be deleted at the generation phase
(4) must be complete lexemes
What is used to suggest a function for inlining?
(1) command line option -Qinline<func>
(2) pragma #pragma inline before the body of the function
(3) keyword forceinline in function definition
(4) keyword inline in function definition
SSA is
(1) SSE Alignment
(2) Simple Singles Alignment
(3) Static Single Assignment
(4) Sign Standard Association
What is function cloning?
(1) create clone-function, executed at the other device
(2) create clone-function, executed on built-in video chip
(3) copying function body and using separate optimizations for new instances
(4) parallel execution on cluster and super-computer
Which of the following is required to preserve computational equivalence?
(1) equivalent input leads to equivalent output
(2) instruction scheduling is independent of the input data
(3) results obtained in the same order
(4) computations use the same processor-dependent instruction set
What are the functions of the ALU?
(1) instruction decoding
(2) drive itself
(3) data transfer
(4) arithmetic operations
(5) device interconnection
What kind of information is obtainable via VTune?
(1) where the time is spent
(2) why the program is not effective
(3) where the dead code is
(4) where the code has wrong formatting
(5) where to improve the code
What platforms are supported by Intel compilers?
(1) Windows
(2) Linux
(3) MacOS X
(4) FreeBSD
(5) Solaris
What is required for most of the loop optimizations
(1) definite total iteration count
(2) no jumps outside the loop
(3) no if operators inside the loop
(4) no unknown function calls inside the loop
MMX technology provides:
(1) a set of instructions to operate packed integer data types
(2) program package for multimedia
(3) additional registers
(4) fast floating point operation set
(5) additional processor module for audio and video conversion
What seriously limits modern system performance?
(1) memory amount
(2) processor clock rate
(3) memory access speed
(4) existence of a level 2 cache
For parallelization it is required to:
(1) use compiler command line option -Qomp. Compiler will choose regions to parallelize
(2) code could be enclosed by #pragma omp parallel start
(3) code for parallelization has to be moved to a separate function marked with the intrinsic __omp_parallel
(4) parallel code has to be grouped into blocks that start with #pragma omp parallel
Which of the following could be considered bad programming style?
(1) loop constructions
(2) global variable usage
(3) long function names
(4) if operators
(5) none of the answers
What is the source for branch prediction in a static profiler?
(1) input data
(2) basic block analysis
(3) interprocedural analysis
(4) inlining-analysis
(5) run statistics
What is CPU speed?
(1) number of data transfers
(2) average command execution time
(3) bus data transfer speed
(4) number of tasks, executed concurrently
What operating systems does VTune support?
(1) OS/2
(2) VAX-VMS
(3) PDP-11
(4) Windows
(5) Linux
Expression is
(1) assignment
(2) expression tree
(3) constant
(4) variable
Why is performance improved when an invariant is moved out of the loop?
(1) because invariant code is meaningless and can be deleted
(2) because it is more effective to calculate it in another loop
(3) because these values stay unchanged in each iteration and can be evaluated only once, outside the loop
Which of the following command line options will build a binary for any processor?
(1) -QxSSE4_1
(2) -arch:SSE3
(3) -QaxSSE3_1
(4) -QxSSE3
(5) -arch:SSE2_2
(6) -QaxSSE4_2
(7) -QxSSE2
Choose the characteristic corresponding to shared memory systems:
(1) Each processor is completely autonomous. There is a communications medium
(2) All processors are equidistant from the memory. Communication with the memory via a common data bus
(3) Memory is physically distributed among processors. Single address space is supported at the hardware level
What pragma is used to parallelize loop:
(1) #pragma omp parallel for
(2) #pragma omp parallel while
(3) #pragma omp single
(4) #pragma omp set parallel for
What benefits does correct code formatting give?
(1) inlining is faster
(2) variable scopes are separated
(3) it is easier to read source code
(4) compiler optimizations are simplified
(5) none of the answers
What is required for dynamic profiling
(1) collect run statistics
(2) temporary remove all static variables
(3) add heap freeing procedure to the beginning of the program
(4) choose the most representative input data
(5) check program code for inlining
Choose the correct statement
(1) the hardware prefetch mechanism tries to guess the memory access pattern and load the data before it is actually accessed
(2) the caching technique uses the spatial locality principle
(3) cache aliasing occurs when the data placement is good and registers are loaded without any instructions
What are the functions of Hotspots analysis?
(1) detecting places with potentially ineffective code
(2) showing thread activity
(3) showing microarchitecture problems
(4) provides binary instrumentation of the user program
Set Uses[b] contains:
(1) variables, defined inside a block
(2) variables that are used in the block but have no definitions within it
(3) definitions that reach block "b"
Why can performance increase after loop distribution?
(1) because of memory reference improving
(2) because of loop iteration count decrease
(3) because of working with big arrays in parallel
Which of the following is required to execute a vector operation?
(1) vectors should form the complete basis in n-dimensional space
(2) vector collinearity absence
(3) vector normalizing
(4) at least one vector module is not zero
(5) none of the answers
What qualities do shared memory systems have?
(1) good scalability
(2) different latency for the different parts of memory
(3) good inter processor communication
(4) the high cost of cache subsystem synchronization
By default, all variables except local function variables and loop iterators are added to
(1) private set
(2) shared set
(3) lastprivate set
(4) firstprivate set
What are the disadvantages of procedure-level optimizations?
(1) they could be used for procedures and not for functions
(2) they have no disadvantages
(3) they can't use pointer-type variables
(4) properties of global variables are unknown
(5) vectorization is not allowed
Dynamic memory allocation is bad for
(1) data fragmentation
(2) no information passing between dynamic and static objects
(3) run statistics cannot be collected
(4) code loses its determinism
(5) arrays get the slower part of the memory
Modern Intel processors are
(1) CISC
(2) RISC
(3) hybrid of CISC and RISC
What event corresponds to processor clock ticks?
(1) L2_LINES_IN.SELF.DEMAND
(2) BUS_TRANS_ANY.ALL_AGENTS
(3) CPU_CLK_UNHALTED.CORE
(4) none of the answers
Statement M dominates N if
(1) there is a path from M to N
(2) there is a path from N to M
(3) any path from M goes through N
(4) any path to N goes through M
(5) any path from N goes through M
Choose the code that results from loop peeling of: p = 10; for (i=0; i<10; ++i) { y[i] = x[i] + x[p]; p = i; }
(1) p = 10; for (i=1; i<9; ++i) { y[i] = x[i] + x[p]; p = i; }
(2) y[0] = x[0] + x[10]; for (i=1; i<10; ++i) { y[i] = x[i] + x[i-1]; }
(3) y[0] = x[0] + x[10]; for (i=1; i<9; ++i) { y[i] = x[i] + x[i-1]; }
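The loop from the question and a peeled version of it can be compared directly in C. This is a sketch with hypothetical function names; `x` is assumed to hold 11 elements and `y` 10 elements, as the original loop requires.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: p equals 10 on the first iteration only,
   and i - 1 on every later iteration. */
void sum_with_previous(const int *x, int *y)
{
    assert(x != NULL && y != NULL);
    int p = 10;
    for (int i = 0; i < 10; ++i) {
        y[i] = x[i] + x[p];
        p = i;
    }
}

/* Peeled loop: the special first iteration is executed separately,
   so the remaining iterations no longer need the variable p. */
void sum_with_previous_peeled(const int *x, int *y)
{
    assert(x != NULL && y != NULL);
    y[0] = x[0] + x[10];
    for (int i = 1; i < 10; ++i)
        y[i] = x[i] + x[i - 1];
}
```

The peeled loop still covers iterations 1 through 9, so both functions write the same ten values into `y`.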
What size do ymm registers have?
(1) 16 bit
(2) 32 bit
(3) 64 bit
(4) 128 bit
(5) 256 bit
What disadvantages do shared memory systems have?
(1) slow inter processor communication
(2) poor scalability
(3) the high cost of cache subsystem synchronization
(4) different latency for the different parts of memory
What directive is used to create a synchronization point?
(1) critical
(2) barrier
(3) atomic
(4) master
(5) sync
(6) stop
What information corresponds to the vertices of a call graph?
(1) syntax construction nesting
(2) how functions call each other
(3) program calls
(4) system utility calls
(5) system calls
Choose the correct statement(s)
(1) code generator aligns basic blocks in the memory
(2) code generator helps to generate big projects
(3) code generator performs machine optimizations
(4) code generator generates missing parts of the functions
Choose the wrong statement
(1) compiler is a part of microprocessor providing program translation to native code or assembler
(2) compiler translates source code to assembler or native code
(3) only the compiler can expose new processor capabilities, such as additional registers or new instructions, to the high-level developer
What is critical code?
(1) code executed more frequently
(2) code that could be deleted without any changes in results
(3) wrong code
(4) code that is not proved to be correct
(5) undocumented code
To convert a compiler to a different internal representation, it is necessary to modify
(1) input data
(2) Front End
(3) Back End
(4) almost all the parts of the compiler
(5) only representation itself
When is full loop unrolling applicable?
(1) there's no such optimization
(2) to unroll small loops
(3) to unroll big loops
What happens to zero bits in a packed data type?
(1) nothing
(2) meaningless zero bits are omitted
(3) packed by Huffman
(4) vectorized
(5) scalarized
What are the disadvantages of multithreaded applications?
(1) development is more complex
(2) thread creation has its overhead
(3) memory amount is decreased in proportion to the core count
(4) data races / resource contention
(5) thread synchronization overhead
What directive is used to mark a piece of code to be executed by master thread only?
(1) SOLO
(2) MASTER
(3) ONLYONE
(4) OWNER
(5) SUPER
(6) CREATOR
Dynamic call graph is
(1) call graph with pointer variables
(2) basic block graph
(3) a call graph built during program execution
(4) a call graph that dynamically determines the operating system and uses the necessary system calls
(5) statistical graph with temporal data at the nodes
Interference graph is built
(1) for register allocation
(2) for reverse engineering
(3) for code that contains mistakes
(4) for code without a representative data set
Time latency (for RAM) is
(1) frequency of synchronizing impulses
(2) number of ticks required to get one unit of data from memory
(3) number of data units that can be sent to the processor per tick
(4) number of operations the processor is able to perform per tick
What is a part of syntax analysis in the compiler?
(1) semantic analysis
(2) denotation analysis
(3) prevential analysis
(4) singular analysis
(5) grammar analysis
The required conditions for a dependency between S1 and S2 are the following:
(1) both of them use the same memory cell and modify it
(2) both of them use the same memory cell and at least one writes
(3) there is a path during the execution from S1 to S2
(4) there is no path during the execution from S1 to S2
What is __alignof__ used for?
(1) to align source text
(2) to tell compiler how to align objects
(3) to get the information on data type alignment
(4) to get the information on alignment of the variables
What information does /Qpar-report3 output?
(1) information on dependencies that prevent vectorization
(2) information on iteration order
(3) reasons preventing code parallelization
(4) informs whether the parallelization is unprofitable
Which of the following is a schedule type?
(1) dispatch
(2) nodispatch
(3) runtime
(4) static
(5) serial
(6) ordered
This command line parameter is used to disable interprocedural optimizations
(1) /Qip-
(2) /Qip-disable
(3) /Qipo-disable
(4) /Qno-ipo
What is instruction scheduling useful for?
(1) precise branch prediction
(2) increasing instruction-level parallelism
(3) compare program checkpoints against the schedule
(4) create interference graph
Bandwidth is
(1) maximum number of units that can be sent to the processor at a time
(2) maximum number of operations that the processor can perform at a time
(3) maximum number of commands that can be loaded into the pipeline at a time
Statements could be arranged
(1) lexicographically
(2) graphosematically
(3) polydinamically
(4) semiiterationally
(5) semidenotationally
When is the dependence <S1,S2> an output dependence?
(1) S1 X=… S2 …=X
(2) S1 …=X S2 X=…
(3) S1 X=… S2 X=…
Is it hard to measure optimization profitability?
(1) yes, because some performance effects are hard to estimate
(2) yes, because the iteration number can be unknown
(3) no, because auto-parallelization is loop permutation transformation
(4) no, because all the data are available at compile time
Aliasing can occur between
(1) any variables
(2) only pointer variables
(3) any functions
(4) any programs
What are the aims of structure splitting?
(1) moving rarely used fields to a separate structure
(2) group the fields by the data type
(3) group the fields by the basic blocks
(4) group the fields by their names
Pipeline is
(1) a technique that increases the number of instructions processed at a time
(2) a method of predicting the next required data address
(3) a mechanism to avoid processor idling
Basic blocks are contained by
(1) data flow graph
(2) basic block graph
(3) base equations graph
(4) equations graph
(5) main equations graph
What is required for a loop dependency between S1 and S2 in a loop nest?
(1) statement S1 on iteration i and statement S2 on iteration j access the same memory cell
(2) one of them writes to this cell
(3) both of them write to this cell
What directive will force the compiler to parallelize the following loop if it is safe?
(1) #pragma prefer concurrent
(2) #pragma concurrentize
(3) #pragma prefer serial
(4) #pragma serial
What are the goals of inlining?
(1) decreasing function call overhead
(2) simplifying analysis
(3) simplifying naming
(4) simplification of dereferencing
(5) shorter names
How can the memory placement of a dynamic linked list be improved?
(1) by disabling the garbage collector
(2) using speed inlining
(3) using containers
(4) using multiple processes
The number of units that can be sent to the processor at once is
(1) latency
(2) memory amount and external memory rate
(3) bandwidth
Basic block is
(1) linear program chunk without jumps and labels
(2) a group of sequential instructions
(3) instructions with one predecessor
(4) a group of expressions inside one statement
(5) group of statements at the algorithm block diagram
What is OUTPUT dependency?
(1) READ after WRITE
(2) WRITE after WRITE
(3) READ after READ
(4) WRITE after READ
How is auto-parallelization related to other optimizations in the Intel compiler?
(1) when auto-parallelization is used no loop optimizations are available
(2) all loop optimizations are run before auto parallelization
(3) some loop optimizations are run after auto parallelization
What is demanded by ANSI aliasing?
(1) pointer can be dereferenced to the variable of the same type
(2) pointer can be dereferenced to the object of any type
(3) pointer can be dereferenced to the object of primitive type
(4) pointer can be dereferenced to the object of the compatible type
Choose the correct statements
(1) an expression is a tree of expressions with ending expressions at its leaves
(2) every expression has list of predecessors and successors
(3) expressions could be arranged lexically and by data flow graph
(4) boundary expressions are constants and variables
(5) a statement consists of expressions
Loop optimizations are:
(1) permutation transformations
(2) data transformation
(3) variable transformation
What is passed as an argument to the loop parallelizing function in the Intel compiler?
(1) loop boundaries
(2) only thread number to calculate the rest of parameters
(3) all used objects
What is taken into account during memory disambiguation?
(1) language properties
(2) local points-to analysis
(3) interprocedural analysis
(4) program performance analysis
(5) attributes and command-line options set by the developer
Leaves of an expression tree
(1) are the same as the other nodes
(2) satisfy lexicographical order
(3) must be deleted at the generation phase
(4) may contain variables
(5) do not exist
What drawbacks does prefetching have?
(1) software prefetch may displace useful data from the cache
(2) prefetch is implemented inefficiently and using it slows down the application
(3) hardware support is always required
How can the developer control the inlining process?
(1) through command-line options and the source code of the program
(2) only through the program source code
(3) this process can only be enabled or disabled as a whole from the command line
(4) this process cannot be controlled because of the specific algorithms used
The advantages of SSA form:
(1) the program is more compact
(2) def-use chains are obvious
(3) special intrinsic functions are used
(4) co-processor registers are used
(5) vector registers are used, which boosts performance
What can be used to force a function to be inlined?
(1) #pragma forceinline before the definition
(2) #pragma forceinline before the call
(3) #pragma inline before the definition
(4) #pragma inline before the call
Choose the scalar optimization:
(1) excessive branching removal
(2) loop invariant code motion
(3) interprocedural code motion
(4) register coloring
(5) high speed inline
What is partial inlining?
(1) partial function substitution
(2) the substitution defines a pointer variable with the name of the inlined function
(3) using basic block optimizations without any code transformation
(4) dynamic code substitution at runtime
A dependency is
(1) a connection between statements caused by the same output value
(2) a connection between statements that does not allow their execution order to be changed
(3) a connection between statements caused by the same input variables
(4) a connection between statements caused by the same output variable
Registers are
(1) memory of ALU
(2) memory of CPU
(3) interface data memory
What are the requirements of VTune?
(1) use only C++
(2) each source file must be no larger than 150 KB
(3) header files must conform to the VTune standard
(4) none of the answers
What are the functions of the compiler Front End?
(1) program translation from source to internal representation
(2) scalar optimization
(3) code generation
(4) linking object files to binary
(5) semantic analysis
Choose the code fragments that are well suited for optimization
(1) for(i=0;i<U;i++) a[i]=b[i];
(2) i=0; do { a[i]=b[i]; i++; } while(i<U);
(3) for(i=0;i<U;i++) { a[j]=b[i]; j+=c*i; }
(4) for(i=0;i<n;i++) { a[i]=i; if(i<t) break; }
SSE is:
(1) a technology that applies a single instruction to multiple data
(2) a technology for executing code on the server side
(3) streaming SIMD processor extension
(4) server configuration extension
(5) programming language
What types can multiprocessor systems be divided into?
(1) shared memory systems
(2) distributed memory systems
(3) random memory access systems
(4) non-uniform memory access systems
When using OpenMP, variables behave as follows:
(1) inside a block, all variables of different threads use separate memory addresses and cannot interact
(2) all variables inside parallel blocks use the same addresses, so the user can control the concurrent workflow
(3) variables can be marked as using separate memory cells or the same one
What effect does the use of global variables have?
(1) none
(2) too few names are left for local variables
(3) code size is increased because of the long global names
(4) the code is harder to optimize
(5) the code is harder to read and understand
A static profiler is used
(1) for statically typed languages
(2) for languages without pointers
(3) to analyze the application without running it
(4) to collect run statistics
(5) to estimate how static the memory map is
x86 speed factors are
(1) CPU timer speed
(2) memory size and the speed of external memory access
(3) instruction set and instruction execution speed
(4) efficiency of the internal memory and register usage
(5) pipeline quality
(6) branch prediction quality
(7) hardware prefetch quality
(8) vector instruction quality
(9) parallelization and multi-core technology
What capabilities does VTune have?
(1) Microsoft Visual Studio integration
(2) multicore and multiprocessor support
(3) detection of invalid memory accesses
(4) processor event collection
(5) measuring the energy wasted by the processor
Choose scalar optimizations
(1) constant folding
(2) copy propagation
(3) common subexpression elimination
(4) loop invariant code motion
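The scalar optimizations named in this question can be pictured with a small hand-written example. The C sketch below is only an illustration of constant folding, common subexpression elimination and loop-invariant code motion, not actual Intel compiler output. The function names transform_naive and transform_optimized are hypothetical, and integer arithmetic is assumed so that both versions give identical results.

```c
#include <stddef.h>

/* Straightforward version: scale * offset and 2 * 3 are recomputed on every
   iteration, and the subexpression (x[i] + 1) appears twice. */
void transform_naive(const int *x, int *y, size_t n, int scale, int offset) {
    for (size_t i = 0; i < n; i++) {
        y[i] = (x[i] + 1) * (x[i] + 1) + scale * offset + 2 * 3;
    }
}

/* Hand-optimized version of the same computation (hypothetical name). */
void transform_optimized(const int *x, int *y, size_t n, int scale, int offset) {
    /* Constant folding: 2 * 3 is replaced by 6.
       Loop-invariant code motion: scale * offset + 6 does not depend on i,
       so it is computed once before the loop. */
    const int invariant = scale * offset + 6;
    for (size_t i = 0; i < n; i++) {
        /* Common subexpression elimination: x[i] + 1 is computed once. */
        const int t = x[i] + 1;
        y[i] = t * t + invariant;
    }
}
```

Both functions fill y with the same values; the second one simply performs fewer operations per iteration.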
What is a loop invariant?
(1) an expression that is independent of the index variables
(2) an expression that stays unchanged on every iteration
(3) an expression used to check the loop termination condition
(4) an expression with global variables
What is a condition for vectorization?
(1) absence of loop dependencies
(2) usage of special data types
(3) usage of the <vector> module
(4) the order of dependent instructions stays the same after optimization
Choose the characteristic corresponding to non-uniform memory access systems:
(1) Each processor is completely autonomous. There is a communications medium
(2) All processors are equidistant from the memory. Communication with the memory via a common data bus
(3) Memory is physically distributed among processors. Single address space is supported at the hardware level
Which identifier is not reserved in OpenMP?
(1) nowait
(2) prefork
(3) schedule
(4) reduction
What is the aim of dividing a program into functions and procedures?
(1) standardization
(2) serialization
(3) rasterization
(4) decomposition
(5) abstraction
A dynamic profiler differs from a static one in that
(1) the static profiler has more precise branch prediction
(2) the dynamic profiler uses statistics
(3) the dynamic profiler does not use statistics
(4) dynamic profiler works with pointers
(5) dynamic profiler performs seed inlining
Memory that is directly accessed by the processor is
(1) Random Access Memory
(2) system register
(3) system bus
What is profiling?
(1) binary instrumentation
(2) event collection
(3) rebuild of the entire project
(4) use manual writing
To know which variables can be used inside a block, it is necessary to estimate:
(1) Uses[b]
(2) Killed[b]
(3) Reaches[b]
(4) Defsout[b]
What could be the reason for losing performance while processing a big loop?
(1) a large number of operations
(2) if the loop works with a large number of arrays, it may cause cache splitting
(3) register splitting caused by a large number of iteration variables
Can four different variables become components of the same vector after vectorization?
(1) never
(2) they can
(3) only if they can be arranged lexicographically
(4) only for the add operation
(5) only for the multiply operation
What properties do non-uniform memory access systems have?
(1) good scalability
(2) different latency for different parts of memory
(3) good interprocessor communication
(4) high cost of cache subsystem synchronization
The schedule clause accepts the following arguments:
(1) static
(2) dynamic
(3) variant
(4) object
(5) guided
(6) int
(7) realtime
(8) runtime
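As an illustration of the schedule clause (and of the reduction clause mentioned in another question), here is a minimal C sketch. The function name sum_to_n is hypothetical, and the choice of schedule(static) is just one of the possible schedule kinds. If OpenMP support is not enabled in the compiler, the pragma is ignored and the loop runs serially with the same result.

```c
/* Sums the integers 1..n (hypothetical helper).
   schedule(static) asks the runtime to divide the iterations among the
   threads in fixed chunks decided before the loop starts.
   reduction(+:sum) gives each thread a private copy of sum and adds the
   copies together when the loop finishes. */
long sum_to_n(long n) {
    long sum = 0;
    #pragma omp parallel for schedule(static) reduction(+:sum)
    for (long i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}
```

Replacing static with dynamic, guided or runtime changes only how iterations are handed out to threads, not the computed sum.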
What are the disadvantages of procedure-level optimizations?
(1) variable serialization is not enough
(2) memory usage is not optimal
(3) procedure parameter properties are unknown
(4) they have no disadvantages
(5) loop optimization is not provided
Dynamic memory allocation
(1) causes run statistics to be ignored
(2) uses a memory manager
(3) is used to distinguish basic blocks
(4) is required for frequently changed variables
(5) is necessary to maintain structured data
Why is register access latency lower than that of RAM?
(1) registers are placed inside the CPU
(2) registers are placed inside the very fast cache memory
(3) they are accessed in parallel with computations
Which event corresponds to a wrong branch prediction?
(1) L2_LINES_IN.SELF.DEMAND
(2) INST_RETIRED.ANY
(3) BR_INST_RETIRED.MISPRED
(4) none of the answers
The dominance frontier is
(1) the boundary between dominated and non-dominated nodes
(2) the boundary between dominated and dominating nodes
(3) a set of nodes dominated by a single node
What is loop unrolling for?
(1) to reduce the number of conditional branches during loop execution
(2) to divide a big loop into sequential small loops
(3) to decrease loop nesting
(4) to remove procedure calls
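Loop unrolling can be sketched by hand as follows. This is an illustrative C example under the assumption of an unroll factor of 4; the function names sum_rolled and sum_unrolled4 are hypothetical, and a compiler may choose a different factor or not unroll at all.

```c
#include <stddef.h>

/* Original loop: one loop condition check and branch per element. */
long sum_rolled(const int *a, size_t n) {
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Unrolled by 4: one loop condition check and branch per four elements. */
long sum_unrolled4(const int *a, size_t n) {
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s += a[i];
        s += a[i + 1];
        s += a[i + 2];
        s += a[i + 3];
    }
    /* Remainder loop for the last n % 4 elements. */
    for (; i < n; i++) {
        s += a[i];
    }
    return s;
}
```

The price of fewer branches is larger code, which is why the unroll factor is usually kept small.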
How many XMM registers does EM64T support?
(1) 4
(2) 8
(3) 16
(4) 32
(5) 64
What disadvantages do non-uniform memory access systems have?
(1) slow interprocessor communication
(2) poor scalability
(3) high cost of cache subsystem synchronization
(4) different latency for different parts of memory
How many threads could enter the critical section at a time?
(1) 0
(2) 1
(3) 2
(4) any number, because OpenMP sets no limit on the number of threads
Why may a call graph be considered incomplete?
(1) properties of some of the libraries used are unknown
(2) the pointer data type is used
(3) single assignment is used
(4) full call statistics are unknown
(5) system call statistics are unknown
Choose the correct statement(s)
(1) the code generator estimates jump distances
(2) the code generator builds the code from a representative data array
(3) the code generator controls dynamic memory
Superscalarity is
(1) Ability to operate with vectors
(2) Ability to process more than one operation per clock tick
(3) none of the answers
What conditions can prevent vectorization?
(1) nothing
(2) dependency between different iterations
(3) data dependencies
(4) lack of data dependencies
(5) lack of iteration dependencies
Which part of the compiler depends most on the language?
(1) Front End
(2) Back End
(3) internal representation
(4) code generation
(5) profiling
Choose the correct statements for this code:
S1 PI = 3.14
S2 R = 5
S3 AREA = PI * R**2
(1) <S1,S2,S3> is equivalent to <S1,S3,S2>
(2) <S1,S2,S3> is equivalent to <S2,S1,S3>
(3) there is the dependency - <S1,S3>
(4) there is the dependency - <S1,S2>
(5) there is the dependency - <S2,S3>
Packed data type operations are
(1) pack and unpack operations only
(2) operations that could be removed from code
(3) vector operations
(4) operations that omit zero bits
(5) they are abstract, because assembler does not have any
Which directive marks a sequentially executed block?
(1) SERIAL
(2) ORDERED
(3) SOLO
(4) MASTER
(5) guided
A dynamic call graph
(1) does not give any advantages
(2) does not use static data
(3) uses abstract system calls
(4) is built during program execution
(5) is dense
What entity do the colors of the interference graph correspond to?
(1) the most heavily loaded parts of the program
(2) less compatible functions
(3) registers
(4) arithmetical and logical data conversions
Superscalarity is
(1) the ability to perform multiple operations per clock tick
(2) the frequency of the microchip's synchronizing impulses
(3) the number of ticks required to read one unit from memory
(4) the maximum number of data elements that can be sent to the processor at a time
What is the input data for syntax analysis?
(1) resulting syntax
(2) BNF-form of the result
(3) program source
(4) representative data
(5) only syntax and nothing else
What are normalized loops?
(1) loops from 0 to N with step 1
(2) loops from 1 to N with any step
(3) loops from 1 to N with step 1
Why is it recommended to arrange the fields of a structure in decreasing order of size?
(1) to improve performance
(2) to beautify
(3) to decrease the structure size after alignment
What kind of optimization is auto-parallelization?
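The effect of field order on structure size can be checked with a small C sketch. The structure names mixed_order and sorted_order are hypothetical. The sizes in the comments assume a typical x86-64 ABI with 8-byte double and 4-byte int; exact values are implementation-defined.

```c
#include <stddef.h>

/* Fields in arbitrary order: padding is inserted after each char so that
   the following double and int start at properly aligned offsets.
   Typically 24 bytes on x86-64 (assumption, not guaranteed). */
struct mixed_order {
    char   flag;
    double value;
    char   tag;
    int    count;
};

/* Same fields sorted by decreasing size: the small fields share what
   would otherwise be padding. Typically 16 bytes on x86-64. */
struct sorted_order {
    double value;
    int    count;
    char   flag;
    char   tag;
};

/* Helpers returning the sizes actually chosen by the compiler in use. */
size_t mixed_order_size(void)  { return sizeof(struct mixed_order); }
size_t sorted_order_size(void) { return sizeof(struct sorted_order); }
```

Comparing the two helper results on the target platform shows how much padding the reordering removes.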
(1) loop non-permutation
(2) non-loop permutation
(3) loop permutation
(4) non-loop non-permutation
Which of the following are schedule types?
(1) random
(2) round-robin
(3) guided
(4) dynamic
(5) serial
(6) concurrent
What kind of interprocedural optimization is used by default?
(1) no interprocedural optimizations at all
(2) optimizations without pointer variables
(3) single file optimizations
(4) multiple file optimizations
How is instruction scheduling performed?
(1) by changing the instruction order
(2) by creating branch probability plan
(3) by creating program checkpoint schedule array
(4) by ordering instructions inside the interference graph
Superscalar is
(1) multicore
(2) processor with multiple execution units
(3) processor with "out of order execution" feature
How are statements connected inside the Intel compiler?
(1) by the adjacency matrix
(2) by Petrov table
(3) by data flow graph
(4) by denotational semantics
(5) none of the answers
When is the dependence <S1,S2> a true dependence?
(1) S1 X=… S2 …=X
(2) S1 …=X S2 X=…
(3) S1 X=… S2 X=…
Which directive tells the compiler not to parallelize the following loop?
(1) #pragma prefer concurrent
(2) #pragma no concurrentize
(3) #pragma prefer serial
(4) #pragma serial
Points-to analysis is
(1) alias analysis itself
(2) a part of alias analysis
(3) index variables analysis
(4) loop invariant analysis
(5) none of the answers
Why is it useful to avoid pointer chasing?
(1) extra links may lead to memory being freed unexpectedly
(2) the dereference process is non-deterministic, and non-determinism is better avoided
(3) each dereference involves a cache load, which is significant for large structures
(4) it is necessary to avoid extra memory free operations
(5) pointer chasing is just a code formatting convention that is not related to program execution speed
In a fully-associative memory
(1) each block can be mapped to any part of the cache
(2) each block has its own place inside the cache
(3) line of the corresponding cache is calculated from the lower part of the address and the exact place in the line is chosen associatively
Basic blocks are
(1) entrance blocks
(2) function signatures
(3) function bodies, where most of the computations are performed
(4) body of the main function
(5) library header files
What are normalized loops used for?
(1) to simplify computations
(2) to improve code readability
(3) to unify loops
Which parallel library does the Intel compiler use?
(1) OpenMP
(2) VTune
(3) STL
What are the disadvantages of inlining?
(1) code expansion
(2) naming complexity for user
(3) execution time increase
(4) call overhead increase
A linked list is worse than an array because of
(1) increased cache misses for sequentially processed elements
(2) a decreased set of available data types
(3) link maintenance overhead
(4) link values always changing and elements being moved across memory
Vectorization is a parallelization technique in which
(1) the processor applies one operation to multiple data
(2) the processor applies different operations in parallel
(3) the processor applies one operation to multiple data sequentially
The nodes of a control flow graph are
(1) expressions
(2) statements
(3) basic blocks
(4) functions and procedures of the application
(5) variables
Is there any dependence in this code?
DO I=1,N
S1 A(I+1) = F(I)
S2 F(I+1) = A(I)
END DO
(1) no dependence
(2) loop-carried dependence
(3) loop-independent dependence
What is the loop-parallelizing function in the Intel compiler?
(1) initial loop
(2) initial loop with bounds set by function parameters
(3) only one loop iteration
When are permutation transformations not allowed?
(1) if there are any objects that cannot be loaded at the same time
(2) if there are any objects that may overlap in memory
(3) if there are any objects that take too much memory
The type of cache in which any memory block can be loaded into any part of the cache
(1) direct mapping memory
(2) fully-associative cache
(3) reverse mapping memory
Def-use graph nodes are
(1) basic blocks, defining and using the same variable
(2) statements defining and using the same variable
(3) expressions defining and using the same variable
What is an ANTI dependency?
(1) READ after WRITE
(2) WRITE after WRITE
(3) READ after READ
(4) WRITE after READ
What is "prefetch"?
(1) loading data from relatively slow memory into the cache after it is required by the processor
(2) loading data from the cache into memory before it is required by the processor
(3) loading data from relatively slow memory into the cache before it is required by the processor
(4) fetching a memory address before the memory is required
What is the meaning of the restrict attribute in a pointer definition in C/C++?
(1) any pointer can refer to this memory
(2) only this pointer can hold this value
(3) only this pointer and expressions based on it can refer to its memory cell
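A minimal C99 sketch of the restrict qualifier follows. The function name add_arrays is hypothetical. The qualifier is a promise by the programmer that the three arrays do not overlap, which lets the compiler vectorize the loop without runtime overlap checks. Note that restrict is standard in C99; C++ compilers generally offer it only as an extension under a different spelling.

```c
#include <stddef.h>

/* Element-wise sum of two arrays (hypothetical helper).
   restrict tells the compiler that, within this function, the memory
   reached through dst is accessed only via dst (and expressions based on
   it), so stores to dst cannot change what a[i] or b[i] will read. */
void add_arrays(float *restrict dst,
                const float *restrict a,
                const float *restrict b,
                size_t n) {
    for (size_t i = 0; i < n; i++) {
        dst[i] = a[i] + b[i];
    }
}
```

Calling the function with overlapping arrays breaks the promise and results in undefined behavior, so the qualifier should be used only when the caller can guarantee disjoint buffers.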
Constants in an expression tree
(1) must be held at the leaves
(2) do not exist
(3) can be deleted without changing any results
(4) are linked to lexicographical order
During VTune analysis, some of the functions are missing. Why could this happen?
(1) large amounts of interprocedural data lead to function name aggregation
(2) the function was removed during inlining
(3) the function name was changed to a system one, and VTune hides this name
(4) it is a typical error for the global scope
(5) the function name was shadowed by a reserved system word
SSA-form is:
(1) ability to access SSE
(2) single assignment form
(3) special form with MMX instructions
(4) formalized vector extension
Which option is used to disable inlining?
(1) /Qb<0>
(2) /Qnoinline
(3) /Qinline<0>
"Dead code" may be caused by
(1) segmentation fault
(2) optimizing transformations
(3) execution slowdown
(4) usage of long identifier names
What interprocedural optimization is specific to C++?
(1) deconstructization
(2) deobjectization
(3) devirtualization
(4) relation canceling
(5) depolymorphism
The dependency between S1 and S2 exists if
(1) statements S1 and S2 modify the same variable in the same basic block
(2) statements S1 and S2 modify the same variable in different basic blocks
(3) the basic block of S1 dominates S2 and both of them write to the same memory cell
(4) statement S2 is followed by S1 and both of them read the same variable
Cache levels differ by
(1) the speed of access
(2) first level for scalar, second for superscalar and the third is for vectors
(3) dimensions: one-, three- and four-dimensional cache
(4) they are almost equal and do not differ
Is there any dependence in this code?
DO I=1,N
S1 A(I)=…
S2 …=A(I)
END DO
(1) no dependence
(2) loop-carried dependence
(3) loop-independent dependence
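The two Fortran loops from the dependence questions can be transcribed into C to make the difference between the two kinds of dependence visible. This is a sketch only: the function names are hypothetical, the right-hand side of S1 in the first loop is an arbitrary stand-in for the elided expression, and indexing is shifted to start at 0.

```c
#include <stddef.h>

/* Loop-independent dependence: S2 reads the a[i] that S1 wrote in the
   same iteration, so different iterations do not depend on each other. */
void loop_independent(double *a, double *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        a[i] = 2.0 * (double)i;  /* S1: A(I) = (arbitrary value, assumption) */
        out[i] = a[i] + 1.0;     /* S2: ... = A(I) */
    }
}

/* Loop-carried dependence: the value S1 stores into a[i + 1] is read by S2
   in the next iteration, and the same holds for f in the other direction.
   Both arrays must have n + 1 elements. */
void loop_carried(double *a, double *f, size_t n) {
    for (size_t i = 0; i < n; i++) {
        a[i + 1] = f[i];         /* S1: A(I+1) = F(I) */
        f[i + 1] = a[i];         /* S2: F(I+1) = A(I) */
    }
}
```

Iterations of the first loop could run in any order or in parallel, while the second loop has to keep its iteration order to produce the same values.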
An optimizing transformation preserves the equivalence of computations if
(1) the optimization neither deletes nor adds any code
(2) the control flow graph is unchanged
(3) the order of dependent statements is unchanged
(4) blocks with dependent statements are untouched
Vectorization is
(1) collecting program characteristics such as procedure execution time, branch misprediction rate, cache splitting, etc.
(2) translation of the program source to assembler or native code
(3) the process of converting from a scalar representation, where each operation uses scalars, to a vector representation, where one operation can use vector operands
What is the __declspec(align(n)) pragma used for?
(1) to tell the compiler that the program cannot be compiled when the alignment is wrong
(2) to tell the compiler how to align variables in memory
(3) to tell the compiler that all the types must be aligned identically
(4) to disallow array alignment when its size is less than "n"
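Requesting a specific alignment can be sketched as follows. Since __declspec(align(n)) is compiler-specific syntax, this example uses the standard C11 alignas specifier as an assumed stand-in with a similar purpose. The names aligned_buf and is_aligned are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Ask the compiler to place the array on a 16-byte boundary, which is
   what aligned SSE loads and stores expect. */
alignas(16) float aligned_buf[8];

/* Returns 1 if p is a multiple of alignment, 0 otherwise
   (hypothetical helper for checking the result). */
int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}
```

A call such as is_aligned(aligned_buf, 16) can be used to confirm that the requested alignment was honored.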
An array is better than a linked list because it
(1) has no link maintenance overhead
(2) can contain elements of different sizes
(3) is modeled according to the statistics collected
(4) can contain the "record" data type