Introduction to performance optimization using Intel SW tools - answers to Intuit tests

Correct answers are highlighted in green.
All answers: The course concentrates mostly on application performance improvements with Intel Compiler and VTune Amplifier.

The Control Unit functions are

(1) instruction decoding

(2) ALU operating

(3) data transfer

(4) instruction execution

(5) ALU startup

What is VTune™ Performance Analyzer for?

(1) optimizing applications

(2) analyzing application performance

(3) decreasing application size

(4) speeding up application compilation

What compilers does Intel® provide?

(1) C/C++

(2) Java

(3) Oberon

(4) Forth

(5) Fortran

(6) C#

What is the Loop Stream Detector for?

(1) for dividing big loops into a set of small ones

(2) for eliminating the loops

(3) for making small loops run faster

(4) for eliminating instruction fetching and decoding for small loops

Loop vectorization is

(1) changing scalar operations to vector operations

(2) raster to vector image conversion

(3) internal processor function

(4) compiler optimization

What is the processor core?

(1) a part of the processor fetching and executing instructions

(2) a part of the processor working with shared cache

(3) a part of the processor executing arithmetical operations

OpenMP is:

(1) compiler option linked to output binary

(2) technology to execute code on multiple computers

(3) technology for parallel computing on the shared memory systems

(4) technology for parallel computing on the distributed memory systems

Which of the following could be considered good programming style?

(1) unconditional jumps

(2) loop exit operators

(3) modularity

(4) labels

(5) short variable names

(6) none of the answers

What disadvantage does a static profiler have?

(1) working only with static variables

(2) ignores pointer variables

(3) does not support the full type system

(4) imprecise branch prediction

(5) it is necessary to run the static profiler a second time before the program execution

The system bus is used for

(1) data transfer

(2) CPU parts interconnection

(3) command execution

(4) data storage

VTune supports:

(1) only C/C++

(2) only languages, supported by Intel compiler

(3) only languages, supported by Microsoft Visual Studio

(4) none of the answers

Internal representation is

(1) assembler code

(2) a graph in most cases

(3) object-file code

Loop invariant code motion

(1) finding and deleting expressions independent from iteration variables

(2) finding expressions independent of the loop iteration variables and moving them outside the loop

(3) moves loop invariants to another loop

(4) none of the answers
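To make the idea of loop invariant code motion concrete, here is a minimal sketch of a loop before and after the transformation, done by hand. The function names `scale_naive` and `scale_hoisted` and the expression `a * a + b` are invented for this illustration; an optimizing compiler would normally perform the hoisting on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Before: (a * a + b) does not depend on i, yet it is recomputed in every iteration. */
void scale_naive(int *out, const int *in, size_t n, int a, int b)
{
    assert(out != NULL && in != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * (a * a + b);
}

/* After: the invariant expression is evaluated once, outside the loop. */
void scale_hoisted(int *out, const int *in, size_t n, int a, int b)
{
    assert(out != NULL && in != NULL);
    const int k = a * a + b;    /* loop invariant, hoisted */
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i] * k;
}
```

Both functions produce the same output array; the second one simply avoids repeating the multiplication and addition on each iteration.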

SIMD is:

(1) a computation principle that provides data parallelism

(2) instruction system for multiuser access

(3) "simple data multiple instructions"

(4) a type of computer memory

(5) "Single instruction multiple data"

Choose the characteristic corresponding to distributed memory systems:

(1) Each processor is completely autonomous. There is a communications medium.

(2) All processors are equidistant from the memory. Communication with the memory via a common data bus.

(3) Memory is physically distributed among processors. Single address space is supported at the hardware level

OpenMP uses the following model of parallel execution:

(1) refork model: each time parallel loop starts new threads are created

(2) fork-join model: all threads are created at the first parallel region and then used

(3) queue model: each parallel task is queued; different executors are taking tasks from this queue concurrently

What is variable scope?

(1) part of the program where no such name can be defined

(2) memory range that can be used as a pointer value

(3) variable value range

(4) context where this variable is defined

(5) basic block where this variable is defined

Dynamic profiler benefits are

(1) statistics are not used

(2) more precise branch prediction

(3) basic block dynamic estimation

(4) more optimizations are available

CPU timer speed is

(1) frequency of the timer which provides the synchronization inside the processor

(2) value of the processor clock frequency

(3) speed of the shortest instruction

(4) minimal quantum of computation

(5) frequency of the synchronization pulses sent by the timer

What analysis types are included in VTune?

(1) Hotspots

(2) Locks and Waits

(3) Valgrind

(4) Concurrency

Data flow analysis is

(1) analysis of bus data transfer

(2) method to avoid data loading to cache memory

(3) collecting information on variable's values

What optimization is the inverse of loop fusion?

(1) loop unswitching

(2) loop unfolding

(3) loop interchange

(4) loop unrolling

(5) loop distribution
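The relationship between a fused loop and its split form can be sketched as follows. The function names and the two update statements are made up for this example, and the sketch assumes that `a` and `b` do not overlap in memory, otherwise the two forms need not be equivalent.

```c
#include <assert.h>
#include <stddef.h>

/* Fused form: a single loop performs both updates. */
void update_fused(int *a, int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i) {
        a[i] = a[i] + 1;
        b[i] = b[i] * 2;
    }
}

/* Distributed form: the same work split into two loops, one per array. */
void update_distributed(int *a, int *b, size_t n)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; ++i)
        a[i] = a[i] + 1;
    for (size_t i = 0; i < n; ++i)
        b[i] = b[i] * 2;
}
```

Going from the first function to the second splits one loop into several; going back merges them again.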

What is a vector instruction for the compiler?

(1) adding vectors

(2) vector multiply by matrix

(3) vector folding

(4) vector subtraction

(5) vector power

What qualities do distributed memory systems have?

(1) good scalability

(2) different latency for the different parts of memory

(3) good inter processor communication

(4) the high cost of cache subsystem synchronization

What can be done to pass the last state of a variable to the master thread after the parallel block?

(1) add this variable to private set

(2) add this variable to lastprivate set

(3) add this variable to lastshared set

(4) this kind of behavior is not allowed by the technology
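As a small sketch of how an OpenMP `lastprivate` clause behaves, the function below keeps the value written by the sequentially last iteration. The name `last_square` and the loop body are hypothetical. If the compiler is run without OpenMP support, the pragma is ignored and the loop runs serially with the same result.

```c
#include <assert.h>

/* Returns the value assigned to `last` by the sequentially final iteration. */
int last_square(int n)
{
    assert(n > 0);
    int last = -1;
    #pragma omp parallel for lastprivate(last)
    for (int i = 0; i < n; ++i)
        last = i * i;   /* inside the loop each thread works on its own copy */
    return last;        /* the copy from iteration n - 1 is written back here */
}
```

For example, `last_square(5)` yields 16, the value computed for `i = 4`.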

What are the disadvantages of procedure-level optimizations?

(1) they could be used for procedures and not for functions

(2) only constant length arrays are used

(3) can't be used in highly loaded functions

(4) every function call is "black box" for them

(5) can't work with record type

Dynamic data is useful when

(1) loop iterator changes frequently

(2) loop iterator value can be out of data type range

(3) you need to create a large data structure whose size is unknown at compile time

(4) you need to store an integer and a string in the same variable

(5) you have no run statistics

Which of the following will not cause any change in processor performance?

(1) CPU clock

(2) branch prediction quality

(3) operating system

What are locks and waits for?

(1) detecting critical memory parts and corresponding critical code parts for each piece of memory

(2) collecting thread wait count and time

(3) collecting function call sequence

SSA-form is

(1) representation where each function is executed only once

(2) representation where each variable is used only once

(3) representation where each variable is set only once

(4) final representation after whole set of optimizations

What is loop peeling?

(1) an optimization that transforms a loop into two or more loops

(2) an optimization that tries to simplify a loop by detaching useless iterations

(3) an optimization that tries to simplify a loop by detaching its boundary iterations

What size do xmm registers have?

(1) 16 bit

(2) 32 bit

(3) 64 bit

(4) 128 bit

(5) 256 bit

What disadvantages do distributed memory systems have?

(1) slow inter processor communication

(2) poor scalability

(3) the high cost of cache subsystem synchronization

(4) different latency for the different parts of memory

The nowait directive is used:

(1) to avoid "Press any key to exit" at the end of the program

(2) to tell the compiler to start executing before init ends

(3) to disable synchronization at the end of the loop

(4) to disable additional delays in the threads
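The effect of `nowait` can be sketched with two work-sharing loops inside one parallel region. The function name `fill_both` and the array contents are invented for this example. The sketch is only safe because the second loop never reads or writes `a[]`; without OpenMP support the pragmas are ignored and the code runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Fills two unrelated arrays inside one parallel region. */
void fill_both(int *a, int *b, int n)
{
    assert(a != NULL && b != NULL && n >= 0);
    #pragma omp parallel
    {
        /* nowait: threads do not wait for each other at the end of this loop */
        #pragma omp for nowait
        for (int i = 0; i < n; ++i)
            a[i] = i;

        /* this loop keeps its implicit barrier at the end */
        #pragma omp for
        for (int i = 0; i < n; ++i)
            b[i] = 2 * i;
    }
}
```

A thread that finishes its share of the first loop early can start on the second loop immediately instead of idling.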

What is a node in a call graph?

(1) program and the time of the call

(2) program without a time of the call

(3) variable of the program

(4) function of the program

(5) basic block

Choose the correct statement(s)

(1) Register count is usually much greater than the number of variables in the program

(2) Brush painting method is used for register allocation

(3) One of the basic code generator's tasks is register allocation

Superscalar is

(1) processor, specialized for scalar operations

(2) a processor that executes more than one operation per clock tick

(3) none of the answers

What may cause ineffective resource utilization?

(1) resource concurrency

(2) bottlenecks

(3) a large number of data dependencies

(4) sequential command execution

Can one compiler have two different front ends?

(1) no

(2) only when both of them are for the same language

(3) only when Petrov criteria is satisfied

(4) only for two, not for three

(5) yes

How is loop unrolling performed?

(1) a big loop is divided into a large number of small sequential loops

(2) by grouping small iterations into one big iteration

(3) by substituting functions at their call sites
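A hand-written sketch of loop unrolling by a factor of four is shown below. The function names and the unroll factor are chosen only for illustration; a compiler picks its own factor based on heuristics.

```c
#include <assert.h>
#include <stddef.h>

/* Plain loop: one addition and one loop test per element. */
long sum_rolled(const int *x, size_t n)
{
    assert(x != NULL);
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += x[i];
    return s;
}

/* Unrolled by 4: four elements per iteration, then a remainder loop. */
long sum_unrolled4(const int *x, size_t n)
{
    assert(x != NULL);
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)x[i] + x[i + 1] + x[i + 2] + x[i + 3];
    for (; i < n; ++i)      /* handles the last n % 4 elements */
        s += x[i];
    return s;
}
```

The unrolled version executes roughly a quarter of the loop-control instructions, at the cost of a larger loop body.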

What is a packed data type?

(1) data type without zero bits

(2) data packed by Huffman

(3) a special type used for archiving

(4) vector component data type

(5) scalar forming data type

What are the pros of multi-threaded applications?

(1) memory amount is decreased according to the core count

(2) computational resources are increased according to the core count

(3) processor instruction set is decreased

What directive is used to avoid incorrect concurrent usage of the lval variable?

(1) SEMAPHORE

(2) TRANSACTION

(3) ATOMIC

(4) CHECK SHARED

(5) CONTROL SHARED
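To illustrate protecting a shared variable from concurrent updates, here is a minimal sketch using the OpenMP `atomic` construct. The function name `count_even` is hypothetical; the shared counter is called `lval` only to match the question. Without OpenMP support the pragmas are ignored and the result is the same.

```c
#include <assert.h>
#include <stddef.h>

/* Counts the even values in x[0..n-1]; lval is shared by all threads. */
int count_even(const int *x, int n)
{
    assert(x != NULL && n >= 0);
    int lval = 0;
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        if (x[i] % 2 == 0) {
            #pragma omp atomic
            lval += 1;      /* the read-modify-write is performed atomically */
        }
    }
    return lval;
}
```

Without the atomic update, two threads could read the same old value of `lval` and one increment would be lost.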

Static call graph is

(1) program calls on representative data set

(2) call graph without dynamical variables

(3) call graph, built at compilation of the program

(4) call graph, built statically

(5) no such notion exists in this course

(1) variable live range identification

(2) mark the most notable registers

(3) sort registers by size

(4) allocate memory for register block

What is used to send data between the processor and the memory or between the processor and the devices?

(1) system registers

(2) ALU

(3) system bus

(4) RAM

What is part of syntax analysis in the compiler?

(1) grammar analysis

(2) puncting analysis

(3) polymorphical analysis

(4) lexical analysis

(5) protosyntax analysis

Choose the correct statements for the code:

DO I=1,N
  S1 A(I) = B(I) + 1
  S2 B(I+1) = A(I) - 5
END DO

(1) there is the dependency - <S1,S2>

(2) there is loop dependency - <S2,S1>

(3) there is data dependency inside the loop

(4) there is control dependency inside the loop

(5) there are no dependencies inside the loop

What is /Qvec-report used for?

(1) to drive vectorization during the compilation

(2) to report vectorization at the execution time

(3) to report vectorization at the compile time

(4) to drive vectorization heuristics

What is the purpose of automatic parallelization?

(1) to utilize multi-processor resources without code rewrite

(2) to achieve higher quality than manual parallelization

(3) to free programmer from the difficult and tedious manual parallelization

What option is used to determine how loop iterations are distributed among threads?

(1) SCHEDULE

(2) TT

(3) FORALL

(4) THREADS
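As a sketch of how iteration distribution is requested in OpenMP, the loop below uses a `schedule` clause with the `static` kind and a chunk size of 4. The function name `square_all` and the chunk size are arbitrary choices for this example; without OpenMP support the pragma is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Squares every element of in[] into out[]. */
void square_all(int *out, const int *in, int n)
{
    assert(out != NULL && in != NULL && n >= 0);
    /* static, 4: chunks of 4 consecutive iterations are handed to the
       threads in round-robin order, decided before the loop starts */
    #pragma omp parallel for schedule(static, 4)
    for (int i = 0; i < n; ++i)
        out[i] = in[i] * in[i];
}
```

Other schedule kinds, such as `dynamic` or `runtime`, change only how iterations are assigned to threads, not the result of the loop.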

This command line parameter is used to enable inter-file optimization

(1) /Qipf

(2) /Qmulti-file

(3) /Qipo

(4) /Om

How are data dependencies used in code generation?

(1) to create missing parts of the code

(2) to determine whether registers can be reused

(3) to take interference functions to the separate graph

(4) to find code that does not match the representative data array

The ability to perform multiple operations per clock tick is

(1) vectorization

(2) hardware prefetch

(3) superscalarity

(4) pipeline

(5) hyperthreading

What is the criterion for linking statements into a list inside the Intel compiler?

(1) previous and next

(2) minimal work principle

(3) next lexeme principle

(4) equivalence principle

(5) principle of non-equivalence

When is the dependency <S1,S2> an anti-dependence?

(1) S1 X=… S2 …=X

(2) S1 …=X S2 X=…

(3) S1 X=… S2 X=…
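The three statement patterns listed above correspond to the three classic kinds of data dependence. The sketch below writes each pattern out in C; the function names are invented for the example, and `check_examples` only records the values these particular functions return.

```c
#include <assert.h>

/* Read after write: S2 reads the value that S1 wrote (flow dependence). */
int flow_dependence(int a)
{
    int x, y;
    x = a + 1;      /* S1 writes x */
    y = x * 2;      /* S2 reads x  */
    return y;
}

/* Write after read: S2 overwrites a value that S1 still needed (anti-dependence). */
int anti_dependence(int x)
{
    int y;
    y = x * 2;      /* S1 reads x  */
    x = 0;          /* S2 writes x */
    return x + y;
}

/* Write after write: both statements write x (output dependence). */
int output_dependence(int a)
{
    int x;
    x = a;          /* S1 writes x */
    x = a + 1;      /* S2 writes x */
    return x;
}

/* Swapping S1 and S2 in any of the functions above would change these results. */
void check_examples(void)
{
    assert(flow_dependence(1) == 4);
    assert(anti_dependence(3) == 6);
    assert(output_dependence(5) == 6);
}
```

In every case the order of S1 and S2 must be preserved, which is exactly what a dependence tells the compiler.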

What are necessary conditions for auto-parallelization?

(1) the absence of dependencies within the loop

(2) the absence of the loop nesting

(3) special manual preparation is always required

What is alias analysis?

(1) analysis when the information from different procedures is aliased

(2) analysis aliased to build

(3) search for the variables that could point to the same memory cell

(4) search for the functions aliased temporary

(5) search and analysis of the functions, executed in parallel
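Why a compiler cares whether two names can refer to the same memory cell can be seen in a very small sketch. The function `add_twice` is hypothetical; its result depends on whether the two pointers alias.

```c
#include <assert.h>
#include <stddef.h>

/* Adds *src to *dst twice. If dst and src point to the same int,
   the first addition changes the value that the second one reads. */
int add_twice(int *dst, const int *src)
{
    assert(dst != NULL && src != NULL);
    *dst += *src;
    *dst += *src;   /* *src must be reloaded unless aliasing is ruled out */
    return *dst;
}
```

With distinct objects holding 1 and 1 the result is 3, while passing the same object for both arguments gives 4. Unless analysis proves the pointers are distinct, the compiler cannot keep `*src` in a register across the first store.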

How can structure field reordering affect application performance?

(1) by simplifying the loops

(2) by making results less precise

(3) by decreasing cache misses

(4) by reducing the number of conditional branches
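One visible effect of field order is the amount of padding in each object, which in turn determines how many objects fit into a cache line. The sketch below compares two layouts of the same fields; the struct and function names are made up, and the exact sizes depend on the target ABI.

```c
#include <assert.h>
#include <stddef.h>

/* Original order: small fields on both sides of a double usually force padding. */
struct sample_padded {
    char   tag;
    double value;
    char   flag;
};

/* Reordered: the largest field first, the small fields packed together. */
struct sample_reordered {
    double value;
    char   tag;
    char   flag;
};

/* Bytes saved per object by the reordering on this ABI (may be 0). */
size_t bytes_saved(void)
{
    assert(sizeof(struct sample_reordered) <= sizeof(struct sample_padded));
    return sizeof(struct sample_padded) - sizeof(struct sample_reordered);
}
```

On common 64-bit ABIs the first layout typically takes 24 bytes and the second 16, so an array of the reordered structs touches fewer cache lines.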

Hardware prefetching is used for

(1) parallel calculations

(2) predicting the address of required data and loading it into the cache

(3) increasing the bandwidth of the processor

Basic blocks are

(1) blocks of the visual program constructor

(2) code that performs most of the computations

(3) code without jumps and labels

(4) entrance program code

(5) entrance function code

What is an iteration vector?

(1) integer vector, each of the components represents an iteration variable value in order of loop nesting

(2) integer vector, each of the components represents an iteration variable value in order of increase

(3) integer vector, each of the components represents an iteration variable value after the iterations

What directive will force the compiler to parallelize the following loop?

(1) #pragma concurrent call

(2) #pragma concurrentize

(3) #pragma prefer serial

(4) #pragma serial

What is inlining?

(1) variable substitution

(2) common expression substitution

(3) global equation substitution

(4) function body substitution

When a linked list is stored in memory

(1) object placement may be non-sequential

(2) elements can be destroyed any time by garbage collector

(3) list cannot contain more than five elements

In out-of-order execution, instructions are scheduled according

(1) to their order in pipeline

(2) to the evaluation of their operands

(3) to the branch prediction

Choose the correct statements

(1) statement is a minimal independent unit of the programming language

(2) program is a sequence of statements

(3) variable is minimal statement

(4) statements could be arranged lexically and by data flow graph

(5) statement is a tree of statements

(6) a statement consists of expressions

What is constant folding?

(1) vector replace by scalar

(2) one of the scalar optimizations

(3) operation is inverse for vectorizing unfolding

(4) iteration dimension decrease

What is OpenMP?

(1) is a software interface that supports multi-platform programming for multiprocessor computation systems with shared memory

(2) is a software interface that checks parallelization correctness

(3) is a software interface to estimate parallelization profitability

What is memory disambiguation?

(1) search for the objects, that may overlap in memory

(2) search for the objects, taking too much memory

(3) search for the objects, that could be not initialized

The number of clock ticks required to transfer one unit of data from memory is

(1) latency

(2) system clock

(3) bandwidth

Control flow graph

(1) determines the order of the statements in the source program

(2) determines all paths that would be taken during computation

(3) determines possible ways of control passing from one block to another

(4) determines all possible ways of control passing

What is FLOW dependency?

(1) READ after WRITE

(2) WRITE after WRITE

(3) READ after READ

(4) WRITE after READ

How is parallelization implemented in the Intel compiler?

(1) The multiple instances of loop thread function are executed in different streams with different values of the boundaries

(2) Iterative loop space is divided into several parts and each is given to the separate thread

(3) Multiple function instances are run in the same thread with different values of the boundaries

(4) Iteration space is divided into parts, and these parts are processed sequentially

What is -ansi-alias for?

(1) to enable ANSI aliasing in optimization

(2) to disable ANSI aliasing in optimization

(3) to allow providing some more aggressive optimizations

Tree of expressions is

(1) paragraph in language manual

(2) a short way to define a language syntax

(3) short identifier to remind the meaning of the statement

(4) notation for computations

(5) tree with exact lexemes at the leaves

How can prefetch be invoked?

(1) with compiler directive

(2) with compiler option for automatic instruction placement

(3) using an intrinsic

(4) buying and installing a special prefetch program

How does the compiler determine when it is better to perform inlining?

(1) only by compiler directives and source code pragmas

(2) only by directives

(3) it is performed always when is not disabled by options

(4) there is a heuristic method for that

(5) inline is not performed while it is not demanded by pragmas

Operations in an expression tree

(1) do not exist

(2) could not be placed at leaves

(3) must be deleted at generation phase

(4) must be complete lexemes

What is used to suggest a function for inlining?

(1) command line option -Qinline<func>

(2) pragma #pragma inline before the body of the function

(3) keyword forceinline in function definition

(4) keyword inline in function definition

SSA is

(1) SSe Alignment

(2) Simple Singles Alignment

(3) Static Single Assignment

(4) Sign Standard Association

What is function cloning?

(1) create clone-function, executed at the other device

(2) create clone-function, executed on built-in video chip

(3) copying function body and using separate optimizations for new instances

(4) parallel execution on cluster and super-computer

Which of the following is required to preserve computation equivalence?

(1) equivalent input leads to equivalent output

(2) instruction scheduling is independent of the input data

(3) results obtained in the same order

(4) computations use the same processor-dependent instruction set

What are the goals of the ALU?

(1) instruction decoding

(2) drive itself

(3) data transfer

(4) arithmetic operations

(5) device interconnection

What kind of information is obtainable via VTune?

(1) where the time is spent

(2) why the program is not effective

(3) where the dead code is

(4) where the code has wrong formatting

(5) where to improve the code

What platforms are supported by Intel compilers?

(1) Windows

(2) Linux

(3) MacOS X

(4) FreeBSD

(5) Solaris

What is required for most of the loop optimizations

(1) definite total iteration count

(2) no jumps outside the loop

(3) no if operators inside the loop

(4) no unknown function calls inside the loop

MMX technology provides:

(1) a set of instructions to operate packed integer data types

(2) program package for multimedia

(3) additional registers

(4) fast floating point operation set

(5) additional processor module for audio and video conversion

What seriously limits modern system performance?

(1) memory amount

(2) processor clock rate

(3) memory access speed

(4) existence of a level 2 cache

For parallelization it is required to:

(1) use compiler command line option -Qomp. Compiler will choose regions to parallelize

(2) code could be enclosed by #pragma omp parallel start

(3) code for parallelization has to be moved to a separate function marked with the intrinsic __omp_parallel

(4) parallel code has to be grouped into blocks that start with #pragma omp parallel

Which of the following could be considered bad programming style?

(1) loop constructions

(2) global variable usage

(3) long functional names

(4) if operators

(5) none of the answers

What is the source for branch prediction in a static profiler?

(1) input data

(2) base block analysis

(3) interprocedural analysis

(4) inlining-analysis

(5) run statistics

What is CPU speed?

(1) number of data transfers

(2) average command execution time

(3) bus data transfer speed

(4) number of tasks, executed concurrently

What operating systems does VTune support?

(1) OS/2

(2) VAX-VMS

(3) PDP-11

(4) Windows

(5) Linux

Expression is

(1) assignment

(2) expression tree

(3) constant

(4) variable

Why is performance improved when an invariant is moved out of the loop?

(1) because invariant code is meaningless and could be deleted

(2) because it is more effective to calculate them in another loop

(3) because these values stay unchanged in each iteration and can be evaluated only once, outside the loop

Which of the following command line options will build a binary for any processor?

(1) -QxSSE4_1

(2) -arch:SSE3

(3) -QaxSSE3_1

(4) -QxSSE3

(5) -arch:SSE2_2

(6) -QaxSSE4_2

(7) -QxSSE2

Choose the characteristic corresponding to shared memory systems:

(1) Each processor is completely autonomous. There is a communications medium

(2) All processors are equidistant from the memory. Communication with the memory via a common data bus

(3) Memory is physically distributed among processors. Single address space is supported at the hardware level

What pragma is used to parallelize a loop?

(1) #pragma omp parallel for

(2) #pragma omp parallel while

(3) #pragma omp single

(4) #pragma omp set parallel for
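For reference, a work-sharing loop in OpenMP C code generally looks like the sketch below. The function name `sum_array` and the `reduction` clause are choices made for this example. The pragma takes effect only when the compiler's OpenMP support is switched on (for instance `-fopenmp` in GCC); otherwise it is ignored and the loop runs serially.

```c
#include <assert.h>
#include <stddef.h>

/* Sums x[0..n-1]; iterations are shared among the threads of the team. */
long sum_array(const int *x, int n)
{
    assert(x != NULL && n >= 0);
    long sum = 0;
    /* reduction(+:sum): each thread accumulates privately, partial sums are combined at the end */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}
```

The loop must be a counted `for` loop whose trip count can be computed before it starts, which is why OpenMP has no equivalent construct for `while` loops.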

What benefits does correct code formatting give?

(1) inlining is faster

(2) variable scopes are separated

(3) it is easier to read source code

(4) compiler optimizations are simplified

(5) none of the answers

What is required for dynamic profiling

(1) collect run statistics

(2) temporarily remove all static variables

(3) add heap freeing procedure to the beginning of the program

(4) choose the most representative input data

(5) check program code for inlining

Choose the correct statement

(1) hardware prefetch mechanism tries to guess a memory access plan to load the data before it will be actually accessed

(2) caching technique uses spatial locality principle

(3) cache aliasing occurs when the data placement is good and registers are loaded without any instructions

What are the functions of the Hotspots analysis?

(1) detecting places with potentially ineffective code

(2) showing thread activity

(3) showing microarchitecture problems

(4) provides binary instrumentation of the user program

Set Uses[b] contains:

(1) variables, defined inside a block

(2) variables used in the block, but have no definitions within the block

(3) definitions that reach block b

Why can performance increase after loop distribution?

(1) because of improved memory references

(2) because of loop iteration count decrease

(3) because of working with big arrays in parallel

Which of the following is required to execute a vector operation?

(1) vectors should form the complete basis in n-dimensional space

(2) vector collinearity absence

(3) vector normalizing

(4) at least one vector module is not zero

(5) none of the answers

What qualities do shared memory systems have?

(1) good scalability

(2) different latency for the different parts of memory

(3) good inter processor communication

(4) the high cost of cache subsystem synchronization

By default, all variables except local function variables and loop iterators are added to the

(1) private set

(2) shared set

(3) lastprivate set

(4) firstprivate set

What are the disadvantages of procedure-level optimizations?

(1) they could be used for procedures and not for functions

(2) they have no disadvantages

(3) they can't use pointer type variable

(4) properties of global variables are unknown

(5) vectorization is not allowed

Dynamic memory allocation is bad for

(1) data fragmentation

(2) no information passing between dynamic and static objects

(3) run statistics cannot be collected

(4) code loses its determinism

(5) arrays get the slower part of the memory

Modern Intel processors are

(1) CISC

(2) RISC

(3) hybrid of CISC and RISC

What event corresponds to processor clock ticks?

(1) L2_LINES_IN.SELF.DEMAND

(2) BUS_TRANS_ANY.ALL_AGENTS

(3) CPU_CLK_UNHALTED.CORE

(4) none of answers

Statement M dominates N if

(1) there is a path from M to N

(2) there is a path from N to M

(3) any path from M goes through N

(4) any path to N goes through M

(5) any path from N goes through M

Choose the code that results from loop peeling for: p = 10; for (i=0; i<10; ++i) { y[i] = x[i] + x[p]; p = i; }

(1) p = 10; for (i=1; i<9; ++i) { y[i] = x[i] + x[p]; p = i; }

(2) y[0] = x[0] + x[10]; for (i=1; i<10; ++i) { y[i] = x[i] + x[i-1]; }

(3) y[0] = x[0] + x[10]; for (i=1; i<9; ++i) { y[i] = x[i] + x[i-1]; }
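The loop from this question can be checked by hand with a small sketch that runs the original loop next to a version with the first iteration peeled off. The function names are invented; both functions assume that `x` has at least 11 elements and `y` at least 10.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop: p is 10 in the first iteration and i - 1 in all later ones. */
void original_loop(int *y, const int *x)
{
    assert(y != NULL && x != NULL);
    int p = 10;
    for (int i = 0; i < 10; ++i) {
        y[i] = x[i] + x[p];
        p = i;
    }
}

/* First iteration peeled off: the remaining iterations no longer need p. */
void peeled_loop(int *y, const int *x)
{
    assert(y != NULL && x != NULL);
    y[0] = x[0] + x[10];
    for (int i = 1; i < 10; ++i)
        y[i] = x[i] + x[i - 1];
}
```

After peeling, the loop body has no loop-carried scalar `p`, and the remaining loop still covers iterations 1 through 9, so none of the original iterations are dropped.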

What size do ymm registers have?

(1) 16 bit

(2) 32 bit

(3) 64 bit

(4) 128 bit

(5) 256 bit

What disadvantages do shared memory systems have?

(1) slow inter processor communication

(2) poor scalability

(3) the high cost of cache subsystem synchronization

(4) different latency for the different parts of memory

What directive is used to create a synchronization point?

(1) critical

(2) barrier

(3) atomic

(4) master

(5) sync

(6) stop

What information corresponds to the vertices of a call graph?

(1) syntax construction nesting

(2) how functions call each other

(3) program calls

(4) system utility calls

(5) system calls

Choose the correct statement(s)

(1) code generator aligns basic blocks in the memory

(2) code generator helps to generate big projects

(3) code generator performs machine optimizations

(4) code generator generates missing parts of the functions

Choose the wrong statement

(1) compiler is a part of microprocessor providing program translation to native code or assembler

(2) compiler translates source code to assembler or native code

(3) only the compiler can expose new processor capabilities, such as additional registers or new instructions, to the high-level developer

What is critical code?

(1) code executed more frequently

(2) code, which could be deleted without any changes in results

(3) wrong code

(4) code, which is not proved to be correct

(5) undocumented code

To convert a compiler to different internal representation it is necessary to correct

(1) input data

(2) Front End

(3) Back End

(4) almost all the parts of the compiler

(5) only representation itself

When is full loop unrolling applicable?

(1) there's no such optimization

(2) to unroll small loops

(3) to unroll big loops

What happens to zero bits in a packed data type?

(1) nothing

(2) meaningless zero bits are omitted

(3) packed by Huffman

(4) vectorized

(5) scalarized

What are the cons of multi-threaded applications?

(1) development is more complex

(2) thread creation has its overhead

(3) memory amount is decreased according to the core count

(4) data races / resource concurrency

(5) thread synchronization overhead

What directive is used to mark a piece of code to be executed by the master thread only?

(1) SOLO

(2) MASTER

(3) ONLYONE

(4) OWNER

(5) SUPER

(6) CREATOR

Dynamic call graph is

(1) call graph with pointer variables

(2) base block graph

(3) call graph, built during the program execution

(4) call graph that dynamically determines the operating system and uses the necessary system calls

(5) statistical graph with temporal data at the nodes

Interference graph is built

(1) for register allocation

(2) for reverse engineering

(3) for code that contains mistakes

(4) for code without representative data array

Time latency (for RAM) is

(1) frequency of synchronizing impulses

(2) number of ticks required to get one unit of data from memory

(3) number of data units that can be sent to the processor per tick

(4) number of operations the processor is able to perform at a tick

What is the part of the syntax analysis in the compiler?

(1) semantic analysis

(2) denotation analysis

(3) prevential analysis

(4) singular analysis

(5) grammar analysis

The required conditions for a dependency between S1 and S2 are the following:

(1) both of them use the same memory cell and modify it

(2) both of them use the same memory cell and at least one writes

(3) there is a path during the execution from S1 to S2

(4) there is no path during the execution from S1 to S2

What is __alignof__ used for?

(1) to align source text

(2) to tell compiler how to align objects

(3) to get the information on data type alignment

(4) to get the information on alignment of the variables
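A minimal sketch of `__alignof__` in use is given below. It assumes a GNU-compatible compiler (GCC, Clang, or a compiler that accepts the same extension); the struct `record` and the function names are invented for the example, and the values returned depend on the target platform.

```c
#include <assert.h>
#include <stddef.h>

struct record {
    char   c;
    double d;
};

/* Alignment, in bytes, that the compiler reports for the type double. */
size_t double_alignment(void)
{
    return __alignof__(double);
}

/* Alignment reported for a whole struct type. */
size_t record_alignment(void)
{
    size_t a = __alignof__(struct record);
    assert(a >= 1 && (a & (a - 1)) == 0);   /* alignments are powers of two */
    return a;
}
```

The operator only queries alignment; it does not change how any object is laid out.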

What information does /Qpar-report3 output?

(1) information on dependencies that prevent vectorization

(2) information on iteration order

(3) reasons preventing code parallelization

(4) informs whether the parallelization is unprofitable

Which of the following is a schedule type?

(1) dispatch

(2) nodispatch

(3) runtime

(4) static

(5) serial

(6) ordered

This command line parameter is used to disable interprocedural optimizations

(1) /Qip-

(2) /Qip-disable

(3) /Qipo-disable

(4) /Qno-ipo

What is instruction scheduling useful for?

(1) precise branch prediction

(2) instruction parallelism increase

(3) compare program checkpoints against the schedule

(4) create interference graph

Bandwidth is

(1) maximum number of units that can be sent to the processor at a time

(2) maximum number of operations that the processor can perform at a time

(3) maximum number of commands that could be loaded to the pipeline at a time

Statements could be arranged

(1) lexicographically

(2) graphosematically

(3) polydinamically

(4) semiiterationally

(5) semidenotationally

When is the dependence <S1,S2> an output dependence?

(1) S1 X=… S2 …=X

(2) S1 …=X S2 X=…

(3) S1 X=… S2 X=…

Is it hard to measure optimization profitability?

(1) yes, because some performance effects are hard to estimate

(2) yes, because the iteration number can be unknown

(3) no, because auto-parallelization is loop permutation transformation

(4) no, because all the data are available at compile time

Aliasing can occur between

(1) any variables

(2) only pointer variables

(3) any functions

(4) any programs

What are the aims of structure splitting?

(1) moving rarely used fields to separate structure

(2) group the fields by the data type

(3) group the fields by the basic blocks

(4) group the fields by their names

Pipeline is

(1) technique that increases a number of processed instructions at a time

(2) method predicting the next required data address

(3) mechanism to avoid processor idling

Basic blocks are contained by

(1) data flow graph

(2) base block graph

(3) base equations graph

(4) equations graph

(5) main equations graph

What is required for a loop dependency between S1 and S2 in a loop nest?

(1) statement S1 on iteration i and statement S2 on iteration j access the same memory cell

(2) one of them writes to this cell

(3) both of them write to this cell

What directive will force the compiler to parallelize the following loop if it is safe?

(1) #pragma prefer concurrent

(2) #pragma concurrentize

(3) #pragma prefer serial

(4) #pragma serial

What are the goals of inlining?

(1) function call overhead decrease

(2) analysis simplification

(3) naming simplification

(4) simplification of dereferencing

(5) shorter names

How can the memory placement of a dynamic linked list be improved?

(1) by disabling the garbage collector

(2) using speed inlining

(3) using containers

(4) using multiple processes

The number of units that can be sent to the processor at once is

(1) latency

(2) memory amount and external memory rate

(3) bandwidth

Basic block is

(1) linear program chunk without jumps and labels

(2) a group of sequential instructions

(3) instructions with one previous

(4) a group of expressions inside one statement

(5) group of statements at the algorithm block diagram

What is OUTPUT dependency?

(1) READ after WRITE

(2) WRITE after WRITE

(3) READ after READ

(4) WRITE after READ

How is auto-parallelization connected with other optimizations in the Intel compiler?

(1) when auto-parallelization is used no loop optimizations are available

(2) all loop optimizations are run before auto parallelization

(3) some loop optimizations are run after auto parallelization

What is demanded by ANSI aliasing?

(1) pointer can be dereferenced to the variable of the same type

(2) pointer can be dereferenced to the object of any type

(3) pointer can be dereferenced to the object of primitive type

(4) pointer can be dereferenced to the object of the compatible type

Choose the correct statements

(1) an expression is a tree of expressions with terminal expressions at its leaves

(2) every expression has a list of predecessors and successors

(3) expressions can be arranged lexically and by the data flow graph

(4) boundary expressions are constants and variables

(5) a statement consists of expressions

Loop optimizations are:

(1) permutation transformations

(2) data transformation

(3) variable transformation

What is passed as an argument to the loop-parallelizing function in the Intel compiler?

(1) loop boundaries

(2) only thread number to calculate the rest of parameters

(3) all used objects

What is taken into account during memory disambiguation?

(1) language properties

(2) local points-to analysis

(3) interprocedural analysis

(4) program performance analysis

(5) attributes and command-line options set by the developer

Leaves in an expression tree

(1) are the same as the other nodes

(2) satisfy lexicographical order

(3) must be deleted at the generation phase

(4) may contain variables

(5) do not exist

What drawbacks does prefetching have?

(1) software prefetch may displace useful data out of the cache

(2) prefetch is ineffectively implemented and its usage will slow down the application

(3) hardware support is always required

How can a developer control the inlining process?

(1) by command-line options and the source code of the program

(2) only by the program source code

(3) this process can only be enabled or disabled as a whole from the command line

(4) this process cannot be controlled because of the specific algorithms used

The advantages of SSA form:

(1) program is more compact

(2) def-use chains are obvious

(3) special intrinsic functions are used

(4) co-processor registers are used

(5) vector registers are used, which boosts performance

What can be used to force a function to be inlined?

(1) #pragma forceinline before the definition

(2) #pragma forceinline before the call

(3) #pragma inline before the definition

(4) #pragma inline before the call

Choose the scalar optimization:

(1) excessive branching removal

(2) loop invariant code motion

(3) interprocedural code motion

(4) register coloring

(5) high speed inline

What is partial inlining?

(1) partial function substitution

(2) substitution defines pointer variable with name of the function inlined

(3) using basic block optimizations without any code transformation

(4) dynamic code substitution at runtime

A dependency is

(1) a connection between statements caused by the same output value

(2) a connection between statements that does not allow their execution order to be changed

(3) a connection between statements caused by the same input variables

(4) a connection between statements caused by the same output variable

Registers are

(1) memory of ALU

(2) memory of CPU

(3) interface data memory

What are the requirements of VTune?

(1) use only C++

(2) each source file must be no larger than 150 KB

(3) header files must conform to the VTune standard

(4) none of the answers

What are the functions of the compiler Front End?

(1) program translation from source to internal representation

(2) scalar optimization

(3) code generation

(4) linking object files to binary

(5) semantic analysis

Choose code fragments which are good for optimizing

(1) for(i=0;i<U;i++) a[i]=b[i];

(2) i=0; do { a[i]=b[i]; i++; } while(i<U);

(3) for(i=0;i<U;i++) { a[j]=b[i]; j+=c*i; }

(4) for(i=0;i<n;i++) { a[i]=i; if(i<t) break; }

SSE is:

(1) a technology that applies a single instruction to multiple data

(2) technology to execute code at server side

(3) streaming SIMD processor extension

(4) server configuration extension

(5) programming language

What types can multiprocessor systems be divided into?

(1) shared memory systems

(2) distributed memory systems

(3) random memory access systems

(4) non-uniform memory access systems

When using OpenMP, variables behave as follows:

(1) inside a block, all variables of different threads use separate memory addresses and cannot interact

(2) all variables inside parallel blocks use the same addresses, so the user can control the concurrent workflow

(3) variables can be marked to use different memory cells or the same one

How does the use of global variables affect a program?

(1) it has no effect

(2) too few names are left for local variables

(3) code size increases because of the long global names

(4) the code is harder to optimize

(5) the code is harder to read and understand

A static profiler is used

(1) for statically typed languages

(2) for languages without pointers

(3) to analyze the application without running it

(4) to collect run statistics

(5) to estimate how static the memory map is

x86 speed factors are

(1) CPU timer speed

(2) memory size and the speed of external memory access

(3) instruction set and instruction execution speed

(4) efficiency of the internal memory and register usage

(5) pipeline quality

(6) branch prediction quality

(7) hardware prefetch quality

(8) vector instruction quality

(9) parallelization and multi-core technology

What capabilities does VTune have?

(1) Microsoft Visual Studio integration

(2) multicore and multiprocessor support

(3) detection of invalid memory accesses

(4) processor event collection

(5) measuring energy wasted by the processor

Choose scalar optimizations

(1) constant folding

(2) copy propagation

(3) common subexpression elimination

(4) loop invariant code motion
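
To make the first three terms concrete, here is a minimal sketch in C of what constant folding, copy propagation, and common subexpression elimination do to a small function. The function names `before` and `after` are hypothetical, and the rewrite is done by hand; a real compiler works on its internal representation and may produce different code.

```c
/* Hand-written illustration; both functions compute the same value. */

int before(int a, int b)
{
    int c = 4 * 8;          /* constant expression                  */
    int d = a;              /* plain copy of a                      */
    int e = (d + b) * c;    /* uses the copy and the constant       */
    int f = (d + b) + e;    /* (d + b) is computed a second time    */
    return f;
}

int after(int a, int b)
{
    /* constant folding:       4 * 8 becomes 32                     */
    /* copy propagation:       d is replaced by a                   */
    /* common subexpression:   a + b is computed once and reused    */
    int t = a + b;
    return t * 32 + t;
}
```

For any inputs that do not overflow, the two functions return the same result, which is the property every scalar optimization has to preserve.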

What is a loop invariant?

(1) an expression independent of the index variables

(2) an expression that stays unchanged on each iteration

(3) an expression used to check the loop termination condition

(4) an expression with global variables
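
As an illustration of loop invariant code motion, the sketch below moves an expression that has the same value on every iteration out of the loop. The functions `fill_naive` and `fill_hoisted` are hypothetical names chosen for this example; an optimizing compiler would normally perform this transformation on its own.

```c
#include <stddef.h>

/* scale * offset does not change between iterations,
   yet this version evaluates it n times. */
void fill_naive(double *a, size_t n, double scale, double offset)
{
    for (size_t i = 0; i < n; i++)
        a[i] = (double)i + scale * offset;
}

/* The same loop after the invariant has been hoisted. */
void fill_hoisted(double *a, size_t n, double scale, double offset)
{
    const double k = scale * offset;   /* computed once, before the loop */
    for (size_t i = 0; i < n; i++)
        a[i] = (double)i + k;
}
```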

What is a condition for vectorization?

(1) absence of loop dependencies

(2) usage of special data types

(3) usage of the <vector> module

(4) the order of dependent instructions is kept the same after optimization

Choose the characteristic corresponding to non-uniform memory access systems:

(1) Each processor is completely autonomous. There is a communications medium

(2) All processors are equidistant from the memory. Communication with the memory via a common data bus

(3) Memory is physically distributed among processors. Single address space is supported at the hardware level

Which identifier is not reserved by OpenMP?

(1) nowait

(2) prefork

(3) schedule

(4) reduction

What is the aim of dividing a program into functions and procedures?

(1) standardization

(2) serialization

(3) rasterization

(4) decomposition

(5) abstraction

How does a dynamic profiler differ from a static one?

(1) the static profiler has more precise branch prediction

(2) the dynamic profiler uses statistics

(3) the dynamic profiler does not use statistics

(4) the dynamic profiler works with pointers

(5) the dynamic profiler performs seed inlining

Memory that is directly accessed by the processor is

(1) Random Access Memory

(2) system register

(3) system bus

What is profiling?

(1) binary instrumentation

(2) event collection

(3) rebuilding the entire project

(4) use manual writing

To know what variables could be used inside the block, it is necessary to estimate:

(1) Uses[b]

(2) Killed[b]

(3) Reaches[b]

(4) Defsout[b]

What could be the reason for losing performance while processing a big loop?

(1) a large number of operations

(2) if the loop works with a large number of arrays, it may cause cache splitting

(3) register splitting caused by a large number of iteration variables

Can four different variables become components of the same vector after vectorization?

(1) never

(2) they can

(3) only if they can be arranged lexicographically

(4) only for add operation

(5) only for multiply operation

What qualities do non-uniform memory access systems have?

(1) good scalability

(2) different latency for the different parts of memory

(3) good interprocessor communication

(4) the high cost of cache subsystem synchronization

The schedule clause accepts the following arguments:

(1) static

(2) dynamic

(3) variant

(4) object

(5) guided

(6) int

(7) realtime

(8) runtime
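
For context, this is roughly how a schedule clause appears on an OpenMP loop. The sketch assumes a hypothetical function `sum_squares`; the `dynamic` kind and chunk size 4 are picked only for illustration. It needs an OpenMP-enabling compiler option (for example `-fopenmp` with GCC); without one the pragma is ignored and the loop runs serially with the same result.

```c
/* Sum of i*i for i in [0, n), with iterations handed out to threads
   in chunks of 4 as threads become free. */
long sum_squares(int n)
{
    long total = 0;
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++)
        total += (long)i * i;
    return total;
}
```

The `reduction(+:total)` clause gives each thread a private partial sum and combines them at the end, so the result does not depend on how the schedule distributes iterations.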

What are the disadvantages of procedure-level optimizations?

(1) variable serialization is not enough

(2) memory usage is not optimal

(3) procedure parameter properties are unknown

(4) they have no disadvantages

(5) loop optimization is not provided

Dynamic memory allocation

(1) causes run statistics to be ignored

(2) uses a memory manager

(3) is used to distinguish basic blocks

(4) is required for frequently changed variables

(5) is necessary to maintain structured data

Why is register access latency lower than that of RAM?

(1) registers are placed inside the CPU

(2) registers are placed inside the very fast cache memory

(3) they are accessed in parallel with computations

Which event corresponds to a branch misprediction?

(1) L2_LINES_IN.SELF.DEMAND

(2) INST_RETIRED.ANY

(3) BR_INST_RETIRED.MISPRED

(4) none of answers

The dominance frontier is

(1) the boundary between dominated and non-dominated nodes

(2) the boundary between dominated and dominating nodes

(3) a set of nodes dominated by one node

What is loop unrolling for?

(1) to reduce the number of conditional branches during loop execution

(2) to divide a big loop into sequential small loops

(3) to decrease loop nesting

(4) to remove procedure calls
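
A minimal sketch of manual loop unrolling by a factor of 4 follows. The function names `sum_simple` and `sum_unrolled4` are hypothetical; compilers usually unroll automatically, so this only shows the shape of the transformed loop.

```c
#include <stddef.h>

/* One loop test and one branch per element. */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* One loop test and one branch per four elements,
   plus a short remainder loop for the last n % 4 items. */
long sum_unrolled4(const int *a, size_t n)
{
    long s = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += (long)a[i] + a[i + 1] + a[i + 2] + a[i + 3];
    for (; i < n; i++)
        s += a[i];
    return s;
}
```

The unrolled body is larger, so the factor is a trade-off between fewer branches and more code.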

How many XMM registers does EM64T support?

(1) 4

(2) 8

(3) 16

(4) 32

(5) 64

What disadvantages do non-uniform memory access systems have?

(1) slow interprocessor communication

(2) poor scalability

(3) the high cost of cache subsystem synchronization

(4) different latency for the different parts of memory

How many threads can enter a critical section at a time?

(1) 0

(2) 1

(3) 2

(4) any, because OpenMP sets no limit to the thread number
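
For reference, a critical section in OpenMP is written as below. The function `count_with_critical` is a hypothetical example; it protects a shared counter so that concurrent increments are not lost. Build with an OpenMP-enabling option (for example `-fopenmp` with GCC); without one the pragmas are ignored and the loop runs serially.

```c
/* Counts to n using a shared counter updated inside a critical section. */
long count_with_critical(int n)
{
    long counter = 0;                 /* shared by all threads */
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        #pragma omp critical
        {
            counter++;                /* mutually exclusive update */
        }
    }
    return counter;
}
```

For a simple counter like this, a `reduction` clause or `#pragma omp atomic` would usually be cheaper; the critical section is used here only because it is the construct the question asks about.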

Why may a call graph be considered incomplete?

(1) properties of some of the libraries used are unknown

(2) the pointer data type is used

(3) single assignment is used

(4) full call statistics are unknown

(5) system call statistics are unknown

Choose the correct statement(s)

(1) code generator estimates jump distances

(2) code generator builds the code by representative data array

(3) code generator controls dynamic memory

Superscalarity is

(1) Ability to operate with vectors

(2) Ability to process more than one operation per clock tick

(3) none of the answers

What conditions can prevent vectorization?

(1) nothing

(2) dependency between different iterations

(3) data dependencies

(4) lack of data dependencies

(5) lack of iteration dependencies

Which part of the compiler depends most on the source language?

(1) Front End

(2) Back End

(3) internal representation

(4) code generation

(5) profiling

Choose the correct statements for this code:

S1 PI = 3.14
S2 R = 5
S3 AREA = PI*R**2

(1) <S1,S2,S3> is equivalent to <S1,S3,S2>

(2) <S1,S2,S3> is equivalent to <S2,S1,S3>

(3) there is the dependency - <S1,S3>

(4) there is the dependency - <S1,S2>

(5) there is the dependency - <S2,S3>

Packed data type operations are

(1) pack and unpack operations only

(2) operations that could be removed from code

(3) vector operations

(4) operations that omit zero bits

(5) they are abstract, because the assembler does not have any

Which directive marks a block for sequential execution?

(1) SERIAL

(2) ORDERED

(3) SOLO

(4) MASTER

(5) guided

A dynamic call graph

(1) does not give any advantages

(2) does not use static data

(3) uses abstract system calls

(4) is built during program execution

(5) is dense

What entity do the colors of the interference graph correspond to?

(1) most loaded parts of the program

(2) less compatible functions

(3) registers

(4) arithmetical and logical data conversions

Superscalarity is

(1) the ability to perform multiple operations per clock tick

(2) the frequency of the microchip's synchronizing impulses

(3) the number of ticks required to read one unit from memory

(4) the maximum number of data elements that can be sent to the processor at a time

What is the input data for syntax analysis?

(1) resulting syntax

(2) BNF-form of the result

(3) program source

(4) representative data

(5) only syntax and nothing else

What are normalized loops?

(1) loop from 0 to N with step 1

(2) loop from 1 to N with any step

(3) loop from 1 to N with step 1

Why is it recommended to arrange the fields of a structure in order of decreasing size?

(1) to improve performance

(2) to beautify

(3) to decrease the structure size after alignment
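
The effect of field order on padding can be checked with `sizeof`. The two structures below are hypothetical and hold the same fields; on a typical x86-64 ABI the first occupies 24 bytes and the second 16, but exact sizes are implementation-defined, so treat the numbers as an assumption about the platform.

```c
#include <stddef.h>

struct Mixed {       /* fields in arbitrary order          */
    char   a;        /* followed by padding up to double   */
    double b;
    char   c;        /* followed by padding up to int      */
    int    d;
};

struct Sorted {      /* same fields, largest first         */
    double b;
    int    d;
    char   a;
    char   c;        /* only tail padding remains          */
};

/* Bytes saved by reordering on the current platform. */
size_t padding_saved(void)
{
    return sizeof(struct Mixed) - sizeof(struct Sorted);
}
```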

What kind of optimization is auto-parallelization?

(1) loop non-permutation

(2) non-loop permutation

(3) loop permutation

(4) non-loop non-permutation

Which of the following are schedule types?

(1) random

(2) round-robin

(3) guided

(4) dynamic

(5) serial

(6) concurrent

What kind of interprocedural optimization is used by default?

(1) no interprocedural optimizations at all

(2) optimizations without pointer variables

(3) single file optimizations

(4) multiple file optimizations

How is instruction scheduling performed?

(1) by changing the instruction order

(2) by creating branch probability plan

(3) by creating program checkpoint schedule array

(4) by ordering instructions inside the interference graph

Superscalar is

(1) multicore

(2) processor with multiple execution units

(3) processor with "out of order execution" feature

How are the statements connected inside the Intel compiler?

(1) by the adjacency matrix

(2) by Petrov table

(3) by data flow graph

(4) by denotational semantics

(5) none of the answers

When is the dependence <S1,S2> a true dependence?

(1) S1 X=… S2 …=X

(2) S1 …=X S2 X=…

(3) S1 X=… S2 X=…
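
The three classic dependence kinds can be seen side by side in a few lines of C. The function `dependence_kinds` is a hypothetical example; the comments use the standard terminology (true or flow = read after write, anti = write after read, output = write after write).

```c
int dependence_kinds(int x)
{
    int y, z;
    y = x + 1;   /* S1: writes y                                      */
    z = y * 2;   /* S2: reads y  -> true dependence <S1,S2> (RAW)     */
    y = 7;       /* S3: writes y -> anti dependence <S2,S3> (WAR)     */
                 /*              -> output dependence <S1,S3> (WAW)   */
    return y + z;
}
```

Swapping S1 with S2, S2 with S3, or S1 with S3 would change the returned value, which is why each of these pairs has to keep its order.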

Which directive suggests that the compiler not parallelize the following loop?

(1) #pragma prefer concurrent

(2) #pragma no concurrentize

(3) #pragma prefer serial

(4) #pragma serial

Points-to analysis is

(1) alias analysis itself

(2) a part of alias analysis

(3) index variables analysis

(4) loop invariant analysis

(5) none of the answers

Why is pointer chasing useful?

(1) extra links may lead to memory being freed unexpectedly

(2) the dereference process is not deterministic, and it is better to avoid non-determinism

(3) each dereference involves a cache load, which is significant for big structures

(4) it is necessary to avoid extra memory free operations

(5) pointer chasing is just a code formatting convention that is not related to program execution speed

In a fully-associative memory

(1) each block can be mapped to any part of the cache

(2) each block has its own place inside the cache

(3) line of the corresponding cache is calculated from the lower part of the address and the exact place in the line is chosen associatively

Basic blocks are

(1) entrance blocks

(2) function signatures

(3) function bodies, where most of the computations are performed

(4) body of the main function

(5) library header files

What are normalized loops used for?

(1) to simplify equations

(2) to improve code readability

(3) to unify loops

Which parallel library does the Intel compiler use?

(1) OpenMP

(2) VTune

(3) STL

What are the disadvantages of inlining?

(1) code expansion

(2) naming complexity for user

(3) execution time increase

(4) call overhead increase

A linked list is worse than an array because of

(1) increased cache misses for sequentially processed elements

(2) a reduced set of available data types

(3) link maintenance overhead

(4) link values that are always changing while the elements are moved across the memory

Vectorization is a parallelization technique in which

(1) the processor applies one operation to multiple data

(2) the processor applies different operations in parallel

(3) the processor applies one operation to multiple data sequentially

The nodes of a control flow graph are

(1) expressions

(2) statements

(3) basic blocks

(4) functions and procedures of the application

(5) variables

Is there any dependence in this code?

DO I=1,N
S1 A(I+1) = F(I)
S2 F(I+1) = A(I)
END DO

(1) no dependence

(2) loop-carried dependence

(3) loop-independent dependence
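
One way to probe a loop like this for cross-iteration dependences is to run its iterations in a different order and compare results. The sketch below is a 0-based C translation of the Fortran loop; the function names `run_forward` and `run_backward` are hypothetical, and both arrays must hold n + 1 elements.

```c
/* Iterations in the original order. */
void run_forward(double *a, double *f, int n)
{
    for (int i = 0; i < n; i++) {
        a[i + 1] = f[i];   /* S1 */
        f[i + 1] = a[i];   /* S2 */
    }
}

/* The same statements with the iteration order reversed. */
void run_backward(double *a, double *f, int n)
{
    for (int i = n - 1; i >= 0; i--) {
        a[i + 1] = f[i];   /* S1 */
        f[i + 1] = a[i];   /* S2 */
    }
}
```

Starting from a = {1, 2, 3, 4} and f = {10, 20, 30, 40} with n = 3, the forward loop leaves a[2] equal to 1 while the backward loop leaves it equal to 20. The value that S2 reads from a[i] in one iteration is the one S1 wrote in the previous iteration, and the same holds for f in the other direction, so the iterations cannot be freely reordered.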

What is the loop-parallelizing function in the Intel compiler?

(1) initial loop

(2) initial loop with bounds set by function parameters

(3) only one loop iteration

When are permutation transformations not allowed?

(1) if there are any objects that cannot be loaded at the same time

(2) if there are any objects that may overlap in memory

(3) if there are any objects that take too much memory

The type of cache in which any memory block can be loaded into any part of the cache is

(1) direct mapping memory

(2) fully-associative cache

(3) reverse mapping memory

Def-use graph nodes are

(1) basic blocks, defining and using the same variable

(2) statements defining and using the same variable

(3) expressions defining and using the same variable

What is ANTI dependency?

(1) READ after WRITE

(2) WRITE after WRITE

(3) READ after READ

(4) WRITE after READ

What is "prefetch"?

(1) loading data from relatively slow memory into the cache after the memory is required by processor

(2) loading data from cache into memory before the memory is required by processor

(3) loading data from relatively slow memory into the cache before the memory is required by processor

(4) fetching a memory address before the memory is required

What is the meaning of the restrict attribute in a pointer definition in C/C++?

(1) any pointer can refer to this memory

(2) only this pointer can have its value

(3) only this pointer and expressions based on it can refer to its memory cell
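
A short sketch of `restrict` in use follows. It assumes C99 or later (in C++ the keyword is not standard, and compilers offer spellings such as `__restrict` as an extension); the function name `add_arrays` is hypothetical.

```c
#include <stddef.h>

/* The caller promises that dst and src do not overlap.
   With that promise the compiler does not have to assume that a
   store to dst[i] might change a later src[j], which makes the
   loop easier to vectorize. */
void add_arrays(float *restrict dst, const float *restrict src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += src[i];
}
```

Calling the function with overlapping arrays breaks the promise and results in undefined behavior, so the qualifier is a contract on the caller rather than something the compiler checks.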

Constants in an expression tree

(1) must be held at the leaves

(2) do not exist

(3) can be deleted without changing any results

(4) are linked to the lexicographical order

During VTune analysis, some functions are missing. Why could this happen?

(1) large amounts of interprocedural data lead to function name aggregation

(2) the function was removed by inlining

(3) the function name was changed to a system one, and VTune hides it

(4) it is a typical error for the global scope

(5) the function name was shadowed by a reserved system word

SSA-form is:

(1) ability to access SSE

(2) single assignment form

(3) special form with MMX instructions

(4) formalized vector extension

Which option is used to disable inlining?

(1) /Qb<0>

(2) /Qnoinline

(3) /Qinline<0>

"Dead code" may be caused by

(1) segmentation fault

(2) optimizing transformations

(3) execution slowdown

(4) usage of long identifier names

What interprocedural optimization is specific to C++?

(1) deconstructization

(2) deobjectization

(3) devirtualization

(4) relation canceling

(5) depolymorphysm

The dependency between S1 and S2 persists if

(1) statements S1 and S2 modify the same variable in the same basic block

(2) statements S1 and S2 modify the same variable in different basic blocks

(3) the basic block of S1 dominates S2 and both of them write to the same memory cell

(4) statement S2 is followed by S1 and both of them read the same variable

Cache levels differ by

(1) the speed of access

(2) the first level is for scalars, the second for superscalars, and the third for vectors

(3) by dimensions: one-, three- and four-dimensional cache

(4) they are almost equal and do not differ

Is there any dependence in this code?

DO I=1,N
S1 A(I)=…
S2 …=A(I)
END DO

(1) no dependence

(2) loop-carried dependence

(3) loop-independent dependence

An optimizing transformation preserves the equivalence of computations if

(1) the optimization neither deletes nor adds any code

(2) the control flow graph is unchanged

(3) the order of dependent statements is unchanged

(4) blocks with dependent statements are untouched

Vectorization is

(1) collecting program characteristics such as procedure execution time, branch misprediction rate, cache splitting, etc.

(2) translating program source into assembler or native code

(3) the process of converting from a scalar representation, where each operation uses scalars, to a vector representation, where one operation can use vector operands

What is the __declspec(align(n)) specifier used for?

(1) to tell the compiler that the program couldn't be compiled when the alignment is wrong

(2) to tell the compiler how to align variables in memory

(3) to tell the compiler that all the types must be aligned identically

(4) to disallow array alignment when its size is less than "n"
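
`__declspec(align(n))` is a Microsoft-style extension, so the sketch below swaps in the portable C11 `alignas` specifier to show the same idea: requesting a stricter alignment for a variable. The names `vec4` and `is_aligned16` are hypothetical, and a C11 compiler is assumed.

```c
#include <stdalign.h>
#include <stdint.h>

/* Ask for 16-byte alignment, as needed by aligned SSE loads and stores. */
static alignas(16) float vec4[4];

/* Returns 1 when the address is a multiple of 16. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

With the alignment request in place, `is_aligned16(vec4)` is expected to return 1 regardless of what other variables are laid out around the array.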

An array is better than a linked list because

(1) there is no link maintenance overhead

(2) it can contain elements of different sizes

(3) it is modeled according to the statistics collected

(4) it can contain the "record" data type