## 科目:計算機結構【資工系碩士班】 1. (12% total) The table below shows the percentage of MIPS instructions executed by category for average of five SPEC2000 integer programs and five SPEC2000 floating-point programs. | Instruction | MIPS examples | Average CPI | Frequency | | |--------------------|-------------------------|------------------|-----------|----------------| | | | | Integer | Floating-point | | Arithmetic | add, sub, addi | 1.0 clock cycles | 24% | 48% | | Data transfer | lw, sw, lb, sb, lui | 1.4 clock cycles | 36% | 39% | | Logical | and, or, nor, andi, ori | 1.0 clock cycles | 18% | 4% | | Conditional branch | beq, bne, slt, slti | 1.7 clock cycles | 18% | 6% | | Jump | j, jr, jal | 1.2 clock cycles | 4% | 2% | - 1.1 (3%) Using the information for the program SPEC2000int, find the percentage of all memory accesses (both data and instruction) that are for data. - 1.2 (4%) Using the information for the program SPEC2000fp, find the percentage of all memory accesses (both data and instruction) that are for reads. Assume that two-thirds of data transfers are loads. - 1.3 (5%) Compute the effective CPI for MIPS. Average the instruction frequencies for SPEC2000int and SPEC2000fp to obtain the instruction mix. - 2. (10% total) The 32-bit pattern 0xA4180000<sub>sixteen</sub> is expressed in hexadecimal notation. What decimal number does this bit pattern represent? - 2.1 (4%) Assume that it is a two's-complement integer. - 2.2 (6%) Assume that it is a single precision IEEE 754 floating-point number. - 3. (12% total) Consider the computer with three instruction classes and the following CPI measurements: | | CPI for this instruction class | | | | |-----|--------------------------------|---|---|--| | | A | В | C | | | CPI | 1 | 2 | 3 | | Now suppose we measure the code for the same program from two different compilers and obtain the following data: | | Instruction counts (in billions) for each instruction class | | | | |------------|-------------------------------------------------------------|---|---|--| | Code from | A | В | С | | | Compiler 1 | 5 | 1 | 1 | | | Compiler 2 | 11 | 1 | 1 | | Assume that the computer's clock rate is 4 GHz. - 3.1 (6%) Compute the MIPS (million instructions per second) rate for each version of the program. Which code sequence will execute faster according to MIPS? - 3.2 (6%) Find the execution time for each version of the program. Which code sequence will execute faster according to execution time? # 國立中山大學100學年度碩士班招生考試試題 #### 科目:計算機結構【資工系碩士班】 - 4. (20% total) Compare performance for single-cycle, multicycle, and five-stage (IF, ID, EX, MEM, WB) pipelined control using the following instruction mix: 25% loads, 10% stores, 11% branches, 2% jumps, and 52% ALU instructions. Assume that the operation times for the major functional units are the following: - Memory units: 200 picoseconds (ps) - ALU and adders: 100 ps - Register file (read or write): 50 ps In the multicycle implementation, the number of clock cycles required to execute each instruction class is 5, 4, 4, 3, and 3 for loads, stores, ALU instructions, branches, and jumps, respectively. For pipelined execution, assume that half of the load instructions are immediately followed by an instruction that uses the result, that the branch delay on misprediction is 1 clock cycle, and that one-quarter of the branches are mispredicted. Assume that jumps always pay 1 full clock cycle of delay, so their average time is 2 clock cycles. Ignore any other hazards. - 4.1 (3%) What is the clock cycle time for single-cycle, multicycle, and pipelined datapath? - 4.2 (10%) What is the CPI for single-cycle, multicycle, and pipelined design? - 4.3 (4%) What is the average instruction time for single-cycle, multicycle, and pipelined design? - 4.4 (3%) If we can split one stage of the pipelined datapath into two new stages, each with half the operation time of the original stage to shorten the clock cycle time, which stages would you split and what is the new clock cycle time of the pipelined datapath? - 5. (12% total) Consider executing the following code on a five-stage (IF, ID, EX, MEM, WB) pipelined datapath: add \$3, \$4, \$2 sub \$5, \$3, \$1 lw \$6, 200(\$3) add \$7, \$3, \$6 - 5.1 (3%) Identify all of the data dependencies. - 5.2 (2%) Which dependencies are data hazards that will be resolved via forwarding? - 5.3 (2%) Which dependencies are data hazards that will cause a stall? - 5.4 (5%) How many cycles will it take to execute this code? - 6. (12% total) Assume an instruction cache miss rate for a program is 2% and a data cache miss rate is 4%. - 6.1 (6%) If a processor has a CPI of 2 without any memory stalls and the miss penalty is 100 cycles for all misses, determine how much faster a processor would run with a perfect cache that never missed. Use the instruction frequencies for SPEC2000int shown in Problem 1. - 6.2 (6%) Suppose we increase the performance of the processor by doubling its clock rate. Since the main memory speed is unlikely to change, assume that the absolute time to handle a cache miss does not change. How much faster will the processor be with the faster clock, assuming the same miss rate as Problem 6.1? # 國立中山大學100學年度碩士班招生考試試題 ## 科目:計算機結構【資工系碩士班】 - 7. (10%) Increasing associativity requires more comparators, and more tag bits per cache block. Assuming a cache of 4K blocks, a four-word block size, and a 32-bit address, find the total number of sets and the total number of tag bits for caches that are direct mapped, two-way and four-way set associative, and fully associative. - 8. (12% total) Please briefly answer the following questions. - 8.1 (4%) Suppose you want to achieve a speedup of 80 with 100 processors. What fraction of the original computation can be sequential? - 8.2 (4%) What's the main idea of loop unrolling? List at least two aspects that the loop unrolling can help to improve performance. - 8.3 (4%) What are the dynamic branch prediction and the branch prediction buffer?