## 國立中山大學 104 學年度碩士暨碩士專班招生考試試題 科目名稱:計算機結構【資工系碩士班甲組、乙組】 ※本科目依簡章規定「不可以」使用計算機(問答申論題) 題號: 434001 共3頁第1頁 NOTE: If some questions are unclear or not well defined to you, you can make your own assumptions and state them clearly in the answer sheet. 1. (14% total) The following table shows the percentage of MIPS instructions executed by category for average of SPEC2000 integer programs and SPEC2000 floating point programs. | Instruction class | MIPS examples | Average CPI | Frequency | | |--------------------|-------------------------|------------------|-----------|----------------| | | | | Integer | Floating point | | Arithmetic | add, sub, addi | 1.0 clock cycles | 24% | 48% | | Data transfer | lw, sw, lb, sb, lui | 1.4 clock cycles | 36% | 39% | | Logical | and, or, nor, andi, ori | | 18% | 5% | | Conditional branch | beq, bne, slt, slti | 1.7 clock cycles | 18% | 6% | | Jump | j, jr, jal | 1.2 clock cycles | 4% | . 2% | 1.1 (4%) Using the average instruction mix information for the program SPEC2000fp, find the percentage of all memory accesses (both data and instruction) that are for reads. Assume that two-thirds of data transfers are loads. 1.2 (4%) Compute the effective CPI for MIPS. Average the instruction frequencies for SPEC2000int and SPEC2000fp to obtain the instruction mix. - 1.3 (6%) Consider an architecture that is similar to MIPS except that it supports update addressing for data transfer instructions. If we run SPEC2000int using this architecture, some percentage of the data transfer instructions will be able to make use of the new instructions, and for each instruction changed, one arithmetic instruction can be eliminated. If 25% of the data transfer instructions can be changed, which will be faster for SPEC2000int, the modified MIPS architecture or the unmodified architecture? How much faster? (Assume that both architectures have CPI values as given in the above table and that the modified architecture has its cycle time increased by 20% in order to accommodate the new instructions.) - 2. (12% total) Consider two different implementations, I1 and I2, of the same instruction set. There are three classes of instructions (A, B, and C) in the instruction set. I1 has a clock rate of 4 GHz, and I2 has a clock rate of 2 GHz. The following table shows the average number of cycles for each instruction class on I1 and I2. | Class | CPI on I1 | CPI on I2 | C1 Usage | C2 Usage | C3 Usage | |-------|-----------|-----------|----------|----------|----------| | Α | 2 | 1 | 40% | 40% | 50% | | · B | 3 | 2 | 40% | 20% | 25% | | С | 5 | 2 | 20% | 40% | 25% | The table also contains a summary of average proportion of instruction classes generated by three different compilers. C1 is a compiler produced by the makers of I1, C2 is produced by the makers of I2, and C3 is a third-party product. Assume that each compiler uses the same number of instructions for a given program but that the instruction mix is as described in the table. - 2.1 (6%) Using C1 on both I1 and I2, how much faster can the makers of I1 claim I1 is compared to I2? - 2.2 (6%) Which computer and compiler would you purchase if all other criteria are identical, including cost? - 3. (10% total) You are going to enhance a computer, and there are two possible improvements: either make multiply instructions run four times faster than before, or make memory access instructions run two times faster than before. You repeatedly run a program that takes 100 seconds to execute. Of this time, 20% is used for multiplication, 50% for memory access instructions, and 30% for other tasks. 3.1 (4%) What will the speedup be if both improvements are made? 3.2 (6%) You are going to change the program described in Problem 3 so that the percentages are not 20%, 50%, and 30% anymore. Assuming that none of the new percentages is 0, what sort of program would result in a tie with regard to speedup (i.e., the same speedup) between the two individual improvements? Provide both a formula and some examples. ## 國立中山大學 104 學年度碩士暨碩士專班招生考試試題 科目名稱:計算機結構【資工系碩士班甲組、乙組】 ※本科目依簡章規定「不可以」使用計算機(問答申論題) 題號:434001 共3頁第2頁 - 4. (16% total) Figure 1 gives the datapath of a pipelined processor with forwarding and hazard detection. - 4.1 (3%) We have a program of 1000 instructions in the format of "lw, add, lw, add, ..." The add instruction depends (and only depends) on the lw instruction right before it. The lw instruction also depends (and only depends) on the add instruction right before it. If the program is executed on the pipelined datapath of Figure 1, what would be the actual CPI? - 4.2 (3%) What would be the actual CPI for the program in Problem 4.1 without forwarding? - 4.3 (4%) Consider executing the following code on the pipelined datapath of Figure 1. How many cycles will it take to execute this code? lw \$4, 100(\$2) sub \$6, \$4, \$3 add \$2, \$3, \$6 add \$7, \$5, \$2 - 4.4 (6%) With regard to the code in Problem 4.3, explain what the forwarding unit is doing during the sixth cycle of execution. If any comparisons are being made, mention them. - 5. (16% total) We have a program core consisting of several conditional branches. The program core will be executed thousands of times. Below are the outcomes of each branch for one execution of the program core (T for taken, N for not taken). Branch 1: T-T-T-T, Branch 2: N-N-N-N, Branch 3: T-N-T-N, Branch 4: T-T-T-N-T Assume the behavior of each branch remains the same for each program core execution. For dynamic schemes, assume each branch has its own prediction buffer and each buffer initialized to the same state before each execution. List the predictions and calculate the prediction accuracy for the following branch prediction schemes: - 5.1 (5%) Always-taken - 5.2 (5%) 1-bit predictor (initialized to predict taken) - 5.3 (6%) 2-bit predictor as shown in Figure 2 (initialized to weakly predict taken) - 6. (12% total) The average memory access time (AMAT) is possibly useful as a figure of merit for different cache systems. - 6.1 (3%) Find the AMAT for a processor with a 2 ns clock, a miss penalty of 20 clock cycles, a miss rate of 0.06 misses per instruction, and a cache access time (including hit detection) of 1 clock cycle. Assume that the read and write miss penalties are the same and ignore other write stalls. - 6.2 (3%) Suppose we can improve the miss rate to 0.04 misses per reference by doubling the cache size. This causes the cache access time to increase to 1.2 clock cycles. Using the AMAT as a metric, determine if this is a good trade-off. - 6.3 (6%) If the cache access time determines the processor's clock cycle time, which is often the case, AMAT may not correctly indicate whether one cache organization is better than another. If the processor's clock cycle time must be changed to match that of a cache, is this a good trade-off? Assume the processors are identical except for the clock rate and the number of cache miss cycles; assume 1.5 references per instruction and a CPI without cache misses of 2. The miss penalty is 20 cycles for both processors. - 7. (20% total) Consider three processors with different cache configurations: - Cache 1: Direct-mapped with two-word blocks - Cache 2: Direct-mapped with four-word blocks - Cache 3: Two-way set associative with four-word blocks The following miss rate measurements have been made: - Cache 1: Instruction miss rate is 3.75%; data miss rate is 5% - Cache 2: Instruction miss rate is 2%; data miss rate is 4% - Cache 3: Instruction miss rate is 2%; data miss rate is 3% ## 國立中山大學 104 學年度碩士暨碩士專班招生考試試題 ## 科目名稱:計算機結構【資工系碩士班甲組、乙組】 ※本科目依簡章規定「不可以」使用計算機(問答申論題) 題號: 434001 共3頁第3頁 For these processors, one-half of the instructions contain a data reference. Assume that the cache miss penalty is 6 + Block size in words. The CPI for this workload was measured on a processor with Cache 1 and was found to be 2.0. - 7.1 (6%) Assuming a cache of 16K blocks and a 32-bit address, find the total number of sets and the total number of tag bits for Cache 1, Cache 2, and Cache 3. - 7.2 (6%) Determine which processor spends the most cycles on cache misses. - 7.3 (8%) The cycle times for the processors in Problem 7.2 are 400 ps for the first and second processors and 300 ps for the third processor. Determine which processor is the fastest and which is the slowest.