## National Central University (國立中央大學), Academic Year 97 Master's Program Entrance Examination

Department: Master's Program in Electrical Engineering, Electronics Group. Subject: Computer Organization.

\*Please write your answers on the answer sheet (card).

1. There are three processors with different cache configurations:

   - Cache 1: two-way set associative with four-word blocks
   - Cache 2: direct-mapped with one-word blocks
   - Cache 3: direct-mapped with four-word blocks

   The following miss rate measurements have been made:

   - Cache 1: instruction miss rate is 2% and data miss rate is 3%
   - Cache 2: instruction miss rate is 3% and data miss rate is 5%
   - Cache 3: instruction miss rate is 2% and data miss rate is 4%

   The cycle times for the three processors are 400 ps for the first and second processors and 300 ps for the third processor. For these processors, one-half of the instructions contain a data reference. Assume that the cache miss penalty is 6 + block size in words. The clock cycles per instruction (CPI) for this workload was measured on a processor with Cache 1 and was found to be 1.85.

   - (a) Determine which processor spends the most cycles on cache misses. (10%)
   - (b) Determine which processor is the fastest and which is the slowest. (10%)

2. In a 15-stage pipelined processor, two bubbles must be inserted for conditional branch instructions, which constitute 10% of all instructions executed. About 2.5% of all instructions encounter a cache miss when accessing the data memory, causing the pipeline to stall for 20 cycles. What is the effective CPI for this processor? (Hint: A pipeline that is always full, with no stalls or bubbles, leads to a CPI of 1 in the long run.) (10%)

3. Figure 1(a) depicts a multiplexer-based full adder, where $X$, $Y$, and $C_{in}$ (carry input) are inputs, and $S$ (sum) and $C_{out}$ (carry out) are outputs. Assume that the delays of a 4-to-1 multiplexer and an inverter are denoted as $T_{mux4}$ and $T_{inv}$, respectively. Also, $T_{inv}$ is less than $T_{mux4}$.

   - (a) Derive the delay of the critical path of the multiplexer-based full adder in terms of $T_{mux4}$ and $T_{inv}$. (5%)
   - (b) Assume that 8 multiplexer-based full adders are implemented as an 8-bit ripple-carry adder (RCA), shown in Figure 1(b). Derive the delay of the critical path of the 8-bit RCA in terms of $T_{mux4}$ and $T_{inv}$. (7%)
   - (c) Figure 1(c) shows a 16-bit carry-select adder (CSA) which is designed with 8-bit RCAs and multiplexers, where $S_i^1$ ($S_i^0$) denotes the *i*th sum output of an 8-bit RCA with carry input of 1 (0), and $C_j^1$ ($C_j^0$) denotes the *j*th carry output of an 8-bit RCA with carry input of 1 (0). Assume that the delay of a 2-to-1 multiplexer is denoted as $T_{mux2}$. Derive the delay of the critical path of the 16-bit CSA in terms of $T_{mux4}$, $T_{mux2}$, and $T_{inv}$. (8%)

   Figure 1: (a) Multiplexer-based full adder. (b) 8-bit ripple-carry adder. (c) 16-bit carry-select adder.

4. Explain why each of the following microprocessor features affects (or does not affect) the processing rate of the chip. (10%)

   - (a) Clock frequency
   - (b) Data bus width
   - (c) Address bus width
   - (d) Internal cache memory
   - (e) Coprocessor (internal or external)

5. - (a) What are the main advantages and disadvantages of pipelines? (4%)
   - (b) What is a pipeline hazard? (4%)
   - (c) A pipelined machine has four stages, i.e., an instruction consists of four phases (e.g., instruction fetch, instruction decode, operand fetch, and execute). Stage 1 needs 80 nanoseconds (ns), Stage 2 needs 50 ns, and so on. The pipeline is shown as follows: How much time does the pipelined machine require to complete ten instructions? (8%)

6. - (a) What are the major features of Booth's algorithm? (4%)
   - (b) Please list the worst case of Booth's algorithm for A\*B, where A and B are 16-bit data. (4%)
   - (c) Please list the best case of Booth's algorithm for A\*B, where A and B are 16-bit data. (4%)

7. Consider a machine with three instruction classes X, Y, and Z.

   | Instruction class | CPI for this instruction class |
   |-------------------|--------------------------------|
   | X                 | 1                              |
   | Y                 | 2                              |

   Now suppose we measure the code for the same program from two different compilers, A and B, and obtain their instruction counts (in billions) for each instruction class:

   | Code from  | X  | Y | Z |
   |------------|----|---|---|
   | Compiler A | 5  | 1 | 1 |
   | Compiler B | 10 | 1 | 1 |

   Assume that the machine's clock rate is 500 MHz.

   - (a) What is the execution time of the code from each of the two compilers? (4%)
   - (b) What is the MIPS for each version of the program? (4%)
   - (c) If the machine is a 4-way VLIW machine, what is the MOPS (million operations per second) of this machine? (4%)
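
The hint in Question 2 suggests one common way to model effective CPI: start from the ideal CPI of 1 and add the average stall cycles each instruction contributes. The sketch below applies that model to the figures quoted in Question 2. The function name `effective_cpi` and the assumption that branch bubbles and data-cache stalls are independent and simply add are illustrative choices, not something the exam states. Pipeline fill time is ignored, following the hint's "in the long run" wording, so the 15-stage depth does not appear in the calculation.

```python
def effective_cpi(base_cpi, stall_sources):
    """Return base CPI plus the expected stall cycles per instruction.

    stall_sources is a list of (fraction_of_instructions, stall_cycles) pairs.
    Assumption (not stated in the exam): the stall sources are independent,
    so their average penalties add.
    """
    return base_cpi + sum(frac * cycles for frac, cycles in stall_sources)


# Figures quoted in Question 2.
branch = (0.10, 2)       # 10% conditional branches, 2 bubbles each
data_miss = (0.025, 20)  # 2.5% of instructions miss in data memory, 20-cycle stall

cpi = effective_cpi(1.0, [branch, data_miss])
print(f"effective CPI = {cpi:.2f}")
```

Under these assumptions the branch bubbles contribute 0.2 cycles per instruction and the cache misses contribute 0.5, so the sketch reports an effective CPI of 1.70.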