<pre class="prettyprint linenums lang-py"># A toy load/store machine: 1 KB of zeroed memory and 32 registers.
mem = {i: 0 for i in range(2**10)}
res = {i: 0 for i in range(32)}

while True:
    a = int(input("Enter the first operand (an integer greater than 0 and less than 2**8): "))
    if 0 < a < 2**8:
        mem[0] = a
        break

while True:
    b = int(input("Enter the second operand (an integer greater than 0 and less than 2**8): "))
    if 0 < b < 2**8:
        mem[1] = b
        break

# Accepts instructions such as "load r2, m0", "add r4, r2, r3",
# "store r4, m2"; an empty line exits.  (The original looped on the
# always-truthy builtin "exit" and could never terminate.)
while True:
    args = input("").replace(",", " ").split()
    if not args:
        break
    op = args[0].lower()
    if op == 'load':            # load rX, mY : copy memory into a register
        r1 = int(args[1][1:])
        m1 = int(args[2][1:])
        res[r1] = mem[m1]
    elif op == 'add':           # add rZ, rX, rY : rZ = rX + rY
        r3 = int(args[1][1:])
        r1 = int(args[2][1:])
        r2 = int(args[3][1:])
        res[r3] = res[r1] + res[r2]
    elif op == 'store':         # store rX, mY : copy a register into memory
        r3 = int(args[1][1:])
        m3 = int(args[2][1:])
        mem[m3] = res[r3]
        print(mem[m3])
</pre> <p class="MsoNormal" align="center" style="text-align:center;"> <b><span style="font-size:14.0pt;">Summary</span><span></span></b> </p> <p class="MsoNormal"> <b><span>1.1 Introduction</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>Computer technology has improved incredibly over the past 65 years. Today people can purchase a computer that performs far better than one from 1985, at roughly 1/2000 of the price of 30 years ago. This rapid improvement has come both from advances in the technology used to build computers and from innovations in computer design.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>The microprocessor was invented in the late 1970s. The ability of the microprocessor to ride the improvements in integrated circuit technology led to a higher rate of performance improvement. 
Three factors made it possible to successfully develop a new set of architectures with simpler instructions, called RISC (Reduced Instruction Set Computer) architectures, in the early 1980s: the increasing fraction of the computer business being based on microprocessors, the virtual elimination of assembly language programming, and the creation of standardized, vendor-independent operating systems. The RISC-based computers raised the performance bar, forcing prior architectures to keep up or disappear.</span> </p> <p class="MsoNormal"> <span>However, this 17-year hardware renaissance ended in 2003 because of the twin hurdles of the maximum power dissipation of air-cooled chips and the lack of more instruction-level parallelism to exploit efficiently. In 2004, Intel joined others in declaring that the road to higher performance would be via multiple processors per chip rather than via faster uniprocessors, a milestone that signals a historic switch from relying solely on instruction-level parallelism (ILP) to data-level parallelism (DLP) and thread-level parallelism (TLP).</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>This text is about the architectural ideas and accompanying compiler improvements that made the incredible growth rate possible in the last century, the reasons for the dramatic change, and the challenges and initial promising approaches to architectural ideas, compilers, and interpreters for the 21st century. At the core is a quantitative approach to computer design and analysis that uses empirical observations of programs, experimentation, and simulation as its tools.</span> </p> <p class="MsoNormal"> <span>It is this style and approach to computer design that is reflected in this text. 
The purpose of this chapter is to lay the quantitative foundation on which the following chapters and appendices are based.</span> </p> <p class="MsoNormal"> <b><span>1.2 Classes of Computers</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>These changes in computer use have led to five different computing markets: personal mobile device (PMD), desktop, server, cluster/warehouse-scale computer, and embedded, each characterized by different applications, requirements, and computing technologies.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <b><span>PMD</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>PMD is the term we apply to a collection of wireless devices with multimedia user interfaces, such as cell phones and tablet computers, for which cost is a prime concern. The key characteristics of media applications include responsiveness, predictability, the need to minimize memory, and the need to use energy efficiently. Optimizing memory size therefore plays an important role in such designs.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <b><span>Desktop Computing</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>The first, and probably still the largest market in dollar terms, is desktop computing. Throughout this range in price and capability, the desktop market tends to be driven to optimize price-performance. 
As a result, the newest, highest-performance microprocessors and cost-reduced microprocessors often appear first in desktop systems.</span><span style="font-size:10.0pt;font-family:""> </span><span>Desktop computing also tends to be reasonably well characterized in terms of applications and benchmarking, though the increasing use of Web-centric, interactive applications poses new challenges in performance evaluation.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <b><span>Servers</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>Servers provide larger-scale and more reliable file and computing services, and they have gradually become the backbone of large-scale enterprise computing, replacing the traditional mainframe.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>For servers, three key characteristics are availability (which is critical), scalability, and efficient throughput.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <b><span>Clusters/Warehouse-Scale Computers</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>Clusters are collections of desktop computers or servers connected by local area networks to act as a single larger computer. Each node runs its own operating system, and nodes communicate using a networking protocol. The largest of the clusters are called warehouse-scale computers (WSCs), in that they are designed so that tens of thousands of servers can act as one.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>First, price-performance and power are critical for WSCs because of their tremendous scale. Second, WSCs are related to servers, in that availability is significant. 
Third, compared with supercomputers, WSCs emphasize interactive applications, large-scale storage, dependability, and high Internet bandwidth.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <b><span>Embedded Computers</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>Embedded computers are found in everyday machines. Non-embedded computers are distinguished from embedded computers by the ability to run third-party software.</span><span style="font-size:10.0pt;font-family:""> </span><span>Embedded computers have the widest spread of processing power and cost.</span><span style="font-size:10.0pt;font-family:""> </span><span>Performance requirements do exist, of course, but the primary goal is often meeting the performance need at a minimum price, rather than achieving higher performance at a higher price.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <b><span>Classes of Parallelism and Parallel Architectures</span></b> </p> <p class="MsoNormal"> <span>There are basically two kinds of parallelism in applications:</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>1.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Data-Level Parallelism (DLP)</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>2.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Task-Level Parallelism (TLP) </span> </p> <p class="MsoNormal"> <span>Computer hardware in turn can exploit these two kinds of application parallelism in four major ways:</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>1.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Instruction-Level Parallelism exploits data-level parallelism at modest levels with compiler help using ideas like pipelining and at medium levels using ideas like speculative 
execution.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>2.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Vector Architectures and Graphic Processor Units (GPUs) exploit data-level parallelism by applying a single instruction to a collection of data in parallel.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>3.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Thread-Level Parallelism exploits either data-level parallelism or task-level parallelism in a tightly coupled hardware model that allows for interaction among parallel threads.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>4.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Request-Level Parallelism exploits parallelism among largely decoupled tasks specified by the programmer or the operating system.</span> </p> <p class="MsoNormal"> <span>Along these lines, all computers can be placed into one of four categories:</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>1.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Single instruction stream, single data stream (SISD)</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>2.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Single instruction stream, multiple data streams (SIMD)</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>3.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Multiple instruction streams, single data stream (MISD)</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>4.<span style="font-size:7pt;font-family:'Times New Roman';"> 
</span></span><span>Multiple instruction streams, multiple data streams (MIMD)</span> </p> <p class="MsoNormal"> <b><span>1.3 Defining Computer Architecture</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>The task the computer designer faces is a complex one: Determine what attributes are important for a new computer, then design a computer to maximize performance and energy efficiency while staying within cost, power, and availability constraints.</span><span style="font-size:10.0pt;font-family:""> </span><span>We’ll quickly review instruction set architecture before describing the larger challenges for the computer architect.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <b><span>Instruction Set Architecture: The Myopic View of Computer Architecture</span></b> </p> <p class="MsoNormal"> <span>In this book we use the term instruction set architecture (ISA) to refer to the actual programmer-visible instruction set. The ISA serves as the boundary between the software and hardware. 
There are seven dimensions along which an ISA can be characterized:</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>1.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Class of ISA</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>2.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Memory addressing</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>3.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Addressing modes</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>4.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Types and sizes of operands</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>5.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Operations</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>6.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Control flow instructions</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>7.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Encoding an ISA</span> </p> <p class="MsoNormal" style="margin-left:18.0pt;"> <b><span>Genuine Computer Architecture: Designing the Organization and Hardware to Meet Goals and Functional Requirements</span></b> </p> <p class="MsoNormal" style="text-indent:18.0pt;"> <span>The implementation of a computer has two components: organization and hardware. The term organization includes the high-level aspects of a computer’s design, such as the memory system, the memory interconnect, and the design of the internal processor or CPU (central processing unit). 
Hardware refers to the specifics of a computer, including the detailed logic design and the packaging technology of the computer.</span> </p> <p class="MsoNormal" style="text-indent:18.0pt;"> <span>In this book, the word architecture covers all three aspects of computer design: instruction set architecture, organization or microarchitecture, and hardware.</span> </p> <p class="MsoNormal"> <b><span>1.4 Trends in Technology</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>An instruction set architecture should adapt to rapid changes in computer technology. Integrated circuit logic technology develops rapidly, with the transistor count on a chip increasing by about 40% to 55% per year. Semiconductor DRAM, the foundation of main memory, also grows quickly, although the increasing difficulty of efficiently manufacturing ever-smaller DRAM cells is slowing that growth. Semiconductor Flash is the standard storage device in PMDs, and its rapidly increasing popularity has fueled its rapid growth rate in capacity. Magnetic disk technology is central to server and warehouse-scale storage, and disks are very cheap. Network performance depends both on the performance of switches and on the performance of the transmission system. An efficient new design often has to wait for a technology to improve continuously until it crosses the threshold that makes the design practical.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>Bandwidth or throughput is the total amount of work done in a given time. Latency or response time is the time between the start and the completion of an event. Performance is the primary differentiator for microprocessors and networks, while capacity is generally more important than performance for memory and disks. 
Across these technologies, however, a rough rule of thumb is that bandwidth grows by at least the square of the improvement in latency.</span> </p> <p class="MsoNormal"> <b><span>1.5 Trends in Power and Energy in Integrated Circuits</span></b> </p> <p class="MsoNormal" style="text-indent:18.0pt;"> <span>Power is the biggest challenge facing the computer designer for nearly every class of computer.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:0cm;"> <b><span>Power and Energy: A System Perspective</span></b> </p> <p class="MsoNormal" style="text-indent:18.0pt;"> <span>A system designer has three primary concerns. The first is the maximum power a processor ever requires. The second is the sustained power consumption. The third is energy and energy efficiency.</span> </p> <p class="MsoNormal" style="text-indent:18.0pt;"> <b><span>Energy and Power within a Microprocessor</span></b> </p> <p class="MsoNormal" style="text-indent:18.0pt;"> <span>Modern microprocessors offer many techniques to improve energy efficiency despite flat clock rates and constant supply voltages, including doing nothing well, dynamic voltage-frequency scaling (DVFS), design for the typical case, and overclocking. These techniques work because the dynamic energy of a transistor switch is proportional to the capacitive load times the square of the voltage, and dynamic power is that energy times the switching frequency, so lowering the voltage or the clock rate reduces power substantially.</span> </p> <p class="MsoNormal" style="text-indent:18.0pt;"> <span>Because the processor is just a portion of the whole energy cost of a system, it can make sense to use a faster, less energy-efficient processor to allow the rest of the system to go into a sleep mode. This strategy is known as race-to-halt.</span> </p> <p class="MsoNormal"> <b><span>1.6 Trends in Cost</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>The cost of a manufactured computer component decreases over time even without major improvements in the basic implementation technology. 
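</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>One driver of this trend is the learning curve: yields improve as a manufacturing process matures. As a sketch, the classic die-cost accounting (dies per wafer, die yield, and cost per good die) shows how falling defect density lowers the cost per die; the wafer cost, wafer size, die size, and defect densities below are hypothetical round numbers.</span> </p> <pre class="prettyprint linenums lang-py">
import math

def dies_per_wafer(wafer_diam_cm, die_area_cm2):
    # Wafer area divided by die area, minus an edge correction
    # for the partial dies around the rim.
    return (math.pi * (wafer_diam_cm / 2) ** 2) / die_area_cm2 \
           - (math.pi * wafer_diam_cm) / math.sqrt(2 * die_area_cm2)

def die_yield(defects_per_cm2, die_area_cm2, n=13.5):
    # n is a process-complexity factor; wafer yield is assumed to be 100%.
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2) ** n

def cost_per_good_die(wafer_cost, wafer_diam_cm, die_area_cm2, defects_per_cm2):
    good_dies = dies_per_wafer(wafer_diam_cm, die_area_cm2) \
                * die_yield(defects_per_cm2, die_area_cm2)
    return wafer_cost / good_dies

# Same 30 cm wafer and 1 cm^2 die; the process matures and defect density falls.
for defects in (0.10, 0.05, 0.03):
    print(defects, round(cost_per_good_die(5000.0, 30.0, 1.0, defects), 2))
</pre> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>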
Increasing volumes and commoditization also contribute to the decreasing cost.</span> </p> <p class="MsoNormal" style="text-indent:18.0pt;"> <span>As for the cost of an integrated circuit, the manufacturing process dictates the wafer cost, wafer yield, and defects per unit area, so the sole control of the designer is die area. The designer can affect die size, and hence cost, both by what functions are included on or excluded from the die and by the number of I/O pins. The cost of a mask set is also important. Designers may incorporate reconfigurable logic to enhance the flexibility of a part, or choose to use gate arrays and thus reduce the cost implications of masks.</span> </p> <p class="MsoNormal"> <b><span>1.7 Dependability</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>Infrastructure providers started offering service level agreements (SLAs) or service level objectives (SLOs) to guarantee that their networking or power service would be dependable.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>Systems alternate between two states of service with respect to an SLA. The first is service accomplishment, where the service is delivered as specified. 
The second is service interruption, where the delivered service is different from the SLA. Transitions between the two states are caused by failures and restorations; accordingly, module reliability is measured as the mean time to failure (MTTF), and module availability as MTTF / (MTTF + MTTR), where MTTR is the mean time to repair.</span> </p> <p class="MsoNormal"> <b><span>1.8 Measuring, Reporting, and Summarizing Performance</span></b> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>The only consistent and reliable measure of performance is the execution time of real programs; all proposed alternatives to time as the metric, or to real programs as the items measured, have eventually led to misleading claims or even mistakes in computer design.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>To overcome the danger of placing too many eggs in one basket, collections of benchmark applications, called <i>benchmark suites</i>, are a popular measure of performance of processors with a variety of applications. Of course, such suites are only as good as the constituent individual benchmarks. Nonetheless, a key advantage of such suites is that the weakness of any one benchmark is lessened by the presence of the other benchmarks. The goal of a benchmark suite is that it will characterize the relative performance of two computers, particularly for programs not in the suite that customers are likely to run.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>The guiding principle of reporting performance measurements should be <i>reproducibility</i>: list everything another experimenter would need to duplicate the results.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>Rather than pick weights, we could normalize execution times to a reference computer by dividing the time on the reference computer by the time on the computer being rated, yielding a ratio proportional to performance. 
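</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>As a sketch of this normalization (the benchmark names come from SPEC CPU2006, but all execution times here are hypothetical), the individual ratios are then summarized with the geometric mean, which gives the same ranking of machines no matter which computer is chosen as the reference:</span> </p> <pre class="prettyprint linenums lang-py">
from math import prod

# Execution times in seconds on the reference machine and the machine under test.
ref_times = {"perlbench": 9770.0, "bzip2": 9650.0, "gcc": 8050.0}
new_times = {"perlbench":  500.0, "bzip2":  800.0, "gcc":  700.0}

# Dividing reference time by measured time yields a ratio proportional to performance.
ratios = {b: ref_times[b] / new_times[b] for b in ref_times}

# Summarize the suite with the geometric mean of the ratios.
geo_mean = prod(ratios.values()) ** (1 / len(ratios))
print(round(geo_mean, 2))
</pre> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>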
SPEC uses this approach, calling the ratio the SPEC ratio.</span> </p> <p class="MsoNormal"> <b><span>1.9 Quantitative Principles of Computer Design</span></b> </p> <p class="MsoNormal"> <b><span> </span></b><span>This section introduces important observations about design, along with two equations for evaluating them.</span> </p> <p class="MsoNormal"> <span> <b>Take Advantage of Parallelism</b></span> </p> <p class="MsoNormal"> <span> Taking advantage of parallelism is one of the most important methods for improving performance. Below are three examples.</span> </p> <p class="MsoNormal"> <span> First, at the system level, multiple processors and multiple disks can be used to improve throughput on a typical server benchmark, such as SPECWeb or TPC-C. The workload of handling requests can then be spread among the processors and disks, resulting in improved throughput.</span> </p> <p class="MsoNormal"> <span> Second, taking advantage of parallelism among instructions is critical to achieving high performance. One of the simplest ways to do this is through pipelining.</span> </p> <p class="MsoNormal"> <span> Third, parallelism can also be exploited at the level of detailed digital design. For example, set-associative caches use multiple banks of memory that are typically searched in parallel to find a desired item.</span> </p> <p class="MsoNormal"> <span> <b>Principle of Locality</b></span> </p> <p class="MsoNormal"> <span> The principle of locality: programs tend to reuse data and instructions they have used recently. 
A widely held rule of thumb is that a program spends 90% of its execution time in only 10% of the code.</span> </p> <p class="MsoNormal"> <span> Temporal locality and spatial locality are the two different types of locality.</span> </p> <p class="MsoNormal"> <span> <b>Focus on the Common Case</b></span> </p> <p class="MsoNormal"> <span> The most important and pervasive principle of computer design is to focus on the common case: in making a design trade-off, favor the frequent case over the infrequent case. This principle applies both when allocating resources and when improving performance.</span> </p> <p class="MsoNormal"> <span> <b>Amdahl’s Law</b></span> </p> <p class="MsoNormal"> <span> With Amdahl’s law, we can calculate the performance gain that can be obtained by improving some portion of a computer; the law defines the speedup that can be gained by using a particular feature.</span> </p> <p class="MsoNormal"> <span> Speedup depends on two factors: the fraction of the computation time in the original computer that can be converted to take advantage of the enhancement, and the improvement gained by the enhanced execution mode. Combining them, overall speedup = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced).</span> </p> <p class="MsoNormal"> <span> Amdahl’s law expresses the law of diminishing returns: the incremental improvement in speedup gained by an improvement of just a portion of the computation diminishes as improvements are added.</span> </p> <p class="MsoNormal"> <span> <b>The Processor Performance Equation</b></span> </p> <p class="MsoNormal"> <span> This chapter also gives several forms of the processor performance equation, the most basic being CPU time = instruction count × cycles per instruction (CPI) × clock cycle time. To use the processor performance equation as a design tool, we need to be able to measure the various factors.</span> </p> <p class="MsoNormal"> <b><span>1.10 Putting It All Together: Performance, Price, and Power</span></b> </p> <p class="MsoNormal"> <span>This section looks at measures of performance and power-performance in small servers using the SPECpower benchmark. 
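</span> </p> <p class="MsoNormal"> <span>SPECpower reports performance (server-side Java operations per second, or ssj_ops) and power at target workload levels from 100% down to 10%, plus active idle; the single summary metric divides the summed ssj_ops by the summed watts. A sketch with hypothetical measurements:</span> </p> <pre class="prettyprint linenums lang-py">
levels = [  # (target load %, ssj_ops, watts) -- all values hypothetical
    (100, 300_000, 280), (90, 270_000, 260), (80, 240_000, 240),
    (70, 210_000, 220), (60, 180_000, 200), (50, 150_000, 185),
    (40, 120_000, 170), (30, 90_000, 160), (20, 60_000, 150),
    (10, 30_000, 140),
    (0, 0, 130),  # active idle: no work done, but power is still drawn
]

total_ops = sum(ops for _, ops, _ in levels)
total_watts = sum(watts for _, _, watts in levels)
overall_ssj_ops_per_watt = total_ops / total_watts
print(round(overall_ssj_ops_per_watt, 1))
</pre> <p class="MsoNormal"> <span>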
It compares three Dell PowerEdge servers:</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>1.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>PowerEdge R710 with 12 cores and 12 GB of DRAM.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>2.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>PowerEdge R815 with 24 cores and 16 GB of DRAM.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>3.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>PowerEdge R815 with 48 cores and 32 GB of DRAM.</span> </p> <p class="MsoNormal" style="text-indent:18.0pt;"> <span>The performance and price-performance winner is the PowerEdge R815 with 48 cores. In second place is the R815 with 24 cores, and the R710 with 12 cores is in last place. Factoring in power reverses the results: the price-power-performance trophy goes to the Intel-based R710, while the 48-core R815 comes in last place.</span> </p> <p class="MsoNormal"> <b><span>1.11 Fallacies and Pitfalls</span></b> </p> <p class="MsoNormal"> <span>This section summarizes the fallacies and pitfalls of the chapter.</span> </p> <p class="MsoNormal"> <span>The fallacies are:</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>1.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Multiprocessors are a silver bullet.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>2.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Hardware enhancements that increase performance improve energy efficiency or are at worst energy neutral.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>3.<span style="font-size:7pt;font-family:'Times New Roman';"> 
</span></span><span>Benchmarks remain valid indefinitely.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>4.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>The rated mean time to failure of disks is 1,200,000 hours, or almost 140 years, so disks practically never fail.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>5.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Peak performance tracks observed performance.</span> </p> <p class="MsoNormal" align="left"> <span>The pitfalls are:</span> </p> <p class="MsoListParagraph" align="left" style="margin-left:18pt;text-indent:-18pt;"> <span>1.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Falling prey to Amdahl’s heartbreaking law.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>2.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>A single point of failure.</span> </p> <p class="MsoListParagraph" style="margin-left:18.0pt;text-indent:-18.0pt;"> <span>3.<span style="font-size:7pt;font-family:'Times New Roman';"> </span></span><span>Fault detection can lower availability.</span> </p> <p class="MsoNormal"> <b><span>1.12 Concluding Remarks</span></b> </p> <p class="MsoNormal"> <span> This section previews the rest of the book.</span> </p> <p class="MsoNormal"> <span> Chapter 1 has introduced a number of concepts and provided a quantitative framework that the reader will expand upon throughout the book.</span> </p> <p class="MsoNormal"> <span> In Chapter 2, we start with the all-important area of memory system design.</span> </p> <p class="MsoNormal"> <span> In Chapter 3, we look at instruction-level parallelism (ILP), of which pipelining is the simplest and most common form.</span> </p> <p class="MsoNormal"> <span> Chapter 4 is new to this edition, and it explains three ways 
to exploit data-level parallelism.</span> </p> <p class="MsoNormal"> <span> Chapter 5 focuses on the issue of achieving higher performance using multiple processors, or multiprocessors.</span> </p> <p class="MsoNormal" style="text-indent:21.0pt;"> <span>Chapter 6 is also new to this edition. We introduce clusters and then go into depth on warehouse-scale computers (WSCs), which computer architects help design.</span> </p> <p class="MsoNormal"> <b><span>1.13 Historical Perspectives and References</span></b> </p> <p class="MsoNormal"> <span> This concludes Chapter 1.</span> </p>