Talk:Instruction pipelining/Archive 1

How old is the concept?

The concept of pipelining is not that modern. 6502 of 1975 had pipelining. --matusz 11:03, 15 Mar 2005 (UTC)

Actually, it goes back a lot farther than that. The original pipelined computer was the IBM Stretch, conceived in 1955. NickP 13:01, 2 December 2005 (UTC)

double wikipedia entry

is this the same as Instruction pipelining?

Thanks for the heads up. I put merge tags on them. I expect that instruction pipelining will become a redirect to this page after any additional content in it has been absorbed into this article. Any volunteers? Deco 00:22, 27 May 2005 (UTC)

As I understand it, pipelining is a generic concept applicable to many different things. Instruction pipelining is the way it's used in CPUs/GPUs etc. to get some degree of parallelism. Therefore pipelining and instruction pipelining are not the same thing. Please correct me if I'm wrong. eKIK 19:57, 4 October 2005 (UTC)

Actually, imagine I have one pipeline which will change a value at the address A. Before that happens, maybe pipeline number 2 will already have worked with this value. So pipeline number 2 has already used a value that was not "finished" yet. Thus, aren't such collisions happening all the time when using instruction pipelining? Thanks, Abdull

I read through both articles, and there's no longer any reason to keep "Instruction Pipelining". IMO it should be deleted and the current "pipelining" article should be renamed to "Instruction Pipelining" to distinguish it from "Software Pipelining" and other related applications of pipelining.

Less talking, more editing... The instruction pipeline article is totally redundant, and this one is of higher quality. I've set the other to redirect here. -- uberpenguin 02:51, 18 December 2005 (UTC)

Pipeline analogy

One frustrating aspect of this entry is that it doesn't offer a simple and concrete idea of the principle of pipelining right up front. Would the following analogy (or something like it) help, or would it serve as a distraction?

Imagine a laundry room with a washing machine and a dryer. The washer takes 30 minutes to wash, and the dryer takes 60 minutes to dry. To complete three loads of laundry in this scenario would take 3.5 hours: 30 minutes for the first wash load and 60 minutes for each of the three dry cycles (after the first washing, all washings can occur concurrently with the drying).

Now, if the laundry room had two dryers and the clothes were moved between them every 30 minutes, it would take only 2.5 hours to complete three loads of wash: after 90 minutes each washer and dryer would be filled, and two more 30-minute cycles would empty the "pipeline."

In both scenarios, it takes 90 minutes for one load of wash to come out finished. However, the throughputs differ: the first scenario can output a finished load every 60 minutes, while the second can output a finished load every 30 minutes.
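
The totals in this analogy can be checked with a small simulation. This is only a sketch: the function name `finish_times` is made up for illustration, and it assumes each machine handles one load at a time, that a finished load can wait between machines, and that moving laundry takes no time.

```python
def finish_times(stage_minutes, loads):
    """Return the finish time (in minutes) of each load in a linear pipeline.

    Assumption: a load enters a stage as soon as it has left the previous
    stage and the stage itself is free.
    """
    stage_free_at = [0] * len(stage_minutes)  # when each stage next becomes idle
    finished = []
    for _ in range(loads):
        t = 0  # time at which this load is ready for its next stage
        for i, minutes in enumerate(stage_minutes):
            start = max(t, stage_free_at[i])
            t = start + minutes
            stage_free_at[i] = t
        finished.append(t)
    return finished


# Scenario 1: one washer (30 min) and one dryer (60 min).
one_dryer = finish_times([30, 60], loads=3)
# Scenario 2: the drying step split into two 30-minute steps.
two_half_dryers = finish_times([30, 30, 30], loads=3)

print(one_dryer)        # [90, 150, 210], i.e. 3.5 hours in total
print(two_half_dryers)  # [90, 120, 150], i.e. 2.5 hours in total
```

In both runs the first load finishes after 90 minutes; only the spacing between later loads changes.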

Such analogies aren't unheard of here: see Voltage#Hydraulic_analogy.

—Kairotic 16:08, 30 October 2006 (UTC)

I'm sorry, I can't follow your analogy, and I still don't get what pipelining is about. —Dudboi 13:02, 5 November 2006 (UTC)

That analogy isn't very good - try this instead: You have to wash, dry, and fold laundry. Washing takes 30 minutes; drying takes 60 minutes; folding takes 15 minutes. Without pipelining, you would wash a load, dry it, and fold it, and then wash the next load. That means that one load of laundry finishes every 105 minutes. With pipelining, once the first load has been washed, you start washing the next load of laundry. Once the first load has been dried, you start drying the second load, and the third load starts being washed. This way, a load finishes every 60 minutes (limited by the time it takes to dry). What Kairotic was trying to add to this was the idea that you could split the dryer into 2 steps, each of which takes half as long (that way, the longest step is just 30 minutes, and you would finish a load of laundry every 30 minutes). I wouldn't worry about that in the "simple" explanation of pipelining. --CTho 00:10, 6 November 2006 (UTC)
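
As a rough sketch of the arithmetic in the comment above (the name `minutes_per_load` is made up for illustration, and the model assumes an endless supply of loads and no time lost moving laundry between steps):

```python
def minutes_per_load(stage_minutes, pipelined):
    """Steady-state minutes between one finished load and the next."""
    # Without pipelining, a load goes through every step before the next starts.
    # With pipelining, the slowest step sets the pace.
    return max(stage_minutes) if pipelined else sum(stage_minutes)


wash_dry_fold = [30, 60, 15]
unpipelined = minutes_per_load(wash_dry_fold, pipelined=False)
pipelined = minutes_per_load(wash_dry_fold, pipelined=True)
# Splitting the dryer into two 30-minute steps, as Kairotic suggested.
split_dryer = minutes_per_load([30, 30, 30, 15], pipelined=True)

print(unpipelined, pipelined, split_dryer)  # 105 60 30
```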

Pipeline latency

I removed the statements that pipelines increase latency. The time it takes to execute an instruction is the same whether or not a pipeline is used. The appearance of increased latency as 'the next instruction has to travel all the way through the pipeline before its result becomes available and the processor appears to "work" again' (to quote a line from the article) is an illusion. The next instruction has to travel all the way through the pipeline anyway; you just don't notice it when the pipeline is full since "work" is being done as the result of previous instructions. In terms of the cute animation, the time it takes to make each object is the same whether or not others are being worked on simultaneously. --Rick Sidwell 18:18, 10 Jan 2005 (UTC)

Pipelines don't increase latency? On the contrary, the latency is guaranteed to increase. In a non-pipelined process the latency is a single clock cycle. The single clock cycle's period is determined by the time to traverse all the logic involved in carrying out the instruction from the input registers to the output registers. With a pipelined process you have to go through all the same logic as well as all the pipeline registers, with the clock rate being determined by the pipeline stage with the greatest logic depth. (I'm including routeing in the logic depth here). In a non-pipelined process you are traversing less logic, and the clock rate is as high as the logic can handle. In a pipelined process you're traversing more logic, and most of it is being clocked slower than it could handle, due to all the pipeline stages having to use the same clock. Gantlord 14:33, 18 December 2006 (UTC)
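
The argument above can be put into numbers with a small sketch. All delays here are hypothetical, and the name `pipeline_timing` and the fixed per-register overhead are assumptions made for illustration only.

```python
def pipeline_timing(stage_logic_ns, register_overhead_ns):
    """Return (clock period, latency of one instruction) in nanoseconds.

    Every stage shares one clock, so the period is set by the stage with
    the most logic, plus the cost of passing through a pipeline register.
    """
    period = max(stage_logic_ns) + register_overhead_ns
    latency = period * len(stage_logic_ns)
    return period, latency


# Hypothetical design: 10 ns of logic in total, 0.5 ns per register.
single_cycle = pipeline_timing([10.0], register_overhead_ns=0.5)
# The same 10 ns of logic split into five unevenly balanced stages.
five_stage = pipeline_timing([2.0, 2.0, 3.0, 2.0, 1.0], register_overhead_ns=0.5)

print(single_cycle)  # (10.5, 10.5)
print(five_stage)    # (3.5, 17.5)
```

Under these assumed numbers the pipelined design completes an instruction every 3.5 ns instead of every 10.5 ns, but each individual instruction takes 17.5 ns rather than 10.5 ns to pass through.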

last paragraph of first section

I don't think you need 4 independent instructions to keep a non-forwarding 5 stage pipeline full. Assuming 2 dependent ops, you get:

f d e m w
  f     d e m w

That's 2 stalls if you can read from the register file the same cycle you write it, otherwise 3.

--CTho 15:44, 26 December 2005 (UTC)
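
The stall counts above can be reproduced with a short sketch. It assumes the five stages f d e m w shown in the diagram, no forwarding, registers read in the d stage and written in the w stage; the name `stall_cycles` and the `distance` parameter are made up for illustration.

```python
STAGES = ["f", "d", "e", "m", "w"]


def stall_cycles(distance, same_cycle_read=True):
    """Stall cycles for a consumer issued `distance` instructions after its producer.

    distance=1 means the consumer immediately follows the producer.
    """
    # A producer fetched in cycle 0 writes its result in cycle 4.
    write_cycle = STAGES.index("w")
    # The consumer can read in that cycle, or only in the following one.
    earliest_read = write_cycle if same_cycle_read else write_cycle + 1
    # Cycle in which the consumer would reach the d stage with no hazard.
    natural_read = distance + STAGES.index("d")
    return max(0, earliest_read - natural_read)


print(stall_cycles(1, same_cycle_read=True))   # 2
print(stall_cycles(1, same_cycle_read=False))  # 3
print(stall_cycles(3, same_cycle_read=True))   # 0
```

Under this model, two independent instructions between the dependent pair are enough to avoid stalls when the register file can be read in the cycle it is written, and three are enough otherwise.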

I think this is wrong: "Because each stage performs only a small part of the overall computation, clock speed can be increased tremendiously."

Pipelining increases the average number of instructions executed per clock cycle and not the clock speed itself.

"Think" what you like, but I know it typically increases the maximum clock speed too, as the inter-register logic depth is reduced. The minimum clock period is governed by the largest logic and routing delay between two registers within the pipeline. When you increase the number of stages in a pipeline, and assuming you're competent enough to balance the logic depths well, you decrease the minimum clock period and hence increase the maximum clock frequency. See the large numbers of stages present in the Prescott Pentium 4 and the resultant clock frequency increases. You can expect the cost of this to come in increased area requirements (those registers don't come free) and in higher power consumption (more logic, transitioning more often). Pipelining is good, but it's not magic, it's a way to trade power, area and latency for throughput. Gantlord 14:33, 18 December 2006 (UTC)

It does both. Consider a single-cycle design vs a pipelined design - both have an IPC of ~1, but the pipelined design can have a much higher clock speed.
--CTho 16:55, 27 December 2005 (UTC)

One stage pipeline?

The article states that the MOS Tech 6502 has a one stage pipeline. I thought that all pipes needed to be of size 2 or more. Furthermore, there is no mention of any pipe in MOS Technology 6502 at all. This seems odd. 129.78.208.4 01:50, 1 March 2007 (UTC)

shakeel

Yaar, this pipeline is not good for data processing tasks; it is only useful for arithmetic processing, so why are they mostly used? -- The preceding unsigned comment was added by 202.83.169.218 (talk) 06:04, 3 May 2007 (UTC).

Don't we need a definition?

I mean the diagram on the right of the main article is very suggestive, but in my opinion we should define the concept in a few words before explaining in detail and list advantages / disadvantages... Something simple that includes the words "overlapping" and "parallel". I could come up with something, I'll wait to see other opinions.

(On a side note, the Content of the Talk page is placed down the page not including the first (presumably chronological as well) discussions. Would it be all right if I define headlines for those and move the Content right in the start of the Talk page? (I'm not sure yet how to do that...))
--Adsp 21:58, 16 May 2007 (UTC)

Yes, a clear definition would be nice. Got any texts or sources so we can get a rounded definition?

As for headings. Just add in a heading (between double equal signs) that fix the post. If one isn't obvious then just make it "Old discussion" or something. Cburnett 01:47, 17 May 2007 (UTC)

I would replace the second sentence "Pipelining reduces cycle time of a processor" which I don't agree with. The cycle time is the same, an isolated instruction is not executed faster, but I do agree that instruction throughput is increased.

In "Computer Organization and Design: the Hardware/Software Interface" by Hennessy and Patterson, "Pipelining is an implementation technique in which multiple instructions are overlapped in execution".

I would insert "successive instructions in a program sequence will overlap in execution" somewhere in the very beginning (the first two sentences).

I would also move "Advantages of pipelining" after Content.

And I would change the paragraph starting with "The instruction cycle is easy to implement..." up to "Processors with pipelining are organised inside into (stages)" (why "stages" in brackets?) with something like

A non-pipeline architecture is inefficient because some CPU components (modules) are idle while another module is active during the instruction cycle. Pipelining does not completely cancel out idle time in a CPU but making those modules work in parallel (or concurrently) improves program execution significantly.

Links to parallel computing and concurrency would be nice in See also paragraph.

--Adsp 12:54, 21 May 2007 (UTC)

I've just made the changes I proposed above. Mainly a proper definition in support of the diagram. But also minor changes like moving the Advantages paragraph down and including it to the Content - unfortunately this will make tracking the changes more difficult when comparing to the previous version, sorry about that.

--Adsp 13:24, 25 May 2007 (UTC)

wave pipelines

Alright, this doesn't exactly fit in the current organization, but I think it belongs somewhere on Wikipedia: wave pipelines.

Firstly pipelines are digital logic concepts, not computer concepts. A programmer never needs to concern themselves with pipelines, or more generally any aspect of the micro-architecture of the processor. Furthermore, pipelines exist in digital logic devices other than computer microprocessors, such as DSPs and so forth. They can exist in televisions, radios, you name it. Pipelines are features of the micro-architecture of a processor, not the architecture, and micro-architecture is the digital logic of the processor. So pipelines are inherently digital logic concepts, not computing concepts.

An "instruction pipeline" is a digital-logic pipeline, which exists at a pretty low-level, exposed to a much higher-level: the instruction level. Nothing is really done at that higher level, though. It all exists at the lower level.

At this low level, registers (D flip-flops) are inserted between stages of combinatorial digital logic, and these registers are clocked synchronously. That is a pipeline. The only thing that makes a pipeline an "instruction pipeline" is that the first stage is an instruction decoder.

A wave pipeline is like a pipeline, except instead of inserting registers between stages of combinatorial logic, you balance the delay of each stage, and use this delay as your data buffer. This does a few things:

reduces circuitry,
instead of your maximum clock speed being determined by the maximum delay, it is determined by the difference between the minimum delay and maximum delay, which is always smaller than the maximum delay (because minimum delay is non-negative), thus the achievable clock speed is higher.

The drawback is that it's more difficult to implement because you have to match the circuit delays.
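
The clock-period comparison above can be sketched as follows. The delays are hypothetical, the function names are made up for illustration, and the model is idealized: it ignores register setup and hold times, clock skew and safety margins.

```python
def register_pipeline_period(path_delays_ns):
    """Minimum clock period of a conventional pipeline stage: its longest path."""
    return max(path_delays_ns)


def wave_pipeline_period(path_delays_ns):
    """Idealized minimum clock period of a wave pipeline.

    Successive data "waves" must not overlap, so the period is bounded by
    the spread between the slowest and fastest paths through the logic.
    """
    return max(path_delays_ns) - min(path_delays_ns)


# Hypothetical delays (in ns) of the paths through one block of logic.
delays = [4.0, 4.5, 5.0]

print(register_pipeline_period(delays))  # 5.0
print(wave_pipeline_period(delays))      # 1.0
```

The closer the path delays are matched, the smaller the spread and the shorter the achievable period, which is why delay balancing is the hard part.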

In any case, there's no article on Wikipedia on wave pipelines, and I think there should be one. But the distinction between a wave pipeline and the more conventional register pipeline is a digital logic distinction, and I don't know how to fit that into the current organization, which discusses pipelines at a higher level of abstraction. Kevin Baas^talk 16:44, 24 February 2008 (UTC)

I have to nitpick with your assertion that a programmer never has to concern themselves with pipelines. I think it is more accurate to say that you don't have to be concerned about pipelines to program. Instruction-level optimization certainly does involve an understanding of the pipeline and branch prediction (and caches, etc.).

You should start with making a new section in this article on physical implementations, or something. One subsection for the standard pipeline with registers and another for wave pipelines. Cburnett (talk) 17:05, 24 February 2008 (UTC)

I'm kinda wary about this, as everything in the CPU technologies template is about synchronous, clocked circuits that are used in commercial CPUs. As far as I know, wave pipelining is not used in commercial CPUs. Also, if a section on wave pipelines is included, then a section on micro-pipelines should be put in, which are strictly asynchronous, and, again, not currently used in commercial CPUs. Kevin Baas^talk 17:38, 24 February 2008 (UTC)

I'm somewhat compelled to put it in the Pipeline (computing) article instead, as that article seems to be more hardware-based. Kevin Baas^talk 17:46, 24 February 2008 (UTC)

I created a section in the pipeline (computing) article called "Implementations", with three subcategories:

Buffered, Synchronous pipelines
Buffered, Asynchronous pipelines
Unbuffered pipelines

The first of which, "Buffered, Synchronous pipelines", briefly describes a conventional pipeline used in today's microprocessors, and the last of which, "Unbuffered pipelines", briefly describes wave pipelines. Kevin Baas^talk 18:03, 2 March 2008 (UTC)

Disadvantages of Pipelining

This section doesn't clearly discuss the topic. Its wording should be changed to address the topic directly and then compare it to the alternative, rather than list the alternative's advantages and expect the reader to interpret how they relate to the topic. -- Preceding unsigned comment added by 66.158.176.208 (talk) 03:52, 24 April 2008 (UTC)

History of pipelining

Zuse's Z3 as described in, Raul Rojas, "Konrad Zuse's Legacy: The Architecture of the Z1 and Z3", IEEE Annals of the History of Computing, Vol. 19, No. 2, 1997, looks to be pipelined. See the diagram on page 10. The paper seems accessible here: [1]

Do people agree? Should I change the article and add this paper as a reference? Serviscope Minor (talk) 04:25, 21 August 2008 (UTC)

"Zuse took great care to save execution time by overlapping the fetch stage of the next instruction with the write-back stage of the current one." -- Looking at Figure 5, there is shown that the WB and IF ops are happening at the same time -- a primitive (and very short :) use of pipelining. Although the potential implications at this stage may have not been fully realised. User A1 (talk) 09:10, 21 August 2008 (UTC)

Post Script -- nice bit of research! User A1 (talk) 09:11, 21 August 2008 (UTC)

Branch delays

I'm a little confused about the following statement from the disadvantages section, "A non-pipelined processor executes only a single instruction at a time. This prevents branch delays (in effect, every branch is delayed) and problems with serial instructions being executed concurrently. Consequently the design is simpler and cheaper to manufacture."

If your point is to say that things like branch prediction aren't necessary for non-pipelined processors, then that should be mentioned in the first sentence. As it is written, it seems like the disadvantage is that multiple instructions are delayed; instructions which a non-pipelined processor wouldn't even be working on yet. What does everyone think?--68.32.17.238 (talk) 04:49, 13 February 2011 (UTC)

Instruction pipeline vs. Simultaneous multithreading

The graphic in the examples section called "Generic 4-stage pipeline" shows 4 Threads right? Green = Thread 1, Purple = Thread 2, Blue = Thread 3, Red = Thread 4. If so, does this mean that this graphic is showing two things together (instruction pipeline + SMT)? If not, where's the difference? — Preceding unsigned comment added by U1xn (talk • contribs) 15:23, 1 April 2011 (UTC)

April 2012 rewrite

I've rewritten the first half of this, taking material from the introduction into a more methodical first section entitled Introduction, rewording Advantages and disadvantages, and omitting discussion of flip-flops, which are way outside the scope. I added the comparison to an assembly line and added the simplest example I could devise. It strikes me that Example 1 and Example 2 now do nothing but illustrate the "hazard" with a more complex example, and some of Complications is also superfluous, but I am less bold about deleting stuff than about adding rephrasing, so I invite input from other editors of this article before doing anything further. Spike-from-NH (talk) 01:02, 16 April 2012 (UTC)

I have indeed eliminated the part of Complications that had become superfluous, and retained the rest under the heading Special situations.

Unless objections are raised in the next couple of days, I will delete Example 1 and Example 2 in favor of the more bare-bones example I gave while defining "hazards." I will not delete the big example with its two multi-color illustrations.

I was not able to determine what "forwarding" is from the prior text of this article, and the Wikipedia disambiguation page links here. I feel I may have defined it incorrectly. Separately, I'll try to say a word about out-of-order execution in the prose. Spike-from-NH (talk) 21:47, 17 April 2012 (UTC)

PS--Tentative solution is in place. Spike-from-NH (talk) 13:12, 27 April 2012 (UTC)

Another query

The intro carefully says the focus of the article is computers "and other digital electronic devices". Can anyone give me an example of another digital device that uses an instruction pipeline (which, according to our article, it would do so as to achieve sequential execution) and explain why it, too, is not a computer? Spike-from-NH (talk) 00:27, 27 April 2012 (UTC)

Hearing no comment, I will delete this phrase. Spike-from-NH (talk) 17:11, 15 May 2012 (UTC)

"Every Microprocessor??"

"Every microprocessor manufactured today uses at least 2 stages of pipeline. (The Atmel AVR and the PIC microcontroller each have a 2 stage pipeline). Intel Pentium 4 processors have 20 stage pipelines."

I don't like the absolute word "Every" in that sentence. To my knowledge, the 8051 and its derivatives are usually not pipelined, and the 8051 is still one of the most popular micro-controllers around. At the very least, the speed of the 8051 is not dependent on conditional branches like the PIC or ARM. --69.138.232.32 (talk) 13:30, 5 May 2008 (UTC)

I agree that the 8051 is still popular. However, (a) the original non-pipelined 8051s are no longer manufactured -- Intel no longer manufactures any 8051 (neither does AMD[2]), and (b) I've been told that the Dallas/Maxim 8051 derivatives are pipelined -- that's why they are advertised as "12 times faster"[3]. Does anyone still manufacture non-pipelined CPUs? --68.0.124.33 (talk) 20:07, 1 December 2008 (UTC)

Are you saying that for example the NXP P80C554SFBD, which takes 6 clocks per CPU cycle, is pipelined? I think not. --Bdijkstra (talk) 20:12, 25 May 2012 (UTC)

Does slowest step imply full set of sub-instructions?

" This allows the computer's control circuitry to issue instructions at the processing rate of the slowest step" Here it's implying that the slowest step encapsulates all the sub-instructions and is generic for all instructions in the architecture, I find the word "slow" to be in need of a better term. Perhaps issuing instructions at the processing rate of the entire sub-instruction set of the pipeline is more clear? ChazZeromus (talk) 01:09, 23 April 2011 (UTC)

The term "sub-instruction" is ambiguous and not mentioned in the article. I assume that you mean the operation performed by a single stage. The slowest step can be defined without referring to sub-instructions and is simply the slowest step of all the potential steps that can be performed at any stage. So it encompasses (not encapsulates) all stages (not sub-instructions) and is general (not generic). The terms "issuing" and "processing rate" of your suggestion are ambiguous and not mentioned in the article (any more). --Bdijkstra (talk) 21:33, 25 May 2012 (UTC)

Why does "superpipelined" redirect here?

Not used or defined in article. 86.159.197.174 (talk) 17:28, 26 August 2014 (UTC)

Note that page's history. It was a separate article until 2005, when it was redirected here. I take it that "superpipelining" is a follow-on technique to pipelining, in which the most time-consuming individual stages are replaced by multiple stages, each of which is shorter in duration and presumably can be executed in parallel. Spike-from-NH (talk) 21:23, 26 August 2014 (UTC)

"frequently used in CPUs but avoided in real-time systems"

Aren't most CPUs now pipelined, and aren't most real-time systems based on CPUs? See "11-stage pipeline on the Cortex-R7", for example... 46.218.234.67 (talk) 14:20, 22 December 2016 (UTC)

The comment is well-taken. This sentence ends, "in which latency is a hard constraint." I worked on processors where latency was a hard constraint, and even conditional execution could disturb refresh of the display monitor and make the image jitter. That was last century, and in the real world, the "hard constraint" between the start and end of an instruction is made insignificant by producing a processor that is one hundred times faster. I'll reword this paragraph. Spike-from-NH (talk) 14:39, 22 December 2016 (UTC)