A first tag is assigned to a branch instruction. Dependent on the type of branch instruction, a second tag is assigned to an instruction in the branch delay slot of the branch instruction. The second tag may equal the first tag if the branch delay slot is unconditional for that branch, and may equal a different tag if the branch delay slot is conditional for the branch. If the branch is mispredicted, the first tag is broadcast to pipeline stages that may have speculative instructions, and the first tag is compared to tags in the pipeline stages. If the tag in a pipeline stage matches the first tag, the instruction is not cancelled. If the tag mismatches, the instruction is cancelled.
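The cancellation rule in this abstract can be illustrated with a minimal sketch. It is not taken from the patent: the `Slot` class, the function names, and the choice of "branch tag plus one" as the "different tag" are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    """One pipeline stage that may hold a speculative instruction (hypothetical model)."""
    name: str
    tag: int
    cancelled: bool = False


def assign_delay_slot_tag(branch_tag: int, delay_slot_conditional: bool) -> int:
    # Unconditional delay slot: same tag as the branch.
    # Conditional delay slot: a different tag (assumed here to be branch_tag + 1).
    return branch_tag + 1 if delay_slot_conditional else branch_tag


def broadcast_mispredict(branch_tag: int, speculative_stages: List[Slot]) -> None:
    # The mispredicted branch's tag is compared against every speculative stage.
    # A match keeps the instruction; a mismatch cancels it.
    for slot in speculative_stages:
        if slot.tag != branch_tag:
            slot.cancelled = True
```

With an unconditional delay slot, the delay slot instruction carries the branch's tag and survives the broadcast, while later speculative instructions are cancelled. With a conditional delay slot, the delay slot instruction carries a different tag and is cancelled along with them.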
Method For Identifying Basic Blocks With Conditional Delay Slot Instructions
A first tag is assigned to a branch instruction. Dependent on the type of branch instruction, a second tag is assigned to an instruction in the branch delay slot of the branch instruction. If the branch is mispredicted, the first tag is broadcast to pipeline stages that may have speculative instructions, and the first tag is compared to tags in the pipeline stages to determine which instructions to cancel. The assignment of tags for a fetch group of concurrently fetched instructions may be performed in parallel. A plurality of branch sequence numbers may be generated, and one of the plurality may be selected for each instruction responsive to the cumulative number of branch instructions preceding that instruction within the fetch group. The selection may be further responsive to whether or not the instruction is in a conditional delay slot.
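The parallel tag assignment described above might be modeled as follows. This is a sketch under stated assumptions, not the patented circuit: the `Instr` fields, the `assign_tags` name, and the exact index arithmetic are hypothetical, and delay slots that spill into the next fetch group are ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instr:
    is_branch: bool = False
    # Only meaningful on a branch: True if its delay slot is conditional.
    slot_is_conditional: bool = False


def assign_tags(group: List[Instr], base_tag: int) -> List[int]:
    # The candidate branch sequence numbers are all known up front,
    # so hardware could generate them in parallel.
    candidates = [base_tag + k for k in range(len(group) + 1)]
    tags = []
    # Each iteration depends only on the group contents, not on earlier
    # iterations, which models one independent selector per instruction.
    for i in range(len(group)):
        preceding = sum(1 for instr in group[:i] if instr.is_branch)
        in_unconditional_slot = (
            i > 0 and group[i - 1].is_branch and not group[i - 1].slot_is_conditional
        )
        # Assumption: an unconditional delay slot shares its branch's tag,
        # so the branch immediately before it is not counted.
        index = preceding - 1 if in_unconditional_slot else preceding
        tags.append(candidates[index])
    return tags
```

In this model a group of four instructions with a branch in the second position and a base tag of 4 receives tags 4, 4, 4, 5 when the delay slot is unconditional, and 4, 4, 5, 5 when it is conditional.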
Selection Of Link And Fall-Through Address Using A Bit In A Branch Address For The Selection
David A. Kruckemyer - Mountain View CA, US; Daniel C. Murray - San Jose CA, US
Assignee:
Broadcom Corporation - Irvine CA
International Classification:
G06F 9/32
US Classification:
712242, 712234, 711220
Abstract:
A link address/sequential address generation circuit is provided for generating a link/sequential address. The circuit receives the most significant bits of at least two addresses: a first address of a first set of bytes including a branch instruction and a second address of a second set of bytes contiguous to the first set. The least significant bits of the branch PC (those bits not included in the most significant bits of the addresses received by the circuit) are used to generate the least significant bits of the link/sequential address and to select one of the first address and the second address to supply the most significant bits.
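One way to read this abstract is that the link address is formed without a full-width adder. The sketch below assumes 16-byte sets (4 offset bits) and a link address of branch PC plus 8 (a branch and its delay slot at 4 bytes each). Neither value is stated in the abstract, and the function and constant names are hypothetical.

```python
OFFSET_BITS = 4  # assumption: each set of bytes is 16 bytes
LINK_DELTA = 8   # assumption: link address = branch PC + 8


def link_address(first_msbs: int, second_msbs: int, branch_lsbs: int) -> int:
    """Form the link address from two pre-computed upper parts and the branch PC's low bits."""
    mask = (1 << OFFSET_BITS) - 1
    # Low bits of the result: a narrow add that wraps within the set.
    low = (branch_lsbs + LINK_DELTA) & mask
    # With these sizes, adding 8 to a 4-bit offset wraps exactly when bit 3 is set,
    # so that single bit selects which address supplies the upper bits.
    crosses = (branch_lsbs >> (OFFSET_BITS - 1)) & 1
    high = second_msbs if crosses else first_msbs
    return (high << OFFSET_BITS) | low
```

For a branch at 0x1004 the result stays in the first set (0x100C); for a branch at 0x100C the result lands in the contiguous second set (0x1014).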
Method For Cancelling Conditional Delay Slot Instructions
A first tag is assigned to a branch instruction. Dependent on the type of branch instruction, a second tag is assigned to an instruction in the branch delay slot of the branch instruction. If the branch is mispredicted, the first tag is broadcast to pipeline stages that may have speculative instructions, and the first tag is compared to tags in the pipeline stages to determine which instructions to cancel. The assignment of tags for a fetch group of concurrently fetched instructions may be performed in parallel. A plurality of branch sequence numbers may be generated, and one of the plurality may be selected for each instruction responsive to the cumulative number of branch instructions preceding that instruction within the fetch group. The selection may be further responsive to whether or not the instruction is in a conditional delay slot.
Clock Gating Of Sub-Circuits Within A Processor Execution Unit Responsive To Instruction Latency Counter Within Processor Issue Circuit
Vincent R. von Kaenel - Palo Alto CA, US; David A. Kruckemyer - Mountain View CA, US
Assignee:
Broadcom Corporation - Irvine CA
International Classification:
G06F 1/32
US Classification:
713324, 712220, 713322
Abstract:
A processor may include an execution circuit, an issue circuit coupled to the execution circuit, and a clock tree for clocking circuitry in the processor. The issue circuit issues an instruction to the execution circuit, and generates a control signal responsive to whether or not the instruction is issued to the execution circuit. The execution circuit includes at least a first subcircuit and a second subcircuit. A portion of the clock tree supplies a plurality of clocks to the execution circuit, including at least a first clock clocking the first subcircuit and at least a second clock clocking the second subcircuit. The portion of the clock tree is coupled to receive the control signal for collectively conditionally gating the plurality of clocks, and is also configured to individually conditionally gate at least some of the plurality of clocks responsive to activity in the respective subcircuits of the execution circuit. A system on a chip may include several processors, and one or more of the processors may be conditionally clocked at the processor level.
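The two levels of gating can be expressed as a small boolean model. This is an illustrative sketch only: the function name and the subcircuit names in the usage are hypothetical, and how the issue circuit derives its control signal is outside the sketch.

```python
from typing import Dict


def gate_clocks(issue_enable: bool, subcircuit_busy: Dict[str, bool]) -> Dict[str, bool]:
    """Return, per subcircuit, whether its clock toggles this cycle.

    issue_enable models the collective control signal from the issue circuit;
    subcircuit_busy models per-subcircuit activity used for individual gating.
    """
    return {name: issue_enable and busy for name, busy in subcircuit_busy.items()}
```

When the collective signal is deasserted, every clock in the execution circuit is gated regardless of subcircuit activity. When it is asserted, only the subcircuits reporting activity receive a clock.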
Comparing Operands Of Instructions Against A Replay Scoreboard To Detect An Instruction Replay And Copying A Replay Scoreboard To An Issue Scoreboard
Tse-Yu Yeh - Milpitas CA, US; David A. Kruckemyer - Mountain View CA, US; Robert Rogenmoser - Santa Clara CA, US; Robert Stepanian - San Francisco CA, US
Assignee:
Broadcom Corporation - Irvine CA
International Classification:
G06F 9/38
US Classification:
712217, 712219
Abstract:
An apparatus for a processor includes a first scoreboard, a second scoreboard, and a control circuit coupled to the first scoreboard and the second scoreboard. The control circuit is configured to update the first scoreboard to indicate that a write is pending for a first destination register of a first instruction in response to issuing the first instruction into a first pipeline. The control circuit is configured to update the second scoreboard to indicate that the write is pending for the first destination register in response to the first instruction passing a first stage of the pipeline. Replay may be signaled for a given instruction at the first stage. In response to a replay of a second instruction, the control circuit is configured to copy the contents of the second scoreboard to the first scoreboard. In various embodiments, additional scoreboards may be used for detecting different types of dependencies.
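The interaction between the two scoreboards can be sketched as below. The class and method names are hypothetical, registers are modeled as strings in Python sets, and clearing entries at writeback is omitted because the abstract does not describe it.

```python
class Scoreboards:
    """Toy model of an issue scoreboard and a replay scoreboard."""

    def __init__(self) -> None:
        self.issue = set()   # first scoreboard: writes pending since issue
        self.replay = set()  # second scoreboard: writes pending past the replay stage

    def on_issue(self, dest: str) -> None:
        # Updated as soon as the instruction is issued into the pipeline.
        self.issue.add(dest)

    def on_pass_replay_stage(self, dest: str) -> None:
        # Updated only once the instruction has passed the stage where replay is signaled.
        self.replay.add(dest)

    def on_replay(self) -> None:
        # Instructions that had not passed the replay stage are discarded,
        # so the issue scoreboard is restored from the replay scoreboard.
        self.issue = set(self.replay)

    def write_pending(self, reg: str) -> bool:
        return reg in self.issue
```

After a replay, only destination registers of instructions that already passed the replay stage remain marked pending in the issue scoreboard. Entries for younger, replayed instructions disappear and are re-created when those instructions reissue.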
Cache Coherent Protocol In Which Exclusive And Modified Data Is Transferred To Requesting Agent From Snooping Agent
David A. Kruckemyer - Mountain View CA, US; Joseph B. Rowlands - Santa Clara CA, US
Assignee:
Broadcom Corporation - Irvine CA
International Classification:
G06F 13/00
US Classification:
711145, 711144, 711146
Abstract:
A system may include two or more agents, at least some of which may cache data. In response to a read transaction, a caching agent may snoop its cached data and provide a response in a response phase of the transaction. In particular, the response may include an exclusive indication used to represent both the exclusive and modified states within that agent. In one embodiment, the agent responding exclusive may be responsible for providing the data for a read transaction and, concurrently with transmitting the data, may transmit an indication of whether that agent held the data in the exclusive state or the modified state.
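A minimal model of the response and data phases might look like the following. It is a sketch, not the protocol itself: the MESI-style `State` enum, the "shared" and "invalid" response strings, and the function names are assumptions. The abstract only specifies that one exclusive indication covers both the exclusive and modified states.

```python
from enum import Enum
from typing import Tuple


class State(Enum):
    INVALID = 0
    SHARED = 1
    EXCLUSIVE = 2
    MODIFIED = 3


def snoop_response(state: State) -> str:
    """Response phase: exclusive and modified are reported with the same indication."""
    if state in (State.EXCLUSIVE, State.MODIFIED):
        return "exclusive"
    if state is State.SHARED:
        return "shared"   # assumed response encoding
    return "invalid"      # assumed response encoding


def data_phase(state: State, data: bytes) -> Tuple[bytes, bool]:
    """Data phase for an agent that responded exclusive.

    Returns the data together with a flag telling the requester whether
    the line was actually modified, resolving the ambiguity of the response.
    """
    if snoop_response(state) != "exclusive":
        raise ValueError("only an agent that responded exclusive supplies data here")
    return data, state is State.MODIFIED
```

The requester cannot tell from the response phase alone whether the line is dirty. It learns that from the flag that travels with the data.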
Link And Fall-Through Address Formation Using A Program Counter Portion Selected By A Specific Branch Address Bit
David A. Kruckemyer - Mountain View CA, US; Daniel C. Murray - San Jose CA, US
Assignee:
Broadcom Corporation - Irvine CA
International Classification:
G06F 9/32
US Classification:
712242, 712234, 711220
Abstract:
A link address/sequential address generation circuit is provided for generating a link/sequential address. The circuit receives the most significant bits of at least two addresses: a first address of a first set of bytes including a branch instruction and a second address of a second set of bytes contiguous to the first set. The least significant bits of the branch PC (those bits not included in the most significant bits of the addresses received by the circuit) are used to generate the least significant bits of the link/sequential address and to select one of the first address and the second address to supply the most significant bits.
Ventana Micro Systems
Principal Engineer
Samsung Electronics America May 2017 - Dec 2018
Principal Engineer
Arteris Apr 2014 - Feb 2017
Chief Hardware Architect
AppliedMicro Jun 2012 - Apr 2014
Staff Engineer
Veloce Technologies Jul 2009 - Jun 2012
Staff Engineer
Education:
The University of British Columbia 2005 - 2007
Master of Business Administration, Masters, Finance
Stanford University 1993 - 1994
Master of Science, Masters, Electrical Engineering
University of Illinois at Urbana - Champaign 1989 - 1993
Bachelors, Bachelor of Science, Computer Engineering
Skills:
Debugging, Processors, Semiconductors, Hardware Architecture, Verilog, Application Specific Integrated Circuits, SoC, ASIC, Cache Coherency, Microarchitecture, Embedded Systems, Computer Architecture, System on a Chip, Cache and Memory Subsystem Architecture, RTL Design, EDA, ARM Architecture, Functional Verification, ARM, VLSI