Benedict R. Gaster - Santa Cruz CA, US Lee W. Howes - Santa Clara CA, US
International Classification:
G06F 9/45
US Classification:
717148, 717146
Abstract:
A medium and method are disclosed for compiling vector programs. A compiler receives program code that includes a function invocation. The compiler determines the vector width of a target computer system and creates a width-specific executable version of the program code by mapping the function invocation to a width-specific implementation of the function. The width-specific implementation corresponds to the vector width of the target computer system.
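The mapping step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the patented method: the function names (`vsum_w4`, `vsum_w8`, `compile_for_width`) and the lookup table are hypothetical, and the "compiler" here simply resolves each invocation name against the target's vector width.

```python
from typing import Callable, Dict, List, Tuple

Impl = Callable[[List[float]], float]


def vsum_w4(xs: List[float]) -> float:
    """Hypothetical 4-lane variant: reduce the input four elements at a time."""
    return sum(sum(xs[i:i + 4]) for i in range(0, len(xs), 4))


def vsum_w8(xs: List[float]) -> float:
    """Hypothetical 8-lane variant: reduce the input eight elements at a time."""
    return sum(sum(xs[i:i + 8]) for i in range(0, len(xs), 8))


# Assumed table of width-specific implementations, keyed by (name, width).
IMPLEMENTATIONS: Dict[Tuple[str, int], Impl] = {
    ("vsum", 4): vsum_w4,
    ("vsum", 8): vsum_w8,
}


def compile_for_width(invocations: List[str], vector_width: int) -> List[Impl]:
    """Map each invocation to the implementation matching the target width."""
    resolved = []
    for name in invocations:
        key = (name, vector_width)
        if key not in IMPLEMENTATIONS:
            raise KeyError(f"no {vector_width}-wide implementation of {name!r}")
        resolved.append(IMPLEMENTATIONS[key])
    return resolved
```

Calling `compile_for_width(["vsum"], 8)` returns a list containing `vsum_w8`, so the same source-level invocation resolves to different code on targets of different widths.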
Vector Width-Aware Synchronization-Elision For Vector Processors
Benedict R. Gaster - Santa Cruz CA, US Lee W. Howes - Santa Clara CA, US
International Classification:
G06F 9/45 G06F 15/76
US Classification:
717148, 717146, 712 7, 712E09001
Abstract:
A medium, method, and apparatus are disclosed for eliding superfluous function invocations in a vector-processing environment. A compiler receives program code comprising a width-contingent invocation of a function. The compiler creates a width-specific executable version of the program code by determining a vector width of a target computer system and omitting the function from the width-specific executable if the vector width meets one or more criteria. For example, the compiler may omit the function call if the vector width is greater than a minimum size.
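As a rough sketch of the elision described above, a program can be modeled as a list of calls, some of which are marked width-contingent. The `Call` type, its `min_width` field, and the `specialize` function are assumptions made for illustration; the criterion used is the abstract's own example (omit the call when the vector width is greater than a minimum size).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Call:
    name: str
    # None marks an ordinary call. Otherwise the call is width-contingent
    # and is omitted when the target vector width exceeds this value.
    min_width: Optional[int] = None


def specialize(program: List[Call], vector_width: int) -> List[Call]:
    """Build a width-specific program, dropping superfluous contingent calls."""
    specialized = []
    for call in program:
        if call.min_width is not None and vector_width > call.min_width:
            continue  # the call is superfluous on this target, so elide it
        specialized.append(call)
    return specialized
```

For a program `[Call("load"), Call("barrier", min_width=32), Call("store")]`, specializing for width 64 drops the barrier, while specializing for width 16 keeps it.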
Lee W. Howes - Santa Clara CA, US Benedict R. Gaster - Santa Cruz CA, US Michael C. Houston - Cupertino CA, US Michael Mantor - Orlando FL, US Mark Leather - Los Gatos CA, US Norman Rubin - Cambridge MA, US Brian D. Emberling - San Mateo CA, US
Assignee:
Advanced Micro Devices, Inc. - Sunnyvale CA
International Classification:
G06F 9/46
US Classification:
718102
Abstract:
Method, system, and computer program product embodiments for synchronizing workitems on one or more processors are disclosed. The embodiments include executing a barrier skip instruction by a first workitem from the group, and responsive to the executed barrier skip instruction, reconfiguring a barrier to synchronize other workitems from the group in a plurality of points in a sequence without requiring the first workitem to reach the barrier in any of the plurality of points.
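The barrier-skip behavior can be illustrated with a small sequential model. This is a sketch under stated assumptions, not the hardware mechanism: the class name `SkippableBarrier` and its methods are hypothetical, and workitems are simulated by explicit calls rather than real threads.

```python
from typing import Iterable


class SkippableBarrier:
    """Sequential model of a barrier that a workitem can permanently opt out of."""

    def __init__(self, group: Iterable[int]) -> None:
        self.participants = set(group)  # workitems the barrier still waits for
        self.arrived = set()            # workitems waiting at the current point
        self.generation = 0             # number of completed barrier points

    def arrive(self, workitem: int) -> None:
        """Record that a participating workitem reached the barrier."""
        if workitem not in self.participants:
            raise ValueError(f"workitem {workitem} no longer participates")
        self.arrived.add(workitem)
        self._maybe_release()

    def skip(self, workitem: int) -> None:
        """Reconfigure the barrier so this and later points ignore the workitem."""
        self.participants.discard(workitem)
        self.arrived.discard(workitem)
        self._maybe_release()

    def _maybe_release(self) -> None:
        # Release once every remaining participant has arrived.
        if self.participants and self.arrived == self.participants:
            self.arrived.clear()
            self.generation += 1
```

With a group of three workitems, once workitem 2 calls `skip`, every later barrier point completes as soon as workitems 0 and 1 arrive.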
Lee W. Howes - Austin TX, US Benedict R. Gaster - Santa Cruz CA, US Michael Clair Houston - Cupertino CA, US Michael Mantor - Orlando FL, US
Assignee:
Advanced Micro Devices, Inc. - Sunnyvale CA
International Classification:
G06F 9/46
US Classification:
719318
Abstract:
A system, method, and computer program product are provided for improving resource utilization of multithreaded applications. Rather than requiring threads to block while waiting for data from a channel or requiring context switching to minimize blocking, the techniques disclosed herein provide an event-driven approach to launch kernels only when needed to perform operations on channel data, and then terminate in order to free resources. These operations are handled efficiently in hardware, but are flexible enough to be implemented in all manner of programming models.
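A software analogue of the event-driven approach is sketched below. The `Channel` class, its `batch_size` threshold, and the `launches` counter are assumptions for illustration; the abstract describes hardware support, whereas this model only shows the control flow: no consumer blocks on the channel, and a short-lived handler runs only when enough data has accumulated.

```python
from typing import Callable, List


class Channel:
    """Channel that launches a handler when data is ready instead of blocking a reader."""

    def __init__(self, kernel: Callable[[List[int]], None], batch_size: int) -> None:
        self.kernel = kernel          # hypothetical kernel run on each full batch
        self.batch_size = batch_size  # assumed launch threshold
        self.buffer: List[int] = []
        self.launches = 0

    def write(self, item: int) -> None:
        """Append data; launch the kernel only once a full batch is available."""
        self.buffer.append(item)
        if len(self.buffer) >= self.batch_size:
            batch, self.buffer = self.buffer, []
            self.launches += 1
            self.kernel(batch)  # runs to completion, then holds no resources
```

Between launches nothing is waiting on the channel, which is the resource-utilization point the abstract makes.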
Abstracting Scratch Pad Memories As Distributed Arrays
Benedict R. Gaster - Santa Cruz CA, US Lee W. Howes - Austin TX, US
Assignee:
Advanced Micro Devices, Inc. - Sunnyvale CA
International Classification:
G06F 12/02
US Classification:
711170, 711E12002
Abstract:
In a computing system, memory may be managed by using a distributed array, which is a global set of local memory regions. A segment in the distributed array is allocated and is bound to a physical memory region. The segment is used by a workgroup in a dispatched data parallel kernel, wherein a workgroup includes one or more work items. When the distributed array is declared, parameters of the distributed array may be defined. The parameters may include an indication whether the distributed array is persistent (data written to the distributed array during one parallel dispatch is accessible by work items in a subsequent dispatch) or an indication whether the distributed array is shared (nested kernels may access the distributed array). The segment may be deallocated after it has been used.
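The segment lifecycle and the persistence parameter can be modeled in a few lines. This is a simplified sketch: the `DistributedArray` class, its method names, and the use of a `bytearray` to stand in for a physical local memory region are all assumptions, and the `shared` parameter is recorded but not otherwise modeled.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DistributedArray:
    segment_size: int
    persistent: bool = False  # data survives into a subsequent dispatch
    shared: bool = False      # nested kernels may access it (not modeled here)
    segments: Dict[int, bytearray] = field(default_factory=dict)

    def allocate(self, workgroup_id: int) -> bytearray:
        """Allocate a segment for a workgroup and bind it to backing memory."""
        if workgroup_id not in self.segments:
            self.segments[workgroup_id] = bytearray(self.segment_size)
        return self.segments[workgroup_id]

    def end_dispatch(self) -> None:
        """Deallocate segments after use unless the array is persistent."""
        if not self.persistent:
            self.segments.clear()
```

With `persistent=True`, a value written by workgroup 0 in one dispatch is still visible to workgroup 0 in the next; with the default, the segment is freshly zeroed.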
Method And System For Synchronization Of Workitems With Divergent Control Flow
Benedict R. Gaster - Santa Cruz CA, US Lee W. Howes - Santa Clara CA, US Michael Mantor - Orlando FL, US Dominik Behr - San Jose CA, US
Assignee:
Advanced Micro Devices, Inc. - Sunnyvale CA
International Classification:
G06F 9/52
US Classification:
718102
Abstract:
Disclosed method, system, and computer program product embodiments include synchronizing a group of workitems on a processor by storing a respective program counter associated with each of the workitems, selecting at least one first workitem from the group for execution, and executing the selected at least one first workitem on the processor. The selecting is based upon the respective stored program counter associated with the at least one first workitem.
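One way to picture selection by stored program counter is the toy scheduler below. The abstract does not state which selection policy is used; picking the workitems with the lowest stored program counter is an assumption made here because it lets lagging workitems catch up and reconverge. The function names and the one-instruction-per-step model are likewise hypothetical.

```python
from typing import Dict, List, Tuple


def select_workitems(pcs: Dict[int, int]) -> List[int]:
    """Select the workitems whose stored program counter is lowest (assumed policy)."""
    lowest = min(pcs.values())
    return sorted(w for w, pc in pcs.items() if pc == lowest)


def run(pcs: Dict[int, int], end_pc: int) -> List[Tuple[int, List[int]]]:
    """Step until every workitem reaches end_pc; return (pc, workitems) per step.

    Note that pcs is updated in place as workitems advance.
    """
    trace = []
    while any(pc < end_pc for pc in pcs.values()):
        active = {w: pc for w, pc in pcs.items() if pc < end_pc}
        chosen = select_workitems(active)
        trace.append((active[chosen[0]], chosen))
        for w in chosen:
            pcs[w] += 1  # each selected workitem executes one instruction
    return trace
```

Starting from `{0: 2, 1: 0, 2: 0}` with `end_pc=3`, workitems 1 and 2 run alone for two steps, after which all three execute the instruction at program counter 2 together.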
Heterogeneous Parallel Primitives Programming Model
- Sunnyvale CA, US Lee W. Howes - Sunnyvale CA, US
Assignee:
Advanced Micro Devices, Inc. - Sunnyvale CA
International Classification:
G06F 9/50
Abstract:
With the success of programming models such as OpenCL and CUDA, heterogeneous computing platforms are becoming mainstream. However, these heterogeneous systems are low-level and not composable, and their behavior is often implementation-defined even for standardized programming models. In contrast, the method and system embodiments for the heterogeneous parallel primitives (HPP) programming model disclosed herein provide a flexible and composable programming platform that guarantees behavior even in the case of developing high-performance code.
Resumes
Facebook Representative To C++ Standards Committee
ISO C++ WG21
Facebook Representative To C++ Standards Committee
Facebook
Software Engineer
Qualcomm Jul 2013 - Oct 2015
Senior Staff Engineer
AMD Jan 2013 - Jul 2013
Senior MTS, Heterogeneous System Software
AMD Oct 2010 - Dec 2012
MTS, Heterogeneous System Software
Education:
Imperial College London 2005 - 2009
Doctorates, Doctor of Philosophy, Philosophy
Dyson School of Design Engineering 2001 - 2005
Masters, Master of Engineering
Tiffin School
Skills:
OpenCL C++ Algorithms GPU Computer Architecture High Performance Computing Software Development GPGPU Graphics Hardware Compilers Processors Programming Parallel Programming Debugging Parallel Computing CUDA Standardization Software Design Patterns Multithreading Device Drivers