Richard West - Easton MA, US Puneet Zaroo - Santa Clara CA, US Carl A. Waldspurger - Palo Alto CA, US Xiao Zhang - Rochester NY, US Haoqiang Zheng - Sunnyvale CA, US
Assignee:
VMWARE, INC. - Palo Alto CA
International Classification:
G06F 9/50 G06F 9/46
US Classification:
718104, 718102, 709226
Abstract:
Methods, computer programs, and systems for managing thread performance in a computing environment based on cache occupancy are provided. In one embodiment, a computer-implemented method assigns a thread performance counter to threads being created to measure the number of cache misses for the threads. The thread performance counter is derived in one embodiment from performance counters associated with each core in a processor. The method further calculates a self-thread value as the change in the thread performance counter of a given thread during a predetermined period, and an other-thread value as the sum of all the changes in the thread performance counters for all threads except the given thread. Further, the method estimates a cache occupancy for the given thread based on a previous occupancy for the given thread and the calculated self-thread and other-thread values. The estimated cache occupancy is used to assign computing environment resources to the given thread. In another embodiment, cache miss-rate curves are constructed for a thread to help analyze performance tradeoffs when changing cache allocations of the threads in the system.
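The occupancy update described in the abstract can be sketched with a simple linear model: each miss by the thread tends to bring in a line it does not already own, while each miss by other threads tends to evict one of its lines in proportion to its current occupancy. This is a minimal illustration under that assumption, not the patented method; the function name, the `cache_size` parameter (total LLC lines), and the clamping are choices made here for the sketch.

```python
def update_occupancy(prev_occ, self_misses, other_misses, cache_size):
    """Estimate a thread's LLC occupancy (in lines) after an interval.

    Linear model: a self miss fills a new line with probability equal
    to the fraction of the cache the thread does NOT occupy; a miss by
    any other thread evicts one of this thread's lines with probability
    equal to the fraction it DOES occupy.
    """
    gain = self_misses * (1.0 - prev_occ / cache_size)
    loss = other_misses * (prev_occ / cache_size)
    # Occupancy can never leave the range [0, cache_size].
    return max(0.0, min(float(cache_size), prev_occ + gain - loss))
```

For example, a thread starting from zero occupancy gains one line per miss, while a thread at half occupancy loses half a line for every miss taken by its neighbors.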
Thread Compensation For Microarchitectural Contention
Richard West - Easton MA, US Puneet Zaroo - Santa Clara CA, US Carl A. Waldspurger - Palo Alto CA, US Xiao Zhang - Rochester NY, US
Assignee:
VMware, Inc. - Palo Alto CA
International Classification:
G06F 9/46 G06F 12/08
US Classification:
711118, 718103, 711E12017
Abstract:
A thread (or other resource consumer) is compensated for contention for system resources in a computer system having at least one processor core, a last level cache (LLC), and a main memory. In one embodiment, at each descheduling event of the thread following an execution interval, an effective CPU time is determined. The execution interval is a period of time during which the thread is being executed on the central processing unit (CPU) between scheduling events. The effective CPU time is a portion of the execution interval that excludes delays caused by contention for microarchitectural resources, such as time spent repopulating lines from the LLC that were evicted by other threads. The thread may be compensated for microarchitectural contention by increasing its scheduling priority based on the effective CPU time.
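A minimal sketch of the compensation idea: subtract an estimate of conflict-induced stall time from the raw execution interval, then credit the scheduler with only the effective time. The per-miss penalty constant and the idea that a conflict-miss count is available from hardware counters are assumptions of this sketch, not details from the patent.

```python
LLC_MISS_PENALTY_NS = 200  # assumed average cost to repopulate one evicted line

def effective_cpu_time(interval_ns, conflict_misses):
    """Portion of an execution interval excluding estimated time spent
    repopulating LLC lines that other threads evicted.

    interval_ns     -- wall time the thread ran between scheduling events
    conflict_misses -- misses attributed to inter-thread LLC contention
    """
    stall_ns = conflict_misses * LLC_MISS_PENALTY_NS
    return max(0, interval_ns - stall_ns)
```

A proportional-share scheduler that charges `effective_cpu_time(...)` instead of the raw interval automatically raises the relative priority of threads that suffered more contention.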
Cache Performance Prediction And Scheduling On Commodity Processors With Shared Caches
Puneet Zaroo - Santa Clara CA, US Richard West - Easton MA, US Carl A. Waldspurger - Palo Alto CA, US Xiao Zhang - Rochester NY, US
Assignee:
VMWARE, INC. - Palo Alto CA
International Classification:
G06F 9/50 G06F 9/46
US Classification:
718104, 718102
Abstract:
A method is described for scheduling in an intelligent manner a plurality of threads on a processor having a plurality of cores and a shared last level cache (LLC). In the method, a first and a second scenario, having a corresponding first and second combination of threads, are identified. The cache occupancies of each of the threads for each of the scenarios are predicted. The predicted cache occupancies are a representation of the amount of the LLC that each of the threads would occupy when running with the other threads on the processor according to the particular scenario. The scenario that results in the least objectionable impacts on all threads is identified, the least objectionable impacts taking into account the impact resulting from the predicted cache occupancies. Finally, a scheduling decision is made according to the scenario that results in the least objectionable impacts.
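The selection step can be sketched as picking, among candidate thread combinations, the one minimizing an aggregate impact computed from predicted occupancies. Here `predict_occupancy` and `impact` are hypothetical callables standing in for the occupancy-prediction and impact models the abstract refers to:

```python
def pick_schedule(scenarios, predict_occupancy, impact):
    """Return the scenario (combination of threads) whose predicted
    LLC occupancies produce the least objectionable total impact.

    scenarios         -- iterable of thread combinations (tuples)
    predict_occupancy -- combo -> {thread: predicted LLC share}
    impact            -- (thread, occupancy) -> penalty, lower is better
    """
    def total_impact(combo):
        occ = predict_occupancy(combo)
        return sum(impact(t, occ[t]) for t in combo)
    return min(scenarios, key=total_impact)
```

With an equal-split occupancy predictor and an impact equal to each thread's unmet working-set demand, the sketch prefers pairing threads whose combined demand fits the shared cache.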
Carl A. WALDSPURGER - Palo Alto CA, US Rajesh VENKATASUBRAMANIAN - San Jose CA, US Alexander Thomas GARTHWAITE - Beverly MA, US Yury BASKAKOV - Newton MA, US Puneet ZAROO - Santa Clara CA, US
Assignee:
VMWARE, INC. - Palo Alto CA
International Classification:
G06F 12/12 G06F 12/10
US Classification:
711 6, 711160, 711E12059, 711E12071
Abstract:
Miss rate curves are constructed in a resource-efficient manner so that they can be constructed and memory management decisions can be made while the workloads are running. The resource-efficient technique includes the steps of selecting a subset of memory pages for the workload, maintaining a least recently used (LRU) data structure for the selected memory pages, detecting accesses to the selected memory pages and updating the LRU data structure in response to the detected accesses, and generating data for constructing a miss-rate curve for the workload using the LRU data structure. After a memory page is accessed, the memory page may be left untraced for a period of time, after which the memory page is retraced.
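The resource-efficient technique above can be sketched as stack-distance profiling over a sampled subset of pages: an LRU stack is kept only for the sample, and each reuse records its stack depth in a histogram from which miss ratios at different memory sizes are read off. The sampling rate, class shape, and the fact that cold (first-touch) accesses are excluded from the histogram are choices of this sketch, not details of the patent.

```python
import random

class SampledMRC:
    """Miss-rate-curve data from an LRU stack over a sampled page subset."""

    def __init__(self, pages, rate=0.1):
        # Trace only a random fraction of the workload's pages.
        self.tracked = set(random.sample(pages, int(len(pages) * rate)))
        self.stack = []   # tracked pages, most recently used first
        self.hist = {}    # stack distance -> reuse count (cold misses excluded)

    def access(self, page):
        if page not in self.tracked:
            return
        if page in self.stack:
            d = self.stack.index(page)          # LRU stack distance
            self.hist[d] = self.hist.get(d, 0) + 1
            self.stack.pop(d)
        self.stack.insert(0, page)

    def miss_ratio(self, size):
        """Estimated miss ratio (of reuse accesses) for a cache of
        `size` sampled pages: reuses at distance >= size would miss."""
        hits = sum(c for d, c in self.hist.items() if d < size)
        total = sum(self.hist.values()) or 1
        return 1 - hits / total
```

Sweeping `size` over a range of values yields the miss-rate curve; the abstract's untrace/retrace interval would further reduce the cost of detecting accesses to sampled pages.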
A management server and method for performing resource management operations in a distributed computer system takes into account information regarding multi-processor memory architectures of host computers of the distributed computer system, including information regarding Non-Uniform Memory Access (NUMA) architectures of at least some of the host computers, to make a placement recommendation to place a client in one of the host computers.
- Palo Alto CA, US Haoqiang Zheng - Cupertino CA, US Rajesh Venkatasubramanian - San Jose CA, US Puneet Zaroo - Santa Clara CA, US
International Classification:
G06F 9/455 G06F 9/48
Abstract:
Systems and methods for performing selection of non-uniform memory access (NUMA) nodes for mapping of virtual central processing unit (vCPU) operations to physical processors are provided. A CPU scheduler evaluates the latency between various candidate processors and the memory associated with the vCPU, as well as the size of the working set of that memory, and selects an optimal processor for execution of the vCPU based on the expected memory access latency and the characteristics of the vCPU and the processors. The systems and methods further provide for monitoring system characteristics and rescheduling the vCPUs when other placements provide improved performance and efficiency.
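The node-selection criterion can be sketched as minimizing expected memory access cost: for each candidate node, weight the latency to every node by the number of the vCPU's working-set pages resident there. The latency matrix and page counts below are illustrative inputs, not part of the patented method.

```python
def best_node(nodes, latency_ns, resident_pages):
    """Pick the NUMA node minimizing expected memory access cost.

    nodes          -- candidate node ids
    latency_ns     -- latency_ns[src][dst]: access latency from a CPU
                      on node src to memory on node dst
    resident_pages -- {node: working-set pages of the vCPU on that node}
    """
    def cost(src):
        return sum(latency_ns[src][dst] * resident_pages.get(dst, 0)
                   for dst in nodes)
    return min(nodes, key=cost)
```

A scheduler could re-evaluate this cost periodically and migrate the vCPU when another node's cost drops sufficiently below the current one.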
Numa Scheduling Using Inter-Vcpu Memory Access Estimation
In a system having non-uniform memory access architecture, with a plurality of nodes, memory access by entities such as virtual CPUs is estimated by invalidating a selected sub-set of memory units, and then detecting and compiling access statistics, for example by counting the page faults that arise when any virtual CPU accesses an invalidated memory unit. The entities, or pairs of entities, may then be migrated or otherwise co-located on the node for which they have greatest memory locality.
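The estimation step can be sketched as fault counting: mappings for a sampled subset of memory units are invalidated, and the fault taken on first access reveals which vCPU touched memory on which node. The function below is a toy replay over an access trace; the names and the one-fault-per-page simplification (the handler restores the mapping) are assumptions of this sketch.

```python
from collections import Counter

def estimate_locality(accesses, sampled_pages, page_to_node):
    """Count faults each vCPU takes on invalidated sample pages,
    grouped by the NUMA node holding the page.

    accesses      -- iterable of (vcpu, page) in access order
    sampled_pages -- pages whose mappings were invalidated
    page_to_node  -- {page: node currently holding that page}
    """
    faults = Counter()
    valid = set()  # pages whose mapping the fault handler has restored
    for vcpu, page in accesses:
        if page in sampled_pages and page not in valid:
            faults[(vcpu, page_to_node[page])] += 1
            valid.add(page)  # subsequent accesses hit the restored mapping
    return faults
```

Migrating each vCPU (or co-locating a pair) onto the node where its fault count is highest then maximizes memory locality, as the abstract describes.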
Interference-Based Client Placement Using Dynamic Weights
- Palo Alto CA, US Madhuri Yechuri - Palo Alto CA, US Kalyan Saladi - Sunnyvale CA, US Sahan Gamage - Redwood City CA, US Puneet Zaroo - Santa Clara CA, US
Assignee:
VMWARE, INC. - Palo Alto CA
International Classification:
H04L 12/911 H04L 29/08
Abstract:
A management server and method for performing resource management operations in a distributed computer system utilizes interference scores for clients executing different workloads, including a client to be placed in the distributed computer system, as utilization values of resources, which are assigned continuously variable weights to produce weighted resource utilization values. The weighted resource utilization values are used to generate overall selection scores for host computers of the distributed computer system, which are then used to recommend a target host computer among the host computers of the distributed computer system to place the client.
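The scoring step can be sketched as a weighted sum over per-resource utilization, with the weights varying by the interference profile of the client being placed. Treating a lower weighted utilization as a better placement is a simplification of this sketch; the real selection scores may combine further factors.

```python
def selection_score(utilization, weights):
    """Overall selection score for one host: weighted sum of its
    per-resource utilization values (lower is better here)."""
    return sum(weights[r] * u for r, u in utilization.items())

def recommend_host(hosts, weights):
    """Recommend the host with the best (lowest) selection score.

    hosts   -- {host_name: {resource: utilization in [0, 1]}}
    weights -- {resource: continuously variable weight}
    """
    return min(hosts, key=lambda h: selection_score(hosts[h], weights))
```

For a CPU-sensitive client the CPU weight would be raised, steering the recommendation away from CPU-saturated hosts even when their memory is idle.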
Netflix
Senior Software Engineer, Big Data Platform
Zerostack Oct 2014 - May 2017
Founding Engineer
VMware Jun 2004 - Oct 2014
Staff Engineer
NetApp Jul 2003 - Dec 2003
Engineering Intern
Education:
Purdue University 2001 - 2004
Master of Science, Masters, Computer Science
Indian Institute of Technology, Delhi 1997 - 2001
Skills:
Distributed Systems, Linux, Virtualization, Operating Systems, Software Engineering, Kernel, Algorithms, Cloud Computing, Scalability, High Performance Computing, MapReduce, Software Project Management, Machine Learning, Computer Architecture, Perl, Python, Architecture, Technical Leadership, VMware ESX, High Availability, Start-Ups
Languages:
English