Richard West - Easton MA, US Puneet Zaroo - Santa Clara CA, US Carl A. Waldspurger - Palo Alto CA, US Xiao Zhang - Rochester NY, US Haoqiang Zheng - Sunnyvale CA, US
Assignee:
VMWARE, INC. - Palo Alto CA
International Classification:
G06F 9/50 G06F 9/46
US Classification:
718104, 718102, 709226
Abstract:
Methods, computer programs, and systems for managing thread performance in a computing environment based on cache occupancy are provided. In one embodiment, a computer-implemented method assigns a thread performance counter to threads being created to measure the number of cache misses for the threads. The thread performance counter is derived in one embodiment from performance counters associated with each core in a processor. The method further calculates a self-thread value as the change in the thread performance counter of a given thread during a predetermined period, and an other-thread value as the sum of all the changes in the thread performance counters for all threads except the given thread. Further, the method estimates a cache occupancy for the given thread based on a previous occupancy for the given thread and the calculated self-thread and other-thread values. The estimated cache occupancy is used to assign computing environment resources to the given thread. In another embodiment, cache miss-rate curves are constructed for a thread to help analyze performance tradeoffs when changing cache allocations of the threads in the system.
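The occupancy update described in the abstract can be sketched with a simple linear model: each miss by the thread tends to bring in a line it does not already own, while each miss by other threads tends to evict one of its lines in proportion to its current occupancy. This is a minimal illustration under that assumption, not the patented method; the function name, the `cache_size` parameter (total LLC lines), and the clamping are choices made here for the sketch.

```python
def update_occupancy(prev_occ, self_misses, other_misses, cache_size):
    """Estimate a thread's LLC occupancy (in lines) after an interval.

    Linear model: a self miss fills a new line with probability equal
    to the fraction of the cache the thread does NOT occupy; a miss by
    any other thread evicts one of this thread's lines with probability
    equal to the fraction it DOES occupy.
    """
    gain = self_misses * (1.0 - prev_occ / cache_size)
    loss = other_misses * (prev_occ / cache_size)
    # Occupancy can never leave the range [0, cache_size].
    return max(0.0, min(float(cache_size), prev_occ + gain - loss))
```

For example, a thread starting from zero occupancy gains one line per miss, while a thread at half occupancy loses half a line for every miss taken by its neighbors.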
Thread Compensation For Microarchitectural Contention
Richard West - Easton MA, US Puneet Zaroo - Santa Clara CA, US Carl A. Waldspurger - Palo Alto CA, US Xiao Zhang - Rochester NY, US
Assignee:
VMware, Inc. - Palo Alto CA
International Classification:
G06F 9/46 G06F 12/08
US Classification:
711118, 718103, 711E12017
Abstract:
A thread (or other resource consumer) is compensated for contention for system resources in a computer system having at least one processor core, a last level cache (LLC), and a main memory. In one embodiment, at each descheduling event of the thread following an execution interval, an effective CPU time is determined. The execution interval is a period of time during which the thread is being executed on the central processing unit (CPU) between scheduling events. The effective CPU time is a portion of the execution interval that excludes delays caused by contention for microarchitectural resources, such as time spent repopulating lines from the LLC that were evicted by other threads. The thread may be compensated for microarchitectural contention by increasing its scheduling priority based on the effective CPU time.
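A minimal sketch of the compensation idea: subtract an estimate of conflict-induced stall time from the raw execution interval, then credit the scheduler with only the effective time. The per-miss penalty constant and the idea that a conflict-miss count is available from hardware counters are assumptions of this sketch, not details from the patent.

```python
LLC_MISS_PENALTY_NS = 200  # assumed average cost to repopulate one evicted line

def effective_cpu_time(interval_ns, conflict_misses):
    """Portion of an execution interval excluding estimated time spent
    repopulating LLC lines that other threads evicted.

    interval_ns     -- wall time the thread ran between scheduling events
    conflict_misses -- misses attributed to inter-thread LLC contention
    """
    stall_ns = conflict_misses * LLC_MISS_PENALTY_NS
    return max(0, interval_ns - stall_ns)
```

A proportional-share scheduler that charges `effective_cpu_time(...)` instead of the raw interval automatically raises the relative priority of threads that suffered more contention.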
Cache Performance Prediction And Scheduling On Commodity Processors With Shared Caches
Puneet Zaroo - Santa Clara CA, US Richard West - Easton MA, US Carl A. Waldspurger - Palo Alto CA, US Xiao Zhang - Rochester NY, US
Assignee:
VMWARE, INC. - Palo Alto CA
International Classification:
G06F 9/50 G06F 9/46
US Classification:
718104, 718102
Abstract:
A method is described for scheduling in an intelligent manner a plurality of threads on a processor having a plurality of cores and a shared last level cache (LLC). In the method, a first and a second scenario, having a corresponding first and second combination of threads, are identified. The cache occupancies of each of the threads for each of the scenarios are predicted. The predicted cache occupancies are a representation of the amount of the LLC that each of the threads would occupy when running with the other threads on the processor according to the particular scenario. The scenario that results in the least objectionable impacts on all threads is identified, the least objectionable impacts taking into account the impact resulting from the predicted cache occupancies. Finally, a scheduling decision is made according to the scenario that results in the least objectionable impacts.
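The selection step can be sketched as picking, among candidate thread combinations, the one minimizing an aggregate impact computed from predicted occupancies. Here `predict_occupancy` and `impact` are hypothetical callables standing in for the occupancy-prediction and impact models the abstract refers to:

```python
def pick_schedule(scenarios, predict_occupancy, impact):
    """Return the scenario (combination of threads) whose predicted
    LLC occupancies produce the least objectionable total impact.

    scenarios         -- iterable of thread combinations (tuples)
    predict_occupancy -- combo -> {thread: predicted LLC share}
    impact            -- (thread, occupancy) -> penalty, lower is better
    """
    def total_impact(combo):
        occ = predict_occupancy(combo)
        return sum(impact(t, occ[t]) for t in combo)
    return min(scenarios, key=total_impact)
```

With an equal-split occupancy predictor and an impact equal to each thread's unmet working-set demand, the sketch prefers pairing threads whose combined demand fits the shared cache.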
Carl A. WALDSPURGER - Palo Alto CA, US Rajesh VENKATASUBRAMANIAN - San Jose CA, US Alexander Thomas GARTHWAITE - Beverly MA, US Yury BASKAKOV - Newton MA, US Puneet ZAROO - Santa Clara CA, US
Assignee:
VMWARE, INC. - Palo Alto CA
International Classification:
G06F 12/12 G06F 12/10
US Classification:
711 6, 711160, 711E12059, 711E12071
Abstract:
Miss rate curves are constructed in a resource-efficient manner so that they can be constructed and memory management decisions can be made while the workloads are running. The resource-efficient technique includes the steps of selecting a subset of memory pages for the workload, maintaining a least recently used (LRU) data structure for the selected memory pages, detecting accesses to the selected memory pages and updating the LRU data structure in response to the detected accesses, and generating data for constructing a miss-rate curve for the workload using the LRU data structure. After a memory page is accessed, the memory page may be left untraced for a period of time, after which the memory page is retraced.
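The resource-efficient technique above can be sketched as stack-distance profiling over a sampled subset of pages: an LRU stack is kept only for the sample, and each reuse records its stack depth in a histogram from which miss ratios at different memory sizes are read off. The sampling rate, class shape, and the fact that cold (first-touch) accesses are excluded from the histogram are choices of this sketch, not details of the patent.

```python
import random

class SampledMRC:
    """Miss-rate-curve data from an LRU stack over a sampled page subset."""

    def __init__(self, pages, rate=0.1):
        # Trace only a random fraction of the workload's pages.
        self.tracked = set(random.sample(pages, int(len(pages) * rate)))
        self.stack = []   # tracked pages, most recently used first
        self.hist = {}    # stack distance -> reuse count (cold misses excluded)

    def access(self, page):
        if page not in self.tracked:
            return
        if page in self.stack:
            d = self.stack.index(page)          # LRU stack distance
            self.hist[d] = self.hist.get(d, 0) + 1
            self.stack.pop(d)
        self.stack.insert(0, page)

    def miss_ratio(self, size):
        """Estimated miss ratio (of reuse accesses) for a cache of
        `size` sampled pages: reuses at distance >= size would miss."""
        hits = sum(c for d, c in self.hist.items() if d < size)
        total = sum(self.hist.values()) or 1
        return 1 - hits / total
```

Sweeping `size` over a range of values yields the miss-rate curve; the abstract's untrace/retrace interval would further reduce the cost of detecting accesses to sampled pages.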
A management server and method for performing resource management operations in a distributed computer system takes into account information regarding multi-processor memory architectures of host computers of the distributed computer system, including information regarding Non-Uniform Memory Access (NUMA) architectures of at least some of the host computers, to make a placement recommendation to place a client in one of the host computers.
- Palo Alto CA, US Haoqiang Zheng - Cupertino CA, US Rajesh Venkatasubramanian - San Jose CA, US Puneet Zaroo - Santa Clara CA, US
International Classification:
G06F 9/455 G06F 9/48
Abstract:
Systems and methods for performing selection of non-uniform memory access (NUMA) nodes for mapping of virtual central processing unit (vCPU) operations to physical processors are provided. A CPU scheduler evaluates the latency between various candidate processors and the memory associated with the vCPU, as well as the size of the working set of that memory, and selects an optimal processor for execution of the vCPU based on the expected memory access latency and the characteristics of the vCPU and the processors. The systems and methods further provide for monitoring system characteristics and rescheduling the vCPUs when other placements provide improved performance and efficiency.
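The node-selection criterion can be sketched as minimizing expected memory access cost: for each candidate node, weight the latency to every node by the number of the vCPU's working-set pages resident there. The latency matrix and page counts below are illustrative inputs, not part of the patented method.

```python
def best_node(nodes, latency_ns, resident_pages):
    """Pick the NUMA node minimizing expected memory access cost.

    nodes          -- candidate node ids
    latency_ns     -- latency_ns[src][dst]: access latency from a CPU
                      on node src to memory on node dst
    resident_pages -- {node: working-set pages of the vCPU on that node}
    """
    def cost(src):
        return sum(latency_ns[src][dst] * resident_pages.get(dst, 0)
                   for dst in nodes)
    return min(nodes, key=cost)
```

A scheduler could re-evaluate this cost periodically and migrate the vCPU when another node's cost drops sufficiently below the current one.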
Numa Scheduling Using Inter-Vcpu Memory Access Estimation
In a system having non-uniform memory access architecture, with a plurality of nodes, memory access by entities such as virtual CPUs is estimated by invalidating a selected sub-set of memory units, and then detecting and compiling access statistics, for example by counting the page faults that arise when any virtual CPU accesses an invalidated memory unit. The entities, or pairs of entities, may then be migrated or otherwise co-located on the node for which they have greatest memory locality.
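The estimation step can be sketched as fault counting: mappings for a sampled subset of memory units are invalidated, and the fault taken on first access reveals which vCPU touched memory on which node. The function below is a toy replay over an access trace; the names and the one-fault-per-page simplification (the handler restores the mapping) are assumptions of this sketch.

```python
from collections import Counter

def estimate_locality(accesses, sampled_pages, page_to_node):
    """Count faults each vCPU takes on invalidated sample pages,
    grouped by the NUMA node holding the page.

    accesses      -- iterable of (vcpu, page) in access order
    sampled_pages -- pages whose mappings were invalidated
    page_to_node  -- {page: node currently holding that page}
    """
    faults = Counter()
    valid = set()  # pages whose mapping the fault handler has restored
    for vcpu, page in accesses:
        if page in sampled_pages and page not in valid:
            faults[(vcpu, page_to_node[page])] += 1
            valid.add(page)  # subsequent accesses hit the restored mapping
    return faults
```

Migrating each vCPU (or co-locating a pair) onto the node where its fault count is highest then maximizes memory locality, as the abstract describes.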
Interference-Based Client Placement Using Dynamic Weights
- Palo Alto CA, US Madhuri Yechuri - Palo Alto CA, US Kalyan Saladi - Sunnyvale CA, US Sahan Gamage - Redwood City CA, US Puneet Zaroo - Santa Clara CA, US
Assignee:
VMWARE, INC. - Palo Alto CA
International Classification:
H04L 12/911 H04L 29/08
Abstract:
A management server and method for performing resource management operations in a distributed computer system utilizes interference scores for clients executing different workloads, including a client to be placed in the distributed computer system, as utilization values of resources, which are assigned continuously variable weights to produce weighted resource utilization values. The weighted resource utilization values are used to generate overall selection scores for host computers of the distributed computer system, which are then used to recommend a target host computer among the host computers of the distributed computer system to place the client.
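The scoring step can be sketched as a weighted sum over per-resource utilization, with the weights varying by the interference profile of the client being placed. Treating a lower weighted utilization as a better placement is a simplification of this sketch; the real selection scores may combine further factors.

```python
def selection_score(utilization, weights):
    """Overall selection score for one host: weighted sum of its
    per-resource utilization values (lower is better here)."""
    return sum(weights[r] * u for r, u in utilization.items())

def recommend_host(hosts, weights):
    """Recommend the host with the best (lowest) selection score.

    hosts   -- {host_name: {resource: utilization in [0, 1]}}
    weights -- {resource: continuously variable weight}
    """
    return min(hosts, key=lambda h: selection_score(hosts[h], weights))
```

For a CPU-sensitive client the CPU weight would be raised, steering the recommendation away from CPU-saturated hosts even when their memory is idle.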
Netflix
Senior Software Engineer, Big Data Platform
Zerostack Oct 2014 - May 2017
Founding Engineer
VMware Jun 2004 - Oct 2014
Staff Engineer
NetApp Jul 2003 - Dec 2003
Engineering Intern
Education:
Purdue University 2001 - 2004
Master of Science, Masters, Computer Science
Indian Institute of Technology, Delhi 1997 - 2001
Skills:
Distributed Systems, Linux, Virtualization, Operating Systems, Software Engineering, Kernel, Algorithms, Cloud Computing, Scalability, High Performance Computing, MapReduce, Software Project Management, Machine Learning, Computer Architecture, Perl, Python, Architecture, Technical Leadership, VMware ESX, High Availability, Start-Ups
Languages:
English