Abstract:
Various systems and methods for providing variable bitrate compression for split deep neural network (DNN) computing are described herein. A system may be configured to manage a split DNN that operates on a compute system and a second system over a communication network. The system may access a performance metric; determine, based on the performance metric, a split point of the split DNN, the split point defining a head portion and a tail portion of the split DNN; determine, based on the performance metric, a bottleneck layer configuration for a bottleneck layer at the split point, the bottleneck layer including a bottleneck encoder and a bottleneck decoder; execute the head portion of the split DNN and the bottleneck encoder on the compute system; and recurrently access an updated performance metric and determine a revised split point or a revised bottleneck layer configuration based on the updated performance metric.
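The control loop summarized above can be sketched in Python. This is a minimal, hypothetical illustration and not the described system. It assumes that the performance metric is measured network bandwidth, that each candidate pair of split point and bottleneck configuration has been profiled in advance, and that the selection policy is "highest accuracy within a latency budget, otherwise lowest latency". The names `SplitConfig`, `select_config`, `control_loop`, and `EXAMPLE_CANDIDATES`, and all profiling numbers, are made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class SplitConfig:
    """Hypothetical candidate: one split point plus one bottleneck configuration."""
    split_layer: int          # index of the last layer in the head portion
    bottleneck_channels: int  # width of the bottleneck encoder output
    head_ms: float            # profiled head + encoder time on the compute system
    tail_ms: float            # profiled decoder + tail time on the second system
    payload_bytes: int        # size of the encoded tensor sent over the network
    accuracy: float           # profiled task accuracy for this configuration


def estimated_latency_ms(cfg: SplitConfig, bandwidth_bps: float) -> float:
    """End-to-end estimate: head time + transmission time + tail time (ms)."""
    transmit_ms = cfg.payload_bytes * 8 / bandwidth_bps * 1000.0
    return cfg.head_ms + transmit_ms + cfg.tail_ms


def select_config(candidates: Sequence[SplitConfig],
                  bandwidth_bps: float, budget_ms: float) -> SplitConfig:
    """Assumed policy: most accurate candidate that meets the latency budget;
    if none does, fall back to the lowest-latency candidate."""
    feasible = [c for c in candidates
                if estimated_latency_ms(c, bandwidth_bps) <= budget_ms]
    if feasible:
        return max(feasible, key=lambda c: c.accuracy)
    return min(candidates, key=lambda c: estimated_latency_ms(c, bandwidth_bps))


def control_loop(candidates: Sequence[SplitConfig],
                 bandwidth_samples: Sequence[float],
                 budget_ms: float) -> List[SplitConfig]:
    """Re-evaluate the split point and bottleneck configuration on every
    updated metric; return the configuration in effect after each sample."""
    chosen: List[SplitConfig] = []
    current: Optional[SplitConfig] = None
    for bandwidth_bps in bandwidth_samples:
        revised = select_config(candidates, bandwidth_bps, budget_ms)
        if revised != current:
            # A real deployment would reconfigure the head/encoder on the compute
            # system and the decoder/tail on the second system at this point.
            current = revised
        chosen.append(current)
    return chosen


# Made-up profiling numbers, for illustration only.
EXAMPLE_CANDIDATES = [
    SplitConfig(4, 64, head_ms=5.0, tail_ms=20.0, payload_bytes=200_000, accuracy=0.76),
    SplitConfig(4, 16, head_ms=5.5, tail_ms=20.0, payload_bytes=50_000, accuracy=0.74),
    SplitConfig(8, 8, head_ms=15.0, tail_ms=10.0, payload_bytes=10_000, accuracy=0.72),
]
```

With a 50 ms budget and bandwidth samples of 100, 20, 5, and 1 Mbit/s, `control_loop(EXAMPLE_CANDIDATES, [100e6, 20e6, 5e6, 1e6], 50.0)` selects the 64-channel bottleneck, then the 16-channel bottleneck, then the deeper split with the 8-channel bottleneck, and keeps that last configuration once no candidate fits the budget. Lower bandwidth therefore pushes the sketch toward a lower-bitrate bottleneck or a different split point, which is the kind of revision the abstract describes.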