Case Study

Client:

NVIDIA

Project:

Cloud-based GPU Management Platform

Machine Learning Used to Accelerate Product Engineering

Problem:

The client’s existing solution at the time consisted only of standalone hardware with a command-line interface.

Challenges:

Get a proof of concept (PoC) off the ground quickly to evaluate the viability of the product idea, then apply the lessons from the PoC to build a portal for managing the deep learning hardware. Although each hardware device had its own CLI, customers had no way to manage multiple deployments from a single, central console. The cloud-based solution also brought into focus a host of challenges that had not been considered initially, such as multi-tenancy, reliability, scalability, and security.

Solutions:

Tresbu built a secure, scalable, reliable, multi-tenant, AWS-based portal for customers to manage deep learning hardware. The hardware was deployed in the customers’ own data centers, while the management portal ran on the public cloud and provided a browser-based GUI.

The GUI had a unified, multi-tier dashboard showing resource (CPU, GPU, RAM, HDD) utilization for all deployed hardware, both collectively and per device. It allowed customers’ users to add and remove Docker containers. It also had an admin dashboard to add and remove tenants, view basic analytics of the deployed hardware, and perform other housekeeping functions.
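
As an illustration only, the two dashboard views described above (utilization per device and utilization across a tenant's whole fleet) could be computed along the following lines. This is a minimal sketch, not the portal's actual implementation: the `Sample` record, its field names, and both functions are hypothetical, and every reported capacity is assumed to be greater than zero.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One utilization reading (hypothetical record layout)."""
    tenant: str      # tenant that owns the device
    device: str      # device identifier within the tenant
    resource: str    # "cpu", "gpu", "ram", or "hdd"
    used: float      # amount currently in use
    capacity: float  # total available, same unit as `used`; assumed > 0


def utilization_by_device(samples):
    """Individual view: {(tenant, device): {resource: percent used}}."""
    view = defaultdict(dict)
    for s in samples:
        view[(s.tenant, s.device)][s.resource] = 100.0 * s.used / s.capacity
    return dict(view)


def utilization_by_tenant(samples):
    """Collective view: {tenant: {resource: percent used across devices}}."""
    used = defaultdict(float)
    capacity = defaultdict(float)
    for s in samples:
        # Sum raw amounts first so larger devices weigh more than small ones.
        used[(s.tenant, s.resource)] += s.used
        capacity[(s.tenant, s.resource)] += s.capacity
    view = defaultdict(dict)
    for (tenant, resource), total in capacity.items():
        view[tenant][resource] = 100.0 * used[(tenant, resource)] / total
    return dict(view)
```

Keying every aggregate by tenant keeps one customer's readings out of another's totals, which is the multi-tenancy concern raised under Challenges.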

Impact:

The proof of concept was completed in 4 weeks, accelerating delivery of the final solution and allowing internal and external teams to build, test, and deploy new products with greater efficiency.