Cost-Benefit of GPUs for Data and Machine Learning
When I first heard the term GPU a few years ago, my mind immediately turned to computer graphics and video games. Little did I know that GPUs had found their way into other fields, including data and machine learning. GPUs let machine learning (ML) researchers run experiments in minutes that previously took hours or days. They have enabled breakthroughs by letting users tackle problems that demand amounts of compute that were previously unavailable or infeasible.
Still, GPUs are not (yet) magical or general purpose. Many of us struggle to understand, use, and build a business case for them. Over the last few years, I have been lucky to work with (and learn from) NVIDIA AI and various Google Cloud customers who are leading the charge in GPU adoption and value generation. When assessing the cost-benefit of GPUs for data and machine learning, you can bucket the relevant factors into three groups:
- Fit & Learning Curve: Are GPUs a good fit for your task? What steps in your workflow can they help with? How much skill, setup, and potential redesign investment is required to get GPUs to work? What software is available to make GPUs easier to leverage?
- Cost: What type of GPUs fulfill your needs? How long does your task take to run (with and without GPUs)? What is the price per unit of time? What is the overall cost per training experiment or prediction?
- Value: Most organizations and teams stop at fit and cost and never consider the many forms of value GPUs can influence. But GPUs are much more than an IT cost line item. It is important to understand both the direct and indirect value generated by adding GPUs to your systems, processes, and architecture. This can include employee productivity gains, improved end-user experience, accelerated time to market, and direct revenue impact.
I dive into a bit more detail on each topic below.
Fit & Learning Curve
The first step of course is to understand if GPUs are the right fit for your team and tasks. Below are a few considerations:
- Data Volume: The primary goal of GPUs (in our data science world) is to accelerate data and machine learning tasks. If your datasets are still small and your jobs take seconds or minutes to complete, GPUs may not be as impactful. Even so, you can build a data and ML strategy (i.e. save and acquire more data, build a data lake, pilot data science & ML projects) that plans to make use of GPUs a few quarters or years out.
- Tasks: What is the bottleneck of your data pipeline? Are your teams exploring large deep learning models or simpler linear models? Is your end goal to derive insights from data or to ship machine learning features into production as part of your application? GPUs are a no-brainer for deep learning on big data (both training and serving) and are steadily evolving to accelerate traditional ML, SQL, and data processing tasks as well. It is important to map your tasks to what GPUs are good at and set realistic expectations. The slowest step of your pipeline may not be one GPUs can help with, and that bottleneck can keep you from unlocking the full potential of GPUs later in the pipeline.
- Skills & Usability: Even if you have big data and have identified the right tasks for GPUs, the biggest barrier may be the skills required. GPUs demand new skills for lower-level programming (i.e. CUDA) or extended knowledge to use effectively with existing tools. Even if your code runs on GPUs, it is most likely not efficient or optimal on the first try; learning to optimize GPU utilization for your tasks is a non-trivial job. I am, however, most optimistic about this area, as GPU usability continues to improve and the breadth of tasks GPUs support continues to grow. This section alone could be its own blog post, so I'll highlight just a few reasons to be bullish on GPU accessibility: (1) NVIDIA's investment in a growing GPU developer tools ecosystem, including CUDA, RAPIDS, TensorRT, and NGC; (2) the increasing list of projects supporting GPUs natively, including Spark 3.0, TensorFlow, PyTorch, XGBoost, and Kubeflow.
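Even the basic question of "can my environment see a GPU at all?" is worth automating early, so the same script can fall back to a CPU path on laptops. A minimal, dependency-free sketch (it assumes only that NVIDIA's `nvidia-smi` CLI is installed on GPU machines, which the driver package provides) might look like:

```python
import shutil
import subprocess

def gpu_available() -> bool:
    """Best-effort check for an NVIDIA GPU by probing the nvidia-smi CLI.

    Returns False on CPU-only machines instead of raising, so the same
    script can choose a CPU code path gracefully.
    """
    if shutil.which("nvidia-smi") is None:  # CLI absent -> no driver installed
        return False
    try:
        # "nvidia-smi -L" lists visible GPUs; nonzero exit means none usable.
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True)
        return result.returncode == 0
    except OSError:
        return False

device = "gpu" if gpu_available() else "cpu"
print(f"Running on: {device}")
```

Frameworks like TensorFlow and PyTorch expose their own device-detection APIs; a shell-level probe like this is just a framework-agnostic first check.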
Okay, at this point you have the right data, tasks and skills for GPUs. But aren’t GPUs expensive? Let’s take a look at cost.
Cost
To begin with, yes, GPUs are more expensive than CPUs. That is expected for something that provides more cores, higher throughput, and higher degrees of parallelism and concurrency. The key is to understand the cost once you normalize for factors such as task performance and time. In addition, the cloud lets you try various types of GPUs and pay only for the time you use them. You may use CPUs 24/7 but GPUs for only a few hours a day. GPUs no longer require a capital investment in physical servers (or expensive research workstations).
- Types of GPUs: Depending on your data volume, model size, task, and system architecture, there are GPUs on the market to meet your needs across various price points. For training large computer vision or language models, NVIDIA's V100s and the new A100s give you the most horsepower but also cost the most. If you are looking to serve these models in production as part of your application, P4s and T4s will do the trick. If you want to leverage GPUs in Jupyter notebooks for various data and algorithm tasks, start with K80s and T4s, which are the lowest-cost options on most clouds today. Check out the full list of NVIDIA GPUs available on Google Cloud.
- CPU & RAM requirements: Even with GPUs, you need a server with CPUs and RAM. You are not fully replacing CPUs with GPUs, simply offloading a subset of the tasks CPUs would otherwise perform. The key is understanding how much you can reduce your CPU and RAM needs by adding GPUs to the server. This requires trial and error and some familiarity with measuring CPU/GPU utilization and RAM usage. In one example from NVIDIA, they were able to halve vCPU cores and RAM (64 vCPUs & 416 GB RAM → 32 vCPUs & 208 GB RAM) by adding 4 V100 GPUs to each node in the cluster.
- Price-Performance: The choices above, with some experimentation, let you build a normalized price-performance comparison per training experiment or per served prediction. In a training scenario, GPUs may cost $5 per minute vs. $1 for CPUs but run an experiment 10 times faster, resulting in a 2x cost reduction. It is good practice to be as specific as possible when defining the experiment used for comparison (i.e. the time, translated to cost, for a training job to reach 80% accuracy).
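The node-sizing trade-off above reduces to simple arithmetic once you have per-unit prices. The sketch below compares the original node with the GPU-augmented one from the NVIDIA example; every price is a hypothetical placeholder, not a real cloud rate:

```python
def node_cost_per_hour(vcpus, ram_gb, gpus=0,
                       vcpu_price=0.03, ram_price=0.004, gpu_price=2.50):
    """Hourly node cost from per-unit prices (all defaults are hypothetical)."""
    return vcpus * vcpu_price + ram_gb * ram_price + gpus * gpu_price

# NVIDIA example from the text: halve vCPUs/RAM by adding 4x V100 per node.
cpu_only  = node_cost_per_hour(vcpus=64, ram_gb=416)
with_gpus = node_cost_per_hour(vcpus=32, ram_gb=208, gpus=4)

print(f"CPU-only node: ${cpu_only:.2f}/hr")
print(f"GPU node:      ${with_gpus:.2f}/hr")
```

Note that with these placeholder prices the GPU node costs more per hour, which is exactly why the raw hourly rate is the wrong number to compare: you have to normalize by how much faster the work finishes.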
If GPUs are cheaper and faster per experiment (or per prediction) your business case to start using GPUs has written itself!
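The per-experiment arithmetic behind that business case is easy to make explicit. This sketch reuses the $5-vs-$1-per-minute figures from the text; the CPU baseline runtime is a hypothetical placeholder:

```python
def cost_per_experiment(price_per_minute, cpu_minutes, speedup=1.0):
    """Cost of one experiment, given a per-minute price and a speedup
    relative to the CPU baseline (speedup=1.0 means no acceleration)."""
    return price_per_minute * (cpu_minutes / speedup)

baseline_minutes = 100  # hypothetical CPU runtime for one experiment

cpu     = cost_per_experiment(1.0, baseline_minutes)        # $1/min on CPU
gpu_10x = cost_per_experiment(5.0, baseline_minutes, 10.0)  # $5/min, 10x faster
gpu_3x  = cost_per_experiment(5.0, baseline_minutes, 3.0)   # $5/min, only 3x faster

print(f"CPU:       ${cpu:.2f}")      # $100.00
print(f"GPU (10x): ${gpu_10x:.2f}")  # $50.00  -> 2x cheaper per experiment
print(f"GPU (3x):  ${gpu_3x:.2f}")   # $166.67 -> ~1.7x more expensive
```

At a 10x speedup the GPU halves the cost per experiment; at only 3x, it is ~1.7x more expensive per experiment, which is where the value question below takes over.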
But what if, in the prior example ($5 vs. $1 per minute), GPUs ran 3 times faster, not 10? Cost per experiment actually increases ~1.7x while performance is 3x faster. Is the 3x speed worth the 1.7x cost? That is up to you to decide, and it leads to our third, and often ignored, consideration: the value generated.
Value
Below are some ways to think about, and ideally quantify, the value driven by GPUs.
- Productivity Gains: How many more experiments can your practitioners run in a day, week, or month? How much are you paying practitioners to sit and wait for their algorithms and jobs to complete? If you have dedicated teams of machine learning researchers and data scientists, your tolerance for slow experiments should be much lower than that of a company with a few data analysts running weekly batch jobs. Every company sits somewhere on this spectrum; the key is to be aware of the opportunity cost of not being able to run many more experiments per day, month, or quarter.
- Accelerated Time to Market: The end goal of faster experiments is ultimately launching a product or feature faster. Letting your organization run experiments faster creates a compounding effect of trying new ideas, failing fast, and reaching breakthrough outcomes. This requires thinking about the broader system and capturing data you may not have today (i.e. avg. time from idea to launch, frequency of ML model refreshes, # of ML models in production).
- Improved End User Experience: How much faster can you serve predictions to end users in your application? What does an extra 100 ms of latency mean to your customers? You may have developed a more accurate model, but due to its larger size, your CPU-based serving infrastructure can only serve it at 500 ms of latency. The gains from improved predictive power may be offset by the added latency; various studies have shown that increased latency results in lost traffic and sales. GPUs let you deploy larger, more complex, and more accurate models without taking that latency hit.
- Direct Revenue Impact: Is there a direct impact on conversion, churn, or engagement? This is, of course, the holy grail, and the question every leader asks their team: what is the top-line impact? GPUs may cost more during the experimentation phase, but the improved models they produce can return much more value once activated in production.
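The productivity point lends itself to a back-of-the-envelope opportunity-cost calculation. Every number below is a hypothetical placeholder; plug in your own team's rates, experiment counts, and runtimes:

```python
def waiting_cost_per_month(practitioners, hourly_rate,
                           experiments_per_day, wait_hours_per_experiment,
                           workdays=21):
    """Rough monthly cost of practitioners idling while jobs run.

    Assumes waiting time is fully unproductive, so this is an upper bound;
    in practice people context-switch, which has its own hidden cost.
    """
    daily_wait_hours = experiments_per_day * wait_hours_per_experiment
    return practitioners * hourly_rate * daily_wait_hours * workdays

# Hypothetical team: 5 data scientists at $75/hr running 4 experiments/day.
slow = waiting_cost_per_month(5, 75, 4, wait_hours_per_experiment=2.0)
fast = waiting_cost_per_month(5, 75, 4, wait_hours_per_experiment=0.25)  # 8x speedup

print(f"CPU waiting cost: ${slow:,.0f}/month")
print(f"GPU waiting cost: ${fast:,.0f}/month")
```

Even as a crude upper bound, the gap between the two figures is the kind of number that turns a GPU line item into a productivity conversation.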
To summarize:
- The barrier to entry for learning and using GPUs continues to get lower. Your favorite data and ML tools may already support GPUs today.
- While you should always be skeptical of the GPU benchmarks you see online (50x faster!?!), with the cloud there is no reason not to try them yourself.
- Even if GPUs cost more up front, don't forget to evaluate the value they return, which may repay that cost many times over.
Use the framework discussed above to assess the cost-benefit of GPUs on your own data and experiments. You may be surprised with the results.
Have a question or want to chat? Find me on Twitter