Decision Optimization with GPU acceleration: In conversation with the NVIDIA cuOpt team

How can GPUs improve decision optimization workflows? In what ways will solving optimization problems change? How does this change the way technology leaders think about their AI strategies? We spoke with the NVIDIA cuOpt team to find out.

NVIDIA cuOpt is an open source, GPU-accelerated decision optimization engine. Nextmv co-founders Carolyn Mooney and Ryan O’Neil sat down with Burcin Bozkaya, NVIDIA Senior Developer Relations Manager, and Preethi Maulik, NVIDIA Product Manager, to explore what optimization could look like beyond CPU-based computing.

The following is a companion to a longer, related conversation and has been edited for length and clarity.

Carolyn Mooney: Let’s talk about the lay of the land regarding NVIDIA's investment and interest in decision optimization. Where did it start? What can you tell us about the past year? 

Preethi Maulik: It started about four years ago when there was a pandemic crunch and a lot of supply chain friction. We decided to take a sliver of the scope, vehicle routing problems specifically, and try that on the GPU. We parallelized local search, heuristics, and metaheuristics for the VRP, and that was an immensely successful program. Our partners and customers widely endorse it, and it also holds 23 world records. With that success behind us, we decided to expand the scope for optimization.
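For readers who want a feel for what gets parallelized here: local search heuristics such as 2-opt evaluate many small route modifications and keep the ones that shorten the tour, and it is this evaluate-many-moves structure that maps well onto thousands of GPU threads. Below is a minimal, generic CPU sketch of a 2-opt pass for a single route; it illustrates the move type only and is not cuOpt's implementation.

```python
# Minimal 2-opt pass for a single route (depot at both ends): reverse a
# segment whenever doing so shortens the tour. Generic illustration only;
# cuOpt evaluates enormous numbers of such moves in parallel on the GPU.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def two_opt(route, points):
    improved = True
    while improved:
        improved = False
        for i in range(1, len(route) - 2):
            for j in range(i + 1, len(route) - 1):
                # Length change if segment route[i:j+1] is reversed.
                delta = (dist(points[route[i - 1]], points[route[j]])
                         + dist(points[route[i]], points[route[j + 1]])
                         - dist(points[route[i - 1]], points[route[i]])
                         - dist(points[route[j]], points[route[j + 1]]))
                if delta < -1e-9:
                    route[i:j + 1] = route[i:j + 1][::-1]
                    improved = True
    return route

points = [(0, 0), (0, 2), (2, 2), (2, 0), (1, 1)]
print(two_opt([0, 2, 1, 4, 3, 0], points))  # prints an improved visit order
```

On a GPU, the improvement deltas for all candidate (i, j) pairs can be computed in one parallel sweep instead of nested loops, which is the kind of restructuring behind the speedups described here.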

We moved on to linear programming. We launched PDLP on the GPU, which can solve very large LP relaxations for lower-precision solutions. We also launched the GPU-accelerated IPM, an interior point method for higher-precision solutions. Then we moved on to MIP. We have GPU-accelerated primal heuristics for MIP, where the goal is to find super fast, feasible solutions rather than proven optimality. It works very well for customers and use cases that want to integrate with LLMs and run what-ifs. We launched our very first quadratic solver as well. It’s still early, but it’s good for small-scale programs, and it shows how we expanded our scope from VRP to LP to MIP, and now QP.
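For context on why PDLP in particular benefits from GPUs: it is built on the primal-dual hybrid gradient (PDHG) method, whose core loop is little more than matrix-vector products and projections. Here is a minimal NumPy sketch of that iteration for an LP of the form min c@x subject to A@x >= b, x >= 0; it shows the algorithm family only, not cuOpt's implementation, which adds restarts, preconditioning, and adaptive step sizes.

```python
# Minimal primal-dual hybrid gradient (PDHG) iteration for an LP:
#   minimize c@x  subject to  A@x >= b,  x >= 0.
# A sketch of the algorithm family behind PDLP, not cuOpt's implementation.
import numpy as np

def pdhg_lp(c, A, b, iters=20000):
    m, n = A.shape
    eta = 0.9 / np.linalg.norm(A, 2)  # step size; needs tau*sigma*||A||^2 < 1
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # Primal gradient step on c - A^T y, projected onto x >= 0.
        x_new = np.maximum(x - eta * (c - A.T @ y), 0.0)
        # Dual ascent step with extrapolated primal iterate, projected onto y >= 0.
        y = np.maximum(y + eta * (b - A @ (2 * x_new - x)), 0.0)
        x = x_new
    return x, y

# max x1 + 2*x2  s.t.  x1 + x2 <= 4,  x2 <= 3  (optimum x = (1, 3)).
c = np.array([-1.0, -2.0])                 # minimize the negated objective
A = np.array([[-1.0, -1.0], [0.0, -1.0]])  # "<=" rows negated into ">=" form
b = np.array([-4.0, -3.0])
x, _ = pdhg_lp(c, A, b)
print(x.round(3))  # ~ [1. 3.]
```

Because each iteration is dominated by the A @ x and A.T @ y products, the work maps directly onto GPU hardware, unlike simplex-style methods that pivot through sparse factorizations.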

Earlier this year, with guidance from NVIDIA CEO Jensen Huang, we decided to open source cuOpt. We had launched several early access programs, and the demand was beyond what we could actually manage, so we decided to open source it. Our primary goal is to create a vibrant ecosystem for GPU-accelerated optimization: we want to broaden access to high-performance decision making for all enterprises.

Carolyn: That's needed in our space too. Traditionally, there hasn't been as much of an open source community around decision optimization. You are also collaborating with a lot of the known players in this space: Gurobi, FICO Xpress, SimpleRose, HiGHS, etc.

How are you thinking about ways that NVIDIA's involvement can improve the modeling and solver experience people use today, and accelerate it not only with cuOpt, but with GPUs in general?

Burcin Bozkaya: It's always been our philosophy to work with ecosystem partners and collaborators to implement this GPU technology in many diverse ways. GPU acceleration in decision optimization is an area where the ecosystem was not yet where we wanted it to be, so we work with the partners you mentioned, and many more. We want to give them the technology, give them technical support, and give them open source cuOpt so there are good examples of how to implement decision optimization solvers using the CUDA language, primitives, etc.

NVIDIA’s role is to help the ecosystem innovate more and come up with new ways of solving problems. In the past 12 months, we’ve seen partners and the optimization community developing all-new algorithms, then implementing them and adding them to their software stacks. This adds to the notion that GPU acceleration helps solve complex problems better and faster. We now have many different ways to explore alternative solutions right in the solution exploration process.

In that sense, it helps model builders and problem solvers evaluate millions, even trillions, of solutions in short amounts of time. Platforms like Nextmv, and others, facilitate this entire process. With the innovation that will come from NVIDIA as well as our optimization partners, and the compute that is available these days, there's a lot of opportunity to solve better and faster.

Carolyn: Real power comes from acting on that information. It's not just about predictions, it's about making decisions. At this stage, how should business and technology leaders think about investing in decision optimization, given that it's part of their AI portfolio and given much better access to GPUs?

Burcin: A lot of businesses and organizations are already well versed in using GPUs for machine learning tasks, AI, GenAI, and large language models. They stay on the predictive side of things, like you said. The real actions will come from the prescriptive solutions produced by solving optimization models. It's fair to say that we imagine a world where these two things come together, especially with GenAI and LLMs working in conjunction with optimization models as agents and tools. Decision makers, technology strategists, and organizational leaders will find it very worthwhile to tap into their AI portfolios and look for ways to combine them with optimization.

Preethi: We've seen customers actually use this loop, the combination of prediction and prescription. For example, they predict the travel times and then cuOpt will optimize the routes. They predict the load forecast and cuOpt will dispatch power optimally. We’ve also seen them predict retail demand and then cuOpt will optimize the fulfillment processes. It grounds what ML can give you in prescriptive action.
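A schematic of that predict-then-prescribe loop, with trivial stand-ins: the helper functions below are hypothetical, and in practice an ML model supplies the forecast while a solver such as cuOpt produces the plan.

```python
# Schematic predict-then-prescribe loop. Both helpers are hypothetical
# stand-ins: in practice an ML model forecasts travel times and a solver
# like cuOpt turns the forecast into routes.
import random

def predict_travel_times(n):
    # Stand-in for an ML forecast: a random symmetric travel-time matrix.
    t = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            t[i][j] = t[j][i] = random.uniform(5, 30)
    return t

def solve_routes(times):
    # Stand-in for a VRP solve: a greedy nearest-neighbor tour from the depot.
    route, unvisited = [0], set(range(1, len(times)))
    while unvisited:
        nxt = min(unvisited, key=lambda j: times[route[-1]][j])
        route.append(nxt)
        unvisited.remove(nxt)
    return route + [0]

# One cycle of the loop: predict, prescribe, act; rerun as conditions change.
plan = solve_routes(predict_travel_times(6))
print("dispatching route:", plan)
```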

Carolyn: We've seen that as well. We call them decision workflows. They tie together multiple technologies and steps in a process that may have been really manual before. Previously, they may have had these models running independently, but tying them into a more automated system can be super powerful. Your investment in cuOpt as open source expands awareness of the technology and exposure to decision optimization in general.

Ryan O'Neil: Two things stand out to me: First, what was available in open source for optimization was not nearly as high quality as it is today. There's been massive acceleration over the last few years with cuOpt coming out, and HiGHS and SCIP, among others. Second, the conventional wisdom was that you didn't do optimization on GPUs. Open sourcing cuOpt has accelerated both domains at the same time. How did you learn from other open source projects that were going on at the time? 

Burcin: NVIDIA has open sourced a lot of different libraries before in ML, CUDA-X, etc. There's a lot of interest in the cuOpt open source repo in terms of forks, clones, and use cases. At the same time, cuOpt is source code that requires GPUs, so we would like folks to take it, build on it, customize it, and add new features. It lives and thrives if there's an active community contributing and a well-established entity to support that. cuOpt is now also part of the COIN-OR repository. We expect the community to interact, contribute, innovate, and take advantage of the source code.

Carolyn: Ryan, what are the pieces where Nextmv can change or enhance the building, testing, and deploying of these apps? What does that look like on Nextmv and other platforms, as well as at the service layer?

Ryan: We have a situation where we're typically developing models in a modeling tool, and we need infrastructure and orchestration around them. There's a shift in that dynamic happening now. Building optimization moved from supercomputers to commodity hardware: you target the CPU in your laptop and just solve models there.

One thing that has happened from the deep learning and GenAI boom is that we've learned there's an appetite for high-powered compute that you just have access to. We're moving that into other domains, and we're now building optimization models for your supercomputer chip. I can have a supercomputer chip in my desktop or in my server rack, but since it's not just the commodity CPU that comes with everybody's machine, there's some non-trivial complexity in building the model and putting it into a production environment. Every optimization model benefits from the application architecture that goes around it, just like any microservice or application you put into a system like this. You have model management, observability, configuration, versioning, etc. The ability to get GPU acceleration instantaneously is also a big win.

Carolyn: One of the most interesting aspects of decision optimization is when you have five or six different models at any given time, all built with cuOpt. They could all have different configurations or different objective terms, and they could all be valid. A large-scale supply chain company could have different configurations or objectives for different geos because it's running campaigns that prioritize different things in its business.
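A small sketch of what this can look like in code, with hypothetical per-geo configurations; the names, weights, and fields are illustrative only, not any particular API.

```python
# Hypothetical per-geo solver configurations: the same model family with
# different objective weights and time budgets per region. All names and
# values here are illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class SolveConfig:
    version: str
    time_limit_s: float
    cost_weight: float      # weight on transportation cost
    lateness_weight: float  # weight on late deliveries

CONFIGS = {
    "us-east": SolveConfig("v12", time_limit_s=30, cost_weight=1.0, lateness_weight=5.0),
    "eu-west": SolveConfig("v12", time_limit_s=30, cost_weight=2.0, lateness_weight=1.0),
    "apac":    SolveConfig("v11", time_limit_s=60, cost_weight=1.0, lateness_weight=1.0),
}

def objective(cost, lateness, cfg):
    # Same objective structure everywhere; the weights encode regional priorities.
    return cfg.cost_weight * cost + cfg.lateness_weight * lateness

cfg = CONFIGS["eu-west"]
print(cfg.version, objective(cost=100.0, lateness=12.0, cfg=cfg))
```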

Ryan: A lot of the people that we talk to are already using linear programming and other optimization technologies, and they're applying them on CPU architecture. As they adopt GPUs, what are the biggest gains that they should expect to see? What does this unlock for them that they can't do today? 

Preethi: I'll start with the VRP solver. We believe cuOpt is undoubtedly the best VRP solver out there, and it's what we provide to enterprises. What we've seen working with enterprises is tremendous speedup. We benchmark VRP at about 100x compared to a CPU solver. You get massive acceleration, massive scale, and a high-quality solution. You have a rich set of constraints like pickup, delivery, breaks, time windows, etc. All of this enables a business to make decisions dynamically. Optimization runs can last several minutes to hours, but now you can make a decision at the speed of business. One of our partners reduced their batch time from 45 minutes to under two minutes.

You can also enable dynamic re-optimization. If your fleet or your orders change, you can just run it again and get an updated plan. We've seen enormous enterprise benefits from the VRP solver, even though MIP is the most popular one.

Burcin: We've seen similar acceleration benefits with our LP and MIP solvers as well, and we try to go with some of the common benchmarks out there. For example, in the world of LP, we always test against the Mittelmann benchmark problems. These are a selected set of problems, small and large. With PDLP, we've seen speedups of 1,000x or even 3,000x on some problems, though on average it's typically lower than that. PDLP is very well suited to GPU acceleration, by the nature of the algorithm. We also recently added a barrier solver, an interior point method on GPU, and compared it to some open source CPU solvers out there, as well as some of the commercial barrier solvers. It showed speedups of up to 17x, and on average probably around eight or nine times faster with the barrier solver.

The nice thing about cuOpt is that it actually runs all these solvers concurrently. It races the LP solvers, as well as different components of the MIP solver, so you get whatever works best for the type of problem you're dealing with. On the MIP side of things, we're still working on adding features and some key components of the actual branch-and-cut procedure. We're writing up a technical blog to share some of those results.
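For readers curious what racing solvers means in practice, here is a generic concurrency sketch: launch several strategies on the same problem and take the first to finish. It illustrates the pattern only; cuOpt's internal racing is its own implementation.

```python
# Generic solver racing: run several strategies concurrently on the same
# problem and return the first to finish. Pattern illustration only.
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
import time

def strategy_a(problem):
    time.sleep(0.2)  # stand-in for, say, a first-order method
    return ("a", problem * 2)

def strategy_b(problem):
    time.sleep(0.1)  # stand-in for, say, a barrier method
    return ("b", problem * 2)

def race(problem, strategies):
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        futures = [pool.submit(s, problem) for s in strategies]
        done, not_done = wait(futures, return_when=FIRST_COMPLETED)
        for f in not_done:
            f.cancel()  # best effort; a real racer signals losers to stop
        return next(iter(done)).result()

print(race(21, [strategy_a, strategy_b]))  # ('b', 42)
```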

Carolyn: Can you highlight some of the algorithmic pieces that go into the solver? Outside of VRP, what types of problems are people addressing with the LP and MIP side of this?

Preethi: Typically, most of the problems are resource planning, cost optimization, and production planning. We've also seen a lot of interest in scheduling and facility location. These are common use cases with LP and MIP. For VRP, the domain is super diverse, from retail to manufacturing to waste collection to warehouse robotics, integration with Omniverse, integration with LLMs. Routing cuts across everywhere. We're also starting to see renewed interest from aviation.

Carolyn: You touched on robotics. NVIDIA has simulation tooling as well with Isaac. My background is mostly in simulation, so I always love to see new applications and how they’re being utilized across the ecosystem. 

Burcin: Energy is another one where there's a lot of scenario analysis, pricing, unit commitment, and different versions of the model being solved over and over. The acceleration definitely helps a lot in that sense.

Ryan: In the future, we'll be using GPUs to optimize the energy creation and consumption for GPU data centers as well. It’s very meta. 

You mentioned what’s getting unlocked by shorter solve times. Carolyn and I both came from more classical environments where you'd optimize for hours or even days, but we moved into an environment where we were optimizing every 25 seconds. The only thing that's the same is the math; literally everything else is different. The solver is different, the approach is different, and the way the business operates is completely different.

Preethi: Over the course of scoping cuOpt, we found that most folks did not have a need for proven optimality. That's how traditional optimization has always worked: very exacting, very scientific. There are some use cases that are specifically built for optimality, but most are comfortable with a very good, feasible solution because they do not want to trade off speed. That's a finding we learned from the ecosystem, and we built on top of it.

Carolyn: Where should community members go to get hands-on? You've created this open source ecosystem, sample apps, tutorials, benchmarks, ways to contribute, etc. What are you all looking for from the community as we move forward?

Burcin: We have a lot of resources out there. We have two repos: One for the actual cuOpt source code and another with an extensive set of examples for using cuOpt in various scenarios and use cases. We have an extensive documentation page. We’re always looking for contributions from the community as well, so we have a discussion forum on our main repo page. We invite the community to challenge us, our engineers, and developers with algorithmic questions and feature requests. We want to make this a very active collaboration with the entire community so that we can keep innovating.

Check out the tech talk recording for the full interview to hear how NVIDIA GPU-accelerated decision optimization has elevated the conversation around decision intelligence.
