Transformation prompts new working partnerships alongside technical advances. Jannik Podlesny, Vista DnA’s Principal for Data Governance, Architecture and Technology, and Michał Zasadziński, Data Platform Engineering Lead for parent company Cimpress, are exploring how GPU technology can accelerate advanced analytics models and processing algorithms.
The power of parallel
A paradigm shift is underway. The potential of graphics processing units (GPUs) to accelerate computing in analytical use cases recalls another big moment – the impact of large-scale in-memory applications a decade ago. In-memory processing is a concept most of our current database technology relies on, SAP HANA, Amazon Redshift, Redis and Snowflake included.
GPUs are already powering success of all kinds. A widely cited example comes from genome sequencing, which took a record-breaking five hours and two minutes on 16 March 2021, cutting turnaround from around two weeks. Not every task can be reimagined for GPU chips: the workload has to lend itself to massive parallelization. And the scope and impact of GPU capacity will vary by industry, as a McKinsey forecast notes. But the opportunity to speed up data science 10-50x, curb the cost of ever-expanding data processing, and improve efficiency stands tall.
A unique viewpoint
For an expanding, technology-led e-commerce and manufacturing business like Vista, and the wider company, Cimpress, unlocking value through GPUs represents an exciting, necessary challenge. Both entities focus on leveraging mass customization via cutting-edge data and analytics – with the DnA team at Vista launched in 2020 to enable just this. Both organizations have embraced data mesh architecture.
So, while we won’t be the only teams examining the GPU trend, we may be charting new waters by scoping GPU capability at this scale, with a decentralized data mesh at the core of our stack. We have 300 DnA colleagues processing several petabytes of data across domains. Alongside a data-driven culture, putting technology to work on now-and-next business questions makes us tick. It’s a journey we’ll be sharing as our learnings unfold.
The search for applications
There’s a key thought to bear in mind when it comes to use cases. A lot of analytical queries can be accelerated through GPU technology, yes. But is it worth it?
Use cases that involve underlying combinatorial maths problems can be rewarding to assess. Right now, we’re reviewing archetypes of queries against performance statistics from the last 90 days. Around 25% of the 23.7 million queries (roughly 5.9 million) are ‘group and count’ operations. These fit the MapReduce pattern, in which partial counts are computed independently and then merged, which makes them ideal candidates for GPU acceleration.
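To make the ‘group and count’ shape concrete, here is a minimal sketch in plain pandas. It is not our production code: the `orders` data, the `product` column and the helper names are hypothetical. It only illustrates why the pattern parallelizes well. Each chunk is counted on its own (the map step), and the partial counts are then added together (the reduce step).

```python
from collections import Counter

import pandas as pd


def partial_counts(chunk: pd.DataFrame, key: str) -> Counter:
    """Map step: count rows per key inside one independent chunk."""
    return Counter(chunk[key].value_counts().to_dict())


def group_and_count(df: pd.DataFrame, key: str, n_chunks: int = 4) -> dict:
    """Reduce step: merge the partial counts produced for every chunk."""
    chunk_size = max(1, -(-len(df) // n_chunks))  # ceiling division
    total = Counter()
    for start in range(0, len(df), chunk_size):
        chunk = df.iloc[start:start + chunk_size]
        total.update(partial_counts(chunk, key))  # Counter.update adds counts
    return {k: int(v) for k, v in total.items()}


# Hypothetical example data, for illustration only.
orders = pd.DataFrame(
    {
        "product": ["card", "flyer", "card", "mug", "flyer", "card"],
        "quantity": [100, 50, 250, 1, 75, 500],
    }
)

counts = group_and_count(orders, "product", n_chunks=3)
print(counts)
```

Because no chunk needs to see any other chunk until the final merge, the map step can run on as many parallel workers as the hardware offers, which is the property a GPU exploits.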
The ‘effort gap’ matters too. It’s one thing to calculate how much money you can save, but if you need to educate hundreds of data engineers over a year to implement new technology, the value may not add up. So, there’s a balance to strike between time spent on training and efficiency gained. GPU frameworks like cuDF, which is built on CUDA and mirrors the pandas API, narrow that gap: engineers who already know pandas can lean on familiar methods and functions to leverage GPU advantages.
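As a rough illustration of how small that gap can be, the sketch below runs the same group-and-count on cuDF when it is available and falls back to pandas otherwise. The `queries_per_domain` helper, the `domain` column and the sample records are hypothetical, and we assume here that only calls shared by both libraries (`DataFrame`, `groupby`, `size`) are used.

```python
try:
    # cuDF is the RAPIDS GPU DataFrame library; it needs an NVIDIA GPU.
    import cudf as xdf
    BACKEND = "cudf"
except Exception:
    # No GPU stack available: the same code runs on pandas instead.
    import pandas as xdf
    BACKEND = "pandas"


def queries_per_domain(records: dict):
    """Count rows per domain using whichever backend was imported."""
    frame = xdf.DataFrame(records)
    counts = frame.groupby("domain").size()
    # cuDF results live on the GPU; copy them back to the host if needed.
    if hasattr(counts, "to_pandas"):
        counts = counts.to_pandas()
    return counts.sort_index()


# Hypothetical query log, for illustration only.
query_log = {
    "domain": ["pricing", "marketing", "pricing", "supply", "pricing"],
    "runtime_s": [1.2, 0.4, 3.1, 0.9, 2.2],
}

result = queries_per_domain(query_log)
print(BACKEND, result.to_dict())
```

The only backend-specific lines are the import and the copy back to the host, which is the point: the analysis code itself does not change.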
GPU acceleration in action
- CUDA-based GPU (CuPy) versus Numerical Python (NumPy)
- CPU versus GPU – the trade-off between small and large data frames
- GPU-accelerated quasi-identifier discovery in high-dimensional data
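For readers who want to try the first two comparisons themselves, here is a minimal sketch, not the benchmark behind the items above. It runs the same function on NumPy and, if a usable GPU is present, on CuPy. The function names and array sizes are our own choices for illustration, and timings will vary by machine.

```python
import time

import numpy as np

try:
    import cupy as cp
    cp.arange(1).sum()  # forces a device call; fails without a usable GPU
    HAS_GPU = True
except Exception:
    cp = None
    HAS_GPU = False


def sum_of_squares(xp, n: int) -> float:
    """Identical code for both libraries: xp is either numpy or cupy."""
    x = xp.arange(n, dtype=xp.float64)
    return float((x * x).sum())  # float() copies a GPU scalar to the host


def timed(xp, n: int):
    """Return (result, elapsed seconds) for one run on the given library."""
    start = time.perf_counter()
    result = sum_of_squares(xp, n)
    if xp is not np:
        cp.cuda.Stream.null.synchronize()  # wait for pending GPU work
    return result, time.perf_counter() - start


for n in (1_000, 1_000_000):
    _, cpu_seconds = timed(np, n)
    line = f"n={n:>9,}  numpy: {cpu_seconds:.6f}s"
    if HAS_GPU:
        timed(cp, n)  # warm-up run so kernel compilation is not timed
        _, gpu_seconds = timed(cp, n)
        line += f"  cupy: {gpu_seconds:.6f}s"
    print(line)
```

On small inputs, the fixed cost of launching GPU kernels and moving data tends to outweigh any gain, so the CPU often wins; the GPU usually pulls ahead only as the arrays grow.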
Realizing GPU acceleration
If parallel processing is a fit, GPU technology has so much scope for improving advanced analytics models and processing algorithms that it’s hard to see where the limits lie. So, you – and we – need to get strategic about where the speed gains and cost savings will be.
Each of our domains offers a deep pool of options. The list is almost endless: A/B testing and dynamic pricing scenarios, analyzing Vista customer interest in real time, faster and more reliable ways of finding quasi-identifiers or personally identifiable information (PII), enhanced real-time call center analysis, e-commerce steering, and warehouse production insights. We’ll serve and circle back to the needs of both internal and external customers as we experiment. They’re our road map.
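To show why a task like quasi-identifier discovery is combinatorial, here is a simplified CPU sketch in pandas. It is not the GPU-accelerated method itself. It assumes one common working definition: a set of columns is a quasi-identifier if most rows are unique on it. The `people` data, the function names and the 0.9 threshold are hypothetical.

```python
from itertools import combinations

import pandas as pd


def uniqueness(df: pd.DataFrame, columns) -> float:
    """Fraction of rows whose values in `columns` occur exactly once."""
    if len(df) == 0:
        return 0.0
    group_sizes = df.groupby(list(columns)).size()  # a 'group and count'
    return float((group_sizes == 1).sum()) / len(df)


def find_quasi_identifiers(df: pd.DataFrame, max_size: int = 2,
                           threshold: float = 0.9):
    """Test every column combination up to max_size columns."""
    found = []
    for size in range(1, max_size + 1):
        for columns in combinations(df.columns, size):
            score = uniqueness(df, columns)
            if score >= threshold:
                found.append((columns, score))
    return found


# Hypothetical records, for illustration only.
people = pd.DataFrame(
    {
        "zip": ["02451", "02451", "10001", "10001"],
        "birth_year": [1980, 1990, 1980, 1990],
        "gender": ["f", "f", "m", "m"],
    }
)

for columns, score in find_quasi_identifiers(people):
    print(columns, score)
```

The number of column combinations grows quickly with the width of the table, and each one is an independent group-and-count, which is why this kind of search is a natural candidate for parallel hardware.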
Are you leveraging GPUs? Connect with us on LinkedIn to share your progress. Or does DnA sound like a good fit for you? Check out Vista careers.