Join Chirag Lalwani, Senior Data Scientist at Vista DnA, to explore how he and his team changed the way customers discover new, relevant designs on the Vista website.
The biggest validation of our designs is when a customer intends to use them. However, there is always a dilemma: do customers choose the design that best suits their needs, or merely the best of the first options presented? In other words, did customers find what they were looking for?
After all, top-ranked designs are known to get more visibility and clicks irrespective of their relevance to the customer. This meant that other, more relevant designs rarely surfaced onto the top pages.
As a part of a global mass customization platform, Vista data scientists took it upon themselves to help the customers access the designs that most closely represent what they had in mind.
In that light, we have been migrating our website to a new platform to make the customer journey more efficient. As part of this change, the idea was to build a better solution for customers looking to try out pre-designed templates using Vista’s editor.
Instead of statically ranking the top-rated and most-used templates, the new approach ranked designs dynamically to help customers find relevant templates in less time.
While designing this product, we also had to ensure that shifts in ranking don’t negatively impact the customer experience over time.
Here’s the entire story of how we came up with a solution.
Where We Started: The Challenges
A standard customer journey on the new Vista website starts with a customer searching for the desired product and landing on the product design gallery page. There, the customer can view various design options, edit the selected design in the studio, and finally review the design before placing an order.
Even with the improved website design, we found that there was substantial scope for improving the findability of designs on the gallery page by capturing the customer’s needs and showing the right designs.
The challenge with this approach was the legacy data, which was biased toward the designs that had been ranking higher, leaving little room for fresh designs to appear in the top search results. Thus, we had to get over the “cold start” problem: exposing new products to customers with no historical interactions to learn from.
The data raised many structuring questions: should the model use only user properties, only product properties, or a combination of both? Should it have a time dependency? Is the atomicity of the data at the product level, the search level, or the transaction level? Along which dimensions are we deriving the solution?
To generate these insights, we assembled a cross-functional team of:
- Product Owner: bringing value-driven knowledge and connecting stakeholders to the product
- Data Engineer: building a robust infrastructure and pipelines to enable the solution
- Data Analyst: catering to the analysis of different metrics, root causes, and reporting
- Data Scientists: formulating & building the scalable ranking solution
The team brainstormed the approach toward the ideal solution and defined the key questions:
- How is ranking going to handle new designs?
- How do we handle new customers?
- Is ranking going to change based on user action?
- Is ranking directly going to target online metrics along with offline measurements?
Rather than jumping straight into a machine learning model, as is the general trend, we did an in-depth analysis of historical data. This analysis showed that the current ranking influences customer decisions to a significant extent, leaving room for exploration.
Steps We Took to Build a Ranking Solution
To remove the position bias, we implemented a randomization strategy where we bumped up a few of the less-exposed designs. This not only increased the exposure of those designs to users but also helped the new ranking function gradually remove the “cold start” from the live design ranking.
The intuition behind this approach is that it does not disrupt any existing ranking strategy. Instead, it randomly gives users enough designs to try from less-exposed regions. These regions eventually become a feedback loop for our ranking model.
The two parts of this idea are:
- Moving less-exposed designs:
  - How will the designs be picked?
  - How will the designs be placed?
- Moving highly exposed designs
We built a probabilistic approach, accompanied by analytics, to pick strategic designs from less-exposed regions and place them in more-exposed regions for designated moving intervals. At the same time, the movement of highly exposed designs is handled by our new ranking function, which penalizes and boosts designs using some proprietary factors.
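As an illustration of the first part, here is a minimal sketch of how less-exposed designs could be promoted probabilistically. The function name, the inverse-exposure weighting, and all parameters are our own illustrative assumptions, not Vista’s proprietary logic:

```python
import random

def bump_less_exposed(ranked_designs, exposure_counts, n_bumps=3, top_k=10, seed=None):
    """Illustrative sketch: promote a few under-exposed designs into the
    top results to reduce position bias.

    ranked_designs: list of design ids, best first.
    exposure_counts: dict mapping design id -> historical impression count.
    """
    rng = random.Random(seed)
    head, tail = ranked_designs[:top_k], ranked_designs[top_k:]
    if not tail:
        return ranked_designs

    # Sample from the tail with probability inversely proportional to
    # exposure, so rarely seen designs are more likely to be promoted.
    pool = [(d, 1.0 / (1 + exposure_counts.get(d, 0))) for d in tail]
    picks = []
    for _ in range(min(n_bumps, len(pool))):
        total = sum(w for _, w in pool)
        r = rng.uniform(0, total)
        acc = 0.0
        for i, (d, w) in enumerate(pool):
            acc += w
            if r <= acc:
                picks.append(d)
                pool.pop(i)
                break

    # Place each promoted design at a random position within the top results.
    new_head = list(head)
    for d in picks:
        new_head.insert(rng.randrange(len(new_head) + 1), d)
    remaining = [d for d in tail if d not in picks]
    return new_head + remaining
```

The key property is that the original ordering is preserved apart from the few promoted designs, so the disruption to the existing ranking stays bounded.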
Ignition for Ranking Solution
The very base of this solution was having the historical gallery ranking data and the customer feedback in the right dimensions. Data engineers at Vista DnA built a robust pipeline to pull regularly updated data using dbt, Databricks, and Snowflake, scheduled in Airflow, managed through Terraform, and integrating normalized data from multiple upstream gallery modules.
There was no pre-existing dataset for ranking, so we built the raw data collectively. This data comprises design-level properties, including the vicinity of designs, and time-dependent customer-wide engagement details like clicks, studio edits, and final orders.
Approaching the Options to Optimize Design Ranking
Ranking could be approached in two ways:
- Optimizing online metrics: the ranking strategy and model optimize directly for business and end metrics like conversion rate, click-through rate, etc. These are also called business metrics.
- Optimizing offline metrics: the ranking strategy and model optimize for measures of model performance such as proximities, rating, diversity, etc.
While offline metrics are generally seen to correlate with online business metrics, this has certainly not been proven in every environment and is thus always a debatable topic. Some even claim that narrowing improvement to offline metrics can hurt the customer experience, derailing business metrics in the short term.
In our case, the challenge was to combine and simulate offline metrics in such a way as to incorporate the online business metrics into them, thus building offline metrics that indirectly predict the business metrics.
What and How are We Targeting?
This is the most critical part of the model, involving many decisions. In the absence of a final label to target, we formalized one. Of course, one could move toward an unsupervised approach as well, but ranking without supervision doesn’t add up.
For offline metrics, there are two parts –
- Ground Truth or Relevancy of data for supervised evaluation
- Mechanism to compare the ranking strategy with ground truth
In ranking, this is generally called the “relevancy” of a design to a customer. We intuitively hypothesized the following components of relevancy:
- Engagement: How well the customer engages with design through clicks, edit operations, viewing it, purchasing it, etc.
- Findability: Metrics that indicate how easily the customer progresses through the interaction journey.
- Exposure to new designs
Our approach to defining the target, a proxy for actual relevancy based on the above metrics, used the following logic:
- The higher the customers’ engagement with a design, the better its rank.
- The lower the findability, the more the rank is penalized; new designs get a boost by some factor X that is slowly reduced as the design ages.
This proxy for relevance acts as our ground truth and becomes our target variable.
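The logic above can be sketched as a scoring function. All weights, the findability penalty, and the exponential age decay below are hypothetical stand-ins; the actual factors are proprietary:

```python
import math

# Hypothetical weights and decay rate -- not the production values.
W_CLICK, W_EDIT, W_ORDER = 1.0, 3.0, 10.0
NEW_DESIGN_BOOST, DECAY_RATE = 5.0, 0.1

def relevance_proxy(clicks, edits, orders, findability, age_days,
                    penalty_weight=2.0):
    """Sketch of the target label: engagement raises the score, poor
    findability penalizes it, and new designs get a boost that decays
    as they age."""
    # Weighted engagement: deeper funnel actions (edits, orders) count more.
    engagement = W_CLICK * clicks + W_EDIT * edits + W_ORDER * orders
    # Findability in [0, 1]; lower findability means a larger penalty.
    findability_penalty = penalty_weight * (1.0 - findability)
    # Novelty boost that decays exponentially with the design's age.
    novelty_boost = NEW_DESIGN_BOOST * math.exp(-DECAY_RATE * age_days)
    return engagement - findability_penalty + novelty_boost
```

With this shape, a brand-new design with no engagement still receives a positive score from the novelty boost, which is exactly what lets it escape the cold start.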
We used a few standard mechanisms on top of some basic analytics, such as:
- Data parsing
- Feature selection
- Outlier detection and removal
- Handling missing data
We also used feature scaling, and studied the skewness and sparsity in the dataset, including the relationships between outliers and missing values. On top of that, we applied the following mechanisms to cater to more subjective requirements:
- Categorical design and user properties are treated as nominal categories and thus formulated into one-hot encoded vectors.
- Other properties are also categorical but can be indexed and intuitively understood as ordinal. For example, color intensity, with values ‘low,’ ‘normal,’ and ‘high,’ is a category that prominently defines an order of intensity.
- Continuous variables are scaled to a defined range.
- Then there are properties that are free text, like design descriptions. We leveraged a pre-trained BERT model to extract embeddings and reduced their dimensionality to treat them as scaled continuous variables, intuitively representing the semantics and context of free-text design properties.
- In ranking and recommendation algorithms, data is organically skewed, as expected from consumer behavior, and that skewness grows when the ranking is prone to selection bias. Our continuous target variable, the relevancy score, thus has a high right skew. We applied transformation methods to balance it and attempted to minimize the effect by passing the kernel density of the data points as weights in our model.
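The encoding choices above can be sketched as follows. The property names, vocabularies, and value ranges are invented for illustration (only the ‘low’/‘normal’/‘high’ intensity values come from the text), and `log1p` stands in for whichever skew-balancing transformation was actually used:

```python
import math

# Ordinal category: the integer index preserves the order of intensity.
INTENSITY_ORDER = {"low": 0, "normal": 1, "high": 2}
# Nominal category (hypothetical vocabulary): order carries no meaning,
# so it gets one-hot encoded instead.
STYLE_VALUES = ["modern", "classic", "playful"]

def one_hot(value, vocabulary):
    return [1.0 if value == v else 0.0 for v in vocabulary]

def min_max_scale(x, lo, hi):
    """Scale a continuous value into [0, 1]."""
    return (x - lo) / (hi - lo) if hi > lo else 0.0

def encode_design(style, intensity, price, price_lo=0.0, price_hi=100.0):
    """Sketch of a design feature vector: one-hot nominal category,
    integer ordinal category, scaled continuous value."""
    return (one_hot(style, STYLE_VALUES)
            + [float(INTENSITY_ORDER[intensity])]
            + [min_max_scale(price, price_lo, price_hi)])

def transform_skewed_target(score):
    """log1p compresses a right-skewed, non-negative relevance score."""
    return math.log1p(score)
```

A free-text description would additionally contribute a reduced-dimension BERT embedding, concatenated onto the same vector as extra scaled continuous features.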
Ranking Algorithm and Evaluation
All things considered, how does one estimate whether a ranking is good or bad, given that ranking is always relative and subjective? How do we measure the effectiveness of a reordered rank?
After several versions of the algorithm that teaches our search to rank designs, we built a neural net with the TensorFlow Ranking framework, optimizing how closely the model approaches the target relevancy score, i.e., the design relevancy that decides the rank.
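To make the idea concrete without the framework, here is a deliberately toy stand-in: a linear model fit by stochastic gradient descent to predict the relevance proxy, with designs then sorted by predicted score. The production model is a neural net built with TensorFlow Ranking; everything below is an illustrative simplification:

```python
def train_pointwise_ranker(features, targets, lr=0.01, epochs=500):
    """Fit a linear model to predict relevance scores via SGD on the
    squared error. Toy pointwise stand-in for the neural ranker."""
    n = len(features[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, targets):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y
            # Gradient step on 0.5 * (pred - y)^2.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def rank(designs, features, w, b):
    """Order designs by predicted relevance, best first."""
    scored = [(sum(wi * xi for wi, xi in zip(w, x)) + b, d)
              for d, x in zip(designs, features)]
    return [d for _, d in sorted(scored, reverse=True)]
```

A listwise ranker optimizes the ordering of a whole result list at once rather than each score independently, which is what the TensorFlow Ranking losses provide over a pointwise sketch like this.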
We then evaluated it with “Normalized Discounted Cumulative Gain” (nDCG), a measure of ranking quality based on how well the retrieved design templates meet the needs and expectations of the user.
nDCG = (DCG/IDCG)
DCG = Discounted Cumulative Gain
IDCG = Ideal Discounted Cumulative Gain
The below example illustrates the computation of the nDCG score:
DCG = sum(relevancy / penalty) = 7.84
IDCG = 9
nDCG = 7.84 / 9 ≈ 0.871
From the above example –
The model ranked a few designs, as shown under the ranked observations. The relevancy (second column) is the ground-truth relevancy. This relevancy is then discounted by position, which is what the penalty does, finally yielding the Discounted Cumulative Gain (DCG).
The Ideal Discounted Cumulative Gain (IDCG) is computed similarly to DCG, but with the designs in the perfect order of ground-truth relevancy, to get the final normalized score.
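The computation can be sketched in a few lines. Note one assumption: the sketch below uses the standard logarithmic position discount, log2(position + 1), whereas the worked example above uses its own penalty values:

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain: each relevance is divided by the
    log2 of its (1-based) position plus one, so lower positions
    contribute less."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(ranked_relevances):
    """nDCG = DCG of the actual order / DCG of the ideal order."""
    idcg = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / idcg if idcg > 0 else 0.0
```

A perfect ordering scores exactly 1.0, and any misordering of unequal relevances scores strictly below it, which makes nDCG convenient for comparing ranking strategies.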
We saw a significant improvement in metrics for this hypothesis on simulation.
The Fruits of Labor
As our research and development endeavor came to an end, we had an MVP (minimum viable product) ready to be launched in a phased manner. With this approach, the algorithm’s ability to predict relevant designs improved by 15% on the parameters we had chosen to evaluate the MVP.
But before rushing into a launch, we had to understand customer behavior. So we simulated searches to study the improvements in the positions of designs. We saw improvements both in the discoverability of designs and in how far customers progressed in their buying journey.
But the proof of the pudding is how it performs on the website in a live setting. We are in the process of launching A/B tests and random time-split tests to gain insights and optimize the ranking further.
As our team gears up for the next iteration of the project, we wait to see whether what we achieved under controlled conditions will be reflected in the real world, and we will lose no time pushing our platform forward for a better customer experience.