Using Google’s Cloud Vision Product Search to Drive eCommerce Sales

Customers expect significant convenience while shopping online, and product search is key to conversion and customer loyalty. Text-based product search has long been the norm. Now customers can also search for products by uploading an image and finding the closest matches on the eCommerce site. This article focuses on image-based product search using Google Cloud’s Vision API Product Search. After introducing image-based product search, we explain how Cloud Vision API Product Search works and share our experience in using it.

What does Google Cloud Vision API Product Search do?

When a user uploads an image for a product search, the API applies machine learning to compare the product in the user’s query image with the images in the retailer’s product set, and then returns a ranked list of visually and semantically similar results.

Vision Product Search runs on pre-trained machine learning models. These models were trained on different features of an image, such as color, shape, and pattern. Images with similar features are clustered to form a homogeneous set. When a user searches with an image, the API extracts the features of the uploaded image and identifies the cluster it most likely belongs to. It then selects the product images in that cluster that are most similar to the input image, based on a similarity index.
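The cluster-then-rank idea described above can be illustrated with a toy example. This is only a sketch of the general technique, not Google’s actual implementation: the three-dimensional feature vectors, the product names, and the use of cosine similarity as the "similarity index" are all assumptions made for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(query, clusters, top_k=3):
    """Pick the cluster whose centroid is most similar to the query,
    then rank that cluster's members by similarity to the query."""
    best = max(clusters, key=lambda c: cosine_similarity(query, c["centroid"]))
    scored = [(name, cosine_similarity(query, vec))
              for name, vec in best["members"].items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical feature vectors: (color, shape, pattern).
clusters = [
    {"centroid": [0.9, 0.1, 0.1],
     "members": {"red-dress": [1.0, 0.1, 0.0], "red-top": [0.8, 0.2, 0.2]}},
    {"centroid": [0.1, 0.9, 0.2],
     "members": {"sneaker": [0.1, 1.0, 0.1], "boot": [0.0, 0.8, 0.3]}},
]

# The query lands in the first cluster; its members are ranked by score.
matches = search([0.9, 0.15, 0.05], clusters)
for name, score in matches:
    print(f"{name}: {score:.3f}")
```

Restricting the ranking step to one cluster is what keeps the search fast: only a fraction of the catalog has to be compared against the query image.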

The output from the API includes the best matching products for an image with the scores and the matching images. The matching products are ranked by matching scores in descending order. Scores range from 0 (no confidence) to 1 (full confidence).

Currently, Vision API Product Search supports only a few retail product categories: home goods, apparel, toys, packaged goods, and a general category.

How Do You Work with the API?

Create Input Dataset

The input dataset consists of one or more product sets, the products belonging to each product set, and the reference images associated with those products.

A product set is a container for multiple products and is identified by a product set ID. Products are added to the product set using a product ID (or a display name) and a product category. Products also have optional fields, such as a description and labels. Labels are key/value pairs that describe the product, such as color=black or style=mens. Labels are useful for narrowing the product search to a user’s specific requirements.

A reference image is added after a product is created. Reference images show various views of a product. The metadata of a reference image includes a product ID, a category, and bounding box details. Multiple reference images can be added for one product to represent different viewpoints. These images are stored in a Google Cloud Storage bucket. There are recommendations on how to pick high-quality reference images so that the product set gives good matches.
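As a rough illustration of the pieces described above, the following sketch builds the JSON bodies for a product and a reference image. The field names (displayName, productCategory, productLabels, uri, boundingPolys) follow the Vision API REST resources as we understand them; the helper functions, the bucket name, the product details, and the category string "apparel-v2" are hypothetical values chosen for the example.

```python
import json

def product_body(display_name, category, labels, description=""):
    """Build the JSON body for creating a product (helper name is hypothetical)."""
    body = {
        "displayName": display_name,
        "productCategory": category,
        # Labels are key/value pairs such as color=black or style=mens.
        "productLabels": [{"key": k, "value": v} for k, v in labels.items()],
    }
    if description:
        body["description"] = description
    return body

def reference_image_body(gcs_uri, box=None):
    """Build the JSON body for a reference image.
    box is an optional (x_min, y_min, x_max, y_max) tuple in pixels."""
    body = {"uri": gcs_uri}
    if box:
        x0, y0, x1, y1 = box
        body["boundingPolys"] = [{"vertices": [
            {"x": x0, "y": y0}, {"x": x1, "y": y0},
            {"x": x1, "y": y1}, {"x": x0, "y": y1},
        ]}]
    return body

# Assumed example values, not taken from a real catalog.
product = product_body("Black running shoe", "apparel-v2",
                       {"color": "black", "style": "mens"})
image = reference_image_body("gs://example-bucket/shoe-001-side.jpg",
                             box=(10, 20, 110, 220))
print(json.dumps(product, indent=2))
print(json.dumps(image, indent=2))
```

The bounding box is optional in this sketch; it is useful mainly when the product occupies only part of the reference image.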

The input dataset can be created in a single step using batch import. The product set must be indexed before it can be queried using Vision Product Search.
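Batch import reads a CSV file stored in Cloud Storage. The sketch below writes one row of such a file, assuming the column order image-uri, image-id, product-set-id, product-id, product-category, product-display-name, labels, bounding-poly with no header row. The helper name, bucket, IDs, and category string are hypothetical, and the column layout should be checked against the current documentation before use.

```python
import csv
import io

def batch_import_row(image_uri, product_set_id, product_id, category,
                     display_name="", labels=None, image_id="", bounding_poly=""):
    """Return one batch-import CSV row (assumed column order, see lead-in)."""
    # Labels are packed into a single field as key=value pairs.
    label_text = ",".join(f"{k}={v}" for k, v in (labels or {}).items())
    return [image_uri, image_id, product_set_id, product_id,
            category, display_name, label_text, bounding_poly]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(batch_import_row(
    "gs://example-bucket/shoes/shoe-001-side.jpg",  # hypothetical image path
    "fashion-set",                                  # hypothetical product set ID
    "shoe-001",                                     # hypothetical product ID
    "apparel-v2",
    display_name="Black running shoe",
    labels={"color": "black", "style": "mens"},
))
csv_text = buffer.getvalue()
print(csv_text)
```

Using the csv module matters here because the labels field contains commas, so it has to be quoted to stay in a single column.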

Run the Search Engine

The search for matching products can be initiated by supplying an image and a product set ID. The search can be controlled using various fields, including:

  • maximum number of matching images to be returned,
  • the product category to search in, and
  • product-specific labels such as color or style to look for.

These fields, along with the search image and the product set ID, are supplied as a query to the API. The request is saved in a JSON file, which is then sent to the API using a curl command.
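The request file described above could be generated as follows. This is a minimal sketch based on our understanding of the images:annotate request format; the helper name, project ID, location, product set ID, image path, and filter value are hypothetical.

```python
import json

def build_search_request(image_gcs_uri, product_set_path, category,
                         max_results=5, label_filter=""):
    """Build the JSON body of a product search request (helper is hypothetical)."""
    params = {
        "productSet": product_set_path,
        "productCategories": [category],
    }
    if label_filter:
        # Narrows the search by product labels, e.g. "style = mens".
        params["filter"] = label_filter
    return {"requests": [{
        "image": {"source": {"gcsImageUri": image_gcs_uri}},
        "features": [{"type": "PRODUCT_SEARCH", "maxResults": max_results}],
        "imageContext": {"productSearchParams": params},
    }]}

request = build_search_request(
    "gs://example-bucket/queries/query-shoe.jpg",
    "projects/example-project/locations/us-west1/productSets/fashion-set",
    "apparel-v2",
    max_results=3,
    label_filter="color = black AND style = mens",
)
request_json = json.dumps(request, indent=2)
print(request_json)
```

Saving request_json to a file gives the JSON document that the curl command posts to the API.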

Vision API Product Search flow illustration

Search Results

The API returns the matching products for the entire image, and the response is recorded in a JSON file. Along with the matches, you get a range of additional information, such as the matching score, the labels of the matched products, and performance metrics such as how long the API took to come up with these matches.
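A response like the one described above can be flattened into a simple ranked list. The sketch below assumes the response layout responses[0].productSearchResults.results, with product, score, and image fields on each result; the sample data, including every name and score, is invented for illustration.

```python
# Hypothetical response, shaped like the JSON the API returns.
sample_response = {
    "responses": [{
        "productSearchResults": {
            "indexTime": "2020-05-01T00:00:00Z",
            "results": [
                {"product": {"displayName": "Blue shirt",
                             "productCategory": "apparel-v2",
                             "productLabels": [{"key": "color", "value": "blue"}]},
                 "score": 0.41,
                 "image": "projects/example-project/locations/us-west1/products/shirt-002/referenceImages/img-7"},
                {"product": {"displayName": "Red dress",
                             "productCategory": "apparel-v2",
                             "productLabels": [{"key": "color", "value": "red"}]},
                 "score": 0.87,
                 "image": "projects/example-project/locations/us-west1/products/dress-001/referenceImages/img-1"},
            ],
        }
    }]
}

def top_matches(response, min_score=0.0):
    """Return matches at or above min_score, best score first."""
    results = response["responses"][0]["productSearchResults"].get("results", [])
    rows = []
    for result in results:
        product = result["product"]
        labels = {l["key"]: l["value"] for l in product.get("productLabels", [])}
        rows.append({
            "name": product.get("displayName", ""),
            "score": result["score"],
            "image": result.get("image", ""),
            "labels": labels,
        })
    rows = [row for row in rows if row["score"] >= min_score]
    rows.sort(key=lambda row: row["score"], reverse=True)
    return rows

for row in top_matches(sample_response, min_score=0.5):
    print(row["name"], row["score"], row["labels"])
```

A minimum score is a practical way to avoid showing weak matches to shoppers; the threshold of 0.5 used here is arbitrary and would need tuning on real data.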

In the case of a multi-product image, the API groups the results by each product it identifies in the image. To support multi-product search, the API returns a bounding box for the entire image and one bounding box around each product it detects in the image.
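For a multi-product image, the grouped results can be read one detected product at a time. This sketch assumes the grouped matches appear under productGroupedResults, each with a boundingPoly of normalizedVertices and its own results list; the sample data and the helper name are hypothetical.

```python
# Hypothetical grouped results for an image showing a bag and a pair of shoes.
sample_response = {
    "responses": [{
        "productSearchResults": {
            "productGroupedResults": [
                {"boundingPoly": {"normalizedVertices": [
                    {"x": 0.1, "y": 0.2}, {"x": 0.4, "y": 0.2},
                    {"x": 0.4, "y": 0.7}, {"x": 0.1, "y": 0.7}]},
                 "results": [
                     {"product": {"displayName": "Tote bag"}, "score": 0.62},
                     {"product": {"displayName": "Leather bag"}, "score": 0.81}]},
                {"boundingPoly": {"normalizedVertices": [
                    {"x": 0.5, "y": 0.6}, {"x": 0.9, "y": 0.6},
                    {"x": 0.9, "y": 0.95}, {"x": 0.5, "y": 0.95}]},
                 "results": [
                     {"product": {"displayName": "White sneaker"}, "score": 0.74}]},
            ]
        }
    }]
}

def matches_per_object(response, top_k=3):
    """Return, for each detected product, its box and its best matches."""
    search = response["responses"][0]["productSearchResults"]
    groups = []
    for group in search.get("productGroupedResults", []):
        vertices = group.get("boundingPoly", {}).get("normalizedVertices", [])
        # Defaulting to 0.0 guards against coordinates missing from the JSON.
        xs = [v.get("x", 0.0) for v in vertices]
        ys = [v.get("y", 0.0) for v in vertices]
        box = (min(xs), min(ys), max(xs), max(ys)) if vertices else None
        best = sorted(group.get("results", []),
                      key=lambda r: r["score"], reverse=True)[:top_k]
        groups.append({
            "box": box,
            "matches": [(r["product"]["displayName"], r["score"]) for r in best],
        })
    return groups

for group in matches_per_object(sample_response):
    print(group["box"], group["matches"])
```

The box values are fractions of the image width and height, so they can be scaled to pixel coordinates to highlight each detected product on the page.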

Solution Architecture

The Vision Product Search API, along with other Google Cloud services, can be used to build a complete solution for the product search use case for an eCommerce company. The following technical architecture presents details about the Google Cloud components involved, their interaction with each other, and the flow of data from one point to another:

Vision AI solution architecture


Vision Product Search pricing is based on monthly usage, both for querying the model and for storing images on Google Cloud Storage. As of May 2020, the prediction charges per 1,000 images are $4.50 for up to 5 million images and $1.80 for up to 20 million images. The storage charge is uniform, at $0.10 per 1,000 images. More details can be found on the Google Cloud Vision pricing page.
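To get a feel for these numbers, the following sketch estimates a monthly bill from the May 2020 prices quoted above. It assumes the tiers are graduated (the first 5 million images are billed at $4.50 per 1,000 and only the images beyond that at $1.80), ignores any free usage allowance, and does not cover volumes above 20 million because no price is quoted for them. These are our assumptions, not a statement of Google’s billing rules.

```python
# Prices quoted in the article (as of May 2020), in USD per 1,000 images.
TIERS = [
    (5_000_000, 4.50),   # first 5 million queried images
    (20_000_000, 1.80),  # images beyond 5 million, up to 20 million
]
STORAGE_PER_1000 = 0.10  # stored reference images

def estimate_monthly_cost(query_images, stored_images):
    """Estimate the monthly cost in USD under the assumptions in the lead-in."""
    if query_images > TIERS[-1][0]:
        raise ValueError("volumes above 20 million are not covered by the quoted prices")
    cost = 0.0
    lower = 0
    for upper, price in TIERS:
        # Number of queried images that fall inside this tier.
        in_tier = max(0, min(query_images, upper) - lower)
        cost += in_tier / 1000 * price
        lower = upper
    cost += stored_images / 1000 * STORAGE_PER_1000
    return round(cost, 2)

# 1 million queries against a catalog of 10,000 reference images.
print(estimate_monthly_cost(1_000_000, 10_000))
```

Under these assumptions the query volume dominates the bill; storing a catalog of a few thousand images costs very little by comparison.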

How Accurate Is the Matching of Vision Product Search?

To illustrate Google Cloud Vision API Product Search’s capabilities, we ran the API on the Fashion Product Images dataset available on Kaggle. The dataset contains product images, product categories, and multiple attributes describing each product. The subset we used consists of 10,000 images from various product categories, including apparel, shoes, and bags. The data distribution is 60% top wear, 6% bottom wear, 27% shoes, and 7% bags.

Shown below are the results of product searches in various categories. Only the top three matches are shown, along with their scores:

Showing how Vision AI matches to an image search

As seen from the results, the API returns products that are visually similar to the product in the query image. The output products closely match the input image in pattern, style, and color. The API uses the part of the image where the product is located to achieve good confidence scores.

We also observed that each product category needs a good number of high-resolution reference images to get good confidence scores. It also helps to provide multiple reference images per product, representing different viewpoints. For example, a shoe could have images showing a single shoe and images showing the pair.

How Does Google Vision API Product Search Compare to Other Similar Services?

Many other players in the market offer visual product search capabilities. Google has an edge over them in providing visual search specific to retail categories. There are open-source approaches, but they require a good theoretical understanding and take time to implement. A host of retailers, including ASOS, Wayfair, Neiman Marcus, Argos, and IKEA, have built proprietary visual search tools.

Google Vision Product Search compares well with these alternatives and offers several advantages:

  • Simple, easy-to-implement architecture.
  • High-speed search engine that returns results in near real time, with low setup time.
  • Convenient and easy maintenance of the product catalog.
  • Flexible search criteria to control the output.

The API has certain limitations. Because the model is pre-trained, it may not give good results on novel images. Also, it currently supports only a limited set of retail product categories.

What are the applications of Vision API Product Search in the eCommerce industry?

The API allows retailers to suggest visually related items to shoppers, which can be difficult to find using a text query alone.

Retailers can compare the price, customer reviews, availability, and remaining stock of products in their catalog with those of similar products on other eCommerce platforms.

Retailers can update their product catalog with high-resolution images and capture multiple viewpoints of a product by searching for similar high-resolution images available on other eCommerce platforms.

Where multiple sellers offer a given product on an eCommerce platform, Vision Product Search, combined with predictive modeling, can help analyze how the type of image affects sales.


Supriya Garg, Senior Manager, Core Compete

Rohit Ramgire, Analyst, Core Compete