Accurate data on a powerful platform is a key enabler for campaign messaging.

SalesPond provides sales enablement, marketing services and data services to SMB and enterprise clients across Asia Pacific. The company is headquartered in Sydney with offices across Asia. SalesPond offers inside sales capabilities for clients who want campaign outcomes driven by quality leads and targeted conversations. They wanted to enhance these capabilities and amplify the commercialisation opportunities in their datasets by improving the accuracy with which they find the right audience, for the right message, at the right time.

The challenges

Standardisation of attributes across multiple sources

Multiple datasets containing millions of contact and organisation records unsurprisingly arrive in varying formats and layouts, and with varying data quality. Standardising and consolidating attributes according to a set of business rules is a monumental challenge.
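The kind of rule-driven standardisation described above can be sketched as follows. The field aliases and country mappings here are illustrative assumptions, not SalesPond's actual business rules.

```python
# Illustrative attribute standardisation across heterogeneous sources.
# All field names and mapping rules below are hypothetical examples.

FIELD_ALIASES = {
    "company": "organisation",
    "org_name": "organisation",
    "organisation_name": "organisation",
    "ctry": "country",
    "country_code": "country",
}

COUNTRY_CANONICAL = {
    "AU": "Australia",
    "AUS": "Australia",
    "SG": "Singapore",
}

def standardise_record(raw: dict) -> dict:
    """Map source-specific field names onto one schema and clean values."""
    record = {}
    for key, value in raw.items():
        canonical_key = FIELD_ALIASES.get(key.lower(), key.lower())
        if isinstance(value, str):
            value = value.strip()
        record[canonical_key] = value
    # Normalise country codes to full names where a rule exists.
    country = record.get("country")
    if country:
        record["country"] = COUNTRY_CANONICAL.get(country.upper(), country)
    return record
```

In practice such rules would live in configuration rather than code, so they can be extended as new sources are onboarded.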

Regular ingest

The various input datasets update on a weekly cycle or faster. The solution must handle change data capture without impacting other functions, and as input size grows over time, the ingest process must scale so that the most accurate data is ready as soon as possible without system constraints.
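A change-data-capture merge of this kind can be sketched as an upsert of a weekly delta into the current snapshot, with superseded rows kept for archiving. The record shape and field names here are assumptions for illustration.

```python
# Illustrative CDC merge: apply a weekly delta to the current snapshot,
# keyed on a record id. Superseded versions are returned for archiving.

def apply_delta(current: dict, delta: list) -> tuple:
    """Return (new_snapshot, archived) after applying delta rows.

    current: {record_id: row}; delta: list of rows, each with an "id"
    and an optional "deleted" flag set by the upstream source.
    """
    snapshot = dict(current)
    archived = []
    for row in delta:
        rid = row["id"]
        if rid in snapshot:
            archived.append(snapshot[rid])   # preserve the prior version
        if row.get("deleted"):
            snapshot.pop(rid, None)          # record removed upstream
        else:
            snapshot[rid] = row
    return snapshot, archived
```

At scale the same upsert pattern would run as a set-based merge inside the warehouse rather than row by row in application code.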


Fast, flexible querying at scale

A well-performing, cost-effective data store that supports slicing and dicing data for current and future campaigns is paramount. Filters can include geographical location, company size, industry and so on, and should always reflect the latest version of each record. This must scale across the millions of records to support users.
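The slicing-and-dicing requirement can be sketched as a small builder that turns campaign criteria into a parameterised query restricted to active (latest) records. The table name, column names and filter keys are hypothetical, not the production schema.

```python
# Illustrative builder turning campaign filter criteria into a
# parameterised SQL query. Schema names are assumptions.

def build_campaign_query(filters: dict) -> tuple:
    """Translate campaign filter criteria into (sql, params)."""
    clauses, params = ["is_active = TRUE"], []   # only latest records
    if "country" in filters:
        clauses.append("country = ?")
        params.append(filters["country"])
    if "industries" in filters:
        placeholders = ", ".join("?" for _ in filters["industries"])
        clauses.append(f"industry IN ({placeholders})")
        params.extend(filters["industries"])
    if "min_employees" in filters:
        clauses.append("employee_count >= ?")
        params.append(filters["min_employees"])
    sql = "SELECT * FROM contacts WHERE " + " AND ".join(clauses)
    return sql, params
```

Using query parameters rather than string interpolation keeps campaign filters safe and lets the warehouse cache query plans across runs.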

Our solution

BigQuery in the centre

To gather insights across multiple sources, the data obtained from each source had to be analysed. We chose GCP's enterprise data warehouse, BigQuery, since it is highly scalable, offers outstanding price/performance and let us gather insights across these datasets quickly.

Ingest, Enrichment, Analysis:

An ingestion pipeline was built on Compute Engine, which hosts Open Source Talend along with native Python scripts for staging and merging the updated data with the existing datasets. Historical archiving is also provided. After each dataset is analysed, it is standardised and made consistent according to the business rules in a BigQuery table. In addition, the country, state and city attributes are enriched and standardised using the Google Geocode API.

After standardisation, the data is loaded into Cloud SQL for use by the Datalist front-end CRM tool. To deliver faster query results, only the active records are made available in the BigQuery table that the customer queries to fetch the latest datasets for their weekly marketing campaigns.
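The geocode enrichment step amounts to pulling standardised country, state and city values out of a Geocoding API response. The sketch below follows the documented response shape (`results[0].address_components` with `types` such as `country`, `administrative_area_level_1` and `locality`); error handling is simplified.

```python
# Extract standardised country/state/city from a Google Geocoding API
# response. Follows the documented response structure; simplified.

def extract_location(geocode_response: dict) -> dict:
    """Pull country, state and city out of the first geocode result."""
    wanted = {
        "country": "country",
        "administrative_area_level_1": "state",
        "locality": "city",
    }
    location = {"country": None, "state": None, "city": None}
    results = geocode_response.get("results", [])
    if not results:
        return location  # no match; leave attributes unset
    for component in results[0].get("address_components", []):
        for t in component.get("types", []):
            if t in wanted:
                location[wanted[t]] = component["long_name"]
    return location
```

Caching these lookups by raw address string keeps API costs flat as the weekly ingest grows.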

The platform design:

The Datalist platform was designed using the following GCP components:

  • Compute Engine 
  • Cloud SQL (MySQL)
  • BigQuery
  • Cloud Storage
  • Cloud Source Repository
  • Cloud IAM
  • Stackdriver

Security and privacy concerns are paramount when dealing with contact details. Access to GCP resources was restricted through IAM, with clear lines of delineation between roles with access to data and roles with access to computing resources.