AWS native data lake with Redshift reporting
About Company

SocietyOne is Australia’s first and leading marketplace lender. They find investors looking to get great returns, then through their lending platform they match investors’ funds with high quality borrowers, who are looking to get a better, and fairer, deal than they’d get from the big banks.

The Challenge
The lending platform had evolved over time and is currently split between existing loans, held in the legacy platform, and new loans, issued in the current platform (Salesforce). The source system changes brought additional challenges, but also an opportunity to improve data quality as the data grows and to build a future-proof platform on a modern data architecture.
Servian were engaged to deliver a data strategy for implementing a data lake and resolving the data quality problems that existed between the two lending platforms.
After the data strategy was complete, Servian continued with the design and implementation of an AWS-native data lake, using Servian’s Serverless Data Lake framework. Emphasising the importance of data governance, the framework facilitates data validation using agreed metadata definitions for each and every file ingested.
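As an illustration of validating a file against an agreed metadata definition, the following is a minimal sketch only. The case study does not describe the framework's actual metadata format, so `LOAN_METADATA`, its column names and the `validate_file` helper are all hypothetical.

```python
import csv
import io

# Hypothetical metadata definition for one source file; the real framework's
# format is not described in this case study.
LOAN_METADATA = {
    "columns": [
        {"name": "loan_id", "type": "int", "nullable": False},
        {"name": "amount", "type": "float", "nullable": False},
        {"name": "status", "type": "str", "nullable": True},
    ]
}

_CASTS = {"int": int, "float": float, "str": str}


def validate_file(text, metadata):
    """Return a list of problems found; an empty list means the file passes."""
    reader = csv.DictReader(io.StringIO(text))
    expected = [col["name"] for col in metadata["columns"]]
    if reader.fieldnames != expected:
        return [f"header mismatch: expected {expected}, got {reader.fieldnames}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for col in metadata["columns"]:
            value = row[col["name"]]
            if value is None or value == "":
                if not col["nullable"]:
                    problems.append(f"line {line_no}: {col['name']} is required")
                continue
            try:
                _CASTS[col["type"]](value)  # check the value parses as its type
            except ValueError:
                problems.append(
                    f"line {line_no}: {col['name']}={value!r} is not {col['type']}"
                )
    return problems
```

In this sketch a file is only passed on for ingestion when `validate_file` returns an empty list; otherwise the listed problems could be recorded for audit.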
The Solution
The solution delivers a governed data lake ingesting multiple source datasets and historicising all records for full traceability, plus consolidated and centralised data marts for improved reporting and visibility.
The Data Lake consists of 3 layers:
- Landing : bucket that accepts any inbound data
- Raw Deltas : bucket holding record changes
- Conformed : bucket containing all time-variant data
An Ingestion Framework automatically processes each incoming data file and, if its content meets the predefined structure, ingests and historicises it into a type-2 time-variant form. Where appropriate, the framework creates Glue Catalog entries and Athena artefacts within the lake layers for subsequent querying.
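To make the idea of type-2 historicisation concrete, here is a minimal in-memory sketch. It is not the framework's implementation (which works on Parquet files in S3); the `historicise` function, the `valid_from` / `valid_to` column names and the `OPEN_END` sentinel are assumptions chosen for illustration.

```python
from datetime import datetime

OPEN_END = datetime(9999, 12, 31)  # assumed sentinel meaning "still current"


def historicise(history, snapshot, key, load_time):
    """Merge a full snapshot into a type-2 history and return the new history.

    history  : list of dicts carrying valid_from / valid_to alongside the data
    snapshot : list of dicts holding the latest state of each record
    """
    incoming = {row[key]: row for row in snapshot}
    result = []
    unchanged_keys = set()
    for version in history:
        if version["valid_to"] != OPEN_END:
            result.append(version)  # closed versions never change
            continue
        payload = {
            col: val
            for col, val in version.items()
            if col not in ("valid_from", "valid_to")
        }
        if incoming.get(version[key]) == payload:
            result.append(version)  # unchanged: keep the open version
            unchanged_keys.add(version[key])
        else:
            # changed or removed at source: close the open version
            result.append({**version, "valid_to": load_time})
    for k, row in incoming.items():
        if k not in unchanged_keys:
            # new or changed record: open a new version from this load
            result.append({**row, "valid_from": load_time, "valid_to": OPEN_END})
    return result
```

Every change produces a closed version and a new open one, so the state of any record at any past point in time remains recoverable.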
To capitalise on these governed data assets, the solution incorporates a Redshift instance containing subject-oriented Data Marts (e.g. Finance) that hold curated snapshots derived from the Data Lake. The data lake Conformed layer is also exposed to Redshift Spectrum enabling complete transparency across raw and transformed data in a single place.
A Transformation Framework executes Redshift statements that load conformed data into metadata-defined tables, using Step Functions to orchestrate more complex sequences.

Serverless Data Lake Framework
Servian’s Serverless Data Lake Framework is AWS native and ingests data from a landing S3-bucket through to type-2 conformed history objects – all within the S3 data lake.
The framework operates within a single Lambda function, and once a source file is landed, the data is immediately ingested (CloudWatch triggered) to time-variant form as parquet files in S3.
The framework function also generates the table definitions in the AWS Glue Catalog for the raw and time-variant formats, enabling direct querying of S3 objects with the Amazon Athena query engine. In fact, the framework itself uses this feature to historicise the incoming data through the layers of the lake.
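One way such table definitions could be derived from metadata is to render an Athena `CREATE EXTERNAL TABLE` statement, as in the sketch below. The `athena_ddl` helper, the database and table names and the bucket path are hypothetical; how the framework actually registers tables is not described here.

```python
def athena_ddl(database, table, columns, location):
    """Build a CREATE EXTERNAL TABLE statement for Parquet data under `location`.

    columns is a list of (name, athena_type) pairs taken from the metadata.
    """
    cols = ",\n".join(f"  `{name}` {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{cols}\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


# Example with made-up names; the bucket is a placeholder.
print(
    athena_ddl(
        "conformed",
        "loans",
        [("loan_id", "bigint"), ("status", "string"), ("valid_from", "timestamp")],
        "s3://example-conformed-bucket/loans/",
    )
)
```

Running the generated statement in Athena (for example through the boto3 Athena client's `start_query_execution`) would create the corresponding Glue Catalog entry.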
The Serverless Data Lake Framework components are deployed using AWS CloudFormation scripts thereby rapidly providing the capability to ingest data to time-variant form within a Data Lake.

Data Lake : S3, Glue Catalog, Amazon Athena, CodeCommit, Lambda, CloudFormation, API Gateway, CloudWatch, KMS
Transformation Framework
The eventual objective for SocietyOne was to establish a trusted, centralised and secure data warehouse in the cloud to service a myriad of data services to business users as well as customers.
The consolidation of inbound data, through a governed data lake, into Redshift provided a central location for reporting, analytics and data sharing. Exploiting the versatility of the data lake further, a Transformation Framework delivered the ability to load Redshift data models directly from the lake. To facilitate this, metadata-defined ETL logic kept in CodeCommit offered full traceability and lineage of data manipulations.
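A rough sketch of what metadata-defined ETL logic might look like follows: a load step is described as data and rendered into a Redshift `INSERT ... SELECT` statement that reads from a Spectrum external schema. The step layout, the `render_load` helper and all schema, table and column names are assumptions, not the framework's real format.

```python
def render_load(step):
    """Render a metadata-defined load step as a Redshift INSERT ... SELECT."""
    target = f"{step['target_schema']}.{step['target_table']}"
    source = f"{step['source_schema']}.{step['source_table']}"
    # "columns" maps each target column name to the expression that fills it
    names = ", ".join(step["columns"])
    exprs = ", ".join(
        f"{expr} AS {name}" for name, expr in step["columns"].items()
    )
    sql = f"INSERT INTO {target} ({names}) SELECT {exprs} FROM {source}"
    if step.get("filter"):
        sql += f" WHERE {step['filter']}"
    return sql + ";"


# Hypothetical step: load current loan versions into a Finance data mart.
finance_step = {
    "target_schema": "finance",
    "target_table": "loan_snapshot",
    "source_schema": "lake_conformed",  # assumed Spectrum external schema
    "source_table": "loans",
    "columns": {
        "loan_id": "loan_id",
        "amount_aud": "CAST(amount AS DECIMAL(18,2))",
    },
    "filter": "valid_to = '9999-12-31'",
}
print(render_load(finance_step))
```

Because each step is plain data, it can be versioned in a repository such as CodeCommit and reviewed like any other change, which is where the traceability comes from.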
Data Marts: Lambda, Redshift, Spectrum, Step Functions, CodeCommit, VPC Endpoints, CloudFormation
Delivery Framework
The Data Strategy engagement focused attention on how, and why, data is used as it is, and on which strategic goals were desirable but not yet possible. Scrutinising the data assets in this way highlighted both the available growth areas in the current landscape and the benefits and savings of a new data platform.
A project plan defining the scope of major tasks and milestones was combined with a kanban-style task board to provide transparency of team activities. A Detailed Solution Design acted as the control document covering the following aspects:
- Architectural decisions
- Cloud environment
- Network design
- Security & access
- Metadata management
- Data governance
- Disaster recovery
- Cost estimates
- Data acquisition
- Data ingestion
- Data transformation
- Audit & logging
The Benefits
Improved Data Quality
Increased data quality by detecting data changes over time, handling metadata changes, and capturing discrepancies for audit purposes.
Financial Reporting
A more holistic perspective for Finance reporting on both legacy and current lending platforms.
Increased Auditability
SocietyOne now have a stronger audit capability and can report on all of the historical information as it changes over time.
Why Servian
We drive a competitive advantage for our customers by enabling them to become truly data driven. We help organisations design and implement robust enterprise data management strategies and data platforms that ensure the security, accuracy, and reliability of their data. Our services in data and analytics span across advisory, consulting and managed services.