Fishing in data lakes to derive valuable business insights

October 26, 2018

According to research by EMC, the amount of data in our digital universe is expected to grow from 4.4 trillion GB in 2013 to 44 trillion GB by 2020. This ever-increasing pool of data gives organisations more opportunities to understand customer experience, derive actionable insights and generate greater value from the vast quantities of data they accumulate.

To meet ever-increasing data storage needs, enterprise data storage has undergone a technological shift from data warehouses to data lakes, which attract organisations with their greater computing power and storage capacity.

As data lakes have grown rapidly in popularity over the past five years, it is important to look carefully at how enterprises store data from disparate sources, and to ensure that these storage facilities do not end up as data dumping grounds that lead to data silos.

An interesting feature of data lakes is that they act as a reservoir of valuable data for enterprises. Data can be ingested rapidly in its native format, even before its structure and the business requirements for its use have been defined.
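Ingesting data before its structure is defined is often called "schema-on-read". The sketch below illustrates the idea with a toy in-memory lake: payloads are stored byte-for-byte as they arrive, and a schema is applied only when someone reads them. The `lake` dictionary, the `ingest` and `read_with_schema` helpers and the sample data are all hypothetical; this is not how Praedictio or Amazon S3 work internally.

```python
import csv
import io
import json
from typing import Any

# Hypothetical toy lake: object keys mapped to raw bytes, stored exactly as they arrived.
lake: dict[str, bytes] = {}


def ingest(key: str, payload: bytes) -> None:
    """Store a payload untouched; no schema is checked on write."""
    lake[key] = payload


def read_with_schema(key: str, schema: dict[str, type]) -> list[dict[str, Any]]:
    """Parse the native format and apply a schema only at read time."""
    raw = lake[key].decode("utf-8")
    if key.endswith(".jsonl"):
        records = [json.loads(line) for line in raw.splitlines() if line.strip()]
    elif key.endswith(".csv"):
        records = list(csv.DictReader(io.StringIO(raw)))
    else:
        raise ValueError(f"no reader for {key}")
    # Keep only the fields the schema asks for, converted to the requested types.
    return [{name: to_type(rec[name]) for name, to_type in schema.items()} for rec in records]


# Two sources arrive in different native formats before any schema exists.
ingest("crm/customers.csv", b"id,name,spend\n1,Asha,120.5\n2,Ben,80\n")
ingest("web/events.jsonl", b'{"id": 1, "page": "home"}\n{"id": 2, "page": "cart"}\n')

# Later, an analyst decides which fields matter and how to type them.
customers = read_with_schema("crm/customers.csv", {"id": int, "spend": float})
events = read_with_schema("web/events.jsonl", {"id": int, "page": str})
print(customers)
print(events)
```

Because nothing is enforced on write, the same raw file can later be read with a different schema as business requirements change.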

For enterprises, the value lies in having access to vast amounts of data from disparate sources, the ability to discover insights, the visualisation of large volumes of data and, importantly, the ability to perform analytics on top of this data. In the past, both acquiring such vast quantities of data and performing these functions were far more complicated.

It should be noted that analytics cannot be derived from raw data alone. Data first needs to be integrated, cleansed and transformed, with its metadata managed and the whole process properly governed. Only then can a data lake be harnessed to correlate data from disparate sources and diverse formats in a controlled way, resulting in business insights that increase value to market.
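As a rough illustration of these preparation steps, the sketch below cleanses customer records from one hypothetical source, transforms order records from another into per-customer totals, and integrates the two on a shared customer id. The record layouts, field names and helper functions are assumptions made up for this example; a real pipeline (for instance one built on AWS Glue ETL jobs) would look quite different.

```python
from typing import Any

# Hypothetical raw extracts from two sources, as they might land in the lake.
crm_rows = [
    {"customer_id": " 1 ", "email": "ASHA@EXAMPLE.COM"},
    {"customer_id": "2", "email": "ben@example.com "},
    {"customer_id": "", "email": "orphan@example.com"},  # missing key
    {"customer_id": "1", "email": "asha@example.com"},   # duplicate
]
order_rows = [
    {"cust": "1", "amount": "19.99"},
    {"cust": "2", "amount": "5"},
    {"cust": "1", "amount": "10.01"},
]


def cleanse(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Trim and lower-case values, drop rows without an id, keep the first copy of each id."""
    seen: set[str] = set()
    clean = []
    for row in rows:
        cid = row["customer_id"].strip()
        if not cid or cid in seen:
            continue
        seen.add(cid)
        clean.append({"customer_id": cid, "email": row["email"].strip().lower()})
    return clean


def transform(rows: list[dict[str, str]]) -> dict[str, float]:
    """Convert text amounts to numbers and total them per customer."""
    totals: dict[str, float] = {}
    for row in rows:
        cid = row["cust"].strip()
        totals[cid] = totals.get(cid, 0.0) + float(row["amount"])
    return totals


def integrate(customers: list[dict[str, str]], totals: dict[str, float]) -> list[dict[str, Any]]:
    """Join the two sources on customer id into one analysis-ready table."""
    return [
        {**c, "total_spend": round(totals.get(c["customer_id"], 0.0), 2)}
        for c in customers
    ]


table = integrate(cleanse(crm_rows), transform(order_rows))
for row in table:
    print(row)
```

Running the raw rows through analytics directly would have double-counted the duplicate customer and failed to match `" 1 "` with `"1"`, which is the kind of problem the preparation steps exist to prevent.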

Praedictio Data Lake, engineered by Mitra Innovation, provides a competitive advantage as a comprehensive data lake solution covering data cataloguing, visualisation of the data in the data lake, ETL (extract, transform and load) and data governance. The solution is built on AWS technologies: Amazon S3 for data storage, AWS Glue for data cataloguing and ETL, and Amazon QuickSight for insight generation.

Praedictio leverages the inherent advantages of AWS analytics services: there are no servers to manage, because AWS takes care of the heavy lifting of server deployment and migration; a pay-as-you-use model means users pay no upfront or annual fees; and usage can scale from a few users to tens of thousands. The most attractive of these features is SPICE (Super-fast, Parallel, In-memory Calculation Engine), the in-memory engine behind Amazon QuickSight, which lets users run interactive queries on large data sets and get rapid responses.

In addition to supporting analytics on historical data, Praedictio also delivers predictive analytics using machine learning technologies.

Machine learning shortens the time it takes to obtain insights. By leveraging AWS machine learning capabilities, Praedictio removes the barriers developers face in using machine learning and provides easy access to trained models.

Nevertheless, it is important to keep in mind the role of data management within a data lake. If data is not properly catalogued and governed, the opportunities for deriving business insights are far smaller.
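To make "catalogued and governed" a little more concrete, here is a minimal sketch of a catalogue that records each dataset's location, schema, owner and classification, lets users discover datasets by column, and checks a role before revealing where the data lives. The `Catalog` and `CatalogEntry` classes, the role names and the paths are hypothetical; this is not the AWS Glue Data Catalog API or Praedictio's governance model.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata describing one dataset stored in the lake (hypothetical model)."""
    location: str                     # e.g. an object-store prefix
    schema: dict[str, str]            # column name -> type name
    owner: str
    classification: str = "internal"  # governance label
    allowed_roles: set[str] = field(default_factory=set)


class Catalog:
    """A toy catalogue: datasets are discoverable by column and access is checked."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, name: str, entry: CatalogEntry) -> None:
        if name in self._entries:
            raise ValueError(f"dataset {name!r} is already registered")
        self._entries[name] = entry

    def search(self, column: str) -> list[str]:
        """Return the names of datasets that contain a given column."""
        return sorted(n for n, e in self._entries.items() if column in e.schema)

    def resolve(self, name: str, role: str) -> str:
        """Return the storage location only if the role may read the dataset."""
        entry = self._entries[name]
        if role not in entry.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        return entry.location


catalog = Catalog()
catalog.register("customers", CatalogEntry(
    location="lake/curated/customers/",
    schema={"customer_id": "string", "email": "string", "total_spend": "double"},
    owner="crm-team",
    classification="confidential",
    allowed_roles={"analyst"},
))
catalog.register("web_events", CatalogEntry(
    location="lake/raw/web/events/",
    schema={"customer_id": "string", "page": "string"},
    owner="web-team",
    allowed_roles={"analyst", "marketing"},
))

print(catalog.search("customer_id"))
print(catalog.resolve("customers", "analyst"))
```

Without entries like these, the raw files are still in the lake but nobody can easily find them, know who owns them or know who may use them, which is how a lake drifts towards a dumping ground.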

Follow us as we explore the newest frontiers in ICT innovation and apply such technologies to solving real-world problems faced by enterprises, organisations and individuals. Thank you so much for reading! 🙂

Kalani Samarawickrame

Senior Software Engineer | Mitra Innovation
