ROI in Data Science

The Splurge

About five years ago, Big Data was at the height of the hype cycle, and enterprises, not wanting to be left behind, jumped in on the movement. Many invested in expensive data teams, infrastructure and tools that they hoped would improve the speed and output of their teams.

Experts claimed that these investments would take years to pay off. Years later, enterprises are still spending billions of dollars on big data – from employees to tools – and still putting their faith in data teams. Yet many report being unable to see a return on that investment, and are left unable to prove that any of it is really worth their time and money.

Exactly a year ago, a Gartner press release said, “Worldwide IT spending is projected to total $3.7 trillion in 2018…” It went on to say:

“Looking at some of the key areas driving spending over the next few years, Gartner forecasts $2.9 trillion in new business value opportunities attributable to AI by 2021, as well as the ability to recover 6.2 billion hours of worker productivity.”

However, without hard numbers pointing to success, it’s difficult for decision-makers to continue to invest hundreds of thousands (or millions) of dollars in their latest data endeavours. Certainly, any enterprise that has a data team is re-evaluating its productivity and return on investment (ROI). And anyone looking to spin up a new data division is doing their homework first – closely analysing costs and potential ROI before diving in.

The Challenges to Calculating ROI in Data Science

A study conducted by Forbes and McKinsey (sponsored by Teradata) found that large enterprises reported data efforts improving company growth by just 1% to 3% on average. The same study showed that only 37% of respondents could quantify the business case for big data analytics, while 47% couldn’t and 9% reported having ‘no clear vision’. Yet another study, by BCG, estimated 20 to 30% EBITDA gains for data-driven companies. Clearly, this is something most companies struggle with today, and something that analysts themselves struggle to quantify as a whole.

The bottom line is that while most companies struggle to calculate their big data ROI – including deciding exactly what they should be measuring to get there – the larger body of evidence still suggests that investing in big data and data science is worthwhile.

A Real World Example: Netflix’s Content Recommendation Engine

In February 2018, Forbes reported that Netflix credits its ‘Content Recommendation Engine’ with reducing customer churn to the tune of $1 billion annually.

Deploying a data science model into production can be an expensive and lengthy process, and this system took years to develop, but the payoff has clearly justified the investment.

So Netflix is obviously seeing the ROI it wants from data science. However, it wasn’t always sunshine and rainbows for them either. In 2012, the company spent $1 million on ‘content recommendation engine improvements’ that it never even used. This was, for the most part, due to the magnitude of the engineering efforts needed to complete the project.

However, part of the reason Netflix ultimately succeeded in building this highly profitable content recommendation system is that it opted not to implement that $1 million worth of code. The decision-makers had a clear goal – reducing the number of customers who unsubscribed – and a clear view of cost versus potential benefit: in that particular context, a 10% improvement in recommendation accuracy was, at the time, simply not worth the estimated engineering investment.

Evidence of Absence

Companies invest in data teams, infrastructure and data tools for all kinds of reasons, and are pushing different projects at various stages of maturity, so it’s not a cookie-cutter calculation. So where to begin?

The reality is that measuring ROI for data projects and data teams – especially for data-gathering tools and technologies – can be exceptionally challenging, for these reasons:

  • It is often difficult to single out the contribution of data alone to improvements, especially in larger business outcomes (such as lower costs or higher profit margins).
  • The calculation is complicated because the value isn’t all in one number; it can be spread across multiple departments and teams.
  • For these reasons, measuring ROI for a data project can end up being a data project in itself, which is often difficult to justify.

Defining Success

The question shouldn’t be ‘what does ROI mean?’, but rather ‘what can it mean?’ The initial step in calculating ROI is to define what ‘success’ means to your organisation, and to consider all the paths, direct and indirect, through which data (or your data department) contributes.

Value presents itself in many different forms, so considering all the possible paths through which data could bring success is part of the work involved; it’s not a uniform approach and will differ for every organisation.
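To make the arithmetic concrete, here is a toy sketch of that kind of bookkeeping. Every figure and department name below is hypothetical; the point is simply that attributed value is summed across contributing teams and set against the total cost of the data function.

```python
# Toy ROI sketch: hypothetical yearly value (in GBP) attributed to the data
# function by each contributing department. None of these figures come from
# the studies cited above.
value_by_department = {
    "marketing_uplift": 250_000,   # better-targeted campaigns
    "churn_reduction":  180_000,   # customers retained by predictive models
    "ops_savings":       90_000,   # process automation
}

total_cost = 400_000  # salaries, tooling and infrastructure for the data team

total_value = sum(value_by_department.values())
roi = (total_value - total_cost) / total_cost

print(f"Attributed value: £{total_value:,}")   # £520,000
print(f"ROI: {roi:.0%}")                       # (520,000 - 400,000) / 400,000 = 30%
```

The hard part, of course, is not the division but agreeing on defensible numbers for each line of that dictionary.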

More Revenue

So how does data contribute to the company’s revenue? If the company’s product is a data product, the answer is obvious: take music recognition apps like SoundHound or Musixmatch – in the absence of the data product, their earnings would be zero because the app simply wouldn’t exist. For products where this is not the case, there are still likely to be less self-evident ways in which data impacts revenue, and these should be strongly considered in ROI calculations.

  • Increasing the number of customers, possibly through better-targeted marketing and sales activities thanks to predictive analytics or a machine learning model.
  • Making existing customers buy more (not just acquiring new ones), whether through a recommendation engine or up-selling and cross-selling.
Businesses that use data effectively are poised to take some US$1.2 trillion away from their peers by 2020. So we will likely see more of them making such large-scale investments in what they hope to be cost-effective solutions that will keep them ahead of the competition.

If you want to be among the data science successes, the team here at Mitra can help you get a bird’s-eye view of your data science strategy. Armed with the right people, the proper processes and the best technology, you will be able to improve your data insight and enable ROI-boosting decision-making.

If you would like to find out more about the solutions that are available and how we can help, email us at innovate@mitrai.com or call us on 0203 908 1977. We look forward to hearing from you!

Understanding DevOps

– Part of our Discover DevOps series, with Anuradha Prasanna

DevOps evolved from a rising demand for digital solutions, starting around 2010 and becoming mainstream in 2015.

DORA DevOps Maturity Quadrant

I remember as a junior engineer how we initially ran the software development and operations cycle in a waterfall fashion, with lengthy lead times to delivery, shifting to agile after a few years as the need for a faster software delivery cycle grew. DevOps arrived as a ‘big bang’ in 2015, allowing software features to be written and released faster.

But that’s enough of the history!

 

To most people DevOps means Continuous Integration and Continuous Delivery (CI/CD), and that is indeed what drives DevOps in an organisation when it is established properly. But there are other elements required for the DevOps journey to become sustainable and cohesive within an IT organisation in the longer run. We can categorise these into three key areas:

  1. Processes
  2. People
  3. Culture

These three key elements must maintain a healthy environment for CI/CD to run effectively and deliver business value to the customers of our software applications.

Processes must be simple, supportive, effective and geared around agile planning and delivery to dynamically prioritise and manoeuvre software development towards fulfilling customer demands.

In addition, people must be agile and customer focused, with complementary skillsets to deliver real value to the customer while achieving rapid delivery.

The organisation, business units and team culture should support this by maintaining a healthy environment with boundary-free communications and the freedom and empowerment to innovate. Building teams, recognising their excellence and providing strong, supportive leadership is crucial.

DevOps teams act as one to achieve a common goal: to deliver fully functional software and features to the customer, faster!

With organisational processes designed with this objective in mind and teams built and led to succeed, this concept and the IT culture behind it becomes an enabler for all, making CI/CD effective in:

  1. Planning and prioritising the business requirements to be implemented
  2. Implementing business features in an agile manner
  3. Automatically testing the new features and resolving any issues
  4. Automatic deployment based on quality gates being passed
  5. Monitoring and gathering live health statistics
  6. Reducing the number of issues in the live environment
  7. Feeding bottlenecks and bugs found back into the requirements and planning process

Below is a high-level list of capabilities that the CI/CD process should leverage:

  • Automated application deployment and promotion to next environments
  • Integrated automated testing in all deployment environments
  • Utilisation of test virtualisation (mock services and mock data) to shift-left testing
  • Orchestration of complex deployments for applications that span platforms (data, mobile, backend, middleware)
  • Management of slow-paced and fast-paced deployment dependencies (two-speed IT)
  • Definition of CI/CD pipelines as code (PaC)
  • Automated infrastructure management and provisioning with infrastructure as code (IaC) – see the sketch below
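As a hedged illustration of the last item, here is a minimal infrastructure-as-code sketch. It assumes an AWS environment (which the article itself does not mandate) and uses hypothetical stack, template and parameter names; the idea is simply that the environment is described in a version-controlled template and provisioned programmatically by the pipeline rather than by hand.

```python
import boto3

# Minimal IaC sketch: provision a test environment from a CloudFormation
# template that lives in version control alongside the application code.
# Region, file path, stack name and parameters are all hypothetical.
cloudformation = boto3.client("cloudformation", region_name="eu-west-1")

with open("infrastructure/app-environment.yaml") as f:
    template_body = f.read()

cloudformation.create_stack(
    StackName="app-test-environment",
    TemplateBody=template_body,
    Parameters=[{"ParameterKey": "EnvironmentName", "ParameterValue": "test"}],
    Capabilities=["CAPABILITY_NAMED_IAM"],  # needed if the template creates IAM roles
)

# Block until the environment is ready, so the next pipeline stage
# (automated testing) runs against freshly provisioned infrastructure.
cloudformation.get_waiter("stack_create_complete").wait(StackName="app-test-environment")
```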

In summary, this means that high-quality software which works in production can be delivered to the customer faster than before. The effectiveness of the delivered software can be validated quickly, with a rapid turnaround for any course corrections.

Anuradha Prasanna
Enterprise architect | Mitra Innovation

DORA Quadrant for DevOps

Anuradha Prasanna, Enterprise architect | Mitra Innovation
– Part of our Discover DevOps series

DevOps Research and Assessment (DORA) is a research programme that investigates the capabilities driving software delivery performance and stability in organisations across the globe. One of the useful analytical tools DORA provides is the DORA Quadrant, which determines the DevOps maturity of an organisation.

DORA DevOps Maturity Quadrant

The DORA Quadrant uses four key measurements:

1. Deployment Frequency

For the primary application or service that you work on, how often does your organisation deploy code to production or release it to end-users?

 

2. Lead Time for Changes

For the primary application or service that you work on, what is the lead time for changes, i.e. how long does it take to go from code being committed to code successfully running in production?

 

3. Time to Restore Service

For the primary application or service that you work on, how long does it generally take to restore service when an incident that impacts users occurs, i.e. an unplanned outage or service impairment?

 

4. Change Failure Rate

For the primary application or service that you work on, what percentage of changes to production or releases to users result in a service impairment or outage and subsequently require remediation, e.g. a hotfix, rollback, fix forward or patch?

 

DevOps Performance: measured by deployment frequency and lead time for changes

Application Stability: measured by time to restore service and change failure rate

 

So, based on the deployment frequency and the lead time for changes, we can determine how effectively DevOps is performing, and by looking at the time to restore and the change failure rate, we can understand the stability of the application. 
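As a hedged sketch of how these four measurements could be computed from your own release and incident records (the record format below is hypothetical, not something DORA prescribes):

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical release log over a 30-day period: one entry per production deployment.
deployments = [
    {"committed": datetime(2019, 6, 1, 9),  "deployed": datetime(2019, 6, 1, 15), "failed": False},
    {"committed": datetime(2019, 6, 3, 10), "deployed": datetime(2019, 6, 4, 11), "failed": True},
    {"committed": datetime(2019, 6, 5, 8),  "deployed": datetime(2019, 6, 5, 12), "failed": False},
]
# Hypothetical incident log: time from user-impacting incident to service restoration.
restore_times = [timedelta(hours=2), timedelta(minutes=45)]
period_days = 30

deployment_frequency = len(deployments) / period_days                                # deploys per day
lead_time_for_changes = median(d["deployed"] - d["committed"] for d in deployments)  # commit -> production
time_to_restore = median(restore_times)
change_failure_rate = sum(d["failed"] for d in deployments) / len(deployments)

print(f"Deployment frequency:  {deployment_frequency:.2f} per day")
print(f"Lead time for changes: {lead_time_for_changes}")
print(f"Time to restore:       {time_to_restore}")
print(f"Change failure rate:   {change_failure_rate:.0%}")
```

Tracked over time, these four numbers are what place a team in one band or another of the quadrant.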

You can also compare your application or organisation with the findings of DORA’s annual global studies.

You can see here a summary of the study DORA conducted in 2019:

Source: DORA Accelerate State of DevOps 2019 report

By comparing your application/team/organisation’s rating to these industry benchmarks, you can get a clear understanding of how you are performing.

If you have multiple applications to assess, you can measure them individually and derive an average for the final quadrant.

In summary, the key measurements above allow you to apply industry recognised metrics to your application development and deployment, assessing the quality of your application code and the maturity of your DevOps process.

This is a great tool for any organisation to assess current development and deployment processes and track performance against set internal targets and those of the industry.

 

Get in touch with us

To learn how Mitra Innovation Experts can help you, call us for a free consult at
0203 908 1977 or email us at innovate@mitrai.com. We look forward to hearing from you.

Cellcard choose Mitra


Mitra Innovation, a fast-growing UK technology solutions provider specialising in digital transformation, cloud enablement and software development, has been chosen as the technology partner of Cambodia’s fastest, fully integrated communications company, Cellcard. The move is set to benefit Cellcard’s three-million-plus subscriber base in the Kingdom of Cambodia.

Cellcard offers a full range of mobile communications and entertainment services for consumers, with a wide range of service innovations for voice, messaging, international roaming, wireless broadband and value-added services. Cellcard wanted to completely revamp its ‘Cellcard Self Service App’ to enhance how its customers communicate and to provide the best possible speeds. It needed a development team to take on this task, and Mitra’s agile development team fit the bill.

Mitra’s expertise in the scrum framework, combined with their team’s natural abilities as great communicators and problem solvers provided for a very collaborative and well-coordinated team, enabling the best possible outcome for the revamp, and for every new release moving forward.

“This is a major update of the App,” said Kevin Speering, Project Manager at Mitra Innovation. “The App was re-written from the ground up, with new architecture, and developed by Mitra’s in-house developers, introducing exciting new functionalities. Users can now subscribe to various new services and add-ons with a single swipe. Localisation was implemented in three languages: Khmer, English and Chinese. For added security, biometric authentication (fingerprint and Face ID) was implemented. Integrations with third parties enable entertainment services such as watching movies, TV and streaming music, along with a loyalty program that rewards users.”

Thilina Herath, Software Architect at Mitra Innovation, said, “The App was imperative for Cellcard to penetrate the market quickly and stay competitive. We implemented a hybrid app development approach using the React Native framework for both the iOS and Android apps. We used AWS serverless architecture to build all backend components, with numerous integration points to internal and external systems, enabling a seamless user experience for customers.”

Thilina also said, “The technologies we utilised made this task a lot easier. They reduced time to market and provided an elegant platform on which our development teams can truly innovate. The iOS and Android apps were delivered within a two-month development cycle, with 20+ external API endpoint integrations. The App was launched on Google Play, receiving a 4.3 out of 5 rating, and gained over 100,000 users in 4 weeks, with more than 1,000 concurrent users, most of them using it daily.”

Dammika Ganegama, Co-Founder and Managing Director of Mitra Innovation, said, “Mitra’s relationship with Cellcard has been an opportunity where progressive thinking and persistent effort have helped in the reimagining of business in the digital age. I congratulate Cellcard on their success in delivering the fastest and most convenient mobile customer experience in Cambodia today.”

Also published on Daily FT

Fishing in data lakes to derive valuable business insights

According to EMC’s research, the amount of data in our digital universe is expected to grow from 4.4 trillion GB in 2013 to 44 trillion GB by 2020. This ever-increasing pool of invaluable data will give rise to more opportunities for organisations to understand customer experience, derive actionable insights and generate higher value from vast quantities of accumulated data.

To cater to ever-increasing needs for data storage, enterprise data storage facilities have undergone a technological shift – from data warehouses to data lakes – which proves attractive to organisations because of the increased computing power and data storage capacity on offer.

With the explosion of the concept of data lakes during the past five years, it is important to carefully observe how enterprises are going to store data from disparate data sources, and also ensure the same data storage facilities will not end up as data dumping grounds that lead to siloing of data.

An interesting feature of data lakes is that they act as reservoirs of valuable data for enterprises, enabling rapid ingestion of data in its native format even before data structures and business requirements have been defined for its use.

The value generated for enterprises lies in having access to this vast amount of data from disparate sources, the ability to discover insights, the visualisation of large volumes of data and, importantly, the ability to perform analytics on top of this data. All of this used to be far more complicated – both acquiring such vast quantities of data and performing the functions mentioned above.

It should be noted that analytics cannot be derived from raw data alone. Data first needs to be integrated, cleansed and transformed, and to have its metadata managed and properly governed. In this way, data lakes can be harnessed to bring data from disparate sources and in diverse formats under control and correlate it – resulting in business insights that increase value to market.

Praedictio Data Lake, engineered by Mitra Innovation, provides a competitive advantage with its comprehensive data lake solution consisting of data cataloguing, visualisation of data in the data lake, ETL (extract, transform and load) and data governance. The solution is built on AWS technologies: S3 for data storage, AWS Glue for data cataloguing and ETL, and Amazon QuickSight for insight generation.
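As a hedged sketch of what the cataloguing step can look like on those AWS building blocks (this is illustrative, not Praedictio’s actual code; the bucket, IAM role and database names are hypothetical), a Glue crawler can be pointed at raw data landing in S3 so that it becomes searchable and queryable:

```python
import boto3

# Minimal sketch: catalogue a raw S3 data set with AWS Glue.
# Bucket, IAM role and database names are hypothetical.
glue = boto3.client("glue", region_name="eu-west-1")

glue.create_database(DatabaseInput={"Name": "sales_raw"})

glue.create_crawler(
    Name="sales-raw-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="sales_raw",
    Targets={"S3Targets": [{"Path": "s3://example-data-lake/raw/sales/"}]},
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",      # track schema drift over time
        "DeleteBehavior": "DEPRECATE_IN_DATABASE",
    },
)

glue.start_crawler(Name="sales-raw-crawler")  # populates tables in the Data Catalog
```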

Praedictio leverages the inherent advantages of AWS analytics capabilities: no server management (AWS takes care of the heavy lifting of server deployment and migration), a pay-as-you-use model with no upfront or annual fees, and scalability from a few users to tens of thousands of users. The most attractive feature of QuickSight is SPICE (Super-fast, Parallel, In-memory Calculation Engine), which lets users run interactive queries over large data sets and receive rapid responses.

In addition to supporting analytics on historical data, Praedictio also delivers predictive analytics using machine learning technologies.

Machine learning shortens the time it takes to surface insights. By leveraging AWS machine learning capabilities, Praedictio lowers the barriers to using machine learning for developers and provides easy access to trained models.
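As a hedged, simplified sketch of what predictive analytics on lake data can mean in practice (Praedictio itself builds on AWS machine learning services; the file, column names and model below are purely illustrative), a churn model might be trained on features exported from the lake:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical feature set exported from the data lake (e.g. via an ETL job).
# Assumed columns: monthly_usage, tenure_months, churned (0/1).
df = pd.read_csv("churn_features.csv")

X = df[["monthly_usage", "tenure_months"]]
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression().fit(X_train, y_train)
print("Hold-out AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```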

Nevertheless, it is worth keeping in mind the importance of data management within a data lake. If data is not properly catalogued and governed, the opportunity for deriving business insights is far smaller.

Follow us as we explore the newest frontiers in ICT innovation and apply such technologies to solving real-world problems faced by enterprises, organisations and individuals!

Data Lake Governance

A data lake is a central storage facility that houses an organisation’s structured, unstructured and semi-structured data. In most cases, the data that is ingested will be strewn all over. As a data lake accumulates data over the years, this can lead to ‘data swamping’, where users no longer know where their data is stored or what transformations were applied to the data that was ingested. Such a situation leaves data lying in isolation, defeating the whole point of storing it.

This is where data lake governance comes into play. Data governance is a pre-defined data management process that an organisation implements to ensure that high-quality data is available throughout the whole data life cycle. However, there is a void in semantic consistency and governance of metadata in current implementations of data lake solutions (Gartner, 2017).

There are a number of benefits to implementing data governance within a data lake:

  • Traceability – helps understand the entire life cycle of the data residing in the data lake (this also includes metadata and lineage visibility)
  • Ownership – helps organisations to identify data owners should there be questions about the validity of data
  • Visibility –  helps data scientists swiftly and easily recognise and access the data they are looking for, amidst large volumes of structured, semi structured and unstructured data
  • Monitored health – helps ensure that data in the data lake adheres to pre-defined governance standards
  • Intuitive data search – helps users to find and ‘shop’ for data in one central location, using familiar business terms and filters that narrow results to isolate the right data.

Praedictio Data Lake

Praedictio, an Amazon Web Services powered data lake solution developed by Mitra Innovation, offers all of the business benefits discussed above. One of the key attractions of the Praedictio Data Lake lies in its visualisation component, which features a powerful three-fold visualisation of the data lake, as follows:

– Data lineage visualisation

– Source and destination visualisation

– Graph visualisation of data in the data lake

Furthermore, Praedictio Data Lake is equipped with a dashboard component which delivers visualisation of the health of the data lake to users, along with an alerting mechanism when pre-defined thresholds are met.

Another key feature of a data lake is the ability to catalogue data, based on the metadata relating to the data residing in the lake. This helps users easily search for the data they need and determine which data is fit to use – and which needs to be discarded because it is incomplete or irrelevant to the analysis at hand. The catalogue also tracks schema changes in the underlying data over time.
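As a hedged sketch of how users (or their tools) can interrogate such a catalogue (here the AWS Glue Data Catalog that Praedictio builds on; database and table names are hypothetical):

```python
import boto3

glue = boto3.client("glue", region_name="eu-west-1")

# Find catalogued tables in a hypothetical 'sales_raw' database whose
# names match a familiar business pattern, and list their columns.
tables = glue.get_tables(DatabaseName="sales_raw", Expression="orders_*")
for table in tables["TableList"]:
    columns = [c["Name"] for c in table["StorageDescriptor"]["Columns"]]
    print(table["Name"], columns)

# Inspect earlier versions of one table to see how its schema changed over time.
versions = glue.get_table_versions(DatabaseName="sales_raw", TableName="orders_2020")
for v in versions["TableVersions"]:
    cols = v["Table"]["StorageDescriptor"]["Columns"]
    print("version", v["VersionId"], "->", len(cols), "columns")
```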

Key Takeaway

Data lakes store data in its native format; the data structure and requirements are not defined until the data is needed. Data in its native, raw form cannot, by itself, be used to derive the business insights that give a competitive edge. This makes it important that an organisation adds policy-driven processes which give context to the underlying data, so it can be used more efficiently and effectively by stakeholders.

Hence, it is evident that data governance policies and data cataloguing are of great importance for generating higher value, producing actionable insights and informed decisions, and eliminating the current drawbacks of data silos in data lakes.

Follow us as we explore the newest frontiers in ICT innovation and apply such technologies to solving real-world problems faced by enterprises, organisations and individuals.

7 best practices to follow when designing scalable IoT architecture using AWS IoT

Internet of Things (IoT) systems handle billions of source devices and trillions of data points flowing into a central platform. The growth of data is massive, and IoT systems must scale in order to manage this explosion of data.

Key functions such as inbound data collection within a central platform, real-time data analytics, scalable storage and offline analytics all need to scale seamlessly for an IoT solution to scale successfully. There are IoT platforms that help manage requirements such as high scalability and security, relieving business owners of the task of managing these challenges themselves.

The architecture of an IoT solution largely depends on the requirements of the given system, the load and data involved, and the IoT platform in use. Today we discuss a few of the best practices we follow to achieve the desired scalability of a solution using AWS IoT.

(Video: What is AWS IoT – take a first look at AWS IoT)

1. Design to operate at scale reliably from day one

As we saw, an IoT system is expected to deal with high-velocity and high-volume data captured by device sensors. The flood of incoming data might arise because of sudden growth in business, consistent growth over time, or due to a malicious attack. In any case, the system should be geared to face this from day one.

While taking care of the scalability aspects, the IoT system should also make sure that the data received is reliably processed. The best approach is to queue or buffer data as soon as it enters the system to ensure reliability.
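As a hedged sketch of that buffering step (stream name, region and payload shape are hypothetical), incoming readings can be written to a durable stream the moment they arrive, so downstream consumers can process them reliably and replay them after a failure:

```python
import json
import boto3

kinesis = boto3.client("kinesis", region_name="eu-west-1")

reading = {"device_id": "sensor-001", "temperature": 21.7, "ts": 1561975200}

# Buffer the reading in a durable stream as soon as it enters the system.
kinesis.put_record(
    StreamName="iot-ingest-buffer",
    Data=json.dumps(reading).encode("utf-8"),
    PartitionKey=reading["device_id"],  # keeps each device's readings ordered
)
```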

2. Use AWS IoT Rules to route large data volumes into the rest of AWS

Consuming data from device topics directly with a single instance (and no fault tolerance) prevents a system from reaching its full scalability potential, and also limits its availability. The AWS IoT Rules Engine, by contrast, is designed to connect to endpoints outside AWS IoT Core in a scalable way.

Additionally, the AWS IoT Rules Engine can trigger multiple different actions in parallel once data is captured by the IoT system, giving the system the ability to fork data into multiple datastores (receiving systems) simultaneously. If a receiving system is not designed to act as a single point of entry, the data must be buffered or queued in a reliable system before being sent to the target systems, so that the solution can recover in the event of a downstream failure.
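As a hedged sketch of such a fan-out rule (topic filter, ARNs, stream and bucket names are all hypothetical), a single topic rule can forward each telemetry message to a Kinesis stream for real-time processing and to S3 for offline analytics in parallel:

```python
import boto3

iot = boto3.client("iot", region_name="eu-west-1")

iot.create_topic_rule(
    ruleName="fork_sensor_telemetry",
    topicRulePayload={
        "sql": "SELECT * FROM 'devices/+/telemetry'",
        "ruleDisabled": False,
        "actions": [
            {   # real-time path: buffer into a stream for immediate processing
                "kinesis": {
                    "roleArn": "arn:aws:iam::123456789012:role/IotToKinesisRole",
                    "streamName": "iot-ingest-buffer",
                    "partitionKey": "${topic(2)}",  # device id segment of the topic
                }
            },
            {   # offline path: land raw messages in the data lake bucket
                "s3": {
                    "roleArn": "arn:aws:iam::123456789012:role/IotToS3Role",
                    "bucketName": "example-iot-raw-data",
                    "key": "telemetry/${topic(2)}/${timestamp()}.json",
                }
            },
        ],
    },
)
```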

3. Invest in automated device provisioning early on

As a successful business grows, the number of devices connecting to the IoT system is likely to increase as well. This makes manual processes such as device provisioning, software bootstrapping, security configuration, device registration and upgrades infeasible. Minimising human interaction in the initialisation process is therefore important to save time and increase efficiency.

Designing built-in device capabilities for automated provisioning, and leveraging the tools AWS provides for device provisioning and management, allows systems to achieve the desired operational efficiency with minimal human intervention.
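As a hedged sketch of programmatic provisioning (thing and policy names are hypothetical, and the IoT policy is assumed to exist already), a device can be registered, issued a certificate and granted access without any manual console steps:

```python
import boto3

iot = boto3.client("iot", region_name="eu-west-1")

thing_name = "sensor-001"
iot.create_thing(thingName=thing_name)

# Issue an active X.509 certificate for the device.
cert = iot.create_keys_and_certificate(setAsActive=True)

# Grant the device permissions and bind the certificate to the thing.
iot.attach_policy(policyName="SensorTelemetryPolicy", target=cert["certificateArn"])
iot.attach_thing_principal(thingName=thing_name, principal=cert["certificateArn"])

# The returned keys/certificate would be injected into the device at
# manufacture or first boot via a secure bootstrapping channel.
print("Provisioned", thing_name, "with certificate", cert["certificateId"])
```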

4. Manage IoT data pipelines

The tremendous amount of data captured as part of an IoT solution needs to go through data processing channels to turn it into information. This processing phase may involve many systems and components. The paths that data travels through are referred to as data pipelines, and they have to be designed to handle huge loads without compromising performance.

In addition, architects should keep in mind that not all data necessarily requires the full processing power the system provides. During the design phase, architects should determine the appropriate data pipeline for each type of data.

5. Adopt scalable architecture for custom components

Adopt scalable architecture for custom components added to the solution, and ensure that these components do not turn into performance bottlenecks that in turn affect the entire solution. Additionally, these components must be designed to accommodate system expansion easily.

Adopting a microservice-like architecture not only embeds scalability within the system, but also provides the flexibility to replace components that could affect performance.

6. Adopt multiple data storage technologies

IoT systems deal with high-volume, high-velocity data of many varieties. A single IoT system might not be able to store all of this data efficiently in one type of datastore. Architects should choose the most appropriate datastores and supporting technologies for each type of data, achieving the desired capacity and scalability of the system by increasing efficiency and throughput.

7. Filter data before processing

Not all of the data directed towards an IoT system requires processing in real time, and such data can be filtered out. Filtering can be performed at the edge using AWS IoT Greengrass. Architects can identify data that the system does not need immediately and design the system to accept it in chunks, when the cloud platform asks for it. This way, system capacity is used optimally, leaving more room for scaling.
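As a hedged sketch of the kind of filtering logic that could run at the edge (for example inside a Greengrass-deployed function; the threshold, payload shape and batching rule are hypothetical):

```python
from typing import Dict, List, Tuple

def split_readings(readings: List[Dict]) -> Tuple[List[Dict], List[Dict]]:
    """Separate readings that need real-time handling from those that can be
    batched and uploaded later, when the cloud platform asks for them."""
    urgent, deferred = [], []
    for reading in readings:
        if reading["temperature"] > 80.0:   # anomaly: forward to the cloud immediately
            urgent.append(reading)
        else:                               # routine reading: hold for batched upload
            deferred.append(reading)
    return urgent, deferred

readings = [
    {"device_id": "sensor-001", "temperature": 21.7},
    {"device_id": "sensor-001", "temperature": 93.2},
]
urgent, deferred = split_readings(readings)
print(len(urgent), "urgent,", len(deferred), "deferred")
```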

Conclusion

In the coming years, IoT is expected to be instrumental in managing exponential growth. In turn, as adoption of fully integrated IoT systems grows, the number of devices being added to these systems is also expected to grow exponentially.

Thus it is important to implement systems that can scale when required, whilst also ensuring reliability and security. The AWS IoT platform is a cloud IoT platform that offers all the qualities a modern architecture may demand.

Implementing a scalable and reliable system from day one saves cost and effort for architects and business owners.

Even though the AWS IoT platform is designed to scale automatically, solution architects should follow best practices such as these to ensure that system integrations are planned and implemented in a scalable manner – providing a comprehensive and scalable solution.

Thank you for reading our latest Mitra Innovation blog post. We hope you found these lessons from our own experience interesting and that you will continue to visit us for more articles in the field of computer science. To read more about our work, please feel free to visit our blog.

Sangeetha Navaratnam
Software Architect | Mitra Innovation


CI-CD Automation with Jenkins

Software teams face a dilemma when they move away from traditional waterfall methods to agile methods of software development. With waterfall methods, software teams build entire lists of system components, integrate them and ensure that the components are well tested, largely in sequential phases.

Even though waterfall methods rely on manual processes, they can be handled with less complexity because of the lower rate of iterative cycles.

With agile methods in play, teams follow a continuous ‘build, integrate and test’ roadmap to continuously expand system functionality, instead of simply building components separately and assembling them together at the end.

Understanding Continuous Integration (CI) –

Multiple engineers work to develop new systems one component at a time. Every time a new component or feature is completed, the finished component will be added to the existing code. This is what is referred to as a ‘build’.

For instance, consider the following example:

A movie production team creates an initial video clip and completes the whole sequence by continuously adding new frames. Every time a new sequence of frames is added, the producers play the movie from the beginning to check whether the whole movie still makes sense. If it does, then we can safely say we have a ‘green build’.

Now, let’s say an engineer adds a new piece of code to the green build. However, when the test is run (just like re-running the whole movie sequence), it turns out that the new component doesn’t fit in very well. The component doesn’t integrate and thus results in a ‘red build’. Now, the engineer who created that particular faulty piece of code has to fix it.

Previously engineers wouldn’t have wanted everyone to know that a faulty piece of code had been added to the system build. Today, however, it is the opposite.

Thanks to Continuous Integration practices the development team is informed as soon as a faulty piece of code is added to the build. Developers make use of ‘red’ and ‘green lights’ to maintain build status visibility across the team.

A red light indicates that no new piece of code is to be added until a green light is indicated.

 

In the case of the above example, let’s assume the team consists of 10 developers, each adding or changing code around 50 times a day. This adds up to nearly 500 builds per day, per project, and includes the following activities for each day the project is in development:

  • 500 rounds of source code downloads
  • 500 rounds of source code compilations
  • 500 rounds of artefact deployment
  • 500 rounds of testing (i.e unit testing, regression testing)
  • 500 rounds of build pass/fail notifications

This is the point where automation is called into action.

 

Understanding Continuous Deployment/Continuous Delivery (CD) –

CD is an abbreviation for both Continuous Deployment and Continuous Delivery. It must be emphasised that the two expansions of CD are distinct and do not mean the same thing.

Allow me to elaborate with the following example:

Consider two types of television programme: the news and a cookery show. Unlike the cookery show, the news programme has to go through a considerable number of checks prior to broadcast. Similarly, some software domains allow the freedom to release directly into production environments, whereas others require a business decision (prior approval) before a release can proceed into the production environment.

 

Continuous deployment and continuous delivery both mean automating deployment and testing of a software on a regular basis to a production-like environment. However, the differences are as follows:

Continuous Deployment  —  A process which is fully automated and includes production deployments (no human interaction after code commit).

Continuous Delivery  —  An almost fully automated process where production deployments occur with a business decision.

 

When following CI-CD methodologies, a pipeline typically breaks the software delivery process into various stages, with the last stage providing feedback to the development team. Even though there isn’t a defined standard for creating a pipeline, we can identify the following stages:

  • Build automation
  • Continuous integration
  • Deployment automation
  • Test automation

A project may have different pipelines or even different types of pipelines for different purposes.
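To make the stages concrete, here is a toy, tool-agnostic sketch (not the API of any particular CI server): a pipeline modelled as an ordered list of named stages, where the run stops at the first failure and reports the red/green status discussed earlier. The placeholder shell commands stand in for real build, test and deployment steps.

```python
import subprocess
from typing import List, Tuple

Stage = Tuple[str, str]  # (stage name, shell command)

PIPELINE: List[Stage] = [
    ("build",  "echo compiling sources..."),        # placeholder commands; a real
    ("test",   "echo running automated tests..."),  # pipeline would invoke Maven,
    ("deploy", "echo deploying artefact..."),       # Gradle, test suites, etc.
]

def run_pipeline(stages: List[Stage]) -> bool:
    for name, command in stages:
        result = subprocess.run(command, shell=True)
        if result.returncode != 0:
            print(f"RED build: stage '{name}' failed")
            return False
        print(f"stage '{name}' passed")
    print("GREEN build")
    return True

if __name__ == "__main__":
    run_pipeline(PIPELINE)
```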

Let’s take a look at our options.

This leaves us with a requirement for a powerful tool which is capable of the following, using pipelines:

  • Automate code builds
  • Build artefacts
  • Perform deployments
  • Run tests
  • Provision above into multiple environments (Dev, SIT, UAT/ Prep/ Prod)

This leads us to several different engines such as:

  • CircleCI
  • Eclipse Hudson
  • GitLab CI
  • JetBrains TeamCity
  • ThoughtWorks GoCD
  • ThoughtWorks Snap
  • Jenkins

Jenkins –

Jenkins is an open-source CI-CD tool written in the Java programming language. Since Java is platform independent, Jenkins inherits that quality and will run on most platforms available today.

Jenkins began as a fork of the original Hudson project and has evolved since day one. Today it is among the top contenders in DevOps tool-chains because it is free, open source and modular.

Even though the functionality of Jenkins straight out of the box is very limited, there are more than 1,000 plugins which extend its capabilities far beyond most commercial or FOSS tools.

(Jenkins interface)

Coming back to the original idea of choosing a CI-CD tool for our software projects, here are some ways in which Jenkins can match our requirements:

  • Automate code checkouts from repository
    • Supports almost all of the existing source code repositories and can expect to support upcoming ones as well (i.e: Mercurial, Subversion, Git, CVS, Perforce, Bzr, Gerrit, Monotone, Darcs, etc.)
  • Automate the build
    • Supports most of the build automation tools available (i.e: Command-line, Maven, Ant, Gradle, etc.)
  • Every commit should build on an integration machine
    • Supports by polling as well as providing listeners to trigger builds by the SCM (i.e: Poll SCM feature, Git Hooks, etc.)
  • Make the build self-testing
    • Supports most of the unit testing tools/platforms through number of plugins (i.e: JUnit, NUnit, etc.)
  • Test in a clone of the production environment
    • Supports most of the test tools/platforms through number of plugins (i.e: JMeter, SoapUI, Selenium, Cucumber, etc.).
  • Make it easy for anyone to get the latest executable version
    • Supports by maintaining build execution history and through number of plugins (i.e Build history, Version Number Plugin, etc.).
  • Everyone can see what’s happening
    • Supports through the simplest easy-to-understand UI and build statuses (i.e: Green, Red, Gray build statuses, Blue Ocean project, etc.).
  • Automate deployment
    • Supports automation as out of the box product, build pipelines and variety of plugins (i.e: Cron Jobs, Build pipeline plugin, etc)

 

Conclusion –

It is hard to overemphasise the importance of orchestrating, automating and streamlining development processes. Having deployments themselves tested over and over provides almost complete confidence in the system being built.

CI-CD is central to DevOps, and a successful DevOps implementation can have implications that extend beyond IT to the business itself. Continuous improvement of software continuously improves products and services.