How FinTech companies can achieve 100% uptime and reliability

The fintech industry has made a massive impact on the African ecosystem in recent years, and this has driven the overall adoption of technology tools for transaction processing.

Financial transactions have surged over the years, and many consumers have gradually shifted their transacting patterns from traditional in-branch banking to a more digital approach.

While several entrepreneurs have been able to hop on the fintech digitization bandwagon, the ability to scale and serve customers without major downtime has been a huge concern over the years.

Here are seven (the number of perfection) tips and recommendations for achieving maximum uptime and reliability:

People

Companies tend to go with the 300-Spartans approach when starting off, and they eventually go live and start scaling with that same small band. Unfortunately, this is not a three-day journey, nor are you fighting only 70,000 enemies that die only once. If a transaction fails today, it is retried almost immediately. Getting the right number of people with the right skill set and mentality cannot be over-emphasized; these individuals are the pillars of your organization and the experts who drive the bazookas and armor tanks you use to serve your customers.

Process
The Spartans had a systematic way of approaching their battles: they performed pre-battle rituals and cultivated a fierce-looking, well-trained appearance before the fight. Many tech companies today lack the right process from the ideation stage through to execution, and there are two parts to it:
1. Engineering: Take an engineering team in a gen-z fintech startup today: we often see the senior engineer playing the role of an enterprise architect and the product manager playing the role of a quality assurance engineer. This automatically flaws the building process, as short-sighted activities crowd out the grand-scheme goal of the product. What we then have is a product that has not gone through a rigorous development cycle and cannot stand the test of time. There are several standard engineering issues a product team should think about when building; these range from application scaling, database scaling, and API optimization to default configurations, unoptimized database tables, and a host of others. With the right engineering process, many of these issues will be addressed and tackled even before go-live.
2. Issue Resolution: The question I typically ask companies or product teams when they go through issue resolution is: did we learn from that, or did we just fix it and move on? I believe this is a pretty obvious aspect of process development. They say "what doesn't kill you makes you stronger", but if you look back, did the resolution of that issue make you stronger, or did it just create another grey area in your system that people will forget about over time? As an organization, when issues come you should find a solution in the moment to keep serving your customers, then create an incident report along with a newly implemented process that will prevent the issue from recurring. The process could be a new engineering task, a product feature, or simply a step-by-step guide of dos and don'ts for people to follow.

Monitoring
This is one aspect of achieving 100% uptime that is largely overlooked. Companies simply don't invest in monitoring; after all, why pay someone to look at a bunch of dashboards and screens all day and not "do" anything? That's simply "unrealistic" for some companies, who believe a monitoring team only makes sense for big organizations. However, what these companies fail to understand is that monitoring is one of the most important parts of a product. Some might argue that they have monitoring, but in reality what they have are "problem announcers". Monitoring is not about reacting to issues; it is about being proactive and sensing when a potential issue could occur. This can be achieved with a combination of tools tailored to your products and people who are well trained with the right skill set. A few tips on things to monitor:
1. Database Monitoring: Look out for slow and inefficient queries, DB table size growth, and resource utilization (a minimal sketch follows this list).
2. API Monitoring: Look out for the API calls that take the most time in a microservice and optimize them. There are several tools for this, e.g. New Relic, Dotcom-Monitor, Checkly, Uptrends, etc.
3. Third-Party Systems Monitoring: This is related to API monitoring, except that here you may not be able to install custom tools. You can, however, build metrics and checks into your application, such as success-rate and average-response-time tracking for the providers you are integrated with (see the sketches after this list).
4. CPU & Memory Utilization Monitoring: Sometimes the underlying infrastructure is the problem; even after optimizing the other components, you might still see the server struggling. Thankfully, compute and memory are abundant in the modern world and can be purchased from several sources to keep the application from struggling with resource utilization (a sketch follows this list).
5. Network Monitoring: Your VPN server can be the culprit, or even the firewall or router. They could be dropping packets or may have lost connection to the other entity's server. Monitoring this by configuring alerts for idle tunnels, increased load, or packet drops can help the team become proactive.
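
To make item 1 concrete, here is a minimal slow-query check. It assumes a PostgreSQL database with the pg_stat_statements extension enabled (column names as in PostgreSQL 13+) and the psycopg2 driver; the 500 ms threshold and the print-as-alert are hypothetical placeholders:

    import psycopg2  # assumed driver; any PostgreSQL client would do

    SLOW_MS = 500  # hypothetical threshold for a "slow" average query

    def report_slow_queries(dsn: str) -> None:
        # List the ten queries with the worst average execution time.
        with psycopg2.connect(dsn) as conn:
            with conn.cursor() as cur:
                cur.execute(
                    "SELECT query, calls, mean_exec_time "
                    "FROM pg_stat_statements "
                    "WHERE mean_exec_time > %s "
                    "ORDER BY mean_exec_time DESC LIMIT 10",
                    (SLOW_MS,),
                )
                for query, calls, mean_ms in cur.fetchall():
                    # A real monitor would alert the on-call engineer here.
                    print(f"SLOW ({mean_ms:.0f} ms avg, {calls} calls): {query[:80]}")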
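
For item 3, a sketch of in-application provider tracking using only the Python standard library; the window size and provider name are made up for the example:

    import time
    from collections import defaultdict, deque

    WINDOW = 100  # hypothetical: judge each provider on its last 100 calls

    class ProviderMetrics:
        # Rolling success rate and average response time per provider.
        def __init__(self):
            self._calls = defaultdict(lambda: deque(maxlen=WINDOW))

        def record(self, provider, ok, elapsed_s):
            self._calls[provider].append((ok, elapsed_s))

        def success_rate(self, provider):
            calls = self._calls[provider]
            return sum(ok for ok, _ in calls) / len(calls) if calls else 1.0

        def avg_response_time(self, provider):
            calls = self._calls[provider]
            return sum(t for _, t in calls) / len(calls) if calls else 0.0

    # Usage: wrap every outbound call and record its outcome.
    metrics = ProviderMetrics()
    start = time.monotonic()
    ok = True  # stands in for the real call to a payments provider
    metrics.record("provider-a", ok, time.monotonic() - start)
    print(metrics.success_rate("provider-a"), metrics.avg_response_time("provider-a"))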
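
And for item 4, a minimal resource check, assuming the third-party psutil package is installed; the thresholds and the print-as-alert are placeholders:

    import psutil  # assumed dependency: pip install psutil

    CPU_LIMIT = 85.0  # hypothetical alert thresholds, in percent
    MEM_LIMIT = 90.0

    def check_resources():
        cpu = psutil.cpu_percent(interval=1)   # sample CPU over one second
        mem = psutil.virtual_memory().percent  # current memory usage
        if cpu > CPU_LIMIT or mem > MEM_LIMIT:
            # A real monitor would page the on-call engineer here.
            print(f"ALERT: cpu={cpu:.0f}% mem={mem:.0f}%")

    check_resources()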

Distributed Processing
There are three main parts to distributed processing:
1. Applications: There's a reason the microservice architecture was invented, please use it! Having one monolith doesn't help your cause as an organization. In addition, know when to scale your applications: tools like Kubernetes and load balancers are available to help with auto-scaling, distributing the load across several instances of a microservice.
2. Databases: Know when to scale horizontally and vertically! Aside from backing up database tables and adding more resources, sometimes it's best simply to scale the DB horizontally to separate concerns and reduce the load on any one instance, much like the microservice approach.
3. Alternative Routes: Don't put all your eggs in one basket if you want to scale! The people you're connected to or relying on might not be willing or ready to grow as fast as you want to. Get several ACTIVE routes for a single responsibility. I repeat: ACTIVE, not passive. Use them all and spread the load across them; this will help you understand the strengths and weaknesses of your providers over time. Don't just use one main partner and wait for them to fail before you move to the next; use them all at the same time so that you don't create a bottleneck out of your providers (a sketch follows this list).
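
As an illustration of spreading load across several active routes, here is a minimal weighted-routing sketch using only the Python standard library; the provider names and weights are invented for the example:

    import random

    # Hypothetical active routes for one responsibility (e.g. card processing),
    # with weights reflecting how much traffic each should carry.
    ROUTES = {"provider-a": 0.5, "provider-b": 0.3, "provider-c": 0.2}

    def pick_route():
        # Pick a provider at random, in proportion to its weight.
        providers = list(ROUTES)
        weights = [ROUTES[p] for p in providers]
        return random.choices(providers, weights=weights, k=1)[0]

    # Every request goes through pick_route(), so all providers stay active
    # and their strengths and weaknesses show up in day-to-day traffic.
    print(pick_route())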

Automatic Failovers
This is similar to the alternative routes described above; however, instead of manually failing over from one provider to another, you build monitoring into your application to detect when providers are performing below their normal expectations and automatically switch to the next most optimal provider. This helps eliminate potential downtime from a provider, and the time needed to manually change a configuration from one provider to another disappears entirely, presenting a much more reliable system to your customers. The same can be done for network-layer connections over VPNs: have several VPN providers, and don't just assume Google or Azure can never go down; they are companies like you.
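
As a rough sketch of the idea, combining the weighted routing above with rolling success rates fed in by your monitoring; every name, weight, and threshold here is hypothetical:

    import random

    MIN_SUCCESS_RATE = 0.95  # hypothetical floor for a "healthy" provider

    # Hypothetical weights, plus live success rates fed in by your monitoring.
    ROUTES = {"provider-a": 0.5, "provider-b": 0.3, "provider-c": 0.2}
    success_rates = {"provider-a": 0.99, "provider-b": 0.91, "provider-c": 0.97}

    def pick_healthy_route():
        # Keep the configured weights, but skip degraded providers automatically.
        healthy = [p for p in ROUTES if success_rates.get(p, 1.0) >= MIN_SUCCESS_RATE]
        if not healthy:
            # Everyone is degraded: fall back to the least-bad provider.
            return max(ROUTES, key=lambda p: success_rates.get(p, 0.0))
        weights = [ROUTES[p] for p in healthy]
        return random.choices(healthy, weights=weights, k=1)[0]

    print(pick_healthy_route())  # provider-b is skipped at a 91% success rate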

Penetration Testing
Pen-testing your systems regularly cannot be over-emphasized: not just before go-live, but after go-live as well. There have been several cases of hackers performing DDoS attacks on systems that eventually led to downtime. Carrying out white-hat testing helps identify weaknesses in your system's security and ways to mitigate them.

Training
You cannot hire all the experts in the world, but you can train individuals to become experts!
Training your employees is a crucial aspect of achieving reliability. The ability to learn new technologies, or to learn more about the technologies already in use in the organization, helps employees deliver even more. Don't just rely on the skill you hired an employee for, as it might be obsolete in another year or two. A few tips:
1. Invest in online training courses for your employees, e.g. Pluralsight, Coursera, Udemy, and the like.
2. Carry out in-person training for your employees, either by sending them for training with the organizations whose tools you use or by inviting these experts to train your employees internally.
3. Carry out routine cybersecurity training to help build reliable and secure systems.

Achieving 100% uptime and reliability is not something you stumble upon; it requires a deliberate set of actions and consistent planning. With the seven tips above and the right set of people, processes, and tools, I believe you can achieve 100% uptime in little or no time.

Profile:

Solomon Amadi is the Vice President of Processing Infrastructure at TeamApt Limited, Nigeria's leading financial-inclusion digital technology provider, with an almost billion-dollar valuation.

He has 10+ years of experience in software engineering, and his expertise spans business strategy, transaction switching, engineering, product management, and software architecture.
