Many people believe that Agile is a new concept, and something most developers would have to ramp up on. Surprisingly, Agile software development practices have been around for decades, and many senior developers are instinctively driven to use Agile techniques as they become proficient with best-practice software processes. This story will illustrate how we engaged in Agile development practices, even before the concepts and terms were widely socialised, and outside of the environments where Agile is typically used.
In my former life, I worked in software development, specifically on a suite of telecoms products in the 3G area that was live in networks around the globe. I was responsible for planning the release of software updates for several of these products at different times, and eventually for the overall planning of the complete 3G product suite.
We were approached by one of our 3G customers in the US with a novel problem. One of the handset vendors supplying this customer had a common set of flaws across all their 3G products, and our customer hoped we could address these flaws. To give some context, we had had a long-running dispute with this customer about a perceived fault in our 3G product: our “faulty” products were supposedly causing unacceptably high levels of dropped mobile calls for their users. After a lot of effort and investigation, we were able to show that the issues were almost exclusively related to the 3G hardware of the vendor in question and that the fault was with their products, not ours. After some detailed technical presentations to our customer, they accepted the results of the investigation.
The vendor had misunderstood certain aspects of the 3G standards, specifically parts of the radio interface between the handsets and the 3G network. Their misunderstanding was consistent across all their 3G products, which meant their entire range (smartphones, 3G broadband dongles, 3G chipsets used in laptops/tablets, etc.) exhibited the same flawed behaviour. The flaw manifested when a user moved from one cell/radio tower to another, either within a single network or when moving from one network to another run by a different provider (for example, sitting on a train and moving from a Vodafone network to one run by Telecom Iceland instead).
The result was that calls were disconnected when they should have been transferred seamlessly between towers/networks. This did not occur 100% of the time, but it significantly increased the number of disconnects users experienced and generated negative PR for our customer. Our customer contacted the vendor and explained the situation. The vendor denied they had misunderstood the 3G standard and insisted we were at fault, despite all evidence to the contrary.
When the customer investigated, they found that deploying over-the-air (OTA) updates to this supplier’s hardware was not a viable strategy. They had hoped to create updated software for the relevant hardware themselves (or via a contracted 3rd party) and push it out remotely, but the common update mechanism for the vendor’s entire product suite was almost entirely manual, involving sideloading software to the devices from a PC. Given the volume of affected devices, that route would mean a considerable PR hit for our customer, and the estimates for setting up a support organisation to handle it, plus replacing the inevitably bricked devices from failed upgrades/sideloads, were eye-watering.
We assessed the request and concluded that, from a technical perspective, it was possible. We could identify the relevant hardware via manufacturer/ID tags included in certain radio interface signals, and the faulty implementation revolved around a small number of specific events within the 3G standard, so creating a compatible radio interface for this specific vendor’s hardware was not a difficult feat. The real issues were more of a strategic/philosophical nature:
- Was it ethical to fix the mistakes made by an unconnected 3rd party without their consent or knowledge?
- Was it a once-in-a-lifetime situation, or did we want to explore this as a potential service offering?
- How would we ensure the solution did not break once the vendor corrected their faulty implementation, or when they introduced further mistakes?
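Purely as an illustration of the gating described above, the network-side selection logic might look something like the sketch below. Everything here is invented for the example: the class and flag names, the idea of a simple in-memory toggle, and the TAC values. (In 3G, a device’s IMEI is visible to the network in certain signalling procedures, and its leading digits, the Type Allocation Code, identify the manufacturer/model; a workaround like the one described could key off an identifier of that kind.)

```python
# Hypothetical sketch only -- real RAN software is far more involved.

# Made-up Type Allocation Codes standing in for the faulty vendor's devices.
AFFECTED_TACS = {"35123456", "35765432"}

class HandoverController:
    def __init__(self, workaround_enabled: bool = False):
        # Per the agreement: switchable on command, off by default.
        self.workaround_enabled = workaround_enabled

    def select_procedure(self, imei: str) -> str:
        """Choose the handover behaviour for a device based on its TAC."""
        tac = imei[:8]  # first 8 digits of the IMEI identify the model
        if self.workaround_enabled and tac in AFFECTED_TACS:
            # Use the alternative radio-interface behaviour that matches
            # the vendor's (non-standard) expectations for these events.
            return "vendor-compatible"
        # All other devices get the standards-compliant procedure.
        return "standard"

ctrl = HandoverController(workaround_enabled=True)
print(ctrl.select_procedure("3512345600123456"))  # affected device
print(ctrl.select_procedure("4901234500987654"))  # unaffected device
```

The key design point is that the workaround only alters behaviour for the narrow set of identified devices, and only when explicitly enabled, so every other handset continues to see the unmodified, standards-compliant interface.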
The conclusion was that this situation would serve as a test of how we could offer such a service if requested, but that it would not be published in any of our literature, as the viability of a solution would depend heavily on the technical aspects of each situation. In this case, the agreement with the customer was:
- We would provide a solution that could be turned on or off on command. The default was for it to be turned off.
- We would provide support for this specific solution for a number of months, but we would not provide an update if the hardware vendor changed their 3G radio interface implementation afterward.
- If that happened, the US customer would pay us to update our software again, assuming our investigation showed the vendor’s updated software behaved consistently across their full set of products.
A significant obstacle was testing the solution. We did not have any of this vendor’s hardware in our test environments, and it was not possible for us to acquire any due to import/export restrictions, nor for the US customer to provide us with samples due to their licensing agreements with the vendor. The solution we devised used an Agile process for development and testing, with live networks in the US as part of the test loop. The US customer was reluctant at first, but once we explained the reasons, they understood we were mitigating the risk to the best of our ability and providing a quick turnaround mechanism for any issues encountered.
Once agreed, we proceeded in sync with the US customer. The first step was carried out in parallel: we tested the software in our environments (with the feature both turned on and turned off) to ensure no impact on normal radio interface behaviour, while the US customer tested it in their test lab with a selection of the vendor’s equipment to show that the alternative radio interface worked. We were able to perform our usual regression tests, which included a lot of automated testing, while the customer was restricted to small-scale manual testing.
After a week of these tests, the next step was to roll out the new software in a live test network. The customer selected two areas for this limited deployment: one surrounded by more of the customer’s own networks (meaning any call handovers were to their own network and equipment), and one bordering another service provider’s networks (meaning handovers to another provider’s network and equipment). Both test areas were live, serving real users, and were in light urban areas. Both were isolated from the normal maintenance process (weekly fine-tuning of parameters, etc.) and were put under hyper-care by both the customer and us. We had agreed and arranged a weekly delivery cycle for any issues found, and both our test environment and the customer’s own test lab were on standby to test any updated software. A small number of issues were uncovered and were fixed in a single update.
After 2 weeks, the customer was happy with the results and deployed the software in a select network in a more densely populated urban area. Again, we had arranged a weekly delivery cycle for any issues found, and both our test environment and the customer’s test lab were on standby to test any updated software. No issues were found at this stage, but fine-tuning of certain parameters was found to improve performance. At the end of this stage, the customer was pleased with the overall results and proceeded to deploy the updated software across their entire networks at a much quicker pace than usual. The solution was not 100% perfect, but it eliminated over 90% of the issues within the US customer’s own networks and removed 30-60% of the issues at the borders with other providers’ networks. This meant our customer could show their users that they had improved their networks, and that the remaining problems lay with other service providers’ networks.
While it may seem strange to consider this process Agile in nature, and even more so to consider using Agile outside of a test environment, given the situation we were in we ran it in an Agile manner:
- We had development and testing tightly coupled together with short, direct feedback loops for any faults found.
- We ran each stage/sprint in as short a timeframe as possible and set specific goals for each test environment at each stage, so we were not duplicating testing and losing efficiency. We believed we could have run the live network trials for a shorter period (over a weekend, ideally one with a public holiday attached), but the US customer needed to show senior management that the solution worked, and they felt that 3-4 days was not a large enough sample size to support their case.
For us it was a great experiment in using the Agile development and test process in the maintenance organisation, outside of our usual spaces; testing in a live network on that scale was new to us. As we had a good deal of experience in Agile planning, development, testing, and release, we had a good understanding of the decision processes, fallback strategies, and contingencies for the issues that arose. This helped make the event a success and reassured the customer that we were not experimenting blindly with a live, revenue-generating network.
This kind of outside-the-usual-lines solution is exactly what we at Aspira like to consider when we work on a client’s problem. For anyone facing difficulties that need a novel solution, or who needs advice on testing, development, or deployment strategies, Aspira’s Software Services group is an excellent choice.
I have shown that although many people might think becoming Agile is a recent development, part of the “digital transformation” that many organisations are currently driving, in reality it is a way of developing software that stretches back multiple decades. Agile frameworks like Scrum are modern guides to processes long known to be well suited to best-practice software engineering. It might mean that your development team’s digital transformation journey will be a lot shorter and easier than you think … just ask some of your “old heads” and see if they can teach you about the “old way of doing things”!
Reach out to us today to learn more about Aspira’s Software Development Services.