Abridged Version Published in August 2014 Peer to Peer, The Quarterly Magazine of ILTA (International Legal Technology Association)
Failing Faster to Succeed Quicker
A methodology to improve software/systems debugging and problem-solving in general.
Problem-solving is 90% of what IT professionals do. How do you, as a manager, help your staff (or, as a technical professional, help yourself) solve problems faster and better? There are formal software debugging methodologies, but for the purposes of this article I will explain the approach I have absorbed from them over my two decades as an IT professional and distilled into my own method. I am standing on the shoulders of the engineers and computer scientists who came before me. Being a better problem solver will set you up for success now and in the future.
Enough background, though. You want to know this “magical” method that will save time and create better software and systems:
Failing Faster – The Method
- Fail Faster – More attempts with less waiting
- Fail Smaller – Focused changes on each iteration
- Fail Smarter – Have a plan
- Build on Successes – How did you solve similar problems in the past? Use others’ experience
- Know When You Have Failed – Sometimes things don’t work. Know when to quit.
- Stop on Success – Know when you are done. Do not exceed your scope.
I thought about it, but I just could not change a method that works just to make a cool acronym.
Fail Faster

Are we talking about actually trying to fail? No. So what is failing faster? Simply put, it is trying things until you solve a problem. In technical parlance this is an iterative approach, with small enhancements or fixes in each iteration. It is the antithesis of “analysis paralysis.” I find the most success when I try something rather than sitting and trying to imagine an answer. You have plenty of time to think while compilers are running, servers are rebooting, or you are just waiting on results.
In software development we have the luxury of a “low cost of failure”: people don’t die and materials are not wasted if we just try something. If a programmer tries a piece of code and it does not do what is expected, he can just start over. The only cost is time, which is why I encourage this methodology. Often we have time but no other resources available.
This would be the point where I mention the “faster” part. We want to use less time. We may have time as our only resource, but what Legal Technologist has spare time? I have never met one.
If you fail 100 times in a day, you are going to get more right answers when you are done than if you fail 30 times in a day.
You never want to be repeating mistakes. Your goal is to make fewer mistakes on each iteration. Faster does not mean doing a bunch of things as fast as you can. It means don’t spend all that time staring at the ceiling imagining what you want to test; go make a change and test it. Be an action verb.
This famous quote applies here, “Doing the same thing and expecting different results is the definition of insanity.” You have to improve things incrementally each time.
Fail Smaller

Narrow the scope of the changes you are making. Your changes need to be precise like a surgeon’s scalpel, not massive like a blow from a giant war mace.
Trying to do too much is a sure path to complete failure. If you have 30 things to tweak that could be the cause of your problem, you would not change all 30 and then check your work. You make one change at a time, rolling the change back if it does not improve the outcome.
There is an old programmer’s joke that goes something like this: the compiler says you have 5 errors; you make a change and compile again; now you have 37 errors. This is not what we want.
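The one-change-at-a-time loop above can be sketched in Python. This is a minimal illustration, not a prescribed implementation; `config`, `candidate_tweaks`, and `passes_tests` are hypothetical stand-ins for whatever you are tuning and however you verify it:

```python
import copy

def improve(config, candidate_tweaks, passes_tests):
    """Try tweaks one at a time, keeping a change only if the
    tests still pass; otherwise it is rolled back."""
    best = copy.deepcopy(config)
    for tweak in candidate_tweaks:
        trial = copy.deepcopy(best)   # work on a copy so rollback is free
        tweak(trial)                  # apply exactly one focused change
        if passes_tests(trial):       # check your work before moving on
            best = trial              # keep the success
        # otherwise the trial copy is discarded: an automatic rollback
    return best
```

Working on a copy makes the rollback trivial; the equivalent in real life is a version-control branch or a configuration backup taken before each change.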
Fail Smarter

Don’t just thrash about. Have a plan for what you are trying to achieve. Making the program work is not an achievable goal; that is an aspiration. Making the data repopulate automatically as you change web pages is a goal. You will probably need to break that into smaller and smaller goals until you can try one, and fail at it until it works.
If your developers say they are trying to get the web page to load faster, ask them what they are working on and how they plan to achieve it. They should respond with a focused answer; if not, guide them to a single item. Again, faster web page loads is an aspiration, not a goal. Speeding up the query that loads the main data grid is a goal.
Build on Successes
If you are deliberate and careful you can recreate each success and improve what you are doing. There is also no reason to reinvent the wheel.
You may not know the reason for the odd behavior the testing team is reporting with the new application, but chances are, after a few years in IT, you have seen something like it.
We have Edward Gauss to thank for the “wolf fence” algorithm. It goes something like this: there is one wolf in Alaska; how do you find it? You build a fence across the middle of the state, listen for the wolf to howl, and determine which side of the fence it is on. Repeat the process on that side only until you can see the wolf. This is a bisection algorithm, one of the most important tools in debugging: you keep splitting the problem space in two until you have isolated the fault.
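The wolf fence hunt maps directly onto a binary search. Here is a minimal Python sketch (the names `wolf_fence` and `howls_on_left` are my own illustration); the same idea powers tools like `git bisect`:

```python
def wolf_fence(low, high, howls_on_left):
    """Bisection hunt over the index range [low, high].
    howls_on_left(mid) answers: is the fault at or before mid?
    Returns the single index where the 'wolf' (fault) lives."""
    while low < high:
        mid = (low + high) // 2    # build a fence in the middle
        if howls_on_left(mid):     # listen for the howl
            high = mid             # the wolf is on the left side
        else:
            low = mid + 1          # the wolf is on the right side
    return low                     # the fences have closed in on it
```

For example, if every build from some unknown commit onward is broken, `howls_on_left` is “does the test fail at this commit?”, and the loop finds the first bad commit in log2(n) checks instead of n.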
You know that seeing another user’s data when you make a data request is not due to a network error. It has to be some kind of global state that is not being confined to the user’s session. Where that problem lives is unknown, but you can lead your team in the right direction if they are stuck. Hint: it is probably not the database. Modern database systems are pretty robust; trust the database engine to do what it does. Your code is the new, untested part.
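As an illustration of that kind of bug, here is a hypothetical Python sketch contrasting a module-level global that leaks data across users with state keyed to the session. All the names here are invented for the example:

```python
# Buggy pattern: one module-level variable shared by every session.
current_user_rows = []   # global state -- whoever populates it first wins

def fetch_rows_buggy(user, db):
    global current_user_rows
    if not current_user_rows:          # "cache" check ignores who is asking
        current_user_rows = db[user]
    return current_user_rows           # later users get the first user's data

# Fixed pattern: state keyed by session, never shared between users.
def fetch_rows(user, db, session_cache):
    if user not in session_cache:      # each session gets its own slot
        session_cache[user] = db[user]
    return session_cache[user]
```

With two users, the buggy version hands the first user’s rows to everyone who follows; the fixed version scopes the cached rows to the session that requested them.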
Know When You Have Failed
There is a possibility that the system will never work. This can happen in several ways: the design is wrong (Microsoft Access is not the database to use for a customer-facing website running 1,000 transactions per minute); the code has a serious design flaw; or the application code is a hopeless pile of spaghetti that is impossible to trace, much less fix.
What do you do in these cases? In a word, punt. You have to just start over. It is never fun to tell the project sponsor that you cannot accomplish what they want. Beating your head metaphorically against the wall until you are unconscious is not the road to success.
This does not mean to just give up when something is hard, but after years of experience you will know when you or your team is just not going to make it happen. It is difficult to kill a project, especially one that someone has spent days, weeks, or even months working on, but sometimes it has to be done.
Stopping on Success
This seems simple, but in practice it can be hard. You have to stop when you have succeeded. If the specification says the result must be returned from the database in 5 seconds, then once your average return time is 4 seconds, you are done. Do not keep going to try to get it to 1 second.
It is tempting to “chase shiny objects.” My team calls them “squirrels,” after the easily distracted talking dog in the movie Up. Project managers know it as “gold plating.” It is all doing work that is not required.
People often berate technical teams for never calling things done, but no program is ever truly finished. As the saying goes, “Programmers will never stop unless you make them.”
Wrapping it all up
You have to try to succeed. Before you succeed you will fail. So you might as well fail faster so you can get to success sooner. Failing smaller gets you quick results. Failing smarter gets you better output from each attempt. Building on your successes from the past or successes of others is the way to use knowledge to avoid bad attempts. Knowing when you have failed is important; some things will just never work. The biggest thing you can do to avoid wasted time is to stop on success. Now go out there and fail faster so you can achieve success quicker.
Further Reading and References
- Algorithmic Program Debugging – Dr. Ehud Shapiro
- Why Programs Fail: A Guide to Systematic Debugging – Andreas Zeller
- “Pracniques: The ‘Wolf Fence’ Algorithm for Debugging” – Edward J. Gauss (Communications of the ACM, 1982)