• 28Nov

    David Worthington’s recent article in SD Times is based on research results from Forrester’s “Problem Resolution Survey Results and Analysis,” and makes for interesting reading. The article states that “the biggest time-sink in the application production life cycle [receives] the least regard from development managers.” The time-sink to which Worthington is referring? Investigating and resolving application problems.

    A couple of other gems from the article:

    “The respondents spend almost three out of every 10 hours (29 percent) in various stages of troubleshooting: documenting, reproducing or testing. On the average, a problem takes six days or more to resolve, and one in four of the problems reported by a QA or test group are returned as irreproducible.”

    “Of the time spent on defect resolution, 26 percent is spent reviewing information, 34 percent on reproducing the behavior, and the remaining 40 percent goes toward isolating the root cause of the problem.”

    Someone more cynical than me may wonder why there is no time left over to actually code and resolve the problem! Seriously though, these numbers reinforce the need to continue investigating different ways of building more robust code in the first place, meaning to detect possible bugs earlier in the development life-cycle and to implement a program of continual process improvement.

    The article does not divulge any specific methodologies these projects use. It would be interesting to know if any were using agile techniques such as incremental development or TDD (or even doing any unit testing - in our experience, most teams don’t).

    Surprisingly, only 66% of managers would be interested in a solution to these problems, even if “it created significant efficiencies and improved quality” (two somewhat subjective dependencies). This reflects a serious attitudinal problem: for the remaining 34% it smells to me like: “post deployment this is someone else’s problem.”

    By the way, these issues are not confined to niche areas: the findings were universal across verticals and enterprises.

  • 26Nov

    Last week, I started reading Beautiful Code, which is a wonderful collection of stories from various authors, including Tim Bray, Michael Feathers and Karl Fogel, with coding examples in many different languages, from Assembler and LISP to Java and Ruby.

    Often, when people use the term “beautiful code”, they talk about attributes such as structure, adaptiveness, naming conventions, decoupling and all the stylistic attributes that make the code pretty (easy on the eye). In the book, Adam Kolawa raises an additional issue which is worth discussing: results.

    For Kolawa, beautiful code means code that allows “…use and reuse without any shred of doubt in the code’s ability to deliver results…not what the code looks like but what I can do with it.”

    This is an interesting viewpoint; it is generally accepted that most of the cost of a software project is in the maintenance phase, reworking and bug fixing. So the primary focus of code quality efforts is often on readability, with the expectation that others will take over maintenance of the code later.

    Although “elegance” and “results” are related, there is a subtle difference: does the code do what I want it to do efficiently, and is it maintainable?

    We’ve all had to write ugly code in the past for different reasons. I recall my first programming job, working with Ada on a radar system where, to satisfy performance requirements, we changed all the case statements to if/else statements as they were a fraction faster and got us inside the timing requirements - the code “looked” worse - and was probably harder to maintain, but it did what it had to do.

    Even in these days of seemingly limitless memory and processing power, with some companies like Google utilizing billion element arrays, sometimes there is still a balance to be struck between elegance and results.

  • 14Nov

    In many organizations, the build process is still, perhaps surprisingly, an overlooked area. Many teams do just enough to compile and package an application, and not much more. A well-defined build process can add significantly more value.

    I am an advocate of a full build process. What do I mean by full? I mean that a build does the following:

    1. Gets the latest source code from the Repository system
    2. Compiles and runs unit tests
    3. Runs analytics and QA gates (at development level)
    4. Produces reports
    5. Informs the team (or at least the Build Manager) if any problems occur
    6. Publishes the application to a test server so the test team can get straight to work

    And, does all of this automatically, eliminating mundane, repetitive manual processes (which can, and often do, go wrong). The ultimate goal, of course, is Continuous Integration (CI), but let’s not get ahead of ourselves.
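
The steps above can be sketched as an ordered pipeline that stops at the first failing step and records a notification for the team. This is only an illustrative sketch: the `BuildPipeline` class, its method names and the step names are hypothetical, and in practice a build tool and CI server would do this work rather than hand-written Java.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a "full build": named steps run in order,
// and the first failure stops the build and records a notification.
class BuildPipeline {
    // LinkedHashMap keeps the steps in the order they were registered.
    private final Map<String, BooleanSupplier> steps = new LinkedHashMap<>();
    private final List<String> notifications = new ArrayList<>();

    // Registers a step; the action returns true on success.
    BuildPipeline step(String name, BooleanSupplier action) {
        steps.put(name, action);
        return this;
    }

    // Runs every step in order. Returns true only if all steps succeed.
    boolean run() {
        for (Map.Entry<String, BooleanSupplier> step : steps.entrySet()) {
            if (!step.getValue().getAsBoolean()) {
                // Stand-in for emailing the team or the build manager.
                notifications.add("Build failed at step: " + step.getKey());
                return false;
            }
        }
        return true;
    }

    List<String> notifications() {
        return notifications;
    }

    public static void main(String[] args) {
        // Each lambda is a placeholder for real work (checkout, compile, ...).
        BuildPipeline build = new BuildPipeline()
            .step("get latest source", () -> true)
            .step("compile and run unit tests", () -> true)
            .step("run analytics and QA gates", () -> true)
            .step("produce reports", () -> true)
            .step("publish to test server", () -> true);
        System.out.println("build succeeded: " + build.run());
    }
}
```

Stopping at the first failure means a broken compile never reaches the test server, and the notification names the step that needs attention.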

    By scheduling this process nightly (or even more frequently), the team is guaranteed to discover compilation errors that may not be present on their workspace. (The developer’s workspace and the build machine may not be in sync, and there may be other software that needs to be added to the build and test machines.)

    Also, unit tests can be run against the integrated code, again showing any issues that may not arise on a single developer’s machine. If a problem does occur, the system can email the build manager, who can then investigate and report back to the team what the problem is.

    Another huge benefit is the fact that the test team can walk in and get straight to work without the hassle of setting anything up and jumping over technical hurdles to get the application configured and working before they can start doing their job. I’ve seen examples where testers have to spend up to half a day trying to resolve these issues.

    By adding analytics and reporting (i.e. going beyond the minimum requirements), management can receive automated updates of the health of the project to be prepared for any meeting with the team. You can produce a lot of reports from different plug-ins which can provide great data for constructive feedback to the team and provide visibility into the project at different levels.

    Ant or Maven can be used to write scripts that compile the code, execute tests, produce reports and prepare the application to be copied to a test server, while CruiseControl, Hudson and Continuum are all free CI servers that can schedule and automate these tasks.

    If you are new to this, or feel that your build process is at a ‘bare minimum’, all this may seem like a daunting task. ‘Pragmatic Project Automation’ by Mike Clark spells out how to automate the build process in less than 150 pages and even shows how to use lava lamps to indicate whether the build succeeds or fails.

    CI introduces the concept that the build process is triggered every time a change to the code or a configuration file is committed to the version control system. The two greatest benefits of CI, in my opinion, are that (a) risk is further reduced (any new defect must, by definition, have been introduced by the last edit, and can be fixed straight away) and (b) you can produce deployable software at any time. ‘Continuous Integration – Improving Software Quality and Reducing Risks’ by Duvall, Matyas and Glover is a good book that explains this further.

  • 05Nov

    Recently, I was fortunate enough to catch Josh Bloch’s ‘Java Puzzlers Episode VI: The Phantom-Reference Menace/Attack of the Clone/Revenge of the Shift’. One puzzle in particular concerned the use of threading in JUnit testing. This is a great example of a problem developers may face. However, I believe it also points to a much more significant problem: false positives in feedback systems.

    Reproduced here (with the author’s permission) is the problem:

    Q. How Often Does This Test Pass?

    [Image: the JUnit test in question]

    The assumed answer may be a) or b) due to race conditions when the thread accesses the variable ‘number’. The actual answer is c): it always passes. The assertion runs in a background thread, separate from the one running the ‘test’ method. JUnit does not support concurrency and is unaware of any action or exception thrown in the background thread, so the test exits with success. The code to resolve the issue is included at the bottom of this post.
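
The underlying behaviour can be reproduced without JUnit. The sketch below is not Bloch’s puzzle code: the `BackgroundFailureDemo` class and the failure message are made up for illustration. It starts a thread that fails immediately, waits for it, and shows that the calling thread carries on normally; the failure is only visible because an uncaught-exception handler is installed explicitly.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo: a failure in a background thread never reaches
// the thread that started it, which is why the test "passes".
class BackgroundFailureDemo {

    // Starts a thread that fails at once, waits for it to finish, and
    // returns the Throwable that killed it (captured by the handler).
    static Throwable runAndCapture() {
        AtomicReference<Throwable> captured = new AtomicReference<>();
        Thread t = new Thread(() -> {
            // Stand-in for a failing assertEquals(2, number).
            throw new AssertionError("expected 2 but was 1");
        });
        // Without this handler the failure would only be printed to stderr.
        t.setUncaughtExceptionHandler((thread, failure) -> captured.set(failure));
        t.start();
        try {
            t.join(); // returns normally even though the thread died
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return captured.get();
    }

    public static void main(String[] args) {
        Throwable failure = runAndCapture();
        System.out.println("caller finished normally; background failure: " + failure);
    }
}
```

Nothing is thrown in the calling thread at any point, so a test runner that only watches that thread has no failure to report.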

    Once you know what the problem is, it’s not that difficult to resolve. But therein lies the problem. JUnit passes this test every time. A manual code review would, in most cases, probably not catch this and, in any case, how many passing tests are even reviewed? This test will pass, the code will be shipped and it’s not until a run time problem occurs that the problem is reported back and the scratching of heads occurs as all the unit tests pass. (OK – I’ve taken a bit of a liberty here, one would hope system testing would catch this but it is not always the case).

    The only really effective way to catch problems like these is through the use of static code analysis tools. FindBugs is one tool that can find and flag this particular problem. These tools are very good but not a panacea; of course some problems may still escape undetected. However, the extensibility of these code analysis tools ensures that, once discovered, rules covering these problematic areas can be added, enhancing the tool’s power and utility.

    (For those interested, take a look at Puzzler number 5 – ‘Mind the Gap’ (PDF link), in which java.io.InputStream.skip(gap_to_skip) returns any value between 0 and the parameter passed in. One would wrongly assume that the return value would be that of the next element after ‘gap_to_skip’. Bloch stated that in the source code of the JDK, there are 67 instances of this call, and in 56 cases the return value is not checked!)
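
One common way to guard against a short skip is to loop until the requested number of bytes has actually been skipped. The following is a minimal sketch of that idea, not code from the talk or the JDK; the `SkipFully` class and `skipFully` method are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper that checks the return value of InputStream.skip.
class SkipFully {

    // Skips exactly n bytes, or throws EOFException if the stream ends first.
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining); // may skip fewer bytes, even 0
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() == -1) {
                // skip made no progress and read confirms end of stream
                throw new EOFException(remaining + " bytes left to skip");
            } else {
                remaining--; // read() consumed one byte on our behalf
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {0, 1, 2, 3, 4, 5});
        skipFully(in, 4);
        System.out.println("next byte: " + in.read());
    }
}
```

The fallback to `read()` matters because a return value of 0 from `skip` does not by itself mean the end of the stream has been reached.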

    My take from all of this? Although false positives are a fact of life, we can and should use static analysis to find those cases that can be found, and we should be diligent about adding new rules, where possible, to cover problems that crop up, so that they don’t happen again.

    Solution:

    Bloch showed a couple of possible solutions to this problem:

    1. Create instance fields in the test to hold any exceptions generated in the background thread and set them in the thread’s run method.

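    // Set by the background thread, read by the test thread
    volatile Exception exception;
    volatile Error error;

    Thread t = new Thread(new Runnable() {
        public void run() {
            try {
                assertEquals(2, number);
            } catch (Error e) {
                // A failed JUnit assertion arrives here as an Error
                error = e;
            } catch (Exception e) {
                exception = e;
            }
        }
    });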

    2. Check those fields in a tearDown method.

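    // Triggers test case failure if any assertion failed in a background thread.
    // Note the capital D: JUnit only calls this method if it overrides tearDown().
    public void tearDown() throws Exception {
        if (error != null)
            throw error;
        if (exception != null)
            throw exception;
    }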

   
