Wednesday 23 November 2022

Functional Library - Errors

The success path for any given application is an easy path to follow. Problems arise when errors need to be dealt with. I am of the mindset that exceptions are significantly different to errors. An exception is typically a showstopper: a null pointer exception, a stack-overflow exception, an out-of-memory exception. These types of exceptions usually result in a program crash, and graceful handling can be difficult.

The other end of the scale consists of errors that one can typically handle without detrimental effect to the application. Examples include incorrect input and failure to connect to a database, a socket, etc. The latter examples might be down to network failures, so retries may be possible. In contrast, one cannot perform a retry if an exception is raised (assuming the above exceptions).

Like I say, this is not easy! There are other issues. Exceptions create a stack trace, which is extremely useful for debugging. Manual error handling offers no such gifts and can make debugging much harder.

In this post...

  • Exceptions vs Manual Error Handling
  • Functions
  • Function Honesty

Exceptions vs Manual Error Handling

To avoid complexity, let's keep things simple. Exceptions denote exceptional events: null pointers, stack overflows, out of memory, etc. So, essentially, system errors. Typically, exceptions of the aforementioned kind result in program failure.

In contrast, errors denote problems that do NOT necessarily result in program failure. Examples and possible solutions follow...

  • Entering an invalid email address. In this scenario, one could simply prompt the user to enter a new (valid) email address.
  • Database connection failure. The database might be offline, or the network may be down. One could save transactional data to a file to be run later. This is a tricky problem and depends upon the database transaction that was about to occur. Context is the key; one needs to view the impact on the application.
  • Socket connection failure. The same techniques as for database failure might be applied. Retry strategies in both scenarios will be required (a sketch follows below).
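
A retry strategy can be as simple as a loop around the failing operation. What follows is a minimal sketch, not production code: the RetryHelper name and its parameters are illustrative, and it assumes the operation signals failure by throwing.

using System;
using System.Threading;

public static class RetryHelper
{
  // Runs the operation up to 'attempts' times, pausing between failed attempts.
  // The final attempt's exception is allowed to propagate to the caller.
  public static T Retry<T>(Func<T> operation, int attempts, TimeSpan delay)
  {
    for (int attempt = 1; ; attempt++)
    {
      try
      {
        return operation();
      }
      catch when (attempt < attempts)
      {
        Thread.Sleep(delay);
      }
    }
  }
}

One might call this as RetryHelper.Retry(() => OpenConnection(), 3, TimeSpan.FromSeconds(1)), where OpenConnection is a hypothetical operation that throws on failure.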

Functions

Functions generate results or perform an action. Typically, in OO solutions, actions do not yield a result and may, instead, throw an exception upon failure. Whilst throwing an exception might be the way forward, function signatures typically do not state that an exception may be thrown; one must read the supporting documentation to discover whether a function may throw. Throughout my career I have encountered code that calls a function and ignores any exceptions that might be thrown. This especially tends to be the case if said function can throw any number of exceptions.

I do not think exceptions are a bad thing, however, I do believe they are misused and a better mechanism is required.


Function Honesty

Consider the following function...

int Divide(int number, int divisor)
{
  return number / divisor;
}

Clearly, if the divisor is zero, a divide-by-zero exception will be thrown. Of course, the function signature does not convey this. How can we deal with this type of scenario? I believe there are three ways...

  • Return a sentinel value. In this case NaN (which would require changing the return type, as int has no NaN representation).
  • Throw a divide-by-zero exception.
  • Return a result which either contains a valid value or a reason as to why the function failed.

Analysing the above I would come to the following conclusions...

  • The NaN informs me that the function resulted in a non-number. It doesn't tell me why, but at least I know the function failed. In this simple example one could probably fathom that the function failed because the divisor was zero. Numerous libraries contain functions that use sentinel values whose meaning is not so obvious.
  • The exception informs me why the function failed. However, I might not be in the correct place to catch and respond to the exception. Also, the function does not state that it might throw an exception, so why would I be inclined to respond to exceptions?
  • The result solution, in this case, might be best. I can call the function and act accordingly upon the result. If the result denotes success, on my merry way I go. If the result denotes failure, I might retry, log the error and fail gracefully, or maybe try a different execution path. Best of all, the function signature indicates that the function may succeed or fail (a sketch of such a result type follows this list).
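
To make the third option concrete, here is a minimal sketch of what such a result type might look like. The Result<T> name and shape are illustrative rather than taken from any particular library...

public readonly struct Result<T>
{
  public bool IsSuccess { get; }
  public T Value { get; }
  public string Error { get; }

  private Result(bool isSuccess, T value, string error)
  {
    IsSuccess = isSuccess;
    Value = value;
    Error = error;
  }

  public static Result<T> Success(T value) => new Result<T>(true, value, null);
  public static Result<T> Failure(string error) => new Result<T>(false, default, error);
}

// Divide now states, in its signature, that it may fail.
Result<int> Divide(int number, int divisor)
{
  return divisor == 0
    ? Result<int>.Failure("divisor must not be zero")
    : Result<int>.Success(number / divisor);
}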

Now, consider the following function declaration...

int GetChar();

Most will understand that char is short for character. As such, one might assume that the function returns a character. However, this is clearly not the case, as the function returns an int (signed integer). The reason for this is that -1 is a sentinel value that indicates end of file. Of course, you will only know this if you read the documentation, and said documentation is both accurate and up to date. Personally, I think a better way to handle this situation is to use some kind of iterator. For example...

class ReadCharacters
{
  public bool IsEof { get; private set; }
  public char Current { get; private set; }

  // Advances to the next character.
  // Returns false (and sets IsEof) when GetChar reports end of file.
  public bool Next()
  {
    int ch = GetChar();
    if (ch == -1)
    {
      IsEof = true;
      return false;
    }
    Current = (char)ch;
    return true;
  }
}

I am not going to pretend that the above code is foolproof. However, I think it conveys more information than "GetChar" and is harder to misuse. Both are good properties that help produce code with fewer bugs.
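
As a usage sketch, assuming GetChar is implemented (for example, by wrapping Console.Read, which likewise returns -1 at the end of the input stream), consuming the class might look like this...

var reader = new ReadCharacters();
while (reader.Next())
{
  Console.Write(reader.Current);
}
// reader.IsEof is now true.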


Saturday 19 November 2022

Performance Testing

Often in software development one needs to determine which algorithm is more performant. The easiest way to do this, in C#, is to use the Stopwatch class. To ensure fairly accurate results, it is best to run each algorithm N times and take the average. Even so, results are not entirely accurate, as the first run will be cold; that is, the JIT compiler may not yet have compiled the code being measured. My own test results show that the first run tends to be slowest; subsequent runs (in the same session) tend to be much faster.

Regardless of the above, the timings are accurate enough to compare the relative performance of algorithms.
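
To illustrate the manual approach that the Benchmark class below automates, here is a minimal sketch, including a warm-up call so the JIT cost is kept out of the measurement. The Algorithm method is a hypothetical stand-in for the code under test...

using System;
using System.Diagnostics;

internal class Program
{
  static void Main()
  {
    const int runs = 10;

    Algorithm(); // Warm-up run, so the timed runs measure JIT-compiled code.

    Stopwatch stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < runs; i++)
      Algorithm();
    stopwatch.Stop();

    Console.WriteLine($"Total:   {stopwatch.Elapsed}");
    Console.WriteLine($"Average: {new TimeSpan(stopwatch.Elapsed.Ticks / runs)}");
  }

  // Hypothetical algorithm under test.
  static long Algorithm()
  {
    long sum = 0;
    for (int i = 0; i < 1_000_000; i++)
      sum += i;
    return sum;
  }
}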

In this post...

  • Goals
  • Analysis
  • Design
  • Source code

Goals

With any software project, no matter the size, some planning is always worthwhile. So, for this project I want the following...

  • Easy test specification, e.g. test name and a function pointer to the test.
  • Specify the number of times each test should be run.
  • Obtain the total time for all test runs and the average run time.
  • The ability to obtain test results as a string.

Analysis

Given the above requirements, the following can be stated...

  • Each test is named and timed.
  • Each test is run N times; from this one can deduce the total runtime and the average runtime.
  • One may run multiple tests, and each test may be run N times. From this one can deduce that an array of sorts will be required.

Design

Given the requirements and preliminary analysis we can proceed to perform some basic design.

Let's start with the idea of a test. A test simply consists of a name and a function pointer. Something like...

Test(string name, Func<object> func)

The function pointer expects the function to return an object. This leaves things open-ended and, more importantly, helps ensure that the test code survives release builds: an optimising compiler may eliminate code whose results are never used, so producing a value guards against the test body being optimised away.

Next we need a structure that holds the results of a single test run N times. We might call this TestGroupResult. The structure should maintain the test name, the total elapsed time (for the N runs) and the average time per run. Such a structure might be as follows...

TestGroupResult(string name, TimeSpan totalElapsedTime, TimeSpan averageElapsedTime)

For the next step we need a structure that maintains the entire result set, that is, all the test group results. Again, each test is run N times to obtain a total elapsed time and an average time; the result is a TestGroupResult. Let's name this new structure BenchmarkResult...

BenchmarkResult(int runsPerTest, TestGroupResult[] results)

Finally, we need an entry point, where we specify the tests to run and the number of times each test should be run. It makes sense to call this entry point Benchmark.

Benchmark.Run(int runsPerTest, params (string Name, Func<object> Code)[] tests)

To run a benchmark we might write the following code.

internal class Program
{
  static void Main(string[] args)
  {
    var result = Benchmark.Run(10,
      ("Test1", () => { Thread.Sleep(500); return 1; }),
      ("Test2", () => { Thread.Sleep(400); return 2; })
    );

    Console.WriteLine(result.GenerateReport());    
  }
}

Using the above code, the console output takes the following shape. The exact timings vary per machine and per run; the fractional digits below are illustrative, but with the sleeps above Test1 totals roughly five seconds and Test2 roughly four.
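
Runs Per Test: 10
Test Name : Test1
  Total Elapsed Time: 00:00:05.0097531
  Total Average Time: 00:00:00.5009753
Test Name : Test2
  Total Elapsed Time: 00:00:04.0076543
  Total Average Time: 00:00:00.4007654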


Source code

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

namespace Fun.Benchmarking
{
  /// <summary>
  /// Encapsulates the result of a group of test runs.
  /// Maintains the test name, the total elapsed time and the average elapsed time.
  /// Generates a report by appending to a StringBuilder.
  /// </summary>
  public class TestGroupResult
  {
    public string Name { get; }
    public TimeSpan TotalElapsedTime { get; }
    public TimeSpan AverageElapsedTime { get; }

    public TestGroupResult(
      string name,
      TimeSpan totalElapsedTime,
      TimeSpan averageElapsedTime)
    {
      this.Name = name;
      this.TotalElapsedTime = totalElapsedTime;
      this.AverageElapsedTime = averageElapsedTime;
    }

    public StringBuilder GenerateReport(StringBuilder sb)
    {
      return sb
        .AppendLine($"Test Name : {Name}")
        .AppendLine($"  Total Elapsed Time: {TotalElapsedTime}")
        .AppendLine($"  Total Average Time: {AverageElapsedTime}");
    }
  }

  /// <summary>
  /// Represents a benchmark result which includes...
  /// The number of runs for each supplied test.
  /// An array of test group results (a group is a single test run N times).
  /// </summary>
  public class BenchmarkResult
  {
    public int RunsPerTest { get; }
    public TestGroupResult[] GroupResults { get; }

    public BenchmarkResult(
      int runsPerTest,
      TestGroupResult[] groupResults)
    {
      this.RunsPerTest = runsPerTest;
      this.GroupResults = groupResults;
    }

    public string GenerateReport()
    {
      StringBuilder sb = new StringBuilder();
      sb.AppendLine($"Runs Per Test: {RunsPerTest}");
      foreach (TestGroupResult groupResult in GroupResults)
        groupResult.GenerateReport(sb);
      return sb.ToString();
    }
  }

  /// <summary>
  /// The Benchmark allows one to determine the performance of an algorithm.
  /// Multiple algorithms may be tested, great for determining which algorithm performs best.
  /// </summary>
  public class Benchmark
  {
    /// <summary>
    /// Run one or more tests and specify how many times each test should be run.
    /// The total and average elapsed time for each test will be calculated.
    /// </summary>
    /// <param name="runsPerTest">How many times each test should be run.</param>
    /// <param name="tests">An array of tests to be run.</param>
    /// <returns>A BenchmarkResult containing a TestGroupResult for each test.</returns>
    public static BenchmarkResult Run(
      int runsPerTest,
      params (string Name, Func<object> Code)[] tests)
    {
      List<TestGroupResult> groups = new List<TestGroupResult>();
      foreach (var test in tests)
      {
        TestGroupResult group = RunTestGroup(runsPerTest, test);
        groups.Add(group);
      }
      return new BenchmarkResult(runsPerTest, groups.ToArray());
    }

    /// <summary>
    /// Run a single test runsPerTest times.
    /// </summary>
    /// <param name="runsPerTest">How many times the test should be run.</param>
    /// <param name="test">The named test to run.</param>
    /// <returns>A TestGroupResult containing the total and average elapsed times.</returns>
    public static TestGroupResult RunTestGroup(
      int runsPerTest,
      (string Name, Func<object> Code) test)
    {
      Stopwatch stopwatch = Stopwatch.StartNew();
      for (int testCount = 0; testCount < runsPerTest; testCount++)
      {
        test.Code(); // Result deliberately ignored; see the Func<object> note in the Design section.
      }
      stopwatch.Stop();
      TimeSpan elapsedTime = stopwatch.Elapsed;
      return new TestGroupResult(test.Name, elapsedTime, new TimeSpan(elapsedTime.Ticks / runsPerTest));
    }
  }
}