Software Development With Karl: Tips To Improve Software Development

1. Write A Problem Statement
2. Break Problem Into Smaller Manageable Tasks
3. Limit Use of Third-Party Libraries
4. Use Immutable Data
5. Avoid Primitive Obsession
6. Never Use Global Variables

1. Write A Problem Statement

Software is written to solve problems. If you don't fully understand the problem, you are unlikely to solve it correctly. Also, as time progresses you may forget the original problem statement or drift away from solving the original problem. Write (or type) the problem statement and keep it handy. I usually write it as a short abstract in a text file and store it within the project. This ensures I can revisit it frequently to keep the problem statement fresh in my mind.

2. Break Problem Into Smaller Manageable Tasks

This is almost like a to-do list. For example, assume you wish to create an app that populates a database with invoice details and posts them to a financial system. One way to break this down might be as follows.

Obtain invoice source data
Validate source data
Select/filter relevant data
Populate invoice database
Post invoice details to financial system

Notice that each task is brief, yet completing every task results in a solution. Before attempting a task, break it into even smaller tasks until you reach a point where you feel ready to start writing some code. Using the above example, I might break the last task into the following smaller tasks.

Initiate financial application session
Select pending invoices from database
For each pending invoice perform the following
  Log invoice details
  Validation (e.g. check supplier is valid, that net, VAT and gross tally and so on)
  Attempt to post to financial system
  Update database to reflect invoice posted or failed to post
  Log invoice post details
Tidy up - close any database connections, log off financial system

Using this approach you can enter tasks and sub-tasks into a to-do list or spreadsheet and mark them as completed as you progress.

3. Limit Use of Third-Party Libraries

Introducing additional dependencies into software also introduces additional failure points. The additional dependency may contain unknown bugs which will ultimately become your bugs.

Typically, when using third-party libraries I will code an API wrapper. I then use my wrapper rather than calling third-party code directly. This has a number of benefits.

If I spot a potential bug I can output trace information or place breakpoints for debugging within my wrapper function.
I can add additional code within my wrapper function to fix bugs.
I can wrap multiple third-party library function calls into a single wrapper function.
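
As a rough sketch of the wrapper idea, the snippet below hides a made-up third-party class behind a single function. ThirdPartyPdf and its methods are hypothetical stand-ins invented for this example (not a real library), and the trace callback is likewise an assumption about how trace output might be collected.

```csharp
using System;

// Hypothetical stand-in for a third-party library (not a real API)
class ThirdPartyPdf
{
  public void Open(string path) { }
  public int CountPages() => 3;
  public void Close() { }
}

// The wrapper: the rest of the app calls this, never ThirdPartyPdf directly
static class PdfWrapper
{
  public static int PageCount(string path, Action<string> trace)
  {
    // One wrapper function covers several third-party calls
    var pdf = new ThirdPartyPdf();
    trace($"Opening {path}"); // a single place for trace output or breakpoints
    pdf.Open(path);
    try
    {
      int pages = pdf.CountPages();
      // A single place to work around a library bug, e.g. a negative count
      return Math.Max(pages, 0);
    }
    finally
    {
      pdf.Close();
      trace($"Closed {path}");
    }
  }
}
```

A caller might write PdfWrapper.PageCount("invoice.pdf", Console.WriteLine). If the library is later replaced, only the wrapper needs to change.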

4. Use Immutable Data

One of the biggest problems with software is complexity. As expected, complexity increases as the software grows in size. State, and particularly the management of state, is probably the biggest contributing factor to complexity.

There are two ways of managing state.

Mutable State

Used in classic object-oriented programming. Data is modified in place; that is, the original data itself is changed.

Immutable State

Used in most (all?) functional programming languages. New data is created by transforming the original data, which is left unchanged.

As an example, consider a simple Person class that contains first and last names. The following C# snippet illustrates mutable data.

class Person
{
  public string FirstName { get; set; }
  public string LastName { get; set; }
  
  public Person(string firstName, string lastName)
  {
    this.FirstName = firstName;
    this.LastName = lastName;
  }
}

static class Test
{
  public static void Run()
  {
    // Create a new person Fred Bloggs
    Person p1 = new Person("Fred", "Bloggs");
    
    p1.LastName = "Smith";
    // Fred Bloggs has changed to Fred Smith, the original data is lost
  }
}

The following C# snippet illustrates immutable data.

class Person
{
  // Note, set has been removed
  public string FirstName { get; }
  public string LastName { get; }
  
  public Person(string firstName, string lastName)
  {
    this.FirstName = firstName;
    this.LastName = lastName;
  }
}

static class Test
{
  public static void Run()
  {
    // Create a new person Fred Bloggs
    Person p1 = new Person("Fred", "Bloggs");
    
    Person p1New = new Person(p1.FirstName, "Smith");
    // Fred Bloggs remains unchanged, p1New contains new name, "Fred Smith"
  }
}

Using immutable data has the following benefits.

Thread safety

Multiple threads can read data without using synchronisation. Immutable data is read-only, so data integrity is assured. If data must be modified (i.e. it is mutable), thread synchronisation is required to ensure data integrity.

Good key elements

Certain data structures (dictionaries, hash sets, most tree structures, and so on) require keys to access values. Once inserted, the key should never change. Changing the original key will result in the original value being unobtainable.

Easier to reason about code

Functions tend to be free from side-effects as input parameters cannot be modified. The class invariant is established once (during construction) and remains unchanged for the lifetime of the object.

Easier to code

Immutable data is constructed once and cannot change during the lifetime of the program. Functions (and associated logic, validation, etc.) to modify existing data are not required.

Improved caching

Caching is used when data creation is relatively expensive. Examples include reading configuration data, reading from a database and so on. In these scenarios you would expect the same data each time you ask the cache for a particular item. Immutable data guarantees this, because no caller can alter a cached item after it has been stored.
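
The "good key elements" point can be demonstrated with a small sketch. MutableKey is a hypothetical class made up for this example; its hash code depends on a settable property, which is exactly the situation that immutable keys rule out.

```csharp
using System.Collections.Generic;

// Hypothetical key type whose equality and hash code depend on mutable data
class MutableKey
{
  public int Id { get; set; }
  public MutableKey(int id) { Id = id; }
  public override bool Equals(object obj) => obj is MutableKey other && other.Id == Id;
  public override int GetHashCode() => Id;
}

static class KeyDemo
{
  // Returns true if the value can still be found after the key is mutated
  public static bool FindAfterMutation()
  {
    var key = new MutableKey(1);
    var lookup = new Dictionary<MutableKey, string> { [key] = "Fred Bloggs" };

    key.Id = 2; // the entry was stored under the hash code for 1

    // The mutated key now hashes differently, and a fresh key with the
    // old id no longer equals the stored key, so neither lookup succeeds
    return lookup.ContainsKey(key) || lookup.ContainsKey(new MutableKey(1));
  }
}
```

KeyDemo.FindAfterMutation() returns false: the value is still inside the dictionary but can no longer be reached through either key. Removing the setter from Id makes this mistake impossible.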

5. Avoid Primitive Obsession

Primitive obsession is where language primitives are used to describe a data type that is more complex than the primitive describing it. Most data types require constraints. Examples include...

A person's age is typically between 0 and 100. (Could use upper value of say 150 to be sure)
An email address, as a minimum contains a @ character and a domain suffix (.COM, .NET, etc)
A Bank Account Number only contains digits
A unique database table row identifier is typically an integer (e.g. customer id)
An IPv4 address contains four integers, each between 0 and 255 (up to three digits), separated by dots.

Referring to the above examples and using primitives, the age, bank account number and row identifier might use integers; the remainder would likely use strings.

Consider a customer data type, which may be persisted to a database, that consists of a unique id, first and last name and an email address. Using primitive data types we might code this as follows...


class Customer
{
  public int Id { get; }
  public string FirstName { get; }
  public string LastName { get; }
  public string Email { get; }
  
  public Customer(int id, string firstName, string lastName, string email)
  {
    this.Id = id;
    this.FirstName = firstName;
    this.LastName = lastName;
    this.Email = email;
  }
}

There are several problems with this code.

It is possible to inadvertently mix email with first or last name as they are all strings.
There is no size constraint on the name fields whereas there will be on the database side.
The email address could be supplied with an invalid value.
Any of the string fields could be supplied with null or zero-length values.
The id field is a signed integer whereas unique identifiers tend to be positive integers.
There is no distinction between a customer id and other unique identifiers that we will likely require.

string email = "fredbloggs@hotmail.com";
string email2 = "test.exe";

The first email address might well be valid. The second is most definitely not a valid email address. However, using a string to represent an email address does not tell us if the email address is well-formed. The email data type is more complex than the string describing it.

A better way to model an email address might be as follows.

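using System;

public class EmailAddress
{
  public string Value { get; }

  public static EmailAddress Parse(string value)
  {
    // Normalise the input: treat null as empty, then trim and lower-case it
    string input = string.IsNullOrEmpty(value) ? "" : value.Trim().ToLowerInvariant();

    // Minimal illustrative check only: something before the @, and a dot
    // inside the domain part. Real-world validation needs to be stricter.
    int at = input.IndexOf('@');
    if (at <= 0 || input.IndexOf('.', at) <= at + 1 || input[input.Length - 1] == '.')
      throw new FormatException($"'{value}' is not a valid email address");

    // Store the normalised input rather than the raw value
    return new EmailAddress(input);
  }

  // Private, so instances can only be created through Parse
  private EmailAddress(string value)
  {
    this.Value = value;
  }
}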

The above code implements a simple email address type. In reality more would be needed, such as testing for equality, hash code generation and so on. Note that I use a static function (Parse, in this example) to validate input before creating the type. The constructor is made private to ensure that the email type can only be constructed using the static creation functions (Parse in this case). The following code is what I might typically use. This example implements the age type.

public class Age
{
  // Exception specific to the Age class. Aids in debugging and logging.
  public class InvalidValueException : Exception
  {
    public InvalidValueException(string message) : base(message) { }
  }

  // Value will always contain a valid age due to smart construction and private constructor
  public byte Value { get; }

  // Smart construction validates age range.
  // Returns a valid age or throws an Age.InvalidValueException
  public static Age Create(int value)
  {
    if (value < 0 || value > 120)
      throw new InvalidValueException($"Age should be a value between 0 and 120, but was {value}");
    return new Age((byte)value);
  }

  // Smart construction parses an age from a string.
  // Returns either a valid age or throws an Age.InvalidValueException
  public static Age Parse(string value)
  {
    int result = 0;

    if (!int.TryParse(value, out result))
      throw new InvalidValueException($"Unable to convert string, {value} to age");
    return Create(result);
  }

  private Age(byte value)
  {
    this.Value = value;
  }
}
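
The same smart-construction pattern could address the identifier problems listed earlier. The sketch below is an assumption about how a CustomerId type might look; the body is invented here for illustration. OrderId is purely hypothetical and is included only to show that the compiler can now tell the two kinds of identifier apart.

```csharp
using System;

public sealed class CustomerId : IEquatable<CustomerId>
{
  public int Value { get; }

  // Smart construction: identifiers must be positive
  public static CustomerId Create(int value)
  {
    if (value <= 0)
      throw new ArgumentOutOfRangeException(nameof(value), "Customer id must be a positive integer");
    return new CustomerId(value);
  }

  private CustomerId(int value) { this.Value = value; }

  // Value equality, so two ids holding the same number compare equal
  public bool Equals(CustomerId other) => other != null && other.Value == Value;
  public override bool Equals(object obj) => Equals(obj as CustomerId);
  public override int GetHashCode() => Value;
}

// Hypothetical second identifier type, deliberately unrelated to CustomerId
public sealed class OrderId
{
  public int Value { get; }
  public OrderId(int value) { this.Value = value; }
}

static class Lookup
{
  // Accepts only a CustomerId; passing an OrderId or a bare int will not compile
  public static string Describe(CustomerId id) => $"customer {id.Value}";
}
```

Lookup.Describe(CustomerId.Create(7)) compiles, whereas Lookup.Describe(new OrderId(7)) is rejected at compile time, so mixing up identifiers becomes a compiler error rather than a runtime bug.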

6. Never Use Global Variables

Did I say never? Generally speaking (read: almost always), global variables are bad, really bad, like, don't ever use them. The reason is simple. Unprotected global variables can be modified anywhere within the software, which makes tracking changes extremely difficult. Of course, data has to start somewhere, which generally means that you might have one global variable. This single global variable typically contains the global state used by the app.

A better idea is to have a bootstrap function that creates the initial state and passes it on to the main app. If this is not possible, wrap the initial state and supply get and set accessors. This will at least ensure that you can add tracing or breakpoints to anything that modifies the global state.

If you follow one of the strongest rules, always ensure functions receive their dependencies as parameters, then global variables aren't such a big deal. What is a big deal, and what will come back to bite you, is functions that reach out to global variables directly instead of receiving them as parameters.

Consider a simple example (in C#).

class Globals
{
  public static int ThisIsGlobal { get; set; } 
}

class Test
{
  public static void Func1()
  {
    Globals.ThisIsGlobal = 1;
  }
  
  public static void Func2()
  {
    Globals.ThisIsGlobal = 2;
  }
}

A somewhat contrived example, I agree. That said, I hope it proves a point. Throughout the lifetime of the software, Func1 or Func2 may be called. Both are bad functions in my opinion. Firstly, they do not accept Globals as an input parameter. Secondly, both functions produce side-effects: they both modify the global state.

I probably need to, and will, write more about this. I will say this: if you have global variables, have only one (it can be a structure, a class, etc.). Wrap the global variable so that, if need be, you can log access and modifications to it. Better still, have a bootstrap file that creates the initial state and passes that state to the main controller. The main controller will be a UI element in a UI-driven app, or an entry function in a non-UI app.
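
A minimal sketch of the bootstrap idea follows. AppState, Bootstrap, MainController and EntryPoint are hypothetical names chosen for this example, and the settings are made up.

```csharp
using System;

// Immutable application state, created once at start-up
class AppState
{
  public string ConnectionString { get; }
  public int RetryCount { get; }

  public AppState(string connectionString, int retryCount)
  {
    this.ConnectionString = connectionString;
    this.RetryCount = retryCount;
  }
}

static class Bootstrap
{
  // Creates the initial state; a real app might read a config file here
  public static AppState CreateInitialState()
  {
    return new AppState("Server=example;Database=invoices", 3);
  }
}

class MainController
{
  private readonly AppState _state;

  // State arrives as a parameter rather than being read from a global
  public MainController(AppState state) { _state = state; }

  public string Describe() => $"Retries: {_state.RetryCount}";
}

static class EntryPoint
{
  // The only place that knows where the state comes from
  public static string Start()
  {
    AppState state = Bootstrap.CreateInitialState();
    var controller = new MainController(state);
    return controller.Describe();
  }
}
```

Because MainController only sees the state it was handed, there is no static field for Func1 or Func2 style code to modify behind its back.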

7. Release First Draft ASAP

Releasing the first draft allows all stakeholders to determine whether the software is on the right track. If it is, great: move on to the next feature. If not, this is a great chance to redefine the original problem statement.

In the early days of software development, it was commonplace to write pages and pages of text that described the overall purpose of the software. In my experience, this approach doesn't work. The reason is that software requirements are not only complex, they also evolve as both stakeholders and engineers see the system take shape.

The initial release (a prototype, if you will) is a great opportunity for all involved to give an opinion. Again, in my experience, this is where the real requirements come about. So many times I have been given a set of requirements and delivered a basic solution, only to find that it is not what the customer wanted. What a customer thinks they want and what they actually want are, in my experience, completely different.

I guess this is why we call this software and not hardware. Hardware is set in stone, cannot be changed. Software on the other hand...

8. Release Often

Software design used to involve many drawings: static diagrams, runtime analysis diagrams and so on. Basically, design as much up front as possible, then go away and code. The problem is that gathering the requirements could take months at best. I know a little about human nature and a lot about customers: customers never really know what they want, they have ideas as to what they want. This is not a bad thing; it just means that pages and pages of requirements will never work.

So what is the answer? Regular deliverables. These allow stakeholders/customers to see the software in situ and see the software grow. Potential problems can be weeded out. You can never get this level of clarity from a written document. Sometimes, people just need to see!

Regular releases also keep the cost of a mistake low. You or your team might spend a week developing a new feature (the outline only, I hope). You then present this to the stakeholders. If they like it, great: you can ask some questions, get some depth, then go away and implement a full solution. If they hate it, you may have wasted five days of development time, which is not a big deal in the scheme of things. Even a failure is likely to bring out features. Sometimes when people see what they don't like, they have a eureka moment and realise what they would like.

9. Always Supply Function Dependencies

For maintainable software this is a huge rule for me. Always supplying dependencies to a function achieves the following.

The function will (should) produce the same output given the same inputs
A (pure) function can only produce an output based upon its input

As an example, assume a function, SaveCustomer, that simply takes a customer object. What does saving a customer really mean? Saving to a database, to a CSV file, to an XML file and so on.

In software some concepts are truly open ended. The concept of saving customer details is an example. Let's add code to see why.


class Customer
{
  public CustomerId Id { get; }
  public string FirstName { get; }
  public string LastName { get; }
}

Assume that the above class has a single constructor that takes the id, first name and last name. Now assume that we have a class, Database, that contains a function to save customer details. The class might look as follows.


class Database
{
  public static void Save(Customer customer)
  {
    // code to save customer to a database
  }
}

You may even have a top entry class. I tend to do this and usually call the class App. My app class usually acts as a kind of gateway, even an API to allow other parts of my software to call important code. As an example I might have something like the following.


static class App
{
  private static readonly Database _db = new Database("connection string");
  
  public static void Save(Customer customer)
  {
    using (var trans = _db.Open())
    {     
      _db.Save(customer);
    }
  }
}

Of course, in the above code, the database is slightly different to the one I originally presented. My point is that, at the top level, not supplying dependencies is OK. In the above example, I don't supply the concrete database class as a parameter; I declare it as a static field of the class.

In summary, non-top-level functions should always explicitly state their inputs. Top-level functions can ignore this rule and create dependencies within the function body. This should only ever occur in top-level functions.
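
To contrast with the top-level App class, here is a sketch of a non-top-level function that receives its dependency as a parameter. ICustomerStore, InMemoryStore and CustomerService are hypothetical names invented for this example, and the Customer type is simplified to use an int id so that the snippet stands alone.

```csharp
using System.Collections.Generic;

class Customer
{
  public int Id { get; }
  public string FirstName { get; }
  public string LastName { get; }

  public Customer(int id, string firstName, string lastName)
  {
    this.Id = id;
    this.FirstName = firstName;
    this.LastName = lastName;
  }
}

// Hypothetical abstraction over "somewhere customers can be saved"
interface ICustomerStore
{
  void Save(Customer customer);
}

// One possible store; a database, CSV or XML store would implement the same interface
class InMemoryStore : ICustomerStore
{
  public List<Customer> Saved { get; } = new List<Customer>();
  public void Save(Customer customer) => Saved.Add(customer);
}

static class CustomerService
{
  // The store is an explicit input, so the caller decides what "save" means
  public static void Register(ICustomerStore store, Customer customer)
  {
    store.Save(customer);
  }
}
```

A top-level class such as App would create the concrete store once and pass it down, for example CustomerService.Register(store, new Customer(1, "Fred", "Bloggs")). Everything below the top level then states its inputs explicitly.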

Refactor Regularly

As a project progresses, you will most likely start to think that the original code leaves much to be desired. This has happened to me often (even though the code works and works well). This is not the end of the world; you simply need to refactor your code. Refactoring is similar to writing code from scratch using new knowledge. The real difference is that you will be replacing existing code with new code, possibly applying lessons learned along the way.

Following a release, I think it is always worthwhile refactoring the existing code.

Friday, 21 May 2021
