Saturday 27 November 2021

Fast File Traversal in C#

If you have tried to use the .NET method for traversing files and directories you have probably encountered problems. Problems include access denied exception and slow execution. File access seems to be implemented by .NET and NOT by the operating system. The code supplied in this blog was successfully run on Windows 10, 64 bit. The code was developed using Visual studio 2022 using .NET Core 5.0.

Why are the .NET methods troublesome?

In my opinion the .NET framework API contains some great code. The problem is that some API methods are somewhat lacking. The idea that traversing a file structure may throw exceptions seems a little bizarre.

Most of the methods for traversing the file structure use IEnumerable. Ultimately, using IEnumerable is a good idea. Problems arise however, due to the fact that an IEnumerable implementation, upon receiving an exception, tends to close the underlying stream. This is certainly the case for methods that obtain a list of directories or files.

I think the file system API, especially for traversal, is very poor. I should be able to view all directories and files. Attempting to open/modify said directories or files is a different matter (maybe this is where .NET file security should kick in?).

Top

Under the hood

Some of the .NET implementations appear to be wrappers over the Win32 API. The Win32 API, is old-school, C functions, of which there are many. Calling Win32 functions from a C# app incurrs overhead due to the marshalling of types. In my experiece, the overhead is not significant, but, may well be in tight loops. (So, just something to be aware of.)

Top

The code

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.InteropServices;

public static partial class Functional
{
  #region Constants
  public static readonly IntPtr InvalidHandle = new IntPtr(-1);
  #endregion // Constants

  #region Data
  [StructLayout(LayoutKind.Sequential)]
  struct FILE_TIME
  {
    public uint Low;
    public uint High;
  }

  [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
  struct WIN32_FIND_DATA
  {
    public int Attributes;
    public FILE_TIME Created;
    public FILE_TIME LastAccessed;
    public FILE_TIME LastWrite;
    public int SizeLow;
    public int SizeHigh;
    public int Reserved;
    public int Reserved2;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
    public string Name;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 14)]
    public string AlternateName;

    public uint Type;
    public uint CreatorType;
    public uint FinderFlag;

    public bool IsValidDirectory()
    {
      int len = Name.Length;
      return len > 2 ||
        (len == 1) && (Name[0] != '.') ||
        (len == 2) && (Name[0] != '.') && (Name[1] != '.');
    }
  }
  #endregion // Data

  #region Imports
  [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
  private static extern IntPtr FindFirstFile(
  string fileName,
  [Out] out WIN32_FIND_DATA data);

  [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
  private static extern bool FindNextFile(
    IntPtr hndFindFile,
    [Out] out WIN32_FIND_DATA lpFindFileData);

  [DllImport("kernel32.dll")]
  private static extern bool FindClose(IntPtr handle);
  #endregion // Imports

  public static void FindFiles(string root)
  {
    WIN32_FIND_DATA fd = new WIN32_FIND_DATA();
    Stack<string> dirs = new Stack<string>();

    dirs.Push(root);

    while (dirs.Count > 0)
    {
      string curDir = dirs.Pop();
      string search = Path.Combine(curDir, "*.*");
      IntPtr handle = FindFirstFile(search, out fd);
      if (handle != InvalidHandle)
      {
        bool valid = true;
        while (valid)
        {
          bool isDir = 0 != (fd.Attributes & (int)FileAttributes.Directory);
          if (isDir && fd.IsValidDirectory())
            dirs.Push(Path.Combine(curDir, fd.Name));

          valid = FindNextFile(handle, out fd);
        }
        FindClose(handle);
      }
    }
  }
}
Top

Summary

First, note that the code does not yield/return useful results. The code simply serves as a starting point. It is fast, and can be modified accordingly to suit project needs.

Memory allocations (mainly strings as to be expected) are quite high. I tried using string interning, StringBuilder to build paths and so on. Some techniques yielded slightly less memory consumption, nothing so drastic as to make me think "yeah I should post that".

Top