Saturday, 12 February 2022

C# To HTML

In this post I use Roslyn to convert C# code to HTML. It took me the best part of a day to understand the Roslyn API, which is not exactly straightforward.

The Code

The code is split into two parts: a syntax tree walker and a code renderer. There is also code to compile C# code and convert it to HTML. To date, comments, documentation comments (e.g. ///), keywords, and character and string literals are catered for.

IRenderCode interface

To keep things simple, this interface renders code based upon a supplied Token and string. An alternative is to have one method per token type. However, I tend to prefer smaller interfaces. The IRenderCode interface is defined as follows...


using CSharpToHtml.CodeRenderers;
namespace CSharpToHtml
{
  /// <summary>
  /// The Token enum lists all tokens we are interested in for C# to HTML conversion.
  /// </summary>
  public enum Token
  {
    Comment,
    DocumentComment,
    LiteralChar,
    LiteralString,
    Keyword,
    Text,
  }

  /// <summary>
  /// The IRenderCode interface renders token and supplied text.
  /// The Completed method is called once tokenisation is completed.
  /// </summary>
  public interface IRenderCode
  {
    void Render(Token token, string text);
    void Completed();
  }

  /// <summary>
  /// The FactoryRenderCode class creates specific renderer instances.
  /// </summary>
  public static class FactoryRenderCode
  {
    public static IRenderCode ToConsole() =>
      new RenderCodeConsole();

    public static IRenderCode ToHtml(TextWriter writer) =>
      new RenderCodeHtml(writer);
  }
}

Rendering HTML

The HTML render class is as follows...


using System.Web;
namespace CSharpToHtml.CodeRenderers
{
  class RenderCodeHtml : IRenderCode
  {
    private readonly TextWriter _writer;
    private readonly Dictionary<Token, Action<string>> _handlers;    

    public RenderCodeHtml(TextWriter writer)
    {
      _writer = writer;
      _writer.Write("<pre class=\"code\"><code>\n");
      _handlers = new Dictionary<Token, Action<string>>
      {
        { Token.Comment, s => Write("Green", s) },
        { Token.DocumentComment, s => Write("Green", s) },
        { Token.Keyword, s => Write("Blue", s) },
        { Token.LiteralChar, s => Write("Red", s) },
        { Token.LiteralString, s => Write("Red", s) },
        { Token.Text, s => Write(s) },
      };
    }

    public void Render(Token type, string text)
    {
      _handlers[type](text);
    }

    public void Completed()
    {
      _writer.Write("</code></pre>");
    }

    private void Write(string text)
    {
      _writer.Write(HttpUtility.HtmlEncode(text));
    }

    private void Write(string color, string text)
    {
      _writer.Write($"<span style=\"color:{color};\">");
      Write(text);
      _writer.Write("</span>");
    }
  }
}
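
The factory above also returns a RenderCodeConsole, which this post does not list. Purely as an illustration of how small a second renderer can be, here is a minimal sketch of what such a class might look like. The colour choices simply mirror the HTML renderer and are an assumption, as is everything else about the class body; the Token enum and IRenderCode interface are the ones defined earlier.

```csharp
using System;
using System.Collections.Generic;

namespace CSharpToHtml.CodeRenderers
{
  // Hypothetical sketch: the real RenderCodeConsole is not shown in the post.
  class RenderCodeConsole : IRenderCode
  {
    // Assumed colours, chosen to mirror the HTML renderer. Tokens without
    // an entry (Token.Text) are written in the console's current colour.
    private static readonly Dictionary<Token, ConsoleColor> Colors = new()
    {
      { Token.Comment, ConsoleColor.Green },
      { Token.DocumentComment, ConsoleColor.Green },
      { Token.Keyword, ConsoleColor.Blue },
      { Token.LiteralChar, ConsoleColor.Red },
      { Token.LiteralString, ConsoleColor.Red },
    };

    public void Render(Token token, string text)
    {
      if (Colors.TryGetValue(token, out var color))
      {
        // Switch colour only for the duration of this token.
        Console.ForegroundColor = color;
        Console.Write(text);
        Console.ResetColor();
      }
      else
      {
        Console.Write(text);
      }
    }

    public void Completed()
    {
      // Leave the console in its default state and end the last line.
      Console.ResetColor();
      Console.WriteLine();
    }
  }
}
```

Because the walker only ever talks to IRenderCode, swapping FactoryRenderCode.ToHtml(writer) for FactoryRenderCode.ToConsole() is the only change needed to preview the output in a terminal.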

Syntax Tree Walker

To make all of this work, one needs a syntax tree walker, and Roslyn's CSharpSyntaxWalker is a good starting point. The first thing I wanted to do, for test purposes, was to traverse the syntax tree and print each value so that the resulting output would look like my original code. This is what one might call a sanity check. The syntax walker class is as follows...


using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

namespace CSharpToHtml
{
  /// <summary>
  /// The SyntaxWalker parses supplied code (RenderCode) and walks tokens.
  /// The base class, CSharpSyntaxWalker, allows one to walk trivia as well.
  /// Trivia includes tokens such as whitespace, comments, etc.
  /// As such, token depth walking should suffice for the IRenderCode interface.
  /// </summary>
  public class SyntaxWalker : CSharpSyntaxWalker
  {
    // _model is not currently used
    private readonly SemanticModel _model;
    private readonly IRenderCode _render;


    /// <summary>
    /// Take the code to parse and the code renderer to render tokens.
    /// </summary>
    public static void RenderCode(string code, IRenderCode renderer)
    {
      // Create syntax tree from supplied code.
      var tree = CSharpSyntaxTree.ParseText(code);

      // Create a new compilation unit, gives access to the semantic model.
      var compilation = CSharpCompilation.Create(
        "MyCompilation",
        new[] { tree },
        new[] { MetadataReference.CreateFromFile(typeof(object).Assembly.Location) });

      // Create semantic model, a walker and node visitor.
      // Visit nodes then call renderer completed.
      var semanticModel = compilation.GetSemanticModel(tree);
      var walker = new SyntaxWalker(semanticModel, renderer);
      walker.Visit(semanticModel.SyntaxTree.GetRoot());
      renderer.Completed();
    }    

    /// <summary>
    /// Visit all tokens.
    /// A token may have leading or trailing trivia.
    /// Trivia is defined as whitespace or comments.
    /// </summary>
    /// <param name="token"></param>
    public override void VisitToken(SyntaxToken token)
    {
      // Process trivia that may lead the supplied token.
      if (token.HasLeadingTrivia)
        ProcessTrivia(token.LeadingTrivia);

      // Special case - token is a keyword.
      if (token.IsKeyword())
        _render.Render(Token.Keyword, token.Text);
      else
      {
        // get token type.
        var kind = token.Kind();
        switch (kind)
        {
          // token is a character literal. Text is the raw source text,
          // so the quotes and any escape sequences are preserved.
          case SyntaxKind.CharacterLiteralToken:
            _render.Render(Token.LiteralChar, token.Text);
            break;

          // token is a string literal (regular or verbatim).
          case SyntaxKind.StringLiteralToken:
            _render.Render(Token.LiteralString, token.Text);
            break;

          // don't care what the token is at this point.
          default:
            _render.Render(Token.Text, token.Text);
            break;
        }
      }

      // Process trailing trivia (typically whitespace).
      if (token.HasTrailingTrivia)
        ProcessTrivia(token.TrailingTrivia);

      base.VisitToken(token);
    }

    private SyntaxWalker(
      SemanticModel model,
      IRenderCode walker) : base(SyntaxWalkerDepth.Token)
    {
      _model = model;
      _render = walker;
    }

    /// <summary>
    /// Process trivia (comments, whitespace or text).
    /// </summary>
    /// <param name="triviaCollection"></param>
    private void ProcessTrivia(SyntaxTriviaList triviaCollection)
    {
      foreach (var trivia in triviaCollection)
      {
        var kind = trivia.Kind();

        switch (kind)
        {
          // Single or multiline comments are rendered as a Token.Comment.
          case SyntaxKind.SingleLineCommentTrivia:
          case SyntaxKind.MultiLineCommentTrivia:          
            _render.Render(Token.Comment, trivia.ToString());
            break;

          // Document comments are rendered as a Token.DocumentComment.
          // ToFullString() is used because ToString() omits the leading
          // "///" (or "/**") of a documentation comment.
          case SyntaxKind.SingleLineDocumentationCommentTrivia:
          case SyntaxKind.MultiLineDocumentationCommentTrivia:
            _render.Render(Token.DocumentComment, trivia.ToFullString());
            break;
          default:
            _render.Render(Token.Text, trivia.ToString());
            break;
        }        
      }
    }
  }
}
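
The sanity check mentioned at the start of this section (walk the tree and reproduce the original code) can also be tried without any renderer. The snippet below is a minimal sketch that assumes only the Microsoft.CodeAnalysis.CSharp package; the sample source string is made up for illustration. It relies on each token's ToFullString(), which is the token's raw Text plus its leading and trailing trivia. ValueText, by contrast, is the interpreted value of a token, so it is not suitable for reproducing source text.

```csharp
using System;
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.CSharp;

// Made-up sample containing a doc comment, a comment and an escaped string.
var source =
  "/// <summary>Demo</summary>\n" +
  "class C\n" +
  "{\n" +
  "  // a comment\n" +
  "  string s = \"tab\\there\";\n" +
  "}\n";

var root = CSharpSyntaxTree.ParseText(source).GetRoot();

// Rebuild the text token by token: leading trivia + Text + trailing trivia.
var rebuilt = new StringBuilder();
foreach (SyntaxToken token in root.DescendantTokens())
  rebuilt.Append(token.ToFullString());

// Roslyn syntax trees are full fidelity, so the two strings should match.
Console.WriteLine(rebuilt.ToString() == source);
```

If the comparison fails for a renderer-based walk, the usual suspect is a token or trivia being rendered from an interpreted value rather than from its source text.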

Using the code

To generate HTML output the following code should be used.


using CSharpToHtml;

// Read the C# source to convert.
var code = File.ReadAllText("myfile");

// Render the code as HTML into an in-memory writer.
using var writer = new StringWriter();
SyntaxWalker.RenderCode(code, FactoryRenderCode.ToHtml(writer));
string html = writer.ToString();

Because SyntaxWalker.RenderCode takes the source as a string, C# code can also be supplied directly as a string literal instead of being read from a file.
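
As a small illustration of passing code directly, the sketch below renders a one-line class held in a string. The sample code and the Greeter name are made up; everything else uses the types from this post.

```csharp
using System;
using System.IO;
using CSharpToHtml;

// Hypothetical sample source held in memory rather than in a file.
var code = "class Greeter { string Hello() => \"hi\"; }";

using var writer = new StringWriter();
SyntaxWalker.RenderCode(code, FactoryRenderCode.ToHtml(writer));

// The result is a <pre><code> block with coloured <span> elements.
string html = writer.ToString();
Console.WriteLine(html);
```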

Summary

I showed how to use the C# syntax walker to pick out tokens for use when converting code to HTML. Note that the syntax highlighting for the code in this post was generated using the above code.