An orangutan in sunglasses gives a thumbs up. Cooler than your average primitive.

Semantic ‘Primitives’ in C#

8 min read · Jun 8, 2022

My very first post on Medium, how exciting. For it, I thought we could explore some ideas around trying to add more semantic surety to primitive types. In other words, making a strongly-typed class or struct where you’d ordinarily reach for a string or int.

Say you come across the property public string Email { get; set; }. Had the dev left no comments, you’d guess it probably means an email address. Though could it mean an email’s body? If you infer address, you still aren’t sure whether that string has already been validated as correctly formed, or whether it might hold some raw, possibly bad, user input.

Now let’s say you’re hoping to re-use or connect this email-address concept. Your project has a User class with an Email field, and you discover another Organization class with a string Email. Are the fields the same concept, same validity level, etc.? Can/should you tell your CSS to render them consistently? Your search-index to apply the same ranker, stemming, etc. to them consistently? It’s hard to say based on field-name alone; you’d prefer a type-name: EmailAddress.

Types then allow projections (EmailAddress[] CoAuthors → User.Email), engine re-use (html render of CoAuthors as mailto: links), policy compliance (EmailAddress is registered as a Personally-Identifiable Information field for GDPR handling), etc. And now, instead of needing to write (and evolve) validation and policy-handling unit-tests on CoAuthors and Users separately, we can write tests against EmailAddress alone.

This problem’s been wrestled with by many. Domain Driven Design may espouse a strongly-typed Value Object vs. the “primitive obsession” of string, etc. And the W3C’s Semantic Web has a raft of ontological RFCs (OWL, RDF, etc.) to give even individual fields a globally-unique namespace. But, … baby-steps. Let’s ease into this primitive-as-class concept.

The Wrapper Class

// Our first semantic primitive!
public record EmailAddress
{ public string Value { get; set; } }  // draft concept

// Our first semantic-primitive client
public class User
{
    // Look! Strongly-typed!
    public EmailAddress Email { get; set; }
    ...
}

You can’t assign a different Semantic Primitive, say City, to an EmailAddress. That’s nice (though it’s not likely top on the list of a developer’s burning problems). But, sadly, working with the field is harder: User.Email = new EmailAddress() { Value = emailStr } and Write(User.Email.Value). Worse, the default serialized JSON breaks back-compatibility, { "User": { "Email": { "Value": "joe@example.com" } } }; your users will hate you. We gotta fix it (and later we will).

Why a class (or record) vs. a struct? The extra indirection and object overhead are unfortunate, but by using a class we allow sub-classing: You can make class VerifiedEmail : EmailAddress that your business-logic only instantiates once the user has acknowledged your confirmation-email. Later workflows needing a VerifiedEmail won’t compile if you hand them a non-verified EmailAddress, so we get a notion of semantic sub-setting for free.
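To make the sub-setting idea concrete, here is a minimal sketch using the draft wrapper from above. The Mailer class and its two methods are hypothetical names invented for illustration; they only show which calls the compiler accepts.

```csharp
using System;

var raw      = new EmailAddress  { Value = "joe@example.com" };
var verified = new VerifiedEmail { Value = "joe@example.com" };

Console.WriteLine(Mailer.Describe(raw));          // address: joe@example.com
Console.WriteLine(Mailer.Describe(verified));     // address: joe@example.com
Console.WriteLine(Mailer.SendInvoice(verified));  // invoice sent to joe@example.com
// Mailer.SendInvoice(raw);   // would not compile: an EmailAddress is not a VerifiedEmail

public record EmailAddress
{
    public string Value { get; set; } = "";
}

// Assumption: business logic creates this only after the confirmation email is acknowledged.
public record VerifiedEmail : EmailAddress { }

// Hypothetical consumer, for illustration only.
public static class Mailer
{
    // Accepts any address, verified or not.
    public static string Describe(EmailAddress e) => $"address: {e.Value}";

    // Demands the narrower type, so unverified addresses are rejected at compile time.
    public static string SendInvoice(VerifiedEmail e) => $"invoice sent to {e.Value}";
}
```

A VerifiedEmail flows anywhere an EmailAddress is expected, while the reverse direction is refused by the compiler rather than by a runtime check.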

First, add validation logic. Until constructors check bounds and patterns, the semantic primitive is harder to use, but not really safer.

public record SemanticProperty<TValue>
{
    protected TValue _value;

    protected SemanticProperty() { }

    // bool param aids disambiguation
    // once we've introduced operators.
    protected SemanticProperty(TValue v, bool _)
        => this.Value = v;

    // Minor optimization to skip re-validating other's _value.
    protected SemanticProperty(SemanticProperty<TValue> other)
        => this._value = other._value;

    public TValue Value
    {
        get => _value;
        set => _value = ValidateOrThrow(value);
    }

    protected virtual TValue ValidateOrThrow(TValue v) => v;
}

public record EmailAddress : SemanticProperty<string>
{
    // Sadly these 3 ctors need repeating in each subclass
    public EmailAddress() {}
    public EmailAddress(string addr) : base(addr, true) { }
    public EmailAddress(EmailAddress other) : base(other) { }

    protected override string ValidateOrThrow(string s) =>
        ...do your regex around '@', host dot-names etc...
}

The EmailAddress regex-validating its value is a nice win; we can eliminate a bunch of scattered (possibly inconsistent) validations done (or dangerously not done) by clients, and eliminate the corresponding unit-tests.
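For a feel of what a filled-in ValidateOrThrow might look like, here is a minimal sketch. The regex is a deliberately loose placeholder of my own (something@something.something), not a recommended email pattern, and the choice of ArgumentException is likewise an assumption. The base class is trimmed to what the example needs.

```csharp
using System;
using System.Text.RegularExpressions;

var ok = new EmailAddress("joe@example.com");
Console.WriteLine(ok.Value);   // joe@example.com

try { _ = new EmailAddress("not-an-email"); }
catch (ArgumentException ex) { Console.WriteLine($"rejected: {ex.Message}"); }

public record SemanticProperty<TValue>
{
    protected TValue _value = default!;

    protected SemanticProperty() { }
    protected SemanticProperty(TValue v, bool _) => Value = v;

    public TValue Value
    {
        get => _value;
        set => _value = ValidateOrThrow(value);   // every write is validated
    }

    protected virtual TValue ValidateOrThrow(TValue v) => v;
}

public record EmailAddress : SemanticProperty<string>
{
    // Illustrative only: one '@', no whitespace, at least one dot in the host part.
    private static readonly Regex Shape = new(@"^[^@\s]+@[^@\s]+\.[^@\s]+$");

    public EmailAddress() { }
    public EmailAddress(string addr) : base(addr, true) { }

    protected override string ValidateOrThrow(string s) =>
        s is not null && Shape.IsMatch(s)
            ? s
            : throw new ArgumentException($"'{s}' is not a well-formed email address.");
}
```

Because the check lives in the type, a unit-test suite for EmailAddress covers every property that uses it.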

Making the base class generic gives us the flexibility we need: each subclass can specialize ValidateOrThrow appropriately (e.g. Celsius can’t accept values below -273.15).

// These must be records too: only a record may derive from a record.
public record EmailAddress : SemanticProperty<string>   { ... }
public record ExpireDate   : SemanticProperty<DateTime> { ... }
public record Celsius      : SemanticProperty<float>    { ... }
public record TenantId     : SemanticProperty<Guid>     { ... }
public record FeatureVector<T> : SemanticProperty<T[]> { ... }
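As one possible specialization, here is a sketch of the Celsius case with a bounds check instead of a regex. The AbsoluteZero constant name and the exception type are assumptions of mine, and the base class is again trimmed to the essentials.

```csharp
using System;

var boiling = new Celsius(100f);
Console.WriteLine(boiling.Value);   // 100

try { _ = new Celsius(-300f); }
catch (ArgumentOutOfRangeException) { Console.WriteLine("below absolute zero"); }

public record SemanticProperty<TValue>
{
    protected TValue _value = default!;

    protected SemanticProperty() { }
    protected SemanticProperty(TValue v, bool _) => Value = v;

    public TValue Value
    {
        get => _value;
        set => _value = ValidateOrThrow(value);
    }

    protected virtual TValue ValidateOrThrow(TValue v) => v;
}

public record Celsius : SemanticProperty<float>
{
    public const float AbsoluteZero = -273.15f;

    public Celsius() { }
    public Celsius(float degrees) : base(degrees, true) { }

    // NaN also fails the >= comparison, so it is rejected along with too-cold values.
    protected override float ValidateOrThrow(float v) =>
        v >= AbsoluteZero
            ? v
            : throw new ArgumentOutOfRangeException(
                  nameof(v), v, "Temperature cannot be below absolute zero.");
}
```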

Immutability: We gave SemanticProperty’s Value property a get and a set; however, this doesn’t stop subclasses from being read-only by declaring an init.

public record SocialSecurityNumber : SemanticProperty<string>
{
    ...
    // 'new' hides the base's settable Value behind an init-only one
    public new string Value { get => base.Value; init => base.Value = value; }
}
...
user.SSN.Value = "123-45-6789"; // <== COMPILE ERR

Casting will skirt this: ((SemanticProperty<string>)user.SSN).Value = "oops";, so you may prefer making the base’s setter protected, then adding a public set to those sub-classes you want mutable.
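The protected-setter variation could be sketched as follows. Nickname is a hypothetical mutable type added purely for contrast; the rest mirrors the base class above, trimmed down.

```csharp
using System;

var nick = new Nickname("joe");
nick.Value = "joseph";   // fine: this subclass opted in to a public setter

var ssn = new SocialSecurityNumber("123-45-6789");
// ssn.Value = "oops";                               // would not compile: setter is protected
// ((SemanticProperty<string>)ssn).Value = "oops";   // the cast no longer helps either

Console.WriteLine($"{nick.Value} {ssn.Value}");   // joseph 123-45-6789

public record SemanticProperty<TValue>
{
    protected TValue _value = default!;

    protected SemanticProperty() { }
    protected SemanticProperty(TValue v, bool _) => Value = v;

    public TValue Value
    {
        get => _value;
        protected set => _value = ValidateOrThrow(value);   // read-only to outsiders by default
    }

    protected virtual TValue ValidateOrThrow(TValue v) => v;
}

// Immutable from the outside: nothing re-exposes the setter.
public record SocialSecurityNumber : SemanticProperty<string>
{
    public SocialSecurityNumber(string ssn) : base(ssn, true) { }
}

// Mutable: re-declares Value with a public setter that still runs validation.
public record Nickname : SemanticProperty<string>
{
    public Nickname(string name) : base(name, true) { }

    public new string Value
    {
        get => base.Value;
        set => base.Value = value;
    }
}
```

The trade-off is that mutability becomes the opt-in case, at the cost of a few extra lines in each mutable subclass.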

Second, reduce usage overhead. Perhaps (and I know this bit is going to be contentious) you don’t want tons of Data Transfer Objects or gratuitous new expressions peppering your libs and apps everywhere semantic- and plain-primitives come together. Someone consuming your new NuGet package in which semantic-primitives debut won’t want to fix all the places where a string was being assigned before. Enter type-conversion operators.

public record EmailAddress : SemanticProperty<string>
{
    public EmailAddress() { }
    public EmailAddress(EmailAddress other) : base(other) { }

    // The 2nd bool arg is our foil to
    // `operator string` making the base() call ambiguous
    public EmailAddress(string val) : base(val, true) { }

    public static implicit operator string(EmailAddress self)
        => self.Value;

    public static implicit operator EmailAddress(string val)
        => new() { Value = val };
}

The first operator lets an EmailAddress sit on the right-hand side of an assignment to a string, textBox.Text = User.Email, and just work transparently. The second operator allows a plain string on the right-hand side of an assignment to an EmailAddress, User.Email = legacyThingie.emailString, which is scary, yes, but at least we’re assured of getting the ValidateOrThrow safeguard.

But you can’t assign one SemanticProperty<string> sub-class (e.g. TwitterId) to another (EmailAddress), even with implicit operator string between them. So we’ve still demonstrably moved the dial towards safety.
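A small sketch of that guarantee: C# will apply at most one user-defined conversion implicitly, so the two operators never chain on their own. TwitterId here follows the same pattern as EmailAddress, and validation is left out to keep the example short (a real ValidateOrThrow would presumably reject "@joe" as an email).

```csharp
using System;

EmailAddress email = "joe@example.com";   // string -> EmailAddress (implicit)
TwitterId handle   = "@joe";              // string -> TwitterId (implicit)
string text        = email;               // EmailAddress -> string (implicit)

// email = handle;   // would not compile: TwitterId -> string -> EmailAddress
//                   // needs two user-defined conversions, which C# won't chain.

// Crossing concepts takes a deliberate, visible hop through string:
EmailAddress forced = (string)handle;

Console.WriteLine(text);           // joe@example.com
Console.WriteLine(forced.Value);   // @joe

public record SemanticProperty<TValue>
{
    protected TValue _value = default!;

    protected SemanticProperty() { }
    protected SemanticProperty(TValue v, bool _) => Value = v;

    public TValue Value
    {
        get => _value;
        set => _value = ValidateOrThrow(value);
    }

    protected virtual TValue ValidateOrThrow(TValue v) => v;
}

public record EmailAddress : SemanticProperty<string>
{
    public EmailAddress(string val) : base(val, true) { }
    public static implicit operator string(EmailAddress self) => self.Value;
    public static implicit operator EmailAddress(string val) => new(val);
}

public record TwitterId : SemanticProperty<string>
{
    public TwitterId(string val) : base(val, true) { }
    public static implicit operator string(TwitterId self) => self.Value;
    public static implicit operator TwitterId(string val) => new(val);
}
```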

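In our VerifiedEmail example, subclassing EmailAddress, we might also want a more intentional promotion along the lines of public static explicit operator VerifiedEmail(EmailAddress em). C# won’t accept that exact operator, since user-defined conversions to or from a base class are not allowed, but an explicit factory method or constructor plays the same role. It at least requires one to spell the promotion out: onboardedUser.VerifiedEmail = VerifiedEmail.From(candidateUser.Email); (From being a hypothetical name), visually indicating the dev’s swimming outside safe waters. (I personally don’t deem this attention-grabbing/protective enough, but …maybe.)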

Serialization and Remote Calls

Ok, what about the JSON problem? { "Email": { "Value": "joe@example.com" } } will totally break our customers’ proxy-code expecting our old { "Email": "joe@example.com" } responses.

Third, custom JsonConverter magic preserves the over-the-wire format.

using System;
using System.Reflection;
using System.Text.Json;
using System.Text.Json.Serialization;

public class SemanticPropertyConverter<TProp, TVal>
    : JsonConverter<TProp>
      where TProp : SemanticProperty<TVal>, new()
{
    public override TProp Read(
            ref Utf8JsonReader reader,
            Type typeToConvert,
            JsonSerializerOptions options)
    {
        object? v = null;
        if (typeof(TVal) == typeof(string))
            v = reader.GetString();
        else if (typeof(TVal) == typeof(int))
            v = reader.GetInt32();
        else if (typeof(TVal) == typeof(bool))
            v = reader.GetBoolean();
        else if (typeof(TVal) == typeof(DateTime))
            v = reader.GetDateTime();
        // ...and so on...

        return new TProp { Value = (TVal)v };
    }

    public override void Write(
            Utf8JsonWriter writer,
            TProp value,
            JsonSerializerOptions options)
    {
        switch (value.Value)
        {
          case string x: writer.WriteStringValue(x); break;
          case int x: writer.WriteNumberValue(x); break;
          case bool x: writer.WriteBooleanValue(x); break;
          case DateTime x:
              writer.WriteStringValue(x.ToString("o")); break;
          case null: writer.WriteNullValue(); break;
          default:
             throw new NotImplementedException(
                $"Writer for type {typeof(TVal).FullName} " +
                $"not implemented.");
        }
    }
}

public class SemanticPropertyAttribute : JsonConverterAttribute
{
    public override JsonConverter CreateConverter(
            Type typeToConvert)
    {
        // Assumes typeToConvert derives directly from SemanticProperty<TVal>
        Type ldPropType = typeToConvert.BaseType;
        Type valType    = ldPropType.GetGenericArguments()[0];

        // TODO: Activator.CreateInstance isn't highly performant;
        // consider some Emit.IL magic with type-lookup
        var converter =
          (JsonConverter)Activator.CreateInstance(
              typeof(SemanticPropertyConverter<,>)
                .MakeGenericType(new[] { typeToConvert, valType }),
              BindingFlags.Instance | BindingFlags.Public,
              binder: null,
              args: null,
              culture: null);

        return converter
          ?? throw new ApplicationException("Dang!");
    }
}

If you’ve spent some time looking at Microsoft’s docs for System.Text.Json converters, you’ve possibly seen code similar to this. The Activator + MakeGenericType voodoo may seem new, but hopefully will make sense as you ponder it. I’m not claiming this approach is perf-optimal (which might be a good subject for a later post), but hopefully shows how the round-trip could work.

Sadly, I didn’t see a good way to hang the SemanticPropertyAttribute just on the SemanticProperty base-class and have the JsonSerializer use it in derived-classes. You have to hang the attribute on every sub-class yourself.

[SemanticProperty]
public record ArrivedAt : SemanticProperty<DateTime>
{
   ...the typical ctor and operator code here...
}

public class User
{
    public EmailAddress Email     { get; set; }
    public City         City      { get; set; }
    public ArrivedAt    ArrivedAt { get; set; }
}
...
User u = new() {
  Email     = "joe@example.com",
  City      = "Seattle",
  ArrivedAt = new DateTime(2018, 6, 12),
};
string userJson = JsonSerializer.Serialize(u);

yields {"Email":"joe@example.com","City":"Seattle","ArrivedAt":"2018-06-12T00:00:00.0000000"}, just what we wanted to see!
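To see the trip in both directions without the generic machinery, here is a pared-down sketch. It swaps in a hand-written, single-type converter (EmailAddressConverter, a name of my own) attached with the stock [JsonConverter] attribute, and uses the draft wrapper rather than the full base class, so treat it as a simplification of the approach above rather than the same code.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

var user = new User { Email = new EmailAddress { Value = "joe@example.com" } };

string json = JsonSerializer.Serialize(user);
Console.WriteLine(json);                       // {"Email":"joe@example.com"}

User back = JsonSerializer.Deserialize<User>(json)!;
Console.WriteLine(back.Email == user.Email);   // True (records compare by value)

[JsonConverter(typeof(EmailAddressConverter))]
public record EmailAddress
{
    public string Value { get; set; } = "";
}

public class User
{
    public EmailAddress Email { get; set; } = new();
}

// Reads and writes the wrapper as a bare JSON string, preserving the wire format.
public class EmailAddressConverter : JsonConverter<EmailAddress>
{
    public override EmailAddress Read(
        ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
        => new() { Value = reader.GetString() ?? "" };

    public override void Write(
        Utf8JsonWriter writer, EmailAddress value, JsonSerializerOptions options)
        => writer.WriteStringValue(value.Value);
}
```

Old clients that send or expect a plain string for Email keep working, while server-side code gets the typed wrapper.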

Miscellany

Null-Reference Support: Hopefully, you’ve enabled nullable reference types on your projects (or #nullable enable on new files, at least. Please.), and if so, you’re seeing code-analysis squigglies on those generic args above. In the interest of article focus, we ignored those, but I’d likely add nullability attributes to try to reduce noise.

using System.Diagnostics.CodeAnalysis;

public record SemanticProperty<TValue> {
    [MaybeNull, AllowNull]
    protected TValue _value;
    ...
    protected SemanticProperty([MaybeNull] TValue v, bool _)
        => this.Value = v;

    [MaybeNull, AllowNull]
    public TValue Value
    { get => _value; set => _value = ValidateOrThrow(value); }

    [return: MaybeNull]
    protected virtual TValue ValidateOrThrow([AllowNull] TValue v) => v;

    [return: MaybeNull]
    public static implicit operator TValue(
      SemanticProperty<TValue> self) => self.Value;
    ...and so on...

Code-Gen of boilerplate: As the ctors, type-conversion operators, and the JSON-converter attribute are all repetitive boilerplate on sub-classes, you may wish to look at .NET 6’s source generators to crank out partial-class siblings of this stuff.

Alternative Implementations: Microsoft helps you out here with handy topics to generalize this approach and with Entity Framework support, e.g. Implement Value-Objects.

Another, more scoped approach can be seen in the NuGet package ValueOf. (This one has an unusual compiled-lambda factory for making instances, whose motivating case I’m curious about. They demonstrate IEquatable well, which we’ve left record to shoulder for us.)

Conclusion

Employing a semantic-primitive instead of a raw primitive can help you better leverage the compiler’s strong-type guarantees, encapsulate validation, and possibly reduce unit-test duplication in your code. Implementations in .NET are not overly complex, as I hope you agree, but are not absolutely free of ceremony, so you may wish to use the concept judiciously in your project.

Whether you elect to or not, hopefully some of the techniques above were interesting to see. And the perspective that individual fields can be re-usable components, not just classes/structs, might have been enlightening to some.

I’m curious to hear from folks who may have tackled this a different way or to hear ideas how to refine this one. Thank you very much for reading.

Written by Norm Bryar