Building a Generic Statistics Library, Part 1: Interface

Posted by on in Blogs
Let me start this post with an apology: I know that generics are a new technology in Delphi for Win32 and that many people might be looking for a simple, gentle introduction to using them.  However, I'm not going to write that sort of post, at least today.  Instead, I'm going to probe the limitations of Delphi 2009 generics, and work around them when possible.

You can do a lot with Delphi 2009's generic types, but one thing that you can't do is specify an "operator constraint." This means that I cannot perform operations like +, -, implicit type conversions, etc., on a type passed as a type parameter. In my last post on this subject, I suggested one possible solution, and experimented enough to show that it does work.  But, as I noted in the post, I suspected it might not be the best way. In particular, I didn't like the fact that my previous approach used virtual methods, and it required creating a new "adder" type for each value type you wanted to use as a type parameter to the generic type.  That's just too much work.

In the comments to another post on the subject of Delphi 2009 generics, Raymond asked for an implementation of MapReduce.  I thought about how I might demonstrate that in the context of generics, and decided that implementing the higher order functions Fold and Map would be a slightly better demonstration than MapReduce.  But since higher order functions can be a pretty abstract concept, I wanted to show an example of how you might use them, so I will use these functions to build a library that does some simple statistical operations, namely, computing the average and the standard deviation of a list of numbers.

By using generics, I can write the algorithms for average and standard deviation once and reuse them for any type which supports certain primitive operations, like addition and taking the square root.

I'm going to cover a lot of ground in this demo, so I'll be breaking it down into several posts.  In today's post, I'm going to examine how data will be supplied to the generic statistical routines.

Most statistical functions take a list of numbers as input and supply one or more numbers as output.  For example, the arithmetic mean, or average, function takes a list of numbers and returns a single number.  A function which returns the mode or modes of a list would take as input a list of numbers and return a different list, which might have one entry, or many.

So right away, I have a problem: I'd like to take a list of numbers as an invalid argument to a function, but I don't know what type the numbers in the list will be, never mind the type of the list.  Ideally, I would be able to write something like this:
function Average<A, R>(AData: IEnumerable<A>): R;

This declaration should mean that the function Average accepts any list of values of type A which can be enumerated in a loop and returns a value of type R. However, there are a few roadblocks in the Delphi 2009 RTL which make this impossible.

While the Delphi 2009 RTL does define an IEnumerable<T> interface type, neither TList<T> nor its abstract parent type TEnumerable<T> implement this interface.  If a trivial filesystem search can be believed, nothing else does, either. I've tried, unsuccessfully, to implement it myself; that's one of many things that I haven't figured out how to do in Delphi 2009 yet, and is perhaps the subject for another post.

My wild guess is that the most common type users of my statistics library will want to pass as the argument AData will be TList<A>. So I have to support that.  But I'd like to support other types of lists, as well.  It seems to me that the best I can do today is to specify TEnumerable<T> as the list argument type.  Then, at least, a user of the library can use other subtypes of TEnumerable<T>, such as TStack<T>, TQueue<T>, etc.

So my function declaration now looks like this:
function Average<A, R>(AData: TEnumerable<A>): R;

This, unfortunately, means that the user of the library cannot supply input data in the form of an array, although it's trivial to write a class which exposes array data in the form of a TEnumerable<T> (using the facade pattern), so this isn't a huge loss.

In order to make the code I'm going to show simpler, I'll use the same type argument for the input and output.  In other words, if the function is passed a list of integers, it will return the average as an integer.  If it's passed a list of doubles, it will return the average as a double.  This is probably not ideal for a general-purpose statistical library, but, for the purposes of this demo, it makes the code quite a bit easier to understand.

That makes the function prototype look like this:
function Average<T>(AData: TEnumerable<T>): T;

However, since I have no way of specifying an "operator constraint" on T, and because specifying an interface constraint on T would prevent using my functions with primitive types, at least in Delphi 2009, and because I suspect that primitive types are the most common use case for a statistical library, I have to tell the function how to perform the mathematical operations it needs to do its work.  In order to compute an average, I need to be able to add two values and divide two values.  For reasons that I'll cover in a future post, it will also be useful if I can map an integer value into the type of the type argument, so I'm going to require a function for that, as well.

Here's the final version of the function's interface:

  TBinaryOp<T> = reference to function(ALeft, ARight: T): T

  TStatistics<T> = class
class function Average(const AData: TEnumerable<T>;
AAdder, ADivider: TBinaryOp<T>;
AMapper: TFunc<integer, T>): T; overload;

Calling methods with long lists of arguments can be kind of annoying, so I've marked a function as overloadable.  This allows me to subtype the class if I want to, implementing the primitive operations, and introducing an overloaded call to Average that passes those implementations, like this:

  TIntStatistics = class(TStatistics<integer>)
class function
Add(ALeft, ARight: integer): integer;
class function Divide(ALeft, ARight: integer): integer;
class function
Mapper(AValue: integer): integer;
class function
Average(const AData: TEnumerable<integer>): integer; overload;

That gives me much closer to the "ideal" function prototype I designed above, but requires a new subtype to be implemented.  The nice thing about this current approach is that you can pick whichever solution you like: Implement a subtype if you want a simpler call to Average, or just pass the appropriate functions if you'd like to use a type parameter for which you don't have a subtype available.

Since I've covered quite a bit of material just on the subject of the interface for one function, I'm going to take a break for today.  Stay tuned for part 2!
Tags: Delphi
Comments are not available for public users. Please login first to view / add comments.