Building a Generic Statistics Library, Part 1: Interface
You can do a lot with Delphi 2009's generic types, but one thing that you can't do is specify an "operator constraint." This means that I cannot perform operations like +, , implicit type conversions, etc., on a type passed as a type parameter. In my last post on this subject, I suggested one possible solution, and experimented enough to show that it does work. But, as I noted in the post, I suspected it might not be the best way. In particular, I didn't like the fact that my previous approach used virtual methods, and it required creating a new "adder" type for each value type you wanted to use as a type parameter to the generic type. That's just too much work.
In the comments to another post on the subject of Delphi 2009 generics, Raymond asked for an implementation of MapReduce. I thought about how I might demonstrate that in the context of generics, and decided that implementing the higher order functions Fold and Map would be a slightly better demonstration than MapReduce. But since higher order functions can be a pretty abstract concept, I wanted to show an example of how you might use them, so I will use these functions to build a library that does some simple statistical operations, namely, computing the average and the standard deviation of a list of numbers.
By using generics, I can write the algorithms for average and standard deviation once and reuse them for any type which supports certain primitive operations, like addition and taking the square root.
I'm going to cover a lot of ground in this demo, so I'll be breaking it down into several posts. In today's post, I'm going to examine how data will be supplied to the generic statistical routines.
Most statistical functions take a list of numbers as input and supply one or more numbers as output. For example, the arithmetic mean, or average, function takes a list of numbers and returns a single number. A function which returns the mode or modes of a list would take as input a list of numbers and return a different list, which might have one entry, or many.
So right away, I have a problem: I'd like to take a list of numbers as an invalid argument to a function, but I don't know what type the numbers in the list will be, never mind the type of the list. Ideally, I would be able to write something like this:
function Average<A, R>(AData: IEnumerable<A>): R;
This declaration should mean that the function Average accepts any list of values of type A which can be enumerated in a for...in loop and returns a value of type R. However, there are a few roadblocks in the Delphi 2009 RTL which make this impossible.
While the Delphi 2009 RTL does define an IEnumerable<T> interface type, neither TList<T> nor its abstract parent type TEnumerable<T> implement this interface. If a trivial filesystem search can be believed, nothing else does, either. I've tried, unsuccessfully, to implement it myself; that's one of many things that I haven't figured out how to do in Delphi 2009 yet, and is perhaps the subject for another post.
My wild guess is that the most common type users of my statistics library will want to pass as the argument AData will be TList<A>. So I have to support that. But I'd like to support other types of lists, as well. It seems to me that the best I can do today is to specify TEnumerable<T> as the list argument type. Then, at least, a user of the library can use other subtypes of TEnumerable<T>, such as TStack<T>, TQueue<T>, etc.
So my function declaration now looks like this:
function Average<A, R>(AData: TEnumerable<A>): R;
This, unfortunately, means that the user of the library cannot supply input data in the form of an array, although it's trivial to write a class which exposes array data in the form of a TEnumerable<T> (using the facade pattern), so this isn't a huge loss.
In order to make the code I'm going to show simpler, I'll use the same type argument for the input and output. In other words, if the function is passed a list of integers, it will return the average as an integer. If it's passed a list of doubles, it will return the average as a double. This is probably not ideal for a generalpurpose statistical library, but, for the purposes of this demo, it makes the code quite a bit easier to understand.
That makes the function prototype look like this:
function Average<T>(AData: TEnumerable<T>): T;
However, since I have no way of specifying an "operator constraint" on T, and because specifying an interface constraint on T would prevent using my functions with primitive types, at least in Delphi 2009, and because I suspect that primitive types are the most common use case for a statistical library, I have to tell the function how to perform the mathematical operations it needs to do its work. In order to compute an average, I need to be able to add two values and divide two values. For reasons that I'll cover in a future post, it will also be useful if I can map an integer value into the type of the type argument, so I'm going to require a function for that, as well.
Here's the final version of the function's interface:
type
TBinaryOp<T> = reference to function(ALeft, ARight: T): T
TStatistics<T> = class
public
class function Average(const AData: TEnumerable<T>;
AAdder, ADivider: TBinaryOp<T>;
AMapper: TFunc<integer, T>): T; overload;
Calling methods with long lists of arguments can be kind of annoying, so I've marked a function as overloadable. This allows me to subtype the class if I want to, implementing the primitive operations, and introducing an overloaded call to Average that passes those implementations, like this:
TIntStatistics = class(TStatistics<integer>)
private
class function Add(ALeft, ARight: integer): integer;
class function Divide(ALeft, ARight: integer): integer;
class function Mapper(AValue: integer): integer;
public
class function Average(const AData: TEnumerable<integer>): integer; overload;
end;
That gives me much closer to the "ideal" function prototype I designed above, but requires a new subtype to be implemented. The nice thing about this current approach is that you can pick whichever solution you like: Implement a subtype if you want a simpler call to Average, or just pass the appropriate functions if you'd like to use a type parameter for which you don't have a subtype available.
Since I've covered quite a bit of material just on the subject of the interface for one function, I'm going to take a break for today. Stay tuned for part 2!
Comments

Raymond Thursday, 11 September 2008
Craig,
Thanks for taking the time out to work up this series of posts.
You mention that neither Tlist nor TEnumerable implement IEnumerable in the RTL.
Would this be hard to add as a modification to a local version of the RTL source?
Yes, I know everyone will say that will break things in deployed and third party packages etc, and they'd be right. But we don't use packages, and we recompile from source all our third party components so, so it's not a problem for us :)
Raymond. 
Fair enough  for some reason that explanation of the return type didn't register.
As far as op constraints in generics are concerned, I'll need to have word with my .NET colleagues all of whom seem to be labouring under a misunderstanding of .NET generics.
(I don't doubt you  a quick google seems to confirm that op constraints are also not supported in .NET) 
Please login first in order for you to submit comments
 Page :
 1
I am told that .NET has operator constraints. If so, it's a real shame that didn't find it's way into the Delphi Win32 generics implementation.
Too many concrete applications of generics, outside of simple containers, would seem to require a greater richness than Delphi 2009 generics currently provide.
I'm beginning to think that an approach based more on C++ templates would have been preferable (or at least a useful addition, if not instead of).
A slight tweak to the syntax might yet enable this:
Average(aData: IEnumerable): Double;
For which:
Average(..);
would be fine but
Average(..);
would fail to compile (when the compiler attempts to emit : sum of strings divided by number of strings).
The operator constraints approach could deal with things in this case more cleanly though:
Average(aData: IEnumerable): Double;
This is an offthecuff speculation at the sort of contraints specification that might be needed (i.e. T supports Addition of T yielding T and division by T yielding Double).
Assuming that Addition and Division indicate the calculation of a mean average, the result is surely a floating point of one form or another, not T. I've simply opted for Double in my decls above.