Data Structures for Time Records

Posted by on in Blogs
Over the years, I've had a number of occasions to need to store information about when somebody worked on a certain job. Sometimes people will want to store the date the work was done and the number of hours the employee worked, where as other people will need to store the start time and the stop time (in other words, when the employee "punched in," and when the employee "punched out"). Often, the same customer will need to use both methods of recording time, depending upon the nature of the work recorded.

A first try


So here is a naïve attempt at creating metadata for this sort of data. I'm going to demonstrate this as a database table, but the issues involved are nearly identical when you create similar structures in code.

CREATE TABLE TimeRecords (
Id INTEGER NOT NULL PRIMARY KEY,
PersonId INTEGER [...],
WorkDate DATE,
StartTime TIME,
StopTime TIME,
TotalHours NUMERIC(18, 4));


This might look OK at first glance, but it turns out to be fairly badly broken. Before I discuss why, I should clarify the data types involved.

What isn't broken


One thing you might notice when looking at this metadata is that I've made most of the columns nullable. In my experience, it is very important to allow users to save work in progress, and to only do validations (such as requiring that the Person is filled in) when you actually try to do something important that the time record, such as submit it for payroll. For example, an employee could begin a time record when they "punch in" at the beginning of a shift, and finish the time record when they "punch out" at the end of a shift.

DATE and TIME in this DDL refer to the SQL Standard data types which contain, respectively, a date and a time only. The SQL Standard data type for a date and time in a single column is TIMESTAMP. Not all databases (or programming languages/environments) support such types. In many cases, you will have to do with only TIMESTAMP, or its equivalent, such as TDateTime in Delphi.

Note that TotalHours is NUMERIC, not TIME. It is important not to confuse a time span with a time of day; they are different units. In .NET, there is a TimeSpan type which is appropriate for this.

Problems with the naïve approach


Presuming that the user has filled in all of the required information for a time record, what would we expect to see in a typical row?

If the user is recording only the date and the number of hours worked, it seems pretty obvious: both the WorkDate and the TotalHours fields will be non-NULL, and the StartTime and StopTime columns will be NULL.

What if the user needs to record the start and stop times? Well, the WorkDate, StartTime and StopTime fields almost be non-NULL. But what about TotalHours? We could fill that in, but wouldn't it be redundant with the information in StartTime and StopTime? Or, worse, could we even be sure that all three fields are kept "in sync?" So we might be tempted to leave that NULL, but then, what if we want to do a SUM in SQL? By filling in TotalHours, we can compute an employee's total work time, even if some of the records are filled in with start times and stop times, but others were filled in with only the total hours worked.


























Id PersonId WorkDate StartTime StopTime TotalHours
0 100 2009-01-29 1:00 PM 1:30 PM 0.5
1 101 2009-01-29 NULL NULL 1.5

To compute the total hours on a record that has start and stop times, we might be tempted to subtract the start time from the stop time. But that doesn't work. An employee could work a shift from 8 PM to 2 AM, yielding a negative result for this calculation.

We could make a rule that we add a day to the stop time before subtracting if it is less than the start time, but this presumes that the employee's shift was less than 24 hours. Greater than 24 hour shifts may be uncommon, but they do happen; I once worked a 31 hour shift, fresh out of college.

What this means is that recording only the start and stop times without the start and stop dates does not give us sufficient information to compute the total hours worked in all cases, and makes it more difficult in many cases.

Some useful redundancy


Here's another way to store this data. This is what I generally do. But I'll forgive you if you're skeptical when you see it. I'll discuss some alternatives later.

CREATE TABLE TimeRecords (
Id INTEGER NOT NULL PRIMARY KEY,
PersonId INTEGER [...],
WorkDate DATE,
StartTime TIMESTAMP,
StopTime
TIMESTAMP,
TotalHours NUMERIC(18, 4));


First, it is no longer a problem to compute TotalHours from the StopTime and StartTime, if they're filled in. StopTime minus StartTime will always equal the correct total hours worked, when they are non-NULL. Also, it is no longer a problem to record a greater than 24 hour shift.

But isn't there some redundancy here? WorkDate is always going to equal the date portion of StartTime, when StartTime is non-NULL. Somebody once suggested to me that it would be possible to just use the date portion of the StartTime TIMESTAMP for a "TotalHours-only"time record (and drop the WorkDate column altogether). The problem with this idea is that a TIMESTAMP always contains time information, whether or not you bother to fill in. If you assign only a date, then the time portion is presumed to be midnight. So there is no way to distinguish between a record where the user intended to fill in only total hours and one where the user intentionally filled in midnight as the StartTime but forgot to fill in the StopTime. That will be a problem when you validate the record prior to submitting it for payroll.

Another way


I found that the approach above works well for me, but those who find the duplication of one date completely untenable might be tempted to do something along the lines of this metadata:

CREATE TABLE TimeRecords (
Id INTEGER NOT NULL PRIMARY KEY,
PersonId INTEGER [...],
WorkDate DATE,
StartTime TIME,
StopTime
TIMESTAMP,
TotalHours NUMERIC(18, 4));


There's no redundancy in this version. You can store both time records with start times and stop times or time records with just elapsed times, and it's easy to distinguish the two. Unfortunately, certain queries become more difficult. With the version which uses TIMESTAMP for the StartTime column, I can write a query like this to determine who was in the building at a particular point in time:

SELECT
PersonId
FROM
TimeRecords
WHERE
:SomeParameter BETWEEN StartTime and StopTime


Records without start and stop times will be ignored by this query, since they do not contain enough information to answer the question. With a TIME instead of a TIMESTAMP data type for the StartTime column, I would have to use an expression in the where clause to write this query:

SELECT
PersonId
FROM
TimeRecords
WHERE
:SomeParameter BETWEEN (
WorkDate + StartTime) and StopTime


That might well mean that my query can no longer be indexed.

Conclusion


Although my preferred schema has a small bit of redundancy, I find it significantly more usable.It is considerably less work to handle copying the date between the WorkDate and StartTime fields, and computing the TotalHours from the StartTime and StopTime, than it is to not have the date in either column.And not having dates in the StartTime and StopTime columns at all is just unusable.


Comments

  • Guest
    Heinz Z. Thursday, 29 January 2009

    Hallo Craig,

    why you need Workday at all when StartTime and StopTime are of type timestamp?

  • Guest
    Anthony Thursday, 29 January 2009

    Because the date that a work activity occurs on might not necessarily be the same date that the shift itself started on. Like pretty much everyone who works a night shift starting at 11 p.m..

  • Guest
    Fred Z Sunday, 15 February 2009

    Some time ago I wrote a Paradox based timekeeping system for a group of lawyers and their assistants. In the end multiple redundancies were the best answer, because of human inaccuracy. My pre-posting check routines demanded answers where there were conflicts, for example if time worked did not match start and stop entries.

    We also had multiple instances of a timer form, one started for each task, with the timer running only when the form for the task was on top. For a time we went completely stupid and recorded all of those time intervals separately in an effort to see how often people got interrupted and how accurately they clicked on the form when they changed tasks or estimated time. That got too complicated and the results were not good. Either accuracy of time estimate or recording discipline were not working.

    The result of all of it was that I have confidence only in the order of magnitude of time reported.

    Your lawyer feels the same way: next time you get a time itemized legal bill, tell your lawyer you want to review the time records and he'll probably offer you a discount on the spot.

  • Please login first in order for you to submit comments
  • Page :
  • 1

Check out more tips and tricks in this development video: