In a recent piece of work the concepts of “Today” and “Yesterday” were introduced to the analytical toolset. There were a few other date concepts to bring quick groupings of temporal data closer to end-users by putting them into plain and simple English. What could go wrong, you ask?It turns out more than was expected.
The load from source runs nightly and the ETL view of “today” is “when it is running the process”. The clients view of “today” is “when I’m querying the model” which usually is the day after. Shifting the ETL view of “today” might seem like an intuitive solution to the problem, but it isn’t. The data for “today” is not there yet and won’t be there until it is too late for “today” to be useful.
That has a worse impact when you consider all rolling date periods are dragged along with today. The impact to end-users, both from an experience and data accuracy perspective is entirely undesirable.
Moreover, the testing process made assumptions about the date periods and looked at it out of context for normal use. Considering the dates alone is guaranteed to return accurate results, because the count of days in a rolling 7 day period should be 7 days. The better questions to add to that would be “Are they the right 7 days?” and “Do we have data to support these 7 days?”
Fortunately these are trivial problems to resolve, but it goes to show a simple sounding change can have more consequences than you see on the surface.