ct covid trends part 2 - % changes over time, time intelligence, and date hierarchies in dax and power bi
exploring time intelligence measures in dax and power bi to calculate % changes over time, use date hierarchies, and filtering using daily reported covid-19 data from connecticut
In Part 1, we took daily updates from Connecticut’s reported numbers around covid-19 - new cases, deaths, change in hospitalizations, and new tests administered - and looked at some trends. There are other fun exercises we can do with this data, and given that we have daily data for 6 months and counting, let’s summarize some of it and look at the monthly percent changes.
You can view the report at the bottom of the page, or keep reading as we break down what’s going on here!
Time intelligence calculations can be a bit tricky to figure out sometimes. There’s an important nuance here - your date dimension needs to be continuous (daily) and range from the earliest to the latest date in your model. That also means there needs to be a date and a record for every day - if there are blanks (for example, no cases for a given day), that record still needs to have a date and show a zero.
Calendar / Date Dimension Table
Lucky for us, the daily data I’ve been capturing already follows this format. But what if it didn’t? As a best practice, it’s a good idea to always create a separate date table to use for time intelligence calculations. It’s really easy. To automatically generate a date dimension table covering the dates within the model, create a new calculated table with the function below:
Calendar = calendarauto()
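As a quick aside, calendarauto() builds full calendar years for every year it finds in the model. If you’d rather control the range yourself, the calendar() function takes explicit start and end dates - here’s a rough sketch (the March 1, 2020 start date is just an assumption based on when reporting began, so adjust to your own data):

Calendar = calendar( date(2020, 3, 1), today() )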
The next step is to create a relationship between this new date table and the Cases table on the Date columns.
Since the Cases table follows the same format, in this scenario we’ll see a 1 to 1 relationship with a bi-directional filter. This means that the Cases table will filter the Calendar table and vice versa.
You will also notice that the calendar icon next to our Date column in the fields pane is gone - but the Date column on the new Calendar table has it. This is referring to the date hierarchy within the model. The Date column on the Cases table no longer has the hierarchy because we’ve created a relationship between that column and the Calendar table - so the hierarchy (Year > Quarter > Month > Day) resides with the Date column on the Calendar table.
Percent Change Over Time Measure
Now for the fun part - writing a measure that will dynamically calculate the percent change over a time period. I say dynamic because we will use the date hierarchy in a visual, we want our users to be able to drill up and down that hierarchy, and the measure needs to take into account which level of the hierarchy we’re at when it calculates. So what do we do? We’ll leverage variables in this next measure so the dax is easier to understand and maintain.
Cases Change =
var prev_value =
    if(
        hasonevalue('calendar'[date].[day]),
        calculate([cases], previousday('calendar'[date])),
        if(
            hasonevalue('calendar'[date].[month]),
            calculate([cases], previousmonth('calendar'[date])),
            calculate([cases], previousyear('calendar'[date]))
        )
    )
return
    divide([cases] - prev_value, prev_value, 0)
In the first section, we declare our variable prev_value - the goal is to grab the sum of cases from the previous time period based on the level of the hierarchy that’s in our visual: day, month, or year. We’ll use the hasonevalue() function to check whether the date column from our new Calendar table has only one distinct value - this function returns a boolean true or false. Since measures evaluate in the context of any filters applied both within the measure and by the visual, using the date column as a dimension in our visual provides that filter context.
In the column references for the hasonevalue() function, we’ll look for distinct values in each level of the date hierarchy. We’ll use a nested if statement to check for the three levels we want to use (day, month, year). Once an if statement evaluates to true, we’ll use the calculate() function to evaluate our Cases measure we created in Part 1, filtering for the corresponding previous time period using the functions previousday(), previousmonth(), and previousyear(). The result of this expression will be saved in our prev_value variable. This variable can be used inside of the expression following the return statement.
When it comes to writing the math for the percent change, we can really appreciate the use of variables in dax. The basic equation to calculate this value is to subtract the old value from the new value and then divide the result by the old value - so we’ll just swap in our measures and variables to make this happen. Since we will have time periods with no data or dates in a previous relative time period (data started reporting in March), we will have at least one instance of this equation trying to divide the current number of new cases by zero. This does not compute - so let’s wrap the equation in the divide() function. The third parameter in this function allows us to specify a value in case of division by zero. Let’s go with zero.
Let’s rinse and repeat so we can produce a percent change over time for each of our metrics - deaths, hospitalizations, cases, and tests.
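As a sketch of what that repetition looks like, here’s the same pattern applied to deaths - this assumes a [deaths] measure exists (like the per-metric sums from Part 1); only the measure references change:

Deaths Change =
var prev_value =
    if(
        hasonevalue('calendar'[date].[day]),
        calculate([deaths], previousday('calendar'[date])),
        if(
            hasonevalue('calendar'[date].[month]),
            calculate([deaths], previousmonth('calendar'[date])),
            calculate([deaths], previousyear('calendar'[date]))
        )
    )
return
    divide([deaths] - prev_value, prev_value, 0)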
Once we graph these measures, we’ll notice that March shows a 0% change (since we don’t have any February data to compare against), and the September value as of this writing only accounts for a few days - so we don’t have a full month to compare to August. Of course, we could just filter out March (keeping only dates after March 31st) using the filter pane, and maybe add a footnote explaining that September (or any current month) is comparing an incomplete set of data, just to make our users aware. But we can also do this with measures so that the data model makes this determination for us regardless of the date range of data that gets loaded.
Filtering Date Dimension Using Measures
Complete Report Month = if( endofmonth('cases'[date]) = endofmonth('calendar'[date]), 1, 0 )
We’ll need to filter each visual based on the value this measure calculates, so we can’t rely on boolean true/false values - the if statement produces integers to represent the booleans instead, 1 for true and 0 for false. To make the comparison, we use the endofmonth() function to check the last date of the month in the current filter context in both our Calendar table and our Cases table. That date will be missing from the Cases table for future dates, so any month after August 2020 (as of this writing) will be flagged as false, or zero.
No Previous Month = if( isblank(calculate([cases],previousmonth('calendar'[date]))), 1, 0 )
To determine whether we have data in the previous month to compare against, we’ll use the same approach of converting a boolean into an integer so we can use it as a visual-level filter in the filter pane. The calculate() and previousmonth() functions evaluate the existing Cases measure for the previous month, and the isblank() function determines whether the result of that expression has a value. If there is no data, isblank() evaluates to true and we’ll slap a 1 on that record, and if the previous month does contain a value we’ll slap a zero on it for false.
With these two new measures, we can add them to the filter pane for each of our visuals and filter to only display months from the Calendar table that have a complete month of reported data, excluding any months that don’t have data in the relative prior month. Cutting the extra months out of the graphs cleans up the look and gives us a less cluttered view of the percent change, month to month. Throw some conditional formatting on the columns to highlight the ups and downs and we’re good to go!
COVID-19 Trends - % Change Over Time Report
ct covid trends - rolling averages, cumulative values, and correlation in dax and power bi
deriving data using only connecticut’s reported daily changes in metrics to view trends - rolling averages, cumulative values, and correlation in dax and power bi
The state of Connecticut has been publishing daily (until a few weeks ago) the daily change in values and current totals for COVID-19 related metrics, so I began keeping track of these daily changes in a spreadsheet so I could see trends in the data. Now there’s almost 6 months of data, though I could’ve sworn March was just last week!
Keeping only a log of daily changes in key metrics in Connecticut - new cases, change in hospitalizations, new deaths, and new tests administered - gave us some insight into the direction we were moving in the last 24 hours, but how do we take these 4 values captured daily and look at a slightly bigger picture? This simple data set proved to be a fun exercise in a few common calculations:
Cumulative values
Correlation quick measure
Rolling Averages
Below are a few sample rows of the data - we’ll call this table Cases:
Date     | New Cases | New Hospitalizations | New Deaths | New Tests
03/29/20 | 578       | 113                  | 2          | 2,700
03/30/20 | 557       | 91                   | 33         | 1,000
03/31/20 | 429       | 158                  | 16         | 1,000
Let’s break down what’s going on in the data model - or you can jump to the bottom of the page to view the Power BI dashboard.
Cumulative Measures
Calculating cumulative values may not be that intuitive in Power BI compared to Excel, where you can easily write a formula in an adjacent column and drag it down. Power BI does have a set of quick measures that will calculate totals over a time range, such as year to date, but those quick measures use dax functions (such as totalytd()) that rely on Power BI’s date hierarchy to work properly. There’s an easy cumulative measure you can write that is not restricted to a specific time range. This approach, however, will not work properly with the Power BI date hierarchy - it will only give you the cumulative value within each level of that hierarchy - so you’ll need to use the plain date value in your visuals rather than the hierarchy.
The first measure we’ll write simply sums the values in the New Cases column - need to warm up with an easy one!
Cases = sum('Cases'[New Cases])
You can embed this sum() function within the cumulative measure we’re going to write, but I used it as a separate measure as I may want to use it elsewhere in the report. Let’s write a measure for the other metrics as well - deaths, hospitalizations, and tests.
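For reference, those measures might look something like this - the measure names here are my own shorthand, but the column names come straight from the sample table above:

Deaths = sum('Cases'[New Deaths])
Hospitalizations = sum('Cases'[New Hospitalizations])
Tests = sum('Cases'[New Tests])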
For the cumulative measure, we’ve got two options and we’ll leverage the Cases measure and the calculate() function:
Cumulative Cases =
calculate(
    [Cases],
    filter(
        all('Cases'[Date]),
        'Cases'[Date] <= max('Cases'[Date])
    )
)
This measure will give you the flexibility of not being tied to a specific year in the time dimension. The second option, below, is a bit more straightforward, but will leverage the time dimension to give you a year to date cumulative number:
Cumulative Cases YTD = calculate( [Cases], datesytd('Cases'[Date]) )
Correlation quick measure
Calculating the correlation coefficient in Excel is simple, as you can just use the correl() function. However, there is no equivalent function in dax, so you’d have to do the math yourself. Fortunately, there’s a quick measure that can create this for you.
When selecting the Correlation coefficient quick measure, Power BI will ask for three parameters: Category, Measure X, and Measure Y. We want to find the correlation coefficient between cases and tests administered to see what kind of relationship they have. For this, we will use our Date value as the category, the sum of New Cases as Measure X, and the sum of New Tests as Measure Y. Is there a relationship between the number of tests administered and the number of positive cases?
Finding the correlation coefficient quantifies the relationship between two sets of values. The correlation coefficient evaluates to a number between +1 and -1: the closer it is to +1 or -1, the stronger the relationship, and the sign of the value indicates the direction of the relationship.
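The quick measure writes the dax for you, so you never have to touch the formula, but if you’re curious what the math looks like, below is a rough hand-rolled sketch of the same Pearson calculation using the Cases and Tests measures. This is not the exact code the quick measure generates - just an illustration of the underlying formula:

Cases Tests Correlation =
// iterate over each reporting date and apply the computational form of Pearson's r
var date_list = values('Cases'[Date])
var n = countrows(date_list)
var sum_x = sumx(date_list, [Cases])
var sum_y = sumx(date_list, [Tests])
var sum_xy = sumx(date_list, [Cases] * [Tests])
var sum_x2 = sumx(date_list, [Cases] ^ 2)
var sum_y2 = sumx(date_list, [Tests] ^ 2)
return
    divide(
        n * sum_xy - sum_x * sum_y,
        sqrt((n * sum_x2 - sum_x ^ 2) * (n * sum_y2 - sum_y ^ 2))
    )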
The correlation coefficient, with data through August 27th, is -0.54. This shows a fairly weak relationship - for the relationship to be considered strong, we would want to see a value of -0.8 or lower, or 0.8 or higher. The negative sign indicates a negative relationship between the two numbers - meaning as one increases, the other decreases. This weak negative relationship may imply that as tests administered go up, cases go down, but it is not statistically significant, and correlation does not imply causation (the two sets of data do not necessarily influence each other; they are just related).
Rolling Averages
Rolling averages are useful for viewing trends as they smooth out outliers or spikes in the data. This measure, similar to the cumulative measure, cannot be used with Power BI’s date hierarchy, so you’ll need to use just the Date column as the dimension in your visual.
For this measure, we’ll re-use the Cases measure from earlier.
New Cases Rolling Average =
averagex(
    datesbetween(
        'Cases'[Date],
        dateadd(lastdate('Cases'[Date]), -4, DAY),
        lastdate('Cases'[Date])
    ),
    calculate([Cases])
)
We use the averagex() function here because it lets us evaluate an average over a table of dates rather than the entire New Cases column. We only want to average the last 5 days, and we want that calculated for each date, so this function works well. The first parameter provides the table for the average - we want the current row plus the 4 previous days to get our 5-day rolling average. For this, we use a combination of datesbetween(), dateadd(), and lastdate(): lastdate() grabs the latest date in the row (which is the only date in that row, provided we have the Date column as a dimension in our visual), dateadd() shifts that date back 4 days to define the start of the range, and datesbetween() returns the table of dates in that 5-day window. The second parameter is the expression to evaluate over that window - calculate([Cases]) - which reuses the Cases measure we defined earlier.
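As a side note, datesinperiod() can build the same 5-day window in a single call, which some folks find easier to read than nesting datesbetween() and dateadd(). This is just an alternative sketch of the same measure, not what the report above uses:

New Cases Rolling Average (alt) =
averagex(
    // the window runs backward 5 days from the last date in the current context, inclusive
    datesinperiod('Cases'[Date], lastdate('Cases'[Date]), -5, DAY),
    calculate([Cases])
)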
Positivity Rate
To calculate the positivity rate - the percent of new tests that come back testing positive - we’ll use a simple calculation with the divide() function and reference the cases and tests measures we defined earlier. The divide() function’s third parameter allows us to define a value to use in case the measure tries to divide by zero - so we’ll make this zero so we don’t get an error.
Positivity Rate = divide( [cases], [tests], 0 )
The dashboard is embedded below - and I will continue to update this data daily (except for weekends, when the state does not publish data). Not all measures were used in the visuals on the dashboard, but they are in the data model should I (or other collaborators) choose to use them.