ct covid trends - rolling averages, cumulative values, and correlation in dax and power bi

The state of Connecticut has been publishing daily (until a a few weeks ago) the daily change in values and current totals in COVID-19 related metrics - so I can began to keep track of these daily changes in a spreadsheet so that I could see trends in the data. Now there’s almost 6 months of data though I could’ve sworn March was just last week!

Keeping only a log of daily changes in key metrics in Connecticut - new cases, change in hospitalizations, new deaths, and new tests administered - gave us some insight into the direction we were moving in the last 24 hours, but how do we take these 4 values captured daily and look at a slightly bigger picture? This simple data set proved to be a fun exercise in a few common calculations:

  • Cumulative values

  • Correlation quick measure

  • Rolling Averages

Below is a few sample rows of the data - we’ll call this table Cases:

Date

New Cases

New Hospitalizations

New Deaths

New Tests

03/29/20

578

113

2

2,700

03/30/20

557

91

33

1,000

03/31/20

429

158

16

1,000

Let’s break down what’s going on in the data model - or you can jump to the bottom of the page to view the Power BI dashboard.

Cumulative Measures

Calculating cumulative values may not be that intuitive in Power BI compared to Excel, where you can easily write a formula in an adjacent column and drag it down. Power BI does have a set of quick measures that will calculate totals over a time range, such as year to date, but these quick measures use dax functions that would require the use of Power BI’s date hierarchy to work properly (such as totalytd()) . There’s an easy cumulative measure you can write that is not restricted to a specified time range hierarchy. This approach, however, will not work properly while using the Power BI date hierarchy, as this measure will only give you the cumulative value within that hierarchy - so contextually you will need to use the date value rather than the hierarchy.

The first measure we’ll write is to count (or sum) the values in the new cases column - need to warm up with an easy one!

Cases = 
sum('Cases'[New Cases])

You can embed this sum() function within the cumulative measure we’re going to write, but I used it as a separate measure as I may want to use it elsewhere in the report. Let’s write a measure for the other metrics as well - deaths, hospitalizations, and tests.

For the cumulative measure, we’ve got two options and we’ll leverage the Cases measure and the calculate() function:

Cumulative Cases = 
calculate(
    [Cases],
    filter(
        all('Cases'[Date]),
            'Cases'[Date] <= max('Cases'[Date]
        )
    )
)

This measure will give you the flexibility of not being tied to a specific year in the time dimension. The second option, below, is a bit more straightforward, but will leverage the time dimension to give you a year to date cumulative number:

Cumulative Cases YTD = 
calculate(
    [Cases],
    datesytd('Cases'[Date])
)

Correlation quick measure

Calculating the correlation coefficient in Excel is simple as you can just use the correl() function. However, there is no equivalent to this function in dax, so you’ll have to do the math. Fortunately, there’s a quick measure option that can create this for you.

When selecting the Correlation coefficient quick measure, Power BI will ask you for three parameters: Category, Measure X, and Measure Y. We want to find the correlation coefficient between cases and tests administered to see what kind of relationship they have. For this, we will use our Date value as the category, the sum of New Cases as Measure X, and the sum New Tests as Measure Y. Is there a relationship between the number of tests administered and the number of positive cases?

Finding the correlation coefficient will quantify a relationship between two sets of data. In English, that means we will be able to measure the relationship between two sets of values. The correlation coefficient will evaluate to a number between +1 and -1. The closer to +1 or -1, the stronger the relationship is. The sign of the value will determine the direction of the relationship.

The correlation coefficient, with data through August 27th, is -0.54. This shows a fairly weak relationship. For the relationship to be considered strong, we would want to see a -0.8 or lower, or 0.8 or higher. The negative sign indicates a negative relationship between the 2 numbers - meaning as one number increases, the other decreases. This weak negative relationship may imply that as tests administered go up, cases go down, but it is not statistically significant and also does not imply causation (the two sets of data do not influence each other, rather, they are just related).

Rolling Averages

Rolling averages are useful for viewing trends as it will smooth out outliers or spikes in data. This measure, similar to the cumulative measure, cannot be used with Power BI’s date hierarchy, so you’ll need to use just the Date column as the dimension in your visual.

For this measure, we’ll re-use the Cases measure from earlier.

New Cases Rolling Average = 
averagex(
    datesbetween(
        'Cases'[Date],
        dateadd(lastdate('Cases'[Date]), -4, DAY),
        lastdate('Cases'[Date])
        ),
    calculate([Cases])
)

We want to use the averagex() function here as this will allow us to evaluate an average for each row of the table. We don’t want to average the entire new cases column, we only want to average the last 5 days, and calculate this for each date, so this function works well here. The first parameter is to provide the table for the average function. We want to grab the last 5 days and so we want to include the current row as well as the 4 previous rows (or days) to get our 5 day rolling average. For this, we’ll use a combination of the datesbetween(), the dateadd(), and the lastdate() functions to grab all of the values between the latest date in the row (which would be the only date in that row, provided we have the Date column as a dimension in our visual). The datesbetween() function will give us a table of dates in the range we want, and the dateadd() function will help us define that range by adding negative 4 days to our current date - giving us a 5 day range for which to evaluate the expression in the last parameter - calculate([Cases]) - for which we’ll reuse our existing Cases measure we defined earlier.

Positivity Rate

To calculate the positivity rate - the percent of new tests that come back testing positive - we’ll use a simple calculation with the divide() function and reference the cases and tests measures we defined earlier. The divide() function’s third parameter allows us to define a value to use in case the measure tries to divide by zero - so we’ll make this zero so we don’t get an error.

Positivity Rate =
divide(
    [cases],
    [tests],
    0
)


The dashboard is embedded below - and I will continue to update this data daily (except for weekends when the state does not publish data). Not all measures were used to in the visuals on the dashboard - but they are in the data model should I (or other collaborators) choose to use them.

Previous
Previous

ct covid trends part 2 - % changes over time, time intelligence, and date hierarchies in dax and power bi