## Analyzing cohorts with Power BI (Stock market)

Recently I had an interesting conversation about the return of investment of different investment types. I was arguing that it would be difficult to lose money with stocks (such as index fonds based on the NASDAQ or DAX) if the shares are kept long enough. Being interested in data analysis and knowing this requirement as a special type of cohort analysis, I had to create this in Power BI. The intention was to (hopefully) prove the point that shares are a reliable form of investment in the long run. The result looks like a typical time-based cohort analysis with a noticeable diagonal:

This view is based on the ETF NASDAQ-100 (^NDX) that closely maps the NASDAQ-index. On the rows the year when the share was bought is listed, the columns show the year sold. The figures shown are the yearly returns on invested capital for that combination whereas the average prices of the years sold and years bought are used. When multiple years are in between the purchase and the sales, the interest on interest has to be calculated. The compound annual growth rate (CAGR) is an easy way to accomplish this computation:

CAGR = ( EV / BV)^{1 / n} – 1

where:

EV = Investment’s ending value

BV = Investment’s beginning value

n = Number of periods in between

First a few words on the business question. Between 1985 and 2015 the NASDAQ mostly showed an upturn-trend. The year 2000 and upcoming years are the obvious exception to this rule. Having invested in 2000 it would have taken 14 years until a negative break-even could be avoided. Similarly investments in 1999 took 12 years to recover, from 2001 9 years. Aside from this dot-com bubble positive return on investment was always reached after keeping the shares for more than 5 years.

### Implementation

The raw figures can be downloaded from here. I am only interested in the date column and the closing value what is why I only copied these two columns into my Excel-file:

I added another column Share in case that I want to be able to store other stocks as well.

I am using the Power Query Ribbon to import the data from that table:

Additionally I need to import two date-tables into my data-model:

The resulting data model is fairly simple:

It is worth mentioning that both relationships are inactive. I will create two measures that will express the price when a share was bought and the price when it was sold – as mentioned before I am calculating the average price:

Price Bought :=

CALCULATE (

AVERAGE ( [Close] ),

USERELATIONSHIP ( Shares[Date], ‚Date Bought'[Date Bought] )

)

Price Sold :=

CALCULATE (

AVERAGE ( [Close] ),

USERELATIONSHIP ( Shares[Date], ‚Date Sold'[Date Sold] )

)

Our final measure that appeared on the pivot tables is defined according to the definition of the compound annual growth rate:

ROI Per Year % :=

IF (

MIN ( ‚Dates Bought'[Date] ) < MIN ( ‚Dates Sold'[Date] ),

POWER (

DIVIDE ( [Price Sold], [Price Bought] ),

DIVIDE ( 1, [No of Years Between] )

)

– 1

)

Whereas [No Of Years Between] is defined as:

No of Years Between :=

MIN ( ‚Dates Sold'[Year] ) – MIN ( ‚Dates Bought'[Year] )

Since ROI Per Year % is based on the measures Price Bought and Price Sold that leverage the USERELATIONSHIP-function it is using a relationship from the fact-table to both date tables.

Adding the year-fields of both date-tables and that measures to the pivot-table…

…and adding conditional formatting will show the pivot-table from the beginning.

### Normalizing time

In order to facilitate answering our original questions (how long did it take at different investment times until a positive ROI was reached?) we could normalize the length of investment. Instead of showing the year of sales on columns it could be helpful to show the number of years since the shares were bought – similar to this example that we posted 2 years ago about normalizing top customers or products since inception.

First we need to add a simple table with a sequential list of numbers to our data model – it could be sourced from an Excel-Table:

Furthermore we need a way to identify the difference in years or months between entries of our date tables. We call this years and months relative – 0 reflects the current month/year, – 1 stands for one month/year before the current month etc. pp. This time we are using our date page in NAV to create these two columns. Alternatively it would be possible to use Power Query to create the columns or even Power Pivot as we did two years ago in the aforementioned blog-post.

Having this new column YearsRelative and our Periods Relative-table available, we can add the following measure:

Price Sold Relative :=

IF (

AND (

HASONEVALUE ( ‚Period relative'[Periods relative] ),

HASONEVALUE ( ‚Dates Bought'[YearsRelative] )

),

CALCULATE (

[Price Sold],

‚Dates Sold'[YearsRelative]

= VALUES ( ‚Dates Bought'[YearsRelative] )

+ VALUES ( ‚Period relative'[Periods relative] )

)

)

The important part is the filter condition of that calculate-expression. It is filtering the dates sold table to the extent that only years that match the year bought extended by the number YearsRelative will be considered.

The remaining measure follows the same logic as before:

ROI Per Year % Relative :=

IF (

AND (

AND (

HASONEVALUE ( ‚Period relativ'[Periods Relative] ),

[Price Sold Relative] > 0

),

[Price Bought] > 0

),

POWER (

DIVIDE ( [Price Sold Relative], [Price Bought] ),

DIVIDE ( 1, VALUES ( ‚Period relativ'[Periods Relative] ) )

)

– 1

)

The concepts of cohorts and normalized time can be transferred to many other domains. Cohort analysis are for example common to analyze subscriptions (publishing or software-as-a-service):