Sunday, January 21, 2024

Whisky Rating Estimation

You are standing in one of your favorite places in the world - a whisky shop.  You are surrounded floor to ceiling by containers providing olfactory joy.  You want them all.  But how do you choose?  By distillery? Brand? Age? Filtration method? 

After doing a full regression analysis of my whisky ratings, I found that I only need the brand, distillery, and ABV, using the following formula:

(Average Brand Score + Average Distillery Score)/2 + (Additional ABV over 40)/2.5 – 2.5

This formula gives me the estimated score of the whisky on a 0-100 scale. Average brand and distillery scores are simply the average of all my scores from that brand or distillery.  If it is a new brand then the score is estimated by 


93% of the Average Distillery Score + (Additional ABV)/2 

and if it is a new distillery then the score can be estimated by 

96% of the Average Brand Score + (Additional ABV)/3.
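
The three formulas above can be sketched as one small Python function. This is only an illustration: the function name, the parameter names, and the example scores are made up, and it assumes "Additional ABV" means ABV minus 40 in all three formulas.

```python
from typing import Optional


def estimate_score(abv: float,
                   brand_avg: Optional[float] = None,
                   distillery_avg: Optional[float] = None) -> float:
    """Estimate a whisky score on a 0-100 scale (hypothetical helper).

    brand_avg / distillery_avg are my average past scores for that brand
    or distillery; pass None when the brand or distillery is new to me.
    """
    # Assumption: "Additional ABV" is ABV minus 40 in every formula.
    extra_abv = abv - 40.0

    if brand_avg is not None and distillery_avg is not None:
        # Main formula: known brand and known distillery.
        return (brand_avg + distillery_avg) / 2 + extra_abv / 2.5 - 2.5
    if distillery_avg is not None:
        # New brand: fall back on the distillery average.
        return 0.93 * distillery_avg + extra_abv / 2
    if brand_avg is not None:
        # New distillery: fall back on the brand average.
        return 0.96 * brand_avg + extra_abv / 3
    raise ValueError("Need at least a brand or a distillery average.")


# Made-up example: brand average 80, distillery average 84, bottled at 46% ABV.
print(round(estimate_score(46, brand_avg=80, distillery_avg=84), 1))  # 81.9
print(round(estimate_score(46, distillery_avg=84), 1))                # 81.1
print(round(estimate_score(46, brand_avg=80), 1))                     # 78.8
```

Keeping all three cases in one function means the fallback formulas are used automatically whenever a brand or distillery average is missing.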

The Error

The error is rather large, +/-15, meaning that if the estimated score is 50 there is a 32% chance the final score will be less than 35 or more than 65.  I also considered region, bottler, owner, cost, age, filtration method, and color addition.  The first three were not statistically significant, and the error of the estimate with the complete analysis was +/-14.2.  While that is a slightly more accurate estimate, I'll accept the roughly 6% larger error (15 versus 14.2) in exchange for the time savings.
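
The arithmetic behind the 32% and 6% figures can be checked with a few lines of Python. This sketch assumes the +/-15 is one standard deviation of roughly normally distributed errors, which is what the 32% figure implies; the function name is made up for illustration.

```python
import math


def prob_outside(half_width: float, sigma: float) -> float:
    """Chance that a normal error with standard deviation sigma
    falls outside +/- half_width (hypothetical helper)."""
    return 1.0 - math.erf(half_width / (sigma * math.sqrt(2.0)))


# Assumption: +/-15 is one standard deviation of the estimate's error.
p = prob_outside(15.0, 15.0)
print(f"Chance the score lands outside 35-65: {p:.0%}")

# Relative increase in error from using the quick formula (15)
# instead of the complete analysis (14.2).
extra = 15.0 / 14.2 - 1.0
print(f"Added error from the quick formula: {extra:.0%}")
```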

Region

Surprisingly, region was never a statistically significant variable. The simple reason is that regional effects are already taken into account by the distillery score.  But even when I removed distillery (and brand) from the regression, region was still not statistically significant, while age, cost, strength, bottler, filtration method and color addition all were.

Cost

Cost was usually statistically significant, but its effect varied greatly, from 0.022 to 0.56 points added per Euro spent on the whisky.  The underlying data is also uncertain because it is the cost when I most recently tried the whisky, which may be a few years ago by now, so it does not take any recent price increases into account.

Age

Age was also a statistically significant variable, both when I used only whiskies with age statements and when I used a simple flag indicating that the whisky has an age statement.  It was much more consistent than cost and normally fell between 0.5 and 0.7 points added per year of age.  Using an age statement flag added 1-2 points.

Non-Chill Filtration and Color Addition

Before starting this effort I polled social media and asked what people considered when choosing a whisky.  Many responses indicated they looked for whiskies that were non-chill-filtered and had no color added (NCF/NCA).  My analysis showed that NCF/NCA whiskies score ~3 points higher than others.  What was interesting is that when I removed brand and distillery from the analysis and only used strength, age, cost and NCF/NCA, whiskies that were NCF/NCA scored lower than others by ~5 points.  My guess is that because production methodology is very closely correlated with brand, much of the relationship between production methods and score was already taken into account.

Conclusion

After performing over 30 regression analyses with different combinations of nine different variables, I found that only three variables (distillery, brand and strength) are needed to produce a quick estimate.  The error of +/-15 means the overall result is sufficiently accurate for gross differentiation, but I shouldn't take it too seriously!  To keep this post readable and hopefully mildly entertaining I kept it short and concise, but if you have any questions leave a comment or find me on socials.  Slainte!
