You are standing in one of your favorite places in the world - a whisky shop. You are surrounded floor to ceiling by containers providing olfactory joy. You want them all. But how do you choose? By distillery? Brand? Age? Filtration method?
After doing a full regression analysis of my whisky ratings, I found I only need the brand, distillery, and ABV, combined using the following formula:
(Average Brand Score + Average Distillery Score)/2 + (Additional ABV over 40)/2.5 – 2.5
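As a rough sketch, the formula can be written as a small Python function. The function and parameter names are illustrative only. I assume "Additional ABV over 40" means ABV minus 40 (in % ABV). The function also folds in the new-brand fallback described below, reading its "Additional ABV" the same way. The new-distillery case is not covered.

```python
from typing import Optional


def estimate_score(distillery_avg: float, abv: float,
                   brand_avg: Optional[float] = None) -> float:
    """Estimate a 0-100 score from average brand/distillery scores and ABV.

    Hypothetical helper; assumes 'Additional ABV' means abv - 40.
    Pass brand_avg=None for a brand with no previous ratings.
    """
    extra_abv = abv - 40.0  # assumption: ABV points above 40%
    if brand_avg is None:
        # New-brand fallback: 93% of the distillery average + extra ABV / 2
        return 0.93 * distillery_avg + extra_abv / 2
    # Main formula: mean of brand and distillery averages + extra ABV / 2.5 - 2.5
    return (brand_avg + distillery_avg) / 2 + extra_abv / 2.5 - 2.5


# Hypothetical example: brand average 80, distillery average 84, bottled at 46% ABV
print(estimate_score(84, 46, brand_avg=80))  # about 81.9
print(estimate_score(84, 46))                # new brand: about 81.12
```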
This formula gives me the estimated score of the whisky on a 0-100 scale. Average brand and distillery scores are simply the average of all my scores for that brand or distillery. If it is a new brand (one I have not rated before), the score is estimated by
93% of the Average Distillery Score + (Additional ABV)/2
and if it is a new distillery then the score can be estimated by
The Error
The error is rather large, +/-15 (roughly one standard deviation), meaning that if the estimated score is 50 there is a 32% chance the final score will be less than 35 or more than 65. I also considered region, bottler, owner, cost, age, filtration method, and color addition. The first three are not statistically significant, and the error of the estimate from the complete analysis was +/-14.2. While that is a slightly more accurate estimate, I'll accept the roughly 6% larger error in exchange for the time savings.
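The 32% figure is what you get if the +/-15 is treated as one standard deviation of normally distributed errors. Here is a quick check using Python's standard-library `statistics.NormalDist`. Normality is my assumption here, and `prob_outside` is a hypothetical helper. The last line shows that the "6%" is 15/14.2, about 5.6%.

```python
from statistics import NormalDist


def prob_outside(estimate: float, error: float, low: float, high: float) -> float:
    """Probability the actual score lands below `low` or above `high`,
    assuming normally distributed errors with standard deviation `error`."""
    dist = NormalDist(mu=estimate, sigma=error)
    return dist.cdf(low) + (1 - dist.cdf(high))


p = prob_outside(50, 15, 35, 65)
print(f"{p:.1%}")  # 31.7%

# How much larger the quick model's error is than the full model's
print(f"{15 / 14.2 - 1:.1%}")  # 5.6%
```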
Region
Surprisingly, region was never a statistically significant variable. The simple explanation is that regional effects are already accounted for by the distillery score. But even when I removed distillery (and brand) from the regression, region was still not statistically significant, while age, cost, strength, filtration method, color addition, and bottler were.
Cost
Cost was usually statistically significant, but its effect varied greatly, adding anywhere between 0.022 and 0.56 points per Euro spent on the whisky. The underlying cost data is also uncertain: it is the price when I most recently tried each whisky, which may be a few years old by now and does not reflect any recent price increases.
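The arithmetic for a hypothetical €50 bottle shows how wide that range is. The function name and price are illustrative only. The two coefficients are the extremes quoted above. The upper estimate is roughly 25 times the lower one.

```python
from typing import Tuple

# Reported extremes of the cost coefficient, in points per Euro
LOW_PER_EURO = 0.022
HIGH_PER_EURO = 0.56


def cost_contribution(price_eur: float) -> Tuple[float, float]:
    """Lower and upper score contribution of a bottle's price (hypothetical helper)."""
    return price_eur * LOW_PER_EURO, price_eur * HIGH_PER_EURO


low, high = cost_contribution(50)  # hypothetical €50 bottle
print(f"{low:.1f} to {high:.1f} points")  # 1.1 to 28.0 points
```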
Age
Age was also statistically significant, both when I used only whiskies with age statements and when I used a simple flag indicating whether the whisky has an age statement. It was much more consistent than cost, normally adding between 0.5 and 0.7 points per year of age. Using the age-statement flag instead added 1-2 points.
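Applying the typical per-year range to a hypothetical 12-year-old shows what the coefficient means in practice. The helper and the age are illustrative only. For comparison, the age-statement flag on its own added 1-2 points.

```python
def age_contribution(age_years: float, low: float = 0.5, high: float = 0.7):
    """Range of points added by age, using the typical
    0.5-0.7 points-per-year coefficients (hypothetical helper)."""
    return age_years * low, age_years * high


low, high = age_contribution(12)  # hypothetical 12-year-old
print(f"{low:.1f} to {high:.1f} points")  # 6.0 to 8.4 points
```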
Non-Chill Filtration and Color Addition
Before starting this effort I polled social media and asked what people considered when choosing a whisky. Many responses indicated they looked for whiskies that were non-chill-filtered with no color added (NCF/NCA). My analysis showed that NCF/NCA whiskies score ~3 points higher than others. Interestingly, when I removed brand and distillery from the analysis and used only strength, age, cost and NCF/NCA, the NCF/NCA whiskies scored ~5 points lower than others. My guess is that because production methodology is very closely correlated with brand, much of the relationship between production methods and score was already taken into account by the brand score.
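A sign flip like this is a classic confounding effect. If brand influences both production method and score, leaving brand out can reverse the apparent NCF/NCA effect. The toy sketch below uses six made-up ratings, not real scores. Within each brand, NCF/NCA adds exactly 3 points, yet the pooled comparison makes NCF/NCA look worse. It compares simple group means rather than running a regression, so treat it only as an illustration of the mechanism. The function names are hypothetical.

```python
from statistics import mean

# Hypothetical toy data: (brand, is_ncf_nca, score). NOT real ratings.
# Brand A scores high and rarely uses NCF/NCA; brand B scores low and often does.
ratings = [
    ("A", False, 80), ("A", False, 80), ("A", True, 83),
    ("B", False, 60), ("B", True, 63), ("B", True, 63),
]


def pooled_difference(rows):
    """Mean NCF/NCA score minus mean score of the rest, ignoring brand."""
    ncf = [s for _, flag, s in rows if flag]
    other = [s for _, flag, s in rows if not flag]
    return mean(ncf) - mean(other)


def within_brand_difference(rows):
    """Average of per-brand NCF/NCA differences (a crude way to control for brand)."""
    diffs = []
    for brand in sorted({b for b, _, _ in rows}):
        brand_rows = [r for r in rows if r[0] == brand]
        ncf = [s for _, flag, s in brand_rows if flag]
        other = [s for _, flag, s in brand_rows if not flag]
        if ncf and other:  # need both groups to compare within a brand
            diffs.append(mean(ncf) - mean(other))
    return mean(diffs)


print(round(pooled_difference(ratings), 2))  # -3.67: pooled, NCF/NCA looks worse
print(within_brand_difference(ratings))      # 3: within each brand, NCF/NCA is better
```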
Conclusion
After performing over 30 regression analyses with different combinations of nine different variables, I found that only three variables (distillery, brand and strength) are needed to produce a quick estimate. The error of +/-15 means the overall result is accurate enough for gross differentiation, but I shouldn't take it too seriously! To keep this post readable and hopefully mildly entertaining I kept it short and concise, but if you have any questions leave a comment or find me on socials. Slainte!