LEGOs are very popular around here in the Twin Cities of Minnesota. They keep our kids, such as my grandson Archer, occupied during the long winter when our cold weather limits outdoor activity. See his creative solar-powered banana-research station pictured. No wonder the local Mall of America features a LEGO Imagination Center!
Thus, naturally, this recent Journal of Statistics and Data Science Education publication on “Building a Multiple Linear Regression Model with LEGO Brick Data” caught my eye. The article lays out a fun class project by two Iowa State University Statistics Department Associate Professors, Anna Peterson and Laura Ziegler. They developed an “innovative activity that uses data about LEGO sets to help students self-discover multiple linear regressions” that “explore the relationship between the Amazon price and the number of pieces per set for two sizes of bricks, small and large.” The students start with graphical displays, then progress to simple linear regression, and, finally, develop models that uncover interactions of factors.
Using the spreadsheets provided by Profs Peterson and Ziegler, I used the Import tools in Design-Expert® software (DX) to reproduce their results.
First off, Graph Columns revealed a strong correlation (r=0.986) between the total number of pieces and the number of unique pieces per LEGO set, the latter being a measure of the potential cost for individual molds. Seeing this, I decided not to include both factors in my modeling and went forward with only the total number of pieces, as did Peterson and Ziegler.
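For readers without DX at hand, the correlation check can be sketched in a few lines of Python. The piece counts below are invented purely for illustration (they are not the Peterson and Ziegler data), and the function name `pearson_r` is my own.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical piece counts, invented for illustration (NOT the article's data)
total_pieces = [50, 120, 300, 450, 800]
unique_pieces = [20, 45, 100, 140, 260]

r = pearson_r(total_pieces, unique_pieces)
print(f"r = {r:.3f}")
```

When r comes out this close to 1, the two columns carry nearly the same information, which is why keeping only one of them in the model is a reasonable choice.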
Next, I did a Design Evaluation of a polynomial model with the main effects of size (A), theme (B) and number of pieces (C), plus their three two-factor interactions (AB, AC and BC), and the quadratic term for the number of pieces (C²). The results revealed an aliasing between size and theme: only the Duplo theme came in the large size. Thus, theme dropped out of my focus.
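This kind of aliasing can also be spotted by simply tabulating which levels of one factor occur with which levels of the other. The sketch below is not how DX evaluates a design; it is a minimal stand-in, with made-up rows and placeholder theme names ("Theme X", "Theme Y") and helper names of my own choosing.

```python
from collections import defaultdict


def levels_by_group(rows, group_key, level_key):
    """Map each value of group_key to the set of level_key values seen with it."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[group_key]].add(row[level_key])
    return dict(seen)


def exclusive_pairs(rows, factor_a, factor_b):
    """Pairs (a_level, b_level) that only ever occur with each other.

    For such a pair the effect of a_level cannot be separated from the
    effect of b_level, i.e. the two are aliased in the data.
    """
    a_to_b = levels_by_group(rows, factor_a, factor_b)
    b_to_a = levels_by_group(rows, factor_b, factor_a)
    pairs = []
    for a_level, b_levels in a_to_b.items():
        if len(b_levels) == 1:
            (b_level,) = b_levels
            if len(b_to_a[b_level]) == 1:
                pairs.append((a_level, b_level))
    return sorted(pairs)


# Hypothetical rows for illustration only; theme names other than Duplo
# are placeholders, not taken from the article.
sets = [
    {"size": "large", "theme": "Duplo"},
    {"size": "large", "theme": "Duplo"},
    {"size": "small", "theme": "Theme X"},
    {"size": "small", "theme": "Theme Y"},
]

aliased = exclusive_pairs(sets, "size", "theme")
print(aliased)
```

Any pair this returns flags a size level and a theme level whose effects the data cannot tell apart.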
I then deployed DX to do a regression on the model A, C, AC and C². Residual diagnostics revealed via the Box-Cox plot that a log transformation of the response would do significantly better. The only catch in the diagnostics is a high Cook’s Distance for the large-pieced Duplo Modular Playhouse set: not a problem, per se, but curiously influential.
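To make the structure of that model concrete, here is a rough Python sketch of how its terms combine once price is on a log scale. Everything numeric in it is an assumption: the coefficients are invented (they are not my fitted DX results), size is coded as -1 for small and +1 for large, and the natural log is used.

```python
from math import exp


def model_row(size_code, pieces):
    """Model terms for one set: intercept, A, C, AC and C^2."""
    return [1.0, size_code, pieces, size_code * pieces, pieces ** 2]


def predict_log_price(coefs, size_code, pieces):
    """Predicted ln(price) as the sum of coefficient * term."""
    return sum(b * x for b, x in zip(coefs, model_row(size_code, pieces)))


def predict_price(coefs, size_code, pieces):
    """Back-transform the log-scale prediction to the price scale."""
    return exp(predict_log_price(coefs, size_code, pieces))


def pieces_slope(coefs, size_code, pieces):
    """d ln(price) / dC = b_C + b_AC * A + 2 * b_CC * C."""
    _, _, b_c, b_ac, b_cc = coefs
    return b_c + b_ac * size_code + 2.0 * b_cc * pieces


# HYPOTHETICAL coefficients for [intercept, A, C, AC, C^2], illustration only
coefs = [2.0, 0.5, 0.004, 0.002, -0.000002]
SMALL, LARGE = -1.0, 1.0  # assumed coding of the size factor

for label, code in (("small", SMALL), ("large", LARGE)):
    slope = pieces_slope(coefs, code, 100)
    price = predict_price(coefs, code, 100)
    print(f"{label}: slope at 100 pieces = {slope:.4f}, price = {price:.2f}")
```

The AC term is what lets the slope on number of pieces differ between brick sizes, and the C² term is what allows the curvature.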
In the end I reproduced the interaction shown in Figure 4 of the publication, but with a bit of flair for some curviness and the addition of confidence bands as seen below.
You can see that the effect of the number of LEGO pieces on price depends greatly on the size of the bricks. My conclusion is that going for the small-sized LEGOs is by far the most cost-effective way to keep kids busy, provided they are old enough to play with them safely and have the exceptional focus needed to make something out of them.
PS: While researching this blog, I noticed that in just the few years since Peterson and Ziegler gathered the costs, LEGO prices went way up on Amazon. Given the recent performance of stocks and bonds, you might do well by investing in these toys, per January’s Research in International Business and Finance. See the highlights (average long-term return of 11%, better than gold!) at LEGO: THE TOY OF SMART INVESTORS.