Scott E. Page

The Model Thinker

“Only by collecting a diverse and often mutually contradictory set of narratives can we eventually develop a more complete understanding of the crisis.” No single model suffices.11

cover the uses of models: to reason, explain, design, communicate, act, predict, and explore. These form the acronym REDCAPE, a not-so-subtle reminder that many-model thinking endows us with superpowers.1

When constructing a model, we take one of three approaches. We can aim for realism and follow an embodiment approach. Such models include the important parts and either strip away unnecessary dimensions and attributes or lump them together. Models of ecological glades, legislatures, and traffic systems take this approach, as do climate models and models of the brain. Or we can take an analogy approach and abstract from reality. We can model crime spreading like a disease and the taking of political positions as choices on a left-right continuum.

The Mississippi River Basin Model Waterways Experiment Station, which covers nearly 200 acres near Clinton, Mississippi, is a miniature replica of the river’s basin built on a horizontal scale of 1:2000 and a vertical scale of 1:100.

For this reason, advocates of the critical design movement engage in speculative fictions to generate new ideas.19

For example, in the 1990s the Gates Foundation and other nonprofits advocated breaking up schools into smaller schools based on evidence that the best schools were small.5 To see the flawed reasoning, imagine that schools come in two sizes—small schools with 100 students and large schools with 1,600 students—and that student scores at both types of schools are drawn from the same distribution with a mean score of 100 and a standard deviation of 80. At small schools, the standard deviation of the mean equals 8 (the standard deviation of the student scores, 80, divided by 10, the square root of the number of students). At large schools, the standard deviation of the mean equals 2. If we assign the label “high-performing” to schools with means above 110 and the label “exceptional” to schools with means above 120, then only small schools will meet either threshold. For the small schools, an average score of 110 is 1.25 standard deviations above the mean; such events occur about 10% of the time. A mean score of 120 is 2.5 standard deviations above the mean; an event of that size should occur about once in 150 schools. When we do these same calculations for large schools, we find that the “high-performing” threshold lies five standard deviations above the mean and the “exceptional” threshold lies ten
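The arithmetic in this example can be checked with a short script. This is a minimal sketch that assumes school means are approximately normally distributed (a central-limit-theorem assumption, not stated in the passage); the helper name `upper_tail` is invented for illustration.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable (assumed model for school means)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

STUDENT_MEAN = 100  # mean student score from the example
STUDENT_SD = 80     # standard deviation of student scores from the example

for label, n_students in [("small", 100), ("large", 1600)]:
    # Standard deviation of a school's mean: student SD / sqrt(number of students).
    sd_of_mean = STUDENT_SD / math.sqrt(n_students)
    for threshold in (110, 120):
        z = (threshold - STUDENT_MEAN) / sd_of_mean
        p = upper_tail(z)
        print(f"{label} school, mean > {threshold}: z = {z:.2f}, P = {p:.2e}")
```

Under the normal assumption, a small school clears 110 a little over 10% of the time and clears 120 well under 1% of the time, while a large school essentially never clears either threshold.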

In brief, pay for skill; do not pay for luck. Better-run corporations do in fact pay less for luck.

A reliance on data—and that often means linear regression models—can steer us toward marginal actions and away from big new ideas. A business, government, or foundation that gathers data, fits a linear regression model, and finds the variable with the largest statistically significant coefficient almost cannot stop itself from adjusting that variable and taking the marginal gain.

thinking is new-reality thinking. Big-coefficient thinking widens roads and builds high-occupancy vehicle lanes to reduce traffic. New-reality thinking builds train and bus systems. Big-coefficient thinking subsidizes computers for low-income students. New-reality thinking gives everyone a computer and reduces mail delivery to three days a week. Big-coefficient thinking changes the width of airline seats.

raise the driving age. That may work, but so too might more novel policies such as curfews that prohibit nighttime driving, automated monitoring of teenage drivers through smartphones, or limits on the number of passengers in teenagers’ cars. These new-reality policies might produce larger effect sizes than riding the big coefficient.

Most phenomena of interest are not linear. For that reason, regression models often include nonlinear terms such as age squared, the square root of age, or even the log of age. To account for nonlinearities, we can also arrange linear models end to end. These concatenated linear models can approximate a curve in much the same way as we can use straight-edged bricks to construct a curved path.
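The brick analogy can be illustrated with a small sketch. It approximates the square root of age with straight segments joined end to end and reports how the worst-case gap shrinks as more segments are used. The age range of 18 to 80 and the function names are assumptions chosen for illustration, and the segments here interpolate the curve rather than being fitted by regression.

```python
import math

def piecewise_linear(f, lo, hi, n_segments):
    """Return a function that joins n_segments straight lines end to end,
    matching f at evenly spaced knots between lo and hi."""
    width = (hi - lo) / n_segments
    knots = [lo + i * width for i in range(n_segments + 1)]
    values = [f(k) for k in knots]

    def approx(x):
        # Locate the segment containing x (clamped to the last segment).
        i = min(int((x - lo) / width), n_segments - 1)
        slope = (values[i + 1] - values[i]) / width
        return values[i] + slope * (x - knots[i])

    return approx

def max_error(f, approx, lo, hi, samples=1000):
    """Largest gap between f and its approximation on an evenly spaced grid."""
    xs = [lo + (hi - lo) * j / samples for j in range(samples + 1)]
    return max(abs(f(x) - approx(x)) for x in xs)

for n in (1, 2, 4, 8):
    approx = piecewise_linear(math.sqrt, 18, 80, n)
    print(n, "segments, max error:", round(max_error(math.sqrt, approx, 18, 80), 4))
```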

Your value will be not what you know; it will be what you share. —Ginni Rometty

The example reveals a possible disconnect between the percentage of seats a party controls and its power. Parties A and B control almost identical numbers of seats, but A has three times the power of party B, which has no more power than party C or party D.
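A pattern like this can be reproduced with a small calculation. The sketch below assumes power is measured by the Shapley-Shubik index (the share of arrival orderings in which a party is pivotal) and uses hypothetical seat counts, 40, 39, 11, and 11 in a 101-seat chamber with a 51-seat majority, chosen only to match the description; the highlight itself does not give the numbers.

```python
from fractions import Fraction
from itertools import permutations

def shapley_shubik(seats, quota):
    """Share of orderings in which each party is pivotal, i.e. its arrival
    lifts the running seat total from below the quota to at or above it."""
    pivots = {party: 0 for party in seats}
    orderings = list(permutations(seats))
    for order in orderings:
        total = 0
        for party in order:
            total += seats[party]
            if total >= quota:
                pivots[party] += 1
                break  # exactly one pivotal party per ordering
    return {p: Fraction(c, len(orderings)) for p, c in pivots.items()}

# Hypothetical seat counts, not taken from the source.
seats = {"A": 40, "B": 39, "C": 11, "D": 11}
power = shapley_shubik(seats, quota=51)
for party, share in power.items():
    print(party, share)
```

With these assumed numbers, A is pivotal in half of all orderings and each of the other three parties in one sixth, so B's 39 seats buy no more power than C's or D's 11.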

add value. In thinking about the value of corporations or other multinational organizations, Shapley value may be a better measure. In these cases, exit may not be a viable option. An energy company participates in an energy generation game, an energy distribution game, a real estate game, an environmental game, an employment game, and so on. The company’s total added value equals the sum of its added values across the various domains.

Social networks have more equal degree distributions than the World Wide Web, the internet, and citation networks, all of which have long tails.

Network Statistics

Degree: The number of neighbors (also the number of edges) of a node.

Path length: The minimum number of edges that must be traversed to get from one node to another.

Betweenness: The number of paths of minimal length connecting two other nodes that pass through a node.

Clustering coefficient: The percentage of a node’s pairs of neighbors that are also connected by an edge.
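These four statistics can be computed directly from their definitions. The sketch below uses a small hypothetical network (a triangle A-B-C with a tail C-D-E) and brute-force enumeration that is only suitable for tiny graphs; all function names are illustrative, and betweenness is left as a raw count of shortest paths rather than normalized.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network: a triangle A-B-C with a tail C-D-E.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

def degree(node):
    """Number of neighbors of the node."""
    return len(graph[node])

def path_length(source, target):
    """Minimum number of edges from source to target (breadth-first search)."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if current == target:
            return distance[current]
        for neighbor in graph[current]:
            if neighbor not in distance:
                distance[neighbor] = distance[current] + 1
                queue.append(neighbor)
    return None  # no path exists

def shortest_paths(source, target):
    """All minimal-length paths from source to target, as lists of nodes."""
    best = path_length(source, target)
    if best is None:
        return []
    paths, stack = [], [[source]]
    while stack:
        path = stack.pop()
        if len(path) - 1 == best:
            if path[-1] == target:
                paths.append(path)
            continue
        for neighbor in graph[path[-1]]:
            if neighbor not in path:
                stack.append(path + [neighbor])
    return paths

def betweenness(node):
    """Count shortest paths between pairs of other nodes that pass through node."""
    others = [n for n in graph if n != node]
    return sum(
        1
        for s, t in combinations(others, 2)
        for path in shortest_paths(s, t)
        if node in path
    )

def clustering(node):
    """Fraction of the node's neighbor pairs that are themselves connected."""
    pairs = list(combinations(sorted(graph[node]), 2))
    if not pairs:
        return 0.0
    return sum(1 for u, v in pairs if v in graph[u]) / len(pairs)

for node in sorted(graph):
    print(node, degree(node), betweenness(node), round(clustering(node), 2))
```

In this toy network, node C has the highest degree and betweenness because every shortest path from the triangle to the tail runs through it, yet its clustering coefficient is lower than that of A or B.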


power-law network, has a power-law degree distribution. A handful of nodes has many connections, but most nodes have very few connections. A fourth type of network, a small-world network, combines features of geographic and random networks.5 To construct a small-world network, we begin with a geographic network and then “rewire” it by randomly selecting an edge and replacing one of the nodes it connects with a random node.
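The rewiring procedure can be sketched as follows. The sketch assumes the geographic network is a ring in which each node links to its nearest neighbors, which is one common choice rather than something the passage specifies; the function names, the network size, and the number of rewires are all illustrative.

```python
import random

def ring_lattice(n, k):
    """Assumed geographic network: n nodes on a circle, each linked to its
    k nearest neighbors on each side (requires n > 2 * k)."""
    edges = set()
    for i in range(n):
        for step in range(1, k + 1):
            edges.add(frozenset((i, (i + step) % n)))
    return edges

def rewire(edges, n, n_rewires, rng):
    """Repeatedly pick a random edge and replace one endpoint with a random node."""
    edges = set(edges)
    for _ in range(n_rewires):
        # Sort before choosing so results are reproducible for a given seed.
        old = rng.choice(sorted(edges, key=sorted))
        keep, _dropped = rng.sample(sorted(old), 2)
        # Avoid self-loops and duplicate edges when picking the new endpoint.
        candidates = [
            v for v in range(n)
            if v != keep and frozenset((keep, v)) not in edges
        ]
        if not candidates:
            continue
        edges.remove(old)
        edges.add(frozenset((keep, rng.choice(candidates))))
    return edges

rng = random.Random(0)
lattice = ring_lattice(20, 2)
small_world = rewire(lattice, 20, n_rewires=4, rng=rng)
print(len(lattice), "edges before,", len(small_world), "edges after")
print(len(small_world - lattice), "edges were rewired into shortcuts")
```

Each rewire removes one edge and adds one new edge, so the number of edges stays fixed while a few long-range shortcuts appear among the mostly local connections.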


We would expect planned networks to be robust to the failure of nodes. The fact that emergent network structures are robust is more of a puzzle.


If we let the probability of connecting to a node be proportional to its degree, we produce a power-law degree distribution. In that model, early-arriving nodes will be far more likely to be of high degree. A shortcoming of the model is that it does not allow for any difference in node quality. Higher quality nodes should have higher degree. The quality and degree network formation model corrects that omission while also producing a long-tailed distribution.
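The degree-proportional rule can be sketched in a few lines. This is a minimal illustration that assumes each new node adds exactly one edge and that the network starts from a single connected pair; it does not implement the quality and degree model, and the function name and network size are illustrative.

```python
import random
from collections import Counter

def preferential_attachment(n_nodes, rng):
    """Grow a network one node at a time; each newcomer links to one existing
    node chosen with probability proportional to that node's degree."""
    degree = Counter({0: 1, 1: 1})  # assumed starting point: a single edge 0-1
    endpoints = [0, 1]              # each node appears once per edge it touches
    for new in range(2, n_nodes):
        # Drawing uniformly from the endpoint list is a degree-proportional draw.
        target = rng.choice(endpoints)
        degree[new] += 1
        degree[target] += 1
        endpoints.extend([new, target])
    return degree

degree = preferential_attachment(2000, random.Random(1))
early = sum(degree[i] for i in range(20)) / 20
late = sum(degree[i] for i in range(1980, 2000)) / 20
print("highest degree:", max(degree.values()))
print("mean degree of first 20 nodes:", early)
print("mean degree of last 20 nodes:", late)
```

Comparing the earliest and latest arrivals shows the first-mover advantage the passage describes: early nodes accumulate many links, while late arrivals mostly keep only the single link they arrived with.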
