Discrete choice modeling: Market share of electricity production technologies in an integrated assessment model

Integrated assessment models (IAMs) are complex simplifications of the world around us. They couple models of social and economic systems with biogeochemical and atmospheric chemistry models, so that the sources of greenhouse gases are modeled side by side with the atmospheric processes that determine what happens to them. In the climate community, IAMs are often used to determine the effect of increased emissions on social and economic systems, as well as to model the costs of emission mitigation measures.

At the Joint Global Change Research Institute (JGCRI), our model, the Global Change Assessment Model (GCAM), began in the 1980s with the intention of forecasting future energy consumption. As might be expected, early models were limited by the available computing power and faced data quality and availability issues. Modeling decisions, including parameter selection, were often made with very little data. Instead, theory and best guesses sometimes guided model formulation and parameter selection. The model that determines the market share of electricity production technologies is one such sector of GCAM where parameters are based on best guesses.

The current assignment of market share in the electricity production sector of GCAM uses a nested logit framework and assumes that errors are Weibull distributed. The upper nests (i.e., branches) of the model determine the fuel to be used by the production technology, while the lower nests (i.e., twigs) select the particular electricity production technology. However, there are two issues with the present model configuration. First, the parameters of the logit formulation are not derived from historical data; instead, they represent best guesses. Second, the assumption of a Weibull-distributed error term can lead to non-intuitive model results, given that a non-zero probability of selecting a higher-priced alternative exists. The goal of this project is twofold: first, replace the logit formulation as specified (i.e., errors distributed Weibull) with the McFadden logit formulation (i.e., errors distributed extreme value), and second, estimate the parameters using historical data.

The McFadden logit formulation is a multinomial discrete choice model that relies on an assumption referred to as the independence from irrelevant alternatives (IIA). This assumption states that the relative market share of any two alternatives (i.e., the ratio of their choice probabilities) should be a function only of the characteristics of those two alternatives, regardless of what other alternatives are available. We could envision a number of situations where this assumption might be violated. For example, if a car-shopping consumer doesn't know how to drive a manual transmission, then we wouldn't expect their decision between an automatic transmission and a manual transmission vehicle to be solely a function of cost. Similarly, power plant operators looking to build a new power plant are likely to build plants similar to those that they already operate (i.e., same fuel type) because it is expensive and time consuming to develop the skills necessary to optimally operate a power plant.
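To make the IIA property concrete, here is a minimal Python sketch of logit market shares when cost is the only covariate. The costs, the coefficient `beta`, and the function name `logit_shares` are all hypothetical values chosen for illustration; none of them come from GCAM or from the estimates in this study. The point is only that the coal-to-gas share ratio does not change when wind is removed from the choice set.

```python
import math

def logit_shares(costs, beta):
    """Multinomial logit shares with utility = beta * cost (illustrative only)."""
    utilities = {name: beta * cost for name, cost in costs.items()}
    # Subtract the largest utility before exponentiating for numerical stability.
    u_max = max(utilities.values())
    weights = {name: math.exp(u - u_max) for name, u in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical costs (arbitrary units) and cost coefficient; not study estimates.
costs = {"coal": 50.0, "gas": 45.0, "wind": 60.0}
beta = -0.1

all_shares = logit_shares(costs, beta)
without_wind = logit_shares({k: v for k, v in costs.items() if k != "wind"}, beta)

# Under IIA the coal-to-gas ratio depends only on the coal and gas costs,
# so it is the same whether or not wind is in the choice set.
ratio_with = all_shares["coal"] / all_shares["gas"]
ratio_without = without_wind["coal"] / without_wind["gas"]
print(ratio_with, ratio_without)
```

If operators in practice treat technologies that burn the same fuel as closer substitutes, observed shares will not follow this fixed-ratio pattern, which is the kind of violation described above.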

We find that the IIA assumption of the McFadden formulation is violated. Given this violation, we keep the nested logit structure of the current GCAM formulation, whereby fuel choice and technology selection are two distinct levels of the model. The nested logit model assumes that the errors follow a generalized extreme value distribution, which allows the error terms to be correlated within each nest but independent across nests. One approach to estimating this model is to do so sequentially, with the twigs estimated first and then the branches, through the construction of a new variable referred to as the inclusive value (IV).

Estimating the model can be accomplished sequentially (or using full information maximum likelihood estimation). We can think of the probability of choosing a particular technology in the nested logit framework as the product of two probabilities: the probability of choosing a particular fuel (i.e., nest), and the conditional probability of choosing a particular electricity generation technology given that nest. In the first step, we estimate the McFadden logit formulation for each nest. Next, we generate the inclusive value (IV) variable for each nest, which discrete choice theory identifies as the expected utility of the nest (in our case it's more of an average cost). Finally, using the generated IV variable as data, we estimate another McFadden logit that predicts the market share of each nest (i.e., available fuel choice).
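The way the three steps fit together can be sketched in Python as follows. This is only an illustration of how the pieces combine once coefficients are in hand, not the estimation itself: the nests, technology names, costs, within-nest coefficient `beta`, and IV coefficient `tau` are all assumed values, and a single `beta` and `tau` are shared across nests for simplicity.

```python
import math

def softmax(utilities):
    """Logit probabilities for a dict of utilities."""
    u_max = max(utilities.values())
    weights = {k: math.exp(u - u_max) for k, u in utilities.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

def inclusive_value(utilities):
    """Log of the sum of exponentiated utilities within one nest."""
    u_max = max(utilities.values())
    return u_max + math.log(sum(math.exp(u - u_max) for u in utilities.values()))

# Hypothetical nests (fuels), technologies, and costs; not data from the study.
nests = {
    "coal": {"coal_conventional": 50.0, "coal_igcc": 58.0},
    "gas": {"gas_turbine": 48.0, "gas_combined_cycle": 44.0},
}
beta = -0.1  # assumed within-nest cost coefficient (would come from step 1)
tau = 0.6    # assumed coefficient on the inclusive value (would come from step 3)

within = {}  # P(technology | fuel)
iv = {}      # inclusive value of each fuel nest
for fuel, costs in nests.items():
    utilities = {tech: beta * cost for tech, cost in costs.items()}
    within[fuel] = softmax(utilities)
    iv[fuel] = inclusive_value(utilities)

# Upper level: fuel shares from a logit on the inclusive values.
nest_share = softmax({fuel: tau * value for fuel, value in iv.items()})

# Overall share of each technology = P(fuel) * P(technology | fuel).
tech_share = {
    tech: nest_share[fuel] * p
    for fuel, probs in within.items()
    for tech, p in probs.items()
}
print(tech_share)
```

In the nested logit literature, a coefficient on the inclusive value between 0 and 1 is the range usually regarded as consistent with utility maximization; the value 0.6 here is simply a placeholder in that range.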

We model the annual capacity addition for each production technology as a function of its investment and marginal costs using the McFadden logit formulation. More information on the data cleaning and transformation process, as well as the data itself and a draft of the resulting paper, can be found in my GitHub repository (https://github.com/andymd26/fuzzy-waffle). The first model we estimate uses the total cost, which we define as the sum of the marginal and fixed cost for each technology, as the covariate. We estimate this model formulation first because it is the current parameterization in GCAM. In this way we can compare parameters estimated using historical data with the best guesses currently guiding GCAM. Economic theory suggests that this version of the model would perform worse than other possible model formulations, because a rational actor might be willing to spend more on technologies with higher fixed costs if it meant lower marginal costs across the lifetime of the asset. This behavior is not captured when the two costs are collapsed into a single total.
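As a rough illustration of what estimating the total-cost model involves, the Python sketch below fits a single cost coefficient by maximum likelihood using Newton's method. It is not the code used in this study (see the repository for that). The data are synthetic: the marginal and fixed costs are made up, and the capacity additions are generated from a known coefficient `true_beta` so that the fit can be checked against it. The function names and the absence of alternative-specific constants are simplifying assumptions.

```python
import math

def shares(costs, beta):
    """Logit shares with utility = beta * total cost."""
    utilities = [beta * c for c in costs]
    u_max = max(utilities)
    weights = [math.exp(u - u_max) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]

def fit_beta(observations, beta=0.0, tol=1e-10, max_iter=50):
    """Maximize the logit log-likelihood over one coefficient with Newton's method.

    observations: list of (total_costs, capacity_additions) pairs, one per year.
    """
    for _ in range(max_iter):
        gradient = 0.0
        hessian = 0.0
        for costs, additions in observations:
            p = shares(costs, beta)
            mean_cost = sum(pj * c for pj, c in zip(p, costs))
            var_cost = sum(pj * (c - mean_cost) ** 2 for pj, c in zip(p, costs))
            # Score: additions-weighted deviation of cost from its predicted mean.
            gradient += sum(n * (c - mean_cost) for n, c in zip(additions, costs))
            # Second derivative: minus total additions times predicted cost variance.
            hessian -= sum(additions) * var_cost
        step = gradient / hessian
        beta -= step
        if abs(step) < tol:
            break
    return beta

# Synthetic example: hypothetical marginal and fixed costs for three
# technologies over two years (arbitrary units, not data from the study).
marginal = [[20.0, 30.0, 5.0], [22.0, 28.0, 5.0]]
fixed = [[40.0, 25.0, 65.0], [38.0, 24.0, 60.0]]
total_cost = [[m + f for m, f in zip(ms, fs)] for ms, fs in zip(marginal, fixed)]

true_beta = -0.05  # assumed coefficient used only to generate the synthetic additions
observations = [
    (costs, [1000.0 * s for s in shares(costs, true_beta)])
    for costs in total_cost
]

beta_hat = fit_beta(observations)
print(beta_hat)
```

Because only the sum of the two costs enters the utility, a technology with high fixed and low marginal cost is treated identically to one with the opposite profile and the same total, which is the limitation noted above.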

For more detail on this study and its results, please see my GitHub repository.