Power Analysis Code (Python)


A fundamental part of experimental economic research is writing pre-analysis plans, which serve as a commitment device for the hypotheses we wish to test and the number of participants we aim to recruit. In outlining the specific questions we are seeking to answer with a specific sample size, we can reassure future readers that the findings we describe are not a coincidence which we stumbled upon as we analyzed our data, lending increased credibility to the experimental results. The following sections describe one particular (preliminary) exercise conducted in the pre-analysis plan for a working paper of mine, Campos et al (2019).

Our project is a natural follow-up to a submitted project of mine, Goette et al (2019), in which we derive and test comparative static predictions of the Kőszegi and Rabin (KR) model in the endowment effect context with heterogeneous gain-loss types. Specifically, we consider two types of agents: loss averse and gain loving. Loss averse individuals, roughly speaking, are defined as those who would accept a 50-50 gamble between winning $(10+x) and losing $10 only if x>0; intuitively, these agents dislike losses around a reference point (assumed 0 here) more than equal sized gains, and would thus need to be compensated with a bigger payoff to accept the possibility of a $10 loss. The larger the x they require before accepting, the more loss averse they are. Gain lovers, in the same rough terms, would actually accept gambles where x<0 because, intuitively, they enjoy the surprise of the lottery, and enjoy gains above the reference point more than commensurate losses.
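To make the gamble definition concrete, here is a minimal sketch under a deliberately simple assumption: piecewise-linear gain-loss utility around a reference point of 0, where losses are scaled by lam. This is not the KR CPE model used later in the post, and the function names `min_premium` and `accepts` are hypothetical.

```python
def min_premium(lam, stake=10.0):
    """Smallest x at which the 50-50 gamble (+stake+x, -stake) is accepted.

    Assumes utility u(z) = z for gains and lam*z for losses (reference point 0),
    so acceptance requires 0.5*(stake + x) - 0.5*lam*stake >= 0.
    """
    return stake * (lam - 1)


def accepts(lam, x, stake=10.0):
    """True if an agent with gain-loss parameter lam accepts the gamble."""
    gain, loss = stake + x, -stake
    return 0.5 * gain + 0.5 * lam * loss >= 0


# A loss averse agent (lam > 1) needs x > 0; a gain lover (lam < 1) tolerates x < 0.
for lam in (2.0, 1.0, 0.8):
    print(lam, min_premium(lam))
```

Under this assumption, the required premium x rises linearly in lam, which matches the statement that a larger required x indicates stronger loss aversion.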

In Goette et al (2019), we show that lab participants previously measured to be gain loving vs loss averse respond quite differently to a treatment that is commonly used to test the KR model. Importantly, prior analysis of these types has tended to ignore the heterogeneity and test the treatment assuming that people are loss averse on average. Although this is empirically true, the 15-30% of gain lovers have an outsized role in aggregate treatment effects, which we uncover in our paper. The pre-analysis exercise described below applies this same experimental framework to a different domain, to explore whether these gain-loss classifications predict behavior in the real effort setting.

Power Analysis Overview and Code

Before we run our experiments in this new domain, we want to be sure that the theoretical predictions yield interesting, testable implications that can be measured with reasonable sample sizes. To get a feel for this, we run simulations on bootstrapped data, allowing us to recover expected treatment effect sizes under heterogeneous populations. From the data in Goette et al (2019), we obtain a distribution of gain-loss attitudes measured in a lab population, which we assume to be representative despite the domain change. From Augenblick and Rabin (2018), we obtain MLE estimates of the cost of effort function under a particular functional form assumption, using the same task as we will in this experiment (see Table 1 for parameter estimates). With these distributions of parameters in hand, we have all the requisite information to generate simulated behavior under the KR model, specifically under the choice-acclimating personal equilibrium (CPE) assumption.

For a range of sample sizes, we bootstrap from these distributions and generate simulated behavior, which we subsequently feed into our regression of interest. For each of the sample sizes we consider, we store the estimated regression coefficient as well as the minimum detectable effect size (approximated by 2.8*SE(coef) as on page 16 of these JPAL slides), which we ultimately plot against the bootstrapped sample size. This plot tells us what effect sizes we can detect at different sample sizes; by comparing these against our simulated effect sizes, we can determine the number of participants we require.
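The 2.8 multiplier follows from the standard normal quantiles for a two-sided 5% test with 80% power. The snippet below is a minimal sketch of that arithmetic using only the Python standard library; `mde_multiplier` is a hypothetical helper name, not part of the original code.

```python
from statistics import NormalDist


def mde_multiplier(alpha=0.05, power=0.80):
    """Multiplier k such that MDE is approximately k * SE(coef) for a two-sided test."""
    z = NormalDist()  # standard normal distribution
    # Critical value of the test plus the quantile corresponding to the desired power.
    return z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)


k = mde_multiplier()
print(round(k, 2))  # 2.8
```

Changing `alpha` or `power` shows how the multiplier, and hence the required sample size, would move under a different testing standard.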

Note that there are a number of simplifications in this code, and the final sample size will be determined using a slightly different procedure. Specifically, the analysis herein assumes we know the gain-loss value (lambda), whereas in our study, we will estimate it from experimental data. Because this introduces additional noise, we expect attenuation bias in our parameter estimates. This, and other details that were skipped over, will be discussed at length in the pre-analysis plan; as soon as it is posted, I will link to it.
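As a rough illustration of the attenuation concern, the sketch below regresses a simulated outcome on a regressor measured with and without noise. All numbers here (the distribution of lambda, the linear response, the noise variance) are hypothetical and chosen only for illustration; they are not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical "true" gain-loss values and a hypothetical linear response (slope -10).
true_lambda = rng.normal(1.5, 0.5, size=n)
effort = 50 - 10 * true_lambda + rng.normal(0, 5, size=n)

# Estimated lambda = truth + classical measurement error (assumed independent noise).
noisy_lambda = true_lambda + rng.normal(0, 0.5, size=n)


def ols_slope(x, y):
    """Slope from a univariate OLS regression of y on x."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


slope_true = ols_slope(true_lambda, effort)
slope_noisy = ols_slope(noisy_lambda, effort)
print(slope_true, slope_noisy)
```

With equal signal and noise variances, as assumed here, the classical errors-in-variables formula implies the noisy slope shrinks toward zero by a factor of about one half, so the detectable effect is smaller than the known-lambda simulation suggests.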

This Python code implements the aforementioned procedure, generating a preliminary MDE curve.

"""
Code Overview: Using data from Goette et al (2019) and Augenblick and Rabin (2019),
we bootstrap a distribution of gain-loss and cost of effort function parameters to
conduct a power analysis on our experimental hypothesis. Ultimately, we hope to
determine the requisite sample size to test our coefficient of interest with 80%
power at the 5% two-sided level.

Author: Alex Kellogg
"""

#Import required modules for the ensuing analysis
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as sm

We assume the distribution of gain-loss attitudes follows that in Goette et al (2019)
for  this analysis. This data was estimated via MLE in a prior project, and is
generally representative of the experimentally measured distributions throughout the

#Read in the relevant columns from full Goette et al (2019) lab data
ggks_dataset=pd.read_csv("dir/pooled_data.csv", usecols=['stage1_lambda','structla'])

"""
Because our task is adopted from Augenblick and Rabin (2019), who have prior
MLE estimates of the cost of effort function given their assumed functional
form, we opt to model effort costs in the form below. This allows us to introduce
heterogeneity in a rigorous manner to both cost of effort and gain-loss attitudes.
"""

def cost_of_effort(e, c_slope, c_curv):
    """
    :param e: Effort, or the number of tasks (b/w 0 and 100).
    :param c_slope: Slope parameter, normalizing tasks to dollar costs.
    :param c_curv: Curvature parameter determining convexity of cost function.
    :return: (Negative of) Utility from completing e tasks.
    """
    return (1/(c_slope*c_curv))*((e+10)**c_curv)

"""
We assume individual utility follows KR06 CPE, so that there are no gains/losses
in the effort domain, as the number of tasks is preselected, but uncertainty
in wages yields gain-loss comparisons against reference points. Specifically,
each feasible ex-ante outcome is compared to the others, weighted by their a-priori
probabilities.
"""

def cpe_utils(e, c_slope, c_curv, wage, fixed, lam):
    """
    :param e: The number of tasks considered.
    :param c_slope: Slope parameter, normalizing tasks to dollar costs.
    :param c_curv: Curvature parameter determining convexity of cost function.
    :param wage: Piece-rate (per task) rate of payment.
    :param fixed: Outside option, which is earned regardless of effort with 50% probability.
    :param lam: Gain-loss Attitude parameter, lam>1 implies loss aversion.
    :return: KR06 CPE utility of working e tasks given the preference parameters and wage rates.
    """
    return 0.5*e*wage+0.5*fixed-0.5*(lam-1)*abs(fixed-wage*e)-cost_of_effort(e,c_slope,c_curv)

"""
Unfortunately, there is no closed form solution to the problem of optimal effort with
fixed vs piece-rate wages. Thus, to determine the optimal utility given the parameters,
we conduct a grid search over the possible values of effort, which can range between 0
and 100 tasks. The alternative to a grid search is a series of if/elif conditions to
determine how the Marginal Benefit and Marginal Cost curves relate to each other, but
the number of checks is extensive. We therefore accept the higher computational cost of
the grid search in exchange for avoiding the faster but more error-prone checklist approach.
"""

#Loop over the possible task choices for the agent given the parameters, and store the optimal choice
def optimal_effort_solver(c_slope, c_curv, wage, fixed, lam):
    """
    :param c_slope: Slope parameter, normalizing tasks to dollar costs.
    :param c_curv: Curvature parameter determining convexity of cost function.
    :param wage: Piece-rate (per task) rate of payment.
    :param fixed: Outside option, which is earned regardless of effort with 50% probability.
    :param lam: Gain-loss Attitude parameter, lam>1 implies loss aversion.
    :return: The number of tasks that yields maximum CPE Utils.
    """
    effort_space = np.arange(0, 100.1, 0.1)
    #collect the utility of every candidate effort level on the grid
    utils_vec = []
    for a in effort_space:
        tempU = cpe_utils(a, c_slope, c_curv, wage, fixed, lam)
        utils_vec.append(tempU)
    utils_vec = np.asarray(utils_vec)
    max_ind = np.argmax(utils_vec)
    return effort_space[max_ind]

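#Define a function that computes the between-subjects interaction regression.
def treatment_effect_bootstrapped_between_lambda(data):
    """
    :param data: Dataframe containing the relevant variables for the regression specification.
    :return: Regression Result.
    """
    #generate a new effort variable that captures only the relevant effort given treatment
    #(assumed reconstruction: Treatment==0 is the low fixed-payment condition, as in the simulation loop)
    data['Effort'] = np.where(data['Treatment'] == 0,
                              data['Effort_Choice_Low_Fixed'],
                              data['Effort_Choice_Hi_Fixed'])
    #interaction of treatment with the gain-loss attitude (assumed reconstruction)
    data['Interaction'] = data['Treatment'] * data['Lambda']

    result = sm.ols(formula="Effort ~ Treatment + Lambda + Interaction", data=data).fit()
    return result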

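#Create a function to plot the Minimum Detectable Effect size
def mde_plotter(n_list, mde_list):
    """
    :param n_list: np.array of the sample sizes we considered in the MDE analysis.
    :param mde_list: List of the mde's generated in the analysis.
    :return: Plot the MDE for each sample size.
    """
    #(assumed reconstruction: only the x-axis label survives from the original plotting code)
    plt.plot(n_list, mde_list)
    plt.xlabel('Bootstrapped N')
    plt.ylabel('Minimum Detectable Effect')
    plt.show()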

#Set up the parameters for our MDE analysis loop.
#(assumed placeholder grid of sample sizes; the original values are not shown)
bootstrap_N_list = np.arange(100, 1100, 100)
coef_list = []
mde_list = []

"""
Iterate over each of the sample sizes and compute the MDE at that sample size.
To do so, we will sample bootstrap_N gain-loss, cost slope (constant here),
and cost curvature parameters from their assumed distributions, based on prior work.
We take this sample to be our experimental subject pool, and simulate the decisions
we would observe from these subjects if they decided according to KR CPE. We then run
our interaction specification on our full sample and record the parameter estimate for
coefficient of interest, as well as the mde, which is approximated by 2.8*SE(beta).
"""
for bootstrap_N in bootstrap_N_list:
    #sample lambdas with replacement from the empirical distribution to create our own distribution
    idx = np.random.choice(np.arange(len(ggks_dataset.stage1_lambda)), bootstrap_N, replace=True)
    sampled_lambda = np.asarray(ggks_dataset.stage1_lambda.iloc[idx])
    sampled_structla = np.asarray(ggks_dataset.structla.iloc[idx])

    #for the cost function, we borrow numbers from Table 1 (pg 29) in AR19 assuming the large sample properties of MLE.
    #that is, we take the estimated MLE and associated sd to be distributed normally, and draw from them.
    # cost slope represents phi in the AR cost function.
    cost_slope = [724] * bootstrap_N
    # cost curvature represents gamma.
    cost_curvature = np.random.normal(2.138, 0.692, size=bootstrap_N)

    #To cut the computation in half, we solve for the optimal effort in the condition for which this simulant will ultimately wind up.
    #This is a between subjects regression, so it is unaffected.
    treatment_assignment = np.random.randint(2, size=bootstrap_N)

    effort_choices_l = []
    effort_choices_h = []
    for i in np.arange(0, bootstrap_N):
        if treatment_assignment[i] == 0:
            effort_choices_l.append(optimal_effort_solver(cost_slope[i], cost_curvature[i], 0.25, 5, sampled_lambda[i]))
            #(assumed placeholder for the condition this simulant is not assigned to)
            effort_choices_h.append(np.nan)
        else:
            effort_choices_l.append(np.nan)
            effort_choices_h.append(optimal_effort_solver(cost_slope[i], cost_curvature[i], 0.25, 20, sampled_lambda[i]))

    #Define the dataframe to feed into the regression function.
    df_cols = {
        'Effort_Choice_Low_Fixed': list(map(float, np.asarray(effort_choices_l))),
        'Effort_Choice_Hi_Fixed': list(map(float, np.asarray(effort_choices_h))),
        'Lambda': sampled_lambda,
        'Structla': sampled_structla,
        'Cost_Slope': cost_slope,
        'Cost_Curvature': cost_curvature,
        'Treatment': treatment_assignment
    }
    bootstrapped_data = pd.DataFrame(df_cols)

    #run the interaction regression and store the coefficient and approximate MDE
    #(assumed reconstruction: the interaction term is taken to be the coefficient of interest)
    result = treatment_effect_bootstrapped_between_lambda(bootstrapped_data)
    coef_list.append(result.params['Interaction'])
    mde_list.append(2.8 * result.bse['Interaction'])

#plot the MDE curve
mde_plotter(bootstrap_N_list, mde_list)

The resulting plot is displayed below. Our median effect size is roughly 15 tasks in this particular simulation, which would suggest a sample of about 400 to be sufficient.


Augenblick, Ned and Matthew Rabin (2018). “An Experiment on Time Preference and Misprediction in Unpleasant Tasks”.
Goette, Lorenz, Thomas Graeber, Alexandre Kellogg, and Charles Sprenger (2018). “Heterogeneity of Gain-Loss Attitudes and Expectations-Based Reference Points”.
Kőszegi, Botond and Matthew Rabin (2006). “A model of reference-dependent preferences”. In: The Quarterly Journal of Economics, pp. 1133–1165.
Kőszegi, Botond and Matthew Rabin (2007). “Reference-Dependent Risk Attitudes.” In: American Economic Review (4): 1047–73.

Understanding Workers’ Valuations of Various Amenities: A Summary of Mas and Pallais (Forthcoming)

The following summary and thoughts on Mas and Pallais (Forthcoming in the AER, 2017) is taken in part from a report I put together for a course in labor economics. In this study, the authors present their results from the first large-scale field experiment attempting to elicit workers’ valuations of specific amenities (e.g. working from home, flexible hours, flexible scheduling). The paper provides a critical foundation for future research in understanding workers’ preferences over a variety of work arrangements that are commonly offered by employers.

Overview of Mas and Pallais (forthcoming AER 2017)

To gather data on workers’ willingness to pay (WTP) for different amenities, the authors recruit staffers for a national call center for the purpose of administering surveys unrelated to this project. Advertisements were posted online in 68 large metro areas, and potential applicants were able to click-through into the application, wherein they (optionally) listed their race, ethnicity, and gender. Next, the applicants specified which of two job opportunities they would prefer: the “baseline” job at a specified wage or a “treatment” job at a potentially different wage. The main treatments included: work from home (ability to work from home, Mon-Fri 9am-5pm), flexible scheduling (ability to choose how to allocate 40 hours per week), flexible hours (ability to choose the amount of hours up to 40 hours per week), and employer discretion (the employer sets your schedule every week with a one week notice, and work times can include weeknights or weekends).

In order to estimate the distribution of workers’ WTP, the authors randomly selected wages and assigned them to one of the two jobs. For each pair the applicant saw, one job always had the maximum wage of $16 per hour (or $19, depending randomly on the city) while the other had a wage within $5 of the maximum wage (+/- $0.25, $0.50, $0.75, … , $2.75, $3, $4, or $5). The applicants were told that this choice would not affect their hiring decision, and would only be seen by the employer after a hiring decision had been made. Thus, this field experiment is a between-subjects design with around 7,000 applicants, 150 of whom were offered a job with the “best amenities” (maximum wage that applicant saw, the ability to work from home, and scheduling flexibility).
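To see why randomizing the wage gap identifies the WTP distribution, consider the following simulated sketch. It is not the authors' estimator: the latent WTP distribution, the simplified grid of wage gaps, and the decision rule (choose the treatment job whenever WTP is at least the wage gap) are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Hypothetical latent willingness to pay for the amenity, in dollars per hour.
wtp = rng.normal(1.0, 2.0, size=n)

# Simplified grid of wage gaps: baseline wage minus treatment-job wage.
gaps = np.array([-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5], dtype=float)
gap = rng.choice(gaps, size=n)  # randomly assigned to each applicant

# Assumed decision rule: take the treatment job if WTP covers the forgone wage.
chose_treatment = wtp >= gap

# The share choosing the treatment job at each gap estimates P(WTP >= gap),
# i.e. one minus the CDF of WTP evaluated at that gap.
share = np.array([chose_treatment[gap == g].mean() for g in gaps])
for g, s in zip(gaps, share):
    print(g, round(s, 3))
```

Because each applicant makes only one choice, individual WTP is never observed; only the distribution is traced out across applicants, which is why the design needs a large sample.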

From this experiment, the authors learned that the majority of workers do not value scheduling flexibility (setting the total number of hours or setting the schedule for 40 hours per week), but, on average, workers were willing to take an 8% pay cut to work from home (see Fig 4 reproduced from the paper below). Not surprisingly, workers had a strong distaste for the employer discretion job offer: the average worker was willing to take a 20% lower wage to avoid these jobs, and close to 40% of applicants preferred the baseline job even if the employer discretion offered a 25% higher wage. Although these average effects are important, the amount of heterogeneity in valuations was striking and leaves room for further investigation. For more results, consult Figures 2-6 from Mas and Pallais (Forthcoming in the AER, 2017).

[Figure 4, reproduced from Mas and Pallais (Forthcoming).]

As with all field experiments, external validity is a natural concern; do these results only apply to the subsample of applicants observed in the data (people who would apply for a position as a survey administrator), or do they generalize to the population at large? To address this issue, the authors presented numerous supplemental experiments as well as additional empirical work. In particular, to obtain a more nationally representative sample (as opposed to the self-selected sample of phone survey applicants), the authors asked essentially the same questions (this time, completely unincentivized) to participants in the Understanding America Study (UAS) (a nationally representative Internet panel conducted by USC, with around 6,000 total households). The results from this alternative data source were consistent with the field experiment. Since these specific questions are hypothetical, however, the robustness of these results isn’t totally assured; nevertheless, the evidence that more nationally representative samples of workers had similar valuations (as well as a number of robustness checks included in the full paper) increases my confidence in their results.

Finally, the authors do some preliminary exploration of workers’ heterogeneity in WTP for the various arrangements. In particular, using the UAS (where they have more data on covariates), they determine that workers tend to sort into their preferred arrangements (those with the highest WTP for an amenity tend to pay for it), and find that mothers of young children value the ability to work from home twice as much as men.


Overall, this paper provides one of the first in-depth analyses of workers’ valuations for different job arrangements. Although some literature exists on this topic, much of it is imprecisely estimated, which makes this field experiment all the more valuable as it presents a novel approach to an old question. Moreover, the authors are very thorough in their work, providing a multitude of robustness checks for each of their major findings. Finally, the nature of the data collection is very rich as it allows readers access to the raw WTP averages, from which a distribution can be estimated. I perceive these to be the major strengths of this paper.

As with all papers, however, there are some shortcomings. The main issue, as I see it, is the incentive structure behind the field experiment: the results would be stronger if the hired applicants received their actual choice between baseline and “treatment” instead of the highest wage and most liberally arranged job. In this way, the authors break from a traditional field experiment, and produce more of what might be called a “survey experiment in the field”. Luckily, the applicants were very unlikely to know the details of the final job offer (the authors offered each successful applicant a job with the “best amenities”), so there is almost certainly no inadvertent impact on the workers’ answers.

I’m also curious why they chose the occupation they did: although perhaps the authors expected lower skilled workers would value the amenities more, I suspect that college graduates might actually be willing to pay much more for these options. It would be interesting to see if that is the case, since it seems (from the supply side) that companies like Google or Facebook offer many of these amenities and flexibilities.

In any case, I am looking forward to learning more about the heterogeneity of valuations for amenities. In particular, I’m curious about the points of excess mass that the authors discovered in the Cumulative Distribution Function of workers’ WTP. Since the CDFs in the figures above represent the proportion of workers who are willing to pay $X or less for the amenity, the large spikes at certain prices are perhaps indicative of some behavioral phenomena. For example, these spikes could potentially represent the price associated with the uncertainty of switching amenities away from the default, the mental cost of making a decision, or a reference point of some sort.

Ultimately, I believe there are many interesting questions to be asked about all of the findings presented in this paper, and expect to see this literature rapidly expand in the coming years.


Mas A, Pallais A. Valuing Alternative Work Arrangements. American Economic Review. Forthcoming.

Economics of Terrorism

This post is based on a recent lecture by Eli Berman, which was based in large part on the paper “Sect, Subsidy, and Sacrifice: An Economist’s View of Ultra-Orthodox Jews”.

Much of human behavior can be analyzed through the lens of economics, including religious practice. Adam Smith had some thoughts on the matter in “An Inquiry into the Nature and Causes of the Wealth of Nations”, where he argued that competition among religious sects would lead to less political clout for the church and harder working church officials.

However, not much progress was made in the ensuing 216 years. Eventually, Laurence Iannaccone picked up the mantle in 1992 and revolutionized the economics of religion by modeling religion as a club good, meaning that it is non-rival but excludable. That is, multiple people can practice the same religion at the same time, but individuals can be forbidden from practicing within a certain sect or church (e.g. through excommunication). Thus, religion invites the free-rider problem, wherein certain practitioners don’t contribute to the religious experience but still get benefits.

At this point, we turn our attention to Laurence Iannaccone’s groundbreaking work. According to his model, free-riding agents diminish the experience of the entire group, and would ideally (for the sect) be restricted from practicing in the future. That is to say, an ideal sect, from a group member’s perspective, is mostly filled with people who will devote their time to the practice. This way, each of the members is willing to spend time taking care of the others within the sect based on anticipated reciprocity, providing a sort of insurance for members. Berman described an ideal sect in terms of an ideal study group: you want members to have read the papers you will be discussing, so you want to incentivize members to spend their time reading. This can be done by limiting study partners’ outside options (e.g. no drinking on weekends) or by expelling members who don’t contribute.

People who have relatively low wages (and thus a lower opportunity cost of time) are theoretically more willing to devote their time to religion — a hypothesis that is empirically validated (see this paper for more detailed proof). Thus, under this model, churches might want to attract low wage individuals in order to provide a better experience for the group as a whole. (Note that attracting richer individuals who substitute time for donations also plays a role in more complicated models, but recruiting lower wage individuals will nevertheless increase the benefits to joining a particular sect.) Since the church cannot observe their practitioners’ wages, how can they exclude high wage people masquerading as low wage people? In terms of the study group analogy, how can the members tell if a potential new recruit is willing to put in the time and do the reading?

It turns out that this is a signaling problem; the church thus has to design incentives such that low wage and high wage people self-select (or separate) in equilibrium. There are two ways a church goes about this: prohibitions and sacrifices. Strict dress codes, the barring of alcohol/caffeine/sex, and time commitments are examples of prohibitions and sacrifices imposed by some religions. Together, these tools can be used to weed out high wage earners who value their time relatively more. That is, high earners are more likely to prefer working more and forgoing these particular restrictions as opposed to joining the sect, dedicating a substantial amount of time, and following the strict rituals. Bringing back the analogy of the study group, setting up meetings on Friday or Saturday nights (thereby increasing the costs of going out drinking) might increase the likelihood that members read prior to meetings, thereby making meetings more productive.
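The separation logic can be illustrated with a deliberately stylized sketch. This is not Iannaccone's or Berman's model: it assumes a single required time sacrifice, a common club benefit valued in dollars, and an opportunity cost of time equal to the wage. The function names and numbers are hypothetical.

```python
def joins(wage, hours_required, club_benefit):
    """Join iff the club benefit covers the earnings forgone by the required time."""
    return club_benefit >= wage * hours_required


def separating_hours(low_wage, high_wage, club_benefit):
    """Bounds (lo, hi] on required hours at which only the low-wage type joins."""
    return club_benefit / high_wage, club_benefit / low_wage


# Hypothetical numbers: wages of $10 and $40 per hour, club benefit worth $200.
lo, hi = separating_hours(10, 40, 200)
print(lo, hi)  # 5.0 20.0

# Any requirement strictly above lo and at most hi screens out the high-wage type.
print(joins(10, 12, 200), joins(40, 12, 200))  # True False
```

In this toy setting the sect never needs to observe wages: a sufficiently costly time requirement makes membership unattractive to exactly those with the best outside options.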

Connecting the economics of religion to terrorism, Berman and Iannaccone argue in “Religious Extremism: The Good, the Bad, and the Deadly” that the aforementioned religious organizational structure is a major contributor to violent terrorism, more so than a belief in afterlife rewards or a specific theology.  To develop this intuition, consider a violent organization intent on conducting an act of terrorism. In order to succeed, the group must plan the attack, which requires coordination among the members. However, coordination invites the threat of defection, since any member could turn on the group and receive a large reward. How can these terrorist organizations reduce the likelihood of defection?

Just like the aforementioned radical sects, these violent organizations will seek members who are willing to sacrifice their time and succumb to prohibitions. In other words, radical sects provide a ready-made pool of ideal participants from the point of view of these terrorist groups; the core members of these sects are a self-selected pool of highly committed individuals with a low opportunity cost of time who are willing to endure various prohibitions to be part of the club. This idea is reflected in interviews of jailed terrorists, who tend to join their respective violent organizations for many of the same reasons that people join certain religions or political parties.

To test this idea, one can compare the violent behaviors of different sects within an overarching theology. Berman and Laitin’s “Religion, terrorism and public goods: Testing the club model” provides us with just this empirical test; their findings confirm that members of religious groups that require more sacrifices and prohibitive behaviors attempt significantly more attacks (and are more effective) than others with similar, less prohibitive beliefs.

So, where do we go from here? Since most of the violent organizations come from relatively impoverished countries, Eli Berman suggests that increasing access to public goods and property rights is fundamental. Providing more public goods will lead to less demand for the “clubs” that are religious sects and violent terrorist organizations, since would-be members would have more alternatives to receive the benefits that these groups are relied upon to provide. In addition, improved property rights and contract enforcement would partially solve some of the missing market problems that incentivize people to join radical sects; if people don’t have to worry about losing their food supply or their home, joining a radical sect is relatively more costly. Finally, since all models are imperfect descriptions of reality, social scientists should continue to focus on these topics so that we may derive better policy going forward.




If you’re interested in learning more about this, I suggest reading the sources below or visiting Eli and Laurence’s websites to find some cool papers.


Berman, Eli, “Sect, Subsidy and Sacrifice: An Economist’s View of Ultra-Orthodox Jews,” Quarterly Journal of Economics, August 2000

Berman, Eli, and David D. Laitin. “Religion, Terrorism and Public Goods: Testing the Club Model.” Journal of Public Economics 92.10-11 (2008): 1942-967.

Iannaccone, Laurence R (1998) “Introduction to the Economics of Religion,” Journal of Economic Literature, 36, pp. 1465-1496.

Iannaccone, Laurence R. “Sacrifice and Stigma: Reducing Free-riding in Cults, Communes, and Other Collectives.” Journal of Political Economy 100.2 (1992): 271-91.

Iannaccone, Laurence R., and Eli Berman. “Religious Extremism: The Good, the Bad, and the Deadly.” Public Choice 128.1-2 (2006): 109-29.

Post, Jerrold, Ehud Sprinzak, and Laurita Denny. “The Terrorists in Their Own Words: Interviews with 35 Incarcerated Middle Eastern Terrorists.” Terrorism and Political Violence 15.1 (2003): 171-84.

Smith, Adam, An Inquiry into the Nature and Causes of the Wealth of Nations (Reprint of 1776 version) Modern Library: New York; 1965. Book V, Chapter I, Part III, Article III “Religious