Defining Geographic Markets from Probabilistic Clusters: A Machine Learning Algorithm Applied to Supermarket Scanner Data

Draft: 9/19/2019

By Stephen Bruestle, Luca Pappalardo, and Riccardo Guidotti

Abstract: We propose estimating geographic markets in two steps. First, estimate clusters of transactions that are interchangeable in use. Second, estimate markets from these clusters. We argue that these clusters are subsets of markets, drawing on both antitrust cases and economic intuition. We model and estimate these clusters using techniques from machine learning and data science. We model the clusters using Blei et al.'s (2003) Latent Dirichlet Allocation (LDA) model, and we estimate the model using Griffiths and Steyvers's (2004) Gibbs Sampling algorithm (Gibbs LDA). We apply these ideas to a real-world example, using transaction-level scanner data from the largest supermarket franchise in Italy. We find fourteen clusters and present strong evidence that LDA fits the data, which shows that these interchangeability clusters exist in the marketplace. We then compare the Gibbs LDA clusters with clusters from the Elzinga-Hogarty (E-H) test and find similar clusters. LDA has a few identifiable parameters; the E-H test has too many parameters for identification. Gibbs LDA also avoids the silent majority fallacy of the E-H test. Finally, we estimate markets from the Gibbs LDA clusters, using consumption overlap and price stationarity tests on the clusters. We find four grocery markets in Tuscany.

JEL Codes: L100, D400, C380, L400, C150

Keywords: defining markets, clustering, interchangeability in use, machine learning, Latent Dirichlet Allocation (LDA), Gibbs Sampling (Gibbs LDA), bags of products, Elzinga-Hogarty test, elbow method, sampling methods, consumption overlap, antitrust markets, economics markets, Markov Chain Monte Carlo (MCMC), silent majority fallacy

Introduction and Literature Review

Market definition is a form of clustering. Clustering is organizing objects into groups whose members are similar according to some measure. Market definition is organizing transactions into "markets" whose members are similar according to some measure. All methods used to define markets are forms of clustering. Numerical tests are forms of clustering. Human judgement is a form of clustering.

In antitrust cases, the size of the markets often determines the outcome of the case. For example, in United States v. Philadelphia National Bank, the district court defined the market as a broad geographic area, where there were many competitors, so the merged firm could not abuse market power. Therefore, the banks were allowed to merge. The Supreme Court then overturned this decision. It defined the market as a narrower geographic area, where few firms compete. This would give the merged firm the ability to abuse its dominant position by raising prices. Therefore, the Supreme Court disallowed the merger.

You should not confuse this with a cluster market, which is a different concept. A cluster market is one where each firm sells a group of complements. For example, grocery stores are cluster markets: they sell a group of complements, including meat, milk, vegetables, and so on.

[Table 1 about here.]

Table 1 shows the leading tests used to define markets.
All these tests are forms of clustering.

Marshall (1920) defined markets based on the law of one price; that is, the "prices of products in the same market tend to equality with due allowance for transportation cost." We call clusters based on the law of one price economics markets.

The law of one price comes from arbitrage. For example, suppose products 1 and 2 are identical. Suppose store 1 sells product 1 for p1, and store 2 sells product 2 for p2 > p1. Then someone could make a profit through arbitrage: buying product 1 and selling it in front of store 2. Or, equivalently, all consumers would buy from store 1 and not store 2. This would induce store 1 to increase p1 and store 2 to lower p2 until p1 = p2.

We use several standard clustering methods based on the law of one price. These include price correlation comparison (e.g. the Nestlé–Perrier merger), stationarity tests (e.g. Stigler and Sherwin, 1985; Forni, 2004), Granger causality tests (Cartwright et al., 1989; Slade, 1986), and natural experiments on price movements (Davis and Garcés, 2009, pg. 185-188).

The 1984 Horizontal Merger Guidelines define markets based on a hypothetical monopolist test; that is, a market is the smallest area or group of products where a hypothetical, profit-maximizing monopolist would impose a "small but significant and nontransitory" increase in price. We call clusters based on a hypothetical monopolist test antitrust markets.

For example, suppose Y products form a market. Suppose product 2 is the closest substitute to product 1, product 3 is the second closest substitute to product 1, and so on. Let the perfectly competitive price be p^C. Suppose a single firm sells products 1 through y, and let the single firm's profit for setting a price of p be π(p). Then the antitrust market is the smallest y such that π(1.05 p^C) > π(p^C).

We use several standard clustering methods based on this concept. These include the Small but Significant and Nontransitory Increase in Price (SSNIP) test (1984 Merger Guidelines), Critical Loss Analysis (Langenfeld and Li, 2001), and the Full Equilibrium Relevant Market (FERM) test (e.g. Ivaldi and Lorincz, 2005).
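To fix ideas, the following Python sketch walks a nested sequence of candidate products through the hypothetical monopolist test. Everything here is hypothetical: the demand leakage, the numbers, and the profit function are stand-ins chosen only to make the condition π(1.05 p^C) > π(p^C) checkable in a few lines; they are not the tests we use later in this paper.

    import numpy as np

    # Stylized demand: when the monopolist raises every price in the candidate
    # set by 5%, some demand leaks to products outside the set.  Smaller sets
    # leak more.  All numbers are made up for illustration.
    def profit(price, candidate, base_qty, leak_out, p_c, cost):
        """Monopolist profit over `candidate` when all its prices equal `price`."""
        leak = leak_out[len(candidate) - 1]
        qty = base_qty[candidate] * (1 - leak * (price - p_c) / p_c)
        return np.sum((price - cost) * qty)

    p_c, cost = 1.00, 0.60                          # competitive price, unit cost
    base_qty = np.array([100., 80., 60., 40.])      # demand at p_c, products 1..4
    leak_out = np.array([4.0, 2.5, 1.2, 0.4])       # leakage shrinks as the set grows

    # Products are pre-ordered by closeness of substitution to product 1.
    for y in range(1, 5):
        candidate = np.arange(y)
        if profit(1.05 * p_c, candidate, base_qty, leak_out, p_c, cost) > \
           profit(p_c, candidate, base_qty, leak_out, p_c, cost):
            print(f"antitrust market = products 1..{y}")   # smallest profitable set
            break

With these made-up numbers, the 5% increase first becomes profitable at y = 3, so the sketch reports products 1 through 3 as the antitrust market.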
We often cannot find markets from just one of these tests. Defendants and plaintiffs often combine tests and blend market definitions. Sometimes they use economics markets, sometimes antitrust markets, and sometimes both.

In United States v. du Pont (Cellophane), the Supreme Court established an additional market definition to be used with the other market definitions. It later refined this definition in Brown Shoe v. United States and United States v. Continental Can. The Court established that products are in the same market if they are interchangeable in use; or, equivalently, products are in the same market if consumers use them for the same purpose. We call clusters based on this standard interchangeability clusters. The courts still use this definition today.

You should not confuse these clusters with the set of all functional substitutes. Functional substitutes are all the products that consumers could use for the same purpose. Interchangeability clusters are products that we observe consumers using for the same purpose. For example, consumers could use caviar in the place of salmon in salads; caviar and salmon can be used for the same purpose. They are functional substitutes. But consumers do not use them for the same purpose, because caviar costs a lot more. They are not interchangeable in use (Davis and Garcés, 2009, pg. 166-167).

In Brown Shoe v. United States, the Supreme Court established that the standard is interchangeability in use. We need to observe consumers using the products for the same purpose. If consumers merely could use the products for the same purpose, then the products are not necessarily in the same market; we need to observe consumers using the products interchangeably. Similarly, neighboring towns might not be in the same market, because consumers might not use the stores in the two towns interchangeably.

The standard gauge for interchangeability in use is consumption overlap, which is how many of the same consumers use the two products. For example, suppose 30% of the consumers that purchase product 1 also purchase product 2. Then the consumption overlap is 30%. Elzinga and Hogarty (1973; 1978) created the most common way to cluster based on this gauge. We call this method the Elzinga-Hogarty (E-H) test. It is an algorithm that finds geographic interchangeability clusters.
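As a minimal illustration of the gauge, the snippet below computes consumption overlap from a hypothetical table of (consumer, product) purchases; the data and names are invented for this example.

    from collections import defaultdict

    # Hypothetical purchase records: (consumer_id, product_id).
    purchases = [(1, "A"), (1, "B"), (2, "A"), (3, "A"), (3, "B"), (4, "B"), (5, "A")]

    buyers = defaultdict(set)
    for consumer, product in purchases:
        buyers[product].add(consumer)

    # Share of product A's buyers who also buy product B.
    overlap = len(buyers["A"] & buyers["B"]) / len(buyers["A"])
    print(f"consumption overlap of B among A's buyers: {overlap:.0%}")   # 50%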
Many criticize the E-H test based on its failure in legal cases. Capps et al. (2001) claim that the test commits a silent majority fallacy: what is true for some consumers is not necessarily true for all consumers. Potentially, some consumers substitute product 1 for product 2, while a silent majority would not under any price. This fallacy arises because the E-H test relies on consumption overlap, and all tests of consumption overlap share this fallacy. We need to evolve beyond consumption overlap.

Also, the E-H test is not identifiable. In the E-H test, you split the map into subregions, usually defined as ZIP Codes or municipalities. The E-H test determines whether each subregion is in the market. Therefore, each subregion creates one parameter, which is one if the subregion is in the market and zero otherwise. This gives the E-H test too many parameters. This problem occurs in most clustering algorithms.

In this paper, we address both criticisms by creating a model of interchangeability in use that has few parameters. Each subregion has a random effect governed by these parameters. The model relies on more than consumption overlap; it uses a more complete view of interchangeability in use. The silent majority fallacy leads to poor model fit. Therefore, we avoid the fallacy when we find good model fit.

Our approach follows from a result of belonging to the same interchangeability cluster: the expected shares of products tend to remain constant. For example, suppose there are two transactions. In each transaction, the consumer buys either product 1 or product 2. In transaction 1, we guess that the consumer has a 30% chance of buying product 1. Suppose that in transaction 2, the consumer uses the products for the same purposes. As far as we know, each consumer desires each product equally. Then, in transaction 2, we would also guess that the consumer has a 30% chance of buying product 1. Therefore, consumers exchange both products at the same rate.

With sufficiently many products, the converse is true: if the expected shares of products remain constant, then the transactions are in the same cluster. Suppose this were false. Then there exist two transactions with the same product shares and different substitutability. So there exists some factor that affects elasticities and not product shares. This becomes absurd when there are numerous products: numerous equal product shares unaffected by something that affects elasticity may be theoretically possible, but it is not practical. Therefore, with sufficiently many products, clusters are within markets; they are subsets of markets.

Yet these clusters might not equal markets. For example, multiuse goods could create a chain of substitution across clusters. Markets could contain more than one cluster.

An antitrust market is a set of clusters. Suppose the monopolist could set individual prices for each transaction, and the monopolist knew how much each consumer valued each good in each transaction. Then each transaction is one market. Now suppose there exists some small friction or cost to price discrimination. Firms would want to set the same prices for all similar transactions. Transactions with the same interchangeability in use would be close enough for the same price. We see this when firms strategize by consumer segment. Therefore, clusters are markets when there exist no barriers to price discrimination between clusters. When these barriers exist, clusters combine to form markets.

Therefore, this paper estimates markets in two steps. First, we estimate the interchangeability clusters. Then, we estimate the markets from the clusters. Finding the clusters first makes it easier to find markets, because it reduces the number of dimensions.

In this paper, we model and estimate these clusters using techniques from machine learning and data science. We model clusters using Blei et al.'s (2003) Latent Dirichlet Allocation (LDA) model, and we estimate this model using Griffiths and Steyvers's (2004) Gibbs Sampling algorithm (Gibbs LDA).

LDA is a general clustering model. Blei et al. (2003) created it to cluster documents based on word patterns. Others have used it to classify genetic sequences based on animal traits (Chen et al., 2010; Pritchard et al., 2000), recognize objects in photographs (Fei-Fei and Perona, 2005; Sivic et al., 2005; Russell et al., 2006; Cao and Fei-Fei, 2007; Wang et al., 2008), cluster videos (Niebles et al., 2008; Wang et al., 2007), cluster music (Hu, 2009), analyze social networks (Airoldi et al., 2008), cluster the disabilities of the elderly population (Erosheva et al., 2007), cluster CEO behavior (Bandiera et al., 2017), and predict user tastes and preferences based on consumer reviews (Marlin, 2004).

LDA comes from machine learning, where the goal is to automate clustering in a way that mimics human thought. It comes from the same family of algorithms used to create modern search engines like Google, Yahoo, and Bing.

In this paper, we focus on geographic markets. In the next paper, we plan to focus on product markets. In theory, we should solve for both geographic and product markets jointly; in practice, we solve for them separately (Davis and Garcés, 2009, pg. 163).

Marketing and data scientists have long been clustering to find consumer segments. They mostly cluster consumers based on survey results (see Sarstedt and Mooi, 2014). Notably, Guidotti and Gabrielli (2017) and Guidotti et al. (2018) cluster based on when consumers make purchases. This paper proposes that we take it a step further: we find antitrust markets from clusters.

While we wrote this paper for antitrust economists, others might find it useful. This paper might help data scientists develop new clustering techniques. It might help marketing scientists segment consumers. And it might help us cluster for other purposes.

We organize this paper as follows. In section 2, we adapt LDA as a model for defining markets. We cluster transactions based on the patterns of purchases, and we give two illustrative examples. In section 3, we summarize Gibbs LDA, the technique used to estimate LDA, and we give one simulated example. In section 4, we apply these ideas to a real-world example. We use transaction-level scanner data from the largest supermarket franchise in Italy (sec. 4.1). We estimate 14 Gibbs LDA clusters and assess model fit (sec. 4.2). Then we compare these results to results found using the E-H test (sec. 4.3). We get similar clusters, and we discuss some advantages of using Gibbs LDA over the E-H test. Then we find markets from the Gibbs LDA clusters (sec. 4.4). In section 5, we conclude.
Latent Dirichlet Allocation (LDA) Model

In this section, we adapt the Latent Dirichlet Allocation (LDA) model to a new situation. We do not change the model; we change what the model clusters. Blei et al. (2003) created it to cluster documents based on word patterns. We use LDA to cluster transactions based on purchase patterns. The model is a two-step process. First, a consumer draws a cluster. Second, the consumer draws a product from that cluster.

In section 3, we solve the reverse of this model to estimate the clusters. We see the products drawn, and we use the inverted model to estimate the clusters.

The Model

There exist I consumer segments. Each consumer segment is a region or subregion of the map. Each could be a municipality, a ZIP Code, a city block, a household, or even a single consumer.

For each consumer segment i = 1, ..., I, some random process determines its expenditure X_i. This random process is not critical to anything that follows. LDA models product selection; it does not model the choice of the number of purchases. Expenditure could depend on prices or incomes. It could depend on the variables in the model. It is ancillary to the model, and it does not matter how it is determined.

The cost of each transaction is some fixed amount, P. This creates a fixed number of transactions. Each consumer segment i has N_i = X_i / P transactions. The total number of transactions is N = Σ_{i=1..I} N_i. This purposely ignores pricing effects. We are trying to find the boundaries of markets; pricing effects are within markets, not across boundaries.

Each consumer segment i draws a random K-dimensional vector θ_i of tastes from a Dirichlet distribution with a K-dimensional parameter α, where K is an exogenously given number of clusters. Consumer segments buy from clusters in different proportions. Consumer segment i's vector of these proportions is θ_i. The kth element θ_{i,k} is the probability that a given purchase by consumer segment i is from cluster k. For example, the "college student" consumer segment purchases more beer than wine, so θ_{student,beer} > θ_{student,wine}. The "downtown" consumer segment purchases more wine than beer, so θ_{downtown,beer} < θ_{downtown,wine}.

The Dirichlet distribution draws random variables θ_i on the (K-1)-simplex and has a probability density function given by:

p(θ_i | α) = [Γ(Σ_{k=1..K} α_k) / Π_{k=1..K} Γ(α_k)] · Π_{k=1..K} θ_{i,k}^{α_k − 1}    (1)

Then, products are purchased in a two-step process.

First Step: for each consumer segment i and for each purchase n = 1, ..., N_i, a random cluster k_{i,n} ∈ {1, ..., K} is drawn from the multinomial distribution with parameter θ_i. The probability of choosing cluster k is θ_{i,k}.

These draws of clusters are technically not independent; they are conditional on θ_i. Within each consumer segment, they are independent and identically distributed. Therefore, we assume that the order of purchases does not matter.

For example, suppose two clusters exist: products from the east mall and products from the west mall. In the first step, each consumer determines which mall they are purchasing from. The probability that a purchase from consumer segment i is from the east mall is θ_{i,east}.
Second Step: for each consumer segment i and for each purchase n = 1, ..., N_i, a random product y_{i,n} ∈ {1, ..., Y} is drawn from the multinomial distribution with parameter φ_k, conditioning on the cluster k_{i,n}. The probability of purchasing y is the parameter φ_{k,y}.

For example, suppose two clusters exist: products in the east mall and products in the west mall. The second step does not depend on the consumer segment; it just depends on the mall. The probability that a purchase from the east mall is sneakers is φ_{east,sneakers}.

Note that the cluster shares, Φ = (φ_1, ..., φ_K), can depend on prices and incomes. The model treats Φ as constant. It is ancillary to the model, and it does not matter how it is determined. We are trying to find the boundaries of markets. Within-market forces determine cluster and market shares. You can analyze within-market forces with a different model when you assess market power; it would be a separate analysis.

A good analogy for this model is that a cluster is a bag of products. For each consumer segment i and for each transaction n = 1, ..., N_i, first a random bag of products k_{i,n} is drawn from the multinomial distribution with parameter θ_i. Then a random product y_{i,n} ∈ {1, ..., Y} is drawn from the bag of products using the multinomial distribution with parameter φ_{k_{i,n}}. The economist does not observe the bags drawn or the structure of the bags. The structure of the bags is Φ = (φ_1, ..., φ_K) and θ = (θ_1, ..., θ_I). He or she only observes y, all the products purchased. In section 3, we show how the economist estimates the structure of the bags from y.

For example, suppose Santa randomly distributes toys using this model. First, he randomly draws either a "naughty" or a "nice" bag. This depends on the neighborhood: some neighborhoods are naughtier, and some are nicer. Second, he randomly picks a toy from the bag. The toy drawn depends on the bag and not the neighborhood. The economist observes the number of each toy delivered to each neighborhood. He or she does not know there is a "naughty" bag and a "nice" bag. He or she does not know whether kids in the inner city are naughtier or nicer than kids in the suburbs. He or she infers the bags based on the toys delivered.
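To make the two-step draw concrete, the following Python sketch simulates the generative process with numpy. The numbers (K, Y, α, Φ, and the N_i, which echo Example 1 below) are made up for illustration; this is a sketch of the model, not of our estimation code.

    import numpy as np

    rng = np.random.default_rng(0)
    K, Y, I = 2, 5, 3                       # clusters, products, consumer segments
    alpha = np.full(K, 0.5)                 # Dirichlet parameter
    Phi = np.array([[.4, .4, .2, .0, .0],   # phi_1: product shares of cluster 1
                    [.0, .0, .2, .4, .4]])  # phi_2: product shares of cluster 2
    N = [11, 8, 15]                         # transactions per segment (given)

    for i in range(I):
        theta_i = rng.dirichlet(alpha)               # segment i's taste vector
        k = rng.choice(K, size=N[i], p=theta_i)      # step 1: draw a cluster per purchase
        y = [rng.choice(Y, p=Phi[kn]) for kn in k]   # step 2: draw a product from the cluster
        print(f"segment {i}: theta = {np.round(theta_i, 2)}, products = {y}")

The economist would observe only the product draws y, never theta_i or the cluster draws k; section 3 inverts this process.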
Example 1: Corner Stores & Grocery Stores

Figure 2 shows an illustrative example of the LDA model.

[Figure 2 about here.]

Here, the cost of one transaction is the time and money spent on one trip to a corner store or to a grocery store. Therefore, a transaction is a shopping trip. There exist two bags of products, or equivalently stores (K = 2). Each bag of stores is a different cluster. Cluster 1 is 'Corner Stores'; corner stores are in the city center. Cluster 2 is 'Grocery Stores'; grocery stores are in the outskirts. Also, three consumer segments exist: urbanites, suburbanites, and country folk. Urbanites live in the inner city, suburbanites live in the suburbs, and country folk live in the outskirts.

For each consumer segment, consumers make a fixed number of shopping trips, determined by some random process. Urbanites make N_1 = 11 shopping trips, suburbanites make N_2 = 8 shopping trips, and country folk make N_3 = 15 shopping trips.

Each consumer segment draws a θ, which represents the probability of drawing each bag of stores. Urbanites draw a corner store all the time, so their θ_1 = 100%. Suburbanites draw a corner store half the time and a grocery store half the time, so their θ_1 = 50% and θ_2 = 50%. Likewise, for country folk, θ_2 = 100%.

First Step: for each shopping trip n = 1, ..., N_i, consumer segment i selects the store that it visits by drawing a random bag of stores. The probability of drawing bag k is θ_{i,k}.

Second Step: the consumer draws the store from the selected bag. This process only depends on the bag, and it is the same for any consumer drawing from the same bag. When a consumer visits a cluster, the location of their home does not affect the choice of the store. If a consumer visits a corner store, then he or she has a 25% chance of visiting '7-Eleven', a 20% chance of visiting 'Speedway', and so on. If a consumer visits a grocery store, then he or she has a 25% chance of visiting 'ShopRite', a 20% chance of visiting 'Aldi', and so on.

Note that suburbanites buy from both corner stores and grocery stores, but do not use the two sets of stores for the same purpose. They would not use a corner store in the place of a grocery store, and they would not use a grocery store in the place of a corner store. Corner stores and grocery stores have consumption overlap, but they are not interchangeable.

The economist observes the number of shopping trips to each store by each consumer segment. He or she does not observe the bags, the number of bags, or which stores belong to which bag. In section 3, we show how he or she uses the stores visited to estimate the clusters.

Example 2: Televised Sports Programs in India

Figure 3 shows another illustrative example of the LDA model. In India, the north and west love cricket, and the south loves soccer (i.e. world football).

[Figure 3 about here.]

Here, the cost of one transaction is an hour of a consumer's time. There exist two bags of sports programs (K = 2). Each bag of sports programs is a different cluster. Cluster 1 is 'Cricket'. Cluster 2 is 'Soccer'. Also, two consumer segments exist: 'North & West India' and 'South India'.

For each consumer segment, consumers watch a fixed number of hours of programs, determined by some random process. Northern and western Indian consumers watch N_1 = 9 hours of sports programs. Southern Indian consumers watch N_2 = 10 hours of sports programs.

Each consumer segment draws a θ, which represents the probability of drawing each bag of sports programs. Northern and western Indian consumers watch more cricket; they watch cricket 95% of the time, so their θ_1 = 95% and θ_2 = 5%. Southern Indian consumers watch more soccer; they watch soccer 80% of the time, so their θ_1 = 20% and θ_2 = 80%.

First Step: for each hour of television n = 1, ..., N_i, consumer segment i selects what it watches by first drawing a random bag of sports programs. The probability of drawing bag k is θ_{i,k}.

Second Step: the consumer draws the sports program from the selected bag. This process only depends on the bag, and it is the same for any consumer drawing from the same bag. If the consumer watches cricket, then he or she has a 35% chance of watching the 'Ranji Trophy', a 30% chance of watching the 'Duleep Trophy', and so on. If the consumer watches soccer, then he or she has a 43% chance of watching the 'Asian Cup', a 27% chance of watching the 'King's Cup', and so on.

Note that one multiuse good exists. 'India TV Sports News' is both a cricket and a soccer program. Consumers could watch the program to get cricket news, or they could watch it to get soccer news. Therefore, it belongs to both clusters.

The economist observes the number of viewers for each sports program by each consumer segment. He or she does not observe the bags, the number of bags, or which programs belong to which bag. In section 3, we show how he or she uses the number of viewers to estimate the clusters.
Model Estimation

Estimation Technique

In this section, we describe how to estimate the model from the previous section. This estimates the clusters from the purchases. We can identify the LDA model because it has few parameters. These parameters affect the model globally and affect each consumer segment as a random effect. The global parameters define the distribution of the random effects.

Unfortunately, you cannot directly solve for the most likely parameters. No closed-form solution exists; the problem is intractable due to coupling of the parameters (see Blei et al., 2003). In this paper, we estimate the model with a form of Markov Chain Monte Carlo (MCMC). Specifically, we use Griffiths and Steyvers's (2004) Gibbs Sampling algorithm (Gibbs LDA). Gibbs sampling is an efficient and easy-to-implement form of MCMC.

The idea is that if, in addition to the purchases y, we knew all the cluster assignments but the last purchase of consumer i (i.e. k_{i,1}, ..., k_{i,N_i−1}), then the probability distribution for this unknown cluster assignment would be:

p(k_{i,N_i} = k | k_{i,1}, ..., k_{i,N_i−1}, y) ∝ [(a + c_{i,k}) / (aK + N_i − 1)] · [(β + c_{k,y}) / (βY + L_k − 1)]    (2)

where a and β are smoothing parameters; c_{i,k} is the number of consumer i's other purchases in cluster k; c_{k,y} is the number of cluster k's other purchases of product y; and L_k is the total number of cluster k's purchases.

In Gibbs LDA, you start with a random cluster assignment. Then you update all the cluster assignments using (2) until convergence. A minimal sketch of this update appears after the simulated example below.

We discuss Gibbs LDA in more detail in Appendix B. This appendix makes Gibbs LDA more accessible. The Gibbs LDA literature has ignored a few statistical issues, which we address. This paper is the first to estimate the standard errors of the model fit of Gibbs LDA. In addition, we show how to ensure convergence.

Small Simulated Example

In this section, we test Gibbs LDA with a small simulated example. We see whether we recover the same generative structure.

Suppose there exist 20 consumer segments, and each consumer segment makes ten purchases. We draw these purchases using LDA with α = 0.2. There exist two clusters: "cold weather shoes" and "warm weather shoes". A draw from the "cold weather shoes" cluster has a 40% chance of being wool slippers, a 40% chance of being snow boots, and a 20% chance of being sneakers. A draw from the "warm weather shoes" cluster has a 20% chance of being sneakers, a 40% chance of being sandals, and a 40% chance of being flip flops.

The economist estimates the clusters from the observed purchases. He or she does not know there is a "cold weather shoes" cluster and a "warm weather shoes" cluster. He or she delineates the clusters based on the observed purchases.

[Table 4 about here.]

Table 4 shows that Gibbs LDA quickly estimates the cluster structure. To make it easier to interpret, we ordered the consumer segments by θ_2. Low θ_2 is north; high θ_2 is south. Panel (a) shows the initial cluster assignments, which we assigned randomly. Panel (b) shows the cluster assignments at the 50th iteration of Gibbs LDA. A white dot means the purchase is assigned to the first cluster (cold weather shoes), and a black dot means the purchase is assigned to the second cluster (warm weather shoes).

Gibbs LDA performed well with a small number of iterations and purchases. It correctly guessed 91.5% of the cluster assignments.
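Below is a minimal Python sketch of the collapsed Gibbs update in (2). The data layout (a flat list of (segment, product) pairs) and all names are ours, and this is an illustration rather than the implementation used in the paper.

    import numpy as np

    def gibbs_lda(purchases, I, Y, K, a, beta, iters, seed=0):
        """Collapsed Gibbs sampler over (segment, product) purchase pairs.
        Returns one cluster assignment per purchase."""
        rng = np.random.default_rng(seed)
        z = rng.integers(K, size=len(purchases))          # random initial assignments
        c_ik = np.zeros((I, K)); c_ky = np.zeros((K, Y)); L = np.zeros(K)
        for (i, y), k in zip(purchases, z):
            c_ik[i, k] += 1; c_ky[k, y] += 1; L[k] += 1
        for _ in range(iters):
            for n, (i, y) in enumerate(purchases):
                k = z[n]                                  # remove the current purchase
                c_ik[i, k] -= 1; c_ky[k, y] -= 1; L[k] -= 1
                # full conditional (2); the segment-side denominator is constant in k
                p = (a + c_ik[i]) * (beta + c_ky[:, y]) / (beta * Y + L)
                k = rng.choice(K, p=p / p.sum())          # resample, then restore counts
                z[n] = k
                c_ik[i, k] += 1; c_ky[k, y] += 1; L[k] += 1
        return z

    # Tiny usage example: 2 segments, 4 products, 2 clusters.
    data = [(0, 0), (0, 1), (0, 1), (1, 2), (1, 3), (1, 3)]
    print(gibbs_lda(data, I=2, Y=4, K=2, a=0.2, beta=0.1, iters=50))

After convergence, the structure of the bags can be read off the counts; for instance, the estimate of φ_{k,y} is proportional to β + c_{k,y}.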
Example from Italian Supermarket Scanner Data

In this section, we apply these ideas to a real-world example. We use transaction-level scanner data from the largest supermarket franchise in Italy (sec. 4.1). We estimate Gibbs LDA clusters and we defend model fit (sec. 4.2). Then we compare these results to results found using the Elzinga-Hogarty (E-H) test; we get similar clusters, and we discuss some advantages of Gibbs LDA clusters over E-H clusters (sec. 4.3). Then we find markets from the Gibbs LDA clusters (sec. 4.4).

Data Description

In this real-world example, we use data from Unicoop Tirreno, which is part of Coop Italia, the largest supermarket franchise in Italy. The data comes from the loyalty cards of residents of Tuscany. Coop knows all the purchases each member made at each of their stores. Therefore, we know whether a consumer shops at one store or multiple stores. The data covers all their purchases from 2010 to 2015 in stores in Tuscany. This is 99.7% of Unicoop Tirreno's revenue from the stores in Tuscany.

[Table 5 about here.]

Table 5 shows some summary statistics for the data. The data is from 71 stores in 34 municipalities (or "comunes") and 5 provinces. Between 2010 and 2015, 7 of these stores closed, and 12 of these stores opened. An average of 122,852 consumers visited at least one store per month. Each of these consumers visited an average of 1.375 stores in a month. The average revenue was 46.094 million EUR per month. Therefore, a consumer spent an average of 375 EUR per month.

[Table 6 about here.]

Table 6 shows how much consumers living in each province spent in stores in each of the provinces from Sept. – Nov. 2015. This is the period that we use to estimate Gibbs LDA. Consumers from Massa & Carrara spent 98.29% in stores in their own province; they spent 1.66% in Lucca, 0.04% in Livorno, and so on. Consumers from Lucca spent 99.58% in stores in their own province; they spent 0.29% in Massa & Carrara, and so on. Tuscany residents not in the 5 provinces spent 0.00% of their expenditure in Massa & Carrara, 0.75% in Lucca, and so on. Consumers mostly spent money in their own provinces. There are a couple of exceptions, which we discuss in section 4.4.

Estimate the Clusters and Model Fit

In this section, we run Gibbs LDA on this data. We create a cross-sectional sample from the Coop data, and we split this cross-sectional data into three samples (sec. 4.2.1). We use the first sample to find the optimal number of clusters (sec. 4.2.2). We use the second sample to estimate the model using that number of clusters (sec. 4.2.3). We use the third sample to test out-of-sample model fit (sec. 4.2.4). Then we run Gibbs LDA on the combined cross-sectional data, which gives our main results (sec. 4.2.5). Finally, we rerun the entire process to verify the results (sec. 4.2.6).

Data Sampling

First, we create our cross-sectional dataset and then split it into three samples. We take a cross-section because the model assumes constant cluster shares, Φ. Over time, we would expect prices to change, which would change the cluster shares. We do not model these changes. Therefore, we pick a small time period to keep Φ constant. We only use data from a three-month period: Sept. – Nov. 2015. This is the most recent three-month period without an Italian holiday.

Gibbs LDA uses three variables: the consumer segment, the product ID, and the number of transactions.

We set our consumer segments as municipalities. There were 287 municipalities in Tuscany; we observe sales from residents of 210 of them. We choose to segment consumers at this level to be consistent with the courts. We set our products as individual stores. We observe 66 stores, so we observe 66 products.

We set a transaction as 10 EUR of spending. Thus, we count 10 EUR of bananas in the same way we count 10 EUR of mangos. This is a nice round number, and it requires rounding in less than .001% of transactions.

We round to whole-numbered expenses with stochastic rounding: we round X down to ⌊X⌋ with probability 1 − (X − ⌊X⌋) and up to ⌊X⌋ + 1 with probability X − ⌊X⌋. Unlike rounding to the nearest, this gives unbiased results. The machine learning literature often uses stochastic rounding (e.g. Gupta et al., 2015).
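As a concrete illustration, here is a minimal numpy sketch of stochastic rounding as just described; the variable names are ours.

    import numpy as np

    rng = np.random.default_rng(0)

    def stochastic_round(x):
        """Round down with prob 1 - (x - floor(x)), up with prob x - floor(x)."""
        lo = np.floor(x)
        return lo + (rng.random(np.shape(x)) < (x - lo))

    spend = np.array([3.7, 3.7, 3.7, 3.7])    # 37 EUR of spending, in 10 EUR units
    print(stochastic_round(spend))            # e.g. [4. 4. 3. 4.]; the mean is 3.7

The expected value of the rounded transaction count equals the true count, which is the unbiasedness property noted above.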
We limit the number of transactions to 10,000 per consumer segment. Truncation is common with Gibbs LDA. It makes Gibbs LDA run faster, and it gives more weight to consumer segments with fewer observations. There are 91 consumer segments with too many transactions. For each of these segments, we select 10,000 transactions with random sampling.

Therefore, there are three variables. The first variable is the consumer's municipality. The second is the store. The third is the number of transactions (in 10 EUR increments).

As we mention earlier, we split the cross-sectional data into three samples. Let the first sample be Sample A; we use it to find the number of clusters (sec. 4.2.2). Let the second sample be Sample B; we use it to estimate the model and the clusters (sec. 4.2.3). And let the third sample be Sample C; we use it to test out-of-sample model fit (sec. 4.2.4).

[Table 7 about here.]

Table 7 shows how we sample. We split the samples by municipality. Sample A comes from 20% of the municipalities, Sample B from a different 40% of the municipalities, and Sample C from the remaining 40% of the municipalities. We sample this way to be consistent with the Gibbs LDA literature.

Determine the Number of Clusters

Next, we use Sample A to determine the optimal number of clusters, K.

The optimal K is not the K that maximizes model fit. Better fit is not always preferable. A larger K means a more complex model: there are more clusters to fit the data, which gives the model more flexibility to fit the error. As a result, increasing K always increases model fit.

Therefore, we choose the smallest K such that adding a cluster does not meaningfully increase model fit. This is called the elbow method. You choose the K = K* such that lower values meaningfully increase fit while higher values do not (a sketch of this search appears at the end of this subsection). Figure 8 shows how this works with our data.

[Figure 8 about here.]

Figure 8 plots model fit against the number of clusters. For each K = 2, ..., 40, we run Gibbs LDA for 2000 iterations on Sample A. Then we calculate the model fit of the results. We give fit in terms of the average log-likelihood of a transaction.

Figure 8 shows that the optimal K is 14. When K is smaller, adding a cluster meaningfully increases fit. When K is larger, adding a cluster does not meaningfully increase fit.

Note that the fact that a clear elbow exists also indicates that the model fits the data. The elbow method would not work well if the data were not clearly clustered.
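A sketch of the elbow search follows. The fit values here are synthetic; in practice each loglik entry would be the average log-likelihood of a transaction from running Gibbs LDA with that K on Sample A, and the tolerance is an assumption of ours rather than the paper's rule.

    import numpy as np

    def elbow(ks, loglik, tol=0.01):
        """Smallest K after which adding a cluster improves fit by less than tol."""
        for k, gain in zip(ks[1:], np.diff(loglik)):
            if gain < tol:
                return k - 1
        return ks[-1]

    # Synthetic fit curve: rises steeply at first, then flattens out.
    ks = np.arange(2, 21)
    loglik = -1.45 - 2.0 * np.exp(-0.5 * (ks - 2))
    print("optimal K =", elbow(ks, loglik))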
Estimate the Model with In-Sample Data

Next, we use Sample B to estimate the model and find the clusters. We keep K = 14. We run Gibbs LDA for 2000 iterations on Sample B and estimate the model fit of each iteration.

[Figure 9 about here.]

Figure 9 shows that model fit converges within 20 iterations. We choose to discard the first 64 iterations. This avoids any bias from the initial random cluster assignments. 64 exceeds 20 iterations, and it is a convenient number when we estimate error. See Appendix B.5.1 for more on how to choose the discard period.

[Table 10 about here.]

Table 10 shows the model fit of the remaining iterations. We find a high average log-likelihood of -1.447 and a tiny standard error of 3.022E-06. This high likelihood and tiny standard error indicate that the model fits the data.
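The standard errors themselves are computed in Appendix B, which we do not reproduce here. One standard way to attach a standard error to the mean of correlated MCMC output is the method of batch means (Glynn and Whitt, 1991, in the references); whether this matches Appendix B.5.1 exactly is our assumption. A minimal sketch:

    import numpy as np

    def batch_means_se(chain, n_batches=8):
        """Standard error of the mean of a correlated MCMC chain via batch means."""
        chain = np.asarray(chain)
        usable = len(chain) // n_batches * n_batches
        means = chain[:usable].reshape(n_batches, -1).mean(axis=1)
        return means.std(ddof=1) / np.sqrt(n_batches)

    # e.g. 1936 per-iteration log-likelihoods left after the 64-iteration discard
    rng = np.random.default_rng(1)
    chain = -1.447 + 0.001 * rng.standard_normal(1936)   # synthetic chain
    print(batch_means_se(chain))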
Test the Out-of-Sample Model Fit

Next, we use Sample C to test out-of-sample model fit. We use Wallach et al.'s (2009) Chib-style estimator to reduce error.

Out-of-sample fit in MCMC is not straightforward. Each estimate from the in-sample data incorporates some error. The previous Gibbs LDA literature ignores this error: it estimates out-of-sample fit using the last iterative estimate on Sample B, ignoring the variation in estimating Sample B when estimating the model fit of Sample C.

Therefore, we test how Sample C fits several estimates from running Gibbs LDA on Sample B. We use 44 such estimates: the estimate from the 108th iteration of Gibbs LDA on Sample B, the estimate from the 152nd iteration, and so on. Each time, we run the Chib-style estimator for 200 iterations. We find an average log-likelihood of -3.464. This is high, but not as high as the in-sample model fit, which we should expect. In addition, we find a tiny standard error of 1.136E-02. This high out-of-sample likelihood and tiny standard error indicate that the model fits the data.

Also, we estimate out-of-sample fit in the same way as the previous literature. We use only the estimate from the last iteration of Gibbs LDA on Sample B, and we run the Chib-style estimator for 1000 iterations. We find a high average log-likelihood of -3.471. This high out-of-sample likelihood indicates that the model fits the data.

Estimate the Model with Data from all the Municipalities

Then, we run Gibbs LDA on the cross-sectional dataset from all municipalities. This combines Samples A, B, and C. These are the most accurate results, because they come from the most data. We run Gibbs LDA for 2000 iterations and discard the results from the first 64 iterations. Table 10 shows the model fit statistics. We find a high average log-likelihood of -1.451 and a tiny standard error of 4.232E-06. This similar finding indicates that the model fits the data.

Then, we create geographic maps of each cluster.

[Figure 11 about here.]

Figure 11 shows six examples of the maps. There are maps of large and small clusters: k4 and k7 are large clusters; k10 and k3 are small clusters. We shade each municipality by expenditure in the cluster. Dark green means that residents of the municipality spent a lot in the cluster. Light green means that they spent a little in the cluster. Yellow means that we do not observe any data from that municipality. In general, the clusters appear to be contiguous. Red stars are stores with at least 20% of the cluster's expenditures. Purple triangles are stores with between 5% and 20% of the cluster's expenditures. Blue dots are stores with less than 5% of the cluster's expenditures. In general, stores locate within or close to their consumers. To see all maps, refer to Web Appendix E.1 (<link>).

This geographical clustering indicates that the model fits the data. Gibbs LDA uses expenditure data. We do not use data on proximity, on which stores neighbor each other, or on which municipalities neighbor each other. Geographical clustering therefore indicates that these are the true clusters.

To interpret our results, we create names for each cluster. Also, we order clusters approximately from north to south.

[Table 12 about here.]

Table 12 shows our names for the fourteen clusters and their total expenditures. These names make our clusters easy to interpret. For more information about each cluster, refer to Web Appendix E.1 (<link>). Easily interpretable clusters indicate that the model fits the data.

In addition, we use our results to test model fit on data from every month from 2010 to 2015. This requires no MCMC iterations, because we already have estimates for Φ and θ. We discard the results based on the first 64 iterations of the cross-sectional data and calculate the model fit statistics from the remaining iterations. We find high average log-likelihoods of -1.56 to -1.17, with some evidence of seasonality. In addition, we find tiny standard errors of 6.011E-06 to 3.712E-05. These high likelihoods and tiny standard errors on the out-of-sample monthly data indicate that the model fits the data.

Rerun the Experiment to Verify the Results

Finally, we rerun the entire process to verify our results. We rerun the data sampling, all the tests of model fit, and Gibbs LDA. Let's call this 'run 2' and the previous analysis 'run 1'.

We find the same optimal number of clusters, and we find similar model fit statistics (see Table 10 for detail). These similar statistics indicate that the model fits the data.

[Figure 13 about here.]

Figure 13 shows that we find similar clusters in run 2. It shows the Kullback-Leibler (KL) distance between clusters in run 1 and run 2. The KL distance measures the similarity between clusters. A KL distance of 0 means the distribution of stores is equal in both clusters; a higher distance means the clusters differ more. In Figure 13, we shade higher KL distances darker. The lower distances on the diagonal show that, overall, we find similar clusters. These similar results indicate that the model fits the data.
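The comparison in Figure 13 can be sketched as follows. Here phi_run1 and phi_run2 stand in for the estimated store distributions Φ from the two runs; the synthetic Dirichlet draws below only illustrate the shapes, and the unsymmetrized KL is our choice for the sketch.

    import numpy as np

    def kl(p, q, eps=1e-12):
        """Kullback-Leibler divergence KL(p || q) between two store distributions."""
        p, q = np.asarray(p) + eps, np.asarray(q) + eps
        return np.sum(p * np.log(p / q))

    rng = np.random.default_rng(2)
    phi_run1 = rng.dirichlet(np.ones(66), size=14)   # 14 clusters over 66 stores, run 1
    phi_run2 = rng.dirichlet(np.ones(66), size=14)   # run 2 (synthetic; real runs match)

    dist = np.array([[kl(p, q) for q in phi_run2] for p in phi_run1])
    print(dist.argmin(axis=1))   # the run-2 cluster closest to each run-1 cluster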
Elzinga-Hogarty (E-H) Test

In this section, we compare Gibbs LDA to Elzinga and Hogarty's (1973; 1978) E-H test, an algorithm that finds interchangeability clusters. The E-H test is often used in antitrust cases. First, we define the E-H algorithm that we use for comparison (sec. 4.3.1). Then, we create a dataset from the Coop data (sec. 4.3.2). Then, we find the E-H clusters and compare them to the Gibbs LDA clusters that we estimated in the previous section (sec. 4.3.3). Finally, we discuss the advantages of Gibbs LDA over the E-H test (sec. 4.3.4).

E-H Algorithm

First, we define the E-H algorithm that we compare against Gibbs LDA.

The E-H test consists of two parts: a demand-side test and a supply-side test. We call the demand-side test "little in from the outside" (LIFO). It tests whether nearly all consumption by the candidate area comes from stores in the candidate area. This tests for consumption overlap with stores from outside the area. We call the supply-side test "little out from the inside" (LOFI). It tests whether nearly all sales volume from the candidate area goes to consumers in the candidate area.

These two tests complement each other: LIFO tests demand but ignores supply, and LOFI tests supply but ignores demand. Elzinga and Hogarty (1973) argue that a market satisfies both tests.

Formally, the LIFO and LOFI tests are:

LIFO = (Consumption in the Candidate Area from Stores in the Candidate Area) / (Consumption in the Candidate Area) = 100% − % Imports ≥ 90%    (LIFO)

LOFI = (Sales Volume in the Candidate Area going to Consumers in the Candidate Area) / (Sales Volume in the Candidate Area) = 100% − % Exports ≥ 90%    (LOFI)

The courts apply the E-H test in many ways; see Frech et al. (2003) for a review. To compare Gibbs LDA clusters and E-H clusters, we settle on one algorithm based on what is most commonly accepted in court (a simplified sketch follows this list):

1. Start with an initial set of store(s). This consists of all the stores in some initial municipality.
2. Set the candidate area as the municipality or municipalities of the set of stores.
3. If the candidate area satisfies both (LIFO) and (LOFI), then you are done; this is the E-H cluster. Otherwise, add a municipality to the candidate area. Use the municipality with the highest volume of sales of the remaining municipalities.
4. If this additional municipality has more stores, then add those stores to the set of stores and return to step (2). Otherwise, return to step (3).
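The sketch below implements a simplified version of this loop: it tracks municipalities only, skipping the store-set bookkeeping of step (4), and the sales dictionary layout is a hypothetical stand-in for the dataset described in the next subsection.

    def eh_cluster(start, sales, threshold=0.90):
        """Grow a candidate area from `start` until LIFO and LOFI both pass.
        sales[(s, b)] is the price-adjusted sales volume from seller-municipality
        s to buyer-municipality b. Assumes `start` contains at least one store."""
        munis = {m for pair in sales for m in pair}
        area = {start}
        while True:
            inside = sum(v for (s, b), v in sales.items() if s in area and b in area)
            consumption = sum(v for (s, b), v in sales.items() if b in area)
            volume = sum(v for (s, b), v in sales.items() if s in area)
            if inside / consumption >= threshold and inside / volume >= threshold:
                return area                    # both LIFO and LOFI pass
            outside = munis - area
            if not outside:
                return area
            # add the outside municipality with the highest sales volume
            area.add(max(outside, key=lambda m: sum(
                v for (s, b), v in sales.items() if s == m)))

    # Toy data: two towns that trade mostly internally.
    sales = {("T1", "T1"): 95, ("T1", "T2"): 5, ("T2", "T2"): 90, ("T2", "T1"): 8}
    print(eh_cluster("T1", sales))             # {'T1'}: town T1 passes on its own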
E-H Dataset

Then, we create a data sample for the E-H test out of the Coop data.

It is okay to use the E-H test on supermarket data. Although grocery stores sell many goods, consumers treat store choice as one good. Grocery shopping is a nested problem: first, consumers select a store; then, consumers decide which products to buy (e.g. Besanko et al., 1998). Therefore, we treat grocery sales as one good.

To apply the E-H test, we create a price-adjusted measure of volume of sales. We normalize sales volumes by a price index. We use data from Sept. – Nov. 2015 to create the E-H dataset; these are the same purchases that we use in Gibbs LDA. First, we create store-level price indices. Hausman (1996) creates similar price indices and removes seasonality. For details on the creation of the price indices, see Appendix D. Then, we use these price indices to create a price-adjusted measure of the volume of sales: we divide expenditures at a store by its price index.

This creates a cross-sectional dataset. One observation is the price-adjusted volume of sales from a seller-municipality to a buyer-municipality.

Similar Clusters

Next, we run the E-H algorithm on the E-H data. We start the algorithm from every store. This creates 25 unique clusters. On average, each cluster contains 2.68 municipalities, 1.96 of which contain one of the stores, and it shares 0.92 municipalities with another cluster.

Then, we match each E-H cluster with its most similar Gibbs LDA cluster. Each Gibbs LDA cluster matches at least one similar E-H cluster. Often several E-H clusters match the same Gibbs LDA cluster, because many E-H clusters are similar. And 21 of the 25 E-H clusters match a Gibbs LDA cluster. We expect a few non-matches: the two algorithms use different thresholds.

[Figure 14 about here.]

Figure 14 shows E-H clusters superimposed on their Gibbs LDA clusters. The red dashed and dotted lines are the boundaries of the different E-H clusters. Both algorithms seem to estimate the same clusters.

Advantages of Gibbs LDA over the E-H Test

Although you get similar clusters with both Gibbs LDA and the E-H test, Gibbs LDA has some advantages.

Gibbs LDA assigns every transaction to a cluster, whereas the E-H test often assigns the same transaction to multiple clusters. As a result, Gibbs LDA handles contested regions better. The E-H test treats a contested region as belonging to both clusters, while Gibbs LDA assigns a contested region probabilistically to each cluster.

Gibbs LDA does not fall for the silent majority fallacy. As we mention earlier, the E-H test can fail because it relies on consumption overlap. Suppose there exists consumption overlap between two stores. Let's call the consumers who buy from both stores travelers. Further, suppose there exists a silent majority not willing to substitute one store for another. We would falsely infer from the travelers that the silent majority would substitute one good for another. This leads to false results in the LIFO test (through consumption overlap). Yet Gibbs LDA would not lead to false results, because it relies on a more complete view of interchangeability in use. In Gibbs LDA, the silent majority fallacy leads to poor model fit. Therefore, we avoid the fallacy when we find good model fit.

Gibbs LDA is identifiable; the E-H test is not. Gibbs LDA has many measures of model fit; the E-H test does not. Therefore, Gibbs LDA is preferable to the E-H test in some situations.

Estimate Markets from Gibbs LDA Clusters

In this section, we estimate the markets from the Gibbs LDA clusters. Potentially, we could do this with the E-H clusters, but it would be more difficult: there are too many overlapping E-H clusters. It is easier to define markets from Gibbs LDA clusters than from municipalities. There are 66 municipalities and 14 clusters. Gibbs LDA reduces the number of dimensions in the problem.

First, we look at consumption overlap. We start with the consumption overlap between clusters. Then, we look at the consumption overlap between provinces. The courts often use consumption overlap to define markets.

[Figure 15 about here.]

Figure 15 shows how much consumption overlap exists between clusters. The horizontal axis is the threshold that we use to determine which municipalities are in a cluster, in terms of expenditure. Let the threshold be ω%. The municipalities that belong to cluster k are the smallest set that totals ω% of cluster k's expenditures, ordering municipalities from largest to smallest expenditure in cluster k. The vertical axis is the average number of shared municipalities with other clusters; it measures consumption overlap.

Figure 15 shows moderate evidence of clusters overlapping. Yet the measure reacts significantly to the threshold, and there is no established threshold. Therefore, multiple clusters could be in the same market; we cannot tell. We need to look at other measures at the cluster level. Before we do that, let's look at the province level.

[Figure 16 about here.]

Figure 16 shows how much consumption overlap exists between provinces. It shows how much consumers from each province spent in stores in each province. For example, Siena residents spent 13.80% of their expenditures in Grosseto.

Figure 16 shows most provinces are separate. A couple of exceptions exist. Grosseto and Siena could be in the same market: Siena residents spent 13.80% of their expenditures in Grosseto. Siena residents did not spend much, and Siena stores did not take in much revenue. There was only one store in Siena, and consumers in Siena mostly lived near the western border close to Grosseto. Massa & Carrara and Lucca could also be in the same market: Massa & Carrara residents spent 1.67% of their expenditures in Lucca.
There were only two stores in Massa & Carrara, and consumers in Massa & Carrara mostly lived near the southern border close to Lucca.

Therefore, we investigate three regions: (a) the province of Massa & Carrara and the province of Lucca, (b) the province of Livorno, and (c) the provinces of Grosseto and Siena.

Next, we create cluster-level price indices to test for markets within these three regions. We start with the store-level price indices that we created for the E-H test. Then, we create the cluster-level price indices from the store-level price indices. Cluster k's index is the weighted average of the store-level indices, weighting by our estimate of φ_k. And we remove seasonality. For details on the creation of the price indices, see Appendix D.

[Table 17 about here.]

Then, we choose the pricing test to define markets within these three regions. We choose Forni's (2004) stationarity test based on a process of elimination. You use different tests in different situations, depending on the data. Table 17 shows this process applied to the Coop data.

Stationarity tests come from the law of one price. The idea is that the ratio of prices of two goods in the same market should remain constant. Therefore, if this ratio is stationary, the products are in the same market; if this ratio is not stationary, the products are not in the same market.

Forni (2004) combines two tests: one that shows products are in the same market and one that shows that products are not in the same market. The Augmented Dickey-Fuller (ADF) test has the null hypothesis of non-stationarity. Therefore, a rejection of the null hypothesis shows that the stores are in the same market; but failing to reject does not show whether stores are in the same market or not. The Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test has the null hypothesis of stationarity. Therefore, a rejection of the null hypothesis shows that the stores are not in the same market; but failing to reject does not show whether stores are in the same market or not. By combining the ADF and the KPSS tests, we test whether stores are in the same market or not.

[Table 18 about here.]

Tables 18(a)-(c) show the results of running these tests on the log of the price ratios between clusters. Following Forni (2004), we test ADF with orders of 4 and 8; we reject the ADF test if either order rejects at 10% significance. We test KPSS with Bartlett windows of eight and sixteen months; we reject the KPSS test if either window rejects at 10% significance. If we reject the ADF test and fail to reject the KPSS test, then the clusters are in the same market; we indicate this with an "S". If we reject the KPSS test and fail to reject the ADF test, then the clusters are in different markets; we indicate this with a "D". If we fail to reject both tests, then we cannot determine whether the two clusters are in the same market; we indicate this with a "?". This only occurs once.
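This combined decision rule can be sketched with the standard statsmodels implementations of the two tests. The lag settings follow the text (ADF orders 4 and 8, Bartlett windows of 8 and 16 months); the price-ratio series below is synthetic, and how to label a series when both tests reject is our choice.

    import numpy as np
    from statsmodels.tsa.stattools import adfuller, kpss

    def same_market(log_price_ratio, alpha=0.10):
        """'S' if stationary (same market), 'D' if not, '?' if inconclusive."""
        adf_reject = any(adfuller(log_price_ratio, maxlag=m, autolag=None)[1] < alpha
                         for m in (4, 8))
        kpss_reject = any(kpss(log_price_ratio, regression="c", nlags=w)[1] < alpha
                          for w in (8, 16))
        if adf_reject and not kpss_reject:
            return "S"
        if kpss_reject and not adf_reject:
            return "D"
        return "?"     # neither (or both) tests reject

    rng = np.random.default_rng(3)
    ratio = 0.02 * rng.standard_normal(72).cumsum()   # 72 months; a random walk
    print(same_market(ratio))                         # likely 'D' for a random walk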
Table 18(a) shows the results between clusters in the province of Massa & Carrara and the province of Lucca. All the clusters (k1 - k3) are in the same market.

Table 18(b) shows the results between the clusters in the province of Livorno. There exist two markets: the City of Livorno (k4) and the rest of the province (k5 - k9).

Table 18(c) shows the results between the clusters in the provinces of Grosseto and Siena. The tests indicate that there potentially exist two overlapping markets. One potential market consists of k10, k11, k12, and k14; let's call this M1. Another potential market consists of k10, k12, k13, and possibly k14; let's call this M2. M1 concentrates in the north; M2 concentrates in the south.

M1 and M2 overlap enough to be one market. 47.6% - 49.9% of M1's expenditures are in the overlapping clusters, and 68.3% - 69.3% of M2's expenditures are in the overlapping clusters. Therefore, although k11 and k13 do not directly compete, they indirectly compete through a chain of substitutes with a significant market share. In Brown Shoe v. United States, the court accepted Joan Robinson's position that market boundaries could only be drawn at gaps in chains of substitutes (see Werden, 1992, pg. 158). Therefore, Grosseto and Siena are one market.

Therefore, there exist four markets: (1) Massa & Carrara and Lucca (k1 - k3), (2) the City of Livorno (k4), (3) the rest of Livorno (k5 - k9), and (4) Grosseto and Siena (k10 - k14).

Conclusion

Market definition is a form of clustering. There are many recent advances in clustering: advances in marketing science, data science, and machine learning. We should evaluate and adapt these advances to the problem of market definition.

In this paper, we show that interchangeability clusters are subsets of markets. We demonstrate that you should estimate markets in two steps: first, solve for the interchangeability clusters; then, estimate the markets from the clusters. Finding the clusters first makes it easier to find the markets, because it reduces the number of dimensions.

Also, we show how to estimate clusters with Gibbs LDA. We show how to find the number of clusters and how to test model fit. We present strong evidence that LDA fits the Coop data, which shows that these interchangeability clusters exist in the marketplace. In addition, we show how to interpret the results of Gibbs LDA. We discuss some advantages of Gibbs LDA clusters over E-H clusters: Gibbs LDA is identifiable, it has better measures of model fit, and it avoids the silent majority fallacy. In addition, we show how to estimate geographic markets from Gibbs LDA clusters.

In the next paper, we plan to explore product markets. Potentially, we could use Gibbs LDA to define both geographic and product markets together. In future papers, we need to evaluate other clustering algorithms as methods for empirically defining markets. For example, we could potentially use hierarchical LDA (Griffiths et al., 2004) to find niche markets.

Also, we can extend Gibbs LDA to model pricing effects. Gibbs LDA is very flexible and easy to extend. Potentially, we can let Φ depend on price with some adjustment to (2). Further, we need to explore the results of probabilistic clusters; the resulting markets are not necessarily "lines in the sand".

Also, it would be useful to apply Gibbs LDA to different industries. This would help us establish better standards for its use in court.

In conclusion, this paper is a beginning of a new line of research, not an end. We hope this encourages economists to use modern clustering techniques to define markets. Businesses use more and more big data. They use clustering to segment consumers. It is becoming part of how firms think about their customers. We need to explore its potential for future antitrust cases.
We need to explore its potential for future antitrust cases.ReferencesAiroldi, Edoardo?M, David?M Blei, Stephen?E Fienberg, and Eric?P Xing, “Mixed membership stochastic blockmodels,” Journal of Machine Learning Research, 2008, 9 (Sep), 1981–2014.Angelino, Elaine, Matthew?James Johnson, Ryan?P Adams et?al., “Patterns of scalable bayesian inference,” Foundations and Trends in Machine Learning, 2016, 9 (2-3), 119–247.Bandiera, Oriana, Stephen Hansen, Andrea Prat, and Raffaella Sadun, “CEO Behavior and Firm Performance,” Technical Report, National Bureau of Economic Research 2017.Besanko, David, Sachin Gupta, and Dipak Jain, “Logit demand estimation under competitive pricing behavior. An equilibrium framework,” Management Science, 1998, 44 (11-part-1), 1533–1547.Blei, David?M., Andrew Ng, and Michael Jordan, “Latent Dirichlet allocation,” JMLR, 2003, 3, 993–1022.Cao, Liangliang and Li?Fei-Fei, “Spatially coherent latent topic model for concurrent segmentation and classification of objects and scenes,” in “Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on” IEEE 2007, pp.?1–8.Capps, Cory?S, David Dranove, Shane Greenstein, and Mark Satterthwaite, “The silent majority fallacy of the Elzinga-Hogarty criteria: a critique and new approach to analyzing hospital mergers,” Technical Report, National Bureau of Economic Research 2001.Cartwright, Phillip?A, David?R Kamerschen, and Mei-Ying Huang, “Price correlation and granger causality tests for market definition,” Review of Industrial Organization, 1989, 4 (2), 79–98.Chang, Jonathan, “Package ‘lda’: Collapsed Gibbs Sampling Methods for Topic Models [r software],” 2015. [Accessed: 2018-04-05].Chen, Xin, Xiaohua Hu, Xiajiong Shen, and Gail Rosen, “Probabilistic topic modeling for genomic data interpretation,” in “Bioinformatics and Biomedicine (BIBM), 2010 IEEE International Conference on” IEEE 2010, pp.?149–152.Cournot, Antoine Augustin, “Researches into the Mathematical Principles of the Theory of Wealth,” Macmillan, 1897.Davis, Peter and Eliana Garcés, Quantitative techniques for competition and antitrust analysis, Princeton University Press, 2009.Elzinga, Kenneth?G and Thomas?F Hogarty, “The Problem of Geographic Market Delineation in Antimerger Suits,” Antitrust Bull., 1973, 18, 45.______ and ______, “The problem of geographic market delineation revisited: the case of coal,” Antitrust Bull., 1978, 23, 1.Erosheva, Elena?A, Stephen?E Fienberg, and Cyrille Joutard, “Describing disability through individual-level mixture models for multivariate binary data,” The annals of applied statistics, 2007, 1 (2), 346.Fei-Fei, Li and Pietro Perona, “A bayesian hierarchical model for learning natural scene categories,” in “Computer Vision and Pattern Recognition, 2005. CVPR 2005. 
Forni, Mario, "Using stationarity tests in antitrust market definition," American Law and Economics Review, 2004, 6 (2), 441–464.

Frech, Harry E. III, James Langenfeld, and R. Forrest McCluer, "Elzinga-Hogarty tests and alternative approaches for market share calculations in hospital markets," Antitrust Law Journal, 2003, 71, 921.

Geyer, Charles J., "Practical Markov chain Monte Carlo," Statistical Science, 1992, pp. 473–483.

Gilks, Walter R., Sylvia Richardson, and David Spiegelhalter, Markov Chain Monte Carlo in Practice, CRC Press, 1995.

Glynn, Peter W. and Donald L. Iglehart, "Simulation output analysis using standardized time series," Mathematics of Operations Research, 1990, 15 (1), 1–16.

______ and Ward Whitt, "Estimating the asymptotic variance with batch means," Operations Research Letters, 1991, 10 (8), 431–435.

Griffiths, Thomas L. and Mark Steyvers, "Finding scientific topics," Proceedings of the National Academy of Sciences, 2004, 101 (suppl 1), 5228–5235.

______ and ______, "Matlab Topic Modeling Toolbox 1.4," 2011. [Accessed: 2018-03-29].

______, Michael I. Jordan, Joshua B. Tenenbaum, and Mark Steyvers, "Hierarchical topic models and the nested Chinese restaurant process," Advances in Neural Information Processing Systems, 2004, pp. 17–24.

Guidotti, Riccardo and Lorenzo Gabrielli, "Recognizing residents and tourists with retail data using shopping profiles," in International Conference on Smart Objects and Technologies for Social Good, Springer, Cham, 2017, pp. 353–363.

______, ______, Anna Monreale, Dino Pedreschi, and Fosca Giannotti, "Discovering temporal regularities in retail customers' shopping behavior," EPJ Data Science, 2018, 7 (1), 6.

Gupta, Suyog, Ankur Agrawal, Kailash Gopalakrishnan, and Pritish Narayanan, "Deep learning with limited numerical precision," in International Conference on Machine Learning, 2015, pp. 1737–1746.

Hausman, Jerry A., "Valuation of new goods under perfect and imperfect competition," in The Economics of New Goods, University of Chicago Press, 1996, pp. 207–248.

Hu, Diane J., "Latent Dirichlet allocation for text, images, and music," University of California, San Diego, 2009, 26, 2013.

Ivaldi, Marc and Szabolcs Lorincz, "A full equilibrium relevant market test: application to computer servers," 2005.

Jones, Galin L., Murali Haran, Brian S. Caffo, and Ronald Neath, "Fixed-width output analysis for Markov chain Monte Carlo," Journal of the American Statistical Association, 2006, 101 (476), 1537–1547.

Kaplow, Louis, "Market definition: Impossible and counterproductive," Antitrust Law Journal, 2013, 79 (1), 361–379.

Lancichinetti, Andrea, M. Irmak Sirer, Jane X. Wang, Daniel Acuna, Konrad Körding, and Luís A. Nunes Amaral, "High-reproducibility and high-accuracy method for automated topic classification," Physical Review X, 2015, 5 (1), 011007.

Langenfeld, James and Wenqing Li, "Critical loss analysis in evaluating mergers," The Antitrust Bulletin, 2001, 46 (2), 299–337.

Marlin, Benjamin M., "Modeling user rating profiles for collaborative filtering," in Advances in Neural Information Processing Systems, 2004, pp. 627–634.

Marshall, Alfred, Principles of Economics (London, 1920), Book V, 1920, p. 324.

MathWorks, "Matlab Text Analytics Toolbox 2018a [MATLAB software]," 2018. [Accessed: 2018-03-29].

Mimno, David, "The Dirichlet-multinomial distribution [INFO 6150 Class Handout]," 2016. [Accessed: 2018-03-07].
Minka, Thomas and John Lafferty, "Expectation-Propagation for the Generative Aspect Model," Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence, 2002, pp. 352–359.

Niebles, Juan Carlos, Hongcheng Wang, and Li Fei-Fei, "Unsupervised learning of human action categories using spatial-temporal words," International Journal of Computer Vision, 2008, 79 (3), 299–318.

Phan, Xuan-Hieu and Cam-Tu Nguyen, "GibbsLDA++: A C/C++ Implementation of Latent Dirichlet Allocation [C/C++ software]," 2007. [Accessed: 2018-04-05].

Pritchard, Jonathan K., Matthew Stephens, and Peter Donnelly, "Inference of population structure using multilocus genotype data," Genetics, 2000, 155 (2), 945–959.

Riddell, Allen, "lda 1.0.5: Topic modeling with latent Dirichlet allocation [Python software]," 2015. [Accessed: 2018-04-05].

Robert, Christian P. and George Casella, Monte Carlo Statistical Methods, Springer, 2004.

Russell, Bryan C., William T. Freeman, Alexei A. Efros, Josef Sivic, and Andrew Zisserman, "Using multiple segmentations to discover objects and their extent in image collections," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, Vol. 2, IEEE, 2006, pp. 1605–1614.

Sarstedt, Marko and Erik Mooi, "Cluster analysis," in A Concise Guide to Market Research, Springer, 2014, pp. 273–324.

Schwarz, Carlo, "ldagibbs: A command for topic modeling in Stata using latent Dirichlet allocation," The Stata Journal, 2018, 18 (1), 101–117.

Sivic, Josef, Bryan C. Russell, Alexei A. Efros, Andrew Zisserman, and William T. Freeman, "Discovering objects and their location in images," in Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, Vol. 1, IEEE, 2005, pp. 370–377.

Slade, Margaret E., "Exogeneity tests of market boundaries applied to petroleum products," The Journal of Industrial Economics, 1986, pp. 291–303.

Steyvers, Mark and Tom Griffiths, "Probabilistic topic models," Handbook of Latent Semantic Analysis, 2007, 427 (7), 424–440.

Stigler, George J. and Robert A. Sherwin, "The extent of the market," The Journal of Law and Economics, 1985, 28 (3), 555–585.

Teh, Yee-Whye, David Newman, and Max Welling, "A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation," in NIPS, 2006.

Unicoop Tirreno, "Il Bilancio Consuntivo 2012," June 2013.

______, "Il Bilancio Consuntivo 2012," June 2015.

______, "Il Bilancio 2015," June 2016.

United Nations, Statistical Division, Classification of Individual Consumption According to Purpose (COICOP) 2018, Vol. Series M, No. 99, United Nations Publications, 2018.

U.S. Federal Trade Commission, Improving Healthcare: A Dose of Competition; A Report by the Federal Trade Commission and Department of Justice (July 2004), with various supplementary materials, Springer, 2005.

Wallach, Hanna M., Iain Murray, Ruslan Salakhutdinov, and David Mimno, "Evaluation methods for topic models," in Proceedings of the 26th Annual International Conference on Machine Learning, ACM, 2009, pp. 1105–1112.

Wang, Xiaogang, Xiaoxu Ma, and Eric Grimson, "Unsupervised activity perception by hierarchical Bayesian models," in Computer Vision and Pattern Recognition, 2007. CVPR '07. IEEE Conference on, IEEE, 2007, pp. 1–8.

______, ______, and ______, "Spatial latent Dirichlet allocation," in Advances in Neural Information Processing Systems, 2008, pp. 1577–1584.

Werden, Gregory J., "The history of antitrust market delineation," Marquette Law Review, 1992, 76, 123.
Appendix

Tables and Figures

Table 1: Tests for Defining Markets

Economics Market: prices of goods in the same market tend to equality with due allowance for transportation cost (Marshall, 1920).
• Price Correlation Comparison (e.g. Nestlé–Perrier merger)
• Stationary Tests (e.g. Stigler and Sherwin, 1985; Forni, 2004)
• Granger Causality Tests (Cartwright et al., 1989; Slade, 1986)
• Natural Experiments on Price Movements (Davis and Garcés, 2009, pg. 185-188)

Antitrust Market: the smallest area or group of goods where a hypothetical, profit-maximizing monopolist would impose a 'small but significant and nontransitory' increase in price (1984 Merger Guidelines).
• Small but Significant and Nontransitory Increase in Price (SSNIP) test (1984 Merger Guidelines)
• Critical Loss Analysis (Langenfeld and Li, 2001)
• Full Equilibrium Relevant Market (FERM) test (e.g. Ivaldi and Lorincz, 2005)

Interchangeability Cluster: goods are in the same market if they are interchangeable in use (U.S. v. DuPont (Cellophane); Brown Shoe Co. v. U.S.; and U.S. v. Continental Can).
Qualitative:
• Functional Substitutes (Davis and Garcés, 2009, pg. 166-167)
• Contiguity in Geographic Markets
Quantitative:
• Elzinga-Hogarty Test (Elzinga and Hogarty, 1973; 1978)
• Latent Dirichlet Allocation (proposed in this paper)

Figure 2: Example 1 (Corner Stores & Grocery Stores)
First, a consumer randomly draws a bag (i.e. a cluster). This random process depends on the consumer's segment. Then the consumer draws a random product from the bag. In this case, the consumer segments are urbanites, suburbanites, and country folk; the bags are corner stores and grocery stores.

Figure 3: Example 2 (Televised Sports Programs in India)
One multiuse good exists: 'India TV Sports News' is both a cricket and a soccer program.

Table 4: Illustration of Gibbs Sampling Applied to a Small LDA Example
(Panels: the initial random cluster assignments; the cluster assignments after 50 iterations of Gibbs Sampling.)
There are 20 consumer segments. Each consumer segment makes ten purchases. We draw these purchases using LDA with α = 0.2. Two clusters exist: "cold weather shoes" and "warm weather shoes". A draw from the "cold weather shoes" cluster has a 40% chance of being wool slippers, a 40% chance of being snow boots, and a 20% chance of being sneakers. A draw from the "warm weather shoes" cluster has a 20% chance of being sneakers, a 40% chance of being sandals, and a 40% chance of being flip flops. A white dot means the purchase is assigned to the first cluster (cold weather shoes), and a black dot means the purchase is assigned to the second cluster (warm weather shoes).
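The generative process that Table 4 illustrates is straightforward to simulate. The sketch below is our illustration, not code from the paper; it draws the 20 segments × 10 purchases described above, with the two shoe clusters and product probabilities taken from the table's description:

```python
import numpy as np

rng = np.random.default_rng(0)

# Table 4's setup: 2 clusters ("bags"), 5 shoe products, 20 consumer
# segments, 10 purchases per segment, symmetric Dirichlet alpha = 0.2.
products = ["wool slippers", "snow boots", "sneakers", "sandals", "flip flops"]
phi = np.array([
    [0.4, 0.4, 0.2, 0.0, 0.0],   # cold weather shoes
    [0.0, 0.0, 0.2, 0.4, 0.4],   # warm weather shoes
])
alpha, n_segments, n_purchases = 0.2, 20, 10

purchases, assignments = [], []
for i in range(n_segments):
    theta_i = rng.dirichlet(alpha * np.ones(2))      # segment i's tastes
    k = rng.choice(2, size=n_purchases, p=theta_i)   # draw a bag per purchase
    y = [rng.choice(5, p=phi[ki]) for ki in k]       # draw a product from the bag
    assignments.append(k)
    purchases.append(y)

print([products[y] for y in purchases[0]])
```

Gibbs sampling then tries to recover the cluster assignments from the purchases alone, as the two panels of Table 4 show.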
Table 5: Summary Statistics of Coop Data

2010–2015 (sample for price indices):

| Province | # Stores | Avg. Monthly # Consumers* | Avg. Monthly Revenue (mil. EUR) |
| Massa & Carrara | 2 | 7,148.26 | 2.316 |
| Lucca | 7 | 10,114.71 | 3.517 |
| Livorno | 29 | 75,288.76 | 29.957 |
| Grosseto | 32 | 32,134.51 | 10.261 |
| Siena | 1 | 129.08 | 0.042 |
| All Tuscany | 71 | 122,851.50 | 46.094 |

Sept. – Nov. 2015 (sample for Gibbs LDA & E-H test):

| Province | # Stores | Avg. Monthly # Consumers* | Avg. Monthly Revenue (mil. EUR) |
| Massa & Carrara | 2 | 6,501.33 | 2.838 |
| Lucca | 7 | 9,826.33 | 4.323 |
| Livorno | 27 | 68,530.67 | 34.425 |
| Grosseto | 28 | 29,231.33 | 11.948 |
| Siena | 1 | 128.00 | 0.051 |
| All Tuscany | 65 | 112,503.70 | 53.585 |

* = province-level numbers of consumers do not aggregate to the Tuscany-level number because some consumers shop in more than one province.

Table 6: Expenditures by Consumer's Province and Store's Province (Sept. – Nov. 2015)
(The last five columns are the fraction of expenditure spent in each store's province.)

| Consumer's Province | Avg. Monthly Expenditure (mil. EUR) | Massa & Carrara | Lucca | Livorno | Grosseto | Siena |
| Massa & Carrara | 2.874 | 98.29% | 1.66% | 0.04% | 0.01% | 0.00% |
| Lucca | 4.285 | 0.29% | 99.58% | 0.12% | 0.01% | 0.00% |
| Livorno | 33.900 | 0.00% | 0.01% | 99.31% | 0.68% | 0.00% |
| Grosseto | 11.744 | 0.00% | 0.00% | 0.36% | 99.63% | 0.01% |
| Siena | 0.057 | 0.00% | 0.00% | 0.66% | 13.45% | 85.89% |
| Others (Tuscany) | 0.725 | 0.00% | 0.75% | 97.89% | 1.27% | 0.09% |
| Total | 53.585 | 5.30% | 8.07% | 64.24% | 22.30% | 0.09% |

Table 7: Subsampling Summary

| Sample | Split | Number of Municipalities | Number of Transactions |
| Sample A – Find the Number of Clusters | 20% | 42 | 309,266 |
| Sample B – Estimate the Clusters | 40% | 84 | 427,920 |
| Sample C – Out-of-Sample Testing | 40% | 84 | 430,691 |
| Cross-Sectional Dataset = Combined Samples A, B & C | 100% | 210 | 1,167,877 |

Note: all samples cover Sept. – Nov. 2015.

Figure 8: Model Fit by Number of Clusters
For each K = 2, …, 40, we run Gibbs LDA for 2000 iterations on Sample A. The graph plots the estimated average (per transaction) log-likelihood for each value of K.

Figure 9: Model Fit of the First 30 Iterations of Gibbs LDA on Sample B
Average log-likelihood converges within 20 iterations. We avoid transient bias by discarding the results from the first 64 iterations.

Table 10: Model Fit Statistics for Samples of Gibbs LDA

Run 1:
| Sample | Average Log-Likelihood | Standard Error |
| Sample B* | -1.447 | 3.022E-06 |
| Sample C (44 iterations of B)# | -3.464 | 1.136E-02 |
| Sample C (last iteration of B)! | -3.471 | -- |
| Cross-Sectional Dataset* | -1.451 | 4.232E-06 |

Run 2:
| Sample | Average Log-Likelihood | Standard Error |
| Sample B* | -1.413 | 2.868E-06 |
| Sample C (44 iterations of B)# | -3.615 | 4.208E-03 |
| Sample C (last iteration of B)! | -3.622 | -- |
| Cross-Sectional Dataset* | -1.475 | 4.124E-06 |

* = We run Gibbs LDA for 2000 iterations and discard the results from the first 64 iterations. We estimate the average log-likelihood of each remaining iteration and report the average of these average log-likelihoods. We estimate the standard errors using the method of batch means (see Appendix B.5.1).
# = We estimate the average log-likelihood using Wallach et al.'s (2009) Chib-style estimator for out-of-sample LDA. We use 44 estimates from Sample B: the estimate from the 108th iteration of Gibbs LDA on Sample B, the estimate from the 152nd iteration, and so on. Each time, we run the estimator for 200 iterations. The reported average and standard error are the average and standard error of these 44 estimates.
! = We estimate the average log-likelihood using Wallach et al.'s (2009) Chib-style estimator, using only the estimate from the last iteration of Gibbs LDA on Sample B. We run the Chib-style estimator for 1000 iterations.
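Figure 8's search over K can be reproduced with off-the-shelf software, for example Riddell's (2015) lda package cited in the references. The sketch below is illustrative only: the file name is hypothetical, and the data matrix must hold one row of product purchase counts per consumer segment.

```python
import numpy as np
import lda  # Riddell's (2015) collapsed Gibbs sampler for LDA

# X: consumer-segment-by-product matrix of purchase counts for Sample A.
X = np.load("sample_a_counts.npy").astype(np.int64)  # hypothetical file

fits = {}
for K in range(2, 41):
    model = lda.LDA(n_topics=K, n_iter=2000, alpha=50.0 / K, eta=0.01,
                    random_state=1)
    model.fit(X)
    # Average (per transaction) log-likelihood, as plotted in Figure 8.
    fits[K] = model.loglikelihood() / X.sum()

for K, ll in sorted(fits.items()):
    print(K, round(ll, 4))
```

The smoothing parameters follow the defaults discussed in Appendix B.3; one then looks for the "elbow" where additional clusters stop improving the average log-likelihood.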
Figure 11: Example Maps of Gibbs LDA Clusters

• k4: City of Livorno (N. Prov. of Livorno) [38.59 mil. EUR expenditures]
• k7: Piombino (Southern Livorno) [11.84 mil. EUR expenditures]
• k11: Follonica (North-Western Grosseto) [11.11 mil. EUR expenditures]
• k8: Island of Elba [6.18 mil. EUR expenditures]
• k10: Northern Grosseto [3.48 mil. EUR expenditures]
• k3: Bagni di Lucca (Eastern Lucca) [0.35 mil. EUR expenditures]

Note: municipality boundaries data powered by MapIt (). We shade each municipality by expenditure in the cluster. Dark green means that residents of the municipality spent a lot in the cluster; light green means that they spent a little; yellow means that we do not observe any data from that municipality. Red stars are stores with at least 20% of the cluster's expenditures. Purple triangles are stores with between 5% and 20% of the cluster's expenditures. Blue dots are stores with less than 5% of the cluster's expenditures. These results are typical of all 14 clusters. Maps of all the clusters are available in Web Appendix E.1 (<link>).

Table 12: Cluster Descriptions and Expenditures (total expenditure in mil. EUR, Sept. - Nov. 2015)

Province of Massa & Carrara and Province of Lucca:
| k1 | Northern Lucca | 0.42 |
| k2 | S. Massa & Carrara and W. Lucca | 14.89 |
| k3 | Bagni di Lucca (Eastern Lucca) | 0.35 |

Province of Livorno:
| k4 | City of Livorno (N. part of the Prov. of Livorno) | 38.59 |
| k5 | North-Central Livorno | 4.28 |
| k6 | Cecina (South-Central Livorno) | 8.11 |
| k7 | Piombino (Southern Livorno) | 11.84 |
| k8 | Island of Elba | 6.18 |
| k9 | etc. Livorno | 7.88 |

Province of Grosseto:
| k10 | Northern Grosseto | 3.48 |
| k11 | Follonica (North-Western Grosseto) | 11.11 |
| k12 | City of Grosseto (Central Grosseto) | 7.08 |
| k13 | Southern Grosseto | 4.91 |

Province of Siena:
| k14 | Western Siena | 0.51 |

Figure 13: Stability of Clusters between Different Runs
[14 × 14 matrix of Kullback-Leibler (KL) distances between the cluster distributions of Run 1 (rows) and Run 2 (columns), with rows and columns ordered so that each Run 1 cluster is paired with its closest Run 2 counterpart.]
This table shows the KL distance between the cluster distributions of the two runs. A higher distance means the clusters are less similar.

Figure 14: Maps of Similar Elzinga-Hogarty Clusters

• k4: City of Livorno (N. Prov. of Livorno): 2 similar E-H clusters
• k7: Piombino (Southern Livorno): 3 similar E-H clusters
• k11: Follonica (North-Western Grosseto): 2 similar E-H clusters
• k8: Island of Elba: 1 similar E-H cluster
• k10: Northern Grosseto: 2 similar E-H clusters
• k3: Bagni di Lucca (Eastern Lucca): 1 similar E-H cluster

Note: municipality boundaries data powered by MapIt (). These maps are the same as the maps in Figure 11 with the E-H clusters superimposed. The red dashed and dotted lines are the boundaries of different E-H clusters. Maps of all the clusters are available in Web Appendix E.2 (<link>).

Figure 15: Consumption Overlap between Clusters (Sept. – Nov. 2015)
The horizontal axis is the expenditure threshold, ω%, that we use to determine which municipalities are in a cluster: order the municipalities from largest to smallest expenditure in cluster k; the municipalities that belong to cluster k are then the smallest set that totals ω% of cluster k's expenditures. The vertical axis is the average number of municipalities shared with other clusters. It measures consumption overlap.

Figure 16: Consumption Overlap between Provinces (Sept. – Nov. 2015)
(The last five columns are the fraction of expenditure spent in each store's province.)

| Consumer's Province | Total Expenditure (mil. EUR) | Massa & Carrara | Lucca | Livorno | Grosseto | Siena |
| Massa & Carrara | 6.22 | 98.28% | 1.67% | 0.05% | 0.01% | 0.00% |
| Lucca | 9.31 | 0.28% | 99.60% | 0.11% | 0.01% | 0.00% |
| Livorno | 72.99 | 0.00% | 0.01% | 99.33% | 0.66% | 0.00% |
| Grosseto | 25.07 | 0.00% | 0.00% | 0.35% | 99.64% | 0.01% |
| Siena | 0.13 | 0.00% | 0.00% | 0.30% | 13.80% | 85.90% |
| Others | 1.54 | 0.00% | 0.75% | 97.89% | 1.26% | 0.10% |
| Total | 115.26 | 5.33% | 8.15% | 64.30% | 22.13% | 0.10% |
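The overlap measure in Figure 15 is mechanical to compute once a cluster-by-municipality expenditure matrix is in hand. The following sketch is ours (the toy numbers are made up) and implements the threshold rule from the caption:

```python
import numpy as np

def members(expenditure, omega):
    """Smallest set of municipalities accounting for the fraction omega of a
    cluster's expenditures, taking municipalities from largest to smallest
    expenditure (the threshold rule described in Figure 15)."""
    order = np.argsort(expenditure)[::-1]
    cum = np.cumsum(expenditure[order]) / expenditure.sum()
    cutoff = np.searchsorted(cum, omega) + 1
    return set(order[:cutoff])

def average_overlap(E, omega):
    """E is a (clusters x municipalities) expenditure matrix; returns the
    average number of municipalities each cluster shares with the others."""
    sets = [members(E[k], omega) for k in range(E.shape[0])]
    K = len(sets)
    shared = [np.mean([len(sets[k] & sets[j]) for j in range(K) if j != k])
              for k in range(K)]
    return float(np.mean(shared))

# Toy example with 3 clusters and 5 municipalities (numbers are made up):
E = np.array([[40.0, 5.0, 1.0, 0.5, 0.1],
              [2.0, 30.0, 8.0, 0.2, 0.1],
              [0.1, 0.3, 9.0, 20.0, 6.0]])
print(average_overlap(E, 0.90))
```

Sweeping omega from low to high traces out a curve like the one in Figure 15.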
Table 17: Picking the Appropriate Market Definition Test

Antitrust Market Tests:
• SSNIP test (1984 Merger Guidelines); Critical Loss Analysis (Langenfeld and Li, 2001); FERM test (e.g. Ivaldi and Lorincz, 2005). Action: eliminated. Reason: we are not able to estimate the price and cross-price elasticities from this data without making huge modeling assumptions.

Economics Market Tests:
• Price Correlation Comparison (e.g. Nestlé–Perrier merger). Action: eliminated. Reason: not relied on by the courts (see Werden, 1992, pg. 212).
• Natural Experiments on Price Movements (Davis and Garcés, 2009, pg. 185-188). Action: eliminated. Reason: we do not know of enough naturally occurring events to test effects between all clusters.
• Granger Causality Tests (Cartwright et al., 1989; Slade, 1986). Action: eliminated. Reason: the cyclical nature of prices and the fact that the dataset is only six years long.
• Stigler and Sherwin's (1985) Stationary Test. Action: eliminated. Reason: ambiguous results, because there is no clear-cut threshold between what is stationary and what is not.
• Forni's (2004) Stationary Tests. Action: used in this paper. Reason: there is enough data to get clear results.

Note: you use different tests in different situations.

Table 18: Results for Stationarity Tests between Clusters

Province of Massa & Carrara and Province of Lucca:
|    | k1 | k2 |
| k2 | S  | -- |
| k3 | S  | S  |

Province of Livorno:
|    | k4 | k5 | k6 | k7 | k8 |
| k5 | D  | -- | -- | -- | -- |
| k6 | D  | S  | -- | -- | -- |
| k7 | D  | S  | S  | -- | -- |
| k8 | D  | S  | S  | S  | -- |
| k9 | D  | S  | S  | S  | S  |

Provinces of Grosseto and Siena:
|     | k10 | k11 | k12 | k13 |
| k11 | S   | --  | --  | --  |
| k12 | S   | S   | --  | --  |
| k13 | S   | D   | S   | --  |
| k14 | S   | S   | S   | ?   |

Key:
S = tests show that the price ratio is stationary, which suggests that the clusters are in the same market.
D = tests show that the price ratio is non-stationary, which suggests that the clusters are in different markets.
? = not enough data to conclude either way (we fail to reject both tests).
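As an illustration of the stationarity tests behind Table 18, the sketch below applies an augmented Dickey-Fuller test to the log price ratio of two cluster-level price indices. This is a simplified stand-in: Forni's (2004) procedure differs in its details, and the function and variable names are ours.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

def price_ratio_stationary(p_a, p_b, alpha=0.05):
    """Augmented Dickey-Fuller test on the log price ratio of two
    cluster-level price indices (see Appendix D). Rejecting the unit root
    suggests the ratio is stationary, i.e. the clusters are in one market
    ("S"); failing to reject suggests separate markets ("D")."""
    log_ratio = np.log(np.asarray(p_a)) - np.log(np.asarray(p_b))
    stat, pvalue, *_ = adfuller(log_ratio)
    return "S" if pvalue < alpha else "D"

# p_k4, p_k5: monthly, seasonally adjusted price index series for two
# clusters (hypothetical inputs).
# print(price_ratio_stationary(p_k4, p_k5))
```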
Gibbs Sampling Algorithm for LDA

In this section, we define Gibbs LDA. The Gibbs LDA literature has also ignored a few statistical issues; we address those issues here.

B.1 Derivation of the Gibbs Estimator

First, we define Gibbs LDA. Griffiths and Steyvers (2004) make the following simplifying assumption: each φk is drawn from a Y-dimensional Dirichlet distribution with a Y-dimensional parameter of β, where Y is the number of possible products. This assumption does not prevent cluster shares from depending on prices and incomes; it only assumes that the result, φk, is approximated by a Dirichlet distribution.

Griffiths and Steyvers (2004) estimate the model backwards. The model defines the probability of drawing y given k; they estimate the probability of k given a draw of y. The conditional distribution of k given y satisfies

$$p(k \mid y) = \frac{p(k, y)}{p(y)} \propto p(k, y) = p(k)\, p(y \mid k) \qquad (B1)$$

The term p(k) is the probability of a random draw of k from a Dirichlet conjugate multinomial distribution. This distribution is a random draw from a multinomial whose parameter θ is itself a random draw from a Dirichlet distribution. In this case, ki,n is drawn by first drawing consumer segment i's tastes θi ~ Dir(α), and then drawing ki,n ~ Mult(θi).

From Lemma 2 (see Appendix C), the probability that consumer i purchases from cluster k, given the cluster assignments of the consumer's other Ni − 1 transactions, is

$$p(k_{i,N_i} = k \mid k_{i,1}, \ldots, k_{i,N_i - 1}) = \frac{\alpha_k + c_k}{A + N_i - 1} \equiv \hat{\theta}_{i,N_i} \qquad (B2)$$

where ck is the number of the consumer's other purchases assigned to cluster k, and $A = \sum_k \alpha_k$. Note that $\lim_{N_i \to \infty} \hat{\theta}_{i,N_i} = \theta_i$.

Likewise, p(y | k) is the probability of a random draw of y from the Dirichlet conjugate multinomial distribution. From Lemma 2 (see Appendix C), the probability that a draw from cluster k is product y, given the product assignments of the cluster's other Lk − 1 transactions, is

$$p(y_{k,L_k} = y \mid y_{k,1}, \ldots, y_{k,L_k - 1}) = \frac{\beta_y + c_y}{B + L_k - 1} \equiv \hat{\phi}_{k,L_k} \qquad (B3)$$

where cy is the number of the cluster's other purchases that are of product y, and $B = \sum_y \beta_y$. Similarly, note that $\lim_{L_k \to \infty} \hat{\phi}_{k,L_k} = \phi_k$.

Therefore, we have:

$$p(k_{i,N_i} = k \mid k_{i,1}, \ldots, k_{i,N_i - 1}, y) \propto \frac{\alpha_k + c_k}{A + N_i - 1} \cdot \frac{\beta_y + c_y}{B + L_k - 1} \qquad (B4)$$

B.2 Gibbs Sampling Algorithm

The Gibbs sampling algorithm is:
1. Choose α, β, and K.
2. Randomly assign each product purchase, yi,n, to an initial cluster ki,n ∈ {1, …, K}. Call this cluster assignment ki,n(0), and call the collection of all these cluster assignments k(0).
3. For t = 1, …, until convergence, randomly re-assign each product purchase, yi,n, to a new cluster by sampling from (B4) with k(t−1) in place of k. Call these cluster assignments k(t).
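To make the algorithm concrete, here is a minimal collapsed Gibbs sampler in Python that implements step 3 via (B4). It is a sketch of the standard algorithm, not the authors' code; it assumes symmetric scalar priors, so B = Yβ, and the (A + Ni − 1) term drops out because it does not vary with k:

```python
import numpy as np

def gibbs_lda(purchases, K, alpha, beta, Y, n_iter=2000, seed=0):
    """Collapsed Gibbs sampler for LDA (a sketch of the B.2 algorithm).
    purchases[i] lists the products bought by consumer segment i."""
    rng = np.random.default_rng(seed)
    # Count tables: segment-by-cluster, cluster-by-product, cluster totals.
    n_ik = np.zeros((len(purchases), K))
    n_ky = np.zeros((K, Y))
    n_k = np.zeros(K)
    z = [rng.integers(K, size=len(doc)) for doc in purchases]  # step 2
    for i, doc in enumerate(purchases):
        for n, y in enumerate(doc):
            k = z[i][n]
            n_ik[i, k] += 1; n_ky[k, y] += 1; n_k[k] += 1
    for _ in range(n_iter):                                    # step 3
        for i, doc in enumerate(purchases):
            for n, y in enumerate(doc):
                k = z[i][n]                  # remove the current assignment
                n_ik[i, k] -= 1; n_ky[k, y] -= 1; n_k[k] -= 1
                # Equation (B4): (alpha_k + c_k)(beta_y + c_y)/(B + L_k - 1);
                # the (A + N_i - 1) term is constant in k, so it cancels.
                p = (alpha + n_ik[i]) * (beta + n_ky[:, y]) / (Y * beta + n_k)
                k = rng.choice(K, p=p / p.sum())
                z[i][n] = k                  # record the new assignment
                n_ik[i, k] += 1; n_ky[k, y] += 1; n_k[k] += 1
    return z, n_ik, n_ky
```

With the smoothing defaults discussed in B.3 below, a call would look like gibbs_lda(purchases, K=14, alpha=50/14, beta=0.01, Y=n_products).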
B.3 Simplifying the Estimator

Note that (B4) uses αk and βy as smoothing factors. αk assigns the probability of observing cluster k when there are few or no observations in the consumer segment; this effect disappears as Ni approaches infinity. βy assigns the probability of purchasing product y when there are few or no observations assigned to a cluster; this effect disappears as Lk goes to infinity.

In large datasets, it is not practical to solve for α and β, and changing αk and βy has little effect on (B4). Therefore, Steyvers and Griffiths (2007) recommend using α = 50/K and β = 0.01. Note that this does not mean that all clusters are of equal size, and it does not mean that all product purchases are equally likely. It only assumes smoothing, and it has almost no effect on model fit and clustering. Therefore, (B4) simplifies to (2).

B.4 Convergence

Griffiths and Steyvers (2004) do not prove convergence, so we discuss it here. Gibbs sampling of this form does not always statistically converge, as it is not reversible. However, Robert and Casella (2004, pg. 375-376) show that the following move from k(t−1) to k(t) is reversible and thus does converge:
1. Randomly index the product purchases in a new order τ1(t), …, τN(t).
2. Simulate k(t) at position τ1(t) using k = (·, k(t−1) at τ2(t), …, k(t−1) at τN(t)).
3. Simulate k(t) at position τ2(t) using k = (k(t) at τ1(t), ·, k(t−1) at τ3(t), …, k(t−1) at τN(t)).
…continuing until the last position, simulating k(t) at τN(t) using k = (k(t) at τ1(t), …, k(t) at τN−1(t), ·).

In practice, data scientists do not sample Gibbs LDA this way, because it slows computation. Instead, they numerically check for convergence of model fit (see Figure 9). We do not sample this way either, to be consistent with the literature and to enable us to use its programs.

Also, when Gibbs LDA does converge (i.e. when the MCMC algorithm is stationary), it necessarily converges to the true parameters (i.e. the target distribution). Because all conditional densities exist and are positive everywhere, there is a positive probability of passing from any state to any other state. Therefore, by Markov's Theorem, Gibbs LDA is ergodic. Robert and Casella (2004, Theorem 10.6) show that this is sufficient for a stationary distribution to be the target distribution.

B.5 Estimating Standard Errors

In this section, we discuss standard errors. In B.5.1, we describe how to estimate standard errors in an MCMC algorithm with the method of batch means; this is the first application of this method to Gibbs LDA. In B.5.2, we describe a cluster index switching problem that makes most standard errors in LDA impossible to estimate, and we explain why this problem does not apply to standard errors of model fit. This paper is the first to estimate standard errors of the model fit of Gibbs LDA.

B.5.1 Batch Means Standard Errors

In this section, we explain how to estimate standard errors using the method of batch means. Suppose X is a random variable (or a vector of random variables) in an MCMC, and let Xt be the tth iterative random draw of X. Suppose we want to estimate μ = E[g(X)]. The idea is to run the Markov chain for N = ab iterations, where a is the number of batches and b is the number of iterations in a batch. Then the kth batch mean is

$$\bar{Y}_k \equiv \frac{1}{b} \sum_{t=(k-1)b+1}^{kb} g(X_t) \qquad (B5)$$

Assuming that b is large enough to ensure independence of the batch means, the batch means estimate of the mean is $\hat{\mu} \equiv \frac{1}{a} \sum_{k=1}^{a} \bar{Y}_k$, and the batch means estimate of the variance is

$$\hat{\sigma}^2 \equiv \frac{b}{a-1} \sum_{k=1}^{a} \left(\bar{Y}_k - \hat{\mu}\right)^2 \qquad (B6)$$

The batch means estimate of μ is then $\hat{\mu}$, with a standard error of $\hat{\sigma}/\sqrt{N}$. See Jones et al. (2006) for details.

Jones et al. (2006) recommend using $b = \lfloor \sqrt{N} \rfloor$. Glynn and Iglehart (1990) and Glynn and Whitt (1991) show that a fixed b does not consistently estimate the standard error. Jones et al. (2006) show that $b = \lfloor N^{\varphi} \rfloor$, where φ ∈ (0, 1), is consistent. They then show that $b = \lfloor N^{1/2} \rfloor$ performs better than $b = \lfloor N^{1/3} \rfloor$ in several realistic examples.

In practice, it is customary to discard the first group of samples before the first batch. According to statistical theory, a discard period is unnecessary, because we do not rely on stationarity. In practice, a discard period is useful: it eliminates transient bias from a random initial X0. Statisticians debate the best size of the discard period. Angelino et al. (2016) recommend discarding the first half of the results. Geyer (1992) argues this is unnecessary: you should not throw away more iterations than it takes for the autocovariance to become negligible, and he claims that a discard period of 1% or 2% usually suffices. Gilks et al. (1995, pg. 14) claim that it depends on the rate of convergence: you need to look at global convergence, or run independent parallel chains to test the speed of convergence.

In this paper, we discard the first 64 iterations and run 44 batches of 44 iterations. This is consistent with the observed rate of convergence, and it gives us 2000 iterations, which is consistent with the LDA literature.
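A minimal implementation of (B5)-(B6), assuming one series of per-iteration model fit values (for example, average log-likelihoods) from a single Gibbs LDA run; the function and variable names are ours:

```python
import numpy as np

def batch_means(samples, discard=64):
    """Batch means estimate of E[g(X)] and its standard error from one
    MCMC run, following (B5)-(B6). `samples` holds g(X_t) for each t."""
    x = np.asarray(samples, dtype=float)[discard:]
    n = len(x)
    b = int(np.sqrt(n))          # batch length, b = floor(sqrt(N))
    a = n // b                   # number of batches
    batches = x[:a * b].reshape(a, b).mean(axis=1)   # (B5)
    mu = batches.mean()
    var = b / (a - 1) * ((batches - mu) ** 2).sum()  # (B6)
    return mu, np.sqrt(var / (a * b))   # standard error = sigma / sqrt(N)

# Example: loglik_per_iteration from a 2000-iteration Gibbs LDA run.
# mu, se = batch_means(loglik_per_iteration, discard=64)
```

With the paper's settings, discarding 64 iterations of a 2000-iteration run leaves N = 1936 = 44 × 44 draws, i.e. 44 batches of 44 iterations.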
B.5.2 Cluster Index Switching Problem

Unfortunately, a cluster index switching problem makes estimating most standard errors impossible. The same cluster can associate with different index numbers in two different samples (Griffiths and Steyvers, 2004, pg. 5230). For example, Gibbs LDA could switch the index numbers of the "north" and "south" clusters: in one iteration, the first cluster could be "north" and the second cluster "south", while in another iteration, the first cluster could be "south" and the second cluster "north". As a result, we cannot estimate the standard errors of Φ and θ; these matrices depend on the index numbers of the clusters. That said, we can estimate the standard errors of model fit statistics, because model fit does not depend on the index number assignments of the clusters. This paper is the first to estimate standard errors of the model fit of Gibbs LDA.

Math Appendix

The Gibbs LDA literature treats these two essential lemmas haphazardly. The best reference that we could find is Mimno (2016), which is incomplete. Therefore, we include complete proofs here.

Lemma 1 (Mimno, 2016). If a K-dimensional vector θ is drawn from a Dirichlet distribution with a positive K-dimensional parameter α, and if a sample x1, …, xN is drawn i.i.d. from a multinomial distribution with parameter θ, then

$$p(x \mid \alpha) = \frac{\Gamma(A)}{\Gamma(A+N)} \prod_{k} \frac{\Gamma(\alpha_k + c_k)}{\Gamma(\alpha_k)}$$

where ck is the count of the number of xn = k and $A = \sum_k \alpha_k$.

Proof. From (1):

$$p(x, \theta \mid \alpha) = p(x \mid \theta)\, p(\theta \mid \alpha) = \prod_{k=1}^{K} \theta_k^{c_k} \cdot \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1} = \frac{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}{\prod_{k=1}^{K} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{c_k + \alpha_k - 1}$$

This works because the Dirichlet and the multinomial are a conjugate pair. Because any probability density must integrate to one over all possible values,

$$\int \frac{\Gamma\!\left(\sum_{k} \alpha_k\right)}{\prod_{k} \Gamma(\alpha_k)} \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}\, d\theta = 1 \quad\Longrightarrow\quad \int \prod_{k=1}^{K} \theta_k^{\alpha_k - 1}\, d\theta = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k)}{\Gamma\!\left(\sum_{k=1}^{K} \alpha_k\right)}$$

and, applying the same identity with αk + ck in place of αk,

$$\int \prod_{k=1}^{K} \theta_k^{c_k + \alpha_k - 1}\, d\theta = \frac{\prod_{k=1}^{K} \Gamma(\alpha_k + c_k)}{\Gamma(A + N)}$$

Therefore:

$$p(x \mid \alpha) = \int p(x, \theta \mid \alpha)\, d\theta = \frac{\Gamma\!\left(\sum_{k} \alpha_k\right)}{\prod_{k} \Gamma(\alpha_k)} \cdot \frac{\prod_{k} \Gamma(\alpha_k + c_k)}{\Gamma(A + N)} = \frac{\Gamma(A)}{\Gamma(A+N)} \prod_{k=1}^{K} \frac{\Gamma(\alpha_k + c_k)}{\Gamma(\alpha_k)} \;\blacksquare$$

Lemma 2 (Mimno, 2016). If a K-dimensional vector θ is drawn from a Dirichlet distribution with a positive K-dimensional parameter α, and if a sample x1, …, xN is drawn i.i.d. from a multinomial distribution with parameter θ, then

$$p(x_{N+1} = k \mid x_1, \ldots, x_N) = \frac{\alpha_k + c_k}{A + N}$$

where ck is the count of the number of xn = k and $A = \sum_k \alpha_k$.

Proof. By Lemma 1 and because Γ(x + 1) = xΓ(x):

$$p(x_{N+1} = k' \mid x_1, \ldots, x_N) = \frac{p(x_1, \ldots, x_{N+1})}{p(x_1, \ldots, x_N)} = \frac{\dfrac{\Gamma(A)}{\Gamma(A+N+1)} \left(\alpha_{k'} + c_{k'}\right) \prod_{k} \dfrac{\Gamma(\alpha_k + c_k)}{\Gamma(\alpha_k)}}{\dfrac{\Gamma(A)}{\Gamma(A+N)} \prod_{k} \dfrac{\Gamma(\alpha_k + c_k)}{\Gamma(\alpha_k)}} = \frac{\alpha_{k'} + c_{k'}}{A + N} \;\blacksquare$$

Construction of Price Indices

In this section, we create price indices from the Coop data. First, we create store-level price indices; Hausman (1996) creates similar price indices. Then, we create cluster-level price indices from the store-level price indices. And, we remove seasonality.

First, we map the products to the United Nations' revised Classification of Individual Consumption According to Purpose (COICOP). The UN developed the COICOP to classify individual consumption expenditures according to their purpose. COICOP has four levels: divisions, groups, classes, and subclasses. The most general level is the division; there are fifteen divisions. Each division divides into groups; there are 63 groups. Each group divides into classes; there are 186 classes. Each class divides into subclasses; there are 338 subclasses. See U.N. (2018) for more detail.

Then, we create class-level price indices for every store. A class-level price index is the class's total expenditures (in EUR) divided by its total number of units.

Then, we create a market bundle. This bundle is the total number of units of every class sold in Tuscany from Sept. - Nov. 2015. We drop classes with too few observations; these classes account for 5.94% of expenditures from Sept. - Nov. 2015.

Then, we create aggregate store-level price indices. Each index is how much it would cost to buy the market bundle at the store's class-level price indices. We drop one store that closed before Sept. 2015.

Then, we create cluster-level price indices. Cluster k's index is the weighted average of the store-level indices, weighting by our estimate of φk.

Then, we seasonally adjust each price index. We use the U.S. Census Bureau's X-13 ARIMA program, with the x11 method and log-additive seasonal adjustments.
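The steps above can be sketched with pandas as follows. This is our illustration, not the paper's code: the file name and column names are hypothetical, and the sketch ignores the dropping of sparse classes and the seasonal adjustment.

```python
import pandas as pd

# df: one row per transaction with columns store, coicop_class, eur, units.
df = pd.read_csv("transactions.csv")  # hypothetical file

# Class-level price index per store: expenditures divided by units.
class_prices = (df.groupby(["store", "coicop_class"])
                  .agg(eur=("eur", "sum"), units=("units", "sum")))
class_prices["price"] = class_prices["eur"] / class_prices["units"]

# Market bundle: total units of every class sold in the region.
bundle = df.groupby("coicop_class")["units"].sum()

# Store-level index: cost of buying the market bundle at the store's
# class-level prices.
store_index = (class_prices["price"]
               .unstack("coicop_class")
               .mul(bundle, axis=1)
               .sum(axis=1))

# Cluster-level index: weighted average of store indices, weighted by
# the estimated phi_k (cluster k's expenditure shares across stores).
# phi_k = pd.Series({...})  # store -> weight, summing to one
# cluster_index = (store_index * phi_k).sum()
```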
Web Appendix (make available online)

This section will be available online.

E.1 All Maps of Gibbs LDA Clusters and their Profiles

k1: Northern Lucca [0.42 mil. EUR expenditures]
Consumer Concentration: second least concentrated; 71.8% of expenditures are from consumers residing in four municipalities (Castelnuovo di Garfagnana, Castiglione di Garfagnana, Pieve Fosciana, and Villa Collemandina).
Store Concentration: second most concentrated; 98.6% of revenues from one store.

k2: Southern Massa & Carrara / Western Lucca [14.89 mil. EUR expenditures]
Consumer Concentration: least concentrated; 88.2% of expenditures are from consumers residing in five municipalities (Carrara, Massa, Camaiore, Pietrasanta, and Viareggio).
Store Concentration: least concentrated; 97.7% of revenues from four stores.

k3: Bagni di Lucca (Eastern Lucca) [0.35 mil. EUR expenditures]
Consumer Concentration: fourth most concentrated; 78.4% of expenditures are from consumers residing in two municipalities (Bagni di Lucca and Borgo a Mozzano).
Store Concentration: most concentrated; 99.8% of revenues from one store.

k4: City of Livorno (N. Prov. of Livorno) [38.59 mil. EUR expenditures]
Consumer Concentration: most concentrated; 93.9% of expenditures are from consumers residing in one municipality (Livorno).
Store Concentration: moderately concentrated; 92.1% of revenues from three stores.

k5: North-Central Livorno [4.28 mil. EUR expenditures]
Consumer Concentration: moderately concentrated; 86.3% of expenditures are from consumers residing in two municipalities (San Vincenzo and Castagneto Carducci).
Store Concentration: fifth least concentrated; 92.6% of revenues from two stores.

k6: Cecina (South-Central Livorno) [8.11 mil. EUR expenditures]
Consumer Concentration: moderately concentrated; 76% of expenditures are from consumers residing in two municipalities (Rosignano Marittimo and Cecina).
Store Concentration: third most concentrated; 98.4% of revenues from one store.

k7: Piombino (Southern Livorno) [11.84 mil. EUR expenditures]
Consumer Concentration: fifth most concentrated; 94.4% of expenditures are from consumers residing in two municipalities (Campiglia Marittima and Piombino).
Store Concentration: second least concentrated; 95.6% of revenues from four stores.

k8: Island of Elba [6.18 mil. EUR expenditures]
Consumer Concentration: fourth least concentrated; 89.4% of expenditures are from consumers residing in four municipalities (Campo nell'Elba, Capoliveri, Porto Azzurro, and Portoferraio).
Store Concentration: moderately concentrated; 91.6% of revenues from four stores.

k9: etc. Livorno [7.88 mil. EUR expenditures]
Consumer Concentration: third most concentrated; 85.2% of expenditures are from consumers residing in two municipalities (Rosignano Marittimo and Castiglione della Pescaia).
Store Concentration: moderately concentrated; 96.9% of revenues from three stores.

k10: Northern Grosseto [3.48 mil. EUR expenditures]
Consumer Concentration: moderately concentrated; 89% of expenditures are from consumers residing in two municipalities (Massa Marittima and Roccastrada).
Store Concentration: third least concentrated; 96.2% of revenues from four stores.

k11: Follonica (North-Western Grosseto) [11.11 mil. EUR expenditures]
Consumer Concentration: fifth least concentrated; 88.1% of expenditures are from consumers residing in four municipalities (Follonica, Gavorrano, Massa Marittima, and Scarlino).
Store Concentration: fourth most concentrated; 91.3% of revenues from two stores.

k12: City of Grosseto (Central Grosseto) [7.08 mil. EUR expenditures]
Consumer Concentration: second most concentrated; 92.2% of expenditures are from consumers residing in one municipality (Grosseto).
Store Concentration: fifth most concentrated; 90% of revenues from two stores.
k13: Southern Grosseto [4.91 mil. EUR expenditures]
Consumer Concentration: moderately concentrated; 88.2% of expenditures are from consumers residing in two municipalities (Monte Argentario and Orbetello).
Store Concentration: fourth least concentrated; 88.4% of revenues from three stores.

k14: Western Siena [0.51 mil. EUR expenditures]
Consumer Concentration: third least concentrated; 75.1% of expenditures are from consumers residing in three municipalities (Monticiano, Civitella Paganico, and Grosseto).
Store Concentration: moderately concentrated; 97% of revenues from two stores.

Note: municipality boundaries data powered by MapIt (). We shade each municipality by expenditure. Dark green means that residents of the municipality spent a lot; light green means that they spent a little; yellow means that we do not observe any data from that municipality. Red stars are stores with at least 20% of the cluster's expenditures. Purple triangles are stores with between 5% and 20% of the cluster's expenditures. Blue dots are stores with less than 5% of the cluster's expenditures.

E.2 All Maps of E-H Clusters

k1: Northern Lucca (1 similar E-H cluster)
k2: Southern Massa & Carrara / Western Lucca (2 similar E-H clusters)
k3: Bagni di Lucca (Eastern Lucca) (1 similar E-H cluster)
k4: City of Livorno (N. Prov. of Livorno) (2 similar E-H clusters)
k5: North-Central Livorno (1 similar E-H cluster)
k6: Cecina (South-Central Livorno) (1 similar E-H cluster)
k7: Piombino (Southern Livorno) (3 similar E-H clusters)
k8: Island of Elba (1 similar E-H cluster)
k9: etc. Livorno (1 similar E-H cluster)
k10: Northern Grosseto (2 similar E-H clusters)
k11: Follonica (North-Western Grosseto) (2 similar E-H clusters)
k12: City of Grosseto (Central Grosseto) (2 similar E-H clusters)
k13: Southern Grosseto (1 similar E-H cluster)
k14: Western Siena (1 similar E-H cluster)

Note: municipality boundaries data powered by MapIt (). These maps are the same as the maps in E.1 with the E-H clusters superimposed. The red dashed and dotted lines are the boundaries of different E-H clusters.