Growth and popularity in markets for free digital products

Gil Appel
Marshall School of Business
University of Southern California
gappel@marshall.usc.edu

Barak Libai
Arison School of Business
Interdisciplinary Center (IDC), Herzliya
libai@idc.ac.il

Eitan Muller
Stern School of Business, New York University
Arison School of Business, Interdisciplinary Center (IDC), Herzliya
emuller@stern.nyu.edu

June 2016

The authors would like to thank Gal Elidan, Zvi Gilula, Jacob Goldenberg, Hema Yoganarasimhan, Scott Neslin and Oded Netzer for their advice and helpful comments during the research process.

Abstract

Free digital products (FDPs) dominate online markets, yet our knowledge and theories about their growth are based mainly on conventional goods. We demonstrate how FDPs’ growth dynamics differ from those observed for conventional new products, using a large-scale dataset that documents the growth of close to 60,000 FDPs, and supported by an additional growth analysis of thousands of mobile apps. We find that FDPs display three distinct patterns of growth: bell-shaped pattern (“Diffuse”); exponential-type decline (“Slide”); and a combination of the two (“Slide and Diffuse”). We further show a robust relationship between FDP popularity and growth pattern ubiquity, providing the first evidence of a correlation between products’ popularity and growth patterns. We further show how FDP-related growth phenomena help to explain the patterns that emerge, and elucidate the need to adapt our knowledge on new product growth and its modeling to the fast-moving world of free digital products.

Keywords: diffusion of innovations; free products; mobile applications; product life cycle; social influence; software

1. Introduction

An intriguing development in the consumer market landscape is the substantial increase in the number of digital products available for free (Anderson 2009). 
Free digital products (FDPs) have been available for a while as computer software supplied via online platforms, joined recently by similar FDPs for smartphones and web applications. Some of this availability stems from the “freemium” business model, under which a certain percentage of adopters will eventually upgrade to a less restricted version or purchase in-app byproducts (Kumar 2014). Yet the increase in FDPs also follows other developments such as the rise of open-source software collaboration projects, where many users join forces to produce software products that will be free except for technical support (Mallapragada et al. 2012). Recent reports highlight the ubiquity of the phenomenon: More than 90% of recently downloaded smartphone applications were free, with this percentage expected to continue rising in the foreseeable future (Olson 2013; AppBrain 2016). In established markets (e.g., task management tools and anti-virus programs), a fierce battle is being waged between “freemium” and “premium” business models (Dunn 2011; Woods 2013).

The question we put forth is whether the new product growth generalizations and insights developed over the years in markets for conventional products also apply to FDPs. In particular, we focus on the essential shape of growth. Previous research has maintained that growth in digital environments in general (Rangaswamy and Gupta 2000) and of FDPs in particular (Jiang and Sarkar 2010; Lee and Tan 2013) follows the commonly observed S-shaped diffusion patterns, with bell-shaped non-cumulative growth, and can thus be analyzed using traditional diffusion models. However, when we began to examine the growth patterns of tens of thousands of FDPs (to be described later), the picture that emerged was different: While a bell-shaped pattern implies growing demand early on, we find that in the FDPs examined, the correlation of month-to-month growth in the first year was positive for only about 40% of the products. 
What further stood out was that the extent of the phenomenon was highly correlated with the level of product popularity: The percentage of positive correlations decreased monotonically with the popularity level, down to 26% for the bottom 10% of products by popularity. This is not the conventional pattern of growth we read about in the new product textbooks.

What can drive this phenomenon? Note that conventional products are associated with significant R&D costs, as well as costs of manufacturing, marketing, and maintaining market presence. Therefore, firms will invest in screening and testing of the products before market launch, and will be motivated, internally or due to channel pressure, to take a product off the shelf if it seems to fail. The case of FDPs differs, in particular due to the low barriers to development and introduction of many digital products into the market, which may be reflected in two ways: First, the cost of adapting the product and offering it to small, specific niches is low, which leads to a “long tail” of supply (Brynjolfsson et al. 2010). If past research emphasized the ability of digital channels to enable a long tail of physical goods (Brynjolfsson et al. 2011), then the fact that the goods themselves are digital enables even better tailoring to small niches. Second, given the lack of barriers, there is an increased presence of small and less experienced suppliers with low resources to invest in marketing, so that we can expect to find many products whose low popularity stems from inability to reach larger audiences even if targeted otherwise. 
Indeed, it is reported that a large share of FDPs are considered failures, and eventually do not even cover development costs (Foresman 2012; Rubin 2013).

Overall, whether the niche market was intended at the outset or not, consumers should face a large share of low-popularity products when considering supply in the FDP market, at least in the absolute number of offerings. Empirical data suggest that this is the case for markets such as free PC software (Zhou and Duan 2012) and mobile applications (Zhong and Michahelles 2012). In early 2016, for example, among the 1.87 million free Android apps available, more than 60% had fewer than 1,000 downloads, and only about 1% were downloaded more than 1 million times (AppBrain 2016).

This phenomenon raises an interesting question regarding the prevalent growth pattern of FDPs. Our knowledge on the diffusion of new products has been largely shaped in markets such as durables, pharmaceuticals, and services looking typically at highly popular cases of growth (Peres et al. 2010). In fact, one of the essential concerns with the understanding of innovation diffusion is that nearly all knowledge comes from successful innovations (Greve 2011; Rogers 2003). This lack of evidence on the growth pattern for what may be the majority of the FDP market is an issue of significant managerial and theoretical importance. The shape of the growth curve is considered “the most important and most widely reported finding about new product diffusion” (Chandrasekaran and Tellis 2007). Studying growth patterns is a fundamental stepping stone to the understanding of markets for new products: It is used to understand the driving forces of new products’ success; as a base for modeling and optimizing firm behavior in the context of new product introductions; for decisions of termination or further support for new products; and for segmentation by adoption times (Golder and Tellis 1997; Peres et al. 
2010).

Here we study the full spectrum of growth patterns in FDPs, providing comprehensive evidence for a fundamental difference between the growth of highly studied superstars, and the growth of the less popular majority. The ability to track information in the case of FDP markets provides an opportunity to conduct a large-scale analysis in a way seldom available to past new product growth researchers, and to overcome the problem of a left-truncation bias due to a lack of data on the product’s early days (Jiang et al. 2006). We use data on the monthly level of downloads from launch-day of a large number of software products in multiple categories, with downloads per product ranging from a few hundred to millions, making this one of the largest new product diffusion studies to date. Our main data source is the SourceForge database, which enables us to study the growth of almost 60,000 free software products. We are able to complement this analysis by also looking at data on the growth of close to 7,000 mobile apps, which shows consistent results. The main insights can be summarized as follows:

- Three pattern archetypes dominate the growth of FDPs in our datasets: a bell-shaped curve (largely left skewed) that we label diffuse, an exponential-like decline starting at launch labeled slide, and a combination of the first two, labeled slide & diffuse. Diffuse patterns represent about half of the cases in our database.
- The dynamics that lead a product to the “underdog” part of the long tail differ from the pattern that leads a product to become a superstar, as the ubiquity of the three archetypes is strongly related to the popularity of the products. Bell shapes are dominant in popular products, yet become a minority in small niche products. 
The fact that the very popular products are almost exclusively bell shaped may help to explain how previous research, which has been based on popular products, missed this relationship.
- Two phenomena that characterize FDP markets help explain the shape of growth: The first is the inception effect, representing disproportional early-onset external effects, which explains the slide phenomenon in the presence of social influence. The second is the recency effect, which implies that in free digital markets, recent adoptions (and not only cumulative adoptions as traditionally used in diffusion models) help explain the dynamic effect of social influence on growth.
- Recency is particularly important in helping to differentiate between popular and less popular products. The association of recency and growth is more than twice as strong among the top 10% of products by popularity as among the bottom 10%. We further find evidence that recency level in a category is associated with the shape of the popularity curve, so that higher average recency level in the category is associated with higher inequality, captured by the Gini coefficient.

These findings are significant to our understanding of FDP growth, and to attempts to model and optimize growth in such markets. In a broader theoretical sense, these findings imply that generalizations that developed along the product life cycle, its turning points, and its drivers (Golder and Tellis 2004) may need re-examining in the rapidly growing, dynamic world of free digital products.

2. Background

2.1. Related literature

Our study relates to a number of research avenues:

Markets for free digital products: Research on FDPs has examined issues such as optimal initial spread of freeware as part of profit maximization in the longer run (Cheng and Liu 2012; Niculescu and Wu 2014), free-riding and competitive dynamics (Haruvy and Prasad 2005), and the impact of the creation process on success (Grewal et al. 2006). 
Other research has focused on the effect on demand of bestseller ranking and consumer ranking (Carare 2012; Lee and Tan 2013), as well as other factors such as price discounts on in-app purchases (Ghose and Han 2014). We add to this growing literature by providing the first large-scale analysis of the growth patterns of FDPs, which is significant in particular given the assumption that FDPs grow and should be modeled in a manner similar to other products typically described by the Bass diffusion model (Jiang and Sarkar 2010; Yogev 2012; Lee and Tan 2013).

The long tail: From another angle, this work is also related to efforts to understand the nature and significance of supply and demand inequality in electronic commerce, often considered in the context of a “long tail”. Previous literature in that area has focused on the factors that affect the pattern of sales, and in particular whether it leads to higher shares of sales among low-selling niche products, or alternatively among high-selling “superstars”. This literature looks at both supply-side factors, such as broader product variety, distribution channel dynamics, and lower stocking costs, and demand-side factors, such as reduced search costs (Elberse and Oberholzer-Gee 2007; Brynjolfsson et al. 2009, 2011; Hinz et al. 2011; Kumar et al. 2014). Considerable attention has been given to the inter-customer effect, in the form of recommendations and reviews, in the creation of the long tail, and also in providing more “thrust” to superstars (Fleder and Hosanagar 2009; Oestreicher-Singer and Sundararajan 2012; Hervas-Drane 2015; Zhu and Zhang 2010).

We add to this literature an exploration of the dynamics at the individual product level along the curve. 
If previous approaches have generally accepted the existence of “underdog products” and “superstar products”, we ask how a product gets to become one or the other.

Patterns of innovation growth: In a more general sense, our effort is related to the ongoing efforts to study the pattern of new product growth, which spans numerous disciplines (Rogers 2003). The fact that the adoption rate of successful innovations follows a bell-shaped or logistic-type curve, and a cumulative S-shaped curve, is considered one of the fundamental discoveries of social science, and was largely attributed to the dynamic role of social influence among customers in various forms (Young 2009; Peres et al. 2010). While there is evidence of some exceptions to the S-shaped curve with a cumulative r-shaped (non-cumulative exponential decline) pattern for entertainment goods such as movies and for supermarket goods (Gatignon and Robertson 1985; Sawhney and Eliashberg 1996), the perception across disciplines is that “the S-curves are everywhere” (Bejan and Lorente 2012). Indeed, these patterns form the bases of diffusion-of-innovations theory and forecasting new product growth using consistent growth shapes, such as the Bass model, Gompertz, or logistic curves (Meade and Islam 2006).

We add to this literature in two ways. First, we highlight FDPs as an additional, yet separate category that is not necessarily dominated by S-shaped curves, and show how growth characteristics of FDPs can explain the various shapes. Second, in a more general sense, we provide initial evidence for the relationship between product popularity and the shape of growth, an unexplored issue in a research stream that has focused on highly popular products.

2.2. Modeling FDP growth

Since our aim is to examine growth along the FDP popularity curve, we will need to model the growth of an individual free digital product. 
Two fundamental effects that lead to the commonly observed S-shaped curve are considered when modeling the growth of new products (Mahajan et al. 1990): The internal influence captures the impact of previous adopters via word of mouth, imitation, and network externalities, typically considered a function of the number of cumulative adopters to date. The external influence captures influences outside of the group of previous adopters, such as advertising and mass media. We argue that an adaptation is needed in both types of influence to capture growth in FDP markets, as follows:

Internal influence and recency effect. While diffusion modelers have largely used the number of cumulative adopters as a sole indicator of internal influence, some recent work points to a possible need to separate the effect of recent adopters from that of the cumulative number of adopters, attributed to the difference in intensity of word of mouth in the two groups (Hill et al. 2006; Iyengar et al. 2011). It has been suggested, for example, that recent adopters may be more contagious than consumers who adopted less recently, as the former are more enthused and/or credible (Risselada et al. 2014).

We contend that models of FDP growth in particular should allow for this distinction. First, it is often reported that for many FDP users, usage and engagement center on the time right after adoption (Danova 2015). Second, it is well accepted that adopters of FDPs (and other digital goods) rely heavily on popularity ranking information as it appears in social media, app stores, and download sites (Carare 2012; Garg and Telang 2013; Ghose and Han 2014; Lee and Raghu 2014). Yet, as is clearly observable, rankings do not necessarily reflect cumulative downloads, but rather reflect past period popularity (Neitz 2015). This means that the recent number of downloads, and not only cumulative ones, may play a pivotal role in FDP download decision making. 
In fact, popularity rankings may also affect users who do not consider this information explicitly, but rely on search. For example, it is reported that search results of engines belonging to Google and Apple also largely depend on recent popularity ranking when displaying results (Walz 2015).

External influence and the inception effect. External influence is traditionally a parameter that captures the marketing mix in the industry, in particular that of advertising (Mahajan et al. 1990). In the absence of large-scale advertising support for many FDPs, and given the dominance of social media, much of the external influence comes from social media articles and experts’ recommendations and ratings. However, attention to new products may be short lived: Given the large number of launched products, the attention given to a new product centers on the beginning of its life cycle. In fact, even when considering firms that do invest in advertising to promote FDPs, there is a strong motivation to focus on the early period of growth. It is argued that FDP producers have a short window of time in which to generate the groundswell that can lead to attention by sources such as the charts in the app stores, and thus they must act early on (Rice 2013; Kimura 2014). Consequently, those FDP developers who invest in marketing may often do so in “burst campaigns” that are meant to get them on consumers’ radar early in the game (ADA 2014; Klein 2014). Overall, we can expect that for FDPs, external influence will be particularly strong early on in the new product’s life, a phenomenon that we label the inception effect. This effect can be reflected in decay in the external influence parameter’s value over time.

Following these, we will use an FDP growth model that takes into consideration the inception and recency effects. We begin with the fundamental Bass product growth model, which is widely used to model the growth of new products. 
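For readers who want to experiment with these ideas, the following is a minimal Python sketch of a discrete-time Bass-type recursion, in which per-period adoptions equal (external influence + internal influence) times the remaining market potential. With `r = 0` and `delta = 0` it is the standard Bass model with external coefficient `p`, internal coefficient `q`, and market potential `N`. The optional decay rate `delta` and recency weight `r` illustrate the inception and recency ideas discussed above, but their functional forms here (exponential decay of external influence, and a recency term proportional to last period's adoptions) are assumptions made for illustration, and the function names are hypothetical.

```python
import math


def simulate_adoptions(p, q, r=0.0, delta=0.0, N=1.0, periods=60):
    """Per-period adoptions from a discrete-time Bass-type recursion.

    With r = 0 and delta = 0 this is the standard Bass model:
        adoptions_t = (p + q * X_t / N) * (N - X_t)
    ASSUMPTION (illustrative only): external influence decays as
    p * exp(-delta * t), and recency enters as r times the previous
    period's adoptions divided by N.
    """
    cumulative = 0.0  # X_t: cumulative adoptions up to period t
    last = 0.0        # adoptions in the previous period
    series = []
    for t in range(periods):
        external = p * math.exp(-delta * t)
        internal = q * cumulative / N
        recency = r * last / N
        remaining = N - cumulative
        # Clamp so adoptions never exceed the remaining potential.
        new = min(max((external + internal + recency) * remaining, 0.0), remaining)
        series.append(new)
        cumulative += new
        last = new
    return series


def peak_period(series):
    """Index of the period with the most adoptions (first one on ties)."""
    return max(range(len(series)), key=lambda t: series[t])


if __name__ == "__main__":
    bell = simulate_adoptions(p=0.01, q=0.4)   # internal influence dominates
    slide = simulate_adoptions(p=0.3, q=0.1)   # external influence dominates
    print("peak period when q > p:", peak_period(bell))
    print("peak period when p > q:", peak_period(slide))
```

In the plain Bass case, adoptions rise and then fall when `q` is well above `p`, and decline from launch when `p` exceeds `q`. The parameter values in the usage example are arbitrary and are not estimates from the data.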
Under this approach, expected adoptions at time period t (between t and t+1) are assumed to be reflected in the following equation, where N is the market potential, X_t is the cumulative number of adoptions up to time t, p is the force of external influence, and q is that of internal influence:

$$X_{t+1} - X_t = \left(p + q\,\frac{X_t}{N}\right)\left(N - X_t\right) \qquad (1)$$

To capture the inception effect, we let the external parameter be a varying function of time with an initial external influence parameter (p), using an external decay parameter (δ), to capture the decay in marketing effect over time. If the external decay parameter is positive, external influence intensity decays with time. To capture the recency effect, we separate the internal influence into two sub-parameters: As in the classic diffusion approach, parameter q captures the effect of cumulative adoptions. The recency parameter r captures the effect of recent adoptions, so that we multiply the relative change in the past period by r. We can now write the model as follows:

(2)

3. Growth Patterns at SourceForge

3.1. Dataset

Our primary source of data is SourceForge, a large, open-source software (OSS) repository that empowers software developers to control and manage open-source software, and enables users to download these products for free (Madey 2013). As of June 2013, when we scraped the data, SourceForge offered about 400,000 registered projects, with 3.4 million registered developers and 4 million downloads a day. As such, it is among the largest download sites, and home to some well-known consumer software products such as VLC media player, eMule, and 7-Zip. In fact, many users may not be aware that products they download from various software download sites are actually hosted by SourceForge.

Scraping SourceForge, we retrieved the monthly history of downloads for a large number of products. The number of downloads is largely used to assess the success of open-source products (Grewal et al. 2006; Daniel et al. 
2013) and in a broader sense acts as a proxy for the success of free products (Chandrashekaran et al. 1999). While SourceForge contains a large number of products (close to 400,000), many of them are inactive and had zero downloads, and thus are not relevant to our analysis. We focused on the download patterns for the 59,343 products that met the following criteria:

- Data from five years of growth. We looked at a 60-month window for all products. Naturally, the life cycles of FDPs are considerably shorter than the typically analyzed growth of durables (although for some products, the cycle may be longer). Thus, to reduce cases of right censoring and to use a consistent time frame, we considered only products launched before mid-2008. Nonetheless, our analysis suggests that we covered the majority of downloads for the various products.
- At least 200 downloads within the five-year window. This criterion enabled us to capture actual growth processes that are not affected much by possible developers’ noise over the product life cycle.

The distribution of downloads in our data points to a large variance in downloads among the products (see Figure 1). In our dataset, 41% of products had fewer than 1,000 downloads, while about 0.6% (329 products) had more than one million downloads. The Gini coefficient is 0.96, which indicates a high concentration, larger than those reported for markets such as videos and books (Oestreicher-Singer and Sundararajan 2012). As our focus of interest is the shape of the growth, and in order to be able to compare between patterns, we scale each pattern to a (0,1) scale by dividing each observation by the total sum of downloads. We further elaborate on this scaling in Section 4.2.

Figure 1: Distribution of popularity in SourceForge

3.2. Patterns and estimation of data and model

We identify patterns of growth in two stages. We first use the FDP model presented in Section 2 to smooth the data, which is particularly essential given our use of monthly data, which is much noisier than the classical annual diffusion data. Our analysis shows that not only does using the FDP model have the advantage of being theoretically driven, but it also creates a better smoothing algorithm than do alternatives such as HP filters. See Appendix A for a discussion.

For the estimation we use general nonlinear optimization. Since we estimate scaled data (with a sum of 1), we use the augmented Lagrange multiplier method to ensure that our estimations always sum to one. To examine our estimations’ fit, we consider two fit measures: The first is R², with an average value of 47.7% (as we use nonlinear estimation, an adjusted R² coefficient could not be calculated). Recall, however, that large-scale monthly data can be very noisy. We do find a positive correlation between the R² values and the log of each product’s downloads (ρ = 0.35), suggesting that when downloads are few and the data tends to be noisier (as is the case with much of our data), the R² levels may be lower. However, the R² measure can be biased toward capturing the peaks (where the variance can be larger) compared to the entire curve.

As an alternative to the R² measure, we also use the Kullback-Leibler divergence (KL divergence) to measure the difference between our estimations and the data (Dzyabura and Hauser 2011; Gilula and McCulloch 2013). KL divergence weighs each observation and thus is less sensitive to the absolute value of the difference, weighing the relative difference instead. The average KL divergence is 0.2 (with a standard deviation of 0.22, median of 0.14, and mode of 0.05), and most of the divergence values are close to zero.

Classifying the patterns and matching parameters to patterns. 
In a second stage, we determine the patterns that emerge. Consistent with past efforts to identify patterns and turning points in diffusion data, we use a peaks-and-valleys algorithm for the classification (Goldenberg et al. 2002; Golder and Tellis 2004; Chandrasekaran and Tellis 2011) using these rules:

- We count the number of peaks and troughs in the data.
- We require peaks to be substantial (10% over the start period or the previous valley) as well as troughs (a drop of 10% or more since the start period), or else they are ignored (Goldenberg et al. 2002).

Using a difference algorithm, we find that there were at most two peaks and one trough in each pattern, which leads us to three general patterns:

- If a pattern climbs from the start toward a peak, then it is a pattern that we label Diffuse, as it is consistent with what we may expect given diffusion theory.
- If a pattern begins with a drop in adoptions with no peaks, it is labeled a Slide pattern [named after playground slides].
- If a pattern begins with a drop in adoptions but has a later peak, it is labeled Slide & Diffuse (S&D).

Table 1 presents some statistics on the resultant archetypes as well as the estimated model parameters per archetype. As can be seen in the descriptive statistics in part a) of Table 1, the Diffuse pattern is the most ubiquitous in the data, with 48% of the software exhibiting this pattern (28,490 patterns); S&D patterns were nearly 28% of the data (16,583 patterns), and Slides accounted for 24% of the data (14,270 patterns).

Table 1: Archetype pattern characteristics and estimation

a) Descriptive statistics
                               Diffuse    Slide     S&D
No. of patterns                28,490     14,270    16,583
Share (%) of patterns          48%        24.1%     27.9%
Average no. of downloads       110,093    6,363     9,481
Median no. of downloads        2,698      996       900

b) Model parameter values
                               Diffuse    Slide     S&D
p (initial external effect)    0.009      0.067     0.032
q (cumulative effect)          0.04       0.038     0.053
r (recency effect)             0.489      0.184     0.318
δ (external decay parameter)   0.18       0.546     1.829

We next examine the relationship between popularity and the shape patterns. One way of doing so is to divide the dataset by equal download bins in terms of number of products (top 10%, 11%-20%, etc.). Figure 2 shows the relationship between bin membership and the percentage of each archetype in every popularity bin, in the case of equal-size bins, separating the top 1% from the rest of the top bin. We observe a strong monotonic increase in the share of Diffuse patterns from low-popularity products to high-popularity ones, and a monotonic decrease in the share of Slide and S&D patterns as products increase in popularity. While Diffuse is the clear majority among the most popular products (89% of the products in the top 1%), it represents a minority among the less popular products (less than a third of the cases in the bottom 10% of the products).

We can also see this pattern visually going back to Figure 1 above. In Figure 1, Diffuse patterns received a dark shade, while Slide and S&D received a pale shade. We can see how the color of adoptions is dark around the area of high downloads, and lightens as we look at the long tail of adoptions.

Figure 2: Growth patterns and popularity in SourceForge data
* Note that the top 1% is presented separately.

3.3. A log scale analysis

One of the challenges of an equal decile analysis in a concentrated distribution is that the range within some bins may be very large. As can be seen in the upper portion of Table 2 below, when considering equal deciles, the range of downloads is very large in the upper decile, while in the lower decile, the range is small. To limit this variance, we took a log scale of the range of downloads (200 to over 300M) and divided it into 10 bins of download size. 
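The log-based binning described here can be sketched in a few lines of Python. This is a minimal illustration, assuming that bin edges are equally spaced on a log10 scale between the smallest and largest download counts; the function names and the sample download counts are hypothetical, and the exact edge computation used for the reported bins may differ.

```python
import math
from collections import Counter


def log_bin_edges(lo, hi, n_bins=10):
    """Return n_bins + 1 edges equally spaced on a log10 scale from lo to hi."""
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / n_bins
    return [10 ** (log_lo + i * step) for i in range(n_bins + 1)]


def assign_bin(downloads, edges):
    """Return the 0-based bin index for a download count.

    Bins are half-open [edge_i, edge_{i+1}); the last bin also takes the
    upper edge, and any count below the first edge falls into bin 0.
    """
    n_bins = len(edges) - 1
    for i in range(n_bins):
        if downloads < edges[i + 1]:
            return i
    return n_bins - 1


if __name__ == "__main__":
    # Hypothetical download counts, for illustration only.
    sample = [200, 950, 12_000, 400_000, 329_400_000]
    edges = log_bin_edges(min(sample), max(sample))
    counts = Counter(assign_bin(d, edges) for d in sample)
    for b in sorted(counts):
        print(f"bin {b + 1}: {counts[b]} product(s)")
```

Because each bin spans the same ratio of downloads rather than the same number of products, a concentrated distribution places most products in the lower bins and very few in the upper ones.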
As can be seen in the bottom portion of Table 2, the within-bin discrepancy is now lower; however, in the upper bins there are far fewer products.

Table 2: Range of downloads in each bin with equal and log-based bins

Equal deciles
Bin               Frequency   Min       Max
Bin 1 (bottom)    5,934       200       302
Bin 2             5,935       302       447
Bin 3             5,934       447       649
Bin 4             5,934       649       957
Bin 5             5,934       957       1,463
Bin 6             5,935       1,463     2,322
Bin 7             5,934       649       957
Bin 8             5,934       4,040     8,027
Bin 9             5,935       8,027     23,059
Bin 10 (top)      5,934       23,066    329.4M

Log-based bins
Bin               Frequency   Min       Max
Bin 1 (bottom)    21,663      200       835
Bin 2             18,333      836       3,505
Bin 3             11,188      3,506     14,715
Bin 4             5,135       14,716    61,766
Bin 5             2,021       61,767    259,254
Bin 6             689         259,255   1.1M
Bin 7             231         1.1M      4.5M
Bin 8             62          4.5M      19.2M
Bin 9             16          19.2M     80.5M
Bin 10 (top)      5           80.5M     329.4M

Figure 3 presents the shape ubiquity of the log scale deciles. We see that the ubiquity pattern seen in Figure 2 continues, displaying an even larger difference among the bins. The two bins that contain 21 products of 19.2M downloads and up are composed of 100% Diffuse pattern.

Figure 3: Growth patterns and popularity in SourceForge data (log-based bins)

3.4. Categories

The patterns we see above reflect a blend of many product categories. Is the pattern ubiquity driven by a subset of the dataset, or is it consistent across product types? To see which, we repeated the analysis of Figure 2 with the six most popular categories SourceForge uses. Figure 4 presents the results for the larger categories, and in Appendix B we can see the distribution of the patterns in all 16 categories. As can be seen in Figure 4, the ubiquity pattern identified above generally remains stable.

Figure 4: Ubiquity of shapes in equal-download deciles by category (top six categories)

3.5. Does it work outside of SourceForge?

To what extent can our findings from open-source software be generalized to other freeware environments? In particular, smartphones have become a prominent freeware distribution outlet, to the extent that the vast majority of smartphone apps are freeware (Olson 2013). 
While data on large-scale smartphone app adoption over time is not readily available to researchers (Garg and Telang 2013), we were able to obtain the cooperation of a global firm, which we will call “Mobility” so as not to reveal its identity. Mobility is a player in the market for helping businesses create free smartphone apps that can be used as part of their business. Under this business model, Mobility creates the app and helps manage it for the client for a monthly fee. Mobility clients are varied and include service providers such as restaurants, artists, musicians, educational institutions, and non-profits. These clients offer free apps created on the Mobility platform for their own end users, who are typically individual customers or prospects. Mobility can track these apps’ downloads by end users over time.

Figure 5: Growth patterns and popularity in Mobility data
* Note that the top 1% is presented separately.

The Mobility dataset is more limited than that of SourceForge in several aspects: The Mobility apps are specific to certain service providers, so are naturally relevant to much smaller market segments. In addition, unlike the case of open-source software, there is an entity (Mobility clients) that may make dedicated efforts to push the freeware via external effects, which we do not observe. While the time span we have for Mobility downloads is more limited, it is more detailed, as we observe weekly data for downloaded apps (between February 2011 and November 2013). Due to the smaller magnitude of adoption, we used data on apps that had at least 50 downloads and at least 52 weeks of data, truncated at 52 weeks. We thus had weekly adoption data for 6,914 smartphone apps. We repeated the analysis as in the first dataset, and found notably similar results to the SourceForge case. The patterns that emerged were grouped again into the archetypes of Diffuse (49%), Slide (30%), and S&D (21%). 
We can see that while the share of Diffuse patterns is close to that of SourceForge, the share of Slides is higher at the expense of S&D.

The Mobility data can also help us examine the data from another angle, which might affect the archetypes: the issue of versions. FDP creators often release new versions (i.e., software updates) and in the SourceForge database, 63% of the products have released more than one version over the examined life cycle. One might wonder if the demand for versions can fundamentally affect the archetypes we see and their relationship to popularity. We looked at the issue in two ways: First, the Mobility data includes only one version, and as we can see, the extent and the pattern of archetypes remains the same. Second, in the SourceForge dataset, we looked at products that had only one version to see if within this group the dynamics of the entire group reported above change. Here also, we found that the dynamics of archetypes’ ubiquity and popularity shown in Figure 2 largely remain the same for the one-version-only products. Thus, versioning does not appear to be the driver of the phenomena we identify here. The descriptive statistics and parameter estimation per archetype for Mobility’s data are found in Appendix C in Table C1, parts a) and b), respectively.

4. Recency, inception, and the share of patterns

4.1. Parameter values per shape

Our next aim was to see to what extent our data can help us to understand the relationship between popularity and shape ubiquity identified above. We thus now turn to examine the implication of parameter values that emerge from the FDP model we used, and see their relationship to popularity.

Looking first at part b) of Table 1, we see a difference between parameter values of the different shapes. The difference between each pair of the archetypes was significant using a two-sample Hotelling’s T² test, and similarly with a two-sample t-test. 
Following that, we want to ensure that our model’s parameters actually define and drive these patterns, and that the classification results are determined by the model and its parameters. We used a random forest classifier (Breiman 2001) to see if we can correctly match the classified patterns using the parameters of the freeware model. We indeed see that the random forest classifier shows a very low out-of-bag error of 2.27%. The resultant confusion matrix is found in Table 3.

Table 3: Random forest confusion matrix results for the classification of the three patterns

| Actual \ Predicted | Diffuse | Slide | S&D | Classification error |
|---|---|---|---|---|
| Diffuse | 98.4% | 0.7% | 0.9% | 1.6% |
| Slide | 1.4% | 96.6% | 2.0% | 3.4% |
| S&D | 1.1% | 1.4% | 97.5% | 2.5% |

* Percentages are of the actual number of patterns.

4.2. Share of effects and popularity

We now turn to see if the effects represented by the parameters are related to product popularity. An interesting feature of diffusion modeling is that it allows us to further understand how the various parameters drive the distinctive shapes (Mahajan et al. 1990). Let T be the time horizon (T = 60 in the SourceForge data, and T = 52 in the Mobility data). Thus we set where m is a scaling factor of the observed data and is the total number of downloads up to time T. In addition, in order to remove the effect of popularity on our shape analysis, each observation is divided by the sum of observations over time (), and the equation has been translated into percentages in the standard manner by dividing both sides by. We calculated the sources of growth by breaking down Equation 2 into the main components that drive adoptions as follows:

(3) Cumulative w-o-m effect
(4) Recency effect
(5) Inception effect

Turning to Table 4, we see a difference between the three archetypes in both parameter value and share of patterns. While the inception effect is especially dominant for Slides, it has the lowest share for the other two archetypes.
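The normalization step described in this subsection, in which each observation is divided by the sum of observations over the horizon T, can be sketched as follows. This is a minimal illustration only, not the authors' code: the function name `normalize_series` and the two example series are hypothetical.

```python
def normalize_series(downloads):
    """Divide each observation by the total over the horizon, so the series sums to 1.

    `downloads` is a list of per-period download counts for one product
    (hypothetical input; the paper uses T = 60 months or T = 52 weeks).
    """
    total = sum(downloads)
    if total == 0:
        raise ValueError("series has no downloads to normalize")
    return [x / total for x in downloads]


# Two hypothetical products with the same shape but very different popularity.
small = [50, 30, 20]
large = [5000, 3000, 2000]

# After normalization the two series coincide, so a shape analysis
# performed on them is no longer affected by the popularity level.
print(normalize_series(small))
print(normalize_series(large))
```

Because both example series reduce to the same shares, any difference that remains after this step reflects the shape of growth rather than the scale of demand.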
Table 4: Share of pattern attributed to each effect by archetype pattern

| Share of pattern attributed to: | Diffuse | Slide | S&D |
|---|---|---|---|
| p + δ (inception effect) | 26.8% | 49.3% | 11.5% |
| q (cumulative w-o-m effect) | 41.2% | 40.1% | 67.5% |
| r (recency effect) | 32.0% | 10.6% | 21.0% |

Similar results were found in Mobility’s data in Table C1 (part c) of Appendix C. Figure 6 further elucidates the relationship between share of effect and popularity. We see the average share of influence of the external, recency, and cumulative effects in various popularity tiers, generated from Equations 3-5. The direction is clear: The share of recency increases dramatically from 15% for the least popular products to 40% for the top 10%, while the shares of the cumulative effect and the inception effect decrease monotonically from the less popular to the more popular product tiers.

Figure 6: Share of patterns and profitability

To further understand the source of difference among the patterns in this respect, consider Figure 7, in which we graph the dynamics of the share of each effect over time for each pattern archetype, using the average parameter values for each pattern presented in part b of Table 1.

Figure 7: The temporal dynamics’ shares of effects in the three patterns (Figure 7a: Diffuse; Figure 7b: Slide; Figure 7c: Slide & Diffuse)

Consider the case of a Slide pattern (Figure 7b). An exponential-like decline in demand was considered in the past in two types of markets. In the case of low-involvement supermarket goods, the slide-like pattern was attributed to a lack of inter-customer social influence and a dominant role of external effects such as advertising (Fourt and Woodlock 1960; Gatignon and Robertson 1985). However, FDPs typically are not promotion driven, and there is no reason to assume that they are unaffected by social influence (see Aharony et al. 2011 for the role of social influence in such markets).
In the case of entertainment goods such as movies, a Slide pattern was identified in particular for blockbusters, and was explained by the anticipation leading up to the movie’s release, which on the one hand can create a social influence process pre-release, and on the other hand drives marketers to invest large resources in advertising and screens early on in these blockbusters’ life cycles (Moe and Fader 2002; Ainslie et al. 2005). It is unlikely that such a phenomenon is relevant to the continuous flow of free products we examine. In particular, we see an effect opposite to that of movies: For FDPs, it is the least popular products that exhibit a Slide pattern, not the most popular ones, indicating a different process.

What Figure 7b suggests instead is an inception effect that drives demand, in particular early on. Later on it is joined by a cumulative effect, which has a large share in driving growth, though not as large as that of inception. Thus, while we do not need to assume a lack of social influence to explain the declining Slide pattern, the inception effect is not enough to create a highly popular product.

To create a bell-shaped Diffuse pattern, social influence should kick in relatively early and become dominant. What is particularly interesting in the Diffuse dynamics in Figure 7a is the role of the recency effect. For products that enjoy social influence, recency becomes the dominant social influence early on, immediately following the external influence ignited by the inception effect. Only later, when there are enough adopters, does the cumulative effect become dominant. As can be seen in Table 4, the cumulative effect has the overall largest share in Diffuse, yet it is not much larger than the recency effect.
If the recency effect were not there to start the social process that eventually brings in a larger number of adopters, the product could have remained a lower-popularity Slide.

As Figure 7c suggests, S&D begins with an inception effect that is stronger than that of a Diffuse (yet weaker than that of a Slide), and a recency effect that is weaker than that of a Diffuse (yet stronger than that of a Slide). In such a case, while the initial pattern is that of a decline, the product creates enough social influence to turn the pattern around later on, i.e., social influence begins to dominate due to the recency effect, and eventually the cumulative effect becomes dominant. Overall, it seems that a recency effect is critical to create a popular FDP, explaining why we see a large difference in the role of recency between the high- and low-popularity products (Figure 6). Popular products create enough social influence early on, whether by direct word of mouth or by ranking information provided to potential adopters, so that recency and later on the cumulative effect begin to drive demand upward. What could have remained a less-popular Slide thus becomes a more-popular Diffuse.

5. Discussion and Conclusion

5.1 The fundamental findings

The first core insight emerging from our findings is that free digital products have distinct growth patterns. A large body of research has used information on the adoption of traditional durables and services to teach us how new products grow, and constitutes the base for managerial thinking on product introduction in general. The fundamentally different shape of growth for FDPs indicates that we need to be cautious in applying past diffusion knowledge to these environments. One could use the case of movies as an example in that sense: Given the unique dynamic of growth and profitability in the motion picture industry, its growth and dynamics have been largely analyzed separately from other categories (Eliashberg et al. 2006).
The case of FDP growth may require a similar consideration.

A second essential issue relates to the relationship between popularity and growth. The strong monotonic relationship between popularity and the shape of growth we witness suggests that the past focus on popular products when analyzing new product growth can lead to a real bias. This issue is of particular importance for FDPs given the significant dispersion of demand and the presence of a significant long tail. This finding may have major implications for other categories as well, yet given that we have data only on FDPs, the generalizability of our finding to other categories remains to be investigated. The issue of popularity is interesting particularly in light of the rich research stream that has acknowledged the strong dispersion of product popularity in digital environments, and the existence of a long tail of demand (Brynjolfsson et al. 2010). While the 60,000 FDPs analyzed here indeed show a large dispersion, the fact that we could use individual-level growth data, rather than look at products cross-sectionally based on overall popularity, gave us a unique opportunity to understand the creation of demand dispersion in digital environments.

Our analysis suggests that in FDP environments, sales or downloads may start with a drop. The fact that products are free encourages potential users to download them even if they are not completely certain they need them. Since people often hear of FDPs most at the time of their release (the “inception effect”), adoptions may be relatively high in the period right after launch. Yet in order to attract a wider audience, external influence early on and even some word of mouth afterwards may not be enough: The product has to create an engagement that will produce social influence, which in turn will make it popular for a larger market potential. This stage will be driven by the recency effect, which represents the effect of recent adopters on potential ones.
This effect is relevant to many product categories because of the high involvement and word-of-mouth characteristics of recent adopters. But it should have a special meaning for free digital products, as in FDP environments consumers learn much from recommendation engines, ranking tables, and search results. As we discussed, all of these may be largely affected by the number of recent adopters rather than by cumulative adoption. If a product enjoys a strong enough recency effect early on, it will quickly move to grow in adoption, with a growing cumulative base of adopters that will join the recent ones. The growth will then be bell shaped, and it is thus no wonder that bell shapes are more strongly associated with popular products.

In cases where word of mouth is not strong enough early on, the process may take some time, and the product will begin with a slide. However, given enough time, the social process will become dominant enough to spur a growth process that creates a Slide & Diffuse pattern. Our results are thus consistent with studies that cite the communication process, and in particular recommendation systems, as affecting the level of overall popularity (Fleder and Hosanagar 2009; Oestreicher-Singer and Sundararajan 2012). While the recency effect is driven by word of mouth between individuals, important drivers for FDPs are online recommendation systems (and search), which provide products with powerful enough social influence early on so that recent adopters affect new ones and spur the process of real growth. If the product is not appealing enough to draw people early on, the recency effect will not kick in and the product may remain in a slide situation.

5.2 Individual product growth and the long tail phenomenon

So far we have not dealt with one of the main interests of the long tail literature: the magnitude of the variance in popularity between best-selling products and slow-selling ones, and the nature of markets that increase this variance.
While this highly discussed research question is not our focus, and our ability to examine the issue is partial given the limited product-specific information in our large-scale database, it is still of interest to investigate whether the product growth dynamics we highlighted here can help to explain the within-product variance in popularity we see in digital markets. We took a first look at the matter by considering the variance in popularity in different markets. Aiming to analyze markets that are as homogeneous as possible, we took advantage of the fact that beyond the sub categories used above, SourceForge also divides products into sub-sub categories (SSCs). SSCs are not always mutually exclusive for individual products, and in some the number of products is small, so that within-product variance is less applicable to examine. We examined the 176 SSCs (out of 316) in which we had at least 50 products per category. We wanted to see if the value of within-product parameters for the SSC can help explain the between-product variance in popularity that is reflected in the Gini coefficient of the specific SSC.

Using an OLS regression with the SSC Gini coefficient as the dependent variable, and the average values of the parameters of the FDP growth model and SSC size as the independent variables, two variables came out significant in their effect on the SSC Gini: SSC size, that is, larger SSCs are less equal, as may be expected (p < 0.05); and the recency parameter, that is, higher recency is associated with a higher Gini coefficient and thus a less equal SSC (p < 0.001). The effect of recency on inequality is consistent with the insights discussed above on the importance of recency in FDP markets. Beyond the role of recency in the growth and success of individual products, we see indications that in markets where recency is high, the difference between the less and more successful products is larger.
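To make the dependent variable of this regression concrete, the sketch below computes a Gini coefficient over the total downloads of the products in one SSC. It is an illustration under stated assumptions: the paper does not say which Gini estimator was used, so the standard sorted-rank formula is assumed here, and the function name `gini` and the download counts are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal).

    Assumed estimator (not confirmed by the paper):
        G = 2 * sum_i(i * x_i) / (n * sum_i(x_i)) - (n + 1) / n
    where x_1 <= ... <= x_n are the sorted values and i runs from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical total downloads per product in two sub-sub categories.
equal_ssc = [1000, 1000, 1000, 1000]   # every product equally popular
skewed_ssc = [10, 20, 30, 9940]        # one superstar and a long tail

print(gini(equal_ssc))    # no inequality
print(gini(skewed_ssc))   # close to the maximum for four products
```

With this estimator the maximum for n products is (n - 1) / n, so values should be compared across SSCs of similar size with some care, which is one reason to include SSC size as a regressor.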
Given our discussion above on the possible relationship between recency and the effect of recommendation systems, we see this result as supportive of research that highlights how recommendation systems can create inequality in digital markets (Fleder and Hosanagar 2009; Oestreicher-Singer and Sundararajan 2012; Hervas-Drane 2015): In markets where recommendation systems play a stronger role, the recency effect may be higher, leading to higher inequality between the long tail and the superstars. Yet, to further understand this relationship, additional analysis is needed that uses smaller-scale data yet is able to dive into the specifics of markets. We believe this is a promising area for future research.

5.3 Conclusion

The ability to collect precise adoption data in a timely manner and for differing levels of popularity renders digital environments an unprecedented source of knowledge on the growth of new products. The abundance of data, in particular individual-level adoption data, social network data flows, and location information, ensures that much of our knowledge on growth is yet to come, and may demand updating our beliefs and the empirical generalizations created in times when such data were not available. We hope this study takes a sizeable step in this direction.

References

ADA. 2014. Discoverability: How to get noticed in a marketplace overflowing with apps. White Paper, Application Developers Alliance, Washington, DC.
Aharony, N., W. Pan, C. Ip, I. Khayal, A. Pentland. 2011. Social fMRI: Investigating and shaping social mechanisms in the real world. Pervasive and Mobile Comput. 7(6) 643-659.
Ainslie, A., X. Drèze, F. Zufryden. 2005. Modeling movie life cycles and market share. Marketing Sci. 24(3) 508-517.
Anderson, C. 2009. Free: The Future of a Radical Price, 1st ed. New York: Hyperion.
AppBrain. 2016. AppBrain Stats. AppBrain, March 26. Available at
, A., S. Lorente. 2012. The S-curves are everywhere. Mech. Engrg. 134(5) 44-47.
Breiman, L. 2001. Random forests. Machine Learn. 45(1) 5-32.
Brynjolfsson, E., Y. J. Hu, M. D. Smith. 2010. Long tails vs. superstars: The effect of information technology on product variety and sales concentration patterns. Inform. Systems Res. 21(4) 736-747.
Brynjolfsson, E., Y. J. Hu, D. Simester. 2011. Goodbye Pareto Principle, Hello Long Tail: The effect of search costs on the concentration of product sales. Management Sci. 57(8) 1373-1386.
Brynjolfsson, E., Y. J. Hu, M. S. Rahman. 2009. Battle of the retail channels: How product selection and geography drive cross-channel competition. Management Sci. 55(11) 1755-1765.
Carare, O. 2012. The impact of bestseller rank on demand: Evidence from the app market. Internat. Econom. Rev. 53(3) 717-742.
Chandrasekaran, D., G. J. Tellis. 2007. A critical review of marketing research on diffusion of new products. N. K. Malhotra, ed. Rev. Marketing Res. 39-80.
Chandrasekaran, D., G. J. Tellis. 2011. Getting a grip on the saddle: Chasms, or cycles? J. Marketing 75(4) 21-34.
Chandrashekaran, M., R. Mehta, R. Chandrashekaran, R. Grewal. 1999. Market motives, distinctive capabilities, and domestic inertia: A hybrid model of innovation generation. J. Marketing Res. 36(1) 95-112.
Cheng, H. K., Y. Liu. 2012. Optimal software free trial strategy: The impact of network externalities and consumer uncertainty. Inform. Systems Res. 23(2) 488-504.
Daniel, S., R. Agarwal, K. J. Stewart. 2013. The effects of diversity in global, distributed collectives: A study of open-source project success. Inform. Systems Res. 24(2) 312-333.
Danova, T. 2015. The App-Store marketing report: User acquisition, retention, and strategies for getting apps to stand out. Business Insider, February 5. Available at
Dunn, J. E. 2011. Free antivirus grabs more market share, claims Opswat survey. Techworld, June 8. Available at
, D., J. R. Hauser. 2011. Active machine learning for consideration heuristics. Marketing Sci. 30(5) 801-819.
Elberse, A., F. Oberholzer-Gee. 2007. Superstars and underdogs: An examination of the long tail phenomenon in video sales. Harvard Business School working paper.
Eliashberg, J., A. Elberse, M. Leenders. 2006. The motion picture industry: Critical issues in practice, current research, and new research directions. Marketing Sci. 25(6) 638-661.
Fleder, D., K. Hosanagar. 2009. Blockbuster culture’s next rise or fall: The impact of recommender systems on sales diversity. Management Sci. 55(5) 697-712.
Foresman, C. 2012. iOS app success is a “lottery”: 60% (or more) of developers don’t break even. Ars Technica, May 4. Available at
Fourt, L. A., J. W. Woodlock. 1960. Early prediction of market success for new grocery products. J. Marketing 25(2) 31-38.
Garg, R., R. Telang. 2013. Inferring app demand from publicly available data. MIS Quart. 37(4) 1253-1264.
Gatignon, H., T. S. Robertson. 1985. A propositional inventory for new diffusion research. J. Consumer Res. 11(4) 849-867.
Ghalanos, A., S. Theussl. 2014. Rsolnp: General non-linear optimization using augmented Lagrange Multiplier Method. R package version 1.15.
Ghose, A., S. P. Han. 2014. Estimating demand for mobile applications in the new economy. Management Sci. 60(6) 1470-1488.
Gilula, A., R. McCulloch. 2013. Multi level categorical data fusion using partially fused data. Quant. Marketing Econom. 11(3) 353-377.
Goldenberg, J., B. Libai, E. Muller. 2002. Riding the saddle: How cross-market communications can create a major slump in sales. J. Marketing 66(2) 1-16.
Golder, P. N., G. J. Tellis. 1997. Will it ever fly? Modeling the takeoff of really new consumer durables. Marketing Sci. 16(3) 256-270.
Golder, P. N., G. J. Tellis. 2004. Growing, growing, gone: Cascades, diffusion, and turning points in the product life cycle. Marketing Sci. 23(2) 207-218.
Greve, H. R. 2011. Fast and expensive: The diffusion of a disappointing innovation. Strategic Management J. 32(9) 949-968.
Grewal, R., G. L. Lilien, G. Mallapragada. 2006. Location, location, location: How network embeddedness affects project success in open-source systems. Management Sci. 52(7) 1043-1056.
Haruvy, E., A. Prasad. 2005. Freeware as a competitive deterrent. Inform. Econom. & Policy 17(4) 513-534.
Hervas-Drane, A. 2015. Recommended for you: The effect of word of mouth on sales concentration. Internat. J. Res. Marketing 32(2) 207-218.
Hill, S., F. Provost, C. Volinsky. 2006. Network-based marketing: Identifying likely adopters via consumer networks. Statist. Sci. 21(2) 256-276.
Hinz, O., J. Eckert, B. Skiera. 2011. Drivers of the long tail phenomenon: An empirical analysis. J. Management Inform. Systems 27(4) 43-70.
Iyengar, R., C. Van den Bulte, T. W. Valente. 2011. Opinion leadership and social contagion in new product diffusion. Marketing Sci. 30(2) 195-212.
Jiang, Z., F. M. Bass, P. I. Bass. 2006. Virtual Bass model and the left-hand data-truncation bias in diffusion of innovation studies. Internat. J. Res. Marketing 23(1) 93-106.
Jiang, Z., S. Sarkar. 2010. Speed matters: The role of free software offer in software diffusion. J. Management Inform. Sys. 26(3) 207-240.
Kimura, H. 2014. Why app store keyword rankings drop dramatically seven days after launch. Sensor Tower, August 21. Available at
, A. 2014. The Insider: Preparing your new app for launch. Tune, August 12. Available at
, A., M. D. Smith, R. Telang. 2014. Information discovery and the long tail of motion picture content. MIS Quart. 38(4) 1057-1078.
Kumar, V. 2014. Making “freemium” work. Harvard Bus. Rev. 92(5) 27-29.
Lee, G., T. S. Raghu. 2014. Determinants of mobile apps’ success: Evidence from the app store market. J. Management Inform. Sys. 31(2) 133-170.
Lee, Y. J., Y. Tan. 2013. Effects of different types of free trials and ratings in sampling of consumer software: An empirical study. J. Management Inform. Sys. 30(3) 213-246.
Madey, G. 2013. The SourceForge Research Data Archive (SRDA). University of Notre Dame, Feb. 14. Available at:
Mahajan, V., E. Muller, F. M. Bass. 1990. New product diffusion models in marketing: A review and directions for research. J. Marketing 54(1) 1-26.
Mallapragada, G., R. Grewal, G. Lilien. 2012. User-generated open-source products: Founder’s social capital and time to product release. Marketing Sci. 31(3) 474-492.
Meade, N., T. Islam. 2006. Modeling and forecasting the diffusion of innovation: A 25-year review. Internat. J. Forecasting 22(3) 519-545.
Moe, W. W., P. S. Fader. 2002. Using advance purchase orders to forecast new product sales. Marketing Sci. 21(3) 347-364.
Neitz, R. 2015. Extensive Guide to App Store Optimization (ASO) in 2015 – Part 2: Google Play Store. Trademob, June 12. Available at
, M. F., D. J. Wu. 2014. Economics of free under perpetual licensing: Implications for the software industry. Inform. Sys. Res. 25(1) 173-199.
Oestreicher-Singer, G., A. Sundararajan. 2012. Recommendation networks and the long tail of electronic commerce. MIS Quart. 36(1) 65-83.
Olson, P. 2013. The win for games: They grab two-thirds of app store sales. Forbes, September 19. Available at
, R., E. Muller, V. Mahajan. 2010. Innovation diffusion and new product growth models: A critical review and research directions. Internat. J. Res. Marketing 27(2) 91-106.
Rangaswamy, A., S. Gupta. 2000. Innovation adoption and diffusion in the digital environment: Some research opportunities. V. Mahajan, E. Muller, Y. Wind, eds. New-Product Diffusion Models. Norwell, MA: Kluwer Academic Publishers, 75-96.
Rice, K. 2013. Why pre-launch hype is the key to app success. Kinvey, May 2. Available at
, H., P. C. Verhoef, T. H. A. Bijmolt. 2014. Dynamic effects of social influence and direct marketing on the adoption of high-technology products. J. Marketing 78(2) 52-68.
Rogers, E. M. 2003. Diffusion of Innovations. New York: Free Press.
Rubin, B. F. 2013. The dirty secret of apps: Many go bust. Wall Street J., March 7. Available at
, M. S., J. Eliashberg. 1996. A parsimonious model for forecasting gross box-office revenues of motion pictures. Marketing Sci. 15(2) 113-131.
Walz, A. 2015. Deconstructing the app store rankings formula with a little mad science. , May 27. Available at
Woods, D. 2013. The battle of the freemium and enterprise business models in the task management market. Forbes, March 18. Available at
, Y. PhD thesis. Department of ESS, Stanford University; 1987. Interior Algorithms for Linear, Quadratic and Linearly Constrained Non-linear Programming.
Yogev, G. 2012. The Diffusion of Free Products: How Freemium Revenue Model Changes the Strategy and Growth of New Digital Products. Saarbrucken, Germany: Lap Lambert Academic Publishing.
Young, H. P. 2009. Innovation diffusion in heterogeneous populations: Contagion, social influence, and social learning. Amer. Econom. Rev. 99(5) 1899-1924.
Zhong, N., F. Michahelles. 2012. Long tail, or superstar? An analysis of app adoption on the Android market. LARGE 3.0 Conf. 11-14.
Zhou, W., W. Duan. 2012. Online user reviews, product variety, and the long tail: An empirical investigation of online software downloads. Electronic Commerce Res. Appl. 11(3) 275-289.
Zhu, F., X. Zhang. 2010. Impact of online consumer reviews on sales: The moderating role of product and consumer characteristics. J. Marketing 74(2) 133-148.

Appendices

Appendix A: Comparing model smoothing alternatives

We compared the R² and the KL divergence measures of our model with those of other smoothing alternatives to examine our model’s goodness of fit. First, we examine other variants of the model: We examine the fit of a model without the recency and δ components separately, and without both effects (effectively collapsing to the Bass model). Second, we examine other smoothing methods used in time series modeling.
The Hodrick-Prescott (HP) filter removes short-term cyclical components from the series, allowing us to separate out short-term noise and retain the long-term trend (Hodrick and Prescott 1997; Chandrasekaran and Tellis 2011). We use 129,600 and 14,400 as the smoothing coefficients (λ) commonly used for monthly data analysis with the HP filter. The Christiano-Fitzgerald (CF) filter has been examined as an alternative to the HP filter, offering better control over high-frequency fluctuations and a better fit to more granular (e.g., monthly) data (Christiano and Fitzgerald 1998; Lamey et al. 2007; Van Heerde et al. 2013). We use two months as the minimum length of a software cycle in the CF filter, and examine 40 and 60 months as the maximum length of the software cycle. We also examine smoothing with a locally weighted least squares regression (LOWESS; Rust and Bornman 1982) and with penalized splines (Foutz and Jank 2010; Stremersch and Lemmens 2009). The results are in Table A1 below.

Table A1: Goodness-of-fit comparison between models

| Model examined | R² | KL divergence |
|---|---|---|
| FDP growth model | 0.48 (.26) | 0.20 (.22) |
| FDP growth model (without recency) | 0.40 (.26) | 0.26 (.49) |
| FDP growth model (without δ) | 0.34 (.26) | 0.30 (.51) |
| Bass model | 0.33 (.25) | 0.27 (.26) |
| HP filter (λ = 14,400) | 0.36 (.22) | 0.27 (.27) |
| HP filter (λ = 129,600) | 0.28 (.21) | 0.36 (.38) |
| CF filter (max cycle = 40) | 0.39 (.23) | 0.31 (.45) |
| CF filter (max cycle = 60) | 0.32 (.23) | 0.37 (.58) |
| LOWESS | 0.34 (.23) | 0.28 (.51) |
| Penalized splines | 0.73 (.16) | 0.12 (.16) |

Looking at Table A1, we see that the FDP growth model performs better than all other models and smoothing methods but one. While the penalized splines model fits the data better than the FDP growth model, the resulting penalized splines curve is too flexible and does not clean the short-term trends and outliers that are inherent in monthly-level data, thus offering a “ceiling” of fit that a smoothing algorithm can reach.
If we increase the smoothing parameters of the penalized splines model to take that into account (see Appendix A in Foutz and Jank 2010 for further discussion), the fit drops rapidly.

Appendix B: Distribution of pattern types by categories, SourceForge data

Table B1: Distribution of pattern types by categories

| Category | Diffuse % | Slide % | S&D % | Category size |
|---|---|---|---|---|
| Development | 47.2% | 23.2% | 29.7% | 12,074 |
| Internet | 47.1% | 26.5% | 26.3% | 7,172 |
| System Administration | 48.0% | 23.2% | 28.7% | 6,267 |
| Communications | 47.8% | 27.2% | 25.0% | 4,987 |
| Games | 41.5% | 26.7% | 31.8% | 4,910 |
| Science & Engineering | 53.6% | 16.2% | 30.2% | 4,317 |
| Audio & Video | 51.1% | 23.3% | 25.6% | 2,957 |
| Security & Utilities | 46.3% | 23.5% | 30.2% | 2,542 |
| Business & Enterprise | 46.1% | 27.0% | 26.9% | 2,387 |
| Home & Education | 48.7% | 19.6% | 31.7% | 1,607 |
| Graphics | 48.9% | 23.2% | 27.9% | 1,593 |
| Desktop Environment | 48.4% | 26.3% | 25.3% | 1,186 |
| Other / Unlisted Topic | 46.5% | 23.2% | 30.4% | 764 |
| Multimedia | 47.6% | 22.6% | 29.8% | 477 |
| Mobile | 47.1% | 25.2% | 27.7% | 242 |
| Formats and Protocols | 49.1% | 33.9% | 17.0% | 112 |
| Software without an assigned category | 52.0% | 25.0% | 23.0% | 5,749 |

Appendix C: Results for the Mobility dataset

Table C1: Statistics and average parameter values for the pattern archetypes

| a) Descriptive statistics | Diffuse | Slide | S&D |
|---|---|---|---|
| No. of patterns | 3,412 | 2,032 | 1,470 |
| % of patterns | 49% | 30% | 21% |
| Average no. of downloads | 4,404 | 874 | 1,218 |
| Median no. of downloads | 414 | 162 | 196 |

| b) Model parameter values | Diffuse | Slide | S&D |
|---|---|---|---|
| p (initial external effect) | 0.016 | 0.149 | 0.045 |
| q (cumulative effect) | 0.083 | 0.059 | 0.075 |
| r (recency effect) | 0.517 | 0.204 | 0.214 |
| δ (external decay parameter) | 0.232 | 0.746 | 1.475 |

| c) Share of pattern attributed to: | Diffuse | Slide | S&D |
|---|---|---|---|
| p and δ (external effect) | 28.6% | 57.2% | 15.2% |
| q (cumulative effect) | 41.5% | 32.7% | 71.7% |
| r (recency effect) | 29.9% | 10.1% | 13.1% |