Micro Data Linking 2014/2015: Methodological report on ...



Contents

Executive summary
1. Introduction
1.1 Aim of this report
1.2 MDL 2014-2015
1.3 Structure of the report
2. Phase I: Production of MDL-database
2.1 Background of the sources
2.2 Availability of variables
2.3 Experiences building MDL-database
2.4 Conclusions and recommendations
3. Phase II: Validation
3.1 Introduction
3.2 Experiences validation
3.3 Conclusions and recommendations
4. Phase III: Output
4.1 Introduction
4.2 Data analysis
4.3 Regular tabular output
4.4 Limitations MDL and national sources
Annex I: Background information sources
Annex II: Availability variables
Annex III: Overview issues
Annex IV: Country report Austria
Annex V: Country report Denmark
Annex VI: Country report Finland
Annex VII: Country report Germany
Annex VIII: Country report Latvia
Annex IX: Country report the Netherlands
Annex X: Country report Norway
Annex XI: Country report Portugal
Annex XII: Country report Sweden

Executive summary

This methodological report summarizes the main conclusions and recommendations from the project ‘Micro data linking of structural business statistics and other business statistics’ (in short MDL 2014-2015).

The MDL 2014-2015 project produces national databases containing the most central structural business statistics, with information available at the enterprise and enterprise group level, in order to conduct micro-level economic analyses. Beyond this, the project provides the basis for further analyses in the future based on the national databases established in the project. The new statistical knowledge is produced without carrying out new surveys, i.e. without increasing the respondent burden on enterprises.

An important goal of the project is to test the feasibility of the micro data linking approach for producing (regular) statistical output that can be considered a supplement to the current annual deliverables of tables as part of SBS.

The micro data linking project produces several deliverables, including three output deliverables. These output deliverables consist of tabular data for each of the three topics (intended for publication on the Eurostat website) and a descriptive analysis in the form of a Statistics Explained article for each topic.

Presented below are the main conclusions and recommendations from the MDL 2014-2015 project:

Conclusions:
- Overall there was positive feedback on the quality and timeliness of the circulated guidelines and syntaxes.
- The approach used was suitable for all participating countries in the project.
- Most countries reported that the production of the national databases (phase 1 of the project) and their validation (phase 2) were time-consuming.
- It was an educational experience for all countries. The participants learned more about their own (micro) data.
- It is important to define the output specifications as early as possible, keeping in mind that the projects are research oriented and as such require an iterative process in which the final output cannot be determined at the outset of the project.
- The validation phase was overall very valuable, even though the validation produced mixed results and experiences.
- The interpretations of the different national validation results were mixed. Some reasons were mentioned by many NSIs, while quite a few reasons for the validation output were mentioned by only one country (see also Annex III).

Recommendations:
- Take explicit technical requirements (both hardware and software) into consideration in the project to avoid unnecessary problems and possible delays.
- Provide more meta description in the SAS syntax, examples of datasets and instructions on importing files into SAS. This is particularly helpful for countries with limited SAS knowledge.
- Define beforehand which consistencies of the output tables need to be guaranteed, both amongst themselves and in relation to official data from the NSIs.
- Make explicit references to framework regulations regarding project-specific output (especially the BR variables).
- Focus the validation on the largest enterprises. Macro validation techniques should also be included.
- Include additional BR variables, like location and address, for validation purposes.
- Evaluate after each phase of the project instead of only at the end of the project.

1. Introduction

1.1 Aim of this report

This report summarizes the experiences and results of the project ‘Micro data linking of structural business statistics and other business statistics’ (in short MDL 2014-2015) from the participating countries: Austria, Denmark, Finland, Germany, Latvia, the Netherlands, Norway, Portugal and Sweden. The main conclusions are presented in this report and recommendations are made for future exercises with micro data linking. This report serves as deliverable ‘D4.4 Methodological report’ of the MDL 2014-2015 project (theme: 06.1.23-Development of structural business statistics).
1.2 MDL 2014-2015

“The picture of economic globalization provided by current official statistics is incomplete, the causal links to economic welfare indicators such as employment and wages tend to be weak and unconvincing, allowing a set of highly charged, politically motivated, and unproductive debates over the basic facts”, economic geographer Timothy Sturgeon reported to Eurostat. He concluded: “[T]he most pressing need is to make full use of existing data sources, for a system that ties data from business surveys to the wealth of information from administrative sources”.

The MDL 2014-2015 project is a big step in this direction. The goal of the project is to produce national databases containing the most central structural business statistics, with information available at the enterprise and enterprise group level, in order to conduct micro-level economic analyses. Beyond this, the project provides the basis for further analyses in the future based on the national databases established in the project. The new statistical knowledge is produced without carrying out new surveys, i.e. without increasing the respondent burden on enterprises.

To the extent possible, the new MDL database is structured using input data for the reference period 2008-2012 from Structural Business Statistics, International Trade in Goods Statistics, International Trade in Services Statistics, Community Innovation Survey, ICT usage and e-Commerce in enterprises Survey, Foreign Affiliate Statistics (Inward and Outward), Business Demography Statistics, International Organization and Sourcing Survey and the national Business Register.
Information from the national Business Register is central to the establishment of the database, as the issue of identity over time is essential for longitudinal micro-level analysis.

The micro data linking project is divided into three phases:
1. Matching and adjustment of data and structuring of the database;
2. Validation controls and calculation of weights for the control group population(s) where necessary;
3. Production of standardized output.

The micro data linking project and its phases are illustrated in figure 1.

Figure 1: The phases of the MDL 2014-2015 project

1.3 Structure of the report

This report is divided into three parts, in line with the phases of the MDL 2014-2015 project. The first part addresses the data sources. It discusses the quality of the sources and the availability of variables, as well as the member states’ experiences with the construction of the national databases (chapter 2). The central topic of the second part is the validation of the constructed databases. The results and experiences of the participating countries are presented in this chapter. In order to further improve the consistency and quality of the data, special attention was given to the following issues: instability over time, no-match, unit representation and demographic change (chapter 3). The third and final part gives additional information on the choices made in the data analysis for the three Statistics Explained articles (chapter 4). The most relevant knowledge gathered during the micro data linking project can be found in the annexes at the end of the report.

2. Phase I: Production of MDL-database

This chapter presents the results and experiences from the first phase of the project: the matching and adjustment of the data and the structuring of the MDL-database.
As it is crucial to have a clear understanding of the quality and limitations of the available variables and sources used for the database, this part of the methodological report also focuses on the relevant background information on the national sources.

2.1 Background of the sources

At the start of the MDL 2014-2015 project all nine participating NSIs were asked to give additional background information on the national sources used in this project. The issues addressed include, amongst others: the type of statistical units used; how the coverage of the source compares to that of the Structural Business Statistics (SBS); and whether cut-off limits, other supplementary/complementary data, a sampling strategy, estimation methods and/or imputation methods are used during the construction of the national source.

The answers of the project participants to these and other questions are summarized in Annex I. The original completed questionnaires are in the possession of Eurostat.

2.2 Availability of variables

Besides giving additional background information on the national sources, all participants were also asked to indicate which variables are available in their national sources. As the overall table in Annex II shows, the majority of the listed variables are available for most countries. Not surprisingly, variables from the national Business Register, Structural Business Statistics, International Trade in Goods Statistics, Inward Foreign Affiliate Statistics and Business Demography Statistics are widely available. Variables from other sources, however, are less available, for example due to non-participation in specific surveys (like the International Organization and Sourcing Survey).

Eurostat obtains the country-specific overviews of available variables from the participating NSIs.

2.3 Experiences building MDL-database

The detailed experiences of the NSIs were gathered in questionnaires that were sent out to the participants at the end of the project.
These country experiences can be found at the end of this methodological report, in Annexes IV to XII. A full overview of the mentioned issues/problems and proposals is displayed in Annex III.

This paragraph lists the main experiences and proposals of the participating countries regarding the construction of the MDL-database, as presented and discussed in the fourth, and last, task force meeting of the MDL-project.

Experiences
- The overall feedback in this phase of the project is positive. Moreover, all participants agree (5) or strongly agree (4) that the approach used to build the MDL-database was suitable.
- The circulated guidelines were positively evaluated. The nine participants ranked them as ‘good’ (2), ‘very good’ (3) and ‘excellent’ (4). The guidelines had a “clear structure and visualizations” and were “detailed and comprehensive”.
- The circulated syntax was positively evaluated. The nine participants ranked it as ‘good’ (4), ‘very good’ (4) and ‘excellent’ (1). Overall, the syntax was “easy to adapt and apply” and “efficient despite some minor mistakes”. For NSIs with limited knowledge of the SAS software more description would have been helpful.
- Most countries agreed that this phase was more time-consuming than anticipated.
- Two NSIs had changes in staff during the project.
- Almost all countries had issues with different SAS versions. The standard syntax therefore had to be adjusted, sometimes by the NSIs themselves but more often by the project leaders, which took extra time.
- It turned out that one country could not use SAS software to produce the MDL-database. They had to build the database with different statistical software.
  There were legal and software/hardware reasons for this decision.
- One country had problems importing the data from Excel format into the database.
- A proposal for aggregated data sets for confidentiality checks was rejected by the project leaders, who were unaware of the consequences for confidentiality.
- Some countries indicated that they could not include all variables that were required for the MDL-database.
- Not all variables were clear to everyone. For example, it was unclear which event marks the start of a new enterprise group.
- Some countries observed that a lot of time was spent on building specific variables and datasets that were not used.

Proposals
- Define earlier which output to produce, to avoid spending time on unused variables and datasets.
- Allow more time for syntax testing.
- Provide more meta descriptions in the SAS syntax, examples of datasets and instructions on importing files into SAS.
- Take the technical requirements (both hardware and software) into consideration before or at the beginning of the project.
- Define variables more clearly, with reference to framework regulations.
- Include extra BR variables identifying the enterprise, like name and address (for validation purposes).
- Limit the dataset to the most relevant variables.

2.4 Conclusions and recommendations

The previous paragraph gave an overview of the experiences of the MDL 2014-2015 participants in phase 1, including the proposals for further improvement. This served as input for the final discussion, held at the fourth task force meeting in Copenhagen.
The agreed conclusions and recommendations are presented below:

Conclusions:
- Overall there was positive feedback on the quality and timeliness of the circulated guidelines and syntaxes.
- The approach used was suitable for all participating countries in the project.
- Most countries reported that the production of the national MDL-databases was time-consuming.
- It is important to define the output specifications as early as possible, keeping in mind that the projects are research oriented and as such require an iterative process in which the final output cannot be determined at the outset of the project.

Recommendations:
- Take explicit technical requirements (both hardware and software) into consideration in the project to avoid unnecessary problems and possible delays.
- Provide more meta description in the SAS syntax, examples of datasets and instructions on importing files into SAS. This is particularly helpful for countries with limited SAS knowledge.
- Provide clearer variable definitions and references to framework regulations regarding project-specific output (specifically the BR variables).
- Include additional BR variables, like location and address, for validation purposes.

3. Phase II: Validation

3.1 Introduction

The second phase of the MDL-project consists of various steps of different scope aimed at validating the MDL-database. The main aim of the validation process in the MDL-project is to achieve improved consistency and stability in the data, taking into account the analyses to be carried out.
This chapter focuses on the NSIs’ experiences with the validation phase.

The key consistency issues in the validation work in this project are the following:
- Instability over time: data for the empirical unit is not registered on the same statistical unit IDs across data sets (same data source, but different reference periods).
- No-match: data for the empirical unit is not registered on the same statistical unit IDs across data sets (different data sources).
- Unit representation: data is registered on the same statistical unit ID across data sets (different sources and/or different reference periods), but the data does not refer to the same empirical unit.
- Demographic change: data is registered on the same statistical unit ID across data sets (different sources and/or different reference periods), but the empirical unit has changed substantially at one (or more) point(s). This may be the case when an enterprise takes over another enterprise.

This chapter summarizes the experiences of the countries participating in the MDL 2014-2015 project and presents the main conclusions and recommendations for further micro data linking projects.

3.2 Experiences validation

The detailed experiences of the NSIs were gathered in questionnaires that were sent out to the participants at the end of the project. These country experiences can be found at the end of this methodological report, in Annexes IV to XII. A full overview of the mentioned issues/problems and proposals is displayed in Annex III.

This paragraph lists the main validation results and experiences from the participating countries.
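The first two consistency issues defined in the introduction to this chapter can be illustrated with a small sketch. The Python below is not the SAS syntax circulated in the project; it only assumes that each dataset can be reduced to a set of enterprise IDs per source and reference year. The function names and sample IDs are hypothetical. Note that the sketch only detects the symptom (an ID that is missing from another source or year); whether the same empirical unit is registered under a different ID still has to be established by the validation itself.

```python
def no_match(source_ids, reference_ids):
    """IDs present in one source but absent from the reference source
    for the same reference year (symptom of the 'no-match' issue)."""
    return sorted(set(source_ids) - set(reference_ids))


def unstable_over_time(ids_by_year):
    """IDs of one source that do not appear in every reference year
    (candidates for the 'instability over time' issue).

    `ids_by_year` maps reference year -> set of enterprise IDs.
    Returns a dict: enterprise ID -> list of years in which it is missing.
    """
    years = sorted(ids_by_year)
    all_ids = set().union(*(ids_by_year[y] for y in years))
    result = {}
    for ent_id in sorted(all_ids):
        missing = [y for y in years if ent_id not in ids_by_year[y]]
        if missing:
            result[ent_id] = missing
    return result


# Hypothetical enterprise IDs, for illustration only.
sbs = {2008: {"E1", "E2", "E3"}, 2009: {"E1", "E3"}, 2010: {"E1", "E2", "E3"}}
itgs_2008 = {"E1", "E4"}

print(no_match(itgs_2008, sbs[2008]))   # ['E4']
print(unstable_over_time(sbs))          # {'E2': [2009]}
```

A flagged ID is only a candidate for correction: a sampling design or a genuine demographic event can equally explain why an ID is missing in a given year or source.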
The proposals mentioned are also presented in this paragraph.

Validation results:
- Five countries used an additional methodology to validate the MDL-database regarding instability over time: use of another project with more reference years; matching by company name and address; consulting BR experts; top-down validation using pattern analysis; auxiliary information from Business Demography Statistics and the national Business Register.
- The main reasons for the output of the validation check for instability over time were:
  - SBS is a sample / sampling design;
  - Inconsistencies amongst SBS surveys between NACE branches;
  - Mismatch between ITGS and SBS due to ITGS being a monthly and SBS being a yearly statistic;
  - Enterprises not in SBS scope anymore (either because of a different NACE or size class);
  - Restructured enterprises outsourcing their business;
  - Inactivity of enterprises that are still reporting trade (mainly micro enterprises);
  - Economic circumstances;
  - Sample coverage amongst ICTeC and SBS;
  - Response received too late to be integrated in the final SBS database.
- Five countries corrected data in the MDL-database as a result of the instability over time validation. Five NSIs added or replaced ENT_IDs, two NSIs added or replaced other variables, and five corrected all variables across the dataset.
- Three countries used an additional methodology to validate the MDL-database regarding no-match for ITGS: matching ITGS ‘no-matches’ with the same SBS year via BR using the administrative ID; checking company characteristics such as EGR, company name, location, persons employed and turnover; checking BD for deaths.
- No countries used an additional ‘no-match’ validation method for ITS, although one country added some enterprises in ITS connected to services related to sea and coastal transport.
- ITS is not available in two countries, while one country does not produce ITS but receives the data from the national bank.
- The main reasons for the output of the validation check for no-match were:
  - (Foreign/inactive) enterprises in ITGS that are not in SBS;
  - Mismatch between ITGS and SBS due to ITGS being a monthly and SBS being a yearly statistic;
  - Different NACE scope;
  - SBS is a sample / sampling design;
  - Effect of the ‘unit representation validation’ on the ‘no-match validation’;
  - Demographic events;
  - Added enterprises result in more mismatches;
  - ITS includes ‘third party trade’.
- Six countries corrected data in the MDL-database as a result of the no-match validation for ITGS and three countries for ITS. For ITGS these corrections were as follows: adding or replacing ENT_IDs (6), adding or replacing other variables (4), correcting all variables across the dataset (3), any other correction of the data (1). For ITS this was: adding or replacing ENT_IDs (3), adding or replacing other variables (1), correcting all variables across the dataset (3), any other correction of the data (0).
- Only one NSI used an additional method to validate the MDL-database regarding unit representation for ITGS: using additional information like EFR and demographic events in BD. For ITS no NSI used a complementary method.
- The main reasons for the output of the validation check for unit representation were:
  - Restructuring of enterprises;
  - Indirect exports;
  - Different use of enterprise groups;
  - Controlling companies of tax groups clustered in M and L (ITGS more than one SBS unit);
  - ITGS export includes the total value of goods, SBS turnover does not include this;
  - Transport related enterprises include the value of goods;
  - The reporting unit at enterprise level does not necessarily represent the same ‘true picture’;
  - Added enterprises result in more mismatches.
- Four countries corrected data in the MDL-database as a result of the unit representation validation for ITGS and two countries for ITS. For ITGS these corrections were as follows: adding or replacing ENT_IDs (4), adding or replacing other variables (2), correcting all variables across the dataset (3), any other correction of the data (2). For ITS this was: adding or replacing ENT_IDs (2), adding or replacing other variables (0), correcting all variables across the dataset (2), any other correction of the data (0).
- One country used an additional methodology to validate the MDL-database regarding outliers with demorelations in BR: additional checks were made based on STS information. Two participating NSIs used an additional method to validate the MDL-database regarding outliers without demorelations in BR. They used auxiliary variables from BR or BD, and consulted BD and BR experts.
- The main reasons for the output of the validation check for demographic change were either demographic events (like takeovers, mergers and deaths) or other reasons for fast growth.
- One NSI corrected data in the MDL-database as a result of the demographic change validation regarding outliers with demorelations in BR, and two countries corrected data due to the validation of outliers without demorelations.
  For the validation with demorelations these corrections were as follows: adding or replacing ENT_IDs (1), adding or replacing other variables (1), correcting all variables across the dataset (1), any other correction of the data (0). For the validation without demorelations this was: adding or replacing ENT_IDs (2), adding or replacing other variables (2), correcting all variables across the dataset (1), any other correction of the data (0).

Experiences:
- The overall feedback on the second phase of the project is positive: ‘good’ (4), ‘very good’ (3), ‘excellent’ (2). Some of the feedback was: “valuable insights”, “informative”, “learned a lot about own data”, “improved the quality of our (combined) data”, and “provided validation tables very good help”.
- Moreover, all participants agreed that the approach used to validate the MDL-database was suitable.
- The circulated guidelines were positively evaluated. The nine participants ranked them as ‘good’ (1), ‘very good’ (4) and ‘excellent’ (4). The guidelines were “clearly structured and comprehensive”, “informative”, and “good and easy to follow”, even though at times examples could have been added.
- The circulated syntax was positively evaluated. The nine participants ranked it as ‘good’ (4), ‘very good’ (3) and ‘excellent’ (2).
- The four validation checks produced mixed results and experiences. For one NSI a specific validation control was very valuable, while for another this check turned out to be less relevant (and vice versa): “especially no-match and unit representation made data differences more clear which we were able to correct” versus “consumed a lot of time with non-matching and unit representation, and in the end, nothing was done with this”.
- The micro data linking method used to produce the tabular output differs in most cases from the regular method that NSIs use to produce these tables.
  As a result, it is possible that (considerable) differences with the official data occur.
- Most countries agreed that this phase was more time-consuming than anticipated.
- Some minor mistakes were found in the syntax. These issues were solved promptly.
- Some countries were not able to include the independent / dependent variable distinction in the MDL-database.
- One country reported that some variable definitions and measurements differ when compared with each other.
- One country was unaware of the overall picture during the validation process. Imputations that were made at the beginning of the validation phase had to be overruled later on in the project.
- Despite the four validation checks on the micro data, not all implausible cases could be resolved.

Proposals:
- Define in advance which coherences need to be guaranteed.
- An evaluation should take place after each phase and not only at the end of the project.
- Provide more meta description in the SAS syntax, examples of datasets and instructions on importing files into SAS.
- Produce a printer-friendly version of the output tables.
- Find a solution for manually editing ENT_IDs that need to be corrected.
- Also add macro validation methods.
- The validation phase could be simpler and/or focused on the largest enterprises.
- Address the issue of the different approaches to enterprise groups in different statistics.
- Analyse possibilities to include ITS.

3.3 Conclusions and recommendations

The previous paragraph gave an overview of the experiences of the MDL 2014-2015 participants in phase 2, including the main validation results and proposals for further improvement. This served as input for the final discussion, held at the fourth task force meeting in Copenhagen.
The agreed conclusions and recommendations are presented below:

Conclusions:
- Overall there was positive feedback on the quality and timeliness of the circulated guidelines and syntaxes.
- The approach used was suitable for all participating countries in the project.
- Most countries reported that the validation of the national MDL-databases was time-consuming.
- It was an educational experience for all countries. The participants learned more about their own (micro) data.
- The four validation checks produced mixed results and experiences. For one NSI a specific validation control was very valuable, while for another this check turned out to be less relevant (and vice versa).
- The interpretations of the different national validation results were mixed. Some reasons were mentioned by many NSIs, while quite a few reasons for the validation output were mentioned by only one country (see also Annex III).

Recommendations:
- Define beforehand which consistencies of the output tables need to be guaranteed, both amongst themselves and in relation to official data from the NSIs.
- Provide more meta description in the SAS syntax and guidelines. This is particularly helpful for countries with limited SAS knowledge.
- Add macro validation techniques.
- Focus the validation on the largest enterprises.
- Evaluate after each phase of the project instead of only at the end of the project.

4. Phase III: Output

4.1 Introduction

In total the MDL 2014-2015 project produces three output deliverables, consisting of a set of tabular data for each topic, intended for publication on the Eurostat website, and a descriptive analysis in the form of a Statistics Explained article for each topic.

Two outputs are considered to have the potential of becoming future annual deliverables and at the same time supply users with new information about the following types of enterprises:
- Profiling SMEs versus large enterprises: the reason for choosing this theme is that SMEs are a focal point in European and national enterprise policy. Job creation by SMEs and the extent to which SMEs are directly present on global markets are of particularly high policy interest. The existence of SMEs and entrepreneurs is also considered a cultural value of high importance to maintain and promote.
- Profiling exporters versus non-exporters: the motivation for this output is that exports, and thus exporting enterprises, are considered crucial for European job creation and value added creation. It is therefore important to establish the necessary evidence of how exporting enterprises are performing compared to non-exporters.

The third output is more experimental in character due to the statistical registers used for producing this type of output or the approach, i.e. longitudinality:
- Longitudinal analysis of a panel of enterprises: this analysis utilizes the established database in depth, as it not only compares different datasets for each individual year but combines this approach with identity over time of the population of enterprises. From an enterprise policy perspective this analysis creates interesting evidence about which types of enterprises have best survived the economic crisis and which similarities or differences can be observed across countries.
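To make the idea of identity over time concrete, the sketch below follows enterprises across two reference years by their ID and records whether each one moved to a bigger size class, moved to a smaller one or stayed in the same class. It is a minimal Python illustration and not the project syntax. The headcount bands (fewer than 10, 10 to 49, 50 to 249, and 250 or more persons employed) are the commonly used EU bands and are an assumption here; the function names and sample data are hypothetical.

```python
SIZE_CLASSES = ["micro", "small", "medium", "large"]


def size_class(persons_employed):
    """Assign a size class from persons employed (assumed headcount bands)."""
    if persons_employed < 10:
        return "micro"
    if persons_employed < 50:
        return "small"
    if persons_employed < 250:
        return "medium"
    return "large"


def size_class_change(start, end):
    """Compare the size classes of enterprises present in both years.

    `start` and `end` map enterprise ID -> persons employed.
    Enterprises missing from either year are left out of the result.
    """
    changes = {}
    for ent_id in sorted(start.keys() & end.keys()):
        before = SIZE_CLASSES.index(size_class(start[ent_id]))
        after = SIZE_CLASSES.index(size_class(end[ent_id]))
        if after > before:
            changes[ent_id] = "grew"
        elif after < before:
            changes[ent_id] = "shrunk"
        else:
            changes[ent_id] = "same"
    return changes


# Hypothetical panel: persons employed in the first and last reference year.
panel_2008 = {"E1": 8, "E2": 60, "E3": 300, "E4": 20}
panel_2012 = {"E1": 12, "E2": 45, "E3": 320}

print(size_class_change(panel_2008, panel_2012))
# {'E1': 'grew', 'E2': 'shrunk', 'E3': 'same'}
```

Enterprise E4 drops out of the comparison because it has no record in the later year; in a real panel such cases would have to be examined as possible deaths, takeovers or ID changes rather than silently ignored.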
The next paragraph elaborates on the (methodological) choices made in the data analysis for the three output deliverables.

4.2 Data analysis

This paragraph lists the relevant decisions that were made in the data analysis regarding the (tabular) output.

Topic 1: SMEs
- A group head should be identified as a dependent enterprise and attributed to the size class according to the employment registered at enterprise level.
- Not all countries can split into dependent and independent; they will just fill out the total of the table.
- Portugal has its own definition of SMEs and would have liked to use it in this project. However, it was decided that the proposed definition of SMEs should be used by all countries.
- The definition of the employment size class will be based on persons employed. The Netherlands uses BR information to determine the size class, since the employment variable in SBS is not available for each enterprise as SBS is a sample (this applies to the other topics as well).
- It was decided to collect all three employment variables: number of employees, number of persons employed and FTE (this applies to the other topics as well).
- Regarding the variables personnel costs and wages and salaries, it was agreed to collect both (this applies to the other topics as well).
- Regarding the background paper on the SME part: export destination and import origin will be deleted as variables; the rest is agreed on.
- Regarding the NACE breakdown: we decided to skip A*38 and to go with sections. Section G will be split into three (G45, G46 and G47). A supplementary breakdown for analytical purposes will be used as well, dividing manufacturing into two (high tech and low tech (HTM and LMT)). The knowledge intensive business services (KIBS) will be identified as well.
- Some countries expressed concern regarding the country breakdown. It was decided to construct table SME 02 with Intra EU, Extra EU: North America, Asia and rest of the world.
  Furthermore, table SME 02 will be divided into SMEs and large enterprises, a total of two groups. The breakdowns on SMEs are kept in table SME 01.
- Countries can fix any problem they encounter in the way they find best, as long as it is documented in the methodological report.

Topic 2: Traders
- It was decided to remove the group information from the tables.
- Some countries do not have information regarding foreign affiliates; it was decided to add totals to table Trader 01 to overcome this problem.
- The conclusion from TF Helsinki was to have table 01 for both ITS and ITGS. We will stick to this decision, knowing the difficulties of ITS.
- Export/import intensity should be calculated after aggregating to the decided NACE breakdowns.
- We decided on the proposed exporter definition (described in the background paper).
- We decided to skip the A*38 breakdown, as with the SMEs. It was decided to use manufacturing further broken down into high tech and low tech, and G46 (Wholesale).

Topic 3: Longitudinal analysis
- The analysis is limited by the way the output tables have been structured. All value totals refer to the 2008 data and not to the other four years.
- However, by using the available output data an interesting article can be made with the focus on demographics and trading status of the 2008 cohort.
- The distinction dependent / independent was ultimately not included in the syntax and tables.
- A large number of enterprises are assigned to the ‘unknown’ category in the national output tables, and these numbers vary widely.
The analysis focuses only on the other categories.
- In the analysis an additional variable was introduced that categorizes the enterprises into those that grew to a bigger size class, shrank to a smaller size class or stayed in the same size class.

Regular tabular output

The 2014 MDL project has proven that it is possible to link micro data from various sources (SBS, FATS, ITGS, BR…) and to produce output (tables) broken down by ownership (control), trader status and size class. The first results will be presented in three Statistics Explained articles in 2015 and early 2016. These articles use the data that were compiled in the project.

A major purpose of the project is to test whether the micro data linking approach can produce (regular) statistical output that can be considered a supplement to the current annual deliverables of tables as part of SBS. Using the same data, Eurostat will produce a set of tables covering the topics below for the reference years 2008 to 2012. Eurostat wants to update these tables on an annual basis and asks countries to send data on a voluntary basis from 2013 onwards as soon as the data become available. A separate document requesting these data, listing the variables, breakdowns and transmission format, will be sent to the countries in the first half of 2016. These tables will form a basis and first set of experimental Economic Globalisation Indicators, which will be disseminated in the EGI section of Eurostat’s database. Participation is on a voluntary basis and does not impose any future obligations on countries.

Tables

We propose to publish five tables based on the output that Eurostat has already received from the countries, so that no new data would be needed. We realise that the breakdown into independent and dependent enterprises is not possible for all countries, but we have nevertheless asked for it because we feel that, as a new concept in statistics, it is important to show.

- Dependent and independent enterprises by size class and NACE.
  Source: SME01.
  Dependency: dependent, independent, unknown, all
  Size class: micro, small, medium, all SMEs
- Enterprises by size class and controlling country.
  Source: SME02.
  Controlling country: domestic, foreign, intra-EU, extra-EU, unknown, all
  Size class: SME, large, total
- Enterprises by trader status and control.
  Source: Trader01
  Trader status: exporter, importer, 2-way trader, non-trader, all
  Controlling country: domestic, foreign, unknown, all
  Destination: intra-EU, extra-EU, both intra and extra, total
  Both for goods (trader01a) and services (trader01b)
- Exporters by size class, destination and dependency
  Source: Trader02a and trader03 combined
  Size class: SME, large, total
  Destination: All (from trader02a) and further breakdown (from trader03) into EU15, EU13, Russia, Other Europe, China, India, Other Asia, North America, Brazil, ROW
  Dependency: dependent, independent, unknown, all
- Importers by size class and dependency
  Source: Trader02b
  Size class: SME, large, total
  Dependency: dependent, independent, unknown, all
- Dependent and independent enterprises by size class, NACE and control
  Source: SME01.
  Dependency: dependent, independent, unknown, all
  Size class: micro, small, medium, all SMEs
  Controlling country: domestic, foreign, intra-EU, extra-EU, unknown, all

Limitations MDL and national sources

From the beginning of the project it was agreed that we should adopt an output-oriented approach, meaning that we should try to define the output as early as possible (based on the most urgent/relevant policy questions). Therefore, preliminary proposals for analytical output were already presented in Lisbon in March 2014. The final output was agreed upon shortly after the Helsinki meeting in November 2014. As can be seen above, the first topic focuses on the characteristics and performance of SMEs, including the issue of independence. The characteristics of trading enterprises (including their turnover and employment) were central to the second topic.
The third topic concentrates on the development of the enterprises, using longitudinal analyses. However, the limitations of the national sources also played a role in defining the output. The most important limitations for defining the output are discussed below.

An important limitation is that not all NSIs have access to the micro data of all desired sources. Some national sources are not collected by the NSIs themselves but by another institution, such as the national bank. This is particularly the case with the International Trade in Services Statistics, Outward Foreign Affiliate Statistics, Foreign Direct Investments and Research & Development Statistics. Also, not all NSIs in the MDL 2014-2015 project participated in the non-mandatory International Organization and Sourcing Survey.

The use of different statistical units was another limitation for constructing an MDL database. For example, R&D data are collected using different statistical units (enterprises, but also enterprise groups), which makes micro data linking difficult. This is also often the case for OFATS.

Another limitation that needs to be (further) addressed in future MDL projects is the use of incomplete national sources. Estimating economic indicators from micro databases is challenging because of the number and size of the gaps caused by missing information. The report ‘Estimation methods for linked data sources: a review for the Micro Data Linking project’ (which serves as deliverable ‘D4.5 Methodological report on statistical approaches’ of the MDL 2014-2015 project) suggests and reviews several estimation methods, ranging from weighting to mass imputation, that could be used to deal with the missing data. Moreover, two preliminary case studies dealing with this issue, from Statistics Netherlands and Statistics Germany, are included in the paper. As that report stresses, it is important to take the issue of missing data into account.
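As an illustration of the weighting end of that range of methods, the sketch below grosses up a stratified sample to population totals using the inverse of the sampling fraction, the weight that Annex I lists as INV(n/N) for some surveys. It is a minimal sketch and not the procedure of any particular NSI: the function names, the strata and all figures are hypothetical, and it assumes full response within each stratum.

```python
from collections import defaultdict


def stratum_weights(population_counts, sample):
    """Return the weight N_h / n_h for every stratum h that occurs in the sample."""
    sample_counts = defaultdict(int)
    for unit in sample:
        sample_counts[unit["stratum"]] += 1
    return {h: population_counts[h] / n_h for h, n_h in sample_counts.items()}


def weighted_total(sample, weights, variable):
    """Estimate the population total of `variable` by summing weight * value."""
    return sum(weights[unit["stratum"]] * unit[variable] for unit in sample)


# Hypothetical frame: 100 small enterprises (4 sampled), 3 large ones (all surveyed).
population_counts = {"small": 100, "large": 3}
sample = [
    {"stratum": "small", "turnover": 10.0, "exporter": 1},
    {"stratum": "small", "turnover": 20.0, "exporter": 0},
    {"stratum": "small", "turnover": 30.0, "exporter": 0},
    {"stratum": "small", "turnover": 40.0, "exporter": 0},
    {"stratum": "large", "turnover": 500.0, "exporter": 1},
    {"stratum": "large", "turnover": 700.0, "exporter": 1},
    {"stratum": "large", "turnover": 900.0, "exporter": 1},
]

weights = stratum_weights(population_counts, sample)
estimated_turnover = weighted_total(sample, weights, "turnover")
estimated_exporters = weighted_total(sample, weights, "exporter")
observed_exporters = sum(unit["exporter"] for unit in sample)

print(weights)
print(estimated_turnover, estimated_exporters, observed_exporters)
```

In this made-up example only four exporters are actually observed, while the weighted estimate is considerably higher. A count based on observed units alone therefore understates the population figure whenever part of the population is covered by a sample.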
As an example of missing data, some countries have missing micro data in their Structural Business Statistics. The reason is that the SBS in these countries is split up into multiple surveys, divided by NACE sections. Most of these SBS surveys are stratified random samples of the population. As part of the regular data production for SBS, the sample data are extrapolated to population totals by weighting with survey weights. It is important to be aware of this complication, as in the MDL 2014-2015 project the size class of enterprises was based on the employment variables of the Structural Business Statistics. Another complication was that the implementation of the trader definition was also based on the SBS (turnover is needed for export intensity, and purchases of goods and services for import intensity).

Therefore some NSIs needed to find tailor-made solutions to overcome these complications. Statistics Netherlands, for instance, used BR variables to assign a size class to all enterprises. Another solution for dealing with missing values could be the use of a complementary (country-specific) source, one that is not necessarily included in the scope of the project. Statistics Netherlands reported that it was difficult to implement the trader definition successfully, because many (largely smaller) enterprises were not in the SBS sample and hence not available at micro level. The strict use of the trader definition led to a significant decrease in the number of enterprises labelled as traders in the MDL database, since all non-observed units were automatically classified as non-traders. The use of additional secondary sources could tackle this issue.

A last important limitation regarding the output of the MDL project is possible methodological changes in the national sources that influence the interpretation of the data, for example breaks in time series due to definition changes, changes in data collection strategy, et cetera.
In the Netherlands, for example, the definition of the enterprise group was altered in 2009, causing a break in the time series for the dependency characteristic of enterprises. Similarly, an improvement in the data collection for international trade in goods statistics led to an increase in the number of trading enterprises in 2014, since all (very) small traders are now also accounted for. In this case the sudden change is largely attributable to better data collection, not to abrupt changes in the ‘real world’.

Annex I: Background information sources

AUSTRIA

Data sourceBRSBSITGSITSCISICTeCIFATSOFATSBDGVCStorage units (administrative and/or statistical)legal unit = administrative enterprisexxxxxxxxxType of information availablelegal id is availablexxxxxxxxxUpdate frequency BRDynamic updates; frozen frames (yearly)xxxxxxxxxInformation on demographic relationsInformation on predecessor/successor is availablexxxxxxxxxConsiderable changesN/AxxxxxxxxxStatistical unitxxIntrastat: taxable persons; Extrastat: all transactions declared to Customsenterprisesenterpriseenterpriselegal unitlegal unitlegal units; not directly possible to identify SBS units in BD; new panel does#naCoverage compared vs SBSxxno limitations NACE breakdownno limitations NACE breakdown10+, NACE (according Regulation)10+ employees; see excel file includedIFATS frame defined by SBS population (including NACE K)NACE B to S excluding O (reporting units = resident units; including natural persons)B-S excl 64.2 NACE rev 2#naCompilation principles* Cut-off limitsxIntrastat: 300k € (2007-2009); 500k€ (2010-2012); Extrastat: none€ 50.000 and accordingly € 200.000 depending on NACE classificationenterprises 10+ 10+according to SBS; for small units country of control is country of direct foreign owner, not that of the UCIMin investment 100k € equity or 1M € balance sheet total affiliate10K € turnover or at least 1 empl#na* Extraction populationxIntrastat: Tax authorities supply information for 
Intrastat register; Extrastat: Custom declarationsBusiness RegisterBRBRT-1 data, FDI data, admin dataT-1 data, FDI, media reportsBR and supplementary sources#na* Suppl/compl dataxsecondary VAT dataN/Anon used#nashareholder data from company register-VAT, Social Security#na* Sampling strategyxN/Acut-off stratified samplingcensus 250+; sampling 10-249;census 250+; stratified random sample: 1/3 of 50-249, 8% of 10-49censuscensuswork in progress#na* Estimation methodxIntrastat: using secondary VAT for below threshold trade and nonresponse; Extrastat: nonemodel-based estimation by completing the underlying log-normal distribution; as of year 2012 use of VAT Information Exchange System for below threshold and non-response unitsweighting INV(n/N)simple stratified weighting (#enterprises, employment, turnover)according to SBST-1 data; no grossing up for foreign data as total population of foreign affiliates is unknownwork in progress#na* Imputation methodxN/AN/Amean imputation, nearest neighbour, historical datanearest neighbour; field imputationsaccording SBST-1 datawork in progress#naChanges 2008-2012xThreshold changesno changesexpansion of population in 2012Change in NACE coverageno major changes introducedno major changes introducednew method BD under development (BD at micro data level) affecting series 20008-2012#naDENMARKData sourceBRSBSITGSITSCISICTeCIFATSOFATSBDGVCStorage units (administrative and/or statistical)Administrative units are linked to legal units and statistical (enterprise) units Type of information availableAdministrative identification, legal identification and enterprise level identification available for 2008-2012, group level from 2009 Update frequency BRThe BR is dynamic and updated all the time. Information on demographic relationsAll demographic relations are stored in the BR. Limited information on enterprise groups. 
Considerable changesenterprise group ID is only available from 2009.NoNoNoNoNoNoNoNoNoStatistical unitLegalEnterpriseAdministrative units connected to legal units and enterprise level.Legal units are used as reporting units in ITS. These units can be linked directly to SBS dataEnterprise levelenterprise unit. Enterprise The statistical unit in OFATS is the foreign affiliate, the reporting unit is the Danish enterprise (=legal unit).The statistical unit is the enterprise unit also used in the SBS.Both SBS and GVC are available on enterprise level.Coverage compared vs SBSCovers all sectors private and publicSBS covers the private non-financial market economySBS covers the non-financial market economy. In principle, the ITGS covers all NACE sectors, all legal forms and all size classesall size classes and NACE divisions are covered. See enclosed Annex 1The ICT usage and e-Commerce Survey covers only private firms with at least 10 FTE. IFATS covers all enterprises in the SBS.OFATS covers all enterprises in all size classes and UCI=all.OFATS covers foreign affiliates both extra and intra EUall sectors and all size classescovers enterprises in the private business sector. The survey covers all enterprises with 50 or more employees, and in the case of Manufacturing and Business Activities also enterprises with 20-49 employeesCompilation principles* Cut-off limitsNoOnly active enterprises are included in the SBS. The activity for enterprises should equal at least 0.5 FTE in one reference year in order for an enterprise to be active. The ITGS micro data is collected from three sourcesNo cut-off has been applied to ITSEnterprises with less than 2 employees, for some activities less than 6 or 10 employees, cf. Annex 1Enterprises with 10+ FTEIFATS is a census and covers all active enterprises in the reference year. Input data is fiscal dataNoneOnly active enterprises are included in the BD. 
The activity for enterprises should equal at least 0.5 FTE in one reference year in order for an enterprise to be active. However, the new enterprises only have to present half of this activity in the first yearSee above* Extraction population-BRAll enterprises within the threshold are included.The most important source of the statistics is a combination of monthly reports from approx. 350 firms and annual (prior to 2009 quarterly) reports from approx. 1300 firms. Business RegisterThe population is extracted from all active enterprises at the time of sampling (end of the year prior to the reference year) from the Business Register.-No sample survey technique or cut off limit is has been introduced. BRPopulation extraction was based on all active enterprises within the relevant size groups and NACE codes in 2009* Suppl/compl data-BR, Tax authorities, Danish Medicines AgencyNoneAnother important source is interviews with travellersOther smaller supplementary sources are: 1. government services 2. transport element from goods trade (CIFFOB) 3. services not performed by enterprises Information from the previous data collection.Apart from Business Register background information, no-N/ASBSThe survey is questionnaire based. In validating the reported data, a number of administrative sources and register data have been used, notably the Foreign affiliates statistics and Employment statistics for employees* Sampling strtgy-About 7.500 enterprises in sample based on employment size-classes. (0-4 emp: 0% in sample, 5-9 emp: 10% in sample, 10-19 emp: 20% in sample, 20-49 emp: 50% in sample, 50+ emp: 100% in sample).None, data is extracted from registersThe ITS population consist of the ~350 monthly reporters covering roughly 2/3 of the ITS trade, and a sample of ~1300 yearly reporters, that are stratified by activity and size. 
The activity classes and size classes are customized to optimize the sample for greater accuracy for the total trade of services, and not to represent the different activity classes and size classes in the economy. Updated roughly every 5 year.? Coverage of activities and size classes according to EU regulation and extended due to national demands. ? All with 100(+) employees are included. ? All enterprises with 2 or more employees in the activity R&D are included.? All enterprises that have stated a minimum of R&D- or innovation expenses in the previous years’ data collection are included.? Rest of the population: drawn as a rolling sample where app. ? to 1/5 of the respondents are replaced each yearStratified random sample made by number of FTE and NACE activity.-N/AOnly register data usedCensus* Estimation method-Data - especially for small enterprises - is estimated based on employmentNone is usedThe trade reported directly by the monthly and yearly reporters cover roughly 68 % of the total ITS and the estimated trade cover roughly 12 % of the total ITS. The supplementary sources cover roughly 20 %.Different methods of estimation of missing values are applied, Data is raised with SAS Clan procedure taking strata (FTE, NACE activity) into account.-N/ANo estimationN/A* Imputation method-Missing or implausible data is imputed based on information from other periods or donor imputation.None is usedImputation is mostly used after the sample is updated and no prior data for a given enterprise is available. Imputation of total records is applied only to enterprises with 250 or more employees, if it has proven impossible to get a response. No imputation for unit or item non-response-N/ANo imputationUnit non-response has been approached through donor imputation. The unit non-response is very small.Changes 2008-2012NoNONo considerable changes in ITGS between 2008-2012.The new frame, effective from 2009, cf. 3.2, does not establish breaks in the usual sense of the word. 
The revisions are limited to those derived from the updating of the sampleThe 2012 statistics are published as preliminary numbers. The reference years 2007-2011 are produced as final statisticsThe financial sector is only included in 2008, 2009 and 2010No considerable changes during the periodChange in NACENo considerable changesThe survey covers the period of 2009-2011. Some questions concern end of 2011, e.g. foreign affiliate employment by geographical areas and business functions. 2008 is not covered.FINLANDData sourceBRSBSITGSITSCISICTeCIFATSOFATSBDGVCStorage units (administrative and/or statistical)Administrative level; in future also for the enterprise unitxxxxxxxxxType of information availablelegal units, enterprise groups, ultimate controlling institutional units, foreign affiliates, and basic information as employment and turnoverxxxxxxxxxUpdate frequency BRnew BR 2 relational databases (1 constantly updated, 1 frozen). Old BR one relational, constantly updated.xxxxxxxxxInformation on demographic relationsYes, and for all units. 
Births, change of legal form, continuation of activity, bankruptcy, fusion, merger, and other such data are available from tax administrationxxxxxxxxxConsiderable changesN/AxxxxxxxxxStatistical unitxxLegal unitslegal unitslegal units; no problematic linkinglegal unitslegal unitslegal unitsLegal units (17% not in SBS)legal unitsCoverage compared vs SBSxxSmallest importers and exporters are not coveredSBS thresholdspartial coverage NACE compared to SBS; enterprise 10+small differencesIFATS not subset B-N_S95_x_KOFATS not subset to B-N_S95_x_Kmore units than is SBS (all industries, all sizes)thresholds SBS and ISCompilation principles* Cut-off limitsx200-500k€ depending year/variable1M€ based on VATenterprises 10+10+ employeesN/AN/Aenterprises that pay VAT, and/or employ people100+* Extraction populationxIntrastat and Custom declarationsTraders T-1 plus new traders VAT above thresholdfrom SBSSBSBR and SBSBRBR and tax administration recordsfrom SBS* Suppl/compl dataxVATVAT dataSBS and R&D dataBRGroup structures from annual reports/financial statementsGroup structures from annual reports/financial statementsN/AFATS, web information* Sampling strtgyxN/AStratified sampling based on ownership, employment and industrycensus 250+; sampling 10-249;census 100+/ stratified random sampling 10-99censuscensusno samplingcensus * Estimation methodx2-5 % depending on year/variableApprox. 
4% based on samplingN/ASAS Clan procedureN/AN/AN/AN/A* Imputation methodxBased on VAT for below threshold trade1% imputedmedian values depending size class/activity; imputation 1-2 % on totalsnegligibleN/Aestimate not availableN/Aweights applied to observed unitsChanges 2008-2012xN/AIncrease of sample sizeadditional NACE Dic 86,78,88 in 2010N/AN/AN/Anoneref period 2009-2011GERMANYData sourceBRSBSITGSITSCISICTeCIFATSOFATSBDGVCStorage units (administrative and/or statistical)BothxxxxxxxxxType of information availablelegal units and local units with administrative id and enterprise idxxxxxxxxxUpdate frequency BRyearly; after finishing a cycle a final dataset is storedxxxxxxxxxInformation on demographic relationsN/AxxxxxxxxxConsiderable changesN/AxxxxxxxxxStatistical unitxxEnterprise (legal units)#na#naenterprisesEnterprise#naEnterprise (= legal unit)#naCoverage compared vs SBSxxall sectors/all size classes#na#naNACE sections C-N +S95 including K.NACE Rev.2 Sections B to N and Division 95, excluding K#naSection B to N & P to S#naCompilation principles* Cut-off limitsxIntrastat: 300k € (2008); 400k(2009-2011); 500k (2012); Extrastat: none#na#na10+ employees; shorter questionnaire for <10 employees#na>17,500 € taxable turnover or >= 1 employees subject to social insurance #na* Extraction populationxComplete count ( Intrastat: Trade register + Tax register; Extrastat: Custom declarations)#na#naSample from BRBR, Bureau Van Dijk#naComplete count according to Business Register#na* Suppl/compl dataxN/A#na#naN/ASBS, administrative data from BR#na#na* Sampling strtgyxN/A#na#nastratified random sampling (stratified by federal states, NACE, employment size classes)census#naN/A#na* Estimation methodxN/A#na#naEstimation basing on regressionsN/A#naN/A#na* Imputation methodxN/A#na#nat-1 data, means, auxiliary attributesT-1 data, imputation by arithmetic means, multiple imputation (from 2012)#naIf no information from BR or SBS about the persons employed the characteristic is 
estimated#naChanges 2008-2012xThreshold changes#na#naSwitch to NACE Rev. 2 in 20092012: new data provider (Bisnode) for enterprise group information #naN/A#naLATVIAData sourceBRSBSITGSITSCISICTeCIFATSOFATSBDGVCStorage units (administrative and/or statistical)BothxxxxxxxxxType of information availablefollowing Regulation 177/2008 xxxxxxxxxUpdate frequency BRMonthly updates; frozen frame for statistical coordinationxxxxxxxxxInformation on demographic relationsPartly available in BRxxxxxxxxxConsiderable changesImprovement maintenance EGxxxxxxxxxStatistical unitxxIntrastat: enterprise (VAT number)Extrastat: enterprise (VAT or EORI Number) all transactions declared to Customs under special trade systementerprisesenterprise enterpriseenterpriseenterpriseenterprise (3 groups: sole proprietor, ltd liability, partnerships)enterpriseCoverage compared vs SBSxxall NACE/all size classesall NACE/all size classesNACE according regulationNACE C-N, excl K, incl Div 95; t-1NACE B-N, excl K, incl Div 95NACE B to S excluding O B-S excl 64.2 NACE rev 2NACE B-N excl K; 100+Compilation principles* Cut-off limitsxN/AIntrastat: LVL 49-100k variable depending on yearExtrastat: noneDepending on the type of serviceenterprises 10+10 + employeesN/AN/AN/A100+* Extraction populationxBRIntrastat: all enterprises within thresholdsExtrastat: customs declarations (SAD)Quarterly reports: transportation services (400 respondents), other services (300 respondents)BRBRBRBR _economically active Latvian enterprises with a subsidiaries or branches abroadN/ABR (active enterprises 2009)* Suppl/compl dataxAdministrative source (Data from The State Revenue Service of Latvia)N/AData collection is mixed system, surveys supplemented with ITRS. Travellers survey, Central Bank and MFI profit and loss on enterprises received EU fundingBREGRCompany annual reports, European Business Register information. 
N/AFATS, SBS* Sampling strtgyxStratified random samplingN/ASignificant exporters/importers fully covered, other by using threshold depending on the type of servicesampling & censusstratified random samplingCensusCensusN/ACensus* Estimation methodxNeyman Intrastat estimates based on VAT for nonresponse and below threshold trade disseminated by CN section level and by partner country; Extrastat: noneN/AWeighting INV(n/N)Horwitz-ThompsonN/AN/AN/AN/A* Imputation methodxEnterprises, which are surveyed exhaustively and, which did not respond with statistical survey as well as non-sampled enterprises, which are covered by mass imputation, are imputed using the data from administrative sources. Imputation of other statistical survey data, data of previous period (with correction) or donor data imputation is done if no administrative data have been found. Non-response for enterprises, which are included in the sample survey part are corrected in the weighting procedure.N/ASurveys data supplemented with ITRS data N/AN/AN/ACompany annual reports, European Business Register information. 
N/Aimputation from SBS and FATSChanges 2008-2012xThreshold changesThreshold changesChanges of sampling frame additional coverage NACE 59,60,72,73N/ANoNononeref period 2009-2011 (2008 not covered)THE NETHERLANDSData sourceBRSBSITGSITSCISICTeCIFATSOFATSBDGVCStorage units (administrative and/or statistical)BothxxxxxxxxxType of information availablelegal identification, enterprise unit, local unit, structure and relations enterprise groupsxxxxxxxxxUpdate frequency BRMonthly updates; frozen frame for statistical coordinationxxxxxxxxxInformation on demographic relationsbegin/end, type of demographic eventxxxxxxxxxConsiderable changesChanges in 2009 (NACE Rev 1.1 to NACE Rev.2; new definition of enterprise and enterprise group)Changes in 2009 (NACE Rev 1.1 to NACE Rev.2; new definition of enterprise and enterprise group)xxxxxxxxStatistical unitxEnterprise and enterprise groupVAT numberenterprises and enterprise groupsenterpriseenterpriseenterprise enterprise groupenterpriseenterpriseCoverage compared vs SBSxxSBS is subset (ITGS covers all NACE)SBS subset for ITSS10+10+ employees NACE Rev.2 Sections B to N and Division 95, excluding K (from 2008) NACE B to S excluding O (from 2010)Section B to N, excluding K64.2. incl. 
S95(NACE Rev.2)SBS business economy excl K; 100+Compilation principles* Cut-off limitsxNot applicableIntrastat: 900.000 euros (until 2013); 1,5 Million (as of 2014)Not applicable enterprises 10+10+ employeesN/ABalance total in T-1 and T-2 > 23 Million EurosN/A100+ persons employed* Extraction populationxBRIntrastat: Tax register; Extrastat: Custom declarationsBRfrom BRBRT-1 dataT-1 dataBR (additional checks for economic activity by using VAT, tax, employment data)BR* Suppl/compl dataxSmall enterprises' values are estimated partly by using VAT-tax dataVAT, VIES (as of September 2013), historical dataCentral Bank, Banks,SPE's, Travels, time series estimationsN/AN/ASBS, CIS, Employment statistics, external sources (registers)annual reports of enterprises ; tax data; financial data known in other internal statisticsN/ASBS* Sampling strtgyxYearly sample where large enterprises are more likely to be in sampleIntrastat: all VAT numbers trading above threshold; Extrastat: total collectionLarge ITS trades full coverage, other by yearly sampleNACE 2 digit and size classstratified random samplingcensusall enterprises in statistic ''finances of large enterprises''N/Astratified sampling (size class, NACE)* Estimation methodxweighingBased on VAT and VIESDirect estimationweightingweightingN/Aannual reports of enterprises N/AN/A* Imputation methodxImputation algorithms based on T-1 data and extrapolation of VAT trends.Based on historical reportsImputation algorithms depending on data sourcemerging of strata; donor imputationnearest neighbour; T-1 datausing suppl/compl. 
Dataannual reports of enterprises N/Amissing values were replaced by ''unknown''Changes 2008-2012xChanges in 2009 (NACE Rev 1.1 to NACE Rev.2; new definition of enterprise and enterprise group)Threshold changesNo changesImputation manually (since 2012) instead of automaticallyChanges in 2009 (NACE Rev 1.1 to NACE Rev.2; new definition of enterprise and enterprise group)Changes in 2009 (NACE Rev 1.1 to NACE Rev.2; new definition of enterprise and enterprise group)Changes in 2009 (NACE Rev 1.1 to NACE Rev.2; new definition of enterprise and enterprise group)Changes in 2009 (NACE Rev 1.1 to NACE Rev.2; new definition of enterprise and enterprise group)ref period 2009-2011 (2008 not covered)NORWAYData sourceBRSBSITGSITSCISICTeCIFATSOFATSBDGVCStorage units (administrative and/or statistical)Administrative levelxxxxxxxxxType of information availableLU id, Ent id, EG idxxxxxxxxxUpdate frequency BRDynamic/working with provisional datasetsxxxxxxxxxInformation on demographic relationsrelationship LKAU and enterprise, date change ownership. Enterprise = legal unitxxxxxxxxxConsiderable changesN/AxxxxxxxxxStatistical unitxxenterpriseenterprisesenterpriseenterpriseenterpriseenterpriseEnterpriseenterpriseCoverage compared vs SBSxxSBS is subset (ITGS covers all NACE)SBS subset for ITSSSize class: 5 employees and more, NACE: A-K, M (with some exceptions)#naNo differencesNo differences. no differenceNace B-N excel KCompilation principles* Cut-off limitsxExtrastat, no cut off of importance, only single item lines less than NOK 1.000#na50 employees (but random sample of smaller enterprises, 5+ employees2008: 5 or more employers, 2009-2010: 10 or moreNoneno 'total' OFATS populationno100+* Extraction populationxExtra: customs declarations. Separate collection exp crude oil natural gas, ships and elec. current both imp/expQuarterly 3500 as from 2013 q1, previously app 3000BRNace 16-6, 68-74, 77-82 +95.1. 
Only mainland included (not covered is 0.05)BRsample survey together with outward FDI. Population based on annual reports and other sourcesBRactive enterprises 100+ 2009* Suppl/compl dataxsee aboveTravellers survey, and other sourcesmainly survey data. Number of employees and turnover from BRN/AEGR from 2010N/Ano#na* Sampling strtgyxExtrastat: total collectionextensive information availableRandom sample of enterprises with 5-49 employees stratified by NACE and size class. In general 15% sample for enterprises with 5-9 employees and 35% for enterprises 20-49. Smaller sample in some specific NACEsample 10+ employers, stratified by industry and size of enterprise measured by employment; 10-19 p 7,5%, 20-49 p 15%, 50-99 p 50%, 100-249 p 75% and 250+ allIFATS: part of SBSSee extraction of populationno samplingcensus* Estimation methodxN/Asimple stratified inflation of the sampleRatio estimator for enterprises with 5-49 employees. Weights based on number employees (nominal values) and number of variables (qualitative variables)N/ADirectorate of Taxers'register, press and internetNo grossing-up is conductednot relevantN/A* Imputation methodxN/aUse t-1 data if availableLow degree of imputation due to high unit response rate and also high item response rateN/AIFATS: according SBSNo delivery and present in T-1, enterprises of a certain size will be imputednot relevantAll enterprises from NACE-rev2 B-N excl K with more than 100 persons employed. This covering almost all enterprises within non-financial sector by the end of year 2009Changes 2008-2012xNo major changesin 2010/2011 changes concerning sampleIn order to make a more comparable time series for non-financial services we have included figures for shipping services from SBS in the years where they are not included the usual data collection for non-financial services. 
From 2012 and onwards the maritime services figures are again included on a regular basis in the non-financial services statistics and the relation between reported and published figures is as expected. Figures for External trade used in the MDL project are imports excluding ships and oil platforms For exports excl. ships, oil platforms, crude oil, natural gas and condensate. #na2008: 5 or more employers, 2009-2010: 10 or moreUse of EGR improved coverageNo considerable changesnoneref period 2009-2011PORTUGALData sourceBRSBSITGSITSCISICTeCIFATSOFATSBDGVCStorage units (administrative and/or statistical)BothxxxxxxxxxType of information availablelegal identificationxxxxxxxxxUpdate frequency BRdaily update. Dynamic registerxxxxxxxxxInformation on demographic relationsbegin/end, type of demographic event, relation startxxxxxxxxxConsiderable changesN/AxxxxxxxxxStatistical unitxxVAT person#naenterprise(=legal unit)enterprise(=legal unit)enterprise(=legal unit)#naenterprise(=legal unit)enterprise(=legal unit)Coverage compared vs SBSxxnatural persons and non-resident enterprises#nasample, 3 groups size class ge 10 empl; NACE B-F, H, J, K, M, Qall size classes; NACE, turnover; acc. 
Reg 808/2004 | No differences | #na | no differences | SBS excl K; 100+
Compilation principles
* Cut-off limits | x | Intra-EU: 250k/350k; Extra-EU: none | #na | all enterprises surveyed > 9 empl. | turnover quantile 10% | N/A | #na | no | 100+
* Extraction population | x | Extra: SAD; Intra: BR and VAT | #na | BR | SBS reference frame | SBS enterprises under foreign control | #na | BD produced under SBS data | active enterprises 100+ 2009
* Suppl/compl data | x | T-1 data Intrastat | #na | admin data from IES | Admin data IES | Administrative data; EGR data from 2011 on | #na | N/A | #na
* Sampling strtgy | x | all enterprises above annual exemption | #na | Strata NACE, 10+ empl., ec act., size class, NUTSII | stratified random sampling | Census | #na | no sampling | random sampling
* Estimation method | x | extensive information available | #na | weighting INV(n/N) | weighting (#enterprises, turnover, employment) | N/A | #na | N/A | simple grossing
* Imputation method | x | extensive information available | #na | hot/cold deck imputations, mean imputation, trimmed means | re-weighting (unit non-response) and admin information | first owner or Portugal | #na | N/A | N/A
Changes 2008-2012 | x | change in methodology 2010 | #na | none | change NACE Rev1.1/NACE Rev2 | Use of EGR improved coverage | #na | none | N/A

SWEDEN
Data source | BR | SBS | ITGS | ITS | CIS | ICTeC | IFATS | OFATS | BD | GVC
Storage units (administrative and/or statistical) | Both | x | x | x | x | x | x | x | x | x
Type of information available | legal identification, enterprise unit, local unit, structure and relations enterprise groups | x | x | x | x | x | x | x | x | x
Update frequency BR | weekly updates; statistical use: | x | x | x | x | x | x | x | x | x
Information on demographic relations | takeovers etc. using historic information | x | x | x | x | x | x | x | x | x
Considerable changes | inclusion non-resident enterprises and inactive businesses with elderly owners and which has sole proprietorship. | x | x | x | x | x | x | x | x | x
Statistical unit | x | activity/enterprise | enterprise (not all VAT numbers match BR) | legal units | enterprise | enterprise | legal units | legal units | enterprise | legal units
Coverage compared vs SBS | x | x | Smallest importers and exporters are not covered | ITSS covers all sectors | Sample for enterprises 10-249 empl. Census 250+ employees and NACE 72. 
NACE covered in CIS: 05-09, 10-33, 35-39, 46, 49-53, 58, 61-63, 64-66, 71-72. CIS2012 also covered NACE 47, 59, 60, 73 | 10+ empl; NACE | whole business sector | whole business sector | SBS including section K | SBS excl K; 100+
Compilation principles
* Cut-off limits | x | All enterprises are surveyed. | Intrastat: SEK 4.5 M a/d; Extrastat: none | 1M SKR Turnover reporting | all enterprises surveyed > 9 empl. | 10+ employees | no | no | no | 100+
* Extraction population | x | BR | Intrastat: Tax Authority; Extrastat: Customs Authority | BR, ITGS, FATS, SBS, T-1 information | BR | BR | BR | BR | BR active 10+ days | BR
* Suppl/compl data | x | SBS using compl information from the Swedish tax authority | see above | Banks, Exchange Offices, Card Companies | turnover from SBS and Survey Financial enterprises | VAT statistics | some info from Bisnode/MM partner database | some info from Bisnode/MM partner database | Bisnode, financial statements | Admin data, OFATS
* Sampling strtgy | x | Census of all enterprises. SBS also uses surveys on specific variables as cost, revenue and investments. | Intrastat: cut-off; Extrastat: total collection | census and sampling | NACE 2 digit * size classes (681 strata, n=5431) | stratified random sampling | Register based | Register based | not relevant | random sampling
* Estimation method | x | PPS sampling technique applies on cost/revenue survey. A cut off method applies for the investments survey part of SBS. | Intrastat: monthly aggregate; Extrastat: quarterly estimate invoiced value | Horvitz-Thompson (90% of values) | Horvitz-Thompson on variance | Horvitz-Thompson | not relevant | not relevant | not relevant | Horvitz-Thompson
* Imputation method | x | Mean value imputation method applies where missing values are detected. Industry and firm size are the stratification variables. | Intrastat: 10 different automatic imputation methods for estimating the unit non response including the actual VAT value and forecasting methods (ES, ESM, Regression, AR-regression) and one non-automatically made imputation method for those PSI's where the reported data is not complete. 
Extrastat: none | Cold deck & reweighting | hot deck imputation in CIS 2012, Banff imputation method was used in CIS2010. Regarding CIS2008 Eurostat developed a SAS-application that, for instance, controlled if the data had illogical answers and imputed for missing values. | re-weighting (unit non-response) | using T-1 data where missings; <5% imputed for appr. 15% employment stock | using T-1 data where missings; <5% imputed for appr. 15% employment stock | not relevant | cold deck; sometimes hot deck
Changes 2008-2012 | x | none | substantial process improvements | Sample increase 2012 | Additional coverage NACE 47, 59, 60, 73 in 2012 | none | none | Altered definition of concern in year 2010. The definition of Swedish concern include all affiliate firms with or without employees abroad. Also, the industry code for the Swedish affiliate abroad changed in year 2010. Both these changes follow the EU regulations: EU-regulation 716/2007. | Inclusion non-resident enterprises and inactive businesses with elderly owners and which has sole proprietorship. | none

Annex II: Summary of country methodological reports
AUSTRIA | DENMARK | FINLAND | GERMANY | LATVIA | NETHERLANDS | NORWAY | PORTUGAL | SWEDEN

Phase I: Experiences with the building of the MDL-database

1. How would you evaluate the circulated guidelines and SAS-syntaxes for phase 1?
AUSTRIA: Mistakes in guidelines
DENMARK: Guidelines function very well; Some mistakes corrected in syntax; Different versions of software and different software biggest issue.
FINLAND: A proposal for aggregated data sets for confidentiality checks was rejected by the project leaders. 
Unaware what consequences are for confidentiality.
GERMANY: Clear structure and visualizations; Instructions clear; Some variable definitions could have been clearer.
LATVIA: In general clear, but some more descriptions in SAS-syntax would be helpful
NETHERLANDS: Guidelines were clear, even for NSI without much SAS-knowledge; Updates in syntax caused considerable extra time.
NORWAY: ?
PORTUGAL: Guidelines exhaustive, detailed and comprehensive; Syntax efficient despite some minor mistakes.
SWEDEN: Syntax easy to adapt and apply; Instructions clearly written; Perhaps limit dataset to most relevant variables.

How do you evaluate the circulated guidelines?
AUSTRIA: 5=excellent
DENMARK: 5=excellent
FINLAND: 4=very good
GERMANY: 5=excellent
LATVIA: 3=good
NETHERLANDS: 4=very good
NORWAY: 3=good
PORTUGAL: 4=very good
SWEDEN: 5=excellent

How do you evaluate the circulated SAS-syntax?
AUSTRIA: 4=very good
DENMARK: 3=good
FINLAND: 3=good
GERMANY: 4=very good
LATVIA: 3=good
NETHERLANDS: 4=very good
NORWAY: 3=good
PORTUGAL: 4=very good
SWEDEN: 5=excellent

2. Did you find the approach to build the MDL-database used suitable? Would you have preferred a different approach?
AUSTRIA: ?
DENMARK: Detailed approach was beneficial but very time consuming
FINLAND: Approach suitable, but appreciated closer collaboration, for instance to solve confidentiality issue
GERMANY: Suitable
LATVIA: Suitable
NETHERLANDS: Satisfied with approach; Would have preferred to work in SPSS environment.
NORWAY: Quite suitable. But include some useful variables (like name, address, etc.) when building database.
PORTUGAL: Suitable
SWEDEN: Suitable

Summarize your answer by tick marking the table
AUSTRIA: 5=strongly agree
DENMARK: 4=agree
FINLAND: 4=agree
GERMANY: 5=strongly agree
LATVIA: 5=strongly agree
NETHERLANDS: 4=agree
NORWAY: 4=agree
PORTUGAL: 4=agree
SWEDEN: 5=strongly agree

3. Did you encounter any problems/difficulties with transforming the variables of the national data sources into the standardized manner, as listed in Annex A of the guidelines? Please also report how these issues were solved.
AUSTRIA: No problems
DENMARK: Most time was spent on understanding data and variables and which to include. Once this was done, entering in database and editing with syntax was simple.
FINLAND: Some variables not included. 
Most likely specific to Danish circumstances.
GERMANY: Unclear definitions MDL variables, especially BR; references to regulation would have been helpful; some variables not included
LATVIA: Not all enterprises have CN08; CIS as proportions and rounded to nearest figures.
NETHERLANDS: No optimal use of SAS-syntax due to financial/policy constraints regarding use SAS; Rewriting syntax in own environment time-costly
NORWAY: No
PORTUGAL: Making new variables burdensome; Building of CIS, ICT and GVC very burdensome to make but not used in the MDL project.
SWEDEN: No

4. Did you encounter any problems/difficulties when running the SAS-syntaxes? Please also report how these issues were solved.
AUSTRIA: Mistakes in syntax were identified and reported.
DENMARK: Syntax worked. Syntax for phase one very time consuming, other phases less.
FINLAND: SAS code adjusted to fit our different SAS architecture.
GERMANY: SAS code adjusted to fit our SAS version
LATVIA: Error caused by different SAS-version
NETHERLANDS: Several updates made after errors/mistakes in syntax; Individual assistance by DK with country specific errors.
NORWAY: Different Operating Systems
PORTUGAL: Different SAS versions and minor mistakes
SWEDEN: No. All problems were promptly solved by consulting project coordinators.

5. Did you encounter any other problems when building the MDL-database? Please also report how these issues were solved.
AUSTRIA: No.
DENMARK: Almost impossible to build this system and make it work for all the first time.
FINLAND: No
GERMANY: No
LATVIA: Difficulties with importing GVC data from Excel format; dates were converted wrongly
NETHERLANDS: Technical problems: work computers not equipped for running SAS (lack of temporary storage and running space), limited SAS-licences, only limited SAS version available
NORWAY: No
PORTUGAL: Gaps in data and non-availability of data. Gaps in data solved with imputations, however these did not add up to Eurostat totals and as a result not imputed after all; No ENTgrp_ID available in BR.
SWEDEN: No

6. What is your overall assessment of phase 1 of the project? 
AUSTRIA: Good.
DENMARK: For the new project leaders hectic, but good decisions were made
FINLAND: Building data sets time consuming; Changes in staff
GERMANY: Satisfied
LATVIA: At national level review of data updating methodology (double entries, NACE changes) required
NETHERLANDS: Satisfied, besides technical issues.
NORWAY: Good, except for some minor issues
PORTUGAL: Good
SWEDEN: Works well.

7. Please report about any proposals you may have for further improvements.
AUSTRIA: More time for syntax testing to avoid mistakes.
DENMARK: N/A
FINLAND: N/A
GERMANY: More clear variables definition; reference to framework regulations regarding variables.
LATVIA: Include examples how dataset should look; description about SAS environment; in data file format (Excel, Csv, etc.); instructions on importing data into SAS.
NETHERLANDS: Addressing technical requirements before or at the beginning of project
NORWAY: Include some useful variables (like name, address, etc.) when building database; take into consideration different Operating systems; case sensitive "include SAS-programs" etc.
PORTUGAL: Define earlier which output to disseminate; define a priori which coherences with information already disseminated to Eurostat.
SWEDEN: ?

Phase II: Validation

3.1 Instability over time

1. Besides using the distributed guidelines and syntax for validation, did you use an additional methodology to validate the MDL-database regarding instability over time? Please explain which methodology you used, and its added value.
AUSTRIA: No.
DENMARK: Use of other project with more reference years
FINLAND: No
GERMANY: Matching by company name and address (etc.), assessing whether turnover and persons employed were within two standard deviations; consultation of BR experts.
LATVIA: No
NETHERLANDS: Top-down macro validation using pattern analysis, presented at TF 3 Wiesbaden. 'Holes' in SBS are matched with BD and BR. Valuable when SBS is sample; No micro data was changed, but some adaptions to sources were made related to definition differences.
NORWAY: Using additional information from BR (establishments over time, relations to enterprises)
PORTUGAL: No
SWEDEN: Auxiliary information from BD

2. How do you interpret the results of validation check for instability over time? 
What are the main reasons for the results?
AUSTRIA: Restructured enterprises (outsourced operating business to new enterprise or existing enterprise); Enterprises not in SBS scope anymore (NACE or threshold survey)
DENMARK: Not in SBS scope anymore (NACE or threshold survey)
FINLAND: Instability over time mainly in -20 enterprises. They have often inactivity and therefore not in BR.
GERMANY: Holes primarily because SBS is sample and due to sampling designs; Inconsistencies in SBS NACE; relocation enterprise to other federal state and demographic events.
LATVIA: Steep changes in turnover why enterprise failed by sampling previous year; Economic circumstances; Sample coverage between ICTeC and SBS as the sample survey ICT for year t is established as sample for SBS for year t-1.
NETHERLANDS: Holes primarily because SBS is sample; existence of dummy records in sample.
NORWAY: Within some enterprise groups reporting enterprises may change over time, which leads to less coherence within ITGS and between ITGS and SBS.
PORTUGAL: Most of holes related to late responses; Other holes related to sole proprietors or self-employed without economic activity in the reference year.
SWEDEN: One enterprise ID in output

3. Did you correct data in the MDL-database? Please also explain the reason why or why not.
AUSTRIA: List of holes: 9/28 corrected. Starts big/ends big: 25/50 corrected.
DENMARK: Yes, enhance the link between the data sources
FINLAND: No big reason for action
GERMANY: No. No units surveyed under a different ID in the years of absence.
LATVIA: Yes. Mergers have been corrected; Profiled enterprises have been corrected (reporting unit is replaced to statistical unit).
NETHERLANDS: Yes, dummy records were removed from SBS; 1000-error solved
NORWAY: Yes. Reduce no-match between different data sources within sources and between sources.
PORTUGAL: No. Initially holes were imputed but later on this decision was overturned.
SWEDEN: Yes. Where possible to change ENT_ID this has been done. Not everywhere possible.

4. If you made corrections in the MDL-database, how did you correct the data in cases where instability over time was detected? 
Did the correction involve adding or replacing ENT_ID?
AUSTRIA: Yes
DENMARK: Yes
FINLAND: N/A
GERMANY: N/A
LATVIA: Yes, ENT_ID is replaced in SBS
NETHERLANDS: N/A
NORWAY: Yes, adding and replacing ENT_ID
PORTUGAL: N/A
SWEDEN: Yes. Replacing existing ENT_ID

Did the correction involve adding or replacing any other variables? Which variables?
AUSTRIA: No
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: Yes, replacing all other variables in SBS
NETHERLANDS: N/A
NORWAY: Yes, values changed due to splitting or merging ENT_ID's
PORTUGAL: N/A
SWEDEN: No

Did you correct all variables across datasets in the database?
AUSTRIA: Yes
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: Yes
NETHERLANDS: N/A
NORWAY: Yes, values changed due to splitting or merging ENT_ID's
PORTUGAL: N/A
SWEDEN: Yes

If any, did the validation involve any other correction of data?
AUSTRIA: No
DENMARK: ?
FINLAND: N/A
GERMANY: No
LATVIA: ?
NETHERLANDS: N/A
NORWAY: No
PORTUGAL: N/A
SWEDEN: No

3.2.1 No match: ITGS

1. Besides using the distributed guidelines and syntax for validation, did you use an additional methodology to validate the MDL-database regarding no-match? Please explain which methodology you used, and its added value.
AUSTRIA: No
DENMARK: No
FINLAND: No
GERMANY: Matching of 'no matches' with SBS via BR; Checking other characteristics as company name, turnover, etc.
LATVIA: No
NETHERLANDS: Business demography used to assess no-match
NORWAY: Utilizing information such as enterprise group register, location and address, and thereby changing ENT_ID's reduced number of no_match.
PORTUGAL: No
SWEDEN: No

2. How do you interpret the results of validation check for no-match in ITGS? What are the main reasons for the results?
AUSTRIA: Foreign enterprises in ITGS that are not in SBS; Difference due to ITGS being a monthly and SBS being a yearly statistic.
DENMARK: Different NACE scope; Enterprises in ITGS not covered in SBS; Difference due to ITGS being a monthly and SBS being a yearly statistic.
FINLAND: Perfect match. 
Reason is one source register for different data
GERMANY: SBS is sample; sample design; effect of 'unit-representation validation' on the 'no-match validation'
LATVIA: Absent in population and sampling; Merger of enterprise
NETHERLANDS: SBS is sample
NORWAY: Above corrections combined with other sources reduced number of no-match enterprises.
PORTUGAL: Corrected cases related with demographic events, not treated in foreign trade database; enterprises in foreign hands with sole purpose to manage trade transactions; Real deaths; temporary inactivity but reporting trade.
SWEDEN: No match caused by sub-units in SBS-survey that have no financial information but report trade; Caused by enterprises classified as inactive but report trade.

3. Did you correct data in the MDL-database? Please also explain the reason why or why not.
AUSTRIA: 6/35 corrected using 100 million euro threshold.
DENMARK: Yes, general changes due to different use of administrative ID's across sources and enterprise groups
FINLAND: No
GERMANY: No
LATVIA: Replacing ENT_ID in ITGS survey
NETHERLANDS: No, validation of sources has already been done at other departments
NORWAY: Yes. Establish better coherence to SBS as well as between years of ITGS.
PORTUGAL: Yes
SWEDEN: Yes

4. If you made corrections in the MDL-database, how did you correct the data in cases where no-match was detected? Did the correction involve adding or replacing ENT_ID?
AUSTRIA: Yes
DENMARK: Yes
FINLAND: N/A
GERMANY: N/A
LATVIA: Yes, replacing ENT_ID in ITGS survey
NETHERLANDS: N/A
NORWAY: Yes. Using information such as enterprise group register, address and location, mergers.
PORTUGAL: Yes. Changing enterprise ID when non-matches were related to demographic events not treated in foreign trade database.
SWEDEN: Yes. But mostly maintaining available active ENT_ID by aggregating ID's belonging to sub-units.

Did the correction involve adding or replacing any other variables? Which variables?
AUSTRIA: No
DENMARK: Yes
FINLAND: N/A
GERMANY: N/A
LATVIA: No
NETHERLANDS: N/A
NORWAY: Adding or dividing variables in order to keep same 'totals' for imports and export.
PORTUGAL: Yes. Adding variables defined for the respective databases in all years.
SWEDEN: Yes. 
Export/import variables have been updated or revised.

Did you correct all variables across datasets in the database?
AUSTRIA: Yes
DENMARK: Yes
FINLAND: N/A
GERMANY: N/A
LATVIA: No
NETHERLANDS: N/A
NORWAY: Yes. Correct or adapt the ENT_ID in SBS when adding or merging ENT_ID's
PORTUGAL: N/A
SWEDEN: No

If any, did the validation involve any other correction of data?
AUSTRIA: No
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: No
NETHERLANDS: N/A
NORWAY: Using the outcome of validation enabled us to correct data in the same manner as explained above.
PORTUGAL: No
SWEDEN: No

3.2.2 No match: ITS

1. Besides using the distributed guidelines and syntax for validation, did you use an additional methodology to validate the MDL-database regarding no-match? Please explain which methodology you used, and its added value.
AUSTRIA: No
DENMARK: No
FINLAND: No
GERMANY: N/A
LATVIA: No
NETHERLANDS: No
NORWAY: No correction of ITS; Adding information for services related to sea and coastal transport.
PORTUGAL: N/A
SWEDEN: No

2. How do you interpret the results of validation check for no-match in ITS? What are the main reasons for the results?
AUSTRIA: Foreign enterprises in ITS that are not in SBS; ITS trade values can switch between enterprises in case of restructuring during year, SBS is on a yearly base.
DENMARK: Different NACE scope; Difference due to ITS being a monthly and SBS being a yearly statistic.
FINLAND: Perfect match. Reason is one source register for different data
GERMANY: N/A
LATVIA: Absent in population and sampling; ITS survey from Bank of Latvia
NETHERLANDS: ITS is based on survey, many enterprises do not match with SBS sample.
NORWAY: Number of no-match increases due to adding enterprises.
PORTUGAL: N/A
SWEDEN: No match caused by sub-units in SBS-survey that have no financial information but report trade; Caused by enterprises classified as inactive but report trade; ITS includes 'third part trade' which causes abnormally high ratios.

3. Did you correct data in the MDL-database? Please also explain the reason why or why not.
AUSTRIA: 14/18 corrected
DENMARK: Yes, different approach to enterprise groups
FINLAND: No
GERMANY: N/A
LATVIA: No
NETHERLANDS: No, validation of sources has already been done at other departments
NORWAY: No
PORTUGAL: N/A
SWEDEN: Yes, statistics including sub-units. Not always possible to make ITS and SBS data consistent.

4. 
If you made corrections in the MDL-database, how did you correct the data in cases where no-match was detected? Did the correction involve adding or replacing ENT_ID?
AUSTRIA: Yes
DENMARK: Yes
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: N/A
PORTUGAL: N/A
SWEDEN: Yes, replacing.

Did the correction involve adding or replacing any other variables? Which variables?
AUSTRIA: No
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: N/A
PORTUGAL: N/A
SWEDEN: Export/import variables have been updated or revised.

Did you correct all variables across datasets in the database?
AUSTRIA: Yes
DENMARK: Yes
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: N/A
PORTUGAL: N/A
SWEDEN: Yes, corrections made in all available data sources except for GVC, CIS and EC.

If any, did the validation involve any other correction of data?
AUSTRIA: No
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: N/A
PORTUGAL: N/A
SWEDEN: No

3.3.1 Unit representation: ITGS

1. Besides using the distributed guidelines and syntax for validation, did you use an additional methodology to validate the MDL-database regarding unit-representation? Please explain which methodology you used, and its added value.
AUSTRIA: No
DENMARK: No
FINLAND: No
GERMANY: No
LATVIA: No
NETHERLANDS: No
NORWAY: Using additional information like Enterprise Group Register, demographic events in BD
PORTUGAL: No
SWEDEN: No

2. How do you interpret the results of validation check for unit-representation in ITGS? What are the main reasons for the results?
AUSTRIA: Restructuring of enterprises; indirect export (quasi transit).
DENMARK: different use of enterprise groups
FINLAND: Unit representation very good in Finland; Only few inconsistencies due to reporting practices and internal organizational structure of enterprise.
GERMANY: Controlling companies of tax groups clustered in M and L (ITGS more than one SBS unit) report foreign trade activities
LATVIA: ITGS includes total value of goods furthermore SBS does not include this value
NETHERLANDS: Some NACE sections have high export/turnover ratios, most likely because transporters list transported goods as their export. 
This is hard to disentangle.
NORWAY: Reporting unit on enterprise level does not necessarily represent the same 'true picture'
PORTUGAL: In most cases related to enterprises that transport goods: custom agents, trade agents, import/export companies
SWEDEN: In some cases higher export than turnover, this is not adjusted; enterprises with only warehouse storage for re-exports in output; Enterprises defined as foreign affiliate with no production but with export in output.

3. Did you correct data in the MDL-database? Please also explain the reason why or why not.
AUSTRIA: 7/125 enterprises corrected (restructuring of enterprises).
DENMARK: Yes, now enterprise ID's are the same
FINLAND: No
GERMANY: Yes. Reallocating export and import within tax groups.
LATVIA: No. We cannot reveal which trade amount is value of goods.
NETHERLANDS: No. Only a few 'obvious' mistakes resulting from conversion SAS to SPSS have been fixed.
NORWAY: Yes. The same methods for validation and changing the population of enterprises.
PORTUGAL: No
SWEDEN: No

4. If you made corrections in the MDL-database, how did you correct the data in cases where no-match was detected? Did the correction involve adding or replacing ENT_ID?
AUSTRIA: Yes
DENMARK: Yes
FINLAND: N/A
GERMANY: Yes, adding ENT_ID's
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: Yes. Using information such as enterprise group register, address and location, mergers.
PORTUGAL: N/A
SWEDEN: N/A

Did the correction involve adding or replacing any other variables? Which variables?
AUSTRIA: No
DENMARK: No
FINLAND: N/A
GERMANY: Yes, 'CL_AREA_GEO_ITGS' was imputed and 'ITGS_type' adjusted for the imputation
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: Adding or dividing variables in order to keep same 'totals' for imports and export.
PORTUGAL: N/A
SWEDEN: N/A

Did you correct all variables across datasets in the database?
AUSTRIA: Yes
DENMARK: Yes
FINLAND: N/A
GERMANY: No
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: Yes. Correct or adapt the ENT_ID in SBS when adding or merging ENT_ID's
PORTUGAL: N/A
SWEDEN: N/A

If any, did the validation involve any other correction of data?
AUSTRIA: No
DENMARK: No
FINLAND: N/A
GERMANY: Yes, imports were reallocated within tax groups
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: Using the outcome of validation enabled us to correct data in the same manner as explained above.
PORTUGAL: N/A
SWEDEN: N/A

3.3.2 Unit representation: ITS

1. 
Besides using the distributed guidelines and syntax for validation, did you use an additional methodology to validate the MDL-database regarding unit-representation? Please explain which methodology you used, and its added value.
AUSTRIA: No
DENMARK: No
FINLAND: No
GERMANY: N/A
LATVIA: No
NETHERLANDS: No
NORWAY: No correction of ITS; Adding information for services related to sea and coastal transport.
PORTUGAL: N/A
SWEDEN: No

2. How do you interpret the results of validation check for unit-representation in ITS? What are the main reasons for the results?
AUSTRIA: Restructuring of enterprises
DENMARK: different use of enterprise groups
FINLAND: Unit representation very good in Finland; Only few inconsistencies due to reporting practices and internal organizational structure of enterprise.
GERMANY: N/A
LATVIA: ITS lists turnover from companies and foreign subsidiaries, while SBS only companies value; NACE H include value of agents' revenue that cannot be revealed.
NETHERLANDS: Some NACE sections have high export/turnover ratios, most likely because transporters list transported goods as their export. This is hard to disentangle.
NORWAY: Number of no-match increases due to adding enterprises.
PORTUGAL: N/A
SWEDEN: In some cases higher export than turnover, this is not adjusted.

3. Did you correct data in the MDL-database? Please also explain the reason why or why not.
AUSTRIA: 6/48 corrected (restructuring of enterprises)
DENMARK: Yes, now enterprise ID's are the same
FINLAND: No
GERMANY: N/A
LATVIA: No
NETHERLANDS: No
NORWAY: No
PORTUGAL: N/A
SWEDEN: No

4. If you made corrections in the MDL-database, how did you correct the data in cases where no-match was detected? Did the correction involve adding or replacing ENT_ID?
AUSTRIA: Yes
DENMARK: Yes
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: N/A
PORTUGAL: N/A
SWEDEN: N/A

Did the correction involve adding or replacing any other variables? Which variables?
AUSTRIA: No
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: N/A
PORTUGAL: N/A
SWEDEN: N/A

Did you correct all variables across datasets in the database?
AUSTRIA: Yes
DENMARK: Yes
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: N/A
PORTUGAL: N/A
SWEDEN: N/A

If any, did the validation involve any other correction of data?
AUSTRIA: No
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: N/A
PORTUGAL: N/A
SWEDEN: N/A

3.4.1 Demographic change: Outliers with demorelations in BR

1. 
Besides using the distributed guidelines and syntax for validation, did you use an additional methodology to validate the MDL-database regarding outliers with demorelations in BR? Please explain which methodology you used, and its added value.
AUSTRIA: ?
DENMARK: No
FINLAND: Additional checks based on STS statistics
GERMANY: N/A
LATVIA: No
NETHERLANDS: No
NORWAY: Data to classify demorelations were of bad quality. Therefore all enterprises are in 'outliers without demo group'.
PORTUGAL: No
SWEDEN: No, no output

2. How do you interpret the results of the demography validation check with outliers in the BR? What are the main reasons for the results?
AUSTRIA: ?
DENMARK: High growth related to take-overs
FINLAND: In general demography events are captured correctly, but will continue to validate.
GERMANY: N/A
LATVIA: Most ceased to exist, but there were also some mergers
NETHERLANDS: 53 enterprises in output. No changes made.
NORWAY: Some outliers identified, but not able to develop routines to handle issue
PORTUGAL: 14/25 marked
SWEDEN: N/A

3. Did you correct data in the MDL-database? Please also explain the reason why or why not.
AUSTRIA: ?
DENMARK: No
FINLAND: No, but perhaps in the future after additional checks.
GERMANY: N/A
LATVIA: No
NETHERLANDS: No corrections in micro data. However adaptions made in datasets, making the output 'more than one event' possible.
NORWAY: Yes
PORTUGAL: No, only marked.
SWEDEN: N/A

4. If you made corrections in the MDL-database, how did you correct the data in cases where no-match was detected? Did the correction involve adding or replacing ENT_ID?
AUSTRIA: ?
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: Yes
PORTUGAL: N/A
SWEDEN: N/A

Did the correction involve adding or replacing any other variables? Which variables?
AUSTRIA: ?
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: Yes
PORTUGAL: N/A
SWEDEN: N/A

Did you correct all variables across datasets in the database?
AUSTRIA: ?
DENMARK: N/A
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: Yes
PORTUGAL: N/A
SWEDEN: N/A

If any, did the validation involve any other correction of data?
AUSTRIA: ?
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: N/A
NETHERLANDS: N/A
NORWAY: No
PORTUGAL: N/A
SWEDEN: N/A

3.4.2 Demographic change: Outliers without demorelations in BR

1. Besides using the distributed guidelines and syntax for validation, did you use an additional methodology to validate the MDL-database regarding outliers with demorelations in BR? 
Please explain which methodology you used, and its added value.
AUSTRIA: ?
DENMARK: No
FINLAND: No
GERMANY: Consultation experts and auxiliary variables from BR were used to identify demographic events
LATVIA: No
NETHERLANDS: No
NORWAY: No
PORTUGAL: Outcome matched with records of demographic events. 1 enterprise marked for exclusion.
SWEDEN: Extra validation using other variables in BD and BR.

2. How do you interpret the results of the demography validation check with outliers in the BR? What are the main reasons for the results?
AUSTRIA: ?
DENMARK: ?
FINLAND: Other reasons for fast growth
GERMANY: Not certain about demographic events but after further examination almost all identified units excluded.
LATVIA: Due to development in the enterprise
NETHERLANDS: 140 enterprises in output. No changes made.
NORWAY: Same reason as other validation: death, births and enterprise reorganizations.
PORTUGAL: Due to development within the enterprises' activity.
SWEDEN: Each year around 7-10 enterprises in output. 3-4 are yearly changed, due to demographic events like acquisitions and mergers.

3. Did you correct data in the MDL-database? Please also explain the reason why or why not.
AUSTRIA: ?
DENMARK: No
FINLAND: No
GERMANY: No
LATVIA: Yes. One enterprise was corrected due to demographic event
NETHERLANDS: No corrections in micro data. However adaptions made in datasets, making the output 'more than one event' possible.
NORWAY: Yes.
PORTUGAL: No, only marked
SWEDEN: No, output is negligible.

4. If you made corrections in the MDL-database, how did you correct the data in cases where no-match was detected? Did the correction involve adding or replacing ENT_ID?
AUSTRIA: ?
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: ENT_ID has been replaced in ITGS and other datasets
NETHERLANDS: N/A
NORWAY: Yes
PORTUGAL: N/A
SWEDEN: N/A

Did the correction involve adding or replacing any other variables? Which variables?
AUSTRIA: ?
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: ?
NETHERLANDS: N/A
NORWAY: Yes
PORTUGAL: N/A
SWEDEN: N/A

Did you correct all variables across datasets in the database?
AUSTRIA: ?
DENMARK: N/A
FINLAND: N/A
GERMANY: N/A
LATVIA: ?
NETHERLANDS: N/A
NORWAY: Yes
PORTUGAL: N/A
SWEDEN: N/A

If any, did the validation involve any other correction of data?
AUSTRIA: ?
DENMARK: No
FINLAND: N/A
GERMANY: N/A
LATVIA: ?
NETHERLANDS: N/A
NORWAY: No
PORTUGAL: N/A
SWEDEN: N/A

Overall experiences phase II

1. What is your overall assessment of phase 2 of the project?
AUSTRIA: Learned a lot about own data
DENMARK: Validation approach went well. 
Especially no match and unit representation.
FINLAND: Not much reason for corrections, but good to see some results
GERMANY: Improved the quality of our combined data and valuable insights about our data. Especially unit representation was very useful
LATVIA: It improved quality of the templates
NETHERLANDS: Informative; questions answered by DK
NORWAY: Provided validation tables very good help; Validation output tables easy for additional analysis/checking.
PORTUGAL: Not aware of the overall picture during the validation process: imputations that were made had to be overruled later onwards in project; 'No-match' and 'unit-representation' was very time consuming, but nothing was done with it; A learning process without visible results.
SWEDEN: Difficult, as variables definitions and measurement differ between sources. Especially ITS hard to validate, and recommendation is not to use them in further analysis; ITGS seem to be of good quality; Comparing SBS with ITGS works well.

Summarize your answer by tick marking the table.
AUSTRIA: 5=excellent
DENMARK: 5=excellent
FINLAND: 3=good
GERMANY: 4=very good
LATVIA: 4=very good
NETHERLANDS: 3=good
NORWAY: 3=good
PORTUGAL: 3=good
SWEDEN: 4=very good

2. How would you evaluate the circulated guidelines and SAS-syntax concerning the validation (phase 2)?
AUSTRIA: Some minor mistakes in syntax
DENMARK: Fewer comments about syntax, indication approaches functioned well for most other NSI's
FINLAND: Well done including the code
GERMANY: Clearly structured and very comprehensive
LATVIA: Guidelines very informative and received answers to questions; More detailed examples of each validation; More descriptions in SAS-syntax
NETHERLANDS: Quality of SAS-syntax was good, country specific issues answered by DK
NORWAY: Guidelines were good and easy to follow
PORTUGAL: Guidelines comprehensive. 
Syntax runs without major problems.
SWEDEN: Instructions clearly written and easy to understand; In case of problems project coordinators were available for help.

Summarize your answer by tick marking the table: guidelines
AUSTRIA: 5=excellent
DENMARK: 5=excellent
FINLAND: 4=very good
GERMANY: 5=excellent
LATVIA: 4=very good
NETHERLANDS: 4=very good
NORWAY: 3=good
PORTUGAL: 4=very good
SWEDEN: 5=excellent

Summarize your answer by tick marking the table: SAS-syntax
AUSTRIA: 4=very good
DENMARK: 5=excellent
FINLAND: 4=very good
GERMANY: 5=excellent
LATVIA: 4=very good
NETHERLANDS: 4=very good
NORWAY: 3=good
PORTUGAL: 4=very good
SWEDEN: 5=excellent

3. Did you find the validation approach used suitable? Would you have preferred a different approach?
AUSTRIA: Yes
DENMARK: Very suitable. Learned a lot about own data
FINLAND: Yes
GERMANY: Approach was suitable. 'Instability over time' and 'no match' less valuable than 'unit representation' and 'demography'
LATVIA: ITS validation not relevant as handled by Bank of Latvia
NETHERLANDS: Bottom-up approach was suitable. Some issues in datasets were adjusted. Also own top-down validation approach was valuable. Edit: in hindsight, it would have been beneficial to include simple macro validations as well, e.g. checking totals of each database with original source and with figures already on Eurostat.
NORWAY: Approach was useful.
PORTUGAL: Suitable, but maybe should have been simplified.
SWEDEN: Yes, although validations could be more limited to big enterprises.

4. How would you evaluate the overall quality of your national MDL-database? Are there any considerations that need to be taken into account while producing output tables and analyses?
AUSTRIA: Very good quality of national MDL-database except for BD data where improvements are expected.
DENMARK: Good quality of national MDL-database. Suitable for business analysis if differences in scope (etc.) 
are taken into account
FINLAND: Very good condition.
GERMANY: Validation helped to improve quality data; Some inconsistencies remain: some micro data based on sample surveys, as a result difference with official publications, some implausible cases could not be resolved during validation phase.
LATVIA: Further investigate SBS turnover and ITGS export
NETHERLANDS: SAS-limits output; SBS-sample (and weighting variables) make MDL complicated.
NORWAY: ITS is subset; employment variables not available for all NACE sections.
PORTUGAL: Quality is reasonable; grand totals do not add up to sum of NACE totals due to different use of SBS and BR definition; Control breakdown not available in BR; No information on ITS.
SWEDEN: Quality is good and corresponds with official publications; ITS at micro level is problematic.

5. Please report about any proposals you may have for future improvements.
AUSTRIA: ?
DENMARK: Addressing issue of different approach of enterprise groups in sources; Analyse possibilities to include ITS
FINLAND: Validation output tables could use some meta information; Print friendly version
GERMANY: None
LATVIA: ?
NETHERLANDS: Top down validation, using business demography; Edit: in hindsight, it would have been beneficial to include simple macro validations as well, e.g. checking totals of each database with original source and with figures already on Eurostat.
NORWAY: Looking for solution to avoid manual work correcting ENT_ID's
PORTUGAL: Defining output sooner; which consistencies must be guaranteed; evaluation immediately after each phase instead of long after phase.
SWEDEN: Limit validation only to largest enterprises otherwise validation burden too high.

Future use MDL-database

1. Are you going to carry out more analysis using the linked datasets? If yes, please mention which.
AUSTRIA: ?
DENMARK: No specific plans, but the intention is to look at further possibilities
FINLAND: Yes, carry out similar analysis at local level using establishment data
GERMANY: Publish article in "WISTA"
LATVIA: Further investigate SBS turnover and ITGS export
NETHERLANDS: No regular data production. 
But will be updated; Edit: Datasets will be used to help with in-depth analysis future (undefined) national projects/figures.Not at the moment. Still discussed.Unclear. Moreover, MDL uses different definitions for exporters and SME than national definitions.MDL project is used in parallel project aiming to analyse relationship FDI and trade.2. Do you plan to maintain and (annually) update the MDL-database, and make tabular output without support from other NSI’s or Eurostat? If yes, will it be part of your regular data production??Applied for Eurostat grant; Utilize database for future Eurostat projects and national purposes.Plans to continue workUncertain. Applied for Eurostat grant. However, for maintaining and updating MDL database more resources are required; Still solving legal issues linking micro data from different sources and permanent storage; Probably no regular production.Not yetArea 1: topic 1, 2, 3; Area 2Unclear. Depends also on resources.Most probably not, still discussed.No3. Apart from the output produced in this project, can you think of other interesting and policy-relevant analyses that could be done with the linked datasets? 
Please give a few (1-3) examples that would be most interesting for your country.?Linking MDL with social statistics at employee level to analyse skills aspects and impacts on enterprise performance; Profiling R&D intensive and innovative enterprises; Profiling high growth enterprisesAnalysis of developments in ICT sector and innovationAdding German Structure of Earnings Survey; Analyse enterprise performance for various indicators for different subpopulations as panel data; Connection between foreign trade affinity and economic performance.Link SBS and BD by different types of enterprises; produce standardized output from other linked datasets.Are enterprises that are sourcing more financially fit?; which characteristics are vital for survival small enterprises?Coherence between R&D activity and turnover corrected time-lag effect; R&D related to enterprises with and without export.Add inter-quartile analysis for some variables (like GVA, GVA per persons employed, wages per person employed) by sector, dimension and exporter profileImports of intermediate products and international sourcing of business functions; Analyses of R&D expenditures outsourcing companies.4. 
Do you see the possibility to make regular output for Eurostat using the MDL-database??YesNeeds further discussionVery sceptical: unclear legal situation; uncertainties regarding available financial resources; open questions regarding extrapolation of micro data for aggregated output.Not yet.Possible, if SAS constraints are solved.Depends on resources.In near future most probably not, but still in discussion.Yes, but depends on financial support.Annex III: Availability variablesData SourceESSnameQuestion/Variable contents Available 2008Available 2009Available 2010Available 2011Available 2012BRENT_IDUnique enterprise identification99999BRENTgrp_IDEnterprise Group ID56666BR?Administrative ID99999BR?Start date for the enterprise ID99999BR?End date for the enterprise ID99999BR?Start date for the enterprise Group ID66666BR?End date for the enterprise Group ID66666BR?Legal form of the enterprise ID88888BR?Main activity of the enterprise (NACE 4-digit)88888BR?Secondary activity of the enterprise (NACE 4-digit)77777BR?Ownership of the enterprise (private/public)66677BR?Start date for the main activity66666BR?Start date for the secondary activity33333BR?End date for the main activity66666BR?End date for the secondary activity22222BR?Ownership relation with associated direct ownership indicated as percentages for each enterprise ID45567BR?Information on demographic relations (mergers and acquisitions etc.)55555SBS12 11 0Turnover99999SBS12 15 0Value added at factor cost99999SBS12 17 0Gross operating surplus99999SBS13 11 0Total purchases of goods and services99999SBS13 31 0Personnel costs99999SBS13 32 0Wages and salaries99999SBS16 13 0Number of employees99999SBS16 14 0Number of employees in full-time equivalents99999SBS?NACE 4-digit99999ITGSSTAT_ VALUEImport amount99998ITGSSTAT_ VALUEExports amount99998ITGSCL_AREA_ GEOPartner country (country of origin/destination)88998ITGSCN08Product nomenclature CN08 8-digit88887ITGS?NACE 4-digit77777ITSSTAT_ VALUEImport amount66666ITSSTAT_ 
VALUEExports amount77777ITSCL_AREA_ GEOPartner country (country of origin/destination)77777ITSBopitemService nomenclature EBOPS 3-digit66666ITS?NACE 4-digit77777CISENTGPEnterprise part of a group 63736CISHOCountry of head office 71816CISMAREUROther EU/EFTA/CC market 70706CISMAROTHAll other countries 60606CISINPDGDIntroduced onto the market a new or significantly improved good 71717CISINPDSVIntroduced onto the market a new or significantly improved service 81817CISINPDTWWho mainly developed these products81615CISNEWMKTDid the enterprise introduce a product new to market 81817CISTURNMAR% of turnover in new or improved products introduced during 2006-2008 that were new to the market 81817CISINPSPDIntroduced onto the market a new or significantly improved method of production81817CISINPSLGIntroduced onto the market a new or significantly improved logistic, delivery or distribution system81817CISINPSSUIntroduced onto the market a new or significantly improved supporting activities81817CISINPCSWWho mainly developed these processes81615CISRRDINEngagement in intramural R&D 81817CISRDENGType of engagement in R&D 71716CISRRDINXExpenditure in intramural R&D (in national currency) 81817CISRRDEXXPurchase of extramural R&D (in national currency)81817CISRMACXExpenditure in acquisition of machinery (in national currency)81817CISRTOTTotal of these four innovation expenditure categories (in national currency)81817CISFUNLOCPublic funding from local or regional authorities 50405CISFUNGMTPublic funding from central government 50405CISFUNEUPublic funding from the EU 50405CISFUNRTDFunding from EU's 6th or 7th Framework Programme for RTD 50405CISCOCooperation arrangements on innovation activities 81816CISORGBUPNew business practices for organising work or procedures 81817CISORGWKPNew methods of workplace organisation 81817CISORGEXRNew methods of organising external relations 81817CISMKTDGPSignificant changes to the aesthetic design or packaging 81817CISMKTPDPNew media or techniques for 
product promotion 81817CISMKTPDLNew methods for product placement or sales channels 81817CISMKTPRINew methods of pricing goods or services 81817CISMKTMET2004-06 New or significantly changed sales or distribution methods31313CCIS?NACE 4-digit42424ECENT_IDUnique firm id77777ECBROADFirm has broadband88888ECAEBUYFirm orders through computer networks (websites/EDI)99999ECAEBVALPCT% of orders through internet77555ECAESELLFirm sells through computer networks (websites/EDI)88877ECAESVALPCT% of sales through computer networks (websites/EDI)88877ECIACCFirm has internet99888ECEMPIUSEPCT% of workers with access to internet88878ECINTRAFirm has intranet99744ECEMPINTRAPCT% of workers with access to intranet55222ECCUSEFirm uses computers77677ECEMPCUSEPCT% of workers using computers88866ECWEBFirm has website88888ECMOBFirm has mobile access to internet99766ECDIALUPFirm uses a dial-up connection to access the internet89988ECITERPEnterprise Resource Planning88556ECADEAutomated Data Exchange99896ECADESUto suppliers99741ECINVRECreceiving e-invoices99632ECADECUreceiving orders99622ECINVSNDsending e-invoices99532ECADEINFOsending product information99775ECADETDOCsending transport documents99895ECADEPAYUse of ADE for sending payment instructions to financial institutions99795ECADEGOVUse of ADE for sending or receiving data to/from public authorities99994ECSISUSharing SCM data with suppliers88652ECSICUSharing SCM data with customers88452ECCRMSTRshare of information with other business functions99565ECCRMANanalyse information for marketing purposes99667ECSISAINVsales: management of inventory levels99896ECSISAACCsales: accounting99995ECSISAPRODsales: production or services management88884ECSISADISTsales: distribution management88884ECSIPUINVpurchases: management of inventory levels88884ECSIPUACCpurchases: accounting88884EC?NACE 4-digit99997OFATS?Number of foreign affiliates66663OFATS?Number of persons employed in foreign affiliates77774OFATS?Turnover i foreign affiliates77774OFATS?Host country 
of affiliates77774IFATS?Country of ownership99996IFATS?NACE 4-digit99996BD11 91 0Enterprise ID of active enterprises99996BD11 92 0Enterprise ID of enterprise births99996BD11 93 0Enterprise ID of enterprise deaths99996BD?NACE 4-digit99996GVC?Enterprise group (from questionnaire Module 1)14231GVC?Enterprise employment by business functions (question 2.2)16451GVC?International sourcing (question 3.1)17551GVC?International sourcing destination (question 3.3)16551GVC?International sourcing partner (question 3.2)16551GVC?Back-sourcing: Yes/No question and motivation factors (question 3.8 and 8.10)17551GVC?Relocation (question 3.11)17551GVC?Foreign affiliates (question 4.1, 4.2, 4.3, 4.4)17551GVC?Supplying enterprises abroad (question 5.1, 5.2, 5.3)17551GVC?NACE 4-digit47553Annex IV: Overview issues????????????CountryCommentsATDKFIDELVNLNOPTSETOTAL????????????Phase 1: Problems / difficultiesTime consuming?11??1?1?4?Different SAS-version(s) or different statistical software?1111111?7?Minor mistakes in syntax11??11?1?5?Staff changes?11??????2?Rejection of aggregated confidentiality checks??1??????1?Closer collaboration appreciated??1??????1?Some variables could not be included in database ??11?????2?Unclear variable definitions???1?????1?Difficult to make syntax and guidelines uniform for all countries?1???????1?Not enough description in SAS-syntax????1????1?Difficulties importing data in SAS????1????1?Computer environment not made for SAS?????1???1?Much time spend on variables / datasets that were not used???????1?1????????????Phase 1: ProposalsMore time for syntax testing1????????1?More clear variable definitions???1?????1?Reference to framework regulations regarding project specific input variables???1?????1?More descriptions on working with SAS (including examples)????1????1?Take in consideration technical requirements (incl. 
SAS-versions) before or at the beginning of the project?????11??2?Include extra BR variables identifying enterprise, like name, address??????1??1?Define earlier which output to make???????1?1?Limit dataset to most relevant variables????????11????????????Phase 2: Interpretation validation resultsRestructured enterprises (outsourced operating business to new or existing enterprise)1?1???1??3?Enterprise not in SBS scope anymore (change in NACE or threshold survey)11??1????3?(Foreign) enterprises that are in ITGS but not in SBS11???????2?Difference due to ITGS being monthly and SBS being yearly statistic.11???????2?Indirect export (quasi export)1????????1?Different use of administrative ID's across sources and enterprise groups?1???????1?Different approach to enterprise groups?1???????1?Demographic events?1?11?1?15?Inactivity of (micro/small) enterprises (but reporting trade)??1????113?SBS is sample and sampling design???1?1???2?Inconsistencies NACE amongst SBS sample design???1?????1?Relocation enterprise to other federal state???1?????1?Effect of 'unit representation' validation on 'no-match' validation???1?????1?Enterprises report trade activities from more enterprises (f.e. 
tax groups)???11??1?3?Economic circumstances????1????1?ITS includes sometimes turnover from foreign subsidiaries, while SBS only from company????1????1?ITS NACE H include value of agents' revenue????1????1?Development within the company??1?1??1?3?ITS is sample?????1???1?Transport related enterprises list value of goods as their export/turnover?????1?113?Within enterprise groups, reporting enterprises may change over time??????1??1?Adding enterprises within ITS (services related to sea and coastal transport)??????1??1?Too late response ???????1?1?Sub-units in the SBS-survey (no financial information but they report trade)????????11?Third party trade in ITS????????11?Foreign affiliates with no production but substantial export volumes????????11????????????Phase 2: Problems / difficultiesMDL method differs from 'normal' method (especially when having samples as sources), and as a result considerable difference with official data???1?1?1?3?Some implausible cases not resolved???1?????1?Minor mistakes in syntax1????????1?Not aware of overall picture validation process???????1?1?Independent/dependent variable not available???????1?1?Time consuming???????112?ITS variables hard to validate, recommend not to use in further analysis????????11?Variable definitions and measurement differs when comparing to each other????????11????????????Phase 2: ProposalsDefine in advance which coherences need to be guaranteed regarding earlier send information to Eurostat?????1?1?2?Addressing issue of different approach enterprise groups?1???????1?Analyse possibilities to include ITS?1?????????More descriptions on working with SAS (including examples)??1?1????2?Print friendly version??1????????Add some macro validation?????1???1?Find solution for manually correcting ENT_ID's??????1??1?Evaluation after each phase, instead at end???????1?1?Micro validation could be more simple / more focussed on big enterprises???????112 ................