Design – description in the Memobust handbook



Dr. Eva Elvers
Process owner Design and plan & Build and test, Process Department, Statistics Sweden, Box 24 300, SE-104 51 Stockholm, Sweden, eva.elvers@scb.se

Design is important for a new survey, when an existing survey is redesigned, and also in the continuous improvement of an ongoing regular survey. The Generic Statistical Business Process Model (GSBPM) has a design phase, which is the second phase out of nine. In this phase the later phases and sub-processes are designed, for example output, data collection methodology, and sampling. The Memobust handbook has a design sub-topic at the end of most of its topic chapters; the design utilises the methodological knowledge described in the chapter to make appropriate situation-dependent choices. The quality of the output should fulfil the specified needs of the users/customers, with quality and costs balanced. The statistics production is to be carried out within the decided budget. The response burden should be low. Overall design (total survey design) aspects are described in addition to the individual sub-process designs. Appropriate overall choices and allocations are to be made, considering for instance the budget restriction and the fact that sampling and estimation are strongly related. In ongoing surveys process data should be logged, suitably summarised, and used for evaluation and feedback. A new survey may use related process data or pilot studies.

Introduction

This presentation is part of a session about the ESSnet project Memobust: Methodology for Modern Business Statistics. This illustration has two major aims: to describe important aspects of design of statistical surveys, especially for business statistics, and to show how Memobust provides information on design. Memobust is structured with around 25 topics. Each topic consists of a set of modules. The choice of topics has taken the Generic Statistical Business Process Model (GSBPM) and methodological tradition into account.
There are two main types of module descriptions: themes and methods. There is a template for each type of module, there are cross-references on several levels, and there are glossaries. Each module is written to be readable on its own and largely self-contained, especially if it is a theme. A method module has methodologists as the main readership, whereas a theme module has a broader scope and should be comprehensible for interested staff in statistics production. There are design modules within several topics, for instance sampling, estimation, data collection, editing, and imputation. In addition, there is a topic devoted to design, called Overall design.

Business statistics are largely produced regularly through repeated surveys and often in a statistical office. The production can be thought of in three layers: the Business Register (BR) as a basis, primary statistics in the middle, and secondary statistics on the top, e.g. the National Accounts (NA). There are coordinating activities between the layers in both directions. National and international requests on statistical units, variables, reference periods etc. have to be considered from ideal as well as achievable (measurability) perspectives. The BR provides frames for the surveys and also samples and successive updates for units, variables, and populations. There are many balances to consider in choices and coordination activities, e.g. between administrative data and direct data collection and between annual and short-term statistics. There are time differences when the latter survey types use a (sampling) frame, possibly communicate with respondents, make estimations etc.

What does modern mean? Statistical methodology develops, e.g. for data collection with new modes, a better understanding of the response process, and more advanced editing. Technology develops at a high speed enabling further methods, more advanced tools, and more communication.
Users are becoming more interested in putting different data and statistics together themselves. This means, for instance, that quality components such as coherence, accessibility, and clarity are important. Statistics production in statistical offices is changing, largely due to budget cuts and also because of quality requirements. There is increasing pressure for a low response burden, and non-response is an issue. The production is moving from tailor-made stove-pipes for single surveys towards an architecture with statistical systems, re-use of data, common tools, data warehousing etc. Standardisation is a key word. Sometimes the term industrialisation is used, but it should be interpreted with care: it means powerful and enabling production processes, not something mechanical.

There is cooperation between statistical offices about methods and tools, and the international organisations, e.g. Eurostat, have an important role in this work. There are several frameworks, e.g. the established GSBPM and the Generic Statistical Information Model (GSIM) now being developed. These frameworks contribute to standardisation, and they simplify exchange of information, also with users.

What is design?

The word design has several aspects, even within statistics production. The scope may be methodological or technical. Design may refer to the whole statistical survey, to a specific sub-process, to a tool, or to a system. Design is important for a new survey, when a survey is redesigned, and also in continuous improvements of surveys, especially repeated surveys. Metadata have several roles, e.g. to assist the users to find relevant statistics and to interpret the statistics. Metadata are essential also in design and production to specify requests. Especially for repeated surveys, possible improvements can be seen by first saving well-chosen data about the production process during the production and then analysing these process data (paradata).
There may be quality improvements or cost savings or both. New laws and regulations, changing requirements from users, occurrence of mistakes in the production, and high costs are other examples of situations that may lead to decisions about redesign or improvement of individual sub-processes or tools. New methods can lead to redesign and improvements, too.

The GSBPM has nine phases, in the current UNECE version. The first three phases are Specify Needs, Design, and Build. They are all preparatory. Phases four to eight are Collect, Process, Analyse, Disseminate, and Archive. The ninth phase is Evaluate. This is a cyclic procedure, similar to Plan, Do, Check, Act, with learning from one production round to later rounds. It also shows the importance of user needs as input.

Figure 1. Illustration of relationships between design sub-processes and quality components

Figure 1 illustrates the complex relationship between the resulting quality of the statistical output (the five main quality components in the European Statistical System) and the preparatory sub-processes, especially in the phase Design but also in the two phases Specify Needs and Build. There are many more relationships than those drawn. Depending on use(s), the user(s) may put different priorities and constraints on the different quality components. The emphasis here is on statistics, but the output may alternatively be micro data or both micro and macro data.

Design is about choices of methods (e.g. for sampling and estimation, data collection, contact strategies, and editing) and about allocation of resources to the sub-processes in the statistics production. One aim for design, at least in principle, is to search for an optimum, e.g. minimum cost for a given quality or maximum quality for a given cost. In practice this means searching for a realistic solution somewhere near the optimum. Quality is multi-faceted and depends on both uses and users.
Hence, quality needs and wishes have to be discussed and specified, and balanced against the budget. Some quality components may be considered as constraints, e.g. because there are regulations on contents and timeliness. Much of the practical design work is normally devoted to accuracy, as further discussed below.

Before a change is implemented, the consequences must be analysed, as well as the investments that may be required. The benefits of the change are then compared to the cost of implementing the change and possible negative effects, for example costs of changes in IT systems. Some changes are conveniently introduced immediately, while others should rather wait and be introduced simultaneously as a package. Risks of breaks in the time series must be considered, as well as opportunities to eliminate or overcome such breaks. Usability testing, pilot surveys, and experiments are different ways to examine the consequences, for example to test whether and how new technology influences measurements or systems.

Quality assurance and quality control

There are several general definitions of quality, e.g. fitness for use, fitness for purpose, and the degree to which a set of inherent characteristics fulfils requirements. Since quality of statistics depends on the use, the producer should work together with users and specify the important needs, including the relevant quality components. These selected needs can be taken as the purpose of the statistics and the quality needed for this purpose. This quality level is essential; it is sufficient for the purpose. Design of a statistical survey is largely devoted to the output quality and, by necessity, to the factors influencing that quality.
It is related to both quality assurance and quality control, as shown in the brief description below of these concepts.

Quality assurance has two main aspects:
- Approaches and methods to achieve the intended/stated quality.
- Providing confidence that the quality requirements will be met.

For the statistics to achieve the quality that has been stated, the following is needed: good and realistic planning, control of the production, as well as assessments and checks on the quality of processes and the final statistical product. Using proven techniques, methodologies, checklists, etc. has several positive effects. It is, for example, easier to predict end product quality and to avoid situations where the desired quality is not achieved. Common methods, tools, and practices thereby contribute to the quality assurance of individual statistical products.

While quality assurance is everything you do to get a good quality, quality control is verification that the quality achieved was as expected. Quality control is used to monitor that the planned methods, tools, routines etc. are used, operate as intended, and result in the intended quality. Checks may be of various types. It may be necessary to check that design specifications are followed. It is important that quality control is used to control, and also to improve, each process that does not work as intended.

What is included in an “optimisation”?

It is an aim to find the design that is best, or optimal, given the different requirements, criteria, and constraints. Is there such a design and how can one arrive at it or at least come close? How much can be predicted and balanced? Which factors are the most important ones? Are there useful models for quality and costs?
These questions are difficult to answer, and the product manager usually needs help from experts, including IT professionals, methodologists, and cognitive experts.

Below are some examples of factors and conditions that must be considered in the optimisation procedure, i.e. the procedure to find a solution reasonably close to an optimum. Most of these factors imply constraints.
- There are regulations for the statistical office, e.g. so that a user-desired level of detail in the statistics may not be achieved due to rules for disclosure control.
- There are rules for data collection and requirements to reduce the response burden. This may have effects on the level of detail and also on the variables and the questionnaire.
- Quality is related to the use. It is important that the user dialogue clarifies the constraints (if any) and what aspects should be included in the optimisation work. The users may have requirements, e.g. on timeliness, which limit flexibility.
- Lack of resources can influence and constrain the possibilities for a survey.

Although the optimisation seemingly can be expressed simply, as minimising the cost for a given quality or maximising the quality for a given cost, the optimisation itself is not a simple calculation. The situation is better described in terms of constraints and room for manoeuvre. Certain constraints are set out in the user dialogue, early or gradually. Perhaps the most common constraint is that the financial resources are limited. A rough design and plan of the required production should be made during the user dialogue to find out if the goals are realistic.

When the user dialogue is completed, further work includes most of the following major issues: laws/regulations, quality requirements, quality wishes, response burden, staff, and costs. The work differs, of course, between a new survey, a re-design, and continuous improvements.
In the latter cases there is already a starting point and experience, probably both quantitative and qualitative information.

Theory for some survey parts and sub-processes

There exists no coherent theory for the design of statistical surveys. However, a variety of theories can be used, singly or in combination, in various sub-processes. Such theories can be used to select appropriate methods or at least to get assistance in the choice. Some examples of theories and principles follow below, together with comments about the handbook.

For sampling and estimation there are theories with clear criteria (probability sampling, minimising the mean squared error, MSE) and, for many situations, even formulas that make it possible to calculate what is best or at least good. In terms of theory, this area is more highly developed than many others. Some specifics for business statistics are repeated surveys with coordinated samples with regard to response burden and estimation of changes over time. The population is skewed and it changes quickly. Memobust has modules on design of sampling methods and design of estimation.

Theories for the response process for different types of surveys, respondents, and data are developed in the behavioural sciences. Measuring techniques utilise theories and experiences in order to avoid or reduce measurement errors (such as response errors caused by difficult words or by memory errors). Response processes for business surveys are less well known, but knowledge of them is gradually developing. Memobust has a module on questionnaire design.

For data collection there are theories and knowledge of the advantages and disadvantages of different modes (questionnaire, telephone interview, face-to-face interview, etc.) in different situations (e.g. subject, cost, and time). In some situations, the choice of data collection method is evident, but in other situations discussions with users are needed.
For example, telephone interview is a data collection method that can be implemented quickly but that is not suitable for all types of questions and question structures. Memobust has a module on design of data collection, including many issues, e.g. data collection method, contact strategy, and responsive design.

Editing is part of the quality control, specifically the quality control of data collection. The design of the editing is included in the survey design. Previously there was in many cases “over-editing” with too much time spent on units with little influence on the estimates. Nowadays statistical approaches have led to methods such as selective editing and macro editing, using resources in a cost-effective way. It is, of course, important to know how the statistics will be used: which estimates are needed and with what accuracy. Memobust has a module on design of editing, where the design is much about the choice of methods. It is a matter of quality and also of costs, mainly work in production.

Some situations

Some characteristics for business surveys follow. The population is skewed with a small number of large enterprises that make up a considerable part of the economy. The population changes quickly due to births, deaths, mergers etc. The BR is important in providing frames with different types of statistical units. Regular and timely updates of the BR are important, since they help to keep quality deficiencies low, e.g. with respect to coverage and classifications. There are a few major types of business surveys. There are annual statistics, e.g. Structural Business Statistics (SBS), which describe the structure, main characteristics and performance of economic activities across the EU. They are detailed. Short-term statistics (STS) are produced quarterly or monthly; timeliness has high priority. Changes over time are important, and the statistics are often in index form and seasonally adjusted.
Coherence between the short-term and the annual statistics is desirable, and a necessity in the national accounts. There are some practical difficulties due to time differences, e.g. between data collections. Drawing a new sample for short-term statistics means new population information, but also practical work with updates of samples and contacts. Considering the quick population changes, updates should be frequent. The sample design should take both accuracy and response burden into account; some procedure for sample coordination, over time and between surveys, is recommended. Sampling is generally random, but there may be exceptions, typically in the absence of sampling frames. The producer price index provides an example; there is no list of products.

Business statistics in a statistical office are largely run through repeated surveys, where measures of change often are important. The repetitions enable continuous improvements, for instance by collecting and utilising process data (paradata). Contact strategies can be improved successively, and knowledge can be transferred across similar surveys. The time schedule and the allocation of resources can also be improved successively.

Administrative data are increasingly becoming a possible source of information, as a complement or on their own.
Design is still needed when administrative data and other registers are used: design with delineations of populations and object types, variable definitions, estimation etc.

The activities and some aspects of the design process can be described as below, in brief.

Design a forthcoming survey round
- for statistics (macro data) or micro data
- through appropriate competences in cooperation/agreement (mutual understanding) with customers/users/stakeholders so that
  - quality is sufficient for the intended use
  - the production is within budget and cost-effective (in the long run) and
  - with regard to respondents (burden).

This description is less formal and not as distinct as previously with regard to optimisation. On the other hand it is a bit more encompassing, e.g. since it includes competences and more than one production round.

Concluding remarks

The design procedure may simply be stated as making methodological choices and allocating resources. It strives for some constrained optimum, typically involving primarily quality components of the statistical output and costs. There is normally not an obvious utility or cost function, and the constraints may not be in an explicit form. Still, the statistical office certainly has some information on costs. That information should be sub-divided where motivated when striving for cost-efficiency. Even rough figures may be useful. A repeated survey provides a good source of information for improvements. Experiments and tests can be embedded to get information in a controlled way and to try ideas before embarking on full-scale implementation. Responsive or adaptive design is a strategy where the design is modified during the production according to pre-set principles.

The design phase has received a clear position in the GSBPM. Its importance as an investment seems to be realised gradually, together with the further preparatory phases Specify Needs and Build. The handbook also shows the worth of design through having so many modules describing design.
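
The constrained optimum discussed above rarely has an explicit form, but sampling is one sub-process where formulas exist. As a minimal sketch, not taken from the handbook, the Python function below applies the textbook cost-optimal allocation for stratified simple random sampling, with the sample size in stratum h proportional to N_h * S_h / sqrt(c_h), so that a fixed budget is spent with minimum variance for an estimated total. The function name, the take-all rule for strata whose allocation would exceed their size, and all figures in the usage example are illustrative assumptions.

```python
from math import sqrt


def optimum_allocation(strata, budget):
    """Cost-optimal allocation for stratified simple random sampling.

    strata: list of (N_h, S_h, c_h) tuples: stratum size, standard deviation
            of the study variable, and cost per sampled unit.
    budget: total amount available for variable (per-unit) costs.
    Returns real-valued sample sizes n_h, in the order of `strata`.
    """
    remaining = dict(enumerate(strata))
    alloc = {}
    while remaining:
        denom = sum(N * S * sqrt(c) for N, S, c in remaining.values())
        if denom == 0:
            # No variation left in the remaining strata: nothing to gain.
            alloc.update({h: 0.0 for h in remaining})
            break
        # n_h proportional to N_h * S_h / sqrt(c_h), scaled to use the budget.
        trial = {h: budget * N * S / sqrt(c) / denom
                 for h, (N, S, c) in remaining.items()}
        take_all = [h for h, n in trial.items() if n >= remaining[h][0]]
        if not take_all:
            alloc.update(trial)
            break
        # Strata that would be over-sampled are enumerated completely;
        # the rest of the budget is re-allocated among the other strata.
        for h in take_all:
            N, _, c = remaining.pop(h)
            alloc[h] = float(N)
            budget -= N * c
        if budget < 0:
            raise ValueError("budget too small for the take-all strata")
    return [alloc[h] for h in range(len(strata))]


# Hypothetical skewed business population: many small, few large enterprises.
strata = [(5000, 2.0, 1.0), (500, 20.0, 1.0), (50, 400.0, 2.0)]
print(optimum_allocation(strata, budget=600.0))  # [250.0, 250.0, 50.0]
```

In this invented example the 50 large enterprises form a take-all stratum, which is typical for skewed business populations, and the remaining budget is split between the other two strata. A real design would also round the sample sizes, include fixed costs, and weigh response burden and the other quality components, which is where the less formal balancing described in this paper comes in.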