Stata – Commonly Used Commands and Useful Information



Stata Files
.dta files – Stata data files. Any time Stata saves data, it saves it as a Stata data file.
.do files – Do files store Stata commands. These commands are the same as those typed into the Command window.
.smcl and .log files – Log files record everything shown in the output (Results) window. Start a log before running most of your commands.

Stata Command Syntax
Stata commands, with few exceptions, follow this template. Bracketed items are optional.

    [by varlist:] command [varlist] [=exp] [if exp] [in range] [weight] [, options]

All commands must contain command, which is a Stata command. For example, regress runs an OLS regression.
A prefix may precede the command and is followed by a colon. Common prefixes are discussed below and include by, bysort, xi, and quietly.
A varlist is a list of one or more variables. Some commands allow only a single variable. In many cases the order of the variables matters: in estimation commands, the dependent variable always precedes the one or more independent variables.
=exp is an algebraic expression. Expressions are typically found with the generate and replace commands.
if exp is a condition that evaluates to true or false for each observation. The command is performed only for observations where it is true.
in range denotes a range of observations (for example, in 1/100 for the first 100 observations).
weight denotes a weighting expression, if one is needed.
Most commands have one or more options, which follow a single comma. Options supply additional information or change the command's default behavior, and they are listed in the help documentation. ALWAYS CHECK THE OPTIONS BY TYPING help command.

Example

    by gender: regress income educ paeduc if marital==1, vce(robust)

In English: for each gender, run a separate regression. Income is the dependent variable, and education and father’s education are the independent variables. Only observations where marital is equal to 1 are used.
Use robust standard errors.

Stata Operators
Arithmetic: + addition, - subtraction, * multiplication, / division, ^ power, - negation, + string concatenation
Logical: & and, | or, ! not, ~ not
Relational: > greater than, < less than, >= greater than or equal to, <= less than or equal to, == equal, != not equal, ~= not equal
The order of evaluation (from first to last) of all operators is ! (or ~), ^, - (negation), /, *, - (subtraction), +, != (or ~=), >, <, <=, >=, ==, &, and |.

Time Series Operators
These operators require the data to be declared as time series with tsset (or as panel data with xtset).
Lags and leads
L.var – Lag var one period: $var_{t-1}$
L2.var – Lag var two periods: $var_{t-2}$
F.var – Lead var one period: $var_{t+1}$
F2.var – Lead var two periods: $var_{t+2}$
Differences
D.var – Difference: $var_t - var_{t-1}$
D2.var – Difference of the difference: $(var_t - var_{t-1}) - (var_{t-1} - var_{t-2})$
S.var – Seasonal difference, one period: $var_t - var_{t-1}$
S2.var – Seasonal difference, two periods: $var_t - var_{t-2}$
Higher orders follow the same pattern (for example, L3.var or F3.var).

File Access and Set-Up Commands
Working directory
The working directory is the folder where Stata looks for data and where it saves data and log files. Once a working directory is set, you need only the file name, not the full path.
pwd – Print the current working directory to the output window.
    pwd
cd – Change the working directory.
    cd "C:\Users\Ryan\myproject\"
dir – Display the contents of the working directory.
    dir

Opening and Importing Data
Stata stores data in a proprietary format, the Stata .dta file. The use command opens these files directly, and the save command saves all data in this format. Stata can also import data from a variety of other formats, including Excel and comma-separated values (CSV) text files.
The import commands handle this.
use – Open a Stata dataset.
    use mydata.dta, clear
save – Save as a Stata dataset.
    save mydatacopy.dta, replace
clear – Clear all data from memory.
    clear
import delimited – Import a delimited text file (options exist to define the delimiter).
    import delimited mydata.csv, clear
import excel – Import an Excel worksheet. Use sheet() to choose the sheet. Without it, the first sheet is imported. The firstrow option reads variable names from the first row.
    import excel "mydata.xlsx", sheet("Sheet1") firstrow clear
infix – Import a fixed-width text file. You can specify the fields, data types, and column positions here or in a dictionary file.
    infix id 1-2 str name 3-6 education 7-8 income 9-14 using "mydata.txt"

Other Set-Up Commands Prior to Analysis
Log files and do files
log using – Start a new log.
    log using mylog.log, replace
log close – Close an open log.
    log close
doedit – Open the Do-file Editor.
    doedit
Set-Up (Defaults)
set maxvar – Change the default maximum number of variables in a dataset.
    set maxvar 5000
set level – Change the default level for confidence intervals (95 unless changed).
    set level 90
set logtype – Change the default format for log files (smcl or text).
    set logtype text
set maxiter – Change the default maximum number of iterations for maximum likelihood estimation. The default is 300, and the largest allowed value is 16,000.
    set maxiter 8000
set matsize – Change the default maximum number of variables in a model. The default is 400.
    set matsize 800
set more off – Turn off the --more-- prompt that pauses output spanning multiple screens. The default is on.
    set more off
Data
browse – Open the Data Browser.
    browse
edit – Open the Data Editor (you should NEVER do this).
Help
search – Search for a concept or term.
    search postestimation
help – Get help on a specific command.
    help reg

Basic Information, Descriptive Statistics, and Plots
This section covers basic information about your data: the data types it uses, the memory allocated for data storage, and variable and value labels. It also covers descriptive statistics and plots.
Basic Information
describe – List variables, data types, storage sizes, variable labels, and value label sets.
    describe
    describe var1 var2 var3
codebook – List each variable's name and label, type, value labels, and number of missing values, plus a tabulation if only a few distinct values are present.
    codebook
    codebook var1 var2
list – List the values of variables for some or all observations.
    list var1 var2 in 1/20
Descriptive Statistics
summarize – Summary statistics for one or more variables.
    summarize
    summarize var1, detail
tabulate – One- and two-way tabulations.
    tab var1 var2
    tab var3, missing
tabstat – Table of summary statistics for one or more variables.
    tabstat var1, statistics(mean median min max count)
correlate – Correlation matrix of two or more variables.
    correlate
misstable – Table of counts of present and missing values.
    misstable summarize var1 var2
ttest – t tests. The three examples below show a one-sample test, a paired test, and a two-sample (grouped) test.
    ttest var1 == 0
    ttest var1 == var2
    ttest var1, by(group)

Generalized Linear Models and Post-Estimation
There are dozens of commands, each of which fits a particular model. The list below covers a few common ones. It focuses mainly on post-estimation commands, which assess a model after it has been run. Post-estimation commands work on the estimates and model information from the most recently fitted model (the active results). Run them immediately after the model, or after making stored results active again with estimates restore. The post-estimation commands listed here apply to the regress command.
Models may store different pieces of information, so the available post-estimation commands will necessarily differ from model to model.
OLS Regression and Generalized Linear Models
regress – OLS regression.
    regress var1 var2 var3
logit – Logit model.
probit – Probit model.
poisson – Poisson model.
mlogit – Multinomial logistic regression.
Post-estimation Commands
estimates store – Store model estimates for later use and recall. Use it immediately after the model.
    regress var1 var2 var3
    estimates store my_results
estimates replay – Replay stored model estimates.
    estimates replay my_results
estimates restore – Make the specified stored results active again.
    estimates restore my_results
estat Examples
estat provides many of the post-estimation statistics. The command requires a second word that names the particular statistic.
estat ovtest – Ramsey RESET test for omitted variables.
    estat ovtest
estat gof – Goodness-of-fit chi-square test for model form. It is available after logistic, logit, probit, and poisson, not after regress.
    estat gof
estat vif – Variance inflation factors.
    estat vif
estat hettest – Test for heteroskedasticity.
    estat hettest
estat ic – Akaike’s and Schwarz’s Bayesian information criteria (AIC and BIC).
    estat ic
estat vce – Variance-covariance matrix of the estimators.
    estat vce
Other Post-estimation Commands
predict – Create a new variable of predicted values (fitted values, residuals, etc.). Fitted values are the default.
    predict myfit
    predict r, resid
margins – Marginal means and effects.
Post-Estimation Plots
rvfplot – Residual-versus-fitted plot.
    rvfplot
avplot – Added-variable plot for one independent variable. The avplots command draws one for every independent variable.
    avplot var2
lvr2plot – Leverage-versus-squared-residual plot.
    lvr2plot

Data Manipulation
Data manipulation can involve anything from generating a new variable to complex transformations of your data (aggregation, transposition, or preparation for time-series analysis).
Data Browser
browse – Browse a view of the data.
    browse
edit – Like browse, except that the data can be changed by hand, which you generally do not want to do.
Variable Creation, Population, and Deletion
generate – Generate a new variable.
    gen mynewvar = .
egen – Extensions to generate. Use it when gen does not work, typically when the new variable depends on values across all observations.
    egen myvar = max(var)
    egen myvar = count(var)
replace – Replace a variable's values with a constant or expression.
    replace mynewvar = 1
    replace mynewvar = 2 if var2 == 2
recode – Recode one or more variables. The generate() option stores the results in new variables instead of overwriting the originals.
    recode myvar myvar2 (1 2 = 1) (3 = 2) (4 5 = 3), generate(mynewvar mynewvar2)
drop – Drop variables or observations.
    drop var1 var2
    drop if var3 == .
keep – The inverse of drop: keeps only what is specified.
    keep if var3 != .
Variable Names, Labels, and Value Labels
rename – Change the name of a variable. This changes the name itself, not the longer label often attached to a variable. The old name goes first, the new name second.
    rename oldvar1 newvar1
label variable – Change the label attached to a variable.
    label variable newvar1 "Marital Status at Age 30"
label define – Define a value label set. The label values command attaches the set to the values of one or more variables.
    label define lab_mar 1 "Married" 2 "Divorced" 3 "Separated" 4 "Widowed" 5 "Never married"
label values – Attach a label set (the last name listed in the command) to the values of one or more variables.
    label values newvar1 lab_mar
encode – Create a numeric, value-labeled version of a string variable. For example, it turns male and female into 1 and 2.
    encode var1, gen(newvar1)
decode – Create a string version of a numeric variable. The numeric variable must have value labels attached for this to work.
    decode var1, gen(stringvar)
Advanced Data Manipulation
preserve – Save a temporary copy of the data in memory so that restore can bring it back later.
    preserve
restore – Return the data to the state saved by preserve.
    restore
merge – Merge a saved dataset into the dataset currently loaded. The two datasets must share one or more key variables. This adds variables.
    merge 1:1 id_variable using data2.dta
append – Append a saved dataset to the dataset currently loaded. This adds observations.
    append using data2.dta
collapse – Collapse the data into a larger unit of analysis. For example, county records may be collapsed into state records. Name the summary statistic for each variable; if none is given, the mean is used.
    collapse (median) var1 var2 var3, by(state)
reshape – Transform data from wide to long format and vice versa. For example, if you have time-series data in long format but want each time point in a separate column, you are moving from long to wide.
    reshape wide var1, i(state) j(year)
cross – Form every pairwise combination of the observations in a saved dataset and the dataset currently loaded. If each dataset has 10 observations, the result will contain 100.
    cross using data2.dta
compress – Change each variable to the smallest storage type that retains all its information. This shrinks both string and numeric types to save memory.
    compress

Graphs
The easiest way to make graphs is to use the point-and-click interface. Most commands use graph, followed by the graph type, the variables, and options. The options let you change almost every aspect of the graph.
graph box – Box plot.
graph bar – Bar chart.
graph pie – Pie chart.
graph twoway line – Two-way line chart.
graph twoway scatter – Two-way scatter plot.
graph twoway area – Two-way area chart.
histogram – Histogram of a variable.
gladder – Ladder-of-powers plot (one variable): histograms of the variable under a series of mathematical transformations.
quantile – Quantile plot.
kdensity – Kernel density plot.
Graphs can also be overlaid, and options control what is shown. The following command combines a scatterplot of two variables with their linear fit (and its confidence interval). Each graph is enclosed in its own set of parentheses.
    graph twoway (lfitci d_emp_man d_emp_all) (scatter d_emp_man d_emp_all)

Prefixes
The following list is not comprehensive and includes only some of the more common prefixes used in Stata. Not all Stata commands allow prefixes. For example, egen allows only limited use of the by prefix.
Check the help for your specific command to see whether prefixes are allowed.
by – Run a command separately for each group defined by the by variable. The data must already be sorted by that variable.
    by gender: reg var1 var2 var3
bysort – Same as by, but sorts the data by the by variable first. Use it when the data are not already sorted.
    bysort gender: tab var1
quietly / noisily – Suppress / force display of output.
    quietly: reg var1 var2 var3
    estimates store model1
svy – Survey prefix. Use it with survey data. The svyset command must first be used to declare the data as survey data.
    svy: mean var1
xi – Expand interaction terms. It is also used for categorical variables; this example uses i. to mark the single categorical variable. In Stata 11 and later, factor-variable notation (for example, reg depvar i.marriage age) works without the xi prefix.
    xi: reg depvar i.marriage age
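The do file below is a short sketch of how the commands above fit together. It is illustrative, not a recommended workflow. It assumes Stata 11 or later (for the i. factor-variable notation) and the auto example dataset that ships with Stata, loaded with sysuse. The file names (auto_demo.log, auto_demo.dta), the label set lab_heavy, and the new variables heavy, mean_mpg, price_hat, and price_res are hypothetical names chosen only for this example.

```stata
* Sketch of a do file that strings together commands from this sheet.
* Assumes Stata 11+ and the auto dataset shipped with Stata (sysuse).
* File and variable names created here are hypothetical.
clear all
set more off

* Start a plain-text log (close any log left open first)
capture log close
log using auto_demo.log, replace text

* Open data and get basic information
sysuse auto, clear
describe price mpg weight foreign
codebook foreign
list make price mpg in 1/5
summarize price mpg, detail
tabstat price mpg, statistics(mean median min max count)
tabulate rep78 foreign, missing
misstable summarize rep78
ttest mpg, by(foreign)

* Create, fill, and label a new variable (hypothetical name: heavy)
generate heavy = .
replace heavy = 1 if weight >= 3000
replace heavy = 0 if weight < 3000
label define lab_heavy 0 "Under 3,000 lbs" 1 "3,000 lbs or more"
label values heavy lab_heavy
label variable heavy "Weighs at least 3,000 lbs"

* Group mean with egen and the bysort prefix
bysort foreign: egen mean_mpg = mean(mpg)

* Fit an OLS model and store the results
regress price mpg weight i.foreign
estimates store m_ols

* Post-estimation: tests, fitted values, residuals, diagnostic plot
estat hettest
estat ic
predict price_hat
predict price_res, residuals
rvfplot

* Separate regressions by group, then reactivate the stored model
bysort foreign: regress price mpg weight
estimates restore m_ols
estat vce

* Aggregate temporarily: preserve, collapse, list, restore
preserve
collapse (mean) price mpg, by(foreign)
list
restore

* Overlaid graph: linear fit with confidence interval plus scatter
graph twoway (lfitci price mpg) (scatter price mpg)

* Save a copy of the modified data and close the log
save auto_demo.dta, replace
log close
```

Because every input comes from sysuse, the file needs no external data. Save it as a do file and run it with the do command or from the Do-file Editor. The bysort regression prints one set of results per group. After estimates restore, post-estimation commands such as estat vce again refer to the stored pooled model.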