Jcsites.juniata.edu



Thursday, October 5, 2017
Name ____________Key_____________

[7 pts] Below is a visualization model from the text. Match each of the following actions or considerations to one of the five stages in the model. Use each stage at least once.
Rendering the visualization to 2-dimensions: ____visual display____
Distinguishing obvious patterns in the data: ______perception________
Merging multiple data sets: ______data mapping____________
Concluding that two or more variables’ relationship should be investigated further: _____cognition______
Spotting cluster sizes, groups and outliers: ______perception___________
Normalizing an attribute to a z-score: ______data mapping_______
Choosing which visual variable is best for a data attribute: _____visual mapping________

For each of the following, choose whether it describes an infographic (IG) or a data visualization (DV). [5 pts]
__IG__ Typically manually drawn.
__DV__ Uses a computer to display thousands of points of data.
__IG__ Aesthetically rich, artistic.
__DV__ Can be regenerated with different or changing data.
__IG__ Mash-up or combination of different but related smaller datasets.

Define each of the following terms and distinguish among them. [5 pts]
Dimension: the number of attributes (values) per observation.
Univariate: a single-dimension dataset; one set of values.
Bivariate: a two-dimension dataset; two columns.
Hypervariate: a multiple-dimension dataset; many columns.

Data coding. [13 pts]
8 bits permit representing __256__ unique values.
If we only need to represent 12 unique values, a minimum of __4__ bits are required.
The value 10101 in binary is the value __21__ in decimal. And the value 20 in decimal is written as __10100__ in binary.
If the 8-bit ASCII coding in decimal for “A” is 65, then “C” is __67__.
For the following RGB values, choose which color is represented. Choose from: (white, black, grey, red, green, yellow, blue, brown, light blue, light green)
255,255,255 = _____white________
0,255,0 = ______green___________
150,150,255 = ____light blue_______
For the following descriptions, choose the file type that is best described from (.xlsx: Excel; .csv: Comma Separated Values; .json: JavaScript Object Notation; .html: HyperText Markup Language).
__xlsx__ Proprietary spreadsheet format with binary encodings.
__html__ A text file that contains data and how to render the data for a web browser.
__csv__ A text file that holds entirely data, with the same number of values per line.
__json__ A text file that holds data and descriptions of data; it can represent more complicated structures and relationships of data.
__csv__ The most common file format used by data visualization and data mining tools.

Plot on the number line this set of 12 univariate numbers {5, 10, 10, 10, 16, 25, 28, 30, 30, 36, 45, 50} with small circles, using jiggle. [16 pts]
┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼────┼
0    5    10   15   20   25   30   35   40   45   50   55
Superimpose a Tukey box plot marking the median and 25th and 75th percentiles.
What is the mean of these data? __24.6__
Bin this data using even ranges for 3 bins: __{5,10,10,10,16}__ __{25,28,30,30}__ __{36,45,50}__
If we normalize this data set to fall between 0 and 1, and assuming the data range is actually (0 to 50), then the following 3 values are recoded as: 50 -> __1__, 45 -> __0.9__, and 25 -> __0.5__

Describe three different actions you can take with data that has missing values. You should also describe in what cases you would use each action. [6 pts]
Drop the row of data. Use when there are plenty of points and dropping the row is no loss.
Replace with the average or with similar values. Use when N is not very large and we want to keep the row's other data.
Do nothing and let the application deal with it, or treat missing as a different value. Use when the missing values are not critical.

For each of the attribute/column types, give an example of data of your choice that clearly would represent that type and would clearly not be good to classify with any other possibilities. Also fill in an appropriate visual feature to which the type would be mapped (position, size, shape, color/hue, intensity, texture, direction). [18 pts]
Type description              | Describe some example data                    | Visual feature
Nominal-Categorical-Ranked    | Any Likert scale, binned heights, weights     | Color scale, position, intensity, size
Nominal-Categorical-Unranked  | Preferences of color, pets, people categories | Color, texture, shape
Ordinal Discrete              | Any count                                     | Position, size, color scale, intensity
Ordinal Continuous            | Any measurement, weight, height               | Position, size
Spatial                       | Zip, lat-long                                 | Position
Temporal                      | Time scales, sequencing                       | Position, motion

Perception true/false. [10 pts]
__T__ Motion is a pre-attentive perception.
__F__ Change blindness occurs when we stare at something too long.
__F__ The cones in the human eye react separately to the wavelengths of red, yellow and blue colors.
__T__ About 9% of the population has some named form of color blindness.
__T__ Saccadic eye movements are the rapid eye movements to scan targets of interest.
__F__ Green is the best color for those with many forms of colorblindness.
__T__ Some illusions might portray data that is not there, e.g. Hermann Grid illusions.
__F__ It is good to use more visualization features than attributes from the data.
__T__ Recognizing larger glyphs is pre-attentive, while determining ratio of sizes is not.
__F__ People perceive area in a consistent manner.

True/false statistics. [10 pts]
__F__ Statistics can be computed only if the data columns are nominal.
__F__ Nominal-ranked cannot be converted to ordinal.
__T__ A Likert scale is a form of nominal-ranked.
__F__ Calculating a mode statistic is appropriate for only nominal-ranked data.
__F__ Frequency counts are appropriate only for data in discrete ordinal form.
__T__ A correlation calculates a value in the range [-1,+1], rating the strength of the relationship between pairs of values from the same row.
__F__ A correlation close to zero means that both attributes should be ignored.
__T__ Linear regression attempts to fit a predictive line to the data that minimizes the y-distance between the line and the data points.
__T__ A visualization of frequency counts is often portrayed as a bar chart.
__T__ Time series analyzes a sequence of measurements for its cycles and prediction of patterns.

Below is an Excel chart comparing two variables. The left pyramid plots the values (1,2,4,8,16,32,64) and the right plots (0,10,20,30,40,50,60). [5 pts]
Critique this graph on its presentation of the two data columns.
3D is inappropriate; the data are only x-y. The tops of the points are inaccurate. Shadows on the glyphs are extraneous. 0 and 1 are hard to distinguish. Patterns that actually exist are hard to see.

Describe how a matrix of scatterplots, as demonstrated in Weka, can be used to explore a data set. [5 pts]
Allows quick comparison for all combinations of the attributes. Patterns emerge.
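
The arithmetic in the data-coding and number-line questions can be checked with a short script. The sketch below is not part of the exam; the helper names (`equal_width_bins`, `min_max`) are made up for illustration, and the binning assumes "even ranges" means equal-width intervals between the smallest and largest value.

```python
from statistics import mean, median

# The 12 univariate values from the number-line question.
data = [5, 10, 10, 10, 16, 25, 28, 30, 30, 36, 45, 50]

# --- Data coding ---
values_in_8_bits = 2 ** 8              # distinct bit patterns in 8 bits
bits_for_12 = (12 - 1).bit_length()    # smallest n with 2**n >= 12
binary_to_decimal = int("10101", 2)    # parse a base-2 string
decimal_to_binary = format(20, "b")    # base-2 string without the "0b" prefix
code_for_c = ord("A") + 2              # "C" is two letters after "A"

# --- Number line ---
data_mean = mean(data)
data_median = median(data)


def equal_width_bins(values, k):
    """Split values into k bins of equal width over [min, max].

    Assumption: the maximum value is placed in the last bin.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    bins = [[] for _ in range(k)]
    for v in values:
        idx = min(int((v - lo) / width), k - 1)
        bins[idx].append(v)
    return bins


def min_max(v, lo=0, hi=50):
    """Rescale v linearly so that lo maps to 0 and hi maps to 1."""
    return (v - lo) / (hi - lo)


if __name__ == "__main__":
    print(values_in_8_bits, bits_for_12, binary_to_decimal,
          decimal_to_binary, code_for_c)
    print(round(data_mean, 1), data_median)
    print(equal_width_bins(data, 3))
    print([min_max(v) for v in (50, 45, 25)])
```

The range passed to `min_max` defaults to 0 to 50 because the question states that range explicitly; using the observed minimum of 5 instead would give different recoded values.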

