Jrskbt.files.wordpress.com



UNIT V: Big Data Visualization

Data visualization

Data visualization is the graphical representation of information and data. By using visual elements like charts, graphs, and maps, data visualization tools provide an accessible way to see and understand trends, outliers, and patterns in data.

Data visualization objectives

- Improved Insight
- Faster Decision Making
- Information Processing
- To illustrate or hide the data
- To make comparisons between statistical data or predict outcomes
- To find patterns or relationships among the data

Challenges of Big Data Visualization

Scalability and dynamics are two major challenges in visual analytics. Visualization-based methods take the challenges presented by the “four Vs” of big data and turn them into the following opportunities [2]:

- Volume: The methods are developed to work with an immense number of datasets and make it possible to derive meaning from large volumes of data.
- Variety: The methods are developed to combine as many data sources as needed.
- Velocity: With the methods, businesses can replace batch processing with real-time stream processing.
- Value: The methods not only enable users to create attractive infographics and heatmaps, but also create business value by gaining insights from big data.

There are also the following problems for big data visualization:

- Visual noise: Most of the objects in the dataset are too closely related to each other.
  Users cannot separate them as distinct objects on the screen.
- Information loss: Reducing the visible data sets is possible, but leads to information loss.
- Large image perception: Data visualization methods are limited not only by the aspect ratio and resolution of the device, but also by physical perception limits.
- High rate of image change: Users observing the data cannot react to the number of data changes or their intensity on the display.
- High performance requirements: This problem is hardly noticeable in static visualization, which has lower visualization speed requirements, but it still imposes a high performance requirement.

Perceptual and interactive scalability are also challenges of big data visualization. Visualizing every data point can lead to over-plotting and may overwhelm users’ perceptual and cognitive capacities; reducing the data through sampling or filtering can elide interesting structures or outliers. Querying large data stores can result in high latency, disrupting fluent interaction [13] (Z. Liu, B. Jiang and J. Heer, imMens: Real-time Visual Querying of Big Data, Eurographics Conference on Visualization (EuroVis) 2013, 32(3), 2013, pp. 421-430).

In Big Data applications, it is difficult to conduct data visualization because of the large size and high dimensionality of big data. Most current Big Data visualization tools perform poorly in scalability, functionality, and response time. Uncertainty can arise during a visual analytics process and poses a great challenge to effective uncertainty-aware visualization [5] (C.L. P. Chen, C.-Y. Zhang, Data-intensive applications, challenges, techniques and technologies: A survey on Big Data, Information Sciences, 275 (10), August 2014, pp. 314-347).

Potential solutions to some challenges or problems of visualization and big data were presented in [14] (SAS Institute Inc., Five big data challenges and how to overcome them with visual analytics, Report, 2013, pp. 1-2):

1. Meeting the need for speed: One possible solution is hardware. Increased memory and powerful parallel processing can be used. Another method is putting data in-memory but using a grid computing approach, where many machines are used.
2. Understanding the data: One solution is to have the proper domain expertise in place.
3. Addressing data quality: It is necessary to ensure the data is clean through the process of data governance or information management.
4. Displaying meaningful results: One way is to cluster data into a higher-level view where smaller groups of data are visible and the data can be effectively visualized.
5. Dealing with outliers: Possible solutions are to remove the outliers from the data or create a separate chart for the outliers.

Common general types of data visualization:

- Charts
- Tables
- Graphs
- Maps
- Infographics
- Dashboards

More specific examples of methods to visualize data:

- Area Chart
- Bar Chart
- Box-and-whisker Plots
- Bubble Cloud
- Bullet Graph
- Cartogram
- Circle View
- Dot Distribution Map
- Gantt Chart
- Heat Map
- Highlight Table
- Histogram
- Matrix
- Network
- Polar Area
- Radial Tree
- Scatter Plot (2D or 3D)
- Streamgraph
- Text Tables
- Timeline
- Treemap
- Wedge Stack Graph
- Word Cloud

And any mix-and-match combination in a dashboard!

Data Visualization techniques / methods

- Data Visualization
- Information Visualization
- Concept Visualization
- Strategic Visualization
- Metaphor Visualization
- Compound Visualization

Tools Used in Data Visualization / Proprietary Data Visualization Tools

Multiple line graphs

Line graphs are used for one-dimensional data. On the horizontal axis (Ox) the values are not repeated (e.g., time or the ordering of the table). The vertical axis (Oy) shows the values of the variable of interest.
Multiple line graphs can be used to show more than two variables or dimensions (x, y1, y2, y3, etc.).

Wordle

Wordle is a toy for generating “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text. You can tweak your clouds with different fonts, layouts, and color schemes. The images you create with Wordle are yours to use however you like. You can print them out, or save them to your own desktop to use as you wish.

Pie chart

Tools: most statistical and charting software, Many Eyes, Google Charts, Tableau Public, Google Fusion Tables
Image created in Excel with randomized data.

Tree map

Tools: d3/Protovis, Many Eyes, Google Charts, Network Workbench/Sci2
Image created by code in d3 "examples/treemap/" directory.

Bar chart, radial bar chart

Tools: most statistical and charting software, Many Eyes, Google Charts, Tableau Public, High Charts, Google Fusion Tables
Image created in Excel with data from Anscombe's quartet.

Histogram

Tools: most statistical and charting software, Protovis, Many Eyes
Image: Pyrsmis. (2008). Black cherry tree BY-SA 3.0

Tools for Multidimensional Visualizations

Google Charts
Display live data on your website. Includes Introduction, Quick Start, and Chart Gallery for ideas.

Many Eyes
An experiment by IBM Research and the IBM Cognos software group. View others' visualizations, upload your own data and create your own visualizations.

Tableau Public
Tableau Public is a free tool that "brings data to life" (according to their website). View others' visualizations or create your own. Tutorial included.

Weave
Web-based Analysis and Visualization Environment is designed to enable visualization of any available data. WEAVE has a wide array of options for working with different data types.

Wordle
Generates “word clouds” from text that you provide. The clouds give greater prominence to words that appear more frequently in the source text.
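The idea that more frequent words get more prominence can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Wordle's real sizing and layout rules are not described here, and the function name `word_weights`, the linear scaling, and the `min_size`, `max_size` and `top` parameters are all hypothetical.

```python
import re
from collections import Counter


def word_weights(text, min_size=10, max_size=48, top=5):
    """Map the `top` most frequent words to font sizes (hypothetical scheme).

    Assumption: size grows linearly with word count between min_size and
    max_size. Real word-cloud tools may use a different scaling.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common(top)
    if not counts:
        return {}
    high = counts[0][1]        # count of the most frequent word
    low = counts[-1][1]        # count of the least frequent kept word
    span = (high - low) or 1   # avoid division by zero when all counts match
    return {
        word: round(min_size + (count - low) / span * (max_size - min_size))
        for word, count in counts
    }


sample = "data data data chart chart map"
print(word_weights(sample))  # {'data': 48, 'chart': 29, 'map': 10}
```

A real tool would normally also drop very common words ("the", "and") before counting; that step is left out to keep the sketch short.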
You can tweak your clouds with different fonts, layouts, and color schemes.

What are Hierarchical Data Visualizations?

Hierarchical visualizations, or trees, are collections of items in which each item has a link to one parent item (except the root). Items and the links between parent and child can have multiple attributes. These can be applied to items and links. Tasks related to structural properties become interesting: how many levels are in the tree, or how many children does an item have?

Examples include: dendrogram, phylogenetic tree, radial tree, hyperbolic tree, tree map, cone tree, radial hierarchy, and decision tree/flow chart. What do these look like? They are among the light green elements on the Periodic Table of Visualization.

Tools for Hierarchical Visualizations

Network Workbench
This is a large-scale network analysis, modeling, and visualization toolkit for biomedical, social science and physics research. It designs, evaluates, and operates a unique distributed, shared resources environment for large-scale network analysis, modeling, and visualization.

Protovis
A graphical approach to visualization, Protovis composes custom views of data with simple marks such as bars and dots, and defines marks through dynamic properties that encode data. Protovis is mostly declarative and designed to be learned by example. It is no longer under active development.

d3.js
From the developers of Protovis, d3.js is a small, free JavaScript library for manipulating documents based on data. It can produce choropleth, motion chart, hive plot, and fisheye distortion visualizations.

Many Eyes
An experiment by IBM Research and the IBM Cognos software group. View others' visualizations, upload your own data and create your own visualizations.

Open-Source Data Visualization Tools

1. Candela
Candela is a data visualization package made available through the Resonant platform. Candela separates itself from other tools by providing a full suite of data visualization components.
The training documentation provides a quick start for novices to get up to speed, and code can be used via JavaScript, Python, or R. Candela can be installed locally from the latest public release package through a repository, though the tool documentation suggests installing the package from source, as this gives access to the latest development release.

2. Charted
Charted is perhaps one of the easiest data visualization tools around, as it simply requires a link to a .csv file or a Google Sheets location; hit GO and Charted creates a visual display using a bar or line chart. According to the developers of Charted (created by the Product Science Team at Medium), the tool was built around three principles: it does not store data, does not transform data, and is not a formatting tool. It pulls data on a regular cadence (refreshes every 30 minutes) so changes made to the underlying sheet are always up-to-date in the chart. It also supports tab-delimited files and Dropbox links. Training? Non-existent, though neither is it required.

3. Datawrapper
Datawrapper is a tool that has been in existence since 2011 and is primarily used by journalists, though it is comprehensive enough to be useful to any data scientist or researcher. In contrast to most of the tools profiled here, Datawrapper has free and paid versions. It’s also not technically open-source because no coding skills are needed. As the site home page explains, you simply cut & paste, visualize, and publish. Charts are interactive, meaning viewers can see underlying values, and the visualizations can also be embedded on a website. There is a wide range of charting options, from simple bar charts to scatter plots, as well as mapping functionality.

4. Leaflet
Leaflet is all about maps. In fact, it has no charting capabilities but touts itself as the “leading open-source JavaScript library for mobile-friendly interactive maps”.
The tool provides a variety of mapping layers and interaction features such as zoom controls and mouseover functionality. There is also customization capability such as map projections and easy CSS3 restyling. Additional features can be provided via plugins, and users can vote for additional plugins if one is not available. There are both basic tutorials, such as a quick start guide, and more advanced training for plugin development. Install files can be accessed through a repository (both stable and in-progress versions) as well as through source code.

5. RawGraphs
Similar in some respects to Charted and Datawrapper, RawGraphs, whose tagline is “the missing link between spreadsheets and data visualizations,” simply requires the user to either cut/paste data, upload, or provide a link to create a wide variety of charts. One feature that differentiates RawGraphs is that a number of unconventional visualization models are provided (e.g. sunburst, alluvial diagrams, dendrograms for hierarchical clustering, etc.). Don’t fret, novices: the usual suspects (bar, line, pie, scatter) are also included. For advanced users, new chart types can also be created. Visual creations can be exported as vector or raster images for display on your website, and the tutorials, while not extensive, can be completed quickly so you can get right to work on that visual magnum opus.

6. Chartist.js
Chartist.js is another JavaScript library that embodies its tagline, “Simple Responsive Charts.” Indeed. No waterfalls or boxplots here, but what Chartist.js loses in diversity it more than makes up for in customization. Style sheets (CSS) can be customized to a great degree in this tool, with customization allowing for animation of visualizations, some using SVG. What is SVG? SVG is scalable vector graphics, a format that allows for interactivity and animation, as well as being scalable (without loss of resolution quality).
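To make the scalability point concrete, the sketch below builds a tiny bar chart as SVG text using only Python's standard library. It is not Chartist.js code; the function name `bar_chart_svg` and its parameters are hypothetical, and the values are assumed to be non-negative. Because the bars are described as geometry inside a `viewBox`, a browser can render the same markup at any size without losing sharpness.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def bar_chart_svg(values, bar_width=20, gap=5, height=100):
    """Return an SVG document (as a string) with one <rect> per value.

    Hypothetical helper for illustration; assumes non-negative values.
    """
    if not values or max(values) <= 0:
        raise ValueError("values must contain at least one positive number")
    peak = max(values)
    width = len(values) * (bar_width + gap) - gap
    # viewBox sets the drawing's own coordinate system, which the
    # renderer scales to whatever display size is requested.
    svg = ET.Element("svg", {"xmlns": SVG_NS, "viewBox": f"0 0 {width} {height}"})
    for i, value in enumerate(values):
        bar_height = height * value / peak
        ET.SubElement(svg, "rect", {
            "x": str(i * (bar_width + gap)),
            "y": str(height - bar_height),  # SVG's y axis points downward
            "width": str(bar_width),
            "height": str(bar_height),
        })
    return ET.tostring(svg, encoding="unicode")


markup = bar_chart_svg([3, 7, 5])
print(markup)
```

Saving `markup` to a file with an `.svg` extension and opening it in a browser should show three bars; since the output is plain text markup, it can also be styled with CSS, which is the property the Chartist.js description relies on.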
Chartist.js sees SVG as a cutting-edge technology, a vision apparently shared by others. There are some browser compatibility issues, but the site provides a concise table indicating compatible browsers.

7. D3.js
D3.js is yet another JavaScript library that develops data visualizations through the use of HTML, SVG, and CSS. D3 stands for Data-Driven Documents, document here being a Document Object Model (DOM). The core idea behind D3.js is to leverage the full capability of the modern browser for the development of visualizations through web standards, without “tying yourself to a proprietary framework”. In terms of learning curve, this would be the polar opposite of other cut-and-paste tools, so D3.js is decidedly not for those that avoid the dreaded code moniker. That said, if you are looking for a tool that provides nearly unlimited functionality in terms of design creativity and charting options, D3.js might be just the ticket!

8. Plotly
Plotly is another example of a tool that has both open-source and proprietary (paid) products, each tier containing its own functionality. Offerings can be grouped into two platforms (Plotly On-Premises and Plotly Cloud) with four primary business intelligence tools covering charting, dashboards, slide decks, and a SQL client. The SQL client is free, while Plotly libraries are available as open source for JavaScript, Python, and R. One of the oft-marketed features of Plotly (at least in the paid tools) is the ability to collaborate and share data visualizations with other team members.

9. Polymaps
Similar to Leaflet, and as the name suggests, Polymaps is a tool consisting of a JavaScript library for “making dynamic, interactive maps in modern web browsers”. Polymaps is another tool that leverages SVG functionality, facilitating styling through CSS, and allows for increased interactivity.
Examples of mapping visualizations include general street layer mapping, choropleth maps (for instance, comparing state-level data), population density, and even the use of k-means clustering.

10. OpenHeatMap
In the category of upload and create, OpenHeatMap is a fairly basic tool that allows users to upload a CSV, Excel, or Google Sheets file and create a map instantly. OpenHeatMap can also be used by developers (as a jQuery plugin) to provide mapping functionality within their own website. Users uploading a file for rendering are recommended to include a full street address in one field, with values represented in another field (for instance, housing value, sales price, number of employees, etc.). Geographies can be point-based (i.e. one address), or aggregates such as city, county, state, etc.

11. DyGraphs
DyGraphs claims as one of its primary features the ability to handle “huge data sets,” plotting millions of data points without “getting bogged down”. Another feature, for those who consider themselves stats nerds, is the ability to display error bars and/or confidence intervals. To use these, one standard deviation must be specified in the data file. The tutorial demonstrations are fairly basic but should serve to get someone started fairly quickly in creating their own visualizations.

Analytical techniques used in big data visualization

- Tree (prediction)
- Regression
- Linear and Logistic
- Clustering (unsupervised)
- Prediction
- Grouping
- K-means
- Association rule
- Ungrouping learning
- Prediction
- Relationship
- Rules/items

Pentaho

Reporting
Pentaho reporting depends on the JFreeReport project. It helps you to fulfill your business reporting needs. This component also offers both scheduled and on-demand report publishing in popular formats such as XLS, PDF, TXT, and HTML.

Analysis
It offers a wide range of analysis features, including a pivot table view.
The tool provides enhanced GUI features (using Flash or SVG), integrated dashboard widgets, portal, and workflow integration. Moreover, Pentaho Spreadsheet Services allows a user to browse, pivot, and use charts from within MS Excel.

Dashboards
The dashboard offers Reporting and Analysis, which contribute content to Pentaho Dashboards. The self-service dashboard designer includes extensive built-in dashboard templates and layouts. It allows business users to build personalized dashboards with little training.

Data Mining
The data mining tool discovers hidden patterns and indicators of future performance. It offers the most comprehensive set of machine learning algorithms from the Weka project, which includes clustering, decision trees, random forests, principal component analysis, and neural networks.
It allows you to view data graphically, interact with it programmatically, or use multiple data sources for reports, further analysis, and other processes.

Pentaho Data Integration
This component is used to integrate data wherever it exists.
- Rich transformation library with over 150 out-of-the-box mapping objects.
- It supports a wide range of data sources, including more than 30 open source and proprietary database platforms and flat files. It also helps Big Data analytics with integration and management of Hadoop data.

Who is using Pentaho BI?
Pentaho BI is widely used by many software professionals, such as:
- Open source software programmers
- Business analysts and researchers
- College students
- Business intelligence counselors

Flare: Data Visualization for the Web
Flare is an ActionScript library for creating visualizations that run in the Adobe Flash Player. From basic charts and graphs to complex interactive graphics, the toolkit supports data management, visual encoding, animation, and interaction techniques.
Even better, flare features a modular design that lets developers create customized visualization techniques without having to reinvent the wheel.

View the demos and sample applications to see a few of the visualizations that flare makes it easy to build.

To begin making your own visualizations, download flare and work through the tutorial. You should also get familiar with the API documentation. Need more help? Visit the help forum (you'll need a SourceForge login to post).

JasperReports: Open Source Reporting Tool
JasperReports is an open source reporting engine. It provides the ability to deliver rich content to the printer, the screen, or into various formats such as PDF, HTML, XLS, RTF, ODT, CSV, TXT and XML files. It is a Java library and can be used in a variety of Java-enabled applications to generate dynamic content. Its main purpose is to help create page-oriented, ready-to-print documents in a simple and flexible manner. JasperReports can also be used to provide reporting capabilities in our applications.

As it is not a standalone tool, it cannot be installed on its own. Instead, it is embedded into Java applications by including its library in the application's CLASSPATH.

Dygraphs
dygraphs is a fast, flexible open source JavaScript charting library. It allows users to explore and interpret dense data sets. Here's how it works: the chart is interactive, so you can mouse over to highlight individual values. You can click and drag to zoom. Double-clicking will zoom you back out. Shift-drag will pan. You can change the number and hit enter to adjust the averaging period.

Features
- Handles huge data sets: dygraphs plots millions of points without getting bogged down.
- Interactive out of the box: zoom, pan and mouseover are on by default.
- Strong support for error bars / confidence intervals.
- Highly customizable: using options and custom callbacks, you can make dygraphs do almost anything.
- dygraphs works in all recent browsers.
  You can even pinch to zoom on mobile/tablet devices!
- There's an active community developing and supporting dygraphs.

Datameer Analytics Solution and Cloudera

NodeBox
NodeBox is a node-based software application for generative design. It helps designers, and anyone else who uses it, to automate boring production challenges, visualise large sets of data, and manipulate the raw power of the computer without using the mechanical language of machines. The tools integrate with traditional design applications and are cross-platform.

Gephi
Gephi is open-source software for visualizing and analysing large network graphs. Gephi uses a 3D render engine to display graphs in real time and speed up exploration. You can use it to explore, analyse, spatialise, filter, clusterize, manipulate and export all types of graphs.

Google Chart API
The Google Chart API is an interactive Web service (now deprecated) that creates graphical charts from user-supplied data. Google servers create a PNG (Portable Network Graphics) image of a chart from data and formatting parameters specified by a user's HTTP request. The service supports a wide variety of chart information and formatting. Users may conveniently embed these charts in a Web page by using a simple image tag.

Flot
Flot is a JavaScript plotting library. It is small, performs well, and supports many chart types, including line, pie, bar, area, and stacked charts. Plugins are also available. Flot also supports real-time and Ajax chart updates. If you know a little JavaScript and jQuery, you can get started with Flot easily.

Flot can handle hundreds of data points easily; even a real-time chart that redraws every 100 milliseconds still runs very fast. Flot runs on IE, Firefox, Chrome, Safari, and Opera.
Because Flot uses HTML5 Canvas, on IE8 and below you can use excanvas to make IE emulate HTML5 Canvas so that Flot works properly.

Visual.ly
Visual.ly is a community platform for data visualization and infographics (a clipped compound of "information" and "graphics"). ................

In order to avoid copyright disputes, this page is only a partial summary.
