22 free tools for data visualization and analysis

Got data? These useful tools can turn it into informative, engaging graphics.

OpenRefine (formerly Google Refine)

What it does: OpenRefine can be described as a spreadsheet on steroids for taking a first look at both text and numerical data. Like Excel, it can import and export data in a number of formats including tab- and comma-separated text files.


OpenRefine helps clean messy data

OpenRefine features several built-in algorithms that find text items that should be grouped together. After importing your data, you can select Edit cells > Cluster and edit and choose which algorithm you want to use. After OpenRefine runs, you decide whether to accept or reject each suggestion. For example, you could say yes to combining Microsoft and Microsoft Corp., but no to combining Coach Inc. with CQG Inc. If it's offering too few or too many suggestions, you can change the strength of the suggestion function.
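To give a feel for how this kind of grouping can work, here is a minimal Python sketch of a "fingerprint"-style key: each value is lowercased, stripped of punctuation, and reduced to its sorted unique words, and values that share a key become a suggested cluster. This is a simplified approximation for illustration, not OpenRefine's actual code; the function names and sample values are assumptions.

```python
import string
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Normalize a value: lowercase, drop punctuation, sort unique words."""
    no_punct = value.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(sorted(set(no_punct.split())))

def cluster(values):
    """Group values that share a fingerprint; keep groups of two or more."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical sample data
names = ["Microsoft Corp.", "microsoft corp", "Corp. Microsoft",
         "Coach Inc.", "CQG Inc."]
clusters = cluster(names)
print(clusters)
```

A strict key like this one would not, by itself, merge Microsoft with Microsoft Corp.; that takes a looser matching method, which is also where false positives start to creep in.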

There are also numerical options that offer quick and easy overviews of data distributions. This functionality can reveal anomalies that might be the result of data input errors -- such as $800,000 instead of $80,000 for a salary entry -- or it could expose inconsistencies, such as differences in the way compensation data is reported from entry to entry, with some showing, say, hourly wages and others showing weekly pay or yearly salaries.
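The same idea can be sketched in a few lines of Python: compare each value with the median and flag anything wildly larger or smaller. The threshold of five times the median and the sample figures are assumptions chosen for illustration; OpenRefine itself presents this kind of overview visually rather than with a rule like this.

```python
from statistics import median

def flag_outliers(values, factor=5.0):
    """Return values more than `factor` times above or below the median."""
    mid = median(values)
    return [v for v in values if v > mid * factor or v < mid / factor]

# Hypothetical salary column: one extra zero, one hourly wage mixed in
salaries = [78000, 80000, 82000, 800000, 79500, 38.5]
suspects = flag_outliers(salaries)
print(suspects)
```

Here the flagged entries are the $800,000 figure and the 38.5, which looks like an hourly wage sitting in a column of yearly salaries.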

Beyond data housekeeping, OpenRefine offers some useful analysis tools, such as sorting and filtering.

What's cool: Once you get used to which commands do what, this is a powerful tool for data manipulation and analysis that strikes a good balance between functionality and ease of use. The undo/redo list of every action you've taken lets you roll back when needed. You can also store command histories to run again. And text functions handle Java-syntax regular expressions, allowing you to look for patterns (such as, say, three letters followed by two digits) as well as specific text strings and numbers.
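As a rough illustration of pattern matching, the Python sketch below looks for an assumed pattern of three letters followed by two digits. It uses Python's re module rather than OpenRefine, but the constructs in this simple pattern (\b, character classes and {n} counts) are written the same way in Java-syntax regular expressions. The sample strings are made up.

```python
import re

# Assumed pattern: a standalone token of three letters then two digits
PATTERN = re.compile(r"\b[A-Za-z]{3}\d{2}\b")

# Hypothetical text values
samples = ["order ABC12 shipped", "ref XY123", "code qrs45 and TUV67"]
matches = [PATTERN.findall(s) for s in samples]
print(matches)
```

The second sample yields no match because XY123 has two letters and three digits, which shows how a pattern can separate near-miss entries from the ones you want.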

Finally, while this is a browser-based application, it works with files on your desktop, so your data remains local.

Drawbacks: If you've got a large data set, carve out some time in your day to go through all of OpenRefine's suggested changes, since it can take a while. And, depending on the data set, be prepared when looking for text items to merge: You're likely to get either a lot of false positives or missed problems -- or both.

Skill level: Advanced beginner. Knowledge of data analysis concepts is more important than technical prowess; power Excel users who understand data-cleaning needs should be comfortable with this.

Runs on: Windows, macOS (if it appears to do nothing after loading on a Mac, point a browser manually to http://127.0.0.1:3333/ ), Linux

Learn more: These three screencasts give a good overview of why and how you'd use OpenRefine; there's also fairly detailed documentation on GitHub.

Statistical analysis

Sometimes you need to combine graphical representation of your data with heftier numerical analysis.

The R Project for Statistical Computing

What it does: R started off life as a statistical analysis language with built-in support for graphics and handling certain common data formats such as spreadsheet-like rows and columns. Thousands of add-on packages later, it's also used for mapping, dashboards, interactive Web apps and more.

The R Project for Statistical Computing provides a wide range of data analysis options.

What's cool: There is a great deal of functionality in R, including quite a number of visualization options as well as numerical and spatial analysis. The R community is adding to the language all the time and is generally responsive and helpful. Disclosure: I'm a longtime fan.

Drawbacks: The fact that R runs on the command line means that users will have to take the time to learn which commands do what, and not all users will be comfortable with a text-only interface. Some still complain that the language is slow, although enthusiasts counter that this can usually be fixed with better code and enterprise-class big data tools such as Microsoft R Server.

Skill level: Intermediate to advanced. Comfort with command-line prompts and a knowledge of statistics are musts for the core application.

Runs on: Linux, macOS, Unix, Windows

Learn more: Check out the Computerworld Beginner's Guide to R and our list of 60+ resources to improve your R skills.
