As big data gathers momentum, there's a big career opportunity building as well -- for professionals with the right qualifications, that is.
According to a report published last year by McKinsey & Co., by 2018 the United States could face a shortage of 140,000 to 190,000 people with "deep analytical talent" and of 1.5 million people capable of analyzing data in ways that enable business decisions.
Companies are, and will continue to be, looking for employees with a complex set of skills to tap big data's promise of competitive advantage, market watchers say. "There's no question that the number one requirement [for] enterprises that are serious about gaining a competitive advantage using data and analytics is going to be the talent to run that program," says Jack Phillips, CEO of the International Institute for Analytics (IIA), a research firm.
But what exactly constitutes "big data talent"? What are these jobs, and what skills do they require? What kind of background qualifies a person for a big data job? Computerworld took the pulse of some prominent players in the emerging field to determine an IT worker's place -- if any -- in the big data universe. Here's our take.
Buckets of skills
"There is no monolithic 'big data profession,'" says Sandeep Sacheti, former head of business risk and analytics at UBS Wealth Management. He was recently hired for a newly created position, vice president of customer insights and operational excellence, at Wolters Kluwer Corporate Legal Services.
Sacheti's new job is all about big data: using analytics to understand customers, develop new products and cut operational costs. In one project, the Wolters division that sells electronic billing services to law firms is using analytics to mine data it's gathered from its customers (with their permission) to create new products, including the Real Rate Report, which benchmarks law firm rates around the country.
Sacheti is now both hiring from the outside and training internal staff for big data work. He thinks of big data jobs in terms of four "buckets of skillsets": data scientist, data architect, data visualizer and data change agent (see Big data skills and titles for details).
But there are no standard titles -- others use different buckets and different skills. What one company calls a data analyst, for example, might be called something different at another firm, says John Reed, senior executive director at Robert Half Technology. And, as Sacheti's title demonstrates, some big data jobs contain neither the word "big" nor the word "data."
Some companies in search of qualified people come to the IIA for help, Phillips says. First they ask where to look for candidates. "Then they stop in their tracks and say, 'Wait, how do I know what I'm looking for?'" Phillips says.
"Everybody's asking, how do you identify these people? What skills do you look for? What is their degree?" echoes Greta Roberts, CEO of Talent Analytics Corp., which makes software designed to help employers correlate employees' skills and innate characteristics to business performance.
The skills most often mentioned in connection with big data jobs, say Roberts, Phillips and others, include math, statistics, data analysis, business analytics and even natural language processing. And although not consistent, some titles, such as data scientist and data architect, are becoming more common.
Must have: Intense curiosity
As companies search for talent, they are looking more to application developers and software engineers than to IT operations staff, says Josh Wills, senior director of data science at Cloudera, which sells and supports a commercial version of the open-source Hadoop framework for managing big data.
That's not to say IT operations staff are not needed in big data. After all, they build the infrastructure and enable the big data systems. "This is where the Hadoop guys come in," says D.J. Patil, data scientist in residence at Greylock Partners, a venture capital firm.
"Without these guys, you can't do anything. They are building incredible infrastructure, but they are not necessarily doing the analysis." IT staff can quickly and easily learn Hadoop through traditional classes or by teaching themselves, he notes. Burgeoning training programs at the major Hadoop vendors testify to the fact that many IT folks are doing so.
That said, most of the jobs emerging in big data require knowledge of programming and the ability to develop applications, as well as an understanding of how to meet business needs.
The most important qualifications for these positions are not academic degrees, certifications, job experience or titles. Rather, they seem to be the soft skills: a curious mind, the ability to communicate with non-technical people, a persistent -- even stubborn -- character and a strong creative bent.
Big data skills and titles
Without conventional titles, or even standard qualifications, it's hard to know what makes someone suitable for a big data job. This listing, based on interviews of big data experts and recruiters, attempts to match up some of the most common titles with the skills required.
- Data scientists: The top dogs in big data. This role is probably closest to what the McKinsey report calls "deep analytical talent." Some companies are creating high-level management positions for data scientists. Many of these people come out of math or traditional statistics. Some have backgrounds or degrees in artificial intelligence, natural language processing or data management.
- Data architects: Programmers who are good at working with messy data, disparate types of data, undefined data and lots of ambiguity. They may be people with traditional programming or business intelligence backgrounds, and are often familiar with statistics programs. They need the creativity and persistence to be able to harness the data in new ways to create new insights.
- Data visualizers: Technologists who translate analytics into information a business can use. They harness the data and put it in context, in layman's language, exploring what the data means and how it will impact the company. They need to be able to understand and communicate with all parts of the business, including C-level executives.
- Data change agents: People who drive changes in internal operations and processes based on data analytics. They may come from a Six Sigma background, but also have the communications skills to translate jargon into terms others can understand.
- Data engineers/operators: The designers, builders and managers of the big data infrastructure. They develop the architecture that helps analyze and supply data in the way the business needs, and make sure systems are performing smoothly.
"The people who do the best are those that have an intense curiosity," says Patil, whom Forbes magazine credited, along with Cloudera founder Jeff Hammerbacher, with inventing the term data scientist. Previously Patil worked at LinkedIn -- his titles included head of data products, chief scientist and chief security officer -- helping develop that company's data science team and strategy.
Patil has a Ph.D. in applied mathematics. Sacheti has a Ph.D. in agricultural and resource economics. And yet, the qualities of curiosity and creativity matter more than the level and type of academic credential, Patil says. "These are people who fit at the intersection of multiple domains," he says. "They have to take ideas from one field and apply them to another field, and they have to be comfortable with ambiguity."
Cloudera's Wills, for example, took a circuitous path to become a data scientist. After graduating from Duke University with a bachelor's degree in math, he pursued a graduate degree in operations research at the University of Texas on and off, while working for a series of companies, dropping out to take a job at Google in 2007. (He did eventually complete that master's degree, he points out.) Wills worked at Google as a statistician and then as a software engineer before moving to Cloudera and assuming his data science title.
In short, big data folks seem to be jacks of all trades and masters of none, Wills says. "You can take someone who maybe is not the world's greatest software engineer, [nor] the world's greatest statistician -- but they have the communications skills to talk to people on both sides" as well as to the marketing team and the C-level executives. Their biggest skill is in serving as the "glue" in an organization, and most organizations have them, he says.
"These are people who cut across IT, software development, app development and analytics." Wills thinks such people are rising in prominence at companies. "I'm seeing a shift in value that companies are assigning to these people."
Sacheti, too, keeps his eye out for such people internally. "We are finding there are a lot more who are flexible in learning new skills, willing to do iterative design and agile thinking," he says.
In an attempt to home in on the career paths of big data professionals, IIA and Talent Analytics recently completed an online poll that aims to quantify not only the skills and academic degrees of current data professionals, but also their emotional and personal characteristics. Results are expected by year's end and will be available to HR professionals for a fee.
"In some cases the innate characteristics of people, like a predisposition to curiosity, can be more predictive of someone's performance in a role than them having a degree in, say, IT or IS or CS," says Talent Analytics' Roberts.
Wanted: A relentless, scientific temperament
Until recently, creativity, curiosity and communications skills were not typically emphasized in IT departments, which may be why most sources said they weren't looking to their operations IT staff to spearhead big data projects.
IIA sees data science as resting on three legs: technological (IT, systems, hardware and software), quantitative (statistics, math, modeling, algorithms) and business (domain knowledge), according to Phillips. "The professionals we see that are successful come from the quantitative side," he says. "They know enough about the technology but they aren't running the technology. They rely on IT to give them the tools."
Big data also demands a scientific temperament, according to Wills. "When we talk about data science, it's really an experiment-driven process," he explains. "You're usually trying lots of different things, and you have to be OK with failure in a pretty big way." Wills speaks of a "certain kind of relentlessness you need in the personality of someone who does this kind of work."
They also have to be intellectually flexible enough to quickly change their assumptions and approach to a problem, says Brian Hopkins, a principal analyst at Forrester Research. "You can't limit yourself to one schema but [need to be comfortable] operating in an environment with multiple schemas or even no schemas."
That tends to be a different operating model than most IT people are used to, he says. "IT people coming out of a strong enterprise IT shop are going to perhaps be constrained a little bit in their ability to do things quickly and move fast and be agile," Hopkins says.
But hiring managers, once they find the right type of person, are usually willing to retrain that person to fill a big data role. At LinkedIn, says Patil, "we largely trained ourselves, because so much of this is open source," and he thinks most companies can do the same. "You can make these people" -- if they have the right personality, he says.
As for employees, certain IT folks would love to flex a more creative muscle in their jobs, and they may be able to segue into a big data career. IT workers who are flexible, willing to learn new tools and have a bit of the artist in them can move into data architecture or even data visualization, says Sacheti.
For the certain subset of IT workers who "would relish the opportunity to show their creativity," big data carries big potential.
Frequent Computerworld contributor Tam Harbert is a Washington, D.C.-based writer specializing in technology, business and public policy.