Forget Big Data. First You’ve Got To Deal With Small Data

This content is part of a program managed by the Economist Intelligence Unit, sponsored by Accenture and Pegasystems. “Digital Evolution: Adapting Business for a Digital World” aims to shed light on the world of digital business: the strategic imperatives, challenges, complexities and pitfalls.

We all know that the “digital bread crumbs” consumers leave behind in social media and on the web offer valuable insights. When triangulated with other information, like location tracking or public records, businesses can discover previously unforeseen opportunities to grow revenues or pivot to capitalize on fertile areas of growth. This is the promise of big data: Lots of information mixed with technology and analysis will produce superpower-like abilities that can alter the very way we do business.

Before you can think about big data, you’ve got to be sure you’ve tackled small data, says Jim Griffeth, Thought Leader at Pegasystems. The emphasis on big data is premature for most large companies, he says, because the reality on the ground is that most companies are still struggling to make full use of their “small” data. Griffeth offers his insights on the persistent problem of small data, the untying of Gordian knots and what IT departments can do about their legacy systems mash-up problem.


Q: A daily deluge of headlines shouts about how businesses can harness the promise of big data, but there’s just as much discussion about what’s standing in the way. What do you see as the biggest obstacle?

A: There’s an awful lot of talk about big data and it certainly is a gold mine of opportunity—but it’s not the most important problem that most companies have with data at the moment. The real problem that the largest businesses are still struggling with is how to make use of their small data—the proprietary data that they collect in the normal course of doing business. So, the first order of business is to get their small data house in order. Until they do so, they are likely to be awash in unusable data.

Q: How did this problem come about?

A: Most large companies have grown in part by a series of acquisitions over the last several decades. Each acquired company comes with its own IT infrastructure, including legacy systems of record and associated data stores amassed over the years. The other problem is that there’s this phenomenon where different parts of an organization make IT purchase decisions that satisfy their local needs, but aren’t necessarily made with the larger corporation in mind. So you end up with these organizational systems that work fine for their purposes, but don’t really talk to other systems or share data with other systems. You end up with a local optimization effect.

Q: How unruly is the problem?

A: Most companies can’t even tell you how many legacy systems they have in operation. But one of Pega’s customers actually did a study and found that the company had 651 unique applications deployed a total of 901 different times. The proliferation problem is really widespread. We have another customer in the insurance sector that had a different insurance processing system in each of the 100 countries in which it does business. As you can imagine, that kind of proliferation of data, systems and processes is causing huge headaches for these companies as they try to reach some kind of centralized approach to maintaining quality data. The result is that you end up with what I call “data chaos.” Their small data is a hodgepodge of different data models and data stores, semantics and data management processes all tied up in a nice Gordian knot.

Q: How do they untie that knot?

A: The big challenge with small data is overcoming the fact that the data is both created and stored all over your organization and it is organized in different ways in each place. Because the data has been created over many years, the focus of Master Data Management (MDM) efforts has been on cleaning up this mess instead of on fixing the problem at its source—the point at which master data is created in the system. This data creation happens in your business processes, which determine which data is collected, how it’s organized and where it is stored. Ensuring that data is accurate at the point of entry is, therefore, the first step to untying this knot.

What we see customers moving to now is an approach where they are building a new agile, flexible layer on top of their systems of record, including their MDM systems. This new layer has been referred to by Gartner as a “System of Innovation,” and it has some important capabilities that the legacy systems of record lack.

The first is Data and Process Governance in one system. This requires a complete and robust business process management (BPM) capability. Pega’s industry-leading BPM layer enables large customers to put in place governance processes that control how small data is handled throughout its life cycle—not just at the point of creation. With this approach, business stakeholders become active partners in the definition and development of data governance processes, so they can finally gain some measure of control over the data they need to run their business.

The second capability that distinguishes Pega as an Innovation Layer is the ability to tap into the data stored in siloed legacy systems. This means reaching into the siloed systems themselves, as well as any data stores and data warehouses that exist, both to read data and to write it back. So-called ETL (Extract, Transform, Load) systems have long performed this function of moving data between legacy data stores, but without the level of BPM capabilities demanded by enterprise-wide combined process and data governance.

The third capability that you need in your Innovation Layer is the ability to handle the kinds of complexity that you see at these large enterprises. Complexity comes from the myriad legitimate variations in the way you operate your business, like local language, currency, customs, and maintaining compliance with local laws and regulations. This complexity grows exponentially with the size of the business; accommodating these numbers and types of variations in one coherent application is what we refer to as managing “variation at scale.” The key here is to handle these variations in your business applications and your data management processes in a scalable, unified way. It’s much easier to develop, deploy and maintain your Innovation Layer when all your best practices can be quickly captured and easily changed. This way you avoid relying on the tribal knowledge of the three guys in the organization who know the secrets. Building all of this in a unified system of governance makes it possible for businesses to untie that Gordian knot.
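As a rough illustration of two ideas from this answer, checking data at the point of entry and handling local variation through one shared set of rules, here is a minimal Python sketch. It is not Pega's implementation or API; the rule names, country codes and field names (`BASE_RULES`, `COUNTRY_OVERRIDES`, `validate_customer`) are hypothetical and chosen only for the example.

```python
# Hypothetical rules: shared defaults plus per-country overrides.
# All names and values here are illustrative assumptions, not from any real system.
BASE_RULES = {"currency": "USD", "postal_code_length": 5}
COUNTRY_OVERRIDES = {
    "DE": {"currency": "EUR"},
    "GB": {"currency": "GBP", "postal_code_length": None},  # None: length not checked
}


def rules_for(country):
    """Layer country-specific overrides on top of the shared base rules."""
    return {**BASE_RULES, **COUNTRY_OVERRIDES.get(country, {})}


def validate_customer(record):
    """Return a list of problems; an empty list means the record may be stored."""
    problems = []
    rules = rules_for(record.get("country", ""))

    if not record.get("name", "").strip():
        problems.append("name is required")

    if record.get("currency") != rules["currency"]:
        problems.append(f"currency must be {rules['currency']}")

    expected_len = rules["postal_code_length"]
    if expected_len is not None and len(record.get("postal_code", "")) != expected_len:
        problems.append(f"postal code must have {expected_len} characters")

    return problems
```

The point of the design is that every country goes through the same `validate_customer` function; only the small override table differs, so a rule change is made in one place rather than in each local system.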

Q: Can you give an example of a company that tamed its small data?

A: One of our banking clients wanted to improve and personalize its loyalty program that targeted its most profitable customers—the bank recognized that it had a gold mine in its existing small data if it could just access it and take appropriate action. As a result, the bank worked with Pega to create and manage a reporting system within its marketing domain. In this case, the company had about 14 million contacts, so it’s a big scale. The company was also looking to expand the scope of these programs. Its analytics people were trying to understand enough about these customers to make them the right kinds of offers so the bank could improve its loyalty programs. The bottom line is once the bank could clearly see what the data was saying and make offers that were based on this deeper understanding of each individual customer, the offer acceptance rates skyrocketed. Rates went from single digits to over 25 percent because the offers were personalized and targeted.

Q: So this is a bigger problem than the new streams of big data that IT departments are dealing with?

A: The valuable needles that are hidden in those big data haystacks are the hot new thing. It is a tidal wave of data, but the actual priority for these large organizations lies in getting small data right and exploiting that as much as possible. Small data is well-structured data—its semantics are well understood—so it is more easily leveraged to drive better decision-making. For example, the customer data domain in particular is an opportunity for companies to grow their business by getting control of their small data. They can use it to drive sophisticated predictive and adaptive analytics engines, which in turn help them to make much better offers to each customer. Unless companies get their small data house in order, when the waves of big data hit, instead of being in a position to exploit them, the waves kind of wash over them and they are left still struggling with their small data problem.

Q: So why is everyone talking about big data and not small data?

A: I think that big data is the sexy story, but the small data problem—that’s the one that’s real. Solve that first, and you build the foundation to start dealing with big data.

