If bad data is put into a data warehouse, companies risk what Tony Fisher, president and general manager at DataFlux, described as “code, load and explode.”
“If the data from a source system doesn’t meet the expected qualities for that data, the loading process may fail, causing the company to stop the load process and try again,” Fisher said. He added that such failures lead to implementations that run over schedule and over budget.
DataFlux, based in Cary, N.C., provides data profiling, data quality and data integration technology to help companies inspect, correct and merge data prior to loading it into a data warehouse. As a result, users can locate and resolve errors before they spread, improving the overall quality of data across a firm’s data warehouse system.
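As a rough illustration of the inspect-before-load idea described above, the following Python sketch profiles a batch of source records and sets aside the ones that would pollute the warehouse. It is not DataFlux's technology; the field names, the `profile` function and the two checks (missing required fields, duplicate keys) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical required fields for a customer record (an assumption, not from the article).
REQUIRED_FIELDS = ("customer_id", "name", "postal_code")


@dataclass
class ProfileReport:
    """Summary of what the pre-load inspection found."""
    total: int = 0
    rejected: list = field(default_factory=list)  # (row, reason) pairs

    @property
    def error_rate(self) -> float:
        return len(self.rejected) / self.total if self.total else 0.0


def profile(rows):
    """Split rows into loadable records and rejects, with a reason for each reject."""
    report = ProfileReport()
    seen_ids = set()
    clean = []
    for row in rows:
        report.total += 1
        # A field counts as missing if it is absent, None, or blank.
        missing = [f for f in REQUIRED_FIELDS if not str(row.get(f) or "").strip()]
        if missing:
            report.rejected.append((row, "missing: " + ", ".join(missing)))
        elif row["customer_id"] in seen_ids:
            report.rejected.append((row, "duplicate customer_id"))
        else:
            seen_ids.add(row["customer_id"])
            clean.append(row)
    return clean, report


if __name__ == "__main__":
    batch = [
        {"customer_id": "C1", "name": "Acme", "postal_code": "27513"},
        {"customer_id": "C1", "name": "Acme Corp", "postal_code": "27513"},
        {"customer_id": "C2", "name": "", "postal_code": "01801"},
    ]
    clean, report = profile(batch)
    print(f"{len(clean)} of {report.total} rows are loadable")
    for row, reason in report.rejected:
        print("rejected:", row.get("customer_id"), "-", reason)
```

A load job could compare `report.error_rate` against an agreed threshold and stop before writing anything, rather than failing partway through the load.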
Corrupting the System
“Each year, a large number of data warehousing initiatives fail because of erroneous or incomplete data,” Fisher said. “Inaccurate or incomplete data can lead to faulty decisions once the data warehouse is polluted with bad data.”
This bad data costs companies billions. The Data Warehousing Institute, which provides research and business intelligence for the data warehousing industry, estimated that data quality problems cost U.S. businesses over US$600 billion a year.
Rene Marcotte, a senior architect at Collaborative Consulting in Woburn, Mass., told CRM Buyer that using data to forecast and improve operations requires a great deal of quality data. He said potential problems are often multiplied. “There is often little visibility into the long-term effects of failing to enter data or entering in the wrong data,” Marcotte said.
Despite the potential problems of bad data, Marcotte said data warehousing would be a “required component for all companies in the future.” He said companies should consider implementing data warehousing to consolidate business reporting. Doing so would improve their ability to forecast revenue and would also help them meet regulators’ demands for better and faster information.
Marcotte said large companies often have more than 400 applications in their information technology portfolio. Up to 200 of the applications read or update product data. “Consolidation of this type of information into a master product involves a great deal of data and organizational challenges,” Marcotte said. “This often involves changes to processes, applications and incentive programs. These changes impact various people throughout an organization and the changes are not always well received.”
Focus on Quality
Many companies are spending millions of dollars on data warehousing without receiving an optimal return on investment, said Scott Barnes, director of the data services practice at Collaborative Consulting. “Business professionals need to understand that data quality is not something IT can fix,” Barnes said.
“IT professionals can help identify problems and suggest new ways to eliminate data quality issues. However, the business must be willing to own data, and change its processes to ensure data accuracy.” Barnes said a company must agree on a common definition of customer, for example, and implement a process to establish a master list.
But cost and what Mike Blumberg, lead technical consultant at Collaborative Consulting, called “instant ROI gratification” are the two big hurdles facing companies that want to implement a data warehouse. Blumberg said the cost of implementation is related to the volume of a company’s data, business complexity, hardware and software requirements, as well as resources needed to fulfill those requirements.
“At first, these costs may seem prohibitive, but generally, those concerns are allayed when ROI realization begins,” Blumberg told CRM Buyer. “However, realization of ROI may take time, and is not always monetary, at least not directly.” He said long-term direct-monetary benefits might include the ability to modify company spending. Indirect-monetary benefits may include employees performing job functions more efficiently.
Gartner, a provider of research and analysis on the IT industry, estimates that 65 percent or more of IT departments that attempt to build a data warehouse will not properly coordinate their data warehousing and business intelligence in the next year.
Identifying the Problem
“It’s becoming increasingly more difficult for IT administrators to identify the bottleneck causing degraded business application performance,” said Tom Mulvehill, product line manager at Symantec, headquartered in Cupertino, Calif. Symantec helps customers manage the performance of data warehouse applications.
“Understanding how to best manage a data warehouse system can be key to maximizing performance, while simultaneously lowering IT costs. In addition, it is crucial for customers to understand how best to utilize the system to minimize inefficient process requests and not add more hardware to solve the problem,” Mulvehill said.
Mark Robinson, business intelligence practice manager at Greenbrier & Russel, a consultancy in Schaumburg, Ill., said that ensuring high data quality can be difficult in certain situations. For example, customer information that has rarely been updated requires remediation or enrichment. In addition, information whose meaning has changed over time makes historical comparisons difficult.
“Many companies have multiple sources for a common business dimension, like the item master list in manufacturing companies. All of these situations must be addressed in the data warehousing effort,” Robinson said. He added that data integration involves transforming information from the current systems into different data models.
“These efforts require strong business sponsorship to be successful. The business executive sponsoring this direction needs to understand the tactical and strategic benefits of business intelligence. It is worthy of an enterprise investment, but the big picture must be understood so that unreasonable expectations of BI can be avoided,” Robinson said.
Maintaining Success
But even if all goes well during the implementation of a data warehousing system, companies might still face problems, said Jim Lee, vice president of product management at Princeton Softech, a data management software provider based in Princeton, N.J.
“One of the challenges is being too successful. The more useful the solution, the more users will be added. And the more users are added, the more data will be integrated into the warehouse. And to make the warehouse more useful, more data must be added,” Lee said. He said data warehouse projects can quickly expand beyond the project’s original scope, turning a successful project into a failed one.
“One of the keys to success is to build a high-performance information repository that can meet the needs of many users and many queries. This results in the challenge to building a successful warehouse, with a lot of data, which is ever expanding and still maintaining the performance of the warehouse to make it useful,” Lee said.