Big Data

What Does Big Data Mean?

Big Data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. And big data may be as important to business – and society – as the Internet has become. Why? More data may lead to more accurate analyses. More accurate analyses may lead to more confident decision-making. And better decisions can mean greater operational efficiency, cost reductions, and reduced risk. [1]

Data volumes are growing and the pace of that growth is accelerating. Sensor data, log files, social media and other sources have emerged, bringing a volume, velocity, and variety of data that far outstrips traditional data warehousing approaches. Forward-looking organizations are harnessing these new sources in creative ways to achieve unprecedented value and competitive advantage. It’s not as simple as putting all of this data in one place. The real business value of these “big data” sources is always unlocked through specific use cases and applications. Those applications can vary widely across departments and industries. While there are interesting technical challenges associated with integrating and managing all of this data, organizations should first take the time to identify and crystallize the right use case or use cases for their own business needs. This is a critical first step to understanding the key business insights they stand to gain and the improved results they can achieve with those insights.

Big data analytics is the use of advanced analytic techniques against very large, diverse data sets that include different types such as structured/unstructured and streaming/batch, and different sizes from terabytes to zettabytes. Big data is a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process the data with low latency. And it has one or more of the following characteristics – high volume, high velocity, or high variety. Big data comes from sensors, devices, video/audio, networks, log files, transactional applications, web, and social media – much of it is generated in real time and on a very large scale. Analyzing big data allows analysts, researchers, and business users to make better and faster decisions using data that was previously inaccessible or unusable. Using advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics, and natural language processing, businesses can analyze previously untapped data sources independently or together with their existing enterprise data to gain new insights resulting in significantly better and faster decisions.[2]

Defining Big Data[3]

Big data typically refers to the following types of data:

  • Traditional enterprise data – includes customer information from CRM systems, transactional ERP data, web store transactions, and general ledger data.
  • Machine-generated/sensor data – includes Call Detail Records (“CDR”), weblogs, smart meters, manufacturing sensors, equipment logs (often referred to as digital exhaust), and trading systems data.
  • Social data – includes customer feedback streams, micro-blogging sites like Twitter, and social media platforms like Facebook.

Uses for Big Data[4]

IBM has conducted surveys, studied analysts’ findings, talked with more than 300 customers and prospects, and implemented hundreds of big data solutions. As a result, it has identified the top five high-value use cases, which could form the first steps into big data, as follows:

  1. Big data exploration: find, visualize and understand big data to improve decision making
  2. 360-degree view of the customer: enhance the existing customer view by incorporating internal and external information sources
  3. Security/intelligence extension: reduce risk, detect fraud and monitor security in real time
  4. Operations analysis: analyze a variety of machine data for better business results and operational efficiency
  5. Data warehouse augmentation: integrate big and traditional data warehouse capabilities to gain new business insights while optimizing the existing warehouse infrastructure

The Importance of Big Data[5]

Big data analytics helps organizations harness their data and use it to identify new opportunities. That, in turn, leads to smarter business moves, more efficient operations, higher profits, and happier customers. In his report, Big Data in Big Companies, IIA Director of Research Tom Davenport interviewed more than 50 businesses to understand how they used big data. He found they got value in the following ways:

  • Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant cost advantages when it comes to storing large amounts of data – plus they can identify more efficient ways of doing business.
  • Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined with the ability to analyze new sources of data, businesses are able to analyze information immediately – and make decisions based on what they’ve learned.
  • New products and services. With the ability to gauge customer needs and satisfaction through analytics comes the power to give customers what they want. Davenport points out that with big data analytics, more companies are creating new products to meet customers’ needs.

Figure: Big Data Analytics (source: SAS)

Practical Big Data Benefits[6]

  • Dialogue with consumers: Today’s consumers are a tough nut to crack. They look around a lot before they buy, talk to their entire social network about their purchases, demand to be treated as unique, and want to be sincerely thanked for buying your products. Big Data allows you to profile these increasingly vocal and fickle little ‘tyrants’ in a far-reaching manner so that you can engage in an almost one-on-one, real-time conversation with them. This is not actually a luxury. If you don’t treat them the way they want to be treated, they will leave you in the blink of an eye. Just a small example: when a customer enters a bank, Big Data tools allow the clerk to check the customer’s profile in real time and learn which relevant products or services to recommend. Big Data will also have a key role to play in uniting the digital and physical shopping spheres: a retailer could suggest an offer on a mobile carrier, on the basis of a consumer indicating a certain need in social media.
  • Re-develop your products: Big Data can also help you understand how others perceive your products so that you can adapt them, or your marketing if need be. Analysis of unstructured social media text allows you to uncover the sentiments of your customers and even segment those in different geographical locations or among different demographic groups. On top of that, Big Data lets you test thousands of different variations of computer-aided designs in the blink of an eye so that you can check how minor changes in, for instance, material affect costs, lead times, and performance. You can then raise the efficiency of the production process accordingly.
  • Perform risk analysis: Success depends not only on how you run your company. Social and economic factors are crucial for your accomplishments as well. Predictive analytics, fueled by Big Data, allows you to scan and analyze newspaper reports or social media feeds so that you stay up to speed on the latest developments in your industry and its environment. Detailed health tests on your suppliers and customers are another goodie that comes with Big Data. This will allow you to take action when one of them is at risk of defaulting.
  • Keeping your data safe: You can map the entire data landscape across your company with Big Data tools, thus allowing you to analyze the threats that you face internally. You will be able to detect potentially sensitive information that is not protected in an appropriate manner and make sure it is stored according to regulatory requirements. With real-time Big Data analytics, you can, for example, flag up any situation where 16-digit numbers – potentially credit card data – are stored or emailed out and investigate accordingly.
  • Create new revenue streams: The insights that you gain from analyzing your market and its consumers with Big Data are not just valuable to you. You could sell them as non-personalized trend data to large industry players operating in the same segment as you and create a whole new revenue stream. One of the more impressive examples comes from Shazam, the song identification application. It helps record labels find out where music sub-cultures are arising by monitoring the use of its service, including the location data that mobile devices so conveniently provide. The record labels can then find and sign up promising new artists or remarket their existing ones accordingly.
  • Customize your website in real-time: Big Data analytics allows you to personalize the content or look and feel of your website in real time to suit each consumer entering your website, depending on, for instance, their sex, nationality, or the site from which they arrived. The best-known example is probably offering tailored recommendations: Amazon’s use of real-time, item-based, collaborative filtering (IBCF) to fuel its ‘Frequently bought together’ and ‘Customers who bought this item also bought’ features, or LinkedIn suggesting ‘People you may know’ or ‘Companies you may want to follow’. And the approach works: Amazon generates about 20% more revenue via this method.
  • Reducing maintenance costs: Traditionally, factories estimate that a certain type of equipment is likely to wear out after so many years. Consequently, they replace every piece of that technology within that many years, even devices that have much more useful life left in them. Big Data tools do away with such impractical and costly averages. The massive amounts of data that they access and use and their unequaled speed can spot failing grid devices and predict when they will give out. The result: a much more cost-effective replacement strategy for the utility and less downtime, as faulty devices are tracked a lot faster.
  • Offering tailored healthcare: We are living in a hyper-personalized world, but healthcare seems to be one of the last sectors still using generalized approaches. When someone is diagnosed with cancer they usually undergo one therapy, and if that doesn’t work, the doctors try another, etc. But what if a cancer patient could receive medication that is tailored to their individual genes? This would result in a better outcome, less cost, less frustration, and less fear. With human genome mapping and Big Data tools, it will soon be commonplace for everyone to have their genes mapped as part of their medical record. This brings medicine closer than ever to finding the genetic determinants that cause disease and developing drugs expressly tailored to treat those causes: in other words, personalized medicine.
  • Offering enterprise-wide insights: Previously, if business users needed to analyze large amounts of varied data, they had to ask their IT colleagues for help as they themselves lacked the technical skills for doing so. Often, by the time they received the requested information, it was no longer useful or even correct. With Big Data tools, the technical teams can do the groundwork and then build repeatability into algorithms for faster searches. In other words, they can develop systems and install interactive and dynamic visualization tools that allow business users to analyze, view, and benefit from the data.
  • Making our cities smarter: To help them deal with the consequences of their fast expansion, an increasing number of smart cities are indeed leveraging Big Data tools for the benefit of their citizens and the environment. The city of Oslo in Norway, for instance, reduced street lighting energy consumption by 62% with a smart solution. Since the Memphis Police Department started using predictive software in 2006, it has been able to reduce serious crime by 30%. The city of Portland, Oregon, used technology to optimize the timing of its traffic signals and was able to eliminate more than 157,000 metric tonnes of CO2 emissions in just six years.
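
The “Keeping your data safe” example above, flagging 16-digit numbers that may be credit card data, can be illustrated with a small sketch. This is not any particular product’s method: the function names are hypothetical, and the Luhn checksum step is an added assumption (the text only mentions 16-digit numbers) used here to filter out digit runs that cannot be valid card numbers.

```python
import re
from typing import List

# 16 digits, optionally grouped in fours by a single space or hyphen,
# and not embedded in a longer run of digits.
CARD_LIKE = re.compile(r"(?<!\d)(?:\d{4}[ -]?){3}\d{4}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def flag_card_like_numbers(text: str) -> List[str]:
    """Return the 16-digit sequences in text that pass the Luhn check."""
    hits = []
    for match in CARD_LIKE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits
```

Calling flag_card_like_numbers on the body of an outgoing email or a stored document returns the candidate numbers to investigate. A production scanner would likely also need to handle other card lengths and formats, which this sketch ignores.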

Big Data Challenges[7]

One of the reasons big data is so underutilized is that big data and big data technologies also present many challenges. One survey found that 55% of big data projects are never completed. This finding was repeated in a second survey, which found the majority of on-premises big data projects aren’t successful.

  • Scalability: With big data, it’s crucial to be able to scale up and down on demand. Many organizations fail to take into account how quickly a big data project can grow and evolve. Constantly pausing a project to add additional resources cuts into time for data analysis. Big data workloads also tend to be bursty, making it difficult to predict where resources should be allocated. The extent of this big data challenge varies by solution. A solution in the cloud will scale much more easily and quickly than an on-premises solution.
  • Lack of Talent: Businesses are feeling the data talent shortage. Not only is there a shortage of data scientists, but successfully implementing a big data project requires a sophisticated team of developers, data scientists, and analysts who also have a sufficient amount of domain knowledge to identify valuable insights. Many big data vendors seek to overcome this big data challenge by providing their own educational resources or by providing the bulk of the management.
  • Hadoop is Hard: While Hadoop and the surrounding ecosystem of tools are lauded for their ability to handle massive volumes of structured and unstructured data, the software isn’t easy to manage or use. Since the technology is relatively new, many data professionals aren’t familiar with how to manage Hadoop. Add to that the fact that Hadoop frequently requires extensive internal resources to maintain, and many companies are left devoting most of their resources to the technology rather than to the actual big data problem they are trying to solve. In the survey mentioned above, 73% of respondents said that understanding the big data platform was the most significant challenge of a big data project.
  • Actionable Insights: Having more data doesn’t necessarily lead to actionable insights. A key challenge for data science teams is to identify a clear business objective and the appropriate data sources to collect and analyze to meet that objective. The challenge doesn’t stop there, however. Once key patterns have been identified, businesses must be prepared to act and make necessary changes in order to derive business value from them.
  • Data quality: Data quality is not a new concern, but the ability to store every piece of data a business produces in its original form compounds the problem. Dirty data costs companies in the United States $600 billion every year. Common causes of dirty data that must be addressed include user input errors, duplicate data, and incorrect data linking. In addition to meticulous maintenance and cleaning of data, big data algorithms can also be used to help clean data.
  • Security: Keeping that vast lake of data secure is another big data challenge. Specific challenges include:
    • User authentication for every team and team member accessing the data.
    • Restricting access based on a user’s need.
    • Recording data access histories and meeting other compliance regulations.
    • Proper use of encryption on data in transit and at rest.
  • Cost Management: It’s difficult to project the cost of a big data project, and given how quickly such projects scale, they can quickly eat up resources. The challenge lies in taking into account all costs of the project, from acquiring new hardware to paying a cloud provider to hiring additional personnel. Businesses pursuing on-premises projects must remember the cost of training, maintenance, and expansion. Businesses running big data projects in the cloud must carefully evaluate the service-level agreement with the provider to determine how users will be billed and whether there will be any additional fees.
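
As a small illustration of the data quality point above, duplicate records caused by inconsistent user input can often be collapsed by normalizing fields before comparing them. The sketch below is only one possible approach under stated assumptions: the record layout (dictionaries with hypothetical "name" and "email" fields) and the function names are invented for the example, not taken from any specific tool.

```python
from typing import Dict, List, Tuple


def normalize(record: Dict[str, str]) -> Tuple[str, str]:
    """Build a comparison key that ignores case and stray whitespace."""
    name = " ".join(record.get("name", "").lower().split())
    email = record.get("email", "").strip().lower()
    return (name, email)


def deduplicate(records: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first record seen for each normalized key."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

Exact matching on normalized keys only catches simple input errors. Misspellings or incorrectly linked records would need fuzzier matching, which is outside the scope of this sketch.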

See Also


Further Reading