What Is Big Data?
Suppose I put some rice on the table and ask you to count the grains in as little time as possible. As you near the end, I give you more rice to count, and I keep repeating this. Once you are getting used to counting rice grains, I throw in some mustard seeds, which are harder to separate and count. That, I would say, is Big Data: it represents large volume, velocity and variety of data.
It comes from people, computers, phones, devices and all the other things we know.
Big data is a term that describes the large volume of data arriving on a daily basis.
Big data is exactly what it sounds like: a collection of data that’s so big it’s tough to process. The biggest problems with this much information are storing it, sharing it with other people, and figuring out just what the heck those numbers mean. Usually there are technology tools that can do all this work on data easily, but too much data is too big a job for them.
A few facts show how big big data is:
1. Every second, on average, around 6,000 tweets are sent on Twitter, which corresponds to over 350,000 tweets per minute, 500 million tweets per day and around 200 billion tweets per year.
2. Facebook generates even more data: more than 30 billion pieces of content are shared every month.
3. Every minute, we send 204 million emails, generate 1.8 million Facebook likes, send 278 thousand tweets and upload 200,000 photos to Facebook.
4. Google processes over 40,000 search queries per second on average, which adds up to 3.5 billion in a single day.
5. Around 100 hours of video are uploaded to YouTube every minute.
Now, that’s a huge amount of (unstructured) data. It’s going to take hours to analyse it, generate reports and obtain what you want out of it.
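To get a feel for these numbers, here is a small sketch that scales the per-second rates quoted above (6,000 tweets and 40,000 searches) up to minutes, days and years. The function name `scale_rate` is made up for this illustration, and a 365-day year is assumed.

```python
# Scale the per-second rates quoted in the list above to longer periods.
SECONDS_PER_MINUTE = 60
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400 seconds
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # assumption: a 365-day year


def scale_rate(per_second):
    """Return per-minute, per-day and per-year totals for a per-second rate."""
    return {
        "per_minute": per_second * SECONDS_PER_MINUTE,
        "per_day": per_second * SECONDS_PER_DAY,
        "per_year": per_second * SECONDS_PER_YEAR,
    }


tweets = scale_rate(6_000)      # tweets per second, as quoted above
searches = scale_rate(40_000)   # Google searches per second, as quoted above

print(f"Tweets per minute:  {tweets['per_minute']:,}")   # 360,000
print(f"Tweets per day:     {tweets['per_day']:,}")      # 518,400,000
print(f"Tweets per year:    {tweets['per_year']:,}")     # 189,216,000,000
print(f"Searches per day:   {searches['per_day']:,}")    # 3,456,000,000
```

The results line up with the rounded figures in the list: a little over 350,000 tweets per minute, roughly 500 million per day, close to 200 billion per year, and about 3.5 billion searches per day.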
Part of this data may be structured and part unstructured. Not all of this abundant data is important to us, but big data can be analyzed for insights that lead to better decisions and strategic business moves.
Big data is defined by three properties, known as the 3 Vs:
1. Volume
– Organizations collect data from a variety of sources, such as business transactions, social media and sensors.
2. Velocity
– Data is coming at unprecedented speed and must be dealt with in a timely manner.
3. Variety
– Data comes in all types of formats: structured and unstructured, numeric data, text documents, email, video, audio, stock data and financial transactions.
The amount of data that’s being created and stored on a global level is growing steadily, almost doubling each year. That means the potential to gain powerful insights from data is also large. Big data is being used across domains such as retail, finance, banking, education, the stock market, manufacturing and healthcare. For the next couple of decades, it is going to dominate business all over the globe.
Over 90% of the data in the world was created in the past two years. It is estimated that Walmart collects more than 2.5 petabytes of data every hour from its customer transactions. A petabyte is one quadrillion bytes, or the equivalent of about 20 million filing cabinets’ worth of text.
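As a rough sketch of what the Walmart estimate means in plain numbers, the snippet below converts 2.5 petabytes per hour into bytes, into petabytes per day, and into the "filing cabinets" comparison used above. It treats 2.5 petabytes as a lower bound (the estimate says "more than"), and the variable names are made up for this illustration.

```python
# Unit conversions for the Walmart estimate quoted above.
BYTES_PER_PETABYTE = 10**15          # one quadrillion bytes
CABINETS_PER_PETABYTE = 20_000_000   # "about 20 million filing cabinets" per petabyte

petabytes_per_hour = 2.5             # lower bound from the estimate above

bytes_per_hour = petabytes_per_hour * BYTES_PER_PETABYTE
petabytes_per_day = petabytes_per_hour * 24
cabinets_per_hour = petabytes_per_hour * CABINETS_PER_PETABYTE

print(f"Bytes per hour:           {bytes_per_hour:,.0f}")
print(f"Petabytes per day:        {petabytes_per_day:,.0f}")
print(f"Filing cabinets per hour: {cabinets_per_hour:,.0f}")
```

In other words, at that rate a single day adds up to at least 60 petabytes, and every hour is worth about 50 million filing cabinets of text.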
Why We Use Big Data
“What do we want with all this information, anyway?”
– It’s a great way to spot trends, such as figuring out how many people in your town prefer chocolate ice cream over vanilla. This information can be very useful to an ice cream company that can’t decide whether to advertise its chocolate ice cream cake or its vanilla one. After all, why spend $100 on a vanilla ice cream ad and just $50 on the chocolate one when the data says most people prefer chocolate?
– Big data can be used to predict crimes before they happen.
– By better integrating big data analytics into healthcare, the industry could save up to $300 billion each year.
– Retailers could increase their margins by 60% if they use big data properly.
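The ice cream example above can be sketched in a few lines of code. The survey responses below are invented purely for illustration, and `split_budget` is a made-up helper name. The idea is simply to count the preferences and divide the ad budget in the same proportion.

```python
from collections import Counter

# Hypothetical survey responses, invented for illustration only.
responses = ["chocolate", "vanilla", "chocolate",
             "chocolate", "vanilla", "chocolate"]


def split_budget(responses, total_budget):
    """Divide an ad budget across flavors in proportion to how often each was chosen."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {flavor: total_budget * n / total for flavor, n in counts.items()}


favorite = Counter(responses).most_common(1)[0][0]
budget = split_budget(responses, 150)

print("Most popular flavor:", favorite)
for flavor, amount in budget.items():
    print(f"Spend ${amount:.0f} on {flavor} ads")
```

With this made-up data, chocolate gets $100 and vanilla gets $50, the reverse of the wasteful split described above. A real analysis would of course use far more responses and a more careful budget model.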
Data Tells Us All Kinds of Things
Data can tell us what types of people are doing what, where the most puppy adoptions take place in each state, and what types of clothes, food or toys people prefer. This is really important for businesses, which can use that information to make more money. Otherwise, they might be trying to sell a tricycle for little kids to a bunch of middle schoolers who want dirt or mountain bikes.
A few real-world problems solved by data scientists:
1. Identify nerve structures in ultrasound images of the neck.
2. Predict which type of hotel a customer will book, so that the company can recommend personalized hotels to the user.
3. Optimize flight routes based on current weather and traffic.
4. Predict which customers are happy and which coupons they will use.
5. Forecast sales using store, promotion, and competitor data.
Future of Big Data:
> The big data industry is expected to grow from US $10.2 billion in 2013 to about US $54.3 billion by 2017, and the boom of the IoT means many more sensors, more smartphones and hence more data.
> According to the 2014 IDG Enterprise Big Data Research study, businesses will spend an average of $8 million on Big Data-related initiatives in 2014.
> According to Gartner, Big Data will drive $232 billion in spending through 2016.
> There will be a shortage of talent necessary for organizations to take advantage of Big Data. By 2018, the United States alone could face a shortage of 140,000 to 190,000 skilled workers with deep analytical skills as well as 1.5 million managers and analysts with the know-how to use Big Data analytics to make effective decisions.
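To put the market forecast at the start of this section into perspective, here is a small sketch that works out the yearly growth rate implied by going from US $10.2 billion in 2013 to US $54.3 billion in 2017. It assumes steady compound growth over those four years, which the forecast itself does not state, and the variable names are made up for this illustration.

```python
# Implied compound annual growth rate (CAGR) for the market forecast above.
start_value = 10.2      # US $ billion in 2013, as quoted above
end_value = 54.3        # US $ billion by 2017, as quoted above
years = 2017 - 2013     # four years of growth

# Assumption: the market grows by the same percentage every year.
cagr = (end_value / start_value) ** (1 / years) - 1

print(f"Overall growth: {end_value / start_value:.1f}x")
print(f"Implied yearly growth: {cagr:.1%}")
```

Under that assumption the market would have to grow by roughly half again every year, which shows how steep the forecast is.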
“The goal is to turn data into information, and information into insight.” – Carly Fiorina