Little Data, Big Data and Very Big Data (VBD) or Big BS?
This is an industry trends and perspective piece about big data and little data, industry adoption and customer deployment.
If you are in any way associated with information technology (IT), business, scientific, media and entertainment computing or related areas, you may have heard big data mentioned. Big data has been a popular buzzword bingo topic and term for a couple of years now. Big data is being used to describe new and emerging, along with existing, types of applications and information processing tools and techniques.
I routinely hear from different people or groups trying to define what is or is not big data, and all too often those definitions are based on a particular product, technology, service or application focus. Thus it should be no surprise that those trying to police what is or is not big data will often do so based on what their interest, sphere of influence, knowledge or experience and jobs depend on.
Not long ago while out travelling I ran into a person who told me that big data is new data that did not exist just a few years ago. This person was involved in geology, so I was surprised that somebody in that field was not aware of or working with geophysical, mapping, seismic and other legacy or traditional big data. It turns out he was basing his statements on what he knew, heard or was told about, or on the sphere of influence around a particular technology, tool or approach.
FWIW, if you have not figured it out already, as with cloud, virtualization and other technology enabling tools and techniques, I tend to take a pragmatic approach vs. becoming latched onto a particular bandwagon (for or against) per se.
Not surprisingly there is confusion and debate about what is or is not big data, including whether it applies only to new vs. existing and old data. As with any new technology, technique or buzzword bingo topic theme, various parties will try to place what is or is not under the definition to align with their needs, goals and preferences. This is the case with big data, where you can routinely find proponents of Hadoop and MapReduce positioning big data as aligning with the capabilities and usage scenarios of those related technologies for business and other forms of analytics.
Not surprisingly the granddaddy of all business analytics, data science and statistical analysis number crunching is the Statistical Analysis System (SAS) from the SAS Institute. If these types of technology solutions and their peers define what is big data, then SAS (not to be confused with Serial Attached SCSI, which can be found on the back-end of big data storage solutions) can be considered first generation big data analytics or Big Data 1.0 (BD1 ;) ). That means Hadoop MapReduce is Big Data 2.0 (BD2 ;) ;) ) if you like, or dislike for that matter.
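For readers who have heard the BD2 buzzword but not looked under the hood, the MapReduce pattern itself is simple. Here is a toy word-count sketch in plain Python (the sample records are invented for illustration) showing the map and reduce phases that Hadoop automates, schedules and distributes across many machines:

```python
from collections import defaultdict

def map_phase(records):
    """Map: emit a (key, value) pair for every word in every record."""
    for record in records:
        for word in record.split():
            yield (word, 1)

def reduce_phase(pairs):
    """Shuffle + reduce: group pairs by key and sum the values."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

# Made-up sample data standing in for a large distributed input set
records = ["error disk full", "error timeout", "ok disk"]
counts = reduce_phase(map_phase(records))
print(counts)  # {'error': 2, 'disk': 2, 'full': 1, 'timeout': 1, 'ok': 1}
```

The value Hadoop adds is not the map and reduce functions themselves, but running them in parallel over data too large for one machine, which is why the debate over whether that alone defines big data misses so much.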
Funny thing about some fans, proponents or surrogates of BD2: they may have heard of BD1 tools like SAS while having only a limited understanding of what they are or how they are or can be used. When I worked in IT as a performance and capacity planning analyst focused on servers, storage, network hardware, software and applications, I used SAS to crunch various data streams of event, activity and other data from diverse sources. This involved correlating data and running various analytic algorithms on it to determine response times, availability, usage and other things in support of modeling, forecasting, tuning and troubleshooting. Hmm, sound like first generation big data analytics or Data Center Infrastructure Management (DCIM) and IT Service Management (ITSM) to anybody?
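The kind of correlation and number crunching described above can be sketched very roughly with nothing more than the Python standard library; the field names and sample values below are invented for illustration, and SAS itself of course provides far richer procedures for this sort of work:

```python
import statistics

# Hypothetical event records of the kind a performance and capacity
# planning analyst might correlate from diverse sources (values invented)
events = [
    {"server": "srv1", "response_ms": 12.0, "up": True},
    {"server": "srv1", "response_ms": 48.0, "up": True},
    {"server": "srv1", "response_ms": 30.0, "up": False},
    {"server": "srv2", "response_ms": 22.0, "up": True},
    {"server": "srv2", "response_ms": 18.0, "up": True},
]

def summarize(events, server):
    """Correlate the events for one server: response times and availability."""
    samples = [e for e in events if e["server"] == server]
    times = [e["response_ms"] for e in samples]
    return {
        "mean_ms": statistics.mean(times),
        "max_ms": max(times),
        "availability": sum(e["up"] for e in samples) / len(samples),
    }

print(summarize(events, "srv1"))
```

Swap in millions of records from dozens of sources and this is exactly the first generation analytics workload being described, whatever buzzword generation it gets filed under.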
Now to be fair, comparing SAS, SPSS or any number of other BD1 generation tools to Hadoop and MapReduce or BD2 second generation tools is like comparing apples to oranges, or apples to pears. Let's move on, as there is much more to big data than simply a focus on SAS or Hadoop.
This is where some interesting discussions, debates or disagreements can occur between those who latch onto or want to keep big data associated with being something new, usually focused around their preferred tool or technology. What results from these types of debates or disagreements is a missed opportunity for organizations to realize that they might already be doing or using a form of big data and thus have a familiarity and comfort zone with it.
By having a familiarity or comfort zone vs. seeing big data as something new, different, hype or full of FUD (or BS), an organization can be comfortable with the term big data. Often after taking a step back and looking at big data beyond the hype or FUD, the reaction is along the lines of: oh yeah, now we get it, sure, we are already doing something like that, so let's take a look at some of the new tools and techniques to see how we can extend what we are doing.
Likewise many organizations are already doing big bandwidth and may not realize it, thinking that it is only something that media and entertainment, government, technical or scientific computing, high performance computing or high productivity computing (HPC) does. I'm assuming that some of the big data and big bandwidth pundits will disagree; however, if in your environment you are doing many large backups, archives, content distribution, or copying large amounts of data for different purposes, then you are consuming big bandwidth and need big bandwidth solutions.
Yes I know, that's apples to oranges and perhaps stretching the limits of what is or can be called big bandwidth based on somebody's definition, taxonomy or preference. Hopefully you get the point that there is diversity across various environments as well as types of data and applications, technologies, tools and techniques.
What about little data then?
I often say that if big data is getting all the marketing dollars to generate industry adoption, then little data is generating all the revenue (and profit or margin) dollars via customer deployment. While tools and technologies related to Hadoop (or Haydoop if you are from HDS) are getting industry adoption attention (e.g. marketing dollars being spent), revenues from customer deployment are growing. If little data is databases and things not generally lumped into the big data bucket, and if you think or perceive big data to be only Hadoop MapReduce-based data, then does that mean all the large unstructured non-little data is very big data or VBD?
For further information visit: http://cloudcomputing.sys-con.com/node/2420582