Last month I managed to attend a big data event as an attendee, for the first time in nearly two years. One upside to being a big data storyteller is that you don’t get 3am oncall pages. The downsides at events like Strata include working the booth, preparing for and recovering from a presentation, or trying to convince vendors that your role doesn’t involve personally buying software and services for a Fortune 50 company.
Sure, I did give a brief booth presentation for my friends at MapR on both expo days at Strata SJ, but more of my time was spent catching up with the people in my ecosystem who I rarely see, learning what new things they’re doing (or who they’re working for these days), and occasionally getting a no-BS perspective on a very rife-for-BS idea, product, or company.
One of the other upsides to not being a buyer anymore is that it is easy, and practical, to jettison the sales pitches and move on to the stories. I tell stories. I don’t deploy production environments anymore. And it’s refreshing to be able to look at things outside the sales pressure.
HADOOP IS OVER…
So the first point, which was teased at Strata NYC last fall, is that Hadoop is finally over…
I’m not referring to Mike Olson of Cloudera telling people at Strata NYC ’14 that this (2014-2015) would be the year that Hadoop disappears. Sure, that verbal equivalent of clickbait was memorable, and readily misinterpreted by many, but it turned out to be true for a substantial swath of the market. Hadoop for many ceased to be an exciting destination on its own, and became merely the infrastructure, the plumbing, the foundation for a big data and analytics ecosystem.
No, what I’m getting at is that Hadoop is finally over… a decade old. Our favorite yellow elephant-slash-data platform foundation is a tween now, and while there are still people completely new to the concept, and analysts mistaking anecdotes for data and data for proof, it’s not really sane to call it an emerging technology per se, or to suggest that deploying HDFS and MapReduce is a cutting edge business move.
If you’re new to the ecosystem, then welcome. Don’t let the above graf dissuade you; your own data journey will go at the speed of your business, even if it’s not bleeding edge. But keep a sense of proportion… Your ten node Hadoop cluster probably won’t set any records (unless you have ULLtraDIMMs or something).
MAPREDUCE IS OVER…
Last year’s common mantra was that MapReduce was dead. Google moved away from it, the stories went, and thus it was no longer A Thing. And for all the companies I talked to who were Google-scale, this was true. But for the other N-3 companies, where N is the total number of companies doing big data, well…
You’re not Google, you’re not Facebook, you’re not LinkedIn, and you’re probably not a three letter government agency. So don’t flush everything those firms have flushed just because they’ve flushed it. When you’re at that scale, if you ever are, you can do things in step with them.
Otherwise, obviously, the technology advancements they make will filter down to the rest of the world, just as Nutch, HDFS, MapReduce, PCIe flash storage, and other scale technologies have. Use them when they fit your needs, not just when they fit analyst headlines. People didn’t jettison orange juice and innerspring mattresses when NASA made Tang and memory foam famous, after all. And each still has its uses decades later.
VISUALIZATION IS OVER…
One thing that came out in keynotes and in some booths this year was that visualization is, for lack of a less clickbaity term, dead. To be more specific, it’s going the way of the Hadoop core platform itself, eventually. You want to see what your data can tell you, but the focus in the next year or three is likely to be more on automating the results of visualization and analytics in general. So maybe it would be better to say visualization is over the hump, so to speak. It hasn’t jumped the shark yet, though.
Visualization will remain a strong play for the foreseeable future, but cutting edge businesses will move from the reactive monitoring focus many have today (and have had for decades with other forms of monitoring) to a model where the analytics system acts on its analysis, doing what the NOC or other teams might have implemented manually in the past.
We’ve seen forms of this automation go horribly wrong, for example in the $23 million book listing on Amazon a few years ago: price competitiveness is good to act on, but you have to build sanity into the automation.
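Speaking of building sanity into automation, here’s a minimal Python sketch of what such a guardrail might look like. Everything in it is hypothetical: the function name, the thresholds, the 1% undercut rule. It’s not how Amazon or those booksellers actually priced anything; it just illustrates putting hard bounds on an automated action loop.

```python
# Hypothetical repricing rule with sanity bounds.
# All names and thresholds here are illustrative, not from any real system.

def reprice(competitor_price: float, cost: float,
            floor_margin: float = 1.10, ceiling: float = 500.00) -> float:
    """Undercut the competitor slightly, but never go below a minimum
    margin over cost or above an absolute ceiling."""
    candidate = competitor_price * 0.99  # undercut by 1%

    # Sanity floor: never sell below cost plus a minimum margin.
    candidate = max(candidate, cost * floor_margin)

    # Sanity ceiling: a hard cap keeps two bots from chasing each
    # other up to a $23 million textbook.
    candidate = min(candidate, ceiling)

    return round(candidate, 2)


if __name__ == "__main__":
    print(reprice(competitor_price=35.00, cost=20.00))         # 34.65
    print(reprice(competitor_price=9.00, cost=20.00))          # 22.0  (floor kicks in)
    print(reprice(competitor_price=1_000_000.00, cost=20.00))  # 500.0 (ceiling kicks in)
```

The specific numbers don’t matter; the point is that any system allowed to act on analysis needs hard limits a human chose deliberately, so a feedback loop can’t run away unattended.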
THIS POST IS OVER (temporarily)…
Stay tuned for part two of this post, perhaps a bit less clickbaity, and with a relatively almond-free topic to start.