Some upcoming events worth a look

I haven’t been to my datacenter in over six months. I have a feeling the front desk folks at the Westin Casuarina are missing me by now. But I’m still on the move. Hopefully I’ll see some of you at one of the following events in the near future. 

VMworld US 2013 & Tech Field Day Roundtables at VMworld

This year’s VMworld is in San Francisco, just a 90-180 minute commute (each way) from where I live in Silicon Valley. Thanks to the gracious support of Gestalt IT’s Tech Field Day and the Tech Field Day Roundtable at VMworld sponsors, I’ll be camping in San Francisco and making the most of the opportunities during the week. 

Along with a dozen and a half other Tech Field Day delegates, I’ll be meeting with our friends from Asigra, Commvault, Infinio, and Simplivity. I’ve been to TFD sessions with all but Simplivity, but I’ve met Gabriel Chapman (@bacon_is_king) at the SV VMUG so they’re not strangers to me either (even if their “cube” is actually not cubical). 

In addition to the vExpert and VMware customer events, I’ll also be visiting friends from past Tech Field Day meetings, including Scale Computing, Nutanix, Zerto, Pure Storage, and Tintri. If I’ve missed anyone, feel free to touch base. 

Software Defined Data Center Symposium

Gestalt IT is hosting a full-day SDDC symposium at Techmart in Santa Clara, a mere 10-15 minute commute for me. There’s still room to join us on Tuesday, September 10th, for a day of discussions about SDDC topics, featuring Greg Ferro, Jim Duffy, Ivan Pepelnjak, and several leading vendors in the field. The event will set you back just $25, and that includes lunch. 

The Cloudera Sessions

This one actually has nothing to do with Gestalt IT, but if you’re deep into Hadoop, and Cloudera’s particular flavor of it, it’s definitely worth a visit. Cloudera hosts The Cloudera Sessions in cities around the United States, and I’ll be attending the San Francisco event on September 11th.

Several Cloudera technologists, from the system engineering manager to the co-founder/CTO, will be talking about where the company is going and where Hadoop is going in the foreseeable future. This event will set you back $149, but if you are a current Cloudera customer, check with your account manager to see if you can get a discount. 

BayLISA At Joyent

The October 17 meeting of BayLISA, Silicon Valley and the San Francisco Bay Area’s oldest system administration group, will be held in San Francisco at the headquarters of Joyent, one of the most prominent Solaris technology companies. We’re looking forward to hearing from Brendan Gregg about his new book, Systems Performance: Enterprise and the Cloud, as well as getting an update on Joyent’s Manta storage service.

Attendance is free, but space is limited. RSVP at the BayLISA Meetup site if you’re interested. 

IEEE Computer Society’s Rock Stars Of Big Data

As much as I hate the use of the term “rock stars” (since that’s not necessarily a compliment or a good thing), this event looks interesting. I’m not sure how useful it will be for technologists, but it’s worth a look. IEEE Computer Society is hosting their Rock Stars Of Big Data event at the Computer History Museum in Mountain View on October 29th. It will set you back $239 as an IEEECS member, or $299 without membership. Group discounts are available for registration of 3 or more people on one ticket. 

Mickey’s Not So Scary Halloween Party

Everyone deserves a bit of a break, and big data can wear a technologist out. If you’re planning to be at the Magic Kingdom between September 10 and November 1, you should check out Mickey’s Not So Scary Halloween Party. I went two years ago and it was pretty enjoyable. I do work for the Mouse, but I don’t get any benefit if you go. So I highly recommend it. 

 

 

Mohs’ law and big data (Hadoop is hard)

I’ve spent more time than usual the past two weeks talking with people, and listening to people, about Hadoop. I’ve been administering Hadoop clusters for (part of) a living for about 4-5 years now, and I’ve gotten pretty good at answering questions people don’t have, or want, answers for.

In the past week or so I’ve heard one vendor claim that Hadoop gives you a free analytics environment with no need for expensive developers, since it’s free software, and another vendor claim that you can just virtualize Hadoop by putting lots of datanodes on a single host and save lots of money. Easy peasy, right?

No. Just no.

I’m proposing we consider Mohs’ Law in this situation.

No, I’m not misspelling Moore’s Law, which (loosely speaking) tells us that transistor counts, and with them compute power, double about every 24 months. I’m suggesting a law that’s more of a diamond in the rough, if you don’t mind.

Hadoop is hard. 

The name comes from Friedrich Mohs, who developed a scale for describing the hardness of materials about 200 years ago. And it’s a great pun. But it’s also a reminder that “yum install” does not a production application make.

But Rob, I can get Hadoop in 15 minutes!

It is pretty easy to get started with Hadoop. It’s even free of charge to get started (or even to go into production) with the platform itself. I recommend it. Go do it now. I’ll wait.

For starters, go grab the Cloudera QuickStart VM or the Hortonworks Sandbox VM from their respective websites. Pull it into your desktop virtualization platform of choice. Look at the docs. Run some of the tests. At that point you’re farther along than most people who promote Hadoop.

But at that point you don’t have a functioning business intelligence/data warehouse/analytics application environment, any more than installing Ubuntu 13.04 into VirtualBox gives you a production e-commerce site.

There’s still a lot of work to be done. Some of it is difficult, but a fair bit of it is just downright hard. Understand what you want to do and what data you can pull into your environment. Figure out what your customers/users/analysts need out of the data. Make sure you can validate the output. Automate all your tests. Go back to your data sources and make sure you’re getting all the data. Go back to your end users and make sure you’re giving them what they want. Lather, rinse, repeat.
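
The “make sure you’re getting all the data” step can start out very small. Here’s a minimal Python sketch, assuming both the source extract and the job output can be loaded as lists of dicts that share an identifying field. The function name `reconcile` and the `id` field are made up for illustration; they aren’t part of Hadoop or any vendor’s tooling.

```python
from collections import Counter


def reconcile(source_rows, output_rows, key):
    """Compare record keys between a source extract and a job's output.

    Returns keys that appear in the source but not the output ("missing")
    and keys that appear in the output but not the source ("unexpected").
    Duplicates are counted, so a record dropped once still shows up.
    """
    source_keys = Counter(row[key] for row in source_rows)
    output_keys = Counter(row[key] for row in output_rows)
    return {
        # Counter subtraction keeps only positive counts.
        "missing": sorted((source_keys - output_keys).elements()),
        "unexpected": sorted((output_keys - source_keys).elements()),
    }


if __name__ == "__main__":
    # Hypothetical sample data: record 2 was lost, record 4 came from nowhere.
    source = [{"id": 1}, {"id": 2}, {"id": 3}]
    output = [{"id": 1}, {"id": 3}, {"id": 4}]
    print(reconcile(source, output, key="id"))
```

A check like this won’t tell you whether the numbers are right, only whether the records made the trip, but it’s the kind of test that is cheap to automate and run on every load.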

Rob’s Corollaries to Mohs’ Law

If you remember nothing else, think about an analytics environment the way you would a monitoring environment. I’ve supported both for almost a decade, and the take-home I’ll save you ten years on is this:

Make sure you’re measuring what you think you’re measuring.

Make sure you’re measuring what you need to be measuring.

This rule also applies to a lot of other technology… customer surveys, dating sites, and so forth. But it takes formidable effort to get these two corollaries right (without coronaries), and even if you do throw together something with Insta-analytics.com (probably not a real site, not meant as an endorsement), they won’t be able to tell you what you need or whether you’re getting it.

So where do we go from here?

First of all, if you’re interested in getting familiar with Hadoop, go grab one of the VMs mentioned above and give it a try. Simulate Pi Indiana-style. Grab a book and try some of the stuff it suggests.
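
If you want a feel for what a pi-estimating job is doing before you run one on a cluster, here’s a rough local stand-in in Python. It’s a sketch, not the code that ships with Hadoop: it uses plain pseudo-random sampling, and the names `map_task` and `estimate_pi` are my own. Each “map task” throws darts at the unit square and counts how many land inside the quarter circle; the “reduce” step adds up the counts.

```python
import random


def map_task(task_id, samples, seed=0):
    """Count random points in the unit square that fall inside the quarter circle."""
    # Give each task its own generator so tasks don't repeat the same points.
    rng = random.Random(seed * 1_000_003 + task_id)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside


def estimate_pi(tasks, samples_per_task, seed=0):
    """Sum the per-task counts (the 'reduce' step) and scale to an estimate of pi."""
    inside = sum(map_task(t, samples_per_task, seed) for t in range(tasks))
    return 4.0 * inside / (tasks * samples_per_task)


if __name__ == "__main__":
    print(estimate_pi(tasks=10, samples_per_task=100_000))
```

More tasks and more samples per task tighten the estimate, which is the same trade-off you’ll be playing with on the VM, just without the cluster overhead.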

Then, go talk to the BI team in your company, or the analyst who does performance dashboards when she’s not writing code and designing employee event signage and chasing your kids out of the server closet, or whoever. Find out what they’re doing.

And finally, unless your vendor makes its livelihood supporting Hadoop, don’t take their take on Hadoop as gospel. Apocrypha maybe, mistranslation at worst, and probably not enough to go on.

Hey, I’m in Silicon Valley and want to learn more, what can I do?

Funny you should ask.

BayLISA is hosting a Hadoop meeting on Thursday, May 16, at Yahoo! in Sunnyvale. There’s a waiting list but it usually fades closer to the event. Come see Alan Gates of Hortonworks, Eric Sammer of Cloudera, and Ryan Orban of Nutanix talking about Hadoop innovations and how to get involved. (Disclaimer: I am president of BayLISA, but I don’t get any profit or direct benefit if people come to the meetups.)

There’s also a Hadoop User Group meetup on Wednesday, May 15, although it’s a bit more suited to advanced users who are already familiar with the technology. Their waitlist is also a fair bit longer. But check it out and see if it fits your needs.

If you’re not in Silicon Valley, check Meetup for local groups, or see if one of the Hadoop vendors has local meetings or events you can attend. If you find one, feel free to add it in the comments here so other people will know where to look.