
SolarWinds Alert Central open for beta

I got to see the SolarWinds Alert Central product in action last week. If I hadn’t had three swamped weekends in a row, I’d be setting up their new product (their first fully featured free one) already. For now I’ll just say that it’s worth a look, especially if (like me) you’ve struggled with managing on-call/escalation for an alerting system, or just want to see a little stuffed lion in a video.

I do intend to set it up and see how it works in my environment, so if you’ve used it or plan to, feel free to chime in with your observations or wishes.

Disclaimer: I do run a risk of winning a prize for mentioning this. But I was not paid for this post, and if I didn’t think it was cool I wouldn’t mention it.

Found Out There – 2013-02-03

I’m going to try to push myself toward blogging more by adding an every-two-weeks (or weekly) list of interesting links I’ve come across since the last roundup.

I don’t have ads on my site, partly because nobody’s begged me to take their money, and partly because I think the clean look is more enjoyable and requires less maintenance and tracking.

But if you’d like to help my geek gadget (and caffeine) habit, feel free to shop through my Amazon.com affiliate link or use the links on the lower right side of this page. I also occasionally put things in my Amazon aStore, and link to them here. Buying those things (or anything through those links) also gets me a few bucks that can go toward a pound of loose-leaf tea or a new hybrid hard drive. Thanks in advance for those of you who choose to do this.

So here are my interesting links for the week ending 2013-02-03.

Gratuitous self-links:

Other things I’ve found of interest:

What did you find this week, that you think I should be reading?

In pursuit of the other kind of Big Data(tm)

[I suppose I should mention by way of disclaimer that I work for a really big company. If you can think of a data warehousing, analytics, business intelligence, or even any technology product that exists, my employer probably uses it somewhere, or has, or will. Others in the org have written and/or presented about our technology choices in the not so distant past. So no big secrets being revealed here. I’d also note that I’m writing this as a sysadmin who had to solve a problem, not as a representative or spokesman for my company. I’m the former, not the latter.]

Almost ten years ago I got into what would eventually be branded Big Data. Back then it was Business Intelligence, or in my environment, just “batch”… I haven’t had a job since 2003 that didn’t involve BI/analytics/big data, and the last three have had a big focus on things around Hadoop and other intertwined modern data warehousing tools.

But there’s another kind of big data that predates mapreduce. And it’s bitten me three times in the last couple of months, while setting up a secondary site for an analytics unit.

Your mission, should you choose to accept it…

The task involved moving a couple hundred terabytes of data from different environments and applications (primarily Hadoop, MySQL, and Vertica, but some archival logs and such as well). Moving them across the country without a dedicated connection, mind you.

One of the early suggestions was something like this:

Just dump 300TB of data onto a USB drive and ship it. Sounds like a great idea, but a few problems arose.

1) They don’t really make 300TB USB drives. One provider has some 20TB USB arrays. We could have done Thunderbolt or USB 3.0 with add-on cards, but we’d still be limited by the fact that…

2) It takes a while to load data off a server’s disks onto some other sort of disk. I usually base my rough math on 20-40MByte/sec to pull data off a disk array. Add this to the time it takes to unload it on the other end, and you realize that “we can send it by Fedex overnight” isn’t exactly the speedy proposition it sounds like.

3) There were some policies that would have complicated the ship-overnight part. But those were the simplest to deal with, and not technological, so not really in my scope.

Treating the data in 50TB blocks, we had an estimate of 20 days to load a block onto external storage, whether it was USB external or NFS external or local attach fibre channel. We could have sped it up a bit, at the cost of the production infrastructure’s performance. We had to leave *some* of the disk throughput for the users/BI processes.
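That 20-day estimate is easy to sanity-check with back-of-the-envelope shell arithmetic, assuming the 30 MByte/sec midpoint of the range above:

```shell
# ~50 TB at a sustained ~30 MByte/sec, expressed in whole days
tb=50
rate_mb_s=30
seconds=$(( tb * 1000 * 1000 / rate_mb_s ))  # 50,000,000 MB / 30 MB/s
echo "$(( seconds / 86400 )) days"           # prints "19 days", before unload time on the far end
```

Double that for the unload on the receiving side and you’re right back in three-week territory.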

Somewhere out there…

So I set about plotting a path across the Internet.

For the Vertica platform, we had easily-made backups that ended up being a huge pile of files on a couple of big servers. In this form, they were very easily separated into buckets, and transferred with rsync+ssh in parallel. Something like 30 streams at a time, peaking at 250MBytes/second = 900GBytes/hr. So roughly three days to transfer 50TB over the dirty Internet. Not too bad.

This method is reproducible, so that incremental backups can be transferred with the same structure. I used a few client systems and three source servers, so I limited the impact on either end from the transfer. It did wake up the network guys though, as I guess they weren’t used to seeing 2gbit coming out of a non-production network to the public Internet.

If you’re doing this in your world, some recommendations that may help.

1) Use the newest rsync available to you. The difference between 2.4.6 and 2.5.4 ten years ago was night and day when I had to move Postgres databases around, and getting onto a 3.x version will give you newer options and better performance.

1a) Make sure your openssh packages are current as well. If you have to do this with huge data or frequent transfers, build your own packages.

2) At least the first time, consider [ --whole-file ] (-W), since you will have no data on the other end and rsync will not need to check for changes.

3) After the first time, if you need to resync incrementals (or lose connections), check your file sizes before using [ --partial ] or -P. The “partial” check is great if you’re dealing with large files, but it does take longer to start up, and I’ve seen some cases where it causes the connection to time out and the rsync to fail before it really gets started. If your files take less than a minute or two to transfer, you will lose more than you can gain from this option.

4) Check your ssh cipher. You will probably find “arcfour” to be a reasonably fast option, and depending on your version and compile options for ssh on both ends, you might get something even faster. Change the cipher by including the rsync option [ -e "ssh -c arcfour" ] or whatever cipher you choose. This change will probably make more difference than whether you use [ --compress ] (-z).

5) Consider logging status to a text file so you can track your changes. Using the options [ --stats --progress ] will give you the most detail. The “progress” option shows transfer rate and speed for each file being transferred. The “stats” option gives you details at the end of the transfers.

6) If you’re going to do incrementals, you can run with -n and --stats to see how much needs to be transferred before running the real transfer.

I actually did not use compression, because I had lots of small files and a fast enough theoretical connection (and a rush on the proof of concept which ended up being the production method used). You should try on a subset of your data, comparing with and without. If you have time, also consider trying different ssh ciphers. You can find some examples of other people’s benchmarks (like here and here) but unless you have the same CPUs and connections they do, it’s just a slightly more educated guess.
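If you’d rather generate your own numbers than trust someone else’s benchmark, a quick localhost loop along these lines gives a rough per-cipher rate. It assumes a reasonably recent OpenSSH (the -Q cipher query isn’t in older releases) and key-based ssh to localhost:

```shell
# Push a fixed amount of zeros through ssh to localhost with each
# available cipher and time it. Crude, but it runs on your CPUs,
# not somebody else's.
for c in $(ssh -Q cipher); do
  echo "== $c"
  time ( dd if=/dev/zero bs=1M count=512 2>/dev/null | \
         ssh -c "$c" localhost 'cat > /dev/null' )
done
```

Bump the count up once you have a shortlist; 512MB runs fast enough on modern hardware that cache effects can skew the small ciphers.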

But wait there’s more… soon

Since this post is getting a bit long, I’ll come back later and talk about the challenges of moving a 10TB single file, and an old version HDFS cluster, in this same adventure.

If you have thoughts on data migration challenges, I read (and often respond to) comments below. Join the discussion!

Some references:

Completely unrelated writeup about Hadoop at Disney, from Hadoop World 2011

Slides and video links from Cloudera (the links on the Hadoop World site are outdated, apparently)

OpenVPN’s Gigabit Networks Linux page, which talks about MTUs and Ciphers: http://community.openvpn.net/openvpn/wiki/Gigabit_Networks_Linux

An interesting blog entry by Ivan Zahariev on OpenSSH ciphers: http://blog.famzah.net/2010/06/11/openssh-ciphers-performance-benchmark/

Whoa, I did something interesting…

One of the greatest barriers to my blogging is that I have to find something I believe people will want to read about. I’ve set myself up to write about a couple of things, but decided they wouldn’t be as interesting after I poured them into a text box on the WordPress site. So you’ll see a few gaps in my timeline here, which is odd for someone who’s been accused of being “one of the bloggers.”

I’m working on some details on my recent data migration project, which I think will help a few people learn something through my efforts. Watch for the first installment this week.

I’m also planning to do some Fibre Channel SAN stuff in ways you don’t normally see on your VAR’s cut sheet.

And I’m coming up on the one year anniversary of the Andromedary Instinct store, hopefully with some store events to build on my eBay sales of the past year.

A lot of you keep coming back to look at my home lab post. I have actually rebuilt the NAS (went from FreeNAS 8.x to NexentaStor CE, and then to clearing off the living room coffee table for the holidays), and have not been doing as much with the ESXi 5.0 side of things either. But that should take off soon as well.

So stay tuned, I’ll try to get up to the point of a blog post a week, or at least one a month. I appreciate the comments and suggestions, and welcome your thoughts going forward.


[2013-06-28: Comments disabled since this page seems to be drawing the automated scourge of the earth. You can contact me through other pages or about.me/rnovak if you like.]

Tune in for Storage Field Day 2 live streaming

I’m live and direct at Storage Field Day 2 today and tomorrow.
Watch live streaming video from TechFieldDay at livestream.com