Madison, Wisconsin is a wonderful city. I've spent the last seven or so years of my life here, first as a student, then as a member of the workforce. To me, it's the perfect mix of a bustling city and a charming town. I've met so many great people here, and for that I am thankful.
Unfortunately, this chapter of my life is coming to a close. I have been offered an amazing opportunity in my career, so in ten days (February 25th, 2012), I will be relocating to Amsterdam, the Netherlands with my girlfriend Michelle.
This was not an easy decision to make, but I'm extremely excited for all of the adventures awaiting Michelle and me. With that in mind, we started a new blog together: http://michelleandrob.in. Please check up on us every so often; we'll be writing about our travels abroad, as well as what it's like to live in Europe!
-Rob
My girlfriend Michelle and I were having breakfast yesterday, and the conversation turned to bookstores, libraries, and e-readers.
Bookstores and libraries are closing, and this saddens both of us not only because we like the feel of a physical book, but also because we're concerned about how people without access to e-readers and the Internet will have access to information. I think that bookstores and libraries both need to modernize if they're going to stay alive; I think they need to supplement their physical book collection with a digital book collection that users can borrow from, even if they're not physically present at the library.
Michelle agreed with the spirit of the idea, but countered by asking how these institutions would make sure that no one is pirating copies. That made me think about my ideals with regard to technology: as e-readers and similar devices become more widespread, I'd like them to become more and more open. To clarify, when I say open, I mean that all of the software running on the device is modifiable by the user, and the user can distribute those changes to others. However, if an e-reader is truly open, what prevents a sufficiently skilled user from pirating borrowed copies and distributing them? Obviously bookstores need to make money; businesses are started to be profitable, after all. As for libraries, they will continue to be funded only if they are used. So some protections must be implemented in order to make sure that businesses and libraries remain sustainable, right? And do these protections have to be based on closed-source software?
I suppose what I mean to say is this: are these two goals compatible? Can a library distribute borrowed copies to an open device without fear of losing its grip on sustainability?
Right now, I'm working on a Perl script that needs to do some data analysis on a MySQL table with nearly three billion rows. I don't have to process every row, so I have a WHERE clause to use, and since the table is partitioned by month, I decided to process the data set chunked by partition. So the query to fetch a chunk looks something like this:
WHERE ...
However, this query is run against 50-something partitions, and the whole process can take some time. Wouldn't it be nice to get a notion of which chunk the script is retrieving?
Well, we could examine the output of SHOW FULL PROCESSLIST:
mysql> SHOW FULL PROCESSLIST;
| Id     | User  | Host  | db   | Command | Time | State        | Info |
| 567973 | ***** | ***** | **** | Query   | 0    | Sending data | SELECT value FROM Data WHERE ... AND sample_time BETWEEN '1296540000' AND '1298959199' |
Unfortunately for me, sample_time is an integer timestamp, and I can't calculate the date from a timestamp at the drop of a hat, nor can I easily determine which partition that timestamp range represents in my list of partitions. What I really want to know is which iteration of the partition-processing loop my script is on, and how many partitions that loop is iterating over.
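For a one-off check, the two timestamps from the process list can be converted by hand. This is only a sketch: it assumes GNU date (BSD and macOS use `date -r <seconds>` instead), and the `epoch_to_utc` helper is a name made up for this example. It still doesn't tell me which iteration of the loop I'm on.

```shell
#!/bin/sh
# Convert an integer Unix timestamp to a readable UTC date.
# Assumes GNU date, which accepts "@<seconds>" as a date string.
epoch_to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

epoch_to_utc 1296540000   # prints 2011-02-01 06:00:00 UTC
epoch_to_utc 1298959199   # prints 2011-03-01 05:59:59 UTC
```

06:00 UTC is midnight in US Central Standard Time, so if the partitions were cut on Central-time month boundaries, this range would be February 2011.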
So I changed my SQL statement in my script to this ($num and $total are, of course, Perl variables in my script):
/* $num/$total */ WHERE ...
Now when I run SHOW FULL PROCESSLIST, I get something a little more useful:
mysql> SHOW FULL PROCESSLIST;
| Id     | User  | Host  | db   | Command | Time | State        | Info |
| 567973 | ***** | ***** | **** | Query   | 0    | Sending data | /* 12/60 */ SELECT value FROM Data WHERE ... AND sample_time BETWEEN '1296540000' AND '1298959199' |
This technique is not MySQL-specific, as long as your DBMS has some way of viewing which queries are running.
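To make the idea concrete outside of my Perl script, here is a minimal shell sketch of the same loop. It only prints the tagged queries; it does not talk to a database. The `tagged_query` function and the list of partition boundaries are made up for illustration, and the other WHERE conditions from my real query are left out.

```shell
#!/bin/sh
# Build the query for one chunk, prefixed with a "num/total" progress comment.
# Arguments: chunk number, total chunks, range start, range end.
tagged_query() {
    printf '/* %d/%d */ SELECT value FROM Data WHERE sample_time BETWEEN %d AND %d\n' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical partition boundaries, one "start end" pair per line.
ranges='1293861600 1296539999
1296540000 1298959199
1298959200 1301633999'

total=$(printf '%s\n' "$ranges" | grep -c .)
num=0
printf '%s\n' "$ranges" | while read -r start end; do
    num=$((num + 1))
    tagged_query "$num" "$total" "$start" "$end"
done
```

One caveat if you feed queries like these to the mysql command-line client rather than a driver such as DBI: older versions of the client strip comments before sending the statement unless you pass --comments, in which case the tag never reaches the process list.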
This technique has the advantage of putting progress information somewhere hidden so you can retrieve it, in case you decide halfway through the script's execution that you should have provided the -v flag. Unfortunately, it is not without its disadvantages:
If you read my post on Adding Remote Shortcuts to Git, you may have found it useful for specifying shortcuts for easy cloning. In case you haven't read it, the summary is that to clone my linotify repository on GitHub, I need only type the following:
git clone hoelzro:linotify
This is because I set up an alias of sorts that translates hoelzro: to git@github.com:hoelzro/. This works very well, but sometimes it might be nice for read-only traffic to use a protocol other than SSH. Maybe you're on a new machine and you forgot to associate its public key with the remote side. Maybe you have ssh-agent configured to time out identities after a while, and you don't want to type your password just to run git fetch. I can't tell you how many times I've typed git fetch, only to be prompted for my key's password. How annoying! Wouldn't it be nice if I could tell Git to use the Git protocol for reading from a repository, and SSH for writing?
After delving into the git-config manpage a bit, I discovered another gem similar to insteadOf: pushInsteadOf. So I modified my .gitconfig to contain this:
[url "git@github.com:hoelzro/"]
    pushInsteadOf = hoelzro:
[url "git://github.com/hoelzro/"]
    insteadOf = hoelzro:
Now when I run git fetch or git pull on my repositories, I never get prompted to enter my SSH key's password, since reading from my repository is done via the Git wire protocol.
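If you'd like to check that the two rewrites behave as described without touching your real .gitconfig, something like the following sketch should do. It sets the same two options in a throwaway repository's local config and asks Git which URLs it would use; `show_remote_urls` is just a name I picked for the example, and it assumes a Git new enough to have `git remote get-url`.

```shell
#!/bin/sh
# Set up a scratch repository with both rewrites, then print the
# fetch URL followed by the push URL for a remote named "origin".
show_remote_urls() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q .
    git config url."git://github.com/hoelzro/".insteadOf "hoelzro:"
    git config url."git@github.com:hoelzro/".pushInsteadOf "hoelzro:"
    git remote add origin hoelzro:linotify
    git remote get-url origin          # git://github.com/hoelzro/linotify
    git remote get-url --push origin   # git@github.com:hoelzro/linotify
    cd / && rm -rf "$repo"
)

show_remote_urls
```

The fetch URL goes through insteadOf and the push URL goes through pushInsteadOf, which is exactly the read/write split I was after.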
Do you often find yourself running a process that you know is going to take a while? Do you also find yourself checking the shell it's running in every five minutes to see if it's done? I do this fairly often, so what I used to do is something like this:
my-long-running-process; notify-send Complete "Your long-running process is complete"
This pops up a nice GUI notification letting me know my process is done. However, it has a few disadvantages. One is that it requires me to be at the machine I'm running the job on; sometimes I set up a job to run and leave. Another disadvantage is that I may miss the GUI notification if I get up from my desk to grab a cup of coffee or something.
I decided the best alternative would be to write a simple script that would notify me over XMPP that my job had completed. That way, my IM program would let me know I had a message when I got back to my laptop, and my phone would receive the message too. So now what I do is this:
my-long-running-process; notify-rob.pl "Your long-running process is complete"
That's nice, but if I'm running a process while I'm out and about and I'm interested in a summary of the data it outputs, I'd like that included in the message. So I added support for using standard input as the message. Let's say I want to know how long my process took:
(time my-long-running-process) 2>&1 | notify-rob.pl -i
Now when my-long-running-process completes, it sends a message with the duration of the job to my phone, as well as any chat clients I have running at the time.
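The same pattern can be wrapped in a small shell function so the message also says whether the job succeeded. This is a sketch rather than something from my dotfiles: `run_and_notify` and the `NOTIFY_CMD` override are names invented for this example, and the notifier defaults to `notify-rob.pl -i` as used above.

```shell
#!/bin/sh
# Run a command, then pipe a one-line summary (exit status and duration)
# to the notifier. NOTIFY_CMD is a hypothetical override for trying this
# out without an XMPP account, e.g. NOTIFY_CMD=cat.
run_and_notify() {
    start=$(date +%s)
    "$@"
    status=$?
    end=$(date +%s)
    # NOTIFY_CMD is left unquoted on purpose so "notify-rob.pl -i"
    # splits into a command and its argument.
    printf '%s finished with exit status %d after %d seconds\n' \
        "$*" "$status" "$((end - start))" | ${NOTIFY_CMD:-notify-rob.pl -i}
    return "$status"
}
```

Usage would look like `run_and_notify my-long-running-process`. The function returns the job's own exit status, so it can still be chained with && or || in the usual way.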
Here's my notify-rob.pl script, with the Rob-specific bits removed. If you'd like to use it, you'll need the AnyEvent-XMPP distribution installed for Perl, and perl 5.10 or better.
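    #!/usr/bin/env perl

    use strict;
    use warnings;
    use feature 'say';

    use AnyEvent;
    use AnyEvent::XMPP::IM::Connection;
    use AnyEvent::XMPP::IM::Message;
    use Getopt::Long;

    my $from_stdin = 0;
    my $from_file;

    GetOptions(
        input    => \$from_stdin,
        'file=s' => \$from_file,
    );

    my $body;

    if($from_stdin || $from_file) {
        my $fh;
        if($from_stdin) {
            $fh = \*STDIN;
        } else {
            # this branch was empty in the listing; opening the file named
            # by --file is the assumed intent
            open $fh, '<', $from_file or die "Unable to open $from_file: $!\n";
        }
        # slurp the whole input, not just its first line
        $body = do { local $/; <$fh> };
    } elsif(@ARGV) {
        # assumed: the message is whatever was given on the command line
        $body = join(' ', @ARGV);
    } else {
        # assumed: with nothing to send, bail out with a usage message
        die "usage: $0 [--input | --file FILE | MESSAGE]\n";
    }

    my $cond = AnyEvent->condvar;
    my $conn = AnyEvent::XMPP::IM::Connection->new(
        jid      => 'your source JID here',
        password => 'your password here',
        domain   => 'gmail.com', # I use Google Talk for this; you can
                                 # remove the domain, host, port, and
                                 # old_style_ssl options if you use
                                 # a "regular" XMPP server
        host          => 'talk.google.com',
        port          => 5223,
        old_style_ssl => 1,
    );

    my $timer;
    $conn->reg_cb(session_ready => sub {
        my $msg = AnyEvent::XMPP::IM::Message->new(
            to   => 'your destination JID here',
            type => 'chat',
            body => $body,
        );
        $msg->send($conn);
        # give the message a few seconds to go out before exiting
        $timer = AnyEvent->timer(
            after => 3,
            cb    => sub { $cond->send },
        );
    });

    $conn->reg_cb(error => sub {
        # the callback receives the connection, then the error object
        my ( undef, $error ) = @_;
        say $error->string;
        $cond->send;
    });

    $conn->connect;
    $cond->recv;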