June 15, 2005

Recruiting

To Gretchen:
recruiting successfully isn't only up to recruiters. The best
recruiting department in the world can't make people want to work at a
company that's moribund, that can't figure out how to ship a compelling upgrade to their flagship OS, or update their flagship database server more than once every five years, that has added tens of thousands of technical workers who aren't adding any dollars to the bottom line, and that constantly annoys twenty-year veterans by playing Furniture Police games over what office furniture they are and aren't allowed to have. Summer interns at Fog Creek have better chairs, monitors, and computers than the most senior Microsoft programmers.

Recruiting has to be done at the Bill and Steve level, not at the
Gretchen level. No matter how good a recruiter you are, you can't
compensate for working at a company that people don't want to work for;
you can't compensate for being the target of eight years of fear and
loathing from the Slashdot community, which very closely overlaps the
community where you're trying to recruit. And you can't compensate for
the fact that a company with a market cap of $272 billion just ain't going to see their stock price go up. MSFT can grow by an entire Google every year and
still see less than 7% growth in earnings. You can be the best
recruiter in the world and the talent landscape is not going to look
very inviting if the executives at your company have spent recent
years focusing on cutting benefits, cutting off oxygen supplies, and cutting features from Longhorn.

Network Load Balancing Works

For the first time ever I was able to install today's round of
Microsoft patches on our web servers without bringing the sites down at
all. I'm very happy about this, since this was the main point of
upgrading the web farm.

We have two web servers, web1.fogcreek.com and web2.fogcreek.com,
each with its own IP address, but using a feature built into Windows
Server 2003 called Network Load Balancing,
they both share the web site load using a third IP address, which I've
named webnlb.fogcreek.com. Whenever a request comes in on that shared
IP address, it is distributed to one of the web servers at random. If
requests come in from the same class C address range, those requests
will prefer to go to the same web server that previously served that
address range. So, for the most part, the same user will keep going to
the same physical machine when possible, which means stateful web
applications still work even if the state is maintained on one computer.
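Class C affinity can be pictured as hashing only the first three octets of the client address, so every machine in the same /24 range maps to the same server. Here is a minimal sketch under that assumption; the hash (SHA-256 over the network prefix) and the function names `class_c_prefix` and `pick_server` are hypothetical, and the real NLB driver uses its own internal algorithm rather than this one.

```python
import hashlib
import ipaddress

# Server names come from the post; the hashing scheme below is an
# illustrative assumption, not Microsoft's NLB algorithm.
SERVERS = ["web1.fogcreek.com", "web2.fogcreek.com"]


def class_c_prefix(client_ip: str) -> ipaddress.IPv4Network:
    """Return the /24 ("class C") network that a client address falls in."""
    return ipaddress.ip_network(f"{client_ip}/24", strict=False)


def pick_server(client_ip: str, servers: list[str] = SERVERS) -> str:
    """Send every client in the same /24 range to the same server.

    Hashing the network prefix rather than the full address is what
    produces class C affinity: 10.0.0.5 and 10.0.0.200 share a prefix,
    so they always land on the same machine.
    """
    prefix = str(class_c_prefix(client_ip).network_address)
    digest = hashlib.sha256(prefix.encode("ascii")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


print(pick_server("10.0.0.5") == pick_server("10.0.0.200"))  # True
```

Because the choice depends only on the prefix, a user whose requests arrive through a pool of proxies in one /24 range still sticks to one machine, which is the point of choosing class C affinity over per-address affinity.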

I actually like the NLB system a bit more than a dedicated
hardware load balancer. Here's why: there's no single point of failure.
If you have a hardware load balancer and it needs to be updated or
rebooted, or if it fails, you're off the air. Whereas Windows NLB is
all-software and each server in the cluster is a peer, so any server
can die and the rest of the system stays up.
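The peer arrangement can be sketched as every host seeing each incoming request and running the same deterministic rule over the same list of live hosts, so exactly one of them accepts it and no central box has to decide. This is a simplified model, assuming a SHA-256 hash of the client address; `owner`, `PeerHost`, and `handlers` are hypothetical names, and Windows NLB's actual filtering and convergence protocol is more involved.

```python
import hashlib


def owner(client_ip: str, live_hosts: list[str]) -> str:
    """Pick which live host handles a client, deterministically.

    Every host runs this same function over the same sorted membership
    list, so they all reach the same answer without a central balancer.
    """
    hosts = sorted(live_hosts)
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return hosts[int.from_bytes(digest[:4], "big") % len(hosts)]


class PeerHost:
    """One member of a hypothetical all-software load-balancing cluster."""

    def __init__(self, name: str) -> None:
        self.name = name

    def should_handle(self, client_ip: str, live_hosts: list[str]) -> bool:
        # Each peer sees every incoming request and keeps only its own share.
        return owner(client_ip, live_hosts) == self.name


def handlers(client_ip: str, hosts: list[PeerHost], live: list[str]) -> list[str]:
    """Names of the live peers that would accept this request."""
    return [h.name for h in hosts if h.name in live and h.should_handle(client_ip, live)]


peers = [PeerHost("web1"), PeerHost("web2")]
print(handlers("10.0.0.5", peers, ["web1", "web2"]))  # exactly one name
print(handlers("10.0.0.5", peers, ["web2"]))          # ['web2'] once web1 is gone
```

Removing a host from `live` is the sketch's stand-in for the cluster noticing a dead server: the survivors recompute the same rule and absorb its share of the traffic, with no separate device that could itself fail.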

When I needed to install today's Windows updates, here's what I did:

  • Told WEB1 to drainstop. That means “finish serving any requests
    you're working on, but don't take any new requests.” This took three or
    four minutes before it flatlined; WEB2 silently picked up the entire
    load.
  • Installed the upgrades on WEB1 and rebooted it.
  • Repeated for WEB2, while WEB1 held up the entire load.
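The steps above are a rolling update: take one node out of rotation, patch it, put it back, and only then move to the next. A minimal simulation of that loop, assuming an in-memory model where `Node`, `Cluster`, and `rolling_update` are hypothetical stand-ins for the real servers and NLB commands, checks that at least one node keeps taking new requests throughout.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    accepting: bool = True  # does the node take new requests?
    in_flight: int = 0      # requests it is still finishing
    patched: bool = False


@dataclass
class Cluster:
    nodes: list[Node]
    log: list[str] = field(default_factory=list)
    min_serving: int = 0  # fewest nodes taking new requests at any moment

    def __post_init__(self) -> None:
        self.min_serving = self.serving()

    def serving(self) -> int:
        """Count nodes that currently accept new requests."""
        return sum(n.accepting for n in self.nodes)

    def _set_accepting(self, node: Node, value: bool) -> None:
        node.accepting = value
        self.min_serving = min(self.min_serving, self.serving())

    def rolling_update(self) -> None:
        """Drainstop, patch, and restart each node in turn."""
        for node in self.nodes:
            # Refuse to drain a node unless another one can carry the load.
            if not any(n.accepting for n in self.nodes if n is not node):
                raise RuntimeError(f"draining {node.name} would leave nobody serving")
            self._set_accepting(node, False)  # drainstop: no new requests
            node.in_flight = 0                # stand-in for waiting until it flatlines
            self.log.append(f"{node.name} drained")
            node.patched = True               # stand-in for installing updates and rebooting
            self.log.append(f"{node.name} patched")
            self._set_accepting(node, True)   # rejoin the cluster
            self.log.append(f"{node.name} rejoined")


cluster = Cluster([Node("WEB1", in_flight=3), Node("WEB2", in_flight=5)])
cluster.rolling_update()
print(cluster.min_serving)  # 1
```

With two nodes, `min_serving` bottoms out at 1, which corresponds to the moment one server silently carries the entire load; with a single node the loop refuses to start, since draining it would take the site off the air.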

As far as I can tell, nobody should have seen a single hiccup in the sites served from the new web farm. [Joel on Software]