Scheduled Downtime

At 3pm PST (6pm EST / 23:00 UTC) Thursday, January 11, 2018 we will take kadath offline for an operating system upgrade. During that time, all services will be offline.

Mail will be queued for future delivery, and DNS should fail over to our remote secondary. We hope to have all services restored within an hour and will post regular updates to Slack.

Join Our Slack!

If you are following this community and you use Slack, why not add the workspace to your client? We’ll post updates there on a more regular basis.

Contact gripe@ to get an invitation, or reply to this post.

Emergency Maintenance 6/28

Leng (the hardware on which everything runs virtually) has suffered a drive failure. It is still running (yay, RAID), but we will be replacing the drive tomorrow afternoon (PDT).

We’ll also take this opportunity to do some maintenance. The window should not be large, and DNS should be unaffected by the downtime. Mail will be queued and delivered upon restart.

Sorry for the inconvenience & the short notice, but we hope doing it this way will minimize impact.

Planned Downtime 6/21

On Thursday, June 21st, our colocation provider (DHP) will be physically moving between upstream providers. This will result in significant downtime for us as we move our server.

The current schedule is for everything to go dark after 11am PDT and to be back before 7pm PDT.

We apologize for the inconvenience. During the downtime mail will be queued for later delivery, and web requests will all redirect to a “we are moving” page.


Minor service disruption: 7pm PST

The virtual machines running most of our services are losing time, some of them quite badly. This would be only a minor annoyance, except that Kerberos (our authentication service) is finicky about clocks matching across servers.

We’ve identified the problem, and have a solution in place, but it requires changing kernel parameters (which is to say, a reboot). We’re planning on taking things down quickly around 7pm PST. We’ll do it in a cascade so that there is no interruption for mail delivery, and minimal disruption for other services.
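For the curious, here is an illustrative sketch of the kind of fix involved (an assumption on our part; the post above doesn't name the exact parameter). On a Linux guest, a badly drifting clock often comes down to the kernel's clock source, which can be inspected at runtime but is pinned via a boot-time kernel parameter, hence the reboot:

```shell
# Inspect which clock source the guest kernel is currently using;
# some defaults drift badly under virtualization.
cat /sys/devices/system/clocksource/clocksource0/current_clocksource

# List the clock sources this kernel supports on this machine.
cat /sys/devices/system/clocksource/clocksource0/available_clocksource

# Pinning a stable source is done with a kernel command-line
# parameter in the bootloader configuration, e.g.:
#   clocksource=tsc
```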

Thanks for your patience.

Unplanned Outage

We’ve been offline since about 8pm[*] last night. We hope to have everything restored early this afternoon[*]. There was a Supreme Court ruling about which you may have heard, and San Francisco overheated with excitement. Seriously, it was over 100°F on top of Potrero Hill. That NEVER happens!

Our colocation facility overheated in the sweltering temperatures. Fortunately, our server was nearer the bottom of the rack and (it is believed at this time) suffered no damage, but was merely knocked off the network[**]. In fact, we believe our server was only knocked offline later by the brouhaha of attempting to restore/keep other hardware running.

We’ll be using this opportunity to do a hardware upgrade (more memory) to help us move forward on a few projects that have been stalled. (Us?)

We thank you for your patience.

[*] All times Pacific “Morally-Correct” Time

[**] DHP wasn’t so lucky. They lost 2 drives and an entire external (RAID) disk enclosure.

Final Migration Update

[I wrote an update on the train yesterday, and forgot to post it. My machine did its "I'm awake, but the screen is dark." thing this morning, and I can't find it anywhere. I'm afraid I may have saved it in /tmp, which gets cleared on boot.]

Everything is live and working.

Unfortunately, an office network hiccup just as I was testing the mail queue drain meant I couldn’t stop the entire queue from draining before the configuration was correct. (It didn’t take long; the new machine is fast!)

I hadn't yet made a backup and forgot to set the configuration to soft-bounce, so I once again managed to lose everything in the queue. I feel wretched about this, but there's nothing I can do about it.
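To make it concrete (assuming a Postfix-style MTA here, which this post doesn't actually specify), the safety net I forgot is a one-line setting:

```
# main.cf -- Postfix example (an assumption; other MTAs have equivalents).
# With soft_bounce enabled, would-be permanent (5xx) rejections become
# temporary (4xx) deferrals, so mail stays in the queue instead of
# bouncing while the configuration is in flux.
soft_bounce = yes
```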

Worse, all of the senders received a bounce message with an obnoxious(ly cryptic) error message: "You are not me. Nice Try. FOAD." (This is because the rule the messages tripped is normally reserved for spammers who claim to be on the internal network when they're clearly not.)

I'm terribly sorry about the whole thing; this is hardly the auspicious beginning I'd hoped for. On the plus side, from here on out things should be MUCH more reliable (and fast!).

Transition Update

A quick update on the transition to the colo facility.

Skipping the play-by-play, the crux of it:

  1. IMAP server is running so mail can be read.
  2. Kerberos server is running, so you can log in to read said mail.
  3. AFS is not yet running. This affects:
    1. Mail delivery. Mail is being queued. This isn't an absolute requirement. We can deliver mail without AFS, but all mail will be delivered to the INBOX. If AFS isn't running this afternoon, I will remove it as a requirement.
    2. Web services. There's no way around this. The vast majority of our web docs are stored in AFS.
    3. Home directories in general.