It's 12 AM. Lotus support has me on hold while they whip up a hotfix for me. It's story time.

So it's 9 PM. I'm coming home from class and I get paged. I reach for the pager with a sigh, and fumble between driving and the tiny LCD screen. "Who's got the shitty ISP tonight?" I ask myself. You see, it's a Pavlovian response. 70% of our agents work remote, and 90% of their problems are ISP related. They know it's their ISP, I know it's their ISP, but if their modem was on fire they'd still page me about it.

The dim blue backlight reveals a 10-digit string of numbers. That's odd. It's a numeric page. They paged me manually? As I swerve back and forth, glancing from pager to BlackBerry as I enter the number, my mind races with the possibilities. Are the term servers down? Can't be, I'd have heard from NetIQ by now. Then what? I hit the pager button again to turn the backlight on once more. Then it hits me.

Notes.

You know you were just talking about how that thing had been up for 6 months straight. And alphanumeric pages are sent via email when they page out the tech on call. It's gotta be Notes. My fears are confirmed when I finally make the call.

iNotes is down. And we can't access the shift report database.

"OK. I'm about 5 minutes from home, I'll log in and let you know what I find."

It can't be Notes, I say to myself. It must be the tunnel between the server colo where the term servers are and the corporate datacenter that houses Domino. Yeah, it's gotta be a network thing, I say in an attempt to comfort myself. Remember, we just set up that MPLS network. Maybe Trav is screwing with it again. That server is solid as a rock. No way it's down.

I pull into the driveway at about 9:30, get inside, and try to access the server from home. "See," I say, "it works fine from the internet, must be a tunnel thing." I term on to the cluster and fire up Notes remotely. It connects and I open my mail.

"Did it fix itself? What the hell?" I ping the server; the tunnel is obviously fine.
I go back to my Notes client and click into my inbox. Remote server is no longer responding.

What? It was just there. I quickly term to the console of the server. Domino is recovering from a crash. I open the diagnostics folder and sort by date. 10+ NSDs in the last hour. All due to the full text indexer. All on different databases. This is bad. Very bad.

The server finishes recovery, and the database server starts up. For a while things look good. Then another crash. FATAL THREAD nUpdate.exe.

Damn.

I remove the update task from startup and speed dial Lotus support. A nice lady takes down my info and creates my PMR. I am thrust into the queue for the next support engineer.

But this is strange, I haven't heard this hold music before, I think to myself as a flute instrumental version of the chorus of George Michael's Careless Whisper loops over and over.

I wait on hold for 20 minutes. This song will be in my head for the next week. On the plus side, the Domino server is still running.

30 minutes. There's no way they're this busy this late at night. I can't be the only one with this issue. A calm Indian voice breaks the music once again to inform me I'm in queue for the next available technician.

45 minutes. "Hello, this is Namrata with IBM support, can I have your PMR?"

I give it to her.

She asks for my sitrep and I explain the constant crashing. She asks for my NSDs, and I send them. My suspicions are confirmed. I'm about the 20th person she's talked to about this tonight. Something this widespread must be spam related, I suggest. It's happening to several different mail databases, and the stack frames mention unsavory websites.

Several debug parameters later we get ready to capture a crash. I fire the update task back up. BAM! NSD is running. Precious information fills the logs. I remove the update task and restart the server again, then send her the new NSD.

"Eric, can you check on a specific document in the database mentioned in the NSD?"
I find the document using the ID she provides me. Definite spam. I attempt to open it in my own client.

INVALID FREE POOL CHAIN. Notes crashes hard.

Once my client is back up, I send her the specific document. She looks it over with her peers while I am on hold. The silence is broken: "This appears to be fixed in Domino 7.0.3."

"Is 7.0.3 out yet?"

"No," she replies. "But we are going to build a hotfix for 7.0.2 that should fix it."

"Great."

I go back on hold for long enough to type out most of this.

"Eric, are you there?"

"Yes, I'm here."

"We have a hotfix. I'm sending you the download link. You can apply this to any of your 7.0.2 FP2 servers on Windows that are having the problem. If you have any questions, send me an email."

"OK, great, thanks a lot, really."

"You're welcome, have a nice day."

And with that she disappeared back into the queue. No doubt off to talk to the next person with the same issue.

So here I sit. Waiting for the hotfix email. The Ultimate Question of Notes, the HotFix, and Everything.

To patch, or not to patch?

That is the question that preoccupies my being.

Lotus is not known for patch stability. But then they said this came from 7.0.3. It can't be that bad.

I will have to test this on our backup server first. Users can live without new full text indexes for a few days. Long term stability is more important.

One last email. An after action report for the support team. Then it's time for bed.

At least I haven't gotten paged during this.
9/5/2007 12:54:48 AM
will check back when you post cliff notes.
9/5/2007 1:05:25 AM
First, is this an email forward or not?
9/5/2007 1:15:09 AM
i never knew sysadmin was so exciting
9/5/2007 7:54:22 AM
9/5/2007 8:30:40 AM
cliff notes: I got bored while waiting on hold for Lotus Support to fix my server, so I decided to write up what happened. I thought I might be able to make it entertaining, but as I got tired it pretty much went to crap. On the plus side, I can send an email that will crash your Domino server.
9/5/2007 10:23:03 AM
Cliff Notes: Using Lotus Notes sucks. Administrating it is even worse.
9/5/2007 10:37:33 AM
Dude, I fucking hate Notes, and at least the IBM people helped you. I typically get a different response.
9/14/2007 11:42:56 AM
Just when I thought I had purged the term "PMR" from memory you had to go and bring it back.
9/14/2007 1:17:00 PM