Best Posts in Thread: Server Downtime log

  1. Gaw discord is my friend now

    In fact, they'll be down from now on for the next couple of days. I'll need the servers for extensive testing, but it will be worth it. Expect a huge update (including more servers) around the weekend, hopefully coinciding with Garry's Mod update 160. I wouldn't want anyone to be bored this Easter :V
     
    • Useful x 2
    • Like x 1
    • Wizard x 1
    • Informative x 1
    • Friendly x 1
  2. So here is what happened:

    • Hard disk corruption, possibly due to TF2 replays, possibly due to Samsung F3 disks being awful
    • Rebooted several times in an attempt to get fsck to run properly. As usual it didn't; it tends to require a manual pass to fix the kind of severe corruption where sectors die and then come back overwritten with 0's.
    • Begged for KVM
    • KVM was all purple (as posted in screenshot), though this is not too unusual
    • Rebooted with KVM hooked up, tried to get into recovery, 'no signal', thought nothing of it
    • Server just booted again normally, disregarded my attempts to access recovery
    • Tried to set up a fancy grub-reboot and grub-set-default config to let me boot into recovery without KVM
    • Used grub-update for the first time ever (and quickly discovered why I never use it, despite the way I do it being STRONGLY unrecommended by all documentation)
    • Rebooted and, amazingly, ended up in recovery successfully, but the SSH server was broken
    • Power cycled about 100 times, no exaggeration, over the course of ~20 hours
    • Finally discovered it was actually booting properly each one of those times; I had no idea as I was essentially blind ('no signal' everywhere). Fucking good server to keep on trooping through all that
    • Reality is that grub-update had made the LTS kernel the default, and the Realtek network module in it is very dodgy, hence no network connection for almost 2 days.
    • Got GD staff to hook up a keyboard and VGA monitor and follow my instructions to go into recovery and install a backup of my manually-written grub config that I had thankfully taken
    • Probably should have done this sooner, but it only just occurred to me that it might have been booting the LTS kernel.
    • Everything is instantly gravy, and all those power cycles seem to have forced fsck to run properly
    • Still waiting on KVM so that I can set up the grub system I was aiming for, as it would be extremely nice to have and would make a repeat of this whole situation 99% impossible (I wasn't doing it for nothing :P)

    Lesson semi-learned: do not mess with bootloaders without making 100% sure KVM is working :V
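
    For anyone wanting to try the same trick: grub-reboot picks an entry for the next boot only, while grub-set-default changes the default permanently, and both only take effect when GRUB_DEFAULT=saved is set in /etc/default/grub. Before touching either, it helps to know which menu entry is which. Below is a rough Python sketch (not what was actually run on the server) that lists the top-level entries of a grub.cfg with their 0-based indices. The function name and the sample config are made up for illustration.

```python
import re

# Matches lines such as:  menuentry 'Title' ... {   or   submenu "Title" {
ENTRY_RE = re.compile(r"""^\s*(menuentry|submenu)\s+(['"])(.*?)\2""")

# Hypothetical sample config, NOT the real one from the server.
SAMPLE_CFG = """
menuentry 'Arch Linux' {
    linux /vmlinuz-linux
}
submenu 'Advanced options' {
    menuentry 'Arch Linux LTS' {
        linux /vmlinuz-linux-lts
    }
}
menuentry "Recovery" {
    linux /vmlinuz-linux single
}
"""


def top_level_entries(cfg_text):
    """Return (index, kind, title) for every top-level menuentry/submenu.

    Nested entries (inside a submenu) are skipped, because the plain
    numeric index only counts top-level items.
    """
    entries = []
    depth = 0  # current brace nesting level
    for line in cfg_text.splitlines():
        match = ENTRY_RE.match(line)
        if match and depth == 0:
            entries.append((len(entries), match.group(1), match.group(3)))
        # Track nesting so entries inside a submenu are not counted.
        depth += line.count("{") - line.count("}")
    return entries


if __name__ == "__main__":
    for index, kind, title in top_level_entries(SAMPLE_CFG):
        print(index, kind, title)
```

    With the sample above, 'Recovery' is entry 2, so "grub-reboot 2" would be the one-shot way to select it, assuming GRUB_DEFAULT=saved. Check the numbering against the real config before trusting it, preferably with working KVM.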
     
    • Wizard x 3
    • Like x 1
    • Funny x 1
    • Friendly x 1
    • Optimistic x 1
  3. team dgt, and aimed at me not gamingmasters

    recently cut off a couple of their suppliers, really rustled their jimmies
     
    • Informative x 3
    • Agree x 2
  4. Reag My name is an anagram for a reason

    Who would want to DDoS this shitty community, did someone get their nappy dirty?
     
    • Agree x 4
    • Funny x 1
    • Friendly x 1
    • Optimistic x 1
  5. Squallkitty GM's #1 Streamer!

    • Informative x 4
    • Like x 1
  6. Significant improvement in system temperature in the new datacenter if anyone cares:

    [IMG]
    [IMG]
     
    • Wizard x 3
    • Like x 2
  7. Rioter Mad man box wearing lunatic for President

    When I noticed the server had died again, I nearly cried! Just glad it's not the hardware issue I fixed.

    You are a Douglas dark
     
  8. Gaw discord is my friend now

    I tried updating when I saw Jack's post (which is essentially a few button clicks), but it got stuck or was already stuck at 0%. And it'd be a dick move on my part to try and spam Dark around his birthday.

    tl;dr servers down while we add eoc
     
    • Funny x 2
    • Informative x 2
  9. NomNom Chompski BURNING LOVEEEE

    All in favour of teaching @HellJack how to get the server up say aye
     
  10. Fixed:

    - HLstatsX
    - Mumble channel limit
    - Mumble sidebar widget (holy fucking shit the amount of hoops I had to jump through)
     
    • Funny x 2
    • Like x 1
    • Wizard x 1
  11. Really not sure what is going on here - the way they are being turned off can only be performed manually by a manager

    Edit: Oh derp, it's the thing that stops embarrassingly bad ping when we get DDoSed; it's being triggered by the internet being generally totally fucked atm
     
    Last edited: 16 Nov 2014
    • Wizard x 3
    • Funny x 1
  12. so it turns out this was actually a deeper issue than i thought - offsite backups have been failing halfway through for 4 months :suicide:

    bloody good thing i checked this
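
    A check like this can be automated rather than done by hand. The following is a minimal Python sketch, not the backup setup actually used here: it assumes you can get a list of (finish time, size in bytes) pairs for the offsite archives, and it flags the latest one if it is stale or much smaller than the largest earlier archive, which is a rough sign of a run that died halfway. The function name, the two-day age limit and the 0.5 size ratio are all arbitrary assumptions.

```python
from datetime import datetime, timedelta


def backup_problems(backups, now, max_age=timedelta(days=2), min_ratio=0.5):
    """Return a list of warnings for a backup history.

    backups: list of (finished_at, size_bytes) tuples, in any order.
    now:     the current time as a datetime.
    """
    if not backups:
        return ["no backups found"]

    ordered = sorted(backups)  # oldest first
    latest_time, latest_size = ordered[-1]
    problems = []

    # A backup that has not completed recently is suspicious on its own.
    if now - latest_time > max_age:
        problems.append("latest backup is stale")

    # A backup that stopped halfway is usually far smaller than a good one.
    earlier_sizes = [size for _, size in ordered[:-1]]
    if earlier_sizes and latest_size < max(earlier_sizes) * min_ratio:
        problems.append("latest backup is much smaller than the largest earlier one")

    return problems


if __name__ == "__main__":
    history = [
        (datetime(2014, 1, 8), 1000),
        (datetime(2014, 1, 9, 12), 400),  # truncated run
    ]
    print(backup_problems(history, now=datetime(2014, 1, 10)))
```

    Note the size comparison only helps while at least one good archive is still in the history; after months of truncated runs, a fixed minimum size would be a safer test.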
     
    • Wizard x 3
    • Funny x 1
  13. HellJack A message was delivered, and received.

    That would imply Mysteryem has emotions like the rest of us.
     
    • Funny x 3
    • Like x 1
  14. Dan Chief Detective at GM Police HQ - Jagex #1 Fan!

    are someone's jimmies a little rustled?

    also lol
     
    • Funny x 3
    • Like x 1
  15. 100 days uptime :dance:
     
    • Like x 2
    • Wizard x 2
    • Friendly x 1
  16. Just a heads up that I will be installing XenForo 1.2 on the forums sometime in the next couple of days

    This is a major upgrade with a shitload of very nice user-facing changes, but expect some kinks that will need to be worked out
     
    • Like x 4
    • Sexual Tension x 1
  17. 20 days uptime bitchessss
     
    • Wizard x 3
    • Funny x 1
  18. If it ain't broke don't fix it you're a lucky cunt
     
    • Funny x 3
    • Agree x 1
    • Disagree x 1
    • Bad Spelling x 1
  19. Raid recovery has finished and everything seems generally to be working better than ever

    Code:
    [Thu May 17 12:48:09 2012] md: md126: recovery done.
    [Thu May 17 12:48:09 2012] RAID1 conf printout:
    [Thu May 17 12:48:09 2012]  --- wd:2 rd:2
    [Thu May 17 12:48:09 2012]  disk 0, wo:0, o:1, dev:sdb3
    [Thu May 17 12:48:09 2012]  disk 1, wo:0, o:1, dev:sdc3
     
    
    [12:58:02 root@weeaboo ~]# mdadm --detail /dev/md126
    /dev/md126:
            Version : 1.2
      Creation Time : Fri Jul  1 23:51:07 2011
         Raid Level : raid1
         Array Size : 969826825 (924.90 GiB 993.10 GB)
      Used Dev Size : 969826825 (924.90 GiB 993.10 GB)
       Raid Devices : 2
      Total Devices : 2
        Persistence : Superblock is persistent
    
        Update Time : Thu May 17 12:58:17 2012
              State : active 
     Active Devices : 2
    Working Devices : 2
     Failed Devices : 0
      Spare Devices : 0
    
               Name : archiso:0
               UUID : e6b89d56:c6386d1b:15f75609:6052820b
             Events : 259822
    
        Number   Major   Minor   RaidDevice State
           2       8       19        0      active sync   /dev/sdb3
           3       8       35        1      active sync   /dev/sdc3

    Load average of 0.33 under average usage (normally 1.0-1.5), what is this madness
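
    The mdadm --detail output above is regular enough to check by script instead of by eye. Here is a small Python sketch, assuming the "Key : Value" layout shown in this post; the function names are made up, and the list of "bad" state words (degraded, recovering, resyncing) is an assumption about what you would want to alert on, not an exhaustive list.

```python
# A trimmed copy of the output posted above, used as sample input.
SAMPLE_DETAIL = """\
/dev/md126:
        Version : 1.2
  Creation Time : Fri Jul  1 23:51:07 2011
     Raid Level : raid1
   Raid Devices : 2
  Total Devices : 2
          State : active
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
"""


def parse_mdadm_detail(text):
    """Collect the 'Key : Value' lines of `mdadm --detail` into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first " : " only, so times like 23:51:07 survive.
        key, sep, value = line.partition(" : ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def array_is_healthy(fields):
    """True if nothing has failed and every RAID device is active."""
    states = {word.strip() for word in fields.get("State", "").split(",")}
    raid = int(fields.get("Raid Devices", "0"))
    active = int(fields.get("Active Devices", "0"))
    failed = int(fields.get("Failed Devices", "0"))
    bad_states = {"degraded", "recovering", "resyncing"}  # assumed alert list
    return raid > 0 and active == raid and failed == 0 and not (states & bad_states)


if __name__ == "__main__":
    print(array_is_healthy(parse_mdadm_detail(SAMPLE_DETAIL)))
```

    In practice the text would come from running mdadm --detail on the array (as root) rather than from a pasted string.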
     
    • Wizard x 3
    • Like x 1