Hey folks!

Unfortunately, roughly 2 hours ago, lemm.ee went offline. The cause was our load balancer: it suddenly decided that all of our servers had become unhealthy, even though every health check responded successfully when I requested it directly. When that happens, the load balancer stops serving requests altogether, which made lemm.ee unreachable for all users. I am still not sure what exactly caused the issue, but I will try to investigate more over the weekend.
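
For anyone curious what "requesting the health checks directly" can look like, here is a minimal sketch in Python. It is not lemm.ee's actual tooling: the function names, the URLs, and the rule that a 2xx response counts as healthy are all assumptions made for illustration, and a real load balancer may apply different criteria.

```python
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, or 0 if no response was received."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is still useful.
        return err.code
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return 0


def probe_backends(
    urls: Iterable[str],
    get_status: Callable[[str], int] = http_status,
) -> Dict[str, bool]:
    """Map each health check URL to True if it answered with a 2xx status."""
    # Assumption: 2xx means healthy. The real load balancer's rule is unknown.
    return {url: 200 <= get_status(url) < 300 for url in urls}


def would_serve_traffic(health: Dict[str, bool]) -> bool:
    """A load balancer can only route requests if at least one backend is healthy."""
    return any(health.values())
```

Calling `probe_backends(["http://10.0.0.1/health", "http://10.0.0.2/health"])` (hypothetical addresses) from a machine that can reach the servers directly gives a view that is independent of the load balancer. If this reports every backend as healthy while the load balancer reports none, the problem is more likely on the load balancer side or on the network path between it and the servers.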

For now, we have partially recovered, and I am continuing to work on remaining issues. Hopefully we will be back to 100% very soon. Sorry for the inconvenience!

  • don
    21 · 11 months ago

    I survived the July 18th lemm.ee downtime, and all I got was this lousy comment.

  • @[email protected]
    English
    9 · 11 months ago

    Thanks for the quick fix! What did you have to do to get the load balancer working again?

    • @[email protected]OP
      16 · 11 months ago

      For now, I just redeployed all of our servers completely, but as I don’t know the actual root cause of the issue yet, I’m still investigating to figure out if anything more is needed.

  • @[email protected]
    1 · 11 months ago

    Is there another instance where you could report issues?

    If we logged into an account on another instance, we’d be able to see those reports before lemm.ee comes back up.

    • @[email protected]OP
      9 · 11 months ago

      There are two useful sections on https://status.lemm.ee for this. First, there is an automated check for federation with all other instances at the bottom of the page; if everything there is red, that is a definite sign that something is wrong with lemm.ee itself. Second, near the top of that page, I always write a status message manually when I discover and start working on an issue. This second part can be a bit delayed, as it requires manual input from me, but I have updated it every time we have had issues so far.

  • Venia Silente
    English
    1 · 11 months ago

    Totally healthy servers have a right to rest every once in a while too. Thanks for keeping us notified!

    • TurtlePower
      1 · 11 months ago

      Yeah, but it could have been China, India, Iran, or maybe even North Korea. There are a lot of places that think disrupting the rest of the world will get them somewhere.

  • @[email protected]
    8 · 11 months ago

    Nginx? I had an nginx LB shit itself yesterday. Luckily it auto-recovered and I had HA but just weird it happened.

  • Clot
    English
    7 · 11 months ago

    Sometimes, downtimes are awesome. Get off your machine and spend time with your family, folks!

  • db0
    8 · 11 months ago

    Typically when this happens, the issue is on the LB itself. Maybe its own network had issues?

  • p3e7
    10 · 11 months ago

    Thanks for your great work and transparency!

    • @[email protected]OP
      21 · 11 months ago

      Sorry for the delay in updating the status page - I had actually gone out for lunch just a few minutes before the downtime started, so I didn’t even realize anything was up until I was back at my computer about 45 minutes later 💀

      • @[email protected]
        7 · 11 months ago

        No need to apologise. Still a better response time than some of the professionals I work with ;-)

  • SuperSpaceFan
    English
    2 · 11 months ago

    Thank you for keeping us abreast of what’s happening. I appreciate you, and how you manage this instance.