Scheduled Maintenance

Today, Friday evening, maintenance will take place starting at 9pm CEST (what time is that for me?). mite won’t be available for 30 to 45 minutes.

This maintenance is a consequence of this morning’s interruption, when our database server went down. To get mite back up as quickly as possible during working hours, we switched to our failover database server. This evening, our database will move back to its former home, the more powerful main database server. The downtime is unavoidable: taking mite offline during the move is the only way to rule out data corruption with certainty.

Again, and hopefully for the last time in a long, long while: we ask for your understanding.

~~
Update: Maintenance went as planned. mite was unavailable for 21 minutes.

Julia in Tech talk

Today’s server problems

Since 10:37 CEST, mite has been intermittently unavailable due to server problems. We’re terribly sorry, please excuse us! We’ll do everything to get mite up and running again as soon as possible. Please visit Twitter for the latest information on this issue; we’ll post updates continuously.

~~
Update: mite has been available again since 11:49 CEST, after we moved our servers to another node. We’re continuing to watch very closely. Details will follow. Again: so sorry for this interruption!

~~
Update II: In the meantime, we worked with our hoster and found the source of the problem: another customer’s server running on our hardware node went wild and “stole” essential resources from our database server. First, our hoster should have taken measures against this. Second, we should have identified this troublemaker earlier and safeguarded mite ourselves. Our apologies. This interruption was avoidable.

~~
Update III, July 27th, 10:15 CEST: Again, we’re having server problems; bad things seem to come in pairs. Our database server has not been running smoothly since 9:52 CEST. We’re so sorry for this rough ride. Please bear with us.

~~
Update IV, July 27th, 11:26 CEST: mite is stable again for now. We switched our database to a redundant failover server. Please visit Twitter to get the newest updates.

~~
Update V, July 27th, 17:01 CEST: To prevent another hiccup, maintenance will take place tonight.

Julia in Tech talk

Scheduled Maintenance

On Saturday, June 16th, maintenance will take place in our primary data center between 5:30am and 7:30am CEST (what time is that for me?). During this time frame, mite will be unavailable for just a few minutes.

Tomorrow’s maintenance is the consequence of the last interruption. Our hoster will replace hardware (switches) with a model by a different manufacturer to ensure future stability. We ask for your understanding.

~~
Update: Maintenance went as planned. mite was unavailable for no more than three minutes.

Julia in Tech talk

Today's service interruption

Between 2:04pm and 2:33pm CEST, mite was down for all users due to a hardware failure in our primary data center. Redundant systems did not take over as planned. We’re so sorry for this interruption!

It goes without saying that we’re analyzing this problem together with our hosting partner SysEleven to prevent it from happening again. Of course, your data was completely safe throughout this downtime.

That said, we’d like to take a moment to thank SysEleven and their technical team for their fast response: they were hands-on within minutes. We’d also love to say thank you to the numerous users who got in touch via Twitter, email, and chat. Your understanding means a lot to us. Although we cannot guarantee 100% uptime, you can count on us doing everything we can to reach that number. We won’t disappoint you.

Julia in Tech talk

Today's service interruption

Between 15:29 and 15:50 CEST, mite was not available for most users due to server problems on our side. We’re so sorry for this interruption!

You can bet on it: we’re not taking this lightly. We’re already investigating the root cause of the downtime to prevent this from happening again. mite should and will be stable again.

Julia in Tech talk

More powerful Excel & CSV export

Thumbs up: the busiest mite.account has created more than 100,000 time entries by now. But even if we put this outlier aside, we’re seeing more and more accounts with five-digit numbers of time entries. Which is absolutely great, apart from one fact: those huge figures brought the Excel/CSV export feature (which can be found under the tab »Reports => Time Entries«) to its knees.

Time for a rebuild! Thanks to today’s update, even huge data sets can now be exported reliably. We were also able to speed up the export significantly: it is now up to three times faster. Ready, set, go: on to your next 100,000 time entries!

Julia in Tech talk, New features

Scheduled Maintenance

On Monday, January 23rd, mite won’t be available between 0:15 and ~0:45 CET (what time is that for me?). We’ll move the service to new, more powerful servers. We ask for your understanding.

Update, January 23rd: Maintenance went as planned.

Julia in Tech talk

Today’s service interruption

Since 21:21 CEST (what time is that for you?), mite has been unavailable for some users due to a routing problem in our primary data center. We’re terribly sorry, please excuse us! We’ll do everything to get mite up and running again as soon as possible. Please visit Twitter for the latest information on this issue; we’ll post updates continuously.

~~
Update, 22:33 CEST: mite is back up for all users. Hardware problems at the data center caused this outage, with routing at the heart of the problem. We are working with our hoster, and will keep doing so, to understand this interruption in detail and prevent it from happening again. Again: we’re so sorry for causing you trouble!

Julia in Tech talk

Scheduled Maintenance: November 27th

Update 6:17 CET: Maintenance is complete, and mite is happy to track your time again. Thanks so much for your huge patience, everybody! Please get in touch if this maintenance affected you beyond an acceptable level. We’re really sorry for the delay.

Update 3:02 CET: Maintenance is taking longer than expected, we’re sorry!

~~
Tomorrow night, on November 27th between 1:00 and ~2:00 CET (what time is that for me?), mite won’t be available due to a move of our main servers to a more redundant server cage within our data center.

We don’t treat our promise lightly: this maintenance is one of the necessary measures that we’re taking in response to October’s downtimes. Tomorrow’s steps will help us ensure a more stable mite in the future by putting redundant hardware in place. We ask for your understanding!

Julia in Tech talk

The recent downtimes in detail

To put it mildly, we’re not satisfied with the current availability of mite. To be honest, we’re deeply frustrated. One hour of downtime on October 15th, fifteen minutes on the 19th and two hours last night: that’s simply not the level of quality that mite is known for and that you can and should expect. We owe you not only another apology, but also a detailed description of what went wrong and what we’re doing to prevent this from happening again.

What happened?

Hardware failures in the data center caused all three outages; the app itself was and is running smoothly. The first failure wasn’t connected to the second and third ones. Bad luck and bad timing all came together.

On October 15th, an electricity problem occurred in our primary data center, despite redundant power systems being in place. The power systems were undergoing maintenance when a switchover between the two systems failed, due to a combination of flawed documentation from the hardware supplier and an imperfect emergency plan. Power was restored within half an hour, but the servers needed some more time to check all data and to resume their work properly.

The nightly outages on October 19th and 21st were caused by defective network switches. On the 19th, one of these switches broke. It was replaced within minutes. Last night, two switches in one IBM blade center failed simultaneously. Replacing the switches didn’t solve the problem. The servers had to be moved to another blade center, which took more precious time.

What will be done about it?

Two notes upfront. One, no hardware will work 100% of the time, not in our data center and not in any other. That’s simply not going to happen. It’s a reality we cannot change, as much as we’d love to, but we can change how we deal with it. Two, our top priority is to ensure that your data is completely safe at any given point in time. To uphold this priority, we’ll even accept a few more minutes of downtime when in doubt.

What we can and will do is a) shed light on every little failure to really understand it and so prevent it from happening again, and b) improve uptime by putting more redundancy in place.

In this particular case, after October 15th, the motor that switches between the different power systems was replaced. In addition, our hoster, the folks from the data center and the manufacturer of the systems have joined forces to clarify the error in the documentation and to fix it. They are also discussing adding another redundant power system on top of the existing one.

The network switches that caused the downtimes of October 19th and 21st will undergo scheduled maintenance, probably during the next week. We’ll update as soon as we have more information.

At the moment, we’re thinking about how to add even more redundancy on our side, e.g. by adding further systems that could take over in case of a hardware failure.

On the bright side, we’d like to point out that we trust our primary hosting partner, SysEleven, despite these numerous downtimes. Monitoring informed us within a minute. Technicians were hands-on within five minutes. Their CEO and head of IT updated us on an ongoing basis, in detail and transparently. They are deeply sorry and definitely unsatisfied with the status quo as well. They’ll focus on improving the current setup during the rest of 2010; no new features will be taken on. All in all, their ten-year hosting history shows that this is not the norm, without question.

Uptime of mite in 2010: 99.93%

To conclude, we’d like to talk about the bigger picture. We analyzed previous downtimes to help you put this into perspective.

From January 1st, 2010 until today, mite was unexpectedly down for a total of 295 minutes. This corresponds to an uptime of 99.93%. Even if we include scheduled maintenance, mite was up 99.89% of the time, all in all.
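For anyone who wants to retrace the figure, here is a minimal sketch of how such an uptime percentage can be derived. It rests on an assumption: the period is taken to end on October 22nd, 2010, a date inferred from the outage dates mentioned in this post rather than stated in it. The function name `uptime_percent` is made up for illustration.

```python
from datetime import date

MINUTES_PER_DAY = 24 * 60


def uptime_percent(start: date, end: date, downtime_minutes: float) -> float:
    """Return uptime as a percentage of all minutes between start and end."""
    total_minutes = (end - start).days * MINUTES_PER_DAY
    return 100.0 * (1 - downtime_minutes / total_minutes)


period_start = date(2010, 1, 1)
period_end = date(2010, 10, 22)  # assumed end of the period, not given in the post

uptime = uptime_percent(period_start, period_end, downtime_minutes=295)
print(f"{uptime:.2f}%")  # prints 99.93%
```

Shifting the assumed end date by a day or two in either direction does not change the result at two decimal places.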

The gap to 100.00% is not big, but it is not satisfying either. We aim to be better than this. We’ll keep on improving every little detail to maximize uptime even further. Please trust us: we will get better. If you’d like any further information, please get in touch!

Julia in Tech talk