Category: Technology

ManageBac & OpenApply Downtime

April 14, 2014

ManageBac and OpenApply are down as of 4:52PM HKT (GMT+8). Our hosting service is experiencing a network issue. We have all hands on deck and will post further updates soon, both here and on Twitter (@managebac). This downtime is unrelated to HeartBleed.

Our sincere apologies for the inconvenience caused.

[Edit] We are back up and running as of 5:29PM HKT (GMT+8). Total downtime lasted 37 minutes.
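The 37-minute figure follows directly from the two timestamps in this post. The short sketch below reproduces the arithmetic; the function name `downtime_minutes` and the fixed-offset `HKT` object are assumptions made for illustration and are not part of any ManageBac tooling.

```python
from datetime import datetime, timedelta, timezone

# HKT is a fixed GMT+8 offset (Hong Kong does not observe daylight saving time).
HKT = timezone(timedelta(hours=8), name="HKT")


def downtime_minutes(start: datetime, end: datetime) -> int:
    """Return the whole number of minutes between two timestamps."""
    return int((end - start).total_seconds() // 60)


down = datetime(2014, 4, 14, 16, 52, tzinfo=HKT)  # 4:52PM HKT, service went down
up = datetime(2014, 4, 14, 17, 29, tzinfo=HKT)    # 5:29PM HKT, service restored
print(downtime_minutes(down, up))  # 37
```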

Our Response to HeartBleed

April 11, 2014

We have recently received more questions about our response to the HeartBleed bug, a security vulnerability discovered in the OpenSSL software library. Many popular web apps, including ours, use OpenSSL.

We took immediate action on Tuesday, April 8 with an emergency security upgrade, which you can read more about here. During the update, we patched all servers and replaced all of our SSL certificates and our application encryption key.

For peace of mind, we do recommend that users change their passwords on all popular web apps, including ManageBac.

You can also check your school’s ManageBac account here to verify that no security holes remain:
http://filippo.io/Heartbleed/

Security issues will always be our top priority. Please don’t hesitate to call or email us if you have additional concerns about our response to HeartBleed.

Planned Server Downtime for Saturday, March 8

March 6, 2014

Our servers will be down for maintenance on Saturday, March 8 starting at 2pm HKT (GMT+8). The total duration of the maintenance will be two hours, but actual ManageBac service downtime is anticipated to last less than 30 minutes.

We apologize for the inconvenience caused. Please enjoy your weekend!
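For schools outside Hong Kong, the maintenance window announced above can be converted to UTC as a starting point for working out local times. This is a minimal sketch using only the date, start time, and two-hour duration stated in the notice; the helper name `window_in_utc` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# HKT is a fixed GMT+8 offset, as stated in the notice.
HKT = timezone(timedelta(hours=8), name="HKT")


def window_in_utc(start_local: datetime, duration: timedelta):
    """Convert a timezone-aware start time and a duration to a (start, end) pair in UTC."""
    start_utc = start_local.astimezone(timezone.utc)
    return start_utc, start_utc + duration


# Saturday, March 8, 2014, 2pm HKT, two hours of maintenance in total.
start, end = window_in_utc(datetime(2014, 3, 8, 14, 0, tzinfo=HKT), timedelta(hours=2))
print(start.strftime("%Y-%m-%d %H:%M"), "to", end.strftime("%H:%M"), "UTC")
# 2014-03-08 06:00 to 08:00 UTC
```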

ManageBac Scheduled Downtime on Saturday, January 18

January 15, 2014

ManageBac servers will be down for the planned upgrade on Saturday, January 18 starting at 2pm HKT (GMT+8). The downtime is expected to last one hour.

ManageBac Scheduled Downtime on Saturday, December 28

December 23, 2013

We are getting new servers for Christmas! ManageBac will be down for the planned upgrade on Saturday, December 28 starting at 2pm HKT (GMT+8). The downtime will last 2-4 hours.

ManageBac Scheduled Downtime on Saturday, August 17

August 12, 2013

ManageBac will be inaccessible during a server move on Saturday, Aug 17 for 4 hours from 2 pm HKT to 6 pm HKT (GMT+8). The new server will be bigger, faster and better!

Enjoy your weekend, everyone.

Scheduled ManageBac Server Maintenance

June 26, 2013

ManageBac will be unavailable as we perform server maintenance this Saturday, June 29, from 2-5PM HKT (GMT+8). We apologize for any inconvenience caused.

Rackspace Downtimes – Full Incident Reports

June 21, 2013

Rackspace has sent us the full incident reports from their June 12 and June 20 downtimes. We have reposted them in their entirety below.

—–

June 12th ORD Cloud Server Instability

At approximately 10:30 a.m. CDT, our cloud engineers were alerted to an issue impacting services for several thousand customers within our ORD1 data center.

This issue was caused when our Software Defined Network (SDN) cluster suffered cascading node failures, causing some customers to experience intermittent network connectivity, and in some cases extended service interruption, until approximately 4:30 pm CDT.

The controller node failures were caused by corrupted port data from Open vSwitch. The corrupted port data triggered a previously unidentified bug that caused nodes within the control cluster to crash repeatedly until the corrupted port data was identified and fixed. The cluster was repaired and customers began to come back online, with all residual effects eliminated by 4:30 p.m. CDT. The system is now stable, and we are working with our SDN vendor on a permanent fix.

Why did we experience issues within the APIs for both DFW and ORD?

While we were experiencing service degradation in the ORD region for Next Gen Cloud Servers, Rackspace also saw availability dips in both our ORD and DFW Next Gen APIs.

During this time, we experienced increased traffic in our Control Panel as customers began logging in to check their instances in ORD after the network degradation began. This caused additional load on the systems responsible for image management in both regions. Under the conditions of increased traffic, these particular databases became overloaded which translated to dips in API availability.

Recent performance monitoring for those systems identified queries that could be optimized and were already scheduled for an upcoming code release. In order to fully resolve the issues in both regions, the query portions of the scheduled code release were hot patched into the environments, which restored API stability for both regions.

We apologize for any inconvenience this may have caused you or your customers. If you have any further questions please feel free to contact a member of your support team.

Sincerely,

Rackspace

—–

June 20th ORD Service Interruption During Scheduled Maintenance

While performing a scheduled upgrade on the Software Defined Networking (SDN) control cluster for Next Gen Cloud Servers in our ORD datacenter, we experienced two issues that created downtime for our customers and forced us to unexpectedly extend the maintenance window.

The first issue occurred when a configuration sync flag did not fully apply to all hypervisors via the upgrade manager software deploying the cluster updates. This caused issues for customers ranging from intermittent packet loss to a few minutes of network disruption. The root cause of this problem was in the manual configuration of the automated deployment tool, not the underlying cloud network. Rackspace and vendor engineers immediately identified and fixed the issue by 3:45 AM CDT, within the original maintenance window.

During the maintenance wrap-up process, Rackspace engineers discovered a component of the network configuration that was inadvertently overwritten by the upgrade. That component of the network configuration was deployed fairly recently, on May 24th, 2013, and was necessary to ensure that customer server connectivity was maintained and new server provisioning succeeded. Rackspace made the choice to extend the maintenance window by one hour, fix the configuration and reboot the clusters. The clusters finished syncing by 5:30 AM and then the hypervisors were able to check back in for updated flows. Any residual customer impact was confirmed complete between 5:45 AM and 6:00 AM. Had Rackspace closed the maintenance window, our customers would have been exposed to potential intermittent network instability and provisioning errors until the next maintenance window was scheduled.

Rackspace prides itself on the transparency of our communications. In this event, we did not live up to our standards. We believe the decision to extend the window was the right decision for our customers, but we did not clearly communicate the rationale for the decision in the manner our customers expect.

Stability and uptime are paramount to our customers and to Rackspace. We apologize for the issues and the manner in which communications were handled. We are reviewing all elements of our maintenance and incident management processes to ensure that these issues do not occur again. If you have any questions please contact a member of your support team.

RackspaceCloud – ManageBac Downtime #2: Post-mortem

June 20, 2013

We deeply regret that we experienced another downtime earlier today starting at 3:36am EDT (GMT-4).

The downtime was caused by maintenance performed by our hosting provider, Rackspace Cloud. The downtime was originally expected to last only a few seconds to a few minutes, but it was prolonged when their engineers uncovered issues with the central nodes.

In the end, ManageBac was down for a total of 2 hours and 4 minutes. We spent the majority of that time on the phone with Rackspace Cloud, receiving real-time updates from their team.

The issues are explained by Rackspace in more detail here:

https://status.rackspace.com/index/viewincidents?group=21&start=1371700800

We understand that this is a stressful time for our schools as the school year ends and reports are going out. We have personally reached out to the schools most affected by the downtime and encourage you to contact our support team with any questions or concerns.

Thank you again for your understanding and patience.

RackspaceCloud – ManageBac Downtime: Post-mortem

June 12, 2013

We are very sorry for the downtime earlier today at 11:49 am EDT. ManageBac was down for just under one hour because our hosting provider, RackspaceCloud, had a network outage in their Chicago data center, where we are hosted.

We were on the phone with Rackspace from 11:51 am EDT after receiving the Pingdom downtime alert. We will be following up with RackspaceCloud once they have provided full clarification on the issue.

The Rackspace network issue is explained in more detail here:

https://status.rackspace.com/index/viewincidents?group=21

Thank you for your understanding and we are sorry again for the inconvenience this has caused you. We take downtime very seriously and we strive to maintain high availability.


