Betfair Apology & Explanation

Here we will notify you of important information

Moderator: 2020vision

Post by MarkRussell » Sun Mar 13, 2011 5:24 pm

On Saturday March 12, from 14:05 through 20:01, Betfair's website failed. We know how frustrating this is for our customers and offer our sincere apologies to all affected.

We are now looking at ways to make up for Saturday’s events.

We are working hard to make Betfair as reliable a site as possible. In a normal week we make at least 15 changes to the Betfair website, but we have resolved not to release any new products or features for the next seven days. This should give maximum stability throughout a busy week that includes the Cheltenham Festival, the Cricket World Cup and Champions League football.

Below is an explanation of what went wrong and what we have done to fix the issue.

When the website failed on Saturday, our first step was to disable Betfair for all our customers on the web, API and mobile services. Once we identified the actual problem, we determined that we needed our website "available" but with betting disallowed. We recovered the site internally around 18:00 and re-enabled betting as of 20:00 once we were certain it was stable.

Here is what actually happened:

After performing certain types of website changes, an issue developed that caused our servers to temporarily slow down, processing just one thing at a time (single threading) instead of thousands of user requests in parallel. This "single threading" behaviour was introduced some time ago to protect against occasional broken pages caused by serving content while it is changing. In tech speak, our servers weren't thread-safe on certain types of content changes.
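
As a rough illustration of the trade-off described above (not Betfair's actual code, which the statement does not show), the sketch below contrasts two hypothetical content stores. `LockedContentStore` funnels every request and every content change through one lock, so requests are handled one at a time while content changes. `SnapshotContentStore` builds the new content off to the side and swaps a single reference, so requests never see half-changed content and never wait. All class and method names are assumptions made for this example.

```python
import threading


class LockedContentStore:
    """Hypothetical store: reads and updates share one lock, so requests
    are served one at a time whenever content is being changed."""

    def __init__(self, pages):
        self._lock = threading.Lock()
        self._pages = dict(pages)

    def render(self, name):
        with self._lock:  # readers queue behind any in-progress update
            return self._pages[name]

    def publish(self, new_pages):
        with self._lock:  # content is mutated in place under the lock
            self._pages.clear()
            self._pages.update(new_pages)


class SnapshotContentStore:
    """Hypothetical alternative: build a complete new snapshot first,
    then expose it with a single reference assignment."""

    def __init__(self, pages):
        self._pages = dict(pages)

    def render(self, name):
        pages = self._pages  # grab one reference; that dict is never mutated
        return pages[name]

    def publish(self, new_pages):
        snapshot = dict(new_pages)  # fully built before anyone can see it
        self._pages = snapshot      # readers see the old or the new dict, never a mix


store = SnapshotContentStore({"home": "v1"})
store.publish({"home": "v2"})
print(store.render("home"))
```

The snapshot approach trades some extra memory during a publish for reads that never block, which is one common way to make content changes safe without serialising requests.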

This has been an operational concern for several weeks as our traffic has reached record volumes week after week. While we had several operational protections in place to limit these types of changes during peak load, we missed an important one. Every 15 minutes, an automated process was publishing exactly the type of content that triggers the issue described above. Yesterday we hit a tipping point: the web servers were taking longer than 15 minutes to complete each update, so the next update was due before the previous one had finished, essentially rendering the servers unusable.
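
The tipping point can be put in simple arithmetic. The sketch below is a simplified model, not a measurement: it assumes one update starts every 15 minutes and that a server stays in single-threaded mode for the whole update. The function name and the sample durations are made up for illustration.

```python
def single_threaded_fraction(update_minutes: float,
                             interval_minutes: float = 15.0) -> float:
    """Share of wall-clock time spent in single-threaded mode, assuming an
    update starts every `interval_minutes` and lasts `update_minutes`."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    if update_minutes < 0:
        raise ValueError("update_minutes must not be negative")
    # Once an update lasts as long as the interval, updates run back to
    # back and the server never returns to normal parallel processing.
    return min(1.0, update_minutes / interval_minutes)


# Illustrative durations only; the statement gives no real figures.
for minutes in (3, 10, 15, 20):
    print(minutes, single_threaded_fraction(minutes))
```

Under this model, anything at or above 15 minutes per update leaves no time at all in normal mode, which matches the "tipping point" described in the statement.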

Then in an attempt to quickly shed load, we triggered a process to disable some of the computationally intensive features on the site. Unfortunately, the way this was done triggered a complete recompile of every page on our site, for every user, in every locale. Under our normal Saturday usage, recovery took several hours.

After spotting the pattern, we've recognised this has been going on with varying impact since February 8, 2011. During periods of increased user traffic, our customers would experience this issue in the form of slow navigation or a "sticky" user experience. Yesterday was simply a tipping point, made worse by our recovery attempt.

We've fixed this problem now. We've disabled the original automated job and rebuilt it to update content safely. We've tripled the capacity of our web server farm to spread our load even more thinly. We've fixed our process for disabling features so that we won't make things worse. We've updated our operational processes and introduced a whole new raft of monitoring to spot this type of issue. We've also isolated the underlying web server issue so that we can change our content at will without triggering the switch to single-threading.

We believe these changes will bring the stability we all desire and thank you for your continued custom.

Yours faithfully,

Niall Wass – Chief Marketing & Development Officer
Tony McAlister – Chief Technology Officer
MarkRussell
Site Admin
 
Posts: 1787
Joined: Tue Feb 20, 2007 6:38 pm
Location: Birmingham

Post by doris_day » Sun Mar 13, 2011 5:27 pm

Great to see an apology. A step in the right direction.
'He was looking for the card so high and wild he'd never need to deal another' - Leonard Cohen
doris_day
 
Posts: 967
Joined: Fri Nov 02, 2007 12:34 am

Post by six gun » Mon Mar 14, 2011 10:50 am

Fifteen changes a week.

If it's not broken, don't fix it.

The only change I've noticed that is better is the silks showing up for Meydan.
six gun
 
Posts: 61
Joined: Thu Jan 28, 2010 3:02 am
Location: Birmingham

Post by GaryRussell » Mon Mar 14, 2011 10:54 am

Sounds like our release schedule some weeks :oops:
GaryRussell
Site Admin
 
Posts: 9679
Joined: Fri Nov 18, 2005 8:09 pm
Location: Birmingham, UK

Post by MarkRussell » Mon Mar 14, 2011 4:41 pm

Betfair are having a Live ‘Site Outage’ forum chat – Monday 14th March 6pm to 7pm GMT

Following on from Saturday’s site issues, Tony McAlister (Chief Technology Officer) and Niall Wass (Chief Marketing & Development Officer) will be hosting a live Q&A session from 6pm until 7pm GMT tonight in the Forum Chat area of our Forum. You can find Forum Chat in Beta by clicking on the Announcements heading in the left-hand menu, and by selecting ‘Forum Chat’ in the left-hand menu in Classic view. Please e-mail your questions to LiveChat@betfair.com.

You can send any questions you have now, and responses will be posted tonight. Unfortunately it is not possible for us to respond to each email individually, but we will attempt to answer all questions raised via the live Q&A session.
MarkRussell
Site Admin
 
Posts: 1787
Joined: Tue Feb 20, 2007 6:38 pm
Location: Birmingham

Post by osknows » Mon Mar 14, 2011 5:31 pm

Q1: Do you still have a job?
Q2: How much Bonus did you earn this year?
osknows
 
Posts: 946
Joined: Wed Jul 29, 2009 12:01 am

Post by warriorfullights » Thu Oct 06, 2011 11:33 am

They should also apologize to all the punters who have been tricked, including myself. I have played, bet and traded a lot. Until now, I can see enough wins. Well, I guess it is just normal.
Happy Betting to all punters
warriorfullights
 
Posts: 13
Joined: Thu Oct 06, 2011 9:44 am
Location: UK

