Monday 2 February 2015

Are you feeling your customers’ pain?

I blogged recently about finally getting the first version of my Skype Voice Changer application out the door. In order to get something into the hands of users, I had to take a few shortcuts and make a few compromises. One of those was error handling. Basically, for the first version, if something went wrong I would pop up an apologetic dialog suggesting that the user email my support address with details of the problem (I included the stack trace in the dialog).

Of course, I got no reported errors at all via email, which meant that no one was having any problems with my app, right?

Automated Error Reporting

Well, I knew there were probably some issues, so in the next version I added a button on my error dialog to offer the user the chance to submit an error report to a web API. My web server then simply drops the error report into Azure blob storage as a text file. Nice and simple (and VS2013 makes it really easy to check a blob container for new entries and look inside them).
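To make the idea concrete, here is a minimal sketch of the client side in Python. It is not the code from my app: the endpoint URL and the function names are assumptions made up for illustration, and the server side (writing the text to blob storage) is not shown. The important points are that the report is only sent when the user clicks the button, and that a failure to send must never cause a second error.

```python
import traceback
import urllib.request

# Hypothetical endpoint, used only for illustration.
REPORT_URL = "https://example.com/api/error-reports"


def format_report(exc):
    """Render the exception and its stack trace as plain text."""
    return "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))


def build_request(report_text, url=REPORT_URL):
    """Build a plain-text POST request carrying the error report."""
    return urllib.request.Request(
        url,
        data=report_text.encode("utf-8"),
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


def submit_report(exc, send=urllib.request.urlopen):
    """Call this from the dialog's "submit" button handler.

    Returns True if the server accepted the report. Network failures are
    swallowed, because reporting an error should never raise another one.
    """
    try:
        request = build_request(format_report(exc))
        with send(request, timeout=10) as response:
            return 200 <= response.status < 300
    except OSError:  # urllib's URLError and HTTPError are subclasses of OSError
        return False
```

The `send` parameter exists only so the function can be exercised without a real network connection.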

Well, within an hour of going live with the new version, I saw three errors reported. And they kept on coming at a rate of about one or two an hour. Some of these were clearly the same user submitting the same problem multiple times, but the reports revealed that people were using my app, and it wasn't going smoothly for all of them.

Analysing the Errors

By analysing the errors it became apparent that I had some recurring faults. Several users failed to start up with a SocketException - clearly something was blocking them from opening the necessary ports. One user's soundcard refused to open. Some users could connect to Skype, but not determine which version of Skype they were connected to. One user couldn't initialise Media Foundation. One user's installation was bizarrely missing some of the included data files. Without automating error reporting from my app, I would have had no idea that these problems were occurring.

Improving Error Logging

So with error reports now flooding in thanks to my automated submit, I set about addressing them. In fact, the first thing I did was to improve the error messages I was submitting to myself. They only included the stack trace, so I upgraded them to include details of what version of Windows and Skype the user was running, and what operation they had been attempting.
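As a rough illustration of what a richer report might contain, here is a small Python sketch. It is an assumption about the shape of such a report, not my actual format: the function name, the field labels, and the way the Skype version is passed in are all hypothetical.

```python
import platform
import traceback


def build_error_report(exc, operation, skype_version="unknown"):
    """Combine context about the failure with the stack trace.

    `operation` describes what the app was attempting. `skype_version`
    is whatever the app managed to detect; how it is detected is out of
    scope here, so it is simply passed in.
    """
    header = [
        f"Operation: {operation}",
        f"OS: {platform.platform()}",
        f"Skype version: {skype_version}",
        "",  # blank line between the context and the stack trace
    ]
    trace = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return "\n".join(header) + "\n" + trace
```

Putting the context at the top means the most useful facts are visible as soon as the text file is opened, before scrolling through the stack trace.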

Handling Anticipated Exceptions

It also meant I had the opportunity to write some more "defensive code" to handle "anticipated exceptions". In other words, there are now several places in my code where I know things might go wrong. Thanks to the stack traces, I know which lines can blow up and with what exceptions. This allows me to write code that specifically anticipates those problems and either works around them or presents the user with a much more friendly and meaningful message, allowing them to continue if at all possible. However, I still let users report these errors if they want, since I don't want to mask an ongoing problem.
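Here is a sketch of that pattern in Python, using the blocked-port case as the example. Again, this is an illustration under assumptions rather than my app's code: `StartupResult`, `start_listener`, and the wording of the message are all invented for the example. The anticipated exception is caught and turned into a friendly message, but the original exception is kept so it can still be reported.

```python
import socket
from dataclasses import dataclass
from typing import Optional


@dataclass
class StartupResult:
    listener: Optional[socket.socket] = None
    user_message: str = ""                  # friendly text for the dialog
    error: Optional[BaseException] = None   # kept so the user can still report it


def _open_port(port, host="127.0.0.1"):
    """Open a listening TCP socket; raises OSError if the port is unavailable."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        listener.bind((host, port))
        listener.listen(1)
    except OSError:
        listener.close()
        raise
    return listener


def start_listener(port, open_port=_open_port):
    """Try to open the port, converting the anticipated failure into a message."""
    try:
        return StartupResult(listener=open_port(port))
    except OSError as exc:
        message = (
            f"Could not open port {port}. Another program or a firewall "
            "may be blocking it."
        )
        return StartupResult(user_message=message, error=exc)
```

Only the exception type that was actually seen in the reports is caught here. Anything unexpected still propagates to the general error dialog, so new problems are not silently hidden.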

Quick Bugfix Turnaround

So I quickly released another version, and sat there refreshing my error blob container to see if my fixes had worked. This time things were a bit more encouraging. In the next 24 hours I had just three errors reported. The improvements I made to error logging also proved extremely useful in helping me diagnose the cause of these issues. This is another benefit of automated error reporting: it allows an extremely quick bugfix turnaround. I can resolve an issue a user is seeing and get an updated version of the app released within hours of the original problem.

Obviously I still don't have any information on how many people are successfully using my application. I'll need to ask my users' permission to submit usage statistics in order to do that. But automated submitting of errors with good diagnostic information is an excellent starting point. What about your users? Are they suffering in silence, or are you feeling their pain?
