“The Shoe Test” – does this test really add value??

I recently read an e-book by Elisabeth Hendrickson, entitled “Exploratory Testing in an Agile Context” (definitely worth a read if you haven’t already). In this e-book, Elisabeth uses an example of a test that James Bach performed called “the shoe test”.

Is this a valuable test?
 
For those of you who have never heard of the shoe test, the idea is that when testing a system, you take off your shoe (left or right – either will do!), and you place it on your keyboard… The anticipated outcome should be that the system handles the continuous input of multiple keys on the keyboard for the duration, and doesn’t blow up!
There are many questionable variables though:
– Should the shoe be strategically placed on the keyboard or randomly placed?
– For how long do you leave the shoe (seconds or minutes)?
– Does the size of shoe matter (a child’s shoe compared to my big size 12 boat shoes 🙂)?

Elisabeth makes the point that “James doesn’t put a shoe on the keyboard because he’s trying to come up with wacky random stuff. He does it because he has noticed that some software exhibits bad behavior when it receives too many key inputs at one time”.

This is a valid point. It highlights that this test is actually a valid test. I get the fact that if you hold down a key for ages, or if you keep 18 keys pressed down at once, then the system might not handle that correctly. So this is a test that would potentially check for these types of errors, so it is valid…
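
As a rough illustration of what the test exercises, a simulated version might look like the Python sketch below. Everything in it is an assumption made for the example: `shoe_events`, `BoundedField` and `run_shoe_test` are hypothetical names, and treating every held key as auto-repeating at 30 events per second is a simplification of how real keyboards behave.

```python
import random
from typing import Callable, Iterable, List


def shoe_events(held_keys: Iterable[str], seconds: float,
                repeats_per_second: int = 30) -> List[str]:
    """Return the key events emitted while `held_keys` stay pressed.

    Simplification (assumption): every held key auto-repeats at the same
    rate, interleaved. Real keyboards usually repeat only the last key
    pressed and limit how many simultaneous keys they report.
    """
    keys = list(held_keys)
    ticks = int(seconds * repeats_per_second)
    return [key for _ in range(ticks) for key in keys]


class BoundedField:
    """Hypothetical text field that ignores input beyond `max_length`."""

    def __init__(self, max_length: int = 20) -> None:
        self.max_length = max_length
        self.text = ""

    def on_key(self, key: str) -> None:
        # Drop the key once the field is full instead of growing forever.
        if len(self.text) < self.max_length:
            self.text += key


def run_shoe_test(on_key: Callable[[str], None], events: Iterable[str]) -> int:
    """Feed every event to the handler and return how many were delivered."""
    delivered = 0
    for key in events:
        on_key(key)  # an exception here is the "blow up" the test looks for
        delivered += 1
    return delivered


rng = random.Random(12)  # fixed seed so the "shoe placement" is repeatable
shoe = rng.sample("asdfghjkl;qwertyuiop", 18)  # 18 keys under the shoe
field = BoundedField(max_length=20)
delivered = run_shoe_test(field.on_key, shoe_events(shoe, seconds=3))
print(delivered, len(field.text))
```

Recording the seed, the keys and the duration is the simulated equivalent of noting where the shoe sat and for how long, which is what makes any failure reproducible.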

I am still inclined to question this test though… Is it actually valuable?

I’ve spoken about this test to many people, including developers, testers and even non-IT people, to get their opinions.
There were many contrasting opinions about whether the test is valuable or not – most of the developers automatically produced the standard “no-one would ever do that!?!!” response (as expected). Even most of the non-IT people questioned whether I was serious and laughed. One developer even commented saying that the shoe test confirmed for them that “exploratory testing was just messing around”, which was infuriating…
It only seemed to be the testers that thought that the shoe test might add value in finding defects…

So this got me thinking about how performing tests such as this, or even just telling stories of tests such as the shoe test to non-testers might actually be making things harder for ourselves by damaging the reputation of ET.

Suppose, for example, that you are communicating your testing to your customer. If you told them about the shoe test, what do you think they would say? And how would they interpret your testing?
Even if you did find a bug and you raised it, do you believe that it would be accepted by the developers and the project manager?

For me, the test itself is valid, like I said before, but it doesn’t seem valuable (or at least valuable enough for me to actually run the test over other tests). Additionally, it seems as though my peers and colleagues outside of the testing departments might not take me seriously with this test. In fact, it might give them a bad impression of exploratory testing…
Even the bugs found by the shoe test might be vetoed by all. After all, with the systems which I currently test, if a bug was found that was caused by holding 20 keys down on the keyboard for 3 minutes, then it would most likely never get fixed because of it being such an extremely rare situation…

But what do you think? Do you think this test is valuable? Would you ever perform this test? Or have you ever performed this test? How was it received?

29 thoughts on ““The Shoe Test” – does this test really add value??”

  1. I agree with the sentiment – that a shoe test is often not useful for many contexts, although valid in some. I’d warn against weighing the opinions of development or users on any test technique over the opinion of testing experts, though. If you can defend your work (including the use of a shoe test for your context) then you should be able to communicate its value.

    Also a bug caused by holding down 20 keys on the keyboard for 3 minutes might be vitally important for others – for example testing the anti-ghosting on a new high-end gaming keyboard.

    Also you don’t have to actually use a shoe, nor do you have to leave it there. I’ve also heard Bach call it “any test consistent with pounding on the keyboard with a shoe”.

    I think shoe tests have their place. They’re good quick attacks to show stability in front-end inputs. I’ve used shoe tests to check web-app GUIs before. Their medium value is offset by the fact that they take little time to perform.


    1. Thanks for the comment!

      I agree – if a bug does occur, then it all depends on the context of what you are testing: what actually occurs, what is its severity in relation to the system and its conditions (how long it takes to occur and whether it is caused by 1 certain key, a certain combination of keys, or even just any key).

      It’s also a good point that you don’t actually need a shoe. In fact, Elisabeth mentions in her e-book that placing a cat on the keyboard, or handing the keyboard to a 2-year-old, would result in similar behavior 🙂


  2. One of the engineers in my team is constantly looking for bugs in our software with (what others see as) ‘useless’ tests – multitouch, orientation change, etc. It’s often borked one of our apps and caused crashes.

    As unlikely as some of them are to actually occur, they’re still serious in the fact that if just one user experiences an issue with it, they’ll make themselves heard in the app/play store.

    Either way, it detracts from the end-user experience, which is important to us. So – valid test.


    1. Hi Daniel! Thanks for the comment (and the retweets! 🙂)
      That’s a good example of how the test is valuable for your app!

      It’d be interesting to hear if your value in the test would decline if there wasn’t a way for customers to rate or comment on your app so freely…

      If you knew the odds of it happening were extremely low, and that if it did happen there was little risk of it being communicated to other users so freely, would you still value the test as highly, even though it is valid?


  3. Yeh, I do it – and variations on it – see my blog post that also references The Shoe Test – http://spin.atomicobject.com/2011/08/24/snack-tests/

    I actually came across a variation of the Shoe Test myself – I had a book open and was reading it, and it was pressing on a couple of keys, and it crashed the program. If this crash happens to wipe out data, isn’t it serious? No user will press 20 keys for 3 minutes – but a user might drop a book onto the keyboard.
    Why not test at the extremes rather than ‘typical’ values? Another project I worked on tested CSV import. It seemed OK, but when it got released we found a customer was using files with millions of entries, as he had a business use for it that no-one had ever considered.


    1. Hey Phil! Thanks for the comment and the link!! I’ve also heard of the “Tea Test”, also known as the “Toilet Test”, which I think most testers might subconsciously do anyway. 😀

      You are totally right – it could cause a serious bug to occur…

      So in a sense, does that mean that the value of the test relates in some way to the severity of defects (or possible defects)?

      I agree that the test is valid. I have no doubt that it is possible that the test could find a bug. But I’m trying to relate this test to value for the stakeholders, in the same sense that you would relate quality to value for the stakeholders.

      Coming in from a different angle here: if there /is/ a situation where a user who is browsing a website does drop a book on the keyboard, leaves it there and a bug does occur, perhaps deleting the data in a field the browser was focussed on (let’s use this comments field for example…) – from the user’s perspective would they be outraged at the way the system handled the book being on the keyboard? Or would they be more upset that the book dropped on the keyboard in the first place?
      Additionally, from a perspective of a client, do you think that if they found out that they were spending money for someone to lay a book (or anything else) on a keyboard for 10-15 minutes, that they’d be happy?

      In other words – if it’s going to be an extremely rare occurrence, and the person paying for the system might think that the test is a waste of time or money AND the user might be frustrated but mainly frustrated at the situation rather than the system itself, then does this mean that this test is still valuable for us to run?


      1. Do you have tests that you could run that would be more valuable? Would running the test give you valuable information? Are you running the test because you read about it in a book or because it might give you some insight into the system? Why can’t your system handle it? Are there lower values that would cause the same problem? Do your stakeholders care about the comment field of a blog – probably not. Do they care about the data entry into a financial app – quite likely. A trader on the Stock Exchange has too many champagne cocktails for lunch, comes back to his desk, falls asleep on the keyboard and commits a trade for 1000000000000000000000000000000000000000000000000000


      2. This is where context and risk have to come into play to determine the test’s value… If it is a financial or a mission critical system, then the risk is greater, so the test is more valuable to mitigate the risk.

        I guess this is what was missing… None of the systems that I have worked with have been such mission critical systems.
        In my case, with such tight timescales and fast turnarounds for websites or web-apps, there were a ton more valuable tests that needed to be run…

        There does appear to be an unspoken “priority of value” for tests, which is probably restricted by timescale… For me, taking into account the systems that I work on and the risk around the bugs that might occur around this shoe test, it might be less of a priority to run.

        I agree that it might be a higher priority test to run in other systems with higher risk associated though! 🙂


      3. Hahaha!! Hilarious!
        That takes it to a new height for people that have a fear of flying AND a fear of sharks!

        I could only imagine the misery of having to reproduce this one time and time again! LOL. 😀


  4. What if it weren’t a shoe? What if it were a book? A cat? A kid?

    What if it were a stuck key?

    What if it were a hacker?

    When a programmer says “No user would ever do that,” what he really means is “No user that I’ve thought of, and that I like, would do that on purpose or in a way that I imagined.” When a programmer says “No user would ever do that,” try asking him for a list of all the things no user would ever do.

    —Michael B.


    1. Hi Michael! Thanks for the comment!

      Good point about the stuck key – but wouldn’t that be a defect with the keyboard?

      I’ve actually asked a developer for a list of things the user wouldn’t do before!!! 😀
      It was because he kept rejecting my bugs saying that the users would never do that. I think he got the hint…
      Another response to developers that I find works well is: “Never is a very long time… are you willing to bet the user will /never/ do that?”

      I must admit, I was trying to be ironic in the blog with the comment about what the developers said…
      I was concerned by the comments about the test diminishing the reputation of Exploratory Testing, which is what inspired me to write the blog.

      On a separate note, I’ll definitely be at the London Tester Gathering this month, so it’ll be great to finally get to meet you!!


  5. @Dan….

    It would be a great idea, in my opinion, for us to drop our idea of “defect” and focus more on “problem”.

    Consider two products. One filters input. Its response to a stuck key is to stop accepting input after 20 characters. Another puts no constraints on input. Its response to a stuck key is to keep processing keyboard input until it overflows its own memory space and starts clobbering the operating system. Which program would you rather have on your computer (or, if the programs are on a mobile phone with a Bluetooth keyboard, which would you prefer?)

    A customer doesn’t buy our code from us. He buys a system that includes our code and its interaction with us, with the platform on which it’s running, and with other things in the world. Wherever the “defect” might be, when the customer perceives a problem while using our product, he has a high probability of associating a problem with our product. And if there’s a workaround that our product could implement to defend the customer from that problem, the customer would be absolutely right to make that association.

    —Michael B.


    1. Hi Michael.

      I think the system that I would choose would be completely dependent on my needs… I might find a limitation of 20 characters too restrictive for what I need to enter in the field, so this might be a problem for me. And I might not mind the option of there being no character limitation at all, possibly because I might be blissfully unaware that holding a key might cause a system failure, or because I know that the risk is low due to my environment being minimalistic with no possible objects on my desk that could cause that scenario to happen.

      You are completely right however!! Every customer has the right to associate any problem with our product.
      What should happen when a user is not the paying customer though, for example with websites?
      And what about conflicting interests between different customers/users?


  6. Hi, Dan…

    I like that you’ve noted that neither possibility is intrinsically “right”. That’s important. For the same reason, no test (whether it involves shoes or not) is intrinsically valuable or valueless, since a complete and accurate determination of its value can’t be made until all the facts are in… and all the facts are never in.

    All right. 🙂 Having got you started on problem vs. defect, and user vs. customer, let’s think about quality: value to some person(s). Implicit in that is the idea that, for every organization, some people matter a lot and other people don’t matter so much. As testers, it’s our job to identify threats to value for anyone that is likely to matter—and to contextualize those threats to value—so that the people who matter most (that is, the people who are paying us for our service) get information of the highest value so that they (and not we) can make decisions about value for other people.

    In practice, that means:
    – identifying people who might matter
    – generating ideas about problems that might affect those people
    – investigating the product in a search for those problems
    – investigating the product in a search for problems in general
    – when we find a problem that we had not anticipated, framing both the problem and our testing to show how the problem might be important to people who matter

    Our bias here, as testers, is on identifying the threats to value. When it comes to conflicting interests, I would argue that we don’t have to decide; that’s a business decision, to be made by people who are responsible for the business. Our role is to help make sure that people who might matter are not forgotten, and that potential effects of a problem on those people are clearly expressed.

    That word “potential” is important. We also need epistemic humility, since we can’t accurately and reliably predict the probability or impact of any particular problem.


  7. Interesting post and very valuable thoughts above.

    For me, the ‘intent’ of the test and the ‘steps’ are two different things. If you are tasked with testing a system to see if it can be hacked or blown away, a shoe test would serve that ‘intent’. For the ‘steps’, it can be a shoe or a cat or other variations. That brings us to the key decision: what do we want our testers to focus on?


    1. Hi Majd!
      Would the steps for running the shoe test be that different depending on the objective of running the test?
      It would essentially still be: “hold down a number of keys on the keyboard for X amount of time”… (please correct me if I’m wrong!)

      I see the shoe (or cat, or book, random object, etc) essentially as a tool to aid with performing the test, to make life a bit easier so that we don’t have to physically hold down the keys ourselves for an inordinate amount of time.


  8. FWIW, I’m reminded of this story from HWTSaM:

    In the early days of Microsoft Office, testers were very creative with their office supplies.
    The goal was to simulate real user input from a keyboard that would overload the program’s input buffer. The challenge was finding the proper office accessory that was the right size with the proper dimensions that could easily be applied to the keyboard while I went to lunch. As it turns out, the stapler fit the task. When I came back from lunch, there was always an ASSERT or Hard Crash on my screen.


    1. Thanks for the comment Alan!!

      That’s interesting – so you only tried this test because you were looking for this specific type of bug?
      That’s a different approach/reason for performing this test that I didn’t cover at all!

      I guess when you’re hunting for specific bugs like overloading the buffer, then this test could actually be very valuable to run.


  9. The shoe test is a keyboard input test. Unless you’re testing keyboard input at the OS level, it provides no value.

    There are other tests like the shoe test that are applicable to mobile devices, since multi-touch and other simultaneous input combinations are not fully defined.


    1. Thanks for the comment Aaron!
      I think the shoe test would also be valuable when testing applications with the focus on certain objects or elements within the application… Take input fields, for example, on a website… Different input fields on different sites might handle the continuous input from a keyboard completely differently. Some sites might have input fields that don’t handle continuous keyboard input at all (maybe if there was no character limitation for example) and might cause the site to crash or fall over, so I’d say it’s definitely a valid test to run.
      I do think that running this type of test when testing an OS would more than likely be of a higher priority than running the test on a website though!


  10. Anyone deciding to use ad-hoc unstructured random testing like this for their product is on a hiding to nowhere. The whole purpose of a structured and systematic test approach is to be able to quantify the quality of testing that has been carried out and to determine whether it is sufficient to meet the defined quality criteria.

    Sure, use the Shoe Test for a bit of fun, but it would be like looking for a needle in a haystack: how many combinations of key presses and durations are there? And how many of those combinations would you be able to check in a given period of time using this method? I would pity anyone finding a bug in this way: can you explain the steps leading up to the failure? Is the failure reproducible? You get my drift.


    1. Thanks for the comment Ali!
      You have a point that your company might not permit this test to be run, as it might seem to certain managers like the test is a bit silly, or a waste of time… But I wonder whether your company would take a different stance if their web site/web app was subject to a loss of service attack which lost them lots of money, all because a hacker found a way to take down the system by holding a random few keys down for a few seconds, which overloaded the buffer?

      I’m not sure aiming to test all of the combinations of key presses would be relevant either when testing for this type of bug… I guess, like in any test, there are always difficulties regarding the number of possible combinations of inputs and timescales. This is something that we need to use common sense for.

      And of course, if a bug is found this way, I wouldn’t say that anyone should pity the tester!! Bug investigation is a major part of being a tester and it’s required for any bug that you find. It’s a good idea to take exploratory notes as you go along, detailing key decisions that you make, such as taking a pic of the position of the shoe, which shows which keys are being pressed, and taking a timestamp of when the test started and ended…
      Investigating a bug of this sort would be no more difficult than investigating any complex bug in any part of the system… That doesn’t make it any less valid in my opinion.


      1. The thought of presenting pictures of shoes on keyboards to illustrate a bug is really tickling me. 🙂


  11. Hi Dan, as a complete outsider, I think this test is actually valuable. People would most probably think of me as a rather weird user (yes, I tend to tap multiple keys at times when I am reaching for my coffee or other things, even lean on the keyboard when necessary to reach something above, thus performing the “shoe test” unconsciously). Not to mention that some apps are for kids – they will defo push multiple buttons and place their little paws on the keyboard/touchscreen.


  12. Just change the name to “Lean test”, as most of us have done that occasionally – even the developers.
    But the real question is: >>> Is it useful, and when? <<<
    We should gather bugs raised on similar cases, see if these were mostly rejected or fixed, understand if these affected the OS or the application itself – and from there decide if we care enough to use it.

    BTW – we need to find new issues, more relevant to the new mobile HW, as desktops are slowly becoming extinct.


  13. Did you know that “the shoe test” can probably be seen performed live at your local Ikea?
    It’s a subflavour of a durability test, which is very commonly performed on tangible objects or conglomerations of objects into a system, e.g. a chair, bike, plane, factory, bridge, etc.
    Different forces applied differently have different effects. Heck, even not applying forces might have unpredictable and unwanted effects.

    Now if you are a bank, would you like a shoe test to be performed on your cash machines before you set em out in the wild?

    If you are offering some sort of service, would you like to know how it performs under high stress/load etc.? A tool like Gatling is just a virtual shoe test.

