Ding!
I just got a Teams message. I look over and it's Alejandro, the newest junior on our project, asking for help.
“Hey Mauro! Sorry to bug you, but can I bug you? 😁
Isabella gave me a bug ticket that I can't seem to reproduce. Can you help me?”
Of course! How can I say no? After all, I'm an...

So we hop on a call and he explains the issue.
“Okay, so the client reported this bug that's really hard for us to reproduce. The ticket reads:
1. The user tries to purchase a product.
2. During the purchase, he is asked to log in.
3. After logging in, the user clicks a random link in the header that navigates away from the website.
4. While on this new, unrelated page, the user clicks the back button in the browser.
5. He tries to keep navigating the website, but all the network calls stop working. It's only fixed when he refreshes the page.
I tried following the exact steps and was only able to reproduce it once, this morning, right around the time I started working. Do you have any idea what could be causing this?”
Okay, now that's a really, really weird error. It's one of those strange, uncommon, hard-to-reproduce bugs all developers hate. The first thing that stood out to me was:
Why were the network calls failing? What errors were they running into?
“Yeah, so when I reproduced it this morning, I recorded my screen. In the Network tab of DevTools, the requests were failing with a 401 error.”
(+1 brownie point for this guy for thinking of recording his screen!)
Alright, that's the first clue. An HTTP 401 (Unauthorized) error means the request didn't carry valid authentication credentials. Most likely something happened to the token we send in the Authorization header of our network calls.
Now, looking at the steps outlined in the ticket, what could have affected the token in this way? It's time to...
Eliminate the unrelated elements and create a hypothesis
The ticket holds a lot of information. Not all of it is relevant all the time. For example, it mentions a whole process relating to purchases. Is that related to the error?
This website has processed thousands upon thousands of purchases before, so it would be strange for this to pop up only now. Plus, doesn't it seem suspicious that the trouble started right after the user clicked a link away from the site and then came back? That sounds like something worth investigating.
So we formed a hypothesis: the purchase steps outlined in the ticket are unrelated, and we should investigate what happens when you navigate away from the website and then come back.
What happens when you navigate away from a website and then press back?
The link the user clicked didn't seem special in any way. There was no obvious difference between clicking that link and clicking any other link that leads off the site. Okay, so... something about navigating away from the website and then pressing back breaks our authorization token. That seems to be the catalyst that makes the page stop working. How is that possible?
My co-worker and I investigated for a while and bounced ideas around. While googling things related to the browser's back button, we learned about a browser optimization we didn't know existed.
All modern browsers have what's called a bfcache
Ever noticed how instantly pages load when you press back or forward in your browser? That's the bfcache at work. The back/forward cache is a performance optimization in all modern browsers that saves the full state of the pages you recently visited. So instead of reloading a page when you press back, the browser just restores the state saved in the cache. It's as if you never left the page to begin with!
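To see the bfcache in action, a page can listen for the standard `pageshow` event and check its `persisted` flag, which is `true` only when the page was restored from the back/forward cache rather than loaded fresh. This is an illustrative sketch, not code from our project, and `describePageShow` is a hypothetical helper. It hints at why the bug was possible: assuming an app only fetches its token when the page first loads, a restored page reruns nothing and keeps whatever token it already had in memory.

```typescript
// Minimal sketch: tell a bfcache restore apart from a normal page load.
// `describePageShow` is a hypothetical helper for illustration only.

type PageShowLike = { persisted: boolean };

function describePageShow(event: PageShowLike): string {
  // `persisted` is true only when the page came back from the bfcache,
  // meaning its JavaScript state (including any in-memory token) is untouched.
  return event.persisted
    ? "restored from bfcache: in-memory state is exactly as it was"
    : "normal load: scripts ran from scratch";
}

// Only attach the listener when running in a browser.
if (typeof window !== "undefined") {
  window.addEventListener("pageshow", (event) => {
    console.log(describePageShow(event));
  });
}
```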
I feel like we're close to knowing why this happens.
What if this cache was involved somehow? I asked Alejandro to keep digging in that direction with what we had uncovered so far, and to see if he could reproduce the bug again, if at all, while I focused on other tasks.
(That odd detail he mentioned earlier, about only being able to reproduce it once, at the beginning of the day, was also a big clue. Whatever this error was, it only happened under a specific condition we didn't know about yet.)
The next day, Ale comes back with some great news.
“Hey Mauro! I was able to reproduce it again! This time it also happened in the morning, when I started working. I captured the token and copied it into a notepad. Looking into it, the token was expired. Something about the cache is bringing back an old version of the token. 🤔”
Ohhh, that's interesting! And huge progress!
After working on it together a bit more, now with all these hints, we realized what the issue was.
The token would expire once a day, around the same time every day. If the user opened up the webpage right before the token expired at that specific hour, clicked the link away from the site, and then back...
Ahh, the bfcache restored the page exactly as it was, old token included, and by then that token had expired! That caused the 401 errors and made the site unnavigable until the user refreshed the page (and got a new token).
That made this issue the most extreme of extreme edge cases. The odds of it happening are so small that by the end, I was wondering if it was even worth fixing 😅
No matter! Alejandro and I came up with a strategy: automatically refresh the page whenever a request fails with a 401 caused by an expired token. That fixed this bug for good.
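Here is a minimal sketch of that kind of fix, assuming the app keeps its token and the token's expiry time in memory and sends it as a Bearer token. `createApiClient`, `AuthState`, and `reloadPage` are hypothetical names, not our actual code, and comparing against `expiresAt` is just one assumed way to tell an expired-token 401 apart from other 401s.

```typescript
// Minimal sketch: reload the page when a request fails with a 401
// and the token we sent has already expired.
// All names here are hypothetical; the token shape is an assumption.

interface AuthState {
  token: string;
  expiresAt: number; // expiry as Unix time in milliseconds (assumption)
}

interface MinimalResponse {
  status: number;
}

type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<MinimalResponse>;

function createApiClient(
  getAuth: () => AuthState,
  fetchFn: FetchLike,
  reloadPage: () => void,
  now: () => number = Date.now,
) {
  return async function apiFetch(url: string): Promise<MinimalResponse> {
    const auth = getAuth();
    const response = await fetchFn(url, {
      headers: { Authorization: `Bearer ${auth.token}` },
    });
    // Only reload when the 401 is explained by an expired token, so a
    // genuine permissions problem does not trigger an endless reload loop.
    if (response.status === 401 && now() >= auth.expiresAt) {
      reloadPage(); // a full reload loads the page fresh and gets a new token
    }
    return response;
  };
}
```

Passing in `fetchFn` and `reloadPage` keeps the logic testable outside a browser; in the app they would be something like `(url, init) => fetch(url, init)` and `() => window.location.reload()`.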
Another day, another victory for your friendly neighborhood...