How browser security fixes Sitecore login

A client contacted me to help them roll out a Sitecore Kubernetes deployment (hooray for clients willing to invest in technology updates). Of course I came in prepared: I tried out the Installation Guide for Production Environment with Kubernetes (on Azure AKS with my own subscription), and had a shiny new Sitecore 10.x up and running on https://globalhost.cd in no time.

When repeating the setup on the client’s subscription, the deployment went well but we failed to log in to Sitecore.

A quick look at the log files did not reveal any errors or failures.

In my setup, everything worked as expected, so what was different for our client? Since this is a real project setup, the URLs (and TLS) are of course customer-specific (so not globalhost). Typos are easily made, but not this time: double-checking the config did not reveal any mistakes.

With no traces in the logs we had to dig a little deeper. Since single sign-on is all about redirection and passing along the required information in the process, taking a closer look at the network traffic made sense.

(see Virtual Developer Day 2020 – Getting to Know Sitecore Identity – George Chang for a good introduction on Sitecore ID)

Security best practices for the win

A useful tool for this is Fiddler (Fiddler Classic in my case): a web debugging proxy to log, inspect, and alter HTTP(S) network requests and server responses.

I was already thinking bad tokens, encryption issues, claim mismatches, … Luckily the issue was not as complex as I feared. Comparing network traffic from my (working) setup with traffic from the client setup revealed that the client's responses were missing one HTTP header: Strict-Transport-Security, also known as HSTS.
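The comparison itself is easy to approximate outside Fiddler. The following is a minimal sketch, assuming the response headers of both environments have been copied into plain dictionaries; the function name and the header values are hypothetical and only serve as illustration.

```python
def missing_headers(working: dict[str, str], failing: dict[str, str]) -> list[str]:
    """Header names present in the working response but absent from the failing one."""
    # HTTP header names are case-insensitive, so compare them in lower case.
    failing_names = {name.lower() for name in failing}
    return sorted(name for name in working if name.lower() not in failing_names)


# Hypothetical header sets, as they might be copied out of two traces.
working = {
    "Content-Type": "text/html",
    "Strict-Transport-Security": "max-age=15724800; includeSubDomains",
}
failing = {"content-type": "text/html"}

print(missing_headers(working, failing))  # ['Strict-Transport-Security']
```

Comparing names case-insensitively matters here, because different proxies may change the casing of the same header.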

If a website accepts a connection through HTTP and redirects to HTTPS, visitors may initially communicate with the non-encrypted version of the site before being redirected, if, for example, the visitor types http://www.foo.com/ or even just foo.com. This creates an opportunity for a man-in-the-middle attack. The redirect could be exploited to direct visitors to a malicious site instead of the secure version of the original site.

The HTTP Strict Transport Security header informs the browser that it should never load a site using HTTP and should automatically convert all attempts to access the site using HTTP to HTTPS requests instead.

https://developer.mozilla.org/en-US/docs/Glossary/HSTS

This is exactly what happens in my Fiddler trace: even though the server tells my browser to go to HTTP (using the Location: header), my browser ignores this and connects over HTTPS directly (line 52).
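To make that browser behavior concrete, here is a rough sketch of the upgrade step, assuming the browser keeps a simple set of hosts that previously sent a Strict-Transport-Security header. The function name and host names are hypothetical, and real browsers also honor max-age expiry and includeSubDomains, which this sketch leaves out.

```python
from urllib.parse import urlsplit, urlunsplit


def apply_hsts(url: str, hsts_hosts: set[str]) -> str:
    """Return the URL an HSTS-aware client would actually request."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme != "http" or host not in hsts_hosts:
        return url  # nothing to upgrade
    # An explicit port 80 becomes the HTTPS default; other ports are kept.
    port = parts.port
    netloc = host if port in (None, 80) else f"{host}:{port}"
    return urlunsplit(("https", netloc, parts.path, parts.query, parts.fragment))


# Hypothetical hosts: the first has sent Strict-Transport-Security before,
# the second never has.
known = {"cm.example.com"}
print(apply_hsts("http://cm.example.com/sitecore", known))    # upgraded to https
print(apply_hsts("http://cm.other.example/sitecore", known))  # left as http
```

With the header in place, the HTTP Location is silently upgraded and the login flow continues over HTTPS. Without it, the browser follows the redirect to HTTP as instructed.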

Why would Sitecore redirect us to HTTP? No idea. As far as I can tell all our config is set correctly. I can only assume this is leftover code from a time when local development was often done on HTTP. With Sitecore on containers, the development setup is a lot closer to the production setup, including running on HTTPS.

Ingress

So why did I not face this issue on my setup? An important part of a Kubernetes setup is managing how external traffic can access the services in your cluster (HTTP, HTTPS, TLS, …). This is done with the Ingress API object. Out of the box, Sitecore suggests using the NGINX Ingress controller for this.

NGINX is a fine default, but my client has a track record in Kubernetes (not on Azure) and for their networking requirements they standardize on Istio, a service mesh. To use it, we had to replace NGINX with Istio Ingress.

Turns out HSTS is enabled by default in NGINX (see https://kubernetes.github.io/ingress-nginx/user-guide/tls/#http-strict-transport-security), but not in Istio. Enabling it is quite easy though (in case you might be interested): https://www.wagner-dev.com/istio-configure-strict-transport-security-hsts.html

In a development setup, network traffic is typically handled by Traefik. Sitecore provides docker-compose files that force the HSTS header there as well.

So why didn’t we spot this right away? The browser we were using (its name starts with a ‘C’) doesn’t show the protocol in the URL bar, so we simply didn’t see it. That we had to test this through a screen share session probably didn’t help either (but that’s a different story).

In summary

Tags: #aks, #kubernetes, #nginx, #istio.