Front-End Performance - The Dark Side

Mathias Bynens

Recorded at ColdFront 2016


yeah thanks for that hi everyone so yeah
I'm Mathias and I indeed work for Opera
Software in developer relations and
today I'm about to make you hate
performance because usually when we talk
about front end performance we see it as
a good thing and usually it is but today
I'm going to focus on some security
sensitive situations in which
performance can actually be a problem I
will also demonstrate that some new and
performance related web APIs can
sometimes cause some security issues
they can have a negative security impact
so let's get right to it here's a simple
example let's say you want to compare
two strings in JavaScript it's not
very tricky or hard right you would
probably just use an equality operator
and because you're just comparing two
strings it doesn't matter whether you
use triple equals or double equals so
let's just bypass that whole
discussion so okay you compare the
string ColdFront with itself and it
returns true and if you compare two
strings that are different it returns
false this is exactly what you would
expect right well I promised I will talk
about performance so let's get some
timing measurements up in here so let's
say the first example where the two
strings are equal takes up about a
thousand microseconds to run the second
example where we compare ColdFronT
with a T at the end with ColdFront
takes up about a thousand microseconds
however if we compare Pikachu with Pichu
we get the result much more quickly
it only takes about one hundred
microseconds and if you compare the
strings CSS with XSS then suddenly we get
a result in 200 microseconds so there's
a difference in how long it
takes for this function call to
complete depending on the input strings
that we feed it so why is that happening
well the answer lies in the way that the
JavaScript engine or the engine of the
programming language that you're using
because this is not specific to
javascript in any way it lies in the way
the engine implements the Equality
operator for these strings so just
imagine how you would be implementing
this if there was no way to compare two
strings directly
well there's only really one way to do
this you would have to loop over the
string character by character get the
character code at each position and then
compare it to the character code in the
other string at the same position and if
all the character codes are the same
then you know the strings are equal and
as soon as you start writing that out in
actual code it will become really
obvious that there are some performance
optimizations that you can apply so
let's take a look at some code here so
we have our loop right in the center of
this slide but before we even get to the
looping part we can already apply an
optimization we check the length of the
input strings and if the length is
different between the two strings we
know that the strings can never be the
same so at that point we can just return
false right away and this is called an
early exit or an early return that way
we avoid doing any more work and we
bypass the loop completely now another
performance optimization takes place in
the loop itself as soon as we find a
single character code that differs
between the two strings we can break out
of the loop and return false right away
we don't need to check the
remaining characters anymore because we
know there's at least one difference
between the two strings now in terms of
performance the absolute worst case is
when the two strings are equal because
that means you have to complete the
entire loop and compare every single
character and only at the very end you
know for sure that the two strings are
the same and that's when you return true
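The slide with the code is not part of this transcript. A minimal sketch of the kind of comparison function described above could look like the following. The name `naiveEquals` is hypothetical, and this is only an illustration of the two optimizations, not how any particular JavaScript engine implements its equality operator.

```javascript
// Hypothetical helper illustrating the two optimizations described above.
function naiveEquals(a, b) {
  // Optimization 1: strings of different lengths can never be equal,
  // so return early without entering the loop.
  if (a.length !== b.length) {
    return false;
  }
  for (let i = 0; i < a.length; i++) {
    // Optimization 2: stop at the first character code that differs.
    if (a.charCodeAt(i) !== b.charCodeAt(i)) {
      return false;
    }
  }
  // Worst case: every single character had to be compared.
  return true;
}
```

The amount of work done depends on the inputs: equal strings run the whole loop, while strings of different lengths never enter it.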
so with that in mind if we look back on
our earlier results it kind of makes
sense it's clear that there is a
difference in runtime performance
depending on the input for these
functions so if we compare ColdFront
with itself we have to complete the
entire loop before we can return a
value true in this case if we compare
ColdFronT with a T at the end with ColdFront
optimization number two kicks in but
because it only applies at the very end
for the very last character in the
string there's not much of a performance
gain there comparing Pikachu with Pichu is
really fast because the length of both
these strings is different so we just
return false right from the beginning
and we don't even enter the loop after
that we don't even have to compare any
character codes at all when we're
comparing the string CSS with the string
XSS the second optimization kicks in
once again but this time it does give us
a very nice speed advantage because
the very first character of the strings
differs already so at that point we just
exit the loop and return false right
away we don't have to keep on checking
the remaining characters so what's
interesting here is that we're not just
getting a boolean result of true or
false which is what this function was
designed to do in the first place we can
infer even more information just by
looking at how long it takes for the
function call to complete and this is
what is known in the cryptography world
as a side channel leak because the side
channel that is actually leaking
information in this case is the timing
measurement you could say that our
compare function is vulnerable to a
timing attack yeah so a timing attack is
like going to the restroom at a dinner
party depending on how long you're
away for people can kind of figure out
what it is you've been doing in there so
getting back to a comparison function
that is vulnerable well usually it's not
a problem when you
compare two strings you want this to
complete as quickly as possible right
and that is true as long as you're
comparing data that isn't supposed to be
secret the problem starts when user
input is being compared with some hard
coded value that is supposed to remain a
secret like a password or a hash for
example and this happens because each
and every one of those performance
optimizations that we implemented
earlier is now a weakness the first
optimization where we exit early when
the length of the two strings differs
this allows an attacker to figure out
the expected length of the secret string
all they need to do is just try some inputs
of different lengths while measuring how
long it takes for each function call to
complete so the attacker could just send
one character then a string with two
characters then three characters and so
on and they would time how long each
operation takes and as soon as one of
those input strings results in a
slightly longer run time than all the
others you know you have found the
correct length so once you know the
length of the secret string you can
start brute forcing the characters
within the string so you create a string
of the expected length and then you
start brute forcing the first character
you try a then you try the character b
then you try the character c and so on
and once again the attacker times every
operation that they perform and if one
of the input strings results in a
slightly longer run time it means
that we probably end up in the next loop
iteration that's why it takes a little
bit longer so that means that the
correct character at the first position
is found and at that point the attacker
can just repeat this for the next
character and the next character and so
on so the attacker cannot use a timing
attack to figure out the very last
character because at that point you
either return true or false and there is
not much of a performance difference
there but at that point they can just
try every possible combination for that
last character because it's only one
character that they're missing so how
can we avoid this problem for the
compare function well whenever you're
comparing sensitive strings you should
use a safe comparison function and
here's an implementation of that this is
a constant time comparison function how
do you create such a function well you
basically undo all those
performance optimizations
you try to make sure that the code
executes the same number of instructions
or operations regardless of the input so
here you see that I'm still comparing
the length of both strings at the
beginning but instead of just returning
early I'm storing a flag in a variable
and then I still go through the entire
looping process even though I know it's
not really necessary to get the result
that I'm looking for the goal is to make
sure that the function takes about the
same time to complete for any two input
strings okay so this was a very generic
example of a timing attack and at this
point you're probably wondering how
front-end web developers can execute
timing attacks against their visitors on
the client side so i am going to show
you this in just about a minute but
first we need to talk about an old
privacy problem on the web CSS has a
visited pseudo class that enables you to
style links that have been visited by
the user before in a different color or
with a different background or things
like that so in this example all the
links on the page would be green but the
visited links
would have a red color instead I'm sure
I'm not telling you anything new here
but a long time ago in a galaxy far far
away back in 2009 some clever people
discovered a way to figure out someone's
browsing history using nothing but HTML
ok
and at that point this data can be used
by the website to fingerprint the user
and to basically reconstruct their
browsing history because this simple
technique could be performed for
thousands of URLs per second it made it
possible to figure out a sizable portion
of the victims browsing history or to
even uniquely fingerprint the user it
was a huge privacy problem and not just
in theory by the way research showed
that this technique was actually being
used in the wild on various popular
websites mostly by you know evil
advertisement companies let's say and as
a result of this web browsers ended up
making some changes to their
implementation and as a result if you
try this today you will find that
document.querySelector actually
lies to you and it pretends that the
user has never visited any website
before so if you use the visited
pseudo-class in querySelector it won't
give you any results now of course this
particular attack is an attack but it's
not a timing attack I should state that
there is another way of performing the
same kind of thing which was also
discovered around the same time and that
was to use getComputedStyle on the
links so you display the same list of
links
then you read out the rendered color for
each link and then the script
differentiates the visited links from
the non-visited links just by looking at
the color
and that way you can effectively still
figure out which of the URLs the user
has visited before and as a result of
this browser vendors ended up changing
getComputedStyle and also made it tell
white lies it now pretends that the user
has never visited any website before and
once again this is also not a timing
attack and also this technique is very
old like I said and it hasn't worked
since 2010 when browsers started
protecting against it so why am I still
talking about it well I just wanted to
establish that it's really not supposed
to be possible to leak a user's browsing
history like that and in fact browsers
jump through all kinds of hoops to try
to prevent people from doing so and then
in 2013 a researcher named Paul Stone
came up with a new technique that
bypassed these browser protections
he observed that links are re-rendered
whenever their state changes from non
visited to visited or the other way
around so for example when the user
clicks a link at that point the link
becomes visited so the color changes
from blue to purple for example so Paul
also realized that the link's href
attribute can be updated dynamically by
using javascript so what you could do is
you can create an HTML link and point it
to a fake URL on a domain name that
doesn't even exist so that the user
likely won't have visited this URL
before so this way the link is
rendered in the non-visited state then
you update the link's href using
JavaScript to some other real URL that
you want to test and if the user has
visited that URL before then the link is
re-rendered in the visited state if the
user has not visited that URL nothing
changes no re-rendering takes place so
with that information the only thing
missing to still steal the browser
history is a way to detect whether such
a re-rendering event has happened or not
and Paul of course found a way to detect
when these re-renders happen and here's
how he did that he applied some
computation heavy styles such as the
ones you see there like a text shadow
with a massive blur radius and by doing
that he managed to slow down the
rendering of all links on the page until
it could reliably be detected using
JavaScript timers in
requestAnimationFrame so if a given URL
causes a re-rendering it means that the
user has visited that URL before so he
used requestAnimationFrame and every
time he was given access to a frame he
would just get a timestamp and then he
compared it to the timestamp of the
previous frame and by looking at how
long it took for the one frame to
advance to the next frame he was able to
figure out whether a re-rendering
took place in between these two frames
or not because he slowed it down badly
enough so this is three times
essentially the same problem but this
time it has been solved using a timing
attack and timing attacks are in general
just a little bit harder to
defend against from a browser point of
view
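The frame-timing measurement described above can be sketched roughly as follows. Both function names are hypothetical, the threshold is an arbitrary assumption, and the sketch covers only the measurement step, not the link and styling setup. `collectFrameTimestamps` relies on the browser's `requestAnimationFrame`, so it only runs in a page.

```javascript
// Pure helper: given frame timestamps in milliseconds, return the indexes
// of frames that arrived more than thresholdMs after the previous frame.
function findSlowFrames(timestamps, thresholdMs) {
  const slow = [];
  for (let i = 1; i < timestamps.length; i++) {
    if (timestamps[i] - timestamps[i - 1] > thresholdMs) {
      slow.push(i);
    }
  }
  return slow;
}

// Browser-only: record the timestamp passed to each animation frame
// callback, then hand the list to the `done` callback.
function collectFrameTimestamps(frameCount, done) {
  const timestamps = [];
  function onFrame(timestamp) {
    timestamps.push(timestamp);
    if (timestamps.length < frameCount) {
      requestAnimationFrame(onFrame);
    } else {
      done(timestamps);
    }
  }
  requestAnimationFrame(onFrame);
}
```

A frame that takes much longer than its neighbors suggests that an expensive re-render happened in between.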
and here's a very modern timing attack
it's called Sniffly and it was
discovered by Yan Zhu and here's the
demo page it doesn't look super fancy
but it's another timing attack that
detects the victim's browser history
out of a list of domains Sniffly is
able to figure out whether the user has
visited a link on that domain before or
not and here's how it works so for each
domain that sniffly checks it starts a
timer then it loads an image on that
domain over HTTP so not https but just
plain HTTP all these domains in the list
they use strict Transport Security which
means that if the user has visited that
site before the browser will rewrite
that request automatically into HTTPS if
the user hasn't visited that site before
no such rewrite will take place now CSP
or Content Security Policy is being used
on the Sniffly website to restrict
images to HTTP only so images are
blocked before they are being redirected
to HTTPS and whenever an image gets blocked
by CSP the error event fires at that
point the timer is stopped and Sniffly
now knows how long it took for the
image to be redirected from HTTP to
HTTPS and then you just look at how long
it took if this time is only on the order of
say a millisecond then it was probably
an HSTS redirect because no network
request was being made this means that
the user has visited the image's domain
before now if this timing measurement is
more on the order of let's say 100
milliseconds then it's very likely
that a network request occurred
meaning that the user hasn't visited
that image's domain before so this I
thought this was a very clever
combination of different relatively new
techniques on the web that still allowed
us to invade users' privacy and these
attacks that we've seen so far they're all
kind of nice but they're also maybe a
little bit complicated so let's simplify
things a little bit just for now what
does the simplest possible timing attack
on the web look like well
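The slide itself is not part of this transcript. A minimal sketch of such an image-based timing measurement, with hypothetical names, might look like the following. The `createImage` and `now` parameters are assumptions added so that the sketch can be exercised outside a browser; in a page the defaults use `Image` and `performance.now()`.

```javascript
// Hypothetical sketch: time how long it takes until loading `url` as an
// image fails, and pass the elapsed milliseconds to `onResult`.
function timeFailedImageLoad(
  url,
  onResult,
  createImage = () => new Image(),
  now = () => performance.now()
) {
  const img = createImage();
  let start;
  // The error event fires once the browser has fetched the resource and
  // found that it is not a valid image.
  img.onerror = () => {
    onResult(now() - start);
  };
  start = now(); // start the timer
  img.src = url; // setting src kicks off the request
}
```

In a page this could be called as `timeFailedImageLoad(someUrl, (ms) => console.log(ms))`, where `someUrl` points at a non-image resource.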
it looks something like this here's some
very simple code that performs a very
simple timing attack in the browser
first you start a timer and then you
load an HTML document as an image now
the browser will try to parse this HTML
document as an image and that will fail
because in general HTML documents aren't
valid images and when that happens
the error event fires now as soon as
that happens that's when you stop the
timer and then you can log the results
you can compute the difference between
the end of the timer and the first time
stamp that you had and by filling in the
blanks the function will look something
like this which is really not a lot of
code it's still fairly simple so this
timing attack gives you an idea of how
long it took for the user to download a
particular resource and it could be used
for any URL that is not an image so how
is this useful and how could evil
people use this against you well imagine
that a given URL returns either a
very large response body or a very
small one depending on some property of
the user and here's an example of that
this is a wordpress administrator
dashboard now if you visit the admin URL
and you're not logged in you will get a
simple page like this asking you to
login it's just a login form and there's
not much else on this page so the
response body in this case is fairly
lightweight let's say it takes about
750 milliseconds to load
however if you visit the same URL while
you are logged in you will get a
completely different response and it
contains a lot more data than the login
form that we saw before this response
body is much larger and it's fair to
assume that it takes a little bit longer
to download as well so let's say it
takes 1250 milliseconds
to download so by applying the timing
attack on this URL we can figure out
whether a user is logged in to a given
site or not and if you remember those
browsing history leaks from before well
this is actually even worse because it
does not just leak whether you've
visited a site or not it also exposes
whether you actually have an account
there and whether you're currently
logged in or not this gives an idea of
whether you have actively used the site
so yeah getting back to this code
example this is a very simple kind of
technique this is probably the oldest
available timing attack on the web but
it's a little bit clunky and crude and
the main problem with this technique is
that you're essentially measuring
something that goes over the network and
the network is inherently unreliable for
any kind of measurement and this
technique measures download time which
does not necessarily correspond to the
file size because of gzip compression or
Brotli compression and things like that
of course it can be made a little bit
more accurate by just repeating this
technique several times for the same URL
but that means you have to perform
multiple HTTP requests which have
performance implications of their own so
in other words this particular approach
is interesting but not very impressive
or scary and that brings us back to more
modern timing attacks a security
researcher named Tom Van Goethem came up
with a different approach that is much
more robust and accurate by combining
several modern web APIs he found a way
to detect whether a given URL redirects
for the user or not and here's how that
works so first you get a timestamp then
you use the fetch API to request the URL
that you want to test to mimic the image
based request from before you can use
these special options essentially they
make sure that you're sending cookies
along with the request just like the
user would if they were actually
going to request this page for real the
next step is to use the resource timing
API to get the fetch start time stamp
this is the time stamp at which the
fetch actually started this makes sense
right we can then compare this time
stamp to our start time stamp that we
created on the very first line of code
there and of course we expect this
difference to be super small for pages
that do not redirect because we start the
fetch right after getting the start
time stamp now if the difference is
very small that means it was not a
redirect but if it's greater than 10
milliseconds or greater than 100
milliseconds then it was probably a
redirect so this very short piece of
code allows us to figure out whether a
given URL redirects for the user or not
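The steps above can be sketched as follows. The function names and the 10 millisecond default threshold are assumptions for illustration, and the fetch options shown (`no-cors` and `credentials: 'include'`) are my reading of the "special options" mentioned in the talk. `detectRedirect` needs a browser with the Fetch and Resource Timing APIs, and the timing entry may not be available immediately after the fetch resolves in every browser, so the sketch returns `null` in that case.

```javascript
// Pure helper: a large gap between our own start time stamp and the
// fetchStart reported by Resource Timing suggests a redirect happened.
function looksLikeRedirect(startTime, fetchStart, thresholdMs = 10) {
  return fetchStart - startTime > thresholdMs;
}

// Browser-only sketch. `url` should be an absolute URL so that it matches
// the name of the resource timing entry.
async function detectRedirect(url, thresholdMs = 10) {
  const start = performance.now();
  // Send cookies along, as a real navigation by the user would.
  await fetch(url, { mode: 'no-cors', credentials: 'include' });
  const entries = performance.getEntriesByName(url);
  const entry = entries[entries.length - 1];
  if (!entry) {
    return null; // no timing entry available (yet)
  }
  return looksLikeRedirect(start, entry.fetchStart, thresholdMs);
}
```

The threshold would need tuning per site and per device; the values mentioned in the talk are only rough orders of magnitude.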
okay but how is that useful well it
enables you to figure out whether a user
is currently logged into a given website
or not all you need is a URL that
redirects to the login page when the
user is not logged in and then you can
use that URL to test for example if I
want to figure out if someone is
currently logged in on github I could
use github.com/stars this is the page
that displays all the repositories that
you've starred before if you actually
access this page but you're not logged
into your GitHub account it will
redirect you to the
login page but if you are logged in it
will just return the response and that's
that so in this example you see that I'm
logged in to all the websites that you
see open in my tabs so I'm logged into
Facebook what else is there I think it's
Twitter Reddit Instagram GitHub and
just assume that I'm logged in to no
other websites than just those that you
see there now using that tiny piece of
code turns out that we can actually
accurately detect whether the user is
currently logged in to any of those
websites so we tried facebook already
let's try Twitter where I'm also logged
in and yeah it doesn't even take a whole
second for the script to figure this out
the next thing it's going to try is
Amazon where I'm not logged in
and once again it is able to accurately
detect this by the way this demo page is
online I will tweet a link it's in the
slides that I will put online later so
you can try this for yourself and see if
it works for you but for me it got
everything right on the first try yeah
so another new technique that also
improves upon the clunky image based
approach from before is the video
parsing timing attack and this looks
something like this instead of using an
image to download an HTML page we use a
video element instead and this way we
can listen for the suspend event to figure out
when the download has completed and
that's when we start our timer whenever
the error event fires it means the
browser has parsed the HTML document and
realized that it's not actually a valid
video file and at that point we stop the
timer so the timing difference turns out
to correspond to how long it took for
the browser to parse the file and this
gives you a very accurate idea of the
file size when you use this on different
URLs you can compare the file size like
that so this technique completely
eliminates network measurements and
instead it measures the file parsing
time because we only start the timer
after the download has been completed
this gives you a very accurate idea of
how big one file is compared to another
and it's much more reliable than the old
image based approach now another new
attack is this one the cache storage
timing attack it also completely
eliminates the need for any network
measurements and it looks something like
this so first you use the fetch API to
load the URL that you want to test our
timer starts after the download has
completed just like in the last example
and before we put the resource into the
programmable cache using the Cache API in
other words the script times how long it
takes to put this resource in the cache
the longer it takes the larger the
resource is and the main advantage of this
technique is that it is repeatable
without having to repeat the HTTP
request you can just download the
resource once and then insert it into
the cache let's say a thousand times to
get a more accurate measurement so now
that we have an accurate client-side
timing attack well what can we do with
this just imagine for a second that I
work for an evil advertisement company
and I'm trying to find out as much
information as possible through the
advertisements that I'm serving and the
JavaScript that I'm serving about my
site visitors without their consent or
knowledge so this is another simple demo
it combines the video parsing timing
attack and the cache storage timing
attack for accuracy it tries to determine your
favorite U.S. presidential candidate
based on the people you follow on
Twitter now
to do this it makes the very naive
assumption that if you follow more
people that follow Donald Trump than
people that follow Hillary Clinton it
assumes that you are rooting for Trump
and vice versa so when I try this for my
Twitter account yeah it correctly
figured out that I would actually much
prefer Hillary Clinton to be president
over Donald Trump so this is already
kind of scary but it gets even worse I'm
sure you're all familiar with Facebook
right well on Facebook you can create a
Facebook page and once you own a
page you can publish posts on that
page and what was new to me is that
there are actually a lot of tools
available to you to fine-tune these
posts and to target these posts to very
specific audiences so you can restrict
the audience in terms of age gender or
even the language that they speak so in
this example I'm targeting my post at
people in the age range from 25 to 29
who identify as female and who speak
well JavaScript is not a language
apparently so who speak Esperanto let's
say so after creating this post we can
copy its URL and then open it in the
browser and this is what such a post
looks like if you happen to match the
criteria so there's the post content
there's also the comments on the post of
course in this case there are no
comments but you get the idea there's
also a list of suggested pages in the
sidebar so let's say adding this page to
the cache 10 times takes about 30
milliseconds as measured by our timing
attack okay however this is what you get
if you don't match the criteria the
response body in this case is much
smaller because all you're getting is
just this error message and some links
at the top of the page now adding this
page to the cache 10 times takes only 15
milliseconds and this difference in
timing makes it possible to actually
detect the age the gender and the
preferred languages of the user using
nothing but JavaScript
without us having access to their
Facebook account directly and in case
you don't believe me that this is not a
theoretical issue here's an actual demo
of that
which kind of scared me a little bit
this demo nicely visualizes everything
that's going on as it's figuring out
the user's age so here we are performing
six different HTTP requests each
of these requests loads a unique
Facebook post that is targeted to a
specific age range now then we apply the
timing attack by adding each response
body to the cache thousands of times over
and over again and this gives us an
indication of each response's body size
the largest response body of course
belongs to the only post that the user
can access and this reveals the user's
age range so in this case after a couple
of seconds the script already figured
out that the victim well the user that
ran this test is in the 23 to 32
year old range so of course
once we figure that out you can repeat
the process for every number in the
range from 23 to 32 to find out the
exact age of the user and here we perform 10
more HTTP requests just like before the
same kind of trick and after just a
couple of seconds it already becomes
quite clear that the age of the
user in this case is 26 years old so
keep in mind that this demo is visualizing
everything that's going on in the
background [inaudible]
but the scary bit is that the
visualizations could just as well be
hidden this would all be happening in
the background while you're visiting any
website whatsoever you could be reading
an article or watching a cat video or
playing a game while the website runs
the timing attack
on you in the background and there would
be no way for you to notice this
could even be happening in one of the
tabs you currently have open in your
browser right now so this is an actual
photo of my reaction when I first saw the
demo but believe it or not it gets even
worse this is just a couple of weeks ago
earlier this month well it was in August
a new attack named HEIST was
presented at a major security conference
it turns out that HEIST is an acronym it
stands for HTTP Encrypted Information can
be Stolen through TCP-windows I'm not making
this up the guys who wrote the
paper actually came up with this it's kind of
funny but you know it stops being funny
once you actually read the paper this is
how Ars Technica summarizes their
research a new attack is capable of
stealing social security numbers email
addresses and more from HTTPS protected
pages [inaudible]
and
here's what the authors of the paper
have to say about their exploits HEIST
is a set of techniques that exploit timing
side channels in the browser blah blah
blah blah and they're
capable of extracting content like
security tokens from any website that uses
gzip compression which is essentially
every website ever that takes itself
seriously and I would like to highlight
that the main thing that made this whole
HEIST attack possible is client-side
timing attacks they're using a combination of
different timing attack techniques
like the ones that we discussed today
to pull this off
which is kind of scary so I think we've
seen enough examples of attacks and scary
demos so let's talk about prevention
what can we do to prevent this from
happening to us or to our users well as
a web developer there isn't much that you
can do and in fact when I gave my very
first presentation on this topic that
was a much shorter presentation by the
way there was no real solution for this
problem there was nothing that Facebook
could do to stop people from figuring
out their users' age by just running some
JavaScript but now I'm happy to say that
there is there is a new web standard
called SameSite cookies and
unfortunately this is your only option
right now to protect against these attacks
this is a fairly new
feature that only recently shipped in
Chrome and Opera and Firefox has already
expressed interest in implementing it as well
hopefully other browser vendors will
follow suit so you will be able to use
SameSite cookies and have that be
supported everywhere so how do they
work exactly well here's how you set a
regular cookie you basically send an
HTTP header that says set this cookie
this is the cookie name this is the
cookie value and then these are the flags
that I want to apply to the cookie you
should always use the HttpOnly and
Secure flags that's nothing new now if you
want to set a SameSite cookie you just
append the SameSite flag at the end
and you give it a value in this case
I'm using the value Strict and this
causes the cookie to be withheld from any
cross-site usage and this is kind of extreme
[inaudible] even when the user follows a cross-origin link
to the website this cookie will not be sent
for example imagine that Facebook would
implement this and I see a link to a
Facebook post on Twitter I
click on the link on Twitter go to
Facebook and I will just get [inaudible] so
unfortunately SameSite cookies in
strict mode they break a lot of existing
use cases you cannot even follow links
anymore so I don't think giants like
Facebook or Twitter will actually
implement strict SameSite cookies
anytime soon that's kind of unfortunate
but it turns out that there's also
another value that you can provide for
the SameSite cookie you can also use
this lax mode and this doesn't break
links and some other common use cases
but it still prevents the timing attacks
that we've discussed today of course it
doesn't give the same amount of
protection as the strict mode and keep
in mind that people might still come up
with timing attacks that work in lax
mode in the future so while it's not
completely
secure and not the best possible
solution it's still better than nothing
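As a rough illustration of the header described above, here is a small sketch that builds a `Set-Cookie` header line with the `Secure`, `HttpOnly`, and `SameSite` attributes. The function name and the cookie name and value in the usage note are hypothetical; a real application would normally let its web framework set cookies.

```javascript
// Hypothetical helper: build a Set-Cookie header line with a SameSite
// attribute. Only the two modes discussed in the talk are accepted.
function buildSetCookieHeader(name, value, sameSite = 'Lax') {
  if (sameSite !== 'Lax' && sameSite !== 'Strict') {
    throw new Error('sameSite must be "Lax" or "Strict"');
  }
  // Percent-encode the value so it cannot break the header syntax.
  const encoded = encodeURIComponent(value);
  return `Set-Cookie: ${name}=${encoded}; Secure; HttpOnly; SameSite=${sameSite}`;
}
```

For example, `buildSetCookieHeader('session', 'abc123')` yields `Set-Cookie: session=abc123; Secure; HttpOnly; SameSite=Lax`, and passing `'Strict'` as the third argument selects strict mode.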
so consider using SameSite cookies in
lax mode now what can you as a web user
do to protect yourself against these
types of attacks on different websites
well you can block third-party cookies
by default all browsers accept
third-party cookies but most of them
provide a setting that allows you to
block them and here's how you do that in
Opera for example you open the settings
you just literally search for
third-party cookies and hit the check
box and there it is and I would really
recommend everyone here to try this go
use your favorite browser your browser
of choice go to the settings disable
third-party cookies and just give it a
try for a couple of days you know
you'll be surprised by how little impact
this actually has on your everyday
browsing experience and trust me if you
are you know if you are surprised and if
you think it's too annoying and it
breaks too many things for you well just
uncheck the checkbox and go on with your
life and be attacked by a timing attack
every now and then but really give this
a try because it's the only way to
really be secure against these attacks
and against HEIST
thanks for listening