Source maps through the looking glass

Ben Vinegar

Recorded at JSConf EU 2017



[Music]
Hello everybody, and welcome to this talk. It's called Source Maps
Through the Looking Glass. I realize now that a looking glass is not
actually a magnifying glass, it's a mirror, but humor me. The idea of
this talk is that we're just going to do a deep dive into source maps.
Maybe you use them today, but they're kind of a black-box technology,
so we're just going to go deeper and kind of uncover what's there.
My name is, unfortunately, Ben Vinegar. It's a real name. I go on
Twitter by the handle @bentlegen, if you want to check that out. I work
at a company called Sentry. We're an open-source company; we develop a
tool called Sentry that informs you of errors in your production
applications, be they single-page JavaScript applications or server-side
code or mobile apps, and I'll touch on that a little bit later in the
talk. So to kick things off, I just want to talk a little bit about how
JavaScript is used today in 2017. How many people here use, you know, a
modern dialect of JavaScript? Yes: ES6, ES2015, 2017, 2030, 2045? Okay,
so most people. Then you're familiar with this slide, which is a really
simple function. It's using ES6 modules, it's using this rest operator,
and it also has arrow functions. I love all these little language
features; they make writing JavaScript more fun. But I work on a
software product that still targets some older browsers like IE 11, and
I want to make sure that that code works in those browsers. So I run it
through a tool called Babel and it generates code that looks like this,
and this will run in IE 11 just fine.
And I want to keep going, because for some contrived reason I must also
wrap this in webpack, and so I run that through this tool and it
generates a lot more content. But of course I don't actually want to
ship all of these characters to end users, so I run this through Uglify
one more time and I'm left with something like this. This probably seems
pretty familiar for most people. So
somebody famous once said that JavaScript has become the assembly
language of the web. I'd heard this many times over the years, and I
always thought this was something that Brendan Eich said, you know, the
creator of JavaScript. Maybe it was when he announced asm.js or
announced WebAssembly; it seems like the kind of thing that he would
talk about. But it was actually coined all the way back in 2011 by a guy
named Scott Hanselman. He's a blogger and an author, and he was just
sort of observing that when he browsed around his favorite websites, be
it Google or Facebook, the code that was being served to him, like,
nobody wrote this; this was being generated by tools. And this was
before Babel, this was before webpack, etc. And the comparison to
assembly is pretty apt.
It's not just the idea that it's sort of like a compilation target, but
assembly is really hard to read. I don't know how many of you have
worked with compiled languages or even written assembly, but it looks
like this. This is, you know, the actual machine instructions that your
computer uses to do stuff. Some of these commands are stuff like: move a
value from one memory location to a register, do an addition operation
on that register, jump to another location in your program, etc. I have
experimented with trying to debug compiled applications using just
assembly, and I have found this very difficult. I have no idea what my
program is doing whatsoever. Maybe if you're a programming god you might
be able to do this, but I'm incapable. And of course, if you're
debugging in the browser, it doesn't look too dissimilar. My code looks
like this: I have single-letter variables, and all my functions have
been reduced to nonsense. Stepping around, you can kind of maybe
understand what your program is doing, but it's really difficult.
Similarly, this isn't just debugging in Chrome. This is a screenshot
from Sentry, said open-source tool. Again, what we do is we take crash
reports from client-side JavaScript, we send those up to our web server,
and we give you a stack trace to help you reproduce the bug. But it's
not a very good experience when you're dealing with minified code. You
know, for me to tell you, hey, this is a bug and it occurred on line 27,
column 31652: it's kind of difficult to understand what's happening. So
compiled languages
have, for the longest time, had this concept of debug symbols. What that
means is, if I compile an application and I add this, like, dash-dash
debug symbols flag on macOS, then besides the program that I've output I
also get this dSYM folder on macOS, if I'm building with LLDB, or sorry,
LLVM. I don't want to get too deep into that, but if I fire up my
debugger again with my compiled program and those symbol files are
available, the experience of debugging is a lot easier. Now I can
actually step through the code that I wrote, I can actually inspect
variables using their logical names and not using memory addresses or
register locations, and it's plausible to debug things this way. So
debug symbols, you know, they map machine instructions to source
locations and symbols, etc. So why don't we have this? If we've had this
forever in compiled languages, why don't we have them in JavaScript or
in other languages?
Well, JavaScript is different because we're not compiling into some
intermediate form like bytecode or machine code. We're really just
taking text and transforming it to some other piece of text, right? So
the existing debug symbol formats don't really work in this world. And
furthermore, when you're compiling with debug symbols, that's a folder
that you have on your local machine and you're debugging with it; you're
not sending it over the internet back and forth. And so many of these
formats that support debug symbols aren't really designed for
consumption over the web. So this is a bit of a long preamble to, well,
we all know where this is going, which is source maps and what the topic
of this talk is. So if you didn't know, source maps are pretty much just
a JSON file, and it's got a format whose contents let you map filenames,
lines and columns that appear in an output file back into up to N input
source files. The source files can be any kind of text; there's nothing
about the source map format specifically that's designed strictly for
JavaScript. It can be used for things like CSS, Sass or Less, or other
transformations. And it's also optimized for plain-text transfer over
the network, over HTTP; we'll see what that means in a little bit. So, a
little bit of
history. The very first version of the source map spec, I'm not exactly
sure, but I think it was dated around 2009, and it was built for a tool
called Closure Inspector. Does anyone here use Closure Compiler? Okay,
maybe five hands. I'm actually surprised; I thought there would be more
than that. But Closure Compiler is an optimizer and a minifier, kind of
like Uglify, kind of like Prepack if you've looked at that a little bit,
but it's been around for a long time. One of the Closure Compiler
developers wanted to map what they were seeing in their minified,
compiled output back to the original code, and so he built a Firebug
plug-in, and this Firebug plug-in used effectively the very first
version of source maps. It went through a couple of revisions, and the
latest version is actually revision 3, which was written in 2011. So,
you know, six years ago is a long time in our world. It's been updated a
few times, but something to know is that this is just a proposal. You
can't go to MDN or WHATWG and find some really fleshed-out specification
that says what a source map is. It's just a Google Doc that's on the
internet, and at any given moment you can see who's reading this Google
Doc at the same time. It's usually got a dozen people, and you can
attempt to chat, but they're usually not listening. So despite the fact
that this is sort of a janky specification, and it doesn't have a
standards body behind it, it doesn't really matter, because everything
kind of uses it, which is really cool.
Compilers, be it Babel or TypeScript or even Emscripten, which is a
C-to-JavaScript compiler; optimizers like Uglify, Closure and, yes,
Prepack; module bundlers; every browser; tools like Sentry, so we
unminify stack traces. node-source-map-support is an interesting Node
module that will automatically convert stack traces that come from
exceptions into their original format, if you're, you know, maybe using
TypeScript on the server or something like that. So we're going to go
through an end-to-end example to just kind of understand how source maps
work. I'm going to bring this all the way back to this function that I
showed you at the beginning, add.js. It's just an add function with
arrow functions and rest operators, etc. I'm going to run this through
Babel; we're not going to go through the whole webpack and Uglify thing,
just to keep this simple.
So from Babel I can just say, hey, take add.js and output this file at
dist/add.js. I'm using this es2015 preset to target a particular set of
browsers, and then I also specify this source maps configuration. If I
run that command, a couple of things are going to happen. My dist file
is going to be modified a little bit from what I would have gotten if I
had otherwise run this Babel command, and I also get a source map file.
But before I jump to that source map file, let's take a look at the
output file. As I just said, it's pretty much exactly the same as if I
had run it without that source map flag, except for one key addition: it
adds this line to the end of the file, which is this sourceMappingURL
directive. This is the thing that tells browsers and other tools where
to find the source map file that is associated with this JavaScript
file. So the browser downloads your JavaScript file, goes to the very
end, goes to the last line, looks for this comment and goes, "Huh, I
need to download add.js.map," and that path is relative to wherever
you're hosting this dist file.
It doesn't have to be relative; you can specify a full path, and that's
where the browser will download it from. A lot of people talk to me
about how they want to use source maps but don't want to expose them
publicly. Well, there's a couple of tricks you can do with that. For
example, point to a location that's maybe only accessible on a virtual
private network, so only you can download it and other interlopers
can't. You could even host your source map files locally on your own
little web server, so you can point back at localhost and you'll be able
to download that file. Another thing that not everybody is aware of is
that you don't have to use the sourceMappingURL directive. There's
actually a header called SourceMap that you can just send down with your
JavaScript file that is a clue to the browser and other tools where to
find the associated source map. So that's something else you can do, but
not everybody has the power to just arbitrarily change headers. Again, I
mentioned CSS earlier: it doesn't strictly have to be JavaScript. You
can also add this CSS comment to the end of your file too. So if you
want to get started with source maps and you just want to use them, this
is pretty much all you need to know. You use tools, you generate a
source map file, you put it on your web server, the browser will
download it, and then when you start debugging you get to step through
your original code, which is pretty cool. That's it, so I recommend
doing that.
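To make the discovery step a bit more concrete, here is a minimal sketch of how a tool might locate a source map for a downloaded file. It is not how any particular browser does it: the function name `findSourceMapUrl`, the lowercased `sourcemap` header key, the choice to check the header first, and the example.com URLs are all assumptions for illustration.

```javascript
// Hypothetical helper (not from the talk or from any library): find where
// the source map for a generated file lives.
function findSourceMapUrl(generatedCode, scriptUrl, headers = {}) {
  // Assumption: header names were lowercased by whatever fetched the file,
  // and this sketch simply prefers the header when it is present.
  const fromHeader = headers["sourcemap"];
  if (fromHeader) return new URL(fromHeader, scriptUrl).href;

  // Otherwise look at the last non-empty line for the directive.
  // Matches `//# sourceMappingURL=...` (JavaScript) and
  // `/*# sourceMappingURL=... */` (CSS).
  const lines = generatedCode.trimEnd().split("\n");
  const lastLine = lines[lines.length - 1];
  const directive = /^\s*(?:\/\/|\/\*)#\s*sourceMappingURL=(\S+?)\s*(?:\*\/)?\s*$/;
  const match = directive.exec(lastLine);
  if (!match) return null;

  // A relative path resolves against the generated file's own URL;
  // a full URL (for example one on localhost) passes through unchanged.
  return new URL(match[1], scriptUrl).href;
}

const generated = "var add = function () {};\n//# sourceMappingURL=add.js.map\n";
console.log(findSourceMapUrl(generated, "https://example.com/dist/add.js"));
// https://example.com/dist/add.js.map
```

Resolving with the built-in `URL` constructor keeps the relative and absolute cases on one code path, which is why the sketch does not special-case either.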
Similarly, at Sentry we kind of act like a browser. When we see stack
traces that have JavaScript files in them, we'll actually try to fetch
those JavaScript files, and if we see that there's a SourceMap header or
a sourceMappingURL directive, we'll download that source map and apply
it to your stack trace and try to show you the original file, the line
and the column, and we even pull surrounding source code too, which is
kind of cool. In this case, this is actually an example from our live
application with some JSX, and that's kind of neat. But we're going to
go a little bit deeper, because I think it's interesting to just
understand how a source map even works. How many people here have tried
to open a production source map in their text editor? Okay.
Was that a good experience? Okay. Typically this is what happens to me,
which is that my editor pretty much crashes, and that's because source
maps can get really big. It's totally normal for them to be megabytes in
size; we've seen source maps as large as 30 megabytes, which is pretty
absurd. The reason that we started this with a pretty contrived, simple
example is that this entire source map can actually fit in one slide. So
this is what a source map looks like. I've pushed around the white space
a little bit so you can read it a little easier. We'll go through all
the pieces of this really quickly. So the very first thing
is just a version number, and this is just a clue to the browser: what
version of the source map spec am I dealing with? As we learned earlier,
version 3 is the latest thing, from 2011, so pretty much everything says
version 3. The file is, you know, what file is associated with it, like,
what is this source map for? In our case it's add.js. A source map is
associated with one file, one output file. sources is a list of input
files that went into this output file. In our case there's only a single
source file, that's add.js, but if this were a production application
with many components, many modules, whatever, you can imagine it's
dozens, hundreds, maybe even a thousand files. sourcesContent: this is
an optional feature of source maps. Babel actually just inlines all of
our input source code into the source map for convenience. You don't
have to do this, and this is also what contributes to source maps being
so large, but it's pretty convenient because it just works out of the
box. You don't have to deal with other problems, like where do you find
all of these other source files. Lastly,
this is, you know, the biggest part of what makes a source map a source
map, which is this big mappings blob. So to bring that up here, it looks
like this. It kind of looks like a bunch of nonsense, but it isn't.
We're going to walk through actually translating something by hand, just
so you understand what's taking place. One thing to know is that source
map parsers work through a source map, or this mappings property,
linearly. They start at the very beginning, and the very beginning
represents line 0. It's not like random access; you can't just go to the
middle of this blob. You actually have to process the whole thing in
order.
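Since the slide itself isn't in the transcript, here is a rough sketch of the shape of the file being described. The field names (`version`, `file`, `sources`, `sourcesContent`, `names`, `mappings`) are the ones the revision 3 format uses; every value below, including the short `mappings` string, is an illustrative placeholder rather than the talk's actual add.js.map, and `describeSourceMap` is a hypothetical helper.

```javascript
// Hypothetical stand-in for the slide: real field names, illustrative values.
const exampleSourceMapJson = JSON.stringify({
  version: 3,                 // revision of the source map format
  file: "add.js",             // the one output file this map describes
  sources: ["add.js"],        // input files that went into that output
  sourcesContent: ["/* the original add.js text would be inlined here */"],
  names: [],                  // part of the format, not covered in the talk
  mappings: ";;;;;QAIgB",     // placeholder: only the opening of a mappings blob
});

// Summarize a source map without touching the mappings yet.
function describeSourceMap(json) {
  const map = JSON.parse(json); // a source map is plain JSON
  if (map.version !== 3) {
    throw new Error(`unsupported source map version: ${map.version}`);
  }
  return {
    outputFile: map.file,
    inputFiles: map.sources,
    // sourcesContent is optional; when present it lines up with sources.
    hasInlinedSources:
      Array.isArray(map.sourcesContent) &&
      map.sourcesContent.length === map.sources.length,
    mappingsLength: map.mappings.length,
  };
}

console.log(describeSourceMap(exampleSourceMapJson));
```

Everything except `mappings` can be read directly after `JSON.parse`; only the mappings string needs the linear, in-order processing described above.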
So each semicolon denotes a new line. We start at line 0, and you'll
notice that this source map actually begins with 5 semicolons, and that
might seem kind of weird. The reason for that is that Babel outputs some
sort of preamble to the output file for which there's actually no
matching code in our input file, and the source map basically recognizes
that. It's like, hey, just skip over this, because there's nothing for
us to even point to here. So if we continue, we go past those first
opening lines and we get to what's called a segment. Segments are
comma-separated, and these are the things that actually make the
translation from your output source to your input source. Segments are
made of variable-length quantities. Now, what is QA, maybe that's an L
or an I, I'm not actually sure, gB? That's a variable-length quantity,
or VLQ. This is a format that's designed for efficiently encoding
arbitrarily large integers. It's actually an old spec; the values are
sort of base-128 encoded (source maps use a base64 variant, so it's a
little different), and it was designed for MIDI files originally, which
is pretty interesting. So it kind of
looks like this: for small integers, there's one character that matches
this, and this is fine, but as you get larger, 123 is only 2 characters,
and 123456789 is 5 characters, sorry, 6. The idea here is that because a
source map is going over the wire, it's characters, it's plain text, and
we've got to download it, we want to represent that in as small a format
as possible. What's also neat about VLQ is that arbitrary-length tuples
of data can also be encoded efficiently, and that's what that last value
there is, on the bottom right: 0, 1, -1, 123 is represented by these 5
characters. So we get to avoid commas, and we get to avoid the negative
prefix, which is pretty cool. The specifics of how to actually convert
these, I don't really know; I just use this library called vlq, which
you can install via npm, to decode them, and that's how I work through a
lot of these problems. So let's go back to this QAIgB. If I decode this,
I actually get a tuple, which is 8, 0, 4, 16. So what does that mean?
Once you break it down, it's pretty simple. This is where the source map
starts to make sense.
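For anyone curious about the conversion itself, here is a hand-rolled sketch of the base64 VLQ scheme that source map mappings use. It is not the code of the vlq package mentioned above; `encodeVlq` and `decodeVlq` are names made up for this example. Each base64 character carries five bits of data plus a continuation bit, and the sign lives in the lowest bit of the value.

```javascript
const BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode a list of integers as one base64 VLQ string.
function encodeVlq(values) {
  let out = "";
  for (const value of values) {
    // The sign goes in the lowest bit; the magnitude is shifted up by one.
    let n = value < 0 ? (-value << 1) | 1 : value << 1;
    do {
      let digit = n & 31;     // take the next five bits
      n >>>= 5;
      if (n > 0) digit |= 32; // continuation bit: more characters follow
      out += BASE64_CHARS[digit];
    } while (n > 0);
  }
  return out;
}

// Decode one base64 VLQ string back into a list of integers.
function decodeVlq(text) {
  const values = [];
  let acc = 0;
  let shift = 0;
  for (const ch of text) {
    const digit = BASE64_CHARS.indexOf(ch);
    if (digit === -1) throw new Error(`invalid base64 VLQ character: ${ch}`);
    acc += (digit & 31) * 2 ** shift;
    if (digit & 32) {
      shift += 5;             // value continues in the next character
      continue;
    }
    const magnitude = Math.floor(acc / 2);
    values.push(acc % 2 === 1 ? -magnitude : magnitude);
    acc = 0;
    shift = 0;
  }
  return values;
}

console.log(decodeVlq("QAIgB"));             // [ 8, 0, 4, 16 ]
console.log(encodeVlq([123]));               // 2H
console.log(encodeVlq([0, 1, -1, 123]));     // ACD2H
console.log(encodeVlq([123456789]).length);  // 6
```

No separators are needed between values because a character without the continuation bit always ends the current number, which is how the tuple above fits in five characters.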
The very first value is the column in our output file, dist/add.js.
Remember that because we're working linearly, we already know what the
line number is, which is line 5. The second value is an index into the
sources array; we only have a single source, that's add.js, so this
value is 0. Then the final two values are the line in the input source
and the column in the input source file. So if I convert this, this is
basically what this segment is trying to tell us: we are currently on
line 5, column 9 of dist/add.js, and for this particular segment it's
add.js, line 5, column 17. If we break this down and I go back to my
input file and my output file, that's really just this. It's saying,
hey, in the output file add is over here, and in my input file add is
over there. That's pretty much how this thing works.
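The field order just described can be sketched as a tiny function. The name `describeFirstSegment` is hypothetical, and the sketch only covers the first segment of the file, where the decoded values can be read as absolute positions. The decoded values are zero-based, so the sketch adds one to get the numbers an editor would show.

```javascript
// Name the four values of a decoded segment, in the order given in the talk.
function describeFirstSegment(fields, sources) {
  const [outputColumn, sourceIndex, sourceLine, sourceColumn] = fields;
  return {
    outputColumn,                 // column in the output file (zero-based)
    source: sources[sourceIndex], // index into the sources array
    sourceLine,                   // line in the input file (zero-based)
    sourceColumn,                 // column in the input file (zero-based)
  };
}

const firstSegment = describeFirstSegment([8, 0, 4, 16], ["add.js"]);

// Editors and debuggers usually display one-based lines and columns.
const humanReadable =
  `${firstSegment.source}:${firstSegment.sourceLine + 1}:${firstSegment.sourceColumn + 1}`;
console.log(humanReadable); // add.js:5:17
```

The output line is deliberately not part of the segment: it comes from counting semicolons while walking the mappings string.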
One thing to note here is that this is not a character-by-character
translation, right? Source maps could be character by character, there's
nothing that would stop you from doing this, but the idea here is that
we only need to map the start locations of identifiers. That's
efficient, because if we mapped every single character, this mappings
property would be gigantic, but if we only have to do identifiers it's
much smaller. So I'm only going to convert one more value, just to bring
this idea home. The very next segment, if you could see it, is actually
just a capital G, which is a little confusing, because it's like, wait a
minute, didn't I need four values to actually translate something? What
am I going to do with this single value, which is actually an eight? Oh,
actually I think it's a three; I may have messed this up. So segment
values
are relative. This is a space-saving idea, right? I don't need to have
the absolute value for every single segment; I can just work off of what
I was doing in the last segment. So we just add this value to the
previous segment and we get a new location, which is character 11
instead of character 8. Bear with me, it's a little confusing, but the
idea here is that there was another identifier, add; add appeared twice
in this output file, right? So this is actually just linking back to the
same location. Remember, the very first value of that tuple is the
location in the output file, so line 5, column 17, that didn't change.
So what's really happening here is that the source map is telling us
that this add function just appears twice in the output file, which is
kind of interesting. So hopefully you have a
basic understanding of how this is going. I would just keep working
through these values, keep translating them, keep getting new
translations, but the idea here is that I finish in a place where I
basically take all these values and dump them into some data structure,
like a table, so that going forward I don't actually look at the source
map, I just query this table for the data. This is how Web Inspector
works, this is how other tools, Sentry etc., work. They just munch
through a source map, generate this big lookup table, and then the rest
just kind of works by itself. If it seems like a lot to take in, the
good news is that you don't really ever need to know the particulars of
how this works; I just wanted to explain what's happening because it's
kind of neat. There is a tool called source-map. It's on npm, it's from
Mozilla, and it does kind of what we just did: it breaks down the source
map for you and provides an API where you can query this source map to
look up locations for yourself.
The API is a little complex because it does a lot, but it looks like
this: I import this library, I read my source map file from the file
system, I then create what's called a SourceMapConsumer, and then I can
look up the original position for line six, column zero, or whatever my
lookup is, and it will tell me which file, what line, what column.
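To tie the walk-through together without depending on any package, here is a hand-rolled sketch of the "munch through the mappings, build a table, then query it" idea. It is not the source-map library's implementation or API; `buildLookupTable`, `lookUpOriginal` and `decodeSegment` are hypothetical names. The mappings string reuses the first segment from the talk, and the second segment is written in a four-value form (`GAAA`) as an assumption, so that it carries source fields for the sketch to accumulate.

```javascript
const VLQ_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decode one comma-separated segment into its list of integers.
function decodeSegment(text) {
  const values = [];
  let acc = 0;
  let shift = 0;
  for (const ch of text) {
    const digit = VLQ_ALPHABET.indexOf(ch);
    if (digit === -1) throw new Error(`invalid character: ${ch}`);
    acc += (digit & 31) * 2 ** shift;
    if (digit & 32) { shift += 5; continue; } // more characters follow
    values.push(acc % 2 === 1 ? -Math.floor(acc / 2) : Math.floor(acc / 2));
    acc = 0;
    shift = 0;
  }
  return values;
}

// Walk the mappings in order and turn relative values into absolute rows.
function buildLookupTable(map) {
  const table = [];
  // Source fields are relative to the previous segment, even across lines.
  let sourceIndex = 0, sourceLine = 0, sourceColumn = 0;
  map.mappings.split(";").forEach((line, outputLine) => {
    if (line === "") return; // output line with nothing to point back to
    let outputColumn = 0;    // the output column starts over on each line
    for (const segment of line.split(",")) {
      const fields = decodeSegment(segment);
      outputColumn += fields[0];
      if (fields.length < 4) continue; // no source fields to record
      sourceIndex += fields[1];
      sourceLine += fields[2];
      sourceColumn += fields[3];
      table.push({
        outputLine, outputColumn,
        source: map.sources[sourceIndex], sourceLine, sourceColumn,
      });
    }
  });
  return table;
}

// Find the closest row at or before the requested output position.
function lookUpOriginal(table, outputLine, outputColumn) {
  let best = null;
  for (const row of table) {
    if (row.outputLine !== outputLine || row.outputColumn > outputColumn) continue;
    if (!best || row.outputColumn > best.outputColumn) best = row;
  }
  return best;
}

const sketchMap = { sources: ["add.js"], mappings: ";;;;;QAIgB,GAAA" };
const lookupTable = buildLookupTable(sketchMap);
console.log(lookUpOriginal(lookupTable, 5, 11));
```

Both rows in this table point at the same original position (zero-based line 4, column 16 of add.js), one from output column 8 and one from output column 11, which is the "add appears twice" situation described earlier.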
So if you are working with source maps and you've ever struggled with,
like, the lines and columns not matching up exactly the way that you
wanted, I recommend using this library to just look up the values
yourself to see if they make sense. It could spare you a lot of time
with debugging. So this kind of takes me to the end. What did we learn?
A bit of a wrap-up. Source maps are sort of, kind of, debug symbols for
the web; that's close enough. They're just files, lines and columns.
They're optimized for transmission over HTTP; that's why there are VLQ
values, that's why there are these relative segments, etc. And the cool
thing is that almost everything supports them today, so if you're not
using them, you should be using them. So again, my name is Ben Vinegar.
I hope that you found this talk vaguely illuminating. If you want to
find me, there's me on the internet. This is a link to pretty much
everything that we talked about today. Please check out Sentry; it's
really helpful and it's open source, so you can just run it on your own
server.