Augmented Reality - Challenges & Writing AR Experiences in JavaScript

Philipp Nagele

Recorded at GOTO 2017


So first of all, hello and welcome. As you can probably guess from my accent, I'm not Danish, I'm actually Austrian, so if you want to practice German afterwards, that's fine as well, but of course you can also ask me questions afterwards.

For everyone who's currently looking at the GOTO Guide app, there are two things I'd like to mention. One is that you might notice I changed the title of the presentation. That's primarily because things have changed in the AR space so dramatically that the title I decided on two months ago is basically entirely outdated. So I'm doing a slightly different session: I have a conceptual AR part and a technical part, and, like Jason said, things are moving super quickly, so I now call this "GOTO New Realities". The second thing I want to mention is that you have this app where you can basically rate me and my talk, and please make use of that. I'm super interested in how you think this talk went. If you think it was crap, please give me a "crap"; I'm totally fine with that, then I know I have to do things differently. If you think it was good, correspondingly give me a thumbs up.

I know it's late, it's the last session before the evening closing keynote, so I thought about what I could do to keep you awake on the last day of the conference. On one hand it might be the presentation itself, and I hope you're going to like it, but I know for a fact, and if you look at me, I am definitely an expert, that sugar helps in keeping you awake. So I brought some sugar; I will come back to that in a second.
So let me quickly introduce myself. My name is Philipp, as I said. I'm working at a company called Wikitude; my role there is CTO, so I own product and technology and lead the technology team, in a way. I have a master's in computer systems engineering from a university not so far away from here, in Halmstad, Sweden. Don't ask me afterwards to speak Swedish or Danish; the only Swedish and Danish I know is from stupid drinking games. I used to work for a company in Austria called 3 United, which was then acquired by VeriSign, a US-based company, and later on I was a product manager and innovation manager for T-Mobile Austria before switching to Wikitude seven years ago.
So, as I said, the agenda, what you can expect for the next 40 minutes, is a little bit different from what was in the guide. I'd like to give you what I call the State of the Union on AR. It wraps up where we currently stand in AR, and it should give you an idea of where ARKit and ARCore fit in. The second part will then be a little bit more technical: I'm going to show you how you can, for example, code AR experiences in JavaScript with our framework. My intention is not at all to give you a product advertisement; of course I know our product best, so I will reference it quite often, but I hope you won't take this as a product advertisement for the Wikitude SDK. And there will be some time for questions and answers.
The company Wikitude is a startup that was founded nine years ago; at least back then it was a startup. We're located directly in the city of Salzburg, in Austria, which you can see here. Currently our team size is about 30, and strongly growing. The city of Salzburg is primarily known for three things. The first one is The Sound of Music, and I know that everyone in the US knows The Sound of Music; I had no idea about Denmark. Who knows The Sound of Music? Okay, good, I'm safe. The scenery in and around Salzburg is the shooting location for that movie. Fun fact: no one in Austria knows that movie, it's not a common movie there; I saw it for the first time when I was in the US.

The second thing Salzburg is famous for, on a more serious side, is Mozart. It's Mozart's birthplace; there is a Mozart square, there are Mozart statues all over the city, and some genius chocolatier created these sweets called Mozartkugeln a long, long time after Mozart had already died. That's actually what I brought with me. There have been legal battles and claims about whether they are "true" Salzburg Mozartkugeln or "original" Salzburg Mozartkugeln; either way, they will give you the sugar you need to sustain the afternoon. I'm not going over the literal English translation of Mozartkugel. But you have to work for them: I'm going to ask you, the auditorium, a few questions, and whoever has the right answer and shouts it out first is awarded a Mozartkugel. I think I have three questions during my presentation, and I have a few more Mozartkugeln with me, so for everyone who asks a question in the app that we then talk about, I'm going to award another Mozartkugel.
So, first question. I said Salzburg is famous for three things, right? Mozart, The Sound of Music, and then I want to know what the third thing is. This is the headquarters of this third thing, which is very close to Salzburg. Who knows which headquarters of an international company that might be? Red Bull is a very good answer, and it's the correct answer. So, traditionally, I'm going to throw it at you; I hope you're good at catching, I'm better at throwing. Very good. Yeah, so the third thing close to Salzburg is Red Bull. This is the headquarters, and it's probably their smallest building; Red Bull is all over Salzburg. They own the soccer club, they own the ice hockey club, they own a lot of things in Salzburg. And maybe, given a little time, Wikitude will be the fourth thing Salzburg is famous for. Okay, so on to the topic.
You might have heard about VR and AR and read some news, VR versus AR, how are they different; there have been quite some articles recently about that. And then there is another term, mixed reality; Jason used mixed reality as well. Because I usually sense some confusion about those terms, I always start off by explaining them, and a pretty good way to do that is using what is called the reality-virtuality continuum. That's actually pretty old; it was defined in the 90s by a guy called Paul Milgram. On the right-hand side and on the left-hand side you have the extremes of the continuum. On the right-hand side we have reality, what we perceive as reality, nothing changed or altered. On the left-hand side is the other extreme, the virtual environment, which could be a kind of fantasy land: nothing real happening in there. And everything in between is defined as mixed reality. So as soon as you start to mix virtual content with reality, and attach it to a context, that is mixed reality.

Then we have two other terms, AR and AV. If you come from the reality side, have a bigger portion of reality, and superimpose digital information, like I showed you, that's what we call AR, augmented reality. If you come from the virtual side, primarily have virtual content, and add some real content, then it's called augmented virtuality. No one actually uses that term, though I think it makes sense. If you think about Oculus or HTC Vive VR headsets, and imagine you could use your hands as a natural gesture controller and see your hands in virtual reality, that would be an example of AV. Again, no one uses that term. So everything in between those extremes is mixed reality. Sometimes people mix this up and say, okay, mixed reality is AR and VR. No, it's not; that's entirely wrong. VR is VR, reality is reality, and everything in between is mixed reality. I prefer to say augmented reality, because what we've seen lately is mostly coming from the reality side, superimposing some virtual information, but Microsoft, for example, prefers the term MR. In my talk I'm primarily using AR, but for your future reference, MR basically equals AR.

Although VR and mixed reality, or AR, share the same word "reality", they are totally different. If you have been working on VR experiences and you want to move over to AR, you need to learn it very differently: there are very different concepts, the use cases are very different, the benefits are very different. AR and VR do share some core technology that is kind of similar, but the benefits, and what the user expects, are very different.
Another slide, not technical at all, but I think it explains pretty well what's currently happening. This slide is from an analyst company called Digi-Capital. If you are in the field of MR or AR, it's absolutely worth checking out their blog; I think they do pretty good analysis of this area. What they predict is happening currently is the fourth wave of computing: we had the PC as the first wave, the internet as the second wave, mobile as the third wave, and now we have VR/AR/MR as the fourth wave. And if you think of the previous three waves, they have affected all our lives, right? Not just a single use case, a single branch, or a single industry; they have affected our whole daily life. And the prediction here is that AR or MR will also affect all our lives. The major companies driving that are Facebook, Microsoft, Google, and Apple. This is a slide from 2016, so from before ARKit was launched; that's why I think Apple is missing. But what you can see is that AR is already a big boys' game.

Who knows what the Gartner hype cycle is? Have we seen that before? Just so I know whether I should explain it. Okay.
The minority, I think. So the Gartner hype cycle is a report published every year by Gartner, the consulting company; this one is the latest, from July 2017. They found a pattern in how emerging technologies move through time, and they identified five very explicit phases: the innovation trigger, the peak of inflated expectations (the hype), the trough of disillusionment, the slope of enlightenment, and the plateau of productivity. They map all the technologies they see onto this curve, and AR is currently here, which is a very interesting spot. Note that this curve doesn't have anything to do with revenue or anything like that; it's just about the expectations that are set. Usually, if a technology is up here on the peak of the hype, people think it's going to solve everything, that it's the golden bullet for everything. So this part is pretty interesting, because it's the spot right before a technology goes into the mass market and proves whether it actually sustains in the market. There's also a shortcut through this cycle which is very often overlooked. The shortcut goes like this: the technology loses relevance, drops off, and doesn't show up in the market anymore. This could still happen to AR. I don't believe it will, I think AR is here to stay, but it still could happen to this technology. And if you're interested in how this looked in 2016: very boringly, AR stayed basically where it was, and the only thing that changed between 2016 and 2017 was the time axis. I'm showing this to you to put AR into a context, a frame, for you. I agree with their estimate of when this is going to reach mass-market, mainstream adoption: five to ten years. I think we're still very early in AR, and I'll show you a few other parts. So now, back to Arnold.
As an Austrian talking about AR, I have to show you Arnold, there's no way around that, but I'm showing you this for a reason: sometimes AR, or the things you see in it, is referred to as "Terminator vision". That's from the very first Terminator movie, where the Terminator, just before entering the bar, or actually right afterwards, I think, is scanning the area and identifying objects. The thing you see here is actually quite advanced. I think we couldn't do that in a proper way at the moment, because it involves, on one hand, an understanding of the scene; it involves identifying the object; it includes knowledge about the geometric shapes of those objects. You know, the dashed lines that you see indicate occlusion, so there's something in front of and something behind the object of interest. It has a database of all the models, and it matches those models. That's pretty advanced; that's nothing we can currently do. I mean, in some edge cases, yes, but not in a general-purpose way. And here comes my second question for a Mozartkugel, which is about the movie Terminator: who knows how old the movie is? Yeah, what was it? That's correct. How do you know? I mean, how do you know in like 0.5 seconds? I wouldn't have known that. Thanks.
[Applause]
I find it astonishing that this movie is already more than 30 years old, and the creative people on that movie already had this notion of what it potentially could be. I don't think we're super far away from that, but at the moment I think this would not be possible. What I want to draw now is a historic timeline of different projects we've seen with AR, all with the purpose of getting across the point of why I think AR matters. I'm not showing you this to impress you and make you think how cool it is; the idea is that there are one or two key points behind why I think AR matters.

These were very early hardware-based systems in fighter planes and helicopters: head-up displays where the pilot was presented with real-time information, so he could see the target and other things. This is already a kind of augmented reality, right? We're overlaying digital information on the real world. The purpose is that the pilot can watch the scene in the real world, doesn't have to look at the instruments, and still gets immediate feedback about the plane and the status of the scene.

Then we have Pokémon Go, from last year, or two years ago, and other projects that are very similar. From a technology perspective, Pokémon Go is pretty weak AR. Why am I saying this? It's purely using your location, and it's not interacting with that location at all. If you're close by, you get the ability to catch the Pokémon, but it does not interact with the surroundings at all. If this place had a wall next to it, the game just wouldn't know. It's a very simple approach, an approach we saw six years ago in apps that were built with our SDK. There's another app, from the Olympics in Rio, doing a very similar thing: they showed real-time information in a location-based scenario, as did the London Olympics in 2012. From a technology perspective, that's quite simple.
We've seen this, right? AR on the face. Snapchat can recognize your face shape. Faces are pretty uniform in a way: all of us, nearly all of us, have a nose, probably except for Tyrion Lannister; every one of us has eyes; the shape is pretty similar. So face detection is a solved problem in computer vision, and overlaying effects is, in a way, simple as well.
There are other examples; Jason also mentioned them before: augmenting print ads and packaging, which basically requires image recognition, so the app needs an understanding of the image itself. This is an example from Media Markt; I guess this is a known brand here. What they are doing is: their mail leaflet can be augmented with the Media Markt app, and you then see a 3D model of the headphones plus some calls to action to go to the online store; you'll see that in a second. What that requires is that the app itself recognizes the image in the field of view. The idea here is obviously to connect the leaflet with the online store. What you also might have noticed: there are no prices on the digital information, because they personalize that. They know it's you who is looking at the leaflet, and then they personalize the price based on that. That's what they have been trying.
Moving into more advanced scenarios and use cases: field services and maintenance is another big thing in enterprise and industry. Here, AR helps field engineers to be more efficient, so that people can maintain and repair things they probably wouldn't otherwise have been skilled enough for; those are some of the companies we're working with. The same is true for furniture and home decor: you can pre-visualize at home how the furniture is going to look. It helps you save some trouble arguing with your wife at IKEA or another store, and it helps people with not-so-good imagination skills, so that you can check whether it looks good or not. That requires a different technology than the one before: here you basically just place the content somewhere, and it doesn't really matter where; you as the user place the furniture as you like. So the system doesn't need an understanding of what you're currently looking at; it just needs pretty good tracking.

Training and support is another one. I think that's a natural evolution from simulators: you can train with the thing itself. Another example here is remote maintenance, or getting help from someone remote. This requires the user to actually wear smart glasses or carry a camera, so that I can transmit what I'm seeing; the person at the other end is obviously more knowledgeable than I am and can guide me through the repair process. Now, checking batteries might not be the best example, but you don't have to think very hard to find repair processes that are more complex, where you probably would not make it without some external help. That's a very popular use case we hear about quite often, and I think we'll see more of it, particularly in the enterprise and industry space.
Taking a step back from use cases to a more generic view of AR or MR systems, and also VR systems in a way: all of them consist of four parts, a sensing part, a computing part, a visualization part, and a projection part. We sense the natural world, the real world, for one main reason: we want to understand which situation the user is currently in. That's usually a hardware task, and it uses any kind of sensor you have on a mobile phone or mobile device: gyroscopes, accelerometers, the compass; it could be the microphone, like Jason said before; all of that is your fingerprint for knowing which situation you're currently in; and the camera as a sensor. There are many, many sensors there, and they feed into a computing engine: algorithms that try to make sense of all this information coming in. They fuse this information and generate some result that we hand over to the visualization, which is typically a very standard 3D rendering engine, and we output that on different front ends: smartphones, tablets, smart glasses, whatever. And there's a divide: the one side is usually very hardware-oriented, the other side very software-oriented. That's a very generic view of any of those systems. If you replace the projection unit, the same basically holds for VR; if you say this projection unit is a see-through binocular smart glass, you have an AR/MR system, you have HoloLens, basically. You can categorize any of those systems this way. And I think, as technology companies in the AR space, all of us work towards enabling what we call the perfect illusion, and all four parts I mentioned are at a certain step and have an evolution pattern to them.
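The four-part loop described here (sensing, computing, visualization, projection) can be sketched as a minimal frame loop. This is purely an illustration of the architecture; all class names and data shapes are invented, not taken from the Wikitude SDK or any other real API.

```javascript
// Minimal sketch of the four-part AR loop. Everything here is a stub:
// a real system reads live sensors and renders at 60 fps.
class Sensors {
  read() {
    // In a real system: camera frame, gyroscope, accelerometer, compass...
    return { cameraFrame: [0.2, 0.5, 0.9], gyro: { x: 0.01, y: 0.0, z: 0.02 } };
  }
}
class ComputeEngine {
  fuse(sample) {
    // Fuse raw sensor data into a pose estimate for the renderer.
    return { pose: { x: 0, y: 0, z: -1 }, rotation: sample.gyro };
  }
}
class Renderer {
  draw(state) {
    // A real system hands this to a 3D engine; here we just describe it.
    return `cube at (${state.pose.x}, ${state.pose.y}, ${state.pose.z})`;
  }
}
function frame(sensors, engine, renderer) {
  const sample = sensors.read();      // sensing (hardware side)
  const state = engine.fuse(sample);  // computing
  return renderer.draw(state);        // visualization -> projection
}
const out = frame(new Sensors(), new ComputeEngine(), new Renderer());
```

Swapping the projection part, as the talk notes, is what distinguishes the system: the same loop feeds a phone screen, a see-through glass, or a VR headset.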
On the sensing side, I was not super accurate when I said this is purely a hardware play; sensing is both. In sensing on a mobile phone we have mono cameras; we see stereo cameras coming up, which help to do a better job of estimating depth in AR scenes. I think we're going to see HDR cameras, similar to the capabilities of our eyes. We've seen a depth camera with the latest iPhone X, a front-facing depth camera, and I guess we will see that technology on the back-facing side as well, on mass-market devices, not just on Tango devices. We might see radar sensors and lidar sensors, which are currently in autonomous vehicles. These all have the goal, the objective, of getting a better picture, a digital picture, of the surroundings.
On the software side, it's all about understanding the environment; sensing itself is just getting a better digital signal of the surroundings, while the software side is a lot about understanding what the user is seeing. This is an area that is not super advanced so far; at the moment we are probably somewhere here. Software can recognize predefined images. When we're talking about image recognition, this usually works in such a way that you train the system on a certain set of images, and it will then be able to recognize those. The same is true for objects. Shape detection, for example, is something I haven't seen in broad adoption yet: instead of recognizing a predefined image, you just define the outline or the shape of an object, and the system can then recognize it. That's particularly helpful if you don't know the texture, the color, the properties: cars come in various colors, right, and various degrees of dirt, but you still want to be able to recognize the car itself by its shape. Plane detection is something we've seen with ARKit and ARCore, still at a very basic level: you can track multiple planes, so the system will tell you "this here is a plane" and "this one is a different plane". It does not do walls at the moment; we can do floors and tables, but that's it. What we actually want to reach is being able to understand arbitrary shapes. We want to give developers like you the ability to work with the entire geometry of a room; ideally I would like to give you a detailed mesh of these surroundings here, so you can work with that and let the user interact with it. And the very last goal is to provide not only a mesh but also a semantic understanding of the scene: chair, chair, person, person, person, TV. That's something none of the SDKs is doing at the moment, and probably won't do soon, but I think you will see some of those problems solved.
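The early plane-detection stage described above can be pictured like this: the system hands you candidate planes, and only the (near-)horizontal ones, floors and tables, are usable. A toy sketch; the data shapes are invented for illustration, not an SDK's actual callback format.

```javascript
// Toy illustration of early plane detection (ARKit/ARCore 1.0 era):
// only horizontal planes are reported as usable surfaces.
const detections = [
  { id: "floor", normal: { x: 0, y: 1, z: 0 } },
  { id: "wall",  normal: { x: 1, y: 0, z: 0 } },     // vertical: rejected
  { id: "table", normal: { x: 0, y: 0.99, z: 0.05 } },
];
// A plane counts as "horizontal" if its normal points (almost) straight up.
const horizontal = detections.filter(p => Math.abs(p.normal.y) > 0.95);
const ids = horizontal.map(p => p.id); // ["floor", "table"]
```

A detailed room mesh with semantic labels, the end goal named in the talk, would replace this flat list with full geometry per surface.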
When it comes to computing, the status in 2017 is actually quite advanced. I think what ARKit and ARCore did very well is solving this problem. Who knows about the terms VIO or SLAM? Okay, that's good, no one. So, SLAM: SLAM is short for simultaneous localization and mapping, and it's actually a pretty old problem, postulated in robotics. The key idea here is: I'm landing a rover on Mars, or in unknown territory, and the rover itself has to draw a map of its surroundings and at the same time localize itself, position itself, within that map. And this is a chicken-and-egg problem: if I don't know where I am, I cannot draw a map; if I don't have a map, I cannot tell where I am. This is the key problem, and there are various solutions to it. In the context of smartphones, it's basically creating a map of your surroundings; when I say map, it basically means identifying key points, and that is very computationally intensive. That's why we've only seen this quite recently.
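The chicken-and-egg structure of SLAM can be shown with a deliberately tiny 1D toy: alternate between localizing against landmarks already in the map, and placing new landmarks relative to the current position estimate. Entirely illustrative; real SLAM tracks thousands of visual keypoints with probabilistic estimation.

```javascript
// Toy 1D SLAM flavor: one known landmark seeds the map, then we
// alternate localize <-> map, each step relying on the other.
const map = [{ id: "A", pos: 0 }];
let robot = 0;

// Localize: we observe a known landmark at a relative distance in
// front of us, so our position must be landmark.pos - relDist.
function localize(observedId, relDist) {
  const lm = map.find(l => l.id === observedId);
  robot = lm.pos - relDist;
}
// Map: a new landmark seen at relative distance d sits at robot + d.
function addLandmark(id, relDist) {
  map.push({ id, pos: robot + relDist });
}

localize("A", 2);     // we see A two units ahead -> robot at -2
addLandmark("B", 5);  // B five units ahead -> B at 3
localize("B", 1);     // later, B one unit ahead -> robot at 2
```

The point of the toy: `addLandmark` is only as good as the last `localize`, and `localize` only works against landmarks placed earlier, which is exactly the circular dependency the talk describes.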
VIO also comes from robotics; it's short for visual-inertial odometry. Visual meaning I have a visual sensor, a camera; inertial meaning I have a gyroscope; and I'm fusing those two sources of information to get better tracking, a better, let's say, pose or position of the user. The camera is pretty good at detecting texture and texture movement, and also whether I'm moving forward or backward, like our eyes; gyroscopes are pretty good when it comes to rotation, like our ears. The gyro is the equivalent of our ears, and our brain fuses exactly that information and keeps it in sync. Our algorithms do the exact same thing: if the phone is purely rotating, I take more information from the gyroscope and treat that as the dominant movement; if I'm doing mixed movements, I use both information sources. Both Apple and Google have solved that pretty well in ARKit and ARCore, and there will be more in that area as well.
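The gyro/camera weighting just described can be sketched in a complementary-filter flavor: when the motion is (almost) pure rotation, trust the gyroscope heavily; otherwise blend both estimates. The weights and the translation threshold are illustrative, not values from ARKit or ARCore.

```javascript
// Sketch of visual-inertial fusion for one rotation axis.
// gyroDelta / visualDelta: rotation estimates from the two sensors;
// translationMagnitude: how much the device also moved sideways.
function fuseRotation(gyroDelta, visualDelta, translationMagnitude) {
  // Pure rotation: translation is near zero, so weight the gyro heavily.
  const gyroWeight = translationMagnitude < 0.01 ? 0.98 : 0.6;
  return gyroWeight * gyroDelta + (1 - gyroWeight) * visualDelta;
}

const pureRotation = fuseRotation(1.0, 0.8, 0.001); // gyro-dominant
const mixedMotion  = fuseRotation(1.0, 0.8, 0.5);   // blended estimate
```

Production VIO uses probabilistic filters (e.g. extended Kalman filters) rather than a fixed weight, but the "ears for rotation, eyes for translation" intuition from the talk is the same.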
When it comes to visualization, I think this is one of the key parts that is not yet solved sufficiently. Jason talked before about believability, and we're somewhere here: we do have pretty good 3D rendering engines, we have physics engines we can apply, rendering is somewhere at 60 fps or faster (VR systems usually render at 90 fps). But then there is the stuff that makes virtual objects appear even more realistic. Artificial motion blur: if I move, I need to have some motion blur on the 3D objects as well. Atmospheric effects that are inherent to the scene. And then the holy grail, adaptive light rendering, which is the thing I haven't seen yet. Apple is kind of doing that with the front-facing camera, estimating where your main light source is coming from, but it's only for the face, because again the face is a very well-known pattern, so you can roughly tell where the main light source comes from. That doesn't necessarily help me if I want to place virtual content right in here: the lighting in this scene is very different from what the camera currently sees. I have strong light coming from here; in the auditorium you have very homogeneous, soft light. And lighting is probably key to making an AR experience realistic or not; it makes a huge difference. If the light doesn't match your expectation, the expectation of your brain, your brain will immediately tell you this is something artificial.
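A very crude form of the light estimation discussed here is averaging the luminance of camera pixels to drive the renderer's ambient intensity. This is a deliberately simplified sketch (real adaptive light rendering also estimates direction and color temperature); the function and data shape are invented for illustration.

```javascript
// Crude ambient-light estimate: average camera-pixel luminance.
// pixels: array of { r, g, b } values in 0..255.
function estimateAmbient(pixels) {
  // Rec. 601 luma weights approximate perceived brightness.
  const luma = pixels.reduce(
    (sum, p) => sum + 0.299 * p.r + 0.587 * p.g + 0.114 * p.b, 0);
  return luma / pixels.length / 255; // 0 = dark scene, 1 = bright scene
}

const bright = estimateAmbient([{ r: 255, g: 255, b: 255 }]); // ≈ 1
const dark   = estimateAmbient([{ r: 0, g: 0, b: 0 }]);       // 0
```

Feeding such a value into the 3D engine's ambient term is what keeps a virtual object from glowing in a dim room, one small piece of the "believability" problem.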
And when it comes to projection, getting this content back out to you: the predominant way at the moment is smartphones and tablets. There are some HMDs and smart glasses coming out; ODG is one of the vendors, in California, and I guess Apple will introduce smart glasses at some later time. Jason also mentioned HoloLens before, with its very limited field of view; that's super annoying, you know, because if you're wearing a HoloLens you're basically seeing the digital content only in a small window like this, while we perceive the world much more widely around us. So that will be a next step as well. And then, who knows, maybe at some point we will wear smart lenses that inject the digital information, or, going even closer to the brain, have some kind of brain bridge that injects it. That's all still to come, and some of it may be speculative.
But taking a step back to what I call AR 1.0: this is from an excavation site close to Vienna called Carnuntum, which was a Roman city. This arch here stands as it is, and they put up this AR content to visualize how the arch looked two thousand years ago. I call it 1.0 because this is not digital at all; it's a purely analog experience. It's mounted on a kind of plexiglass or glass shield, and it's hand-drawn, showing how this looked two thousand years ago. If you look closely, it even says "eye level", so you should put your eyes at this level to then get the augmentation and see how this looked. So if this already existed thirty or forty years ago, in analog form, there must be more to AR than what we're currently seeing with this digital hype. And I think, in what you saw in the past ten minutes or so, across all of these examples, there are two key takeaways for why AR matters and might matter to you and your company at a later stage.
It makes the invisible visible, in various aspects: the invisible can be real-time data, real-time sensor data in your enterprise; it can be monsters, it can be game content, it can be anything. And the second thing: it connects the offline with the online. Those two possibilities are what the technology itself gives you and your experience, and they might actually transform your business one way or another. And there's one other magic keyword I'm coming to, which I think is super critical for AR
experiences. A few years ago I compared how Google Maps is implemented on desktop, how it's implemented on mobile, and how it could be implemented in AR. This is from the very first Google Glass promotion video; it never actually looked like this, but I thought it's still a good illustration, even if the real thing will look different. The first thing you notice is that across those three screens, for the same service, the density of information presented to the user changes a lot. On desktop you have a huge screen and a lot of options; you can click a lot, you can do a lot. Mobile already has reduced options: the menus are hidden, and the menus you can reach directly from the top level are reduced. And in AR you basically have no options at all; you only get routing from A to B and some information. At the same time, the context in which this information is delivered increases. What I mean by that: for a person sitting in front of a desktop, I have no idea about their location; well, that's not fully true, I roughly know where they are, but if the user is not logged in, I don't actually know who it is, and I don't know whether they're actually looking at this or not. On a mobile phone, we know the location, we roughly know the orientation, and we can guess that the user is currently using the app. And with AR I get even more information, because I basically have the full attention of the user. My claim, my point, is that if you add this up, in AR the relevance of the content you deliver can be higher, because you have so much context.
There's also a debate going on that if you build AR experiences that don't tie into a context, that don't respect this context, that don't use it at all, then AR is just a waste of time, in a way. One of the examples, and I still can't get my head around why Apple did this in their keynote: did anyone watch the Apple keynote with the AR game? A few of you; okay, then I'll quickly explain it. They had a wooden table, and a guy from a game company was showing off their AR game. The AR content basically spanned the entire table, and the guy was walking around. It was a kind of tactical game, not a shooter: you could deploy starships and you had to protect your base. So this guy was walking around the whole time, but the game had no tie to this table except that it was there; it could have been on any other table. So I have no clue why AR matters in this context; I think even they don't. I mean, the game looks fantastic, but there is no difference whether I play it in AR or on the smartphone itself, except that I can move around; there is no benefit in it. So if there is no clear relevance or context in which you deliver your AR experience, then don't do it; it's not worth doing in AR, or you might rethink your experience. So, think context for AR. You get a lot of context, you get a lot of information; context can be, oh yes, the user is currently looking at this particular image. That matters a lot, and that's one of the key aspects, I think, for AR. So, that was the first part; the second
part is how to create a our experiences
in JavaScript so we're moving a little
bit into the technical area but I'm
giving away my third Mozart kugel
This device has been super important for us as a company. If you look very closely it even says Wikitude here; the palm tree was our very first app, and I'll tell you that story in a second. My question is: what's the name of this device? One person, that's unbelievable. It's an HTC phone, that's right. It was not the developer phone, it was the first commercial Android phone; the developer phone looked different. Yes, as you can see it was marketed by T-Mobile, so it was the T-Mobile G1, also known as the HTC Dream. But still, okay, I can't throw that far. Yes, it was the HTC Dream and the T-Mobile G1. I ask this question at nearly every talk, and we then look up whether it's the HTC Dream or not. It's interesting that so many people don't recognize it; it's really an iconic device, the very first commercial Android phone. If I put up the very first iPhone, I guess 99% of people would say, yes, that's the very first iPhone. So what's the story of Android and this device with the awkward slide-up mechanism? The story is that Google ran the Android Developer Challenge to boost the ecosystem and awarded ten million dollars to app developers, distributed among 100 winners. Wikitude was one of the top 20 apps back then, and the prize money became the funding for the company. As part of that, the app was also pre-installed on the device. So that's the story of the T-Mobile G1, the Google device.
Kind of mapping what we heard before from the components onto an SDK architecture slide: this is actually ours, Wikitude's, but I think most of the SDKs out there have similar components. What you see as the hardware layer, or base layer, is what we pack into the SDK: camera access, access to the gyroscope and the IMU, the inertial measurement unit, and then tons of code optimizations for nearly all of the mobile platforms. There are some phones and smart glasses running on Intel, but the majority runs on ARM; ARMv8 is the 64-bit architecture, so our code is particularly optimized for 64-bit. Not only does it run on 64-bit, it makes use of the richer registers. And there are a decent number of GPU optimizations to make the code run in an acceptable manner, because most of the CV part, the computer vision part, is quite heavy. The layer on top is our core components: OpenGL rendering, Metal rendering, and then our own SLAM engine; we are integrating ARKit in our core and kind of wrapping it. There's an engine part that does image recognition and object recognition (I'll come to that in a second), and there's a cloud recognition part as well; I'm not talking about the plugin manager today. Then there are components doing the rendering, with a lightweight rendering engine. As Jason said, rendering on a mobile phone is a very different story than rendering on a desktop or for a movie. And then there are the augmentations, that's the content that is overlaid, and the location-based services. Our SDK runs on Android and iOS and will run on Windows 10. There are several ways you can create applications, but that's not the point here today; what I'm going to talk about is the JavaScript API. This JavaScript API wraps everything I said before.
So we're kind of looking at this layer: you're creating your experience, or your app, at this level. Is anyone familiar with Cordova, Titanium, Xamarin? One, two, three, four, five, six, seven, eight, nine, ten. Yes, you could use those frameworks as well if you'd like to work with us, but again, that's not the point here. If you're going to work with the JavaScript API, you still have to include the SDK in an app. There's a very slim native API (I'm talking here about Android, so a very slim Java, soon to be Kotlin, API) that basically just wraps lifecycle handling, so onCreate, onPause and onResume methods, and loads an Architect World. Let me introduce the term Architect World: it's our own term for an AR experience written in JavaScript. It's a pretty old term; as Wikitude comes from geo-based AR, we had this notion of a "world", and "Architect" was our play on AR and creating something. There are other things as well, but they were optional, so I'm not going to talk about them. So how do we do that?
How do we actually create, or render, JavaScript, or allow people to write augmented reality? What we have internally are two views. There is a native OpenGL view that renders the camera stream and renders the augmentations, and on top of that a standard web view from Android or iOS that loads the HTML file and basically does file loading and asset loading. It's kind of special because it has a transparent background, so those two views are merged on the phone, and you will see how that works. An experience for the JavaScript API consists of regular HTML files, JavaScript files and CSS files; this is what defines your experience. If you include this snippet, it will make the API work. It looks like a regular call to a website, and the question that normally comes now is: would that work in Chrome on a mobile phone, or in any other browser? No, it will not, not yet at least. You still need to have the SDK loaded. The SDK injects the content of the script; even if you call the URL, you will get a 404, but the SDK loads that script on the device itself and then makes the experience you throw into the SDK work, executing and triggering actions in the C++ layer. One thing I didn't mention before: everything from the hardware layer up through the core is C++, the hardware layer itself is C++ and assembly, and everything above that is JavaScript or the native languages, like Jason said before; I'm kind of nearly copying him.
What can you show as an augmentation, what can you show as content? You can show images; videos, including videos with alpha channels, that is, transparent-background videos; 3D models of any kind; web views; and labels. And you see the corresponding JavaScript classes in our namespace. We kind of have an MVC architecture methodology applied: you have models, you have associated views, and a controller to glue them together. The models, as you see, correspond to the things you can do in the SDK; the views correspond to what I said before, what you can actually show; and the controller then glues that together.
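As a rough orientation, the content types just listed map onto drawable classes in the Wikitude AR namespace roughly as below. The class names are from the 2017-era JavaScript API as I recall it; treat them as illustrative rather than authoritative.

```javascript
// Rough mapping from the augmentation types mentioned above to the
// drawable classes in the Wikitude "AR" namespace (2017-era names,
// listed here for orientation only, not as an API reference).
const augmentationClasses = {
  images: "AR.ImageDrawable",
  videos: "AR.VideoDrawable", // also covers alpha / transparent videos
  models3d: "AR.Model",       // 3D models of any kind
  webViews: "AR.HtmlDrawable",
  labels: "AR.Label",
};
```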
Here's a pretty old example, but I think it explains the different aspects you've been hearing about very well. In the regular web view you can render any kind of HTML content; that's usually used for menus, user interaction, static content. You can include any JavaScript library that runs on a mobile phone, whatever you want. In this case we're looking at an application from hotels.com: it shows open hotels around you, and they are geo-referenced. So this part is rendered in the AR view, but still defined by you in JavaScript, and as I said, you can include any JavaScript library. The project, as you can see, already has some years on its back (it's a jQuery implementation), but that's totally doable. So that was an example of geo-located, location-based content.
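A geo-located experience like the hotels one can be sketched roughly as follows. In a real Architect World the AR namespace is injected by the SDK at load time; here it is stubbed with minimal constructors so the sketch runs on its own, and the hotel data and constructor signatures are simplified illustrations, not the actual hotels.com code.

```javascript
// Minimal stand-in for the SDK-injected "AR" namespace, so this sketch
// is self-contained. In a real Architect World these classes come from
// the SDK and actually render into the camera view.
const AR = {
  GeoLocation: function (latitude, longitude) {
    this.latitude = latitude;
    this.longitude = longitude;
  },
  Label: function (text, height) {
    this.text = text;
    this.height = height;
  },
  GeoObject: function (location, options) {
    this.location = location;           // the model: where the content sits
    this.drawables = options.drawables; // the views shown at that place
  },
};

// Made-up hotel data; a real app would fetch this from a service.
const hotels = [
  { name: "Hotel Alexandra", lat: 55.674, lon: 12.565 },
  { name: "Hotel Astoria", lat: 55.672, lon: 12.563 },
];

// One GeoObject per hotel: a geo-referenced location (model) plus a
// label drawable (view), glued together as described above.
const markers = hotels.map(function (hotel) {
  const location = new AR.GeoLocation(hotel.lat, hotel.lon);
  const label = new AR.Label(hotel.name, 1);
  return new AR.GeoObject(location, { drawables: { cam: [label] } });
});
```

The point of the structure is the model/view/controller split from the previous slide: the location is the model, the label is the view, and the GeoObject plays the controller that ties them together.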
When we move to image-based or marker-based experiences: the term marker spans everything from very ugly fiducial markers or QR-code markers to natural images to objects, anything that is predefined and can then be detected by the engine. In this case, here's how it works: you upload reference images like these to the target management tool; you can do that in a web UI or through a REST API. The target management tool then creates a digital footprint of those images in the background, and what we do in the engine is a fingerprint match of what we currently see in the camera against the database you provided. By the way, you can try all of that for free; I didn't mention that before, in case you're curious.
In this JavaScript example we're using what we call a WTC file, a target collection file, to define what we're actually going to search for. The example here is taken from a furniture catalog: we want to augment this image here in the catalog, which I think was actually in grayscale, and we want to overlay a video of what this kitchen looks like, together with a button.
Basically, those four lines of code get you there. As I said, first the target collection resource defines what kind of images we want to be able to recognize; we created that before and downloaded this file, and you can load it remotely or locally. Then we tell the system we're talking about an image tracker that uses this target collection resource, so we're looking for images; there is also an object tracker and an instant tracker. The overlay is what we call a video drawable; this one comes from a local asset and is rendered at 65 percent size. And then the image trackable puts it all together: we tell it which tracker to use and which image we actually want to be recognized, and we draw this overlay only on that image.
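Those four steps can be sketched like this. Again, the AR namespace is injected by the SDK in a real Architect World; here it is stubbed with minimal constructors so the sketch is self-contained, and the file names, target name, and exact constructor signatures are illustrative approximations of the Wikitude JavaScript API, not copied from the real sample.

```javascript
// Stand-in for the SDK-injected "AR" namespace so the sketch runs alone.
const AR = {
  TargetCollectionResource: function (url) { this.url = url; },
  ImageTracker: function (resource) { this.resource = resource; },
  VideoDrawable: function (uri, size) { this.uri = uri; this.size = size; },
  ImageTrackable: function (tracker, targetName, options) {
    this.tracker = tracker;             // which tracker watches the camera
    this.targetName = targetName;       // which image in the collection
    this.drawables = options.drawables; // what to overlay on that image
  },
};

// 1. The target collection (.wtc) file defines the recognizable images;
//    it can be loaded remotely or, as here, from a local asset.
const targetCollection = new AR.TargetCollectionResource("assets/catalog.wtc");

// 2. An image tracker searches camera frames for those images.
const tracker = new AR.ImageTracker(targetCollection);

// 3. The overlay: a video from a local asset, rendered at 65% size.
const video = new AR.VideoDrawable("assets/kitchen.mp4", 0.65);

// 4. The trackable glues it together: this tracker, this target image,
//    and this overlay drawn only on that image.
const trackable = new AR.ImageTrackable(tracker, "kitchenPage", {
  drawables: { cam: [video] },
});
```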
Good, because this is my very last slide. You can try that on your own; it's pretty simple, and there are more materials on our website if you're interested. This was one of the examples of image recognition. I thank you for your patience this evening; at least I didn't see anyone fall asleep. I think I have four more Mozartkugeln left; I don't know whether we'll get four questions, but thanks again. If you have any questions, I'm here afterwards, or you can drop me an email. Thank you, thank you. Okay, so for questions, happy to