Transcriptions
Note: this content has been automatically generated.
00:00:02
So, yeah, everybody's talking about it, obviously,
00:00:07
and we are collecting more and more of it. A study by IDC forecasts that by 2020 we will have
00:00:12
something like 35 zettabytes of machine-generated data. But the essence of collecting data is getting insight out of it,
00:00:19
and that is why we are introducing new applications: applications of an exploratory nature, which means
00:00:24
that they are dynamic, so we don't know a priori what kind of
00:00:27
workload is going to be run, and also that the queries
00:00:30
depend on the data as well as on the results of prior queries.
00:00:35
Examples of such applications are scientific exploration applications, such as the Human Brain
00:00:40
project or astronomical observation experiments, as well as modern Internet of Things applications,
00:00:44
where the user does not know exactly what he is searching for; he is looking for interesting patterns.
00:00:49
And the increasing data collections, as well as the exploratory nature of modern applications,
00:00:56
create new challenges for data processing systems. Specifically, the user
00:00:59
wants instant access to data, so minimal preprocessing time.
00:01:02
Also, the productivity of the user relies on
00:01:06
interactive query response times. And finally, the increasing data size
00:01:11
also increases the requirements for storage and computation,
00:01:15
so coming up with cost-efficient storage and computation solutions is another challenge
00:01:20
we have to face. But let us see why these are actually problems.
00:01:23
So, conventionally, in order to explore data, scientists use databases.
00:01:29
However, in order to start querying, you first have to load and index
00:01:32
the data. Indices are essentially redundant
00:01:36
overlay data structures which make known data access paths faster.
00:01:41
However, the trouble is that actually to
00:01:45
load and index this data, this preparation step, is very time consuming.
00:01:49
On the right-hand side you actually see the cumulative execution
00:01:52
time for executing an Internet of Things workload,
00:01:57
and what we see is that, since the user actually has to load and index the data, the
00:02:02
grey line, which shows the query time, starts off high, since it contains the preprocessing time.
00:02:09
In my research, what I actually do is try to
00:02:15
enable the existing
00:02:19
query engines with interactive exploration capabilities,
00:02:24
by taking advantage of the underlying data distributions and adapting to the workload while we run queries.
00:02:30
In order to achieve that, I develop online tuning algorithms which adaptively build
00:02:36
overlay data structures as a byproduct of query execution, and also by reducing the resource utilization requirements.
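
[Editor's illustration, not part of the talk.] The idea of building an index as a side effect of running queries can be sketched as follows. This is a minimal example in the style of database cracking, not necessarily the speaker's own algorithm; the class name `CrackedColumn` and its methods are hypothetical. Each range query partitions only the piece of the column it touches, so the column becomes progressively more ordered without any up-front indexing step.

```python
import bisect


class CrackedColumn:
    """Hypothetical sketch: a column that queries reorder as they run."""

    def __init__(self, values):
        self.values = list(values)  # private copy that queries may reorder
        self.pivots = []            # sorted pivot values seen so far
        self.starts = []            # starts[i]: first position with value >= pivots[i]

    def _crack(self, pivot):
        """Partition the piece containing `pivot`; return the split position."""
        i = bisect.bisect_left(self.pivots, pivot)
        if i < len(self.pivots) and self.pivots[i] == pivot:
            return self.starts[i]  # this boundary is already known
        # Only the piece between the neighbouring pivots has to be touched.
        lo = self.starts[i - 1] if i > 0 else 0
        hi = self.starts[i] if i < len(self.pivots) else len(self.values)
        piece = self.values[lo:hi]
        smaller = [v for v in piece if v < pivot]
        larger = [v for v in piece if v >= pivot]
        self.values[lo:hi] = smaller + larger
        split = lo + len(smaller)
        self.pivots.insert(i, pivot)
        self.starts.insert(i, split)
        return split

    def range_query(self, low, high):
        """Return all values v with low <= v < high (in arbitrary order)."""
        if low >= high:
            return []
        start = self._crack(low)
        end = self._crack(high)
        return self.values[start:end]


col = CrackedColumn([7, 2, 9, 4, 1, 8, 3])
first = col.range_query(3, 8)   # scans the whole column once, remembers two boundaries
second = col.range_query(4, 9)  # only touches the two pieces that contain 4 and 9
```

The first query pays roughly the cost of a scan; later queries touch smaller and smaller pieces, which is the trade-off the speaker contrasts with loading and indexing everything before the first query.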
00:02:44
So, by doing that, we actually remove the
00:02:46
requirement for preprocessing, we reduce the storage overhead,
00:02:50
and we enable the user to explore data efficiently while