Transcriptions
Note: this content has been automatically generated.
00:00:00
Exactly. So thank you very much for the invitation and for the nice introduction.
00:00:04
So I'm not going to be too long, because it's almost time for the
00:00:08
apéro, but today I'm going to talk about genomics, AI and privacy. So
00:00:12
just before anything else, let me start with a little introduction about our company. It's called
00:00:17
Nexco Analytics. We are an artificial intelligence and data analysis
00:00:22
service provider, so we work with academia, we work with
00:00:25
the pharma industry and also [unclear], and
00:00:29
we're focused on life science, so we are experts in NGS, which is next-generation sequencing:
00:00:35
genome sequencing, transcriptomics, which is sequencing of the expression of genes, and for
00:00:41
all this we have IP that we developed together for more than ten years,
00:00:46
and that's actually where we specialise. So before going into
00:00:51
genomics, let me just start with a situation that
00:00:54
I'm sure all of you have encountered once: you wake up, you want to go for a walk or a run,
00:01:00
and your shoes look like that, so they are completely broken. So you go on your favourite website to actually buy
00:01:06
some shoes, and this seems completely harmless. But actually we all,
00:01:11
everyone in this room, I think, know that we are actually feeding
00:01:14
quite a lot of data to the websites we are visiting and buying our next gift from.
00:01:19
And it seems that we are anonymous in all of this data, because we
00:01:23
know we give data, but with the amount of data that we are actually giving to everyone
00:01:28
and the number of people doing the same, maybe it's difficult for us to be traced back, right, even if we're not logged in.
00:01:35
But actually it's not true: you just need to have a couple of data
00:01:37
points to be able to infer precisely who is who, actually.
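As a rough illustration of why a couple of data points can be enough, the sketch below measures "unicity" on purely synthetic purchase traces: the fraction of people whose trace is the only one containing k known points. The (shop, day) encoding, the sizes and the function names are assumptions made for this example, not data or code from any real study.

```python
import random

def synthetic_traces(n_users=500, n_shops=50, n_days=30, n_purchases=12, seed=1):
    """Hypothetical data: each user is a set of (shop, day) purchase points."""
    rng = random.Random(seed)
    return {
        user: {(rng.randrange(n_shops), rng.randrange(n_days))
               for _ in range(n_purchases)}
        for user in range(n_users)
    }

def unicity(traces, k, seed=0):
    """Fraction of users singled out by k points taken from their own trace."""
    rng = random.Random(seed)
    unique = 0
    for user, points in traces.items():
        # Fixed random ordering per user, so the k known points are always
        # a prefix of the k+1 known points.
        known = rng.sample(sorted(points), len(points))[:k]
        matches = [u for u, p in traces.items() if all(pt in p for pt in known)]
        if matches == [user]:
            unique += 1
    return unique / len(traces)

traces = synthetic_traces()
scores = {k: unicity(traces, k) for k in (1, 2, 4)}
for k, value in scores.items():
    print(f"{k} known point(s): {value:.0%} of users uniquely identified")
```

Because each extra known point can only remove candidates, the unicity never decreases as k grows.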
00:01:42
And it's not only me telling you; I mean, it's been proven also by these MIT researchers, who
00:01:47
showed that with just a few data points from your credit card, actually four data points,
00:01:53
from 1.1 million people, they could actually trace ninety percent of
00:01:58
the people uniquely. So they don't need to know what you buy, just the time and date, and
00:02:02
that's just enough to uniquely identify people. So if you have habits, it's very easy to find you and
00:02:07
recognise your behaviour in all of this. So how does this translate to genomics? Well, actually,
00:02:13
as you may know, we are all formed by a genome, right: you need some instructions to construct a human
00:02:19
being from beginning to end, so it's quite a lot of
00:02:22
information. But how much information actually is in there? So,
00:02:28
it's not moving, I guess because the picture is big for the computer, but, so, it's all encoded
00:02:33
into four letters, right: it's this A, T, G, C that you may
00:02:38
know, that are found in DNA. It's just the names of the different molecules that are
00:02:42
linked to each other to make this very long string of characters that defines our genome.
00:02:48
And how long is it exactly for a human? Well, it's three billion of these letters, one after the other.
00:02:54
If you wanted to print it, you would actually end up with a thousand
00:02:58
books of a thousand pages, a thousand books of a thousand pages,
00:03:01
which is a million pages of little letters that
00:03:05
define just who you are, actually. And to discriminate between two individuals
00:03:10
from these three billion bases, you just need to look at seventy places,
00:03:14
statistically, to distinguish one of the eight billion persons living on this planet. So
00:03:19
it looks like a lot of information, but actually to discriminate individuals it's quite easy,
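The two orders of magnitude mentioned here can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the assumption that the seventy positions are independent and that each has two alleles at 50% frequency is mine, chosen to keep the calculation simple.

```python
GENOME_LETTERS = 3_000_000_000
BOOKS = 1_000
PAGES_PER_BOOK = 1_000

# A thousand books of a thousand pages is one million pages.
total_pages = BOOKS * PAGES_PER_BOOK
letters_per_page = GENOME_LETTERS // total_pages

# Assumption: 70 independent positions, two alleles each at 50% frequency,
# so the three possible genotypes have probabilities 1/4, 1/2 and 1/4.
p_same_site = 0.25**2 + 0.5**2 + 0.25**2      # two random people agree at one site
p_same_all = p_same_site ** 70                # ... and at all seventy sites

population = 8_000_000_000
pairs = population * (population - 1) // 2
expected_matching_pairs = pairs * p_same_all

print(f"{total_pages:,} pages, {letters_per_page:,} letters per page")
print(f"chance two people match at all 70 sites: {p_same_all:.1e}")
print(f"expected matching pairs on Earth: {expected_matching_pairs:.1e}")
```

Under these assumptions the expected number of pairs of people sharing all seventy genotypes is far below one, which is the sense in which a few dozen positions are enough.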
00:03:25
and you are not completely lost in the amount of information that you have. So what actually can you read
00:03:30
in the genome, and what is there to protect? So genealogy, obviously; I mean, we all
00:03:37
did [unclear], so we know that genealogy is obviously something you can infer from the genome. Just looking at
00:03:43
mutations, you can infer your parents, your siblings and your cousins, which some people actually do for a living.
00:03:51
Can you actually track genetic diseases and cancer risk just looking at the mutations you have in your DNA?
00:03:56
Well, the answer is obviously yes, because we know that you have this precision medicine that's booming now,
00:04:03
and, as you know, DeepMind, well, after beating
00:04:06
the chess game, the Go game, the protein folding game, they published a
00:04:12
paper like a month ago now on what is called AlphaMissense, which is a tool to
00:04:18
predict mutations and to know the pathogenicity of each mutation
00:04:22
in the genome. So you take the full genome, these three billion bases,
00:04:25
and for each of the different mutations you can have compared to a normal individual,
00:04:30
you can measure the pathogenicity: is there a risk for this disease, yes or no,
00:04:34
and you have a risk factor. So it just came out now.
00:04:38
Can you look at the hair colour and eye colour? Yes, obviously; this we all know, it just comes from the normal courses
00:04:44
that we have all done. But what about height? Is it possible actually to know the height of an individual just looking at his DNA?
00:04:51
Well, actually yes: you have studies that show that you have a big
00:04:54
factor that is coming from genetics, and you can have a pretty good estimate,
00:04:58
within about ten centimetres. It's not going to be extremely precise; like for the age, you're
00:05:03
not going to be exact, but you have a very good idea. And weight, actually, is the same,
00:05:08
because you have a factor which is genetic, from the
00:05:11
metabolism that you have. Of course you can estimate it, but then
00:05:16
everything will depend on your diet, the sport you do and your lifestyle. But in general you can have a very good estimate.
00:05:22
The age: from the face, apparently, it's easy; from
00:05:27
the DNA, is it also possible? Yes, because you have parts of the genome
00:05:31
that change with age: it's like the ends of the chromosomes,
00:05:34
the telomeres, shrinking and becoming smaller, so you can have a pretty good estimate.
00:05:38
And finally, is it even possible to estimate the face of someone just looking at their genome? And actually
00:05:44
yes: people have trained neural networks to recognise and link the mutations,
00:05:50
and the genome in general, to the shape of faces, and it's actually doing a very good job. You know, it's
00:05:56
like these portraits they do for the police: it looks very similar. It's not the precise picture, but it's close enough.
00:06:03
So with all of this, and I'm not telling you anything new obviously, it's very clear that
00:06:07
genetic data is very sensitive and needs to be very secure, because it's a privacy concern, obviously.
00:06:15
So how long... ah yes, one last thing: treatment efficacy, which is where the field is moving now,
00:06:22
because you can look at the mutations and you can actually predict
00:06:24
the effect of treatments, for instance, let's say, for cancer. You can have a stratification
00:06:28
of the different diseases and you can have the best treatment, which of course is a big win-win: for the
00:06:34
pharma industry, if they can predict whether the drug works or not, and of course for the patients.
00:06:41
So how long exactly does it take to have all of this information? Well,
00:06:45
with the new technologies that are developed now, this is the NovaSeq X from Illumina,
00:06:49
one of the sequencing machines that is the most common now, it just takes one hour to have your
00:06:54
full genome. So in the time it takes you to eat lunch, they can actually sequence your full genome.
00:07:01
So it's very quick, and this goes of course with
00:07:04
a drop in the general price of sequencing,
00:07:07
[unclear], I guess you can agree. When they
00:07:11
started sequencing the genome, so the Human Genome Project, in 2001,
00:07:15
the cost was a hundred million dollars, and in 2021 it's less than one thousand dollars,
00:07:20
and they are talking now about less than a hundred dollars to sequence a full human genome. So it's getting cheaper and cheaper.
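To put the two quoted price points in perspective, here is a small calculation of the implied rate of decline. It assumes a smooth exponential drop between 2001 and 2021, which is a simplification made for this sketch; real prices fell in steps.

```python
import math

cost_2001 = 100_000_000   # dollars per genome, as quoted for 2001
cost_2021 = 1_000         # dollars per genome, upper bound quoted for 2021

fold_reduction = cost_2001 / cost_2021
years = 2021 - 2001

# Assumption: a constant exponential decline over the whole period.
halvings = math.log2(fold_reduction)
years_per_halving = years / halvings

print(f"{fold_reduction:,.0f}-fold cheaper in {years} years")
print(f"price halved roughly every {years_per_halving:.1f} years")
```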
00:07:27
And obviously, with this price dropping, you have
00:07:30
a general adoption of precision diagnostics. As I was saying,
00:07:35
you can have a good idea whether a drug can work or not, or you can better characterise the diseases.
00:07:40
It's much better for, let's say, the whole health system, so they're trying to push this, and you have
00:07:47
this increase that is predicted for the diagnostics market:
00:07:50
actually the precision diagnostics market would increase a lot during the following years.
00:07:54
And this is something that [unclear]; it's from [unclear], this study from Fortune Business Insights.
00:08:00
So yeah, you have this increase of data, of genetic data, in the public domain, and also
00:08:06
in general you have more and more of this data, and as I was showing at the beginning, the more data you have, the less privacy you have.
00:08:12
So you need to have solutions for this. The different privacy attacks
00:08:19
that can happen in genomics are the same that were discussed before in the previous talks.
00:08:23
So you have re-identification and phenotype inference. What
00:08:28
everyone is doing, obviously, to comply with the different laws is to have anonymised data,
00:08:33
or de-identification of the big data sets. So you have all the genetic data, but
00:08:37
you have no information on the names of the people in the study, obviously.
00:08:44
And this actually doesn't work, as was said in the first talk, because you have this paper from, I
00:08:48
think, 2013, where they re-identified the participants in a big study by name,
00:08:54
just by crossing different data sets. Looking at the different data, they could infer siblings
00:08:58
and genealogy trees, because you have some of the data that is public,
00:09:01
like the age and some demographics, and it's enough, crossing with the public databases,
00:09:06
to identify people in the study by name. You can have the full genome
00:09:10
from just some information. And of course this is very useful to solve cold
00:09:16
cases. You have this also, I think it was five or six years ago:
00:09:22
they had this cold case of a criminal in the US who was not found for very long, but they had
00:09:27
the genome, so they could trace back, looking at public data sets, and they could actually trace
00:09:32
back siblings, ask the siblings, and they could trace who it was and caught him.
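The crossing of data sets described above is essentially a join on quasi-identifiers. The sketch below shows the idea on invented toy records: a de-identified study that still carries age, region and a genetic marker, a hypothetical genealogy database mapping markers to surnames, and a public register. All names, fields and values are made up for illustration; this is not the method of any specific paper.

```python
# De-identified study: no names, but some demographics and a genetic marker.
study = [
    {"sample_id": "S1", "age": 34, "region": "north", "marker": "hapA"},
    {"sample_id": "S2", "age": 51, "region": "south", "marker": "hapB"},
    {"sample_id": "S3", "age": 34, "region": "north", "marker": "hapC"},
]

# Hypothetical public genealogy database: marker shared by relatives -> surname.
genealogy_db = {"hapA": "Smith", "hapB": "Jones"}

# Hypothetical public register with names and demographics.
public_records = [
    {"name": "Alex Smith", "surname": "Smith", "age": 34, "region": "north"},
    {"name": "Sam Smith", "surname": "Smith", "age": 60, "region": "north"},
    {"name": "Kim Jones", "surname": "Jones", "age": 51, "region": "south"},
    {"name": "Pat Lee", "surname": "Lee", "age": 34, "region": "north"},
]

def reidentify(study, genealogy_db, public_records):
    """Return {sample_id: name} for samples that match exactly one person."""
    named = {}
    for rec in study:
        surname = genealogy_db.get(rec["marker"])
        if surname is None:
            continue  # no relative in the genealogy database
        candidates = [
            p["name"] for p in public_records
            if p["surname"] == surname
            and p["age"] == rec["age"]
            and p["region"] == rec["region"]
        ]
        if len(candidates) == 1:
            named[rec["sample_id"]] = candidates[0]
    return named

reidentified = reidentify(study, genealogy_db, public_records)
print(reidentified)
```

In this toy data two of the three "anonymous" samples end up with a name, and the third escapes only because none of its relatives appears in the genealogy database.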
00:09:37
Hiding information from public genomes, like masking
00:09:41
some information that would be very useful to recognise
00:09:45
the characteristics of a person; for instance, you mask everything that is related to diseases.
00:09:50
This is difficult, because you can actually infer information from other genomes.
00:09:55
So as soon as you have genomes that match and are close enough, like siblings,
00:09:59
you could fill in the missing pieces. So it's very difficult, and also, since
00:10:04
a lot of data has been in the public domain for a while, you completely lose control of it; the data
00:10:10
is actually there. And you have new markers arriving every day, because people are doing research on this; it's very
00:10:16
important to find new markers of diseases. And if you mask a genome, then every day you would have to mask again, and
00:10:22
if people download the data locally, you have no control over what they do with the data. So this doesn't work.
00:10:27
And finally, doing aggregated statistics: this is what is often done, and it works quite
00:10:32
well, but you have to be careful, because if some people have, let's say,
00:10:38
partial information on the genome of an individual, they can actually infer
00:10:43
if this person was in the study or not. So you can know if they are
00:10:46
in case group A or B, knowing if they have a disease or not.
00:10:50
It's a bit like with neural networks, when you try to infer if the data was
00:10:55
used to train: you can do the same kind of thing looking at some genetic markers, if you want.
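One way to picture this kind of membership inference on aggregated statistics is a distance comparison: is a person's genotype closer to the published study frequencies than to a reference population? The sketch below runs that comparison on simulated genotypes. The simulation settings, the use of the true population frequencies as the reference, and all function names are assumptions for this example.

```python
import random

def genotype(freqs, rng):
    """Simulated person: fraction of alternate alleles (0, 0.5 or 1) per marker."""
    return [((rng.random() < p) + (rng.random() < p)) / 2 for p in freqs]

def membership_score(person, study_freqs, reference_freqs):
    """Higher when the person is closer to the study than to the reference."""
    return sum(
        abs(y - ref) - abs(y - mix)
        for y, mix, ref in zip(person, study_freqs, reference_freqs)
    )

rng = random.Random(42)
n_markers, n_study = 10_000, 50

# Assumption: the attacker knows the population allele frequencies.
reference = [rng.uniform(0.1, 0.9) for _ in range(n_markers)]

# The study only publishes aggregated allele frequencies.
study = [genotype(reference, rng) for _ in range(n_study)]
study_freqs = [sum(column) / n_study for column in zip(*study)]

insider = study[0]                      # someone who took part
outsider = genotype(reference, rng)     # someone who did not

score_in = membership_score(insider, study_freqs, reference)
score_out = membership_score(outsider, study_freqs, reference)
print(f"participant score: {score_in:.1f}, non-participant score: {score_out:.1f}")
```

The participant gets a clearly higher score because their own genotype pulled the published frequencies slightly towards them; with enough markers that small pull adds up.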
00:11:01
Of course it's not certain, it's a probability, but it's actually [unclear].
00:11:04
And then a lot of people are just giving their genetic data for free
00:11:10
or for very cheap to different companies. As you may know, 23andMe,
00:11:13
Ancestry, you name it, all these American companies that
00:11:18
for very cheap take your genome, as you can see here, [unclear],
00:11:22
and they do some analysis, and then you have access to the results. You can look at
00:11:26
what your ancestry is, if you have a risk of disease, and all of this information is available for you, and for very
00:11:32
cheap, but that's because actually they are just gathering data that they can use
00:11:35
to do big studies. 23andMe actually did a big
00:11:39
sale of the data; well, they didn't sell the data itself, they sold some of the results they found
00:11:45
to develop new drug compounds, so they're working now with a pharma company to develop these drugs
00:11:50
from the data that they acquired basically for free. And as you
00:11:54
know, when you give your data to any company, you have to hope that
00:11:58
they keep the data safe. And this is just bringing a lot of worries
00:12:02
to my mind, because actually on October 7th it was in the news:
00:12:06
some hackers were selling the data of millions of [unclear] from 23andMe,
00:12:11
because they had access, I think it was to the Chinese [unclear] database,
00:12:15
and they had access to the logins, so they could just log in and just
00:12:18
download, basically, well, just look at the information that any user would have
00:12:23
access to, which is all the mutations you have, and you can download the data.
00:12:27
And they are now selling it on the dark web, according to this article.
00:12:31
And it's official; I mean, 23andMe acknowledged it, and they are very sorry, and
00:12:39
the problem is that you don't have a lot of means of action to do anything about that. You have the laws,
00:12:46
obviously, which are the most important thing that should be in place. You already have different laws regarding genomics, which are
00:12:52
very strong: you cannot of course have names linked with the genetic data, and you have to comply with many things.
00:12:59
But still, I think the law is not strong enough,
00:13:02
because of all the information you can extract from a genome.
00:13:06
I think there should be a very clear definition of who and
00:13:10
what can be done with a human genome study, because
00:13:14
it can be very [unclear]; you probably don't want to have any, you know, discrimination
00:13:20
of people from the health system or whatever. I mean, it needs
00:13:24
to be very protected. Something that is already in place is controlled
00:13:28
access. So now, more and more, [unclear] online, which
00:13:32
is becoming tricky for doing analysis, but they are locking the data,
00:13:35
meaning that you need to apply, you need to fill in a full document explaining why you want to have access to the data,
00:13:41
what you are going to do with it, and that you will destroy the copy after you have done your analysis.
00:13:47
And it is very important that this is in place, just to control who can access the
00:13:50
data and what people do with it. Finally, you have storage security, of course, and transfer security.
00:13:56
As a rule, we always try our best to not keep any data,
00:14:01
because we don't want to have to deal with all of this; we just do the analysis and then we remove
00:14:05
it. And the same for the transfer: if we can avoid transferring data, because it's very sensitive data, then we can
00:14:11
just run it on site or anything. But otherwise you also need to have strong encryption when you exchange data,
00:14:16
and, as you can imagine, it's quite a lot of data, which
00:14:19
is not easy to encrypt and decrypt; you can't just send it over email, it doesn't work.
00:14:24
And one thing that really helps but is not
00:14:27
used at all, almost, I think, in genomics is cryptographic
00:14:32
protocols: typically you do this homomorphic encryption and try to do
00:14:37
computation on the encrypted genome in the cloud, and you don't really know what you are looking
00:14:42
at, but you could do some analysis. The problem is that with genomics, very often
00:14:46
you need to really understand exactly what is happening, so you need to know exactly which mutation
00:14:50
is related to what in the data sets: you know, this one goes with this gene, with this disease, and you have
00:14:54
some biology to do, not only machine learning, you know, so you have a mix of the two.
00:15:00
And when you give a result to the biologists, they want to know the full detail of what you
00:15:05
have been doing. You need to be transparent; it's not going to be full machine learning where you
00:15:11
put it in a black box and have an output. You need to explain the
00:15:14
biology somehow, so it's tricky, but there are some ways, and, I mean, I'm
00:15:18
[unclear], I'm a life science engineer, so I have no proof
00:15:22
to show directly, but I hope that there will be a clever way to actually do
00:15:27
this analysis or query information without having to decrypt the genome, so that
00:15:31
it stays secure, let's say, and obviously keep the data encrypted in public data sets.
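To give a feel for what computing on encrypted data means, here is a toy sketch of an additively homomorphic scheme (Paillier): a server can add up encrypted allele counts without ever decrypting them, and only the key holder reads the total. This is a teaching example under strong assumptions: the primes are far too small to be secure, the scheme only supports addition, and the allele-count scenario is invented; it is not a scheme the speaker names. It needs Python 3.8 or later for the modular inverse.

```python
import math
import random

# Two known Mersenne primes; real keys use primes of 1024 bits or more.
P, Q = 2**31 - 1, 2**61 - 1

def keygen(p=P, q=Q):
    """Return (public modulus n, private key) with the generator fixed to n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)
    mu = pow(lam, -1, n)                                # inverse of lam modulo n
    return n, (lam, mu)

def encrypt(n, m, rng=random):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (1 + m * n) % n2 * pow(r, n, n2) % n2

def decrypt(n, private_key, c):
    lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n

def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the hidden plaintexts."""
    return c1 * c2 % (n * n)

n, private_key = keygen()

# Hypothetical scenario: copies of a risk allele (0, 1 or 2) in four patients.
allele_counts = [0, 1, 2, 1]
ciphertexts = [encrypt(n, count) for count in allele_counts]

# The server combines ciphertexts without seeing any individual value.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

total = decrypt(n, private_key, encrypted_total)
print("total risk alleles in the cohort:", total)
```

The limitation raised in the talk still applies: the server learns nothing about which mutation it is handling, which is exactly what makes biological interpretation on encrypted data hard.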
00:15:37
So yes, I'm not going to be much longer than that, but some of the
00:15:41
take-home messages I want to transmit today are that genomic
00:15:44
data is on the rise. You have more and more data; it's really
00:15:49
crazy, the size of the studies that are being put online
00:15:53
nowadays, and some of them are just in the public space, you can just download them. And we need to have a
00:15:59
better framework to define the dos and
00:16:03
don'ts of the genome, basically: what you can do and what you cannot do.
00:16:06
And within the law, [unclear], it should exist. Also, just
00:16:13
always work with trusted partners and sign privacy agreements asking for the removal of the data,
00:16:17
because I think it's very important not to keep the data somewhere,
00:16:21
you know, on a hard drive, just hanging in there
00:16:23
and gathering dust, because it's very sensitive data. And
00:16:28
yes, just raising awareness that sharing your genome is
00:16:31
not just like giving your email address. It's a good