
Transcriptions

Note: this content has been automatically generated.
00:00:00
[inaudible] Um, let me try to say
00:00:05
um, so of course, [inaudible] Professor Slava Voloshynovskiy from the University of Geneva.
00:00:12
Um, he's a professor in the Department of Computer
00:00:14
Science and the head of the Stochastic Information Processing group.
00:00:19
[inaudible] He will be talking about privacy-preserving machine learning:
00:00:23
some theoretical and practical considerations. So,
00:00:29
um, yeah, well, you have about thirty-five minutes and then five minutes for questions. Okay?
00:00:36
Right, okay, thank you very much for the nice introduction.
00:00:41
So for me it's an immense pleasure to be here. Thank you
00:00:43
very much for the invitation, and especially for letting me speak about privacy and security
00:00:49
in such a reputable place [inaudible], where, if you
00:00:52
have a room full of professors [inaudible] of my field,
00:00:55
[inaudible]
00:01:03
And today, uh, I would like to share with you some of our findings
00:01:07
in this domain, and, uh, I will actually try to address
00:01:10
this problem from both a theoretical and a practical perspective. [inaudible] My challenge is
00:01:16
to speak about this huge area of privacy and to give you, in a very short time, in
00:01:21
thirty minutes, all the details. So I will try to do my best to explain, maybe not so much mathematically but
00:01:27
with my hands, and hopefully it works. If it doesn't work, don't hesitate to ask questions afterwards. So first of
00:01:33
all, I would like to acknowledge the contribution
00:01:37
of my great collaborators and students, and
00:01:41
the first one is [name unclear], who contributed the most: a lot of ideas,
00:01:46
contributions and innovations. He is now a member of [inaudible].
00:01:53
[inaudible] the group, also with [inaudible]; many of the results I'm going to
00:01:56
present here are [inaudible], with his direct participation and on his
00:02:01
initiative. Then [inaudible],
00:02:05
[inaudible], who is also here, and [inaudible]
00:02:10
[inaudible]. And we have also had a great collaboration with [name unclear] from
00:02:16
Harvard University, a professor who is one of the leading people in privacy,
00:02:21
especially when it concerns the theoretical aspects of machine learning; then Deniz Gündüz from Imperial College;
00:02:26
and also [name unclear], who contributed to the domain of
00:02:31
autoencoders and, uh, normalising flows [inaudible].
00:02:36
He was, by the way, [inaudible], and now he is one of the researchers at DeepMind. [inaudible]
00:02:43
Okay, so I would like to start with a very light justification, which is probably already
00:02:49
known to all of you, of the need to have, ah, privacy
00:02:54
and security in machine learning and AI. And I will try to give it from
00:02:57
the perspective of first introducing the general concept of AI, or inference from observations,
00:03:04
so that you better understand at which moment
00:03:07
and at which place privacy and security come into play. And then,
00:03:11
um, once we understand that it is really very important
00:03:14
to have this aspect properly addressed, we will go through two schemes:
00:03:19
one based on optimisation, what we call learnable obfuscation, and the second one based on shared secrecy.
00:03:24
I deliberately will not talk about the different crypto-based approaches, such as those based on homomorphic encryption, because
00:03:30
anyway it will not be possible to cover everything, and I suppose several speakers will
00:03:34
address them. So I will only focus on these two approaches, and then I will give you
00:03:38
a non-exhaustive list of different applications where privacy
00:03:43
and security, uh, do play a very important role, and
00:03:46
a few use cases [inaudible]. So
00:03:50
let's see what the outcome will be. So, before I start, I would like
00:03:55
to summarise something that is quite old news, but nevertheless, in recent years we observe
00:04:01
several important tendencies, and these tendencies are
00:04:05
very important to emphasise for our presentation today.
00:04:08
So, the first: we observe an evolution
00:04:11
from convolutional networks toward transformers, which basically allow
00:04:15
[inaudible] better representations. And since we talk about transformers,
00:04:20
especially over the last two or three years, that is very, very important.
00:04:24
There is also so-called masked image modelling, which I think is one type of augmentation
00:04:29
that actually produces top results, especially in visual tasks, which is what we're mostly concerned about: biometrics
00:04:35
here. And that actually deserves a lot of attention and still more studies, I guess.
00:04:40
Also, allow me to mention what's happening right now in
00:04:43
generative AI, which concerns both natural language and images.
00:04:48
And we observe an evolution from, say, [inaudible] autoencoders, GANs,
00:04:54
normalising flows toward the recently booming frameworks such as denoising diffusion probabilistic models, DDPMs,
00:05:00
that produce really impressive results and, at the same time, actually raise a few concerns.
00:05:06
Also, in inference models we, uh, observe an evolution from, uh,
00:05:09
supervised learning methods toward self-supervised learning methods, as I mentioned,
00:05:14
and these are extremely important for privacy and security; they have their own issues, and I will speak about them later.
00:05:20
And also, uh, what is important is that all these concepts led to the appearance of new business models.
00:05:27
So, for example, where you have a learned model, and based on the learned model you
00:05:30
can provide a service. [inaudible] Based on this, there is the foundation model,
00:05:34
which might also have quite important implications for privacy, where you can train something on a very
00:05:39
large public database and then fine-tune it, to adapt it to
00:05:43
sensitive, or privacy-sensitive, data. Okay, I'm not trying to be exhaustive here, but
00:05:49
I think there is no need to convince you that privacy and security are both extremely important in many applications nowadays,
00:05:56
whether you speak about surveillance, [inaudible] personalised services, biometrics, e-commerce.
00:06:02
I will not go into the details of that; I'm pretty sure you have read
00:06:05
a lot of blogs and information about this. [inaudible] Especially
00:06:10
just to emphasise that it is very important [inaudible] for how we want to proceed and where
00:06:16
we want to drive the world: towards a digital economy, towards
00:06:20
digital security, towards digital twins [inaudible].
00:06:25
Okay, I think there are several points, before we start, to
00:06:29
be emphasised about privacy and security. I'm actually considering three main points.
00:06:35
The first point is that when we have some data and we are trying to train some
00:06:39
models based on this data (I'm only talking in the context of machine learning and AI),
00:06:44
basically we need to ensure that the data that we use
00:06:47
for training the model satisfies certain confidentiality, uh,
00:06:53
and integrity requirements: that the data was not modified, was not tampered with, and we
00:06:59
should be sure that actually nothing happened to this data;
00:07:02
otherwise we might have very, very big problems and biases in our models. So therefore,
00:07:07
at the same time we need to ensure that all issues related to unauthorised
00:07:11
access to the data, and actually misuse of the data, are properly handled. Once you have
00:07:16
the model trained, and we have a user of the model or of some service, we need to ensure
00:07:21
that the information that comes to the model, and how the model
00:07:23
communicates or performs some services, is also properly secured,
00:07:28
meaning the information flow from the user to the model, to the server, or the
00:07:31
information coming back, or how this information is stored on the server [inaudible];
00:07:36
[inaudible] should also be properly protected. And last but not least, I want to
00:07:41
say that when we're talking about machine learning, we definitely need to take into account
00:07:46
not only the simple attacks you typically address in the real world, where
00:07:50
someone might, mm, [inaudible] an identity, or maybe someone can try to
00:07:55
clone some specific features of the, uh, person or object, or counterfeit it
00:08:00
in some way. But recently, um, along with the appearance of GANs,
00:08:05
uh, generative adversarial networks, we also observe the appearance
00:08:09
of so-called adversarial examples that might be very harmful
00:08:13
for some machine learning models. So these aspects of privacy and security
00:08:17
from this perspective should be handled properly. [inaudible]
00:08:20
Please interrupt me [inaudible], but I think in general the introduction is okay. Right, okay, so, as I said before, I will primarily
00:08:28
focus on two approaches to privacy and security: the one based on learnable obfuscation, let's call it
00:08:34
by this term, and also shared secrecy. These terms are maybe not very common, but I will try to explain
00:08:40
what they mean. So, okay, let's start with a little bit of math. I
00:08:45
hope it will not be so difficult; again, it's just a small introduction.
00:08:50
So I generally use this diagram to speak about the issues of
00:08:54
privacy and security, because the diagram gives an idea of the main
00:09:00
variables and the main quantities we are interested in. So I will start
00:09:05
with the notion of utility data. For example, let's think about biometrics; it's maybe
00:09:10
the simplest case for us. The variable C can be some label, in
00:09:14
particular some identity of the person, right: my name, my ID number,
00:09:18
et cetera. And X is my representation, in this case a representation given, like,
00:09:24
by an image of the, uh, person, but it can be, uh, any data representation, or,
00:09:29
if we are talking about humans, it can be voice, DNA, or whatever
00:09:32
[inaudible], once again a fingerprint. Or we might also think about physical objects (I will talk
00:09:38
about it later), like physical unclonable functions, unique representations of objects as
00:09:42
well. So it's a very generic data representation that we can acquire by some sensors. At
00:09:46
the same time, what is very, very important when we talk about this is the
00:09:50
sensitive data S, which depends on the task and the application we are trying to
00:09:56
address. It can be some sensitive attribute like medical records, [inaudible]; it
00:10:01
can be my emotions, any sort of data that I would like to protect, or
00:10:06
to which I would like to prevent unauthorised access. And what we're trying to do: given this
00:10:12
initial representation X, we're trying to produce yet
00:10:16
another representation, which in our notation is Z,
00:10:20
that might be in some sense more [inaudible] with respect to
00:10:25
different misalignments and perturbations, like variations; maybe more compact, maybe more secure,
00:10:30
and maybe more privacy-preserving as well. Maybe, from the machine learning perspective,
00:10:34
it can be sufficient in some sense: basically, it will contain all the information
00:10:39
that is needed for us to solve the downstream task, be it decompression, authentication,
00:10:44
identification, classification, et cetera. So therefore, what you see here,
00:10:49
this small circle, is something that in my notation should be
00:10:52
learned via training. So we are trying to design an encoder;
00:10:58
to keep the notation, X is its input, Z is its output,
00:11:03
and theta are some parameters of this encoder. I will talk about them in a bit more detail; right now we're talking about it
00:11:09
like an abstract function that transforms our input X to Z through some mapping.
00:11:14
And basically, once we have Z, this representation, we can think about
00:11:18
what it can be useful for. Of course, it might be useful for a task
00:11:21
related to some inference, maybe classification; maybe we want to store it for indexing,
00:11:26
for search; this can be done in a supervised or self-supervised way, and I will talk about this later;
00:11:31
or it can be used for compression, so from Z we can just
00:11:34
decompress it and get this estimate of X. Okay, so we're training an
00:11:40
encoder, and depending on what our task is, we need a decoder:
00:11:43
for example, the decoder for classification maps Z to C,
00:11:49
and for reconstruction it maps Z to X, like in an autoencoder
00:11:53
framework. So therefore, on one side, we want Z to be sufficient for
00:11:59
the solution of a downstream task, the target task, and at the same time to be compact,
00:12:03
to satisfy some, say, compression constraint, so that it does not exceed the number of bits we want to allocate. Right
00:12:09
now everything is hypothetical; we will go into the details. But right now these seem to be, uh, quite
00:12:15
unrelated tasks, classification and compression. But to give this some
00:12:20
basis, besides doing heuristic stuff, we can introduce some
00:12:24
intuition, mathematical intuition, [inaudible] and analyse this problem strictly from a theoretical
00:12:30
point of view, where you can see whether a solution is optimal or not according to certain frameworks.
00:12:36
Actually, a breakthrough in machine learning was the framework introduced
00:12:40
by the group of Naftali Tishby: the information bottleneck.
00:12:44
Probably some of you have heard about it. It was initially introduced specifically
00:12:48
for the classification task: how to compress the input representation and then classify it.
00:12:53
And the essence of the framework for the classification task, as it was formulated, is this: they
00:12:58
take some function, which would typically be a loss; it's an objective function here,
00:13:04
and in our case it is mutual information. So, a word about mutual information for the
00:13:08
people who do not know it: if I write X on one side and Z on the other side,
00:13:13
then if the mutual information is very large, we have two sets that overlap very, very strongly;
00:13:18
if the mutual information is very small, they overlap only a little, with a small
00:13:24
common area; or if they are independent, there is basically no overlap at
00:13:28
all. So basically, here we have a minimisation: we're trying to
00:13:33
make the mutual information, the overlap between X and Z,
00:13:36
as small as possible, meaning we are trying to suppress information in
00:13:40
Z about X. So it can be interpreted as a sort of compression.
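To make the overlap picture concrete, here is a small illustrative sketch (not from the talk) that computes I(X;Z) in bits for discrete joint distributions. The three toy distributions are made-up assumptions: Z is an exact copy of a fair bit X (full overlap), Z is independent of X (no overlap), and Z is a noisy copy that flips X 10% of the time (partial overlap).

```python
import math

def mutual_information(joint):
    """I(X;Z) in bits for a joint pmf given as a 2-D list joint[x][z]."""
    px = [sum(row) for row in joint]            # marginal of X (row sums)
    pz = [sum(col) for col in zip(*joint)]      # marginal of Z (column sums)
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:                           # 0 * log 0 is taken as 0
                mi += p * math.log2(p / (px[i] * pz[j]))
    return mi

copy = [[0.5, 0.0], [0.0, 0.5]]             # Z = X: full overlap
independent = [[0.25, 0.25], [0.25, 0.25]]  # Z independent of X: no overlap
noisy = [[0.45, 0.05], [0.05, 0.45]]        # Z = X flipped with prob. 0.1

for name, joint in [("copy", copy), ("independent", independent), ("noisy", noisy)]:
    print(name, round(mutual_information(joint), 3))
```

The noisy case lands strictly between 0 and 1 bit. For continuous, high-dimensional X this plug-in sum over a known joint distribution is no longer available, which is the estimation problem raised later in the talk.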
00:13:44
But not necessarily; there are different ways to minimise mutual information.
00:13:48
And of course, if you can calculate it, you can minimise the mutual information down to some minimum,
00:13:53
but if that were the only objective, we could simply make it zero. But
00:13:58
we have the second term, which tells us that we can
00:14:02
go on and try to minimise the mutual information between X and Z,
00:14:06
but only up to the limit where Z still preserves all the necessary information
00:14:12
about C, where C is our classification task. So, meaning,
00:14:17
just to interpret it: if you are given a very large image, and in
00:14:22
this image only part of the information characterises the person, as in biometrics,
00:14:27
then whatever is not needed for my classification I would like to filter out,
00:14:31
and keep only what is needed in my representation Z. Okay, so that is the first
00:14:37
term, the one that we minimise here, and the second part is the one with the minus sign:
00:14:43
we're trying to maximise it, and we put in this parameter beta, which just plays the role of a weight for us.
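In the talk's notation (input X, representation Z, label C, weight beta), the information bottleneck objective just described is commonly written as follows. This assumes the Markov chain C → X → Z, meaning the encoder sees only X:

```latex
\min_{p_\theta(z \mid x)} \; I(X;Z) \;-\; \beta \, I(Z;C), \qquad \beta > 0,
\qquad C \rightarrow X \rightarrow Z .
```

The first term is the compression term being minimised, and the second, with the minus sign, is the relevance term being maximised. A small beta favours aggressive compression, while a large beta favours keeping everything predictive of C. A constant encoder drives I(X;Z) to zero, which is why the beta-weighted relevance term is needed.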
00:14:50
So that was the original work of Tishby and colleagues, but it only told us how to do inference,
00:14:54
the classification task, and it didn't tell us anything about how to use the framework for the compression task, [inaudible] autoencoders.
00:15:00
And as you know, autoencoders represent a really quite important
00:15:03
part of, uh, different applications. So that's why, starting from
00:15:07
[inaudible], our group, back in 2017 to 2019,
00:15:11
just extended this, and at the time it didn't have
00:15:14
[inaudible]; now it seems so simple and so beautiful. We just replace C here, which is classification, simply by
00:15:21
X, which is like reconstruction. So basically, if you look at this, we're saying: okay, go from X to Z,
00:15:26
but preserve enough information in Z to decompress. So assume you have a compressor like JPEG,
00:15:32
or an autoencoder: you compress, and then you try to decompress [inaudible]
00:15:38
and get back X. Here it is a learnable one [inaudible]. Again, I'm trying to simplify it, but it's
00:15:45
as simple as that. Okay, so these are the two frameworks we have. And then the main problem with this approach is how to
00:15:52
go on and mathematically express these terms. Mutual information is a very
00:15:56
nice intuition, right, what I've just been saying, but how do we apply it in practice?
00:15:59
Especially in practice, several problems arise. The first problem is that for
00:16:04
some distributions we don't have many samples, many training examples. Again, we're not OpenAI,
00:16:09
we're not Google DeepMind; we don't have four hundred billion training examples, right?
00:16:14
We need to do something with thousands, uh, hundreds of thousands of, ah, samples. So that's a very big challenge.
00:16:20
Secondly, the dimensionality of the data is very large, right? It's not discrete data
00:16:24
where we can compute mutual information from frequencies, in the probabilistic sense. So that's why
00:16:30
there are several approaches. I just listed three of them here, but you can maybe
00:16:36
find more. One is very, very famous, the contrastive approach, with the very
00:16:41
famous work of van den Oord, that is InfoNCE; probably you have heard about it.
00:16:46
That is the basis of contrastive learning methods in SSL nowadays. The second one,
00:16:51
and don't forget about it, comes up in practical applications; it is from
00:16:55
the group of Yoshua Bengio and is known as MINE, which stands for (this is the title of the paper)
00:17:01
mutual information neural estimation. And finally, the approach that was initiated by Alex Alemi from
00:17:07
Google Brain and then developed by other groups, in this first paper and then by [inaudible]
00:17:12
in a more extended version. So that's what we have: a whole family of variational approximations
00:17:17
of mutual information, [inaudible] from the very early days of such approximations. Okay.
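A rough illustration of the MINE-style estimator named above: MINE trains a neural critic T to maximise the Donsker-Varadhan lower bound I(X;Y) >= E_P[T] - log E_{P_X P_Y}[exp T]. In the sketch below, as a simplifying assumption, no network is trained. The critic is fixed to the known optimum for a toy pair of correlated standard Gaussians (correlation rho = 0.5), so the Monte Carlo value should sit close to the closed form -0.5 log(1 - rho^2). Samples from the product of marginals are obtained by shuffling the Y samples.

```python
import math
import random

def log_density_ratio(x, y, rho):
    """log p(x, y) - log p(x) - log p(y) for standard normals with correlation rho.
    This is the optimal DV critic for the toy example (assumed known, not learned)."""
    s = 1.0 - rho * rho
    return (-0.5 * math.log(s)
            - (x * x - 2.0 * rho * x * y + y * y) / (2.0 * s)
            + (x * x + y * y) / 2.0)

def dv_bound(critic, joint_pairs, product_pairs):
    """Donsker-Varadhan bound in nats: mean critic on joint samples minus
    log of the mean exp(critic) on product-of-marginals samples."""
    first = sum(critic(x, y) for x, y in joint_pairs) / len(joint_pairs)
    second = sum(math.exp(critic(x, y)) for x, y in product_pairs) / len(product_pairs)
    return first - math.log(second)

rho, n = 0.5, 20000
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for x in xs]
joint_pairs = list(zip(xs, ys))

ys_shuffled = ys[:]            # shuffling breaks the pairing: approx. samples of p(x)p(y)
rng.shuffle(ys_shuffled)
product_pairs = list(zip(xs, ys_shuffled))

estimate = dv_bound(lambda x, y: log_density_ratio(x, y, rho), joint_pairs, product_pairs)
true_mi = -0.5 * math.log(1.0 - rho * rho)
print(f"DV estimate: {estimate:.3f} nats, closed form: {true_mi:.3f} nats")
```

In the real method the critic is a network trained by gradient ascent on this same bound, and estimator variance becomes a practical concern with high-dimensional data.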
00:17:24
In order to bring us to a fundamental understanding of the problems of, uh, modern machine
00:17:29
learning, I would like to briefly review the different approaches to training an AI
00:17:36
system, how to train machine learning. So it depends on what kind of data we have, and it
00:17:41
depends on whether we basically have a database with labels or without labels, and what kind of labels, meaning:
00:17:47
this image is labelled as a cat, a dog, et cetera, or we have a label that basically says someone
00:17:53
is happy, et cetera, et cetera. Okay, so there are several approaches:
00:17:58
supervised, unsupervised, self-supervised, [inaudible] learning, and
00:18:01
semi-supervised, which is an especially interesting part. So instead of giving
00:18:05
the definitions, I propose to go with visual drawings, so it will be clear what we are talking about.
00:18:10
So, historically, the first group of methods is the, um,
00:18:14
supervised techniques, where we have labelled data X and C,
00:18:19
and this labelled data is represented by N training samples, right. So that is
00:18:25
the approach where X represents the image and C represents its label,
00:18:31
okay, for example cat, dog, et cetera, a natural object or whatever. And here these images, you
00:18:36
know, can be represented by their three colour components, uh, RGB or YUV, et cetera.
00:18:42
And as I said, the main framework is to build an encoder that maps X to Z,
00:18:48
and Z is the latent representation. And then, for example, for the classification task, the decoder
00:18:54
part produces from Z an estimate of C; C-hat is the estimate of the label. So
00:19:00
once we have this labelled data, we're trying to train both the encoder and the decoder in such a way
00:19:06
that once our system receives some image, it will
00:19:09
be able to produce a label that is as close as possible
00:19:13
to the one in the training data set [inaudible]. Now, as I promised, I will just decipher what is inside here.
00:19:19
So basically, we are mostly speaking about deep neural network implementations,
00:19:24
whether it's CNNs or transformers, but in a nutshell, if you
00:19:27
have the data X, you apply some transformation. Mostly, yeah, it
00:19:31
is a mapping: imagine you multiply it by this matrix, then
00:19:34
you add the bias, you apply some nonlinearity, again you apply [inaudible], and you do it many,
00:19:39
many, many times [inaudible]. And basically, uh, all the transformation matrices and
00:19:46
basis is matrices basis is the form policies barney that's he said i just added yeah right so
00:19:51
When we're talking about training the encoder and the decoder, it means that we need to find all the
00:19:55
parameters in these matrices and biases. Okay. So, but now something interesting. Okay, maybe it's
00:20:02
just that training for supervised classification was important at the time, and [inaudible] developed well.
00:20:08
But now, [inaudible] in terms of biometrics in particular, it is
00:20:13
interesting for the following reason: once we have trained the encoder, we don't care anymore about the second part,
00:20:19
the decoder. We have this latent representation, and to it we can apply even a nonlinear, non-learnable
00:20:24
function, for example just a simple thresholding, the sign function, and you get a representation that is binary, right.
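To illustrate this binarisation step: after training, the decoder is dropped and a fixed, non-learned sign threshold turns the latent vector Z into bits, which can then be compared by Hamming distance. The latent vectors below are invented stand-ins for encoder outputs, and a real system would also need a decision threshold on the distance, which is not shown.

```python
def binarize(z):
    """Non-learned sign threshold: 1 if a latent coordinate is positive, else 0."""
    return [1 if v > 0 else 0 for v in z]

def hamming(a, b):
    """Number of positions where two equal-length bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical latent vectors from an already-trained encoder.
z_enrolled = [0.8, -1.2, 0.3, -0.1, 2.0, -0.7, 0.05, 1.1]
z_same_noisy = [0.7, -1.0, 0.4, -0.2, 1.8, -0.9, -0.02, 1.0]  # same person, new capture
z_other = [-0.5, 0.9, -0.4, 0.6, 1.5, 0.3, -0.8, -1.3]         # different person

h_ref = binarize(z_enrolled)
print(hamming(h_ref, binarize(z_same_noisy)), hamming(h_ref, binarize(z_other)))
```

Coordinates close to zero are the ones most likely to flip between captures (the 0.05 versus -0.02 entry here). Because similar inputs give similar codes, such a code behaves like a robust or perceptual hash rather than a cryptographic one.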
00:20:30
So this binary representation, being short, being informative, being robust, might be
00:20:35
interpreted as a sort of hash, or template, that is robust [inaudible].
00:20:39
Okay, so again, it's not a crypto hash; it
00:20:43
doesn't satisfy the same principles, but the community refers to this as a
00:20:47
robust hash, or perceptual hash, or as a template. Okay, now about how autoencoders work.
00:20:53
Oh, don't worry, it's another framework that is often used when we have only
00:20:56
data without labels. So you see, here we have only X; we don't have labels.
00:21:00
So the idea is that we are, uh, trying to train an encoder and a decoder to process the same data: to encode it and decode it back.
00:21:06
But we're not doing it in a trivial way; otherwise we would probably just learn an identity mapping. We should not just
00:21:11
control the size of the latent space, but impose different constraints on the latent space, and,
00:21:16
depending on the constraints, it can be the VAE, the beta-VAE, [inaudible].
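As one concrete instance of a constraint on the latent space, the (beta-)VAE adds a KL penalty that pulls a Gaussian encoder q(z|x) = N(mu, diag sigma^2) toward a standard normal prior. The sketch below only evaluates the loss for given numbers. The encoder and decoder networks and the reparameterised sampling are left out, and using plain squared error as the reconstruction term (a Gaussian likelihood up to scale and constants) is an assumption.

```python
import math

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))

def squared_error(x, x_hat):
    """Reconstruction term: sum of squared differences between input and decoder output."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def beta_vae_loss(x, x_hat, mu, log_var, beta=1.0):
    """Reconstruction + beta * KL. beta = 1 corresponds to the standard VAE trade-off
    (up to the scaling of the reconstruction term); beta > 1 tightens the latent constraint."""
    return squared_error(x, x_hat) + beta * kl_to_standard_normal(mu, log_var)

x, x_hat = [1.0, 0.0, 0.5], [0.9, 0.1, 0.5]    # hypothetical input and reconstruction
print(beta_vae_loss(x, x_hat, mu=[0.0, 0.0], log_var=[0.0, 0.0]))            # latent matches prior
print(beta_vae_loss(x, x_hat, mu=[1.0, 0.0], log_var=[0.0, 0.0], beta=4.0))  # shifted latent, penalised
```

Swapping the KL term for other penalties on the latent space is, roughly, what distinguishes many members of the "zoo" of autoencoder variants.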
00:21:21
[inaudible], et cetera. There is a whole zoo of different methods,
00:21:27
but we presented them so as to bring this zoo to some manageable interpretability,
00:21:34
first in this paper, and then it was developed and extended in many, many papers [inaudible];
00:21:40
in this paper you can find even more
00:21:42
information. Then something that is really very interesting nowadays and
00:21:48
attractive is self-supervised learning. The problem is very
00:21:50
similar in formulation to this one: we have only data, without any labels,
00:21:54
but instead of an encoder and a decoder, we just have two encoders, and the
00:22:00
training is such that we take the data X and create
00:22:04
two very similar augmentations, very similar views of the same data,
00:22:08
and then we say: they come from the same image, so naturally they should also be close, not only
00:22:14
in the image space but also in the latent space. So we impose a similarity constraint on the latent space,
00:22:20
pushing two images coming from the same source to be very close, and if there
00:22:24
are other images, pushing them far apart. That is the principle of contrastive
00:22:27
learning. How you formulate it, uh, [inaudible], is another question; there are different definitions
00:22:33
of contrastive losses, but the most interesting and widely used one nowadays is InfoNCE.
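Here is a minimal, dependency-free sketch of the InfoNCE loss for two batches of embeddings. Row i of each batch is assumed to come from two augmented views of the same image (the positive pair), and all other rows act as negatives. Cosine similarity and a temperature of 0.1 are common choices assumed here, and the embedding vectors are invented; in practice the two encoders would produce them. In van den Oord et al.'s analysis, log N minus this loss lower-bounds the mutual information between the views, which ties it back to the estimators discussed earlier.

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def info_nce(view1, view2, temperature=0.1):
    """Mean InfoNCE loss: for each i, -log softmax_j(sim(view1[i], view2[j]) / T)[i].
    view2[i] is the positive for view1[i]; the other rows of view2 are negatives."""
    n = len(view1)
    total = 0.0
    for i in range(n):
        logits = [cosine(view1[i], view2[j]) / temperature for j in range(n)]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(t - m) for t in logits))
        total += log_sum_exp - logits[i]
    return total / n

# Hypothetical embeddings of two augmented views of three images.
view_a = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
view_b = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]
mismatched_b = view_b[1:] + view_b[:1]   # positives deliberately misaligned

print(info_nce(view_a, view_b), info_nce(view_a, mismatched_b))
```

The loss is near zero when each view is most similar to its own positive and grows when positives are misaligned, which is exactly the "pull together, push apart" constraint on the latent space.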
00:22:38
And if you're interested (I cannot give the references because there
00:22:41
is no space), you will find names like SimCLR,
00:22:45
[inaudible], as well as, uh, mostly from Facebook, [inaudible],
00:22:49
DINO version one, [inaudible], Barlow Twins, VICReg,
00:22:53
MSN, [inaudible].
00:22:58
So if you're interested, you can find how these systems are trained nowadays.
00:23:03
And then, last but not least, I'll just say the name CLIP,
00:23:07
and many people really see it as the main engine, especially when we're talking about text-to-image generative models.
00:23:13
So the idea is that instead of having two encoders that process the same images,
00:23:18
one encoder is for the images and one encoder is for the text, and we
00:23:22
try, when the image and the text describe the same object, to make their
00:23:28
latent representations close, right, semantically close: described in different languages or by different images, they should still be close.
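A CLIP-style objective can be sketched in a similar way. The difference is that the two embeddings now come from different encoders, one for images and one for text, both assumed to project into the same dimension. The contrastive cross-entropy is applied in both directions (image-to-text and text-to-image) and then averaged. The embeddings are illustrative assumptions, and the temperature is only a placeholder, not a trained value.

```python
import math

def l2_normalise(v):
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]

def mean_diagonal_cross_entropy(logits):
    """Mean over rows of -log softmax(row)[i]: row i's correct match is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)
        total += m + math.log(sum(math.exp(t - m) for t in row)) - row[i]
    return total / len(logits)

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss over the image-text cosine-similarity matrix."""
    imgs = [l2_normalise(v) for v in image_emb]
    txts = [l2_normalise(v) for v in text_emb]
    logits = [[sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs]
    transposed = [list(col) for col in zip(*logits)]   # rows are now texts
    return 0.5 * (mean_diagonal_cross_entropy(logits) + mean_diagonal_cross_entropy(transposed))

images = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]    # hypothetical image-encoder outputs
captions = [[0.9, 0.0, 0.1], [0.1, 0.8, 0.1], [0.0, 0.2, 0.9]]  # hypothetical text-encoder outputs

print(clip_style_loss(images, captions))                         # matching pairs on the diagonal
print(clip_style_loss(images, captions[1:] + captions[:1]))      # captions misaligned
```

Averaging both directions makes the loss symmetric in the two modalities, so neither encoder is treated as the "anchor".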
00:23:34
Okay, and finally, last but not least, what I think is quite interesting for
00:23:38
privacy and security is a generalisation of that approach, uh, which we published quite recently, two years ago:
00:23:44
semi-supervised learning, where you have a lot of
00:23:48
unlabelled data and just a few labelled examples. Okay, that's what we call semi-supervised.
00:23:53
And the idea actually is very simple: you are trying to train the same encoder
00:23:59
for the latent space, but from the latent space you solve multiple tasks at the same time:
00:24:03
for data without labels, you are asking it to reconstruct the data,
00:24:07
like an autoencoder, which you have seen before,
00:24:10
and for the data that has labels, it tries to produce a match with
00:24:15
the given label. For example, [inaudible] given the label, it should produce
00:24:20
one label or whatever, and because the normalised softmax produces an output that can be anything
00:24:25
in form, it should be one-hot: the correct label should be on top.
00:24:30
And quite surprisingly, this simple trick gives a super boost, especially when we have
00:24:35
a very small amount of labels. So you can imagine that for privacy it can be a very interesting setup.
00:24:41
We didn't especially target privacy, but we demonstrated the possibilities of the system.
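Below is a rough sketch of the semi-supervised objective described above. It uses one shared encoder, adds a reconstruction term for every sample, and adds a classification term (cross-entropy against the one-hot label) only for the labelled samples. The squared-error reconstruction, the weight `lam` and the toy identity maps are assumptions for illustration. This is not the authors' published loss.

```python
import math

def log_softmax(logits, k):
    """log of the softmax probability of class k, computed stably."""
    m = max(logits)
    return logits[k] - m - math.log(sum(math.exp(t - m) for t in logits))

def semi_supervised_loss(batch, encode, decode, classify, lam=1.0):
    """batch: list of (x, label) pairs, with label=None for unlabelled samples.
    Every sample adds a reconstruction error; labelled samples also add the
    cross-entropy against their one-hot label, weighted by lam."""
    recon, ce, n_labelled = 0.0, 0.0, 0
    for x, label in batch:
        z = encode(x)                          # one shared encoder for both tasks
        x_hat = decode(z)
        recon += sum((a - b) ** 2 for a, b in zip(x, x_hat))
        if label is not None:
            ce -= log_softmax(classify(z), label)
            n_labelled += 1
    recon /= len(batch)
    ce = ce / n_labelled if n_labelled else 0.0
    return recon + lam * ce

# Hypothetical toy maps: identity encoder/decoder, latent used directly as class logits.
identity = lambda v: list(v)
batch = [([2.0, 0.0], 0), ([0.5, 0.5], None), ([0.0, 3.0], None)]
print(semi_supervised_loss(batch, identity, identity, identity))
```

Because every sample contributes through the reconstruction branch, the shared encoder is shaped by all the data, while the few labels only need to steer the latent space toward the target classes.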
00:24:45
Now I'm coming to the importance; actually, it's clear that,
00:24:49
once the systems are trained, or however they are trained, it's important
00:24:53
to address privacy and robustness to adversarial attacks. So I will talk about these aspects now. So,
00:25:00
what about privacy? Privacy is a concern in the
00:25:03
sense that we have our, let's say, trained system: we
00:25:06
trained it so that Z can be used for classification
00:25:10
[inaudible] or for reconstruction, in the setups I just explained,
00:25:15
but [inaudible] classical machine learning never talked about [inaudible].
00:25:21
Now, to talk about privacy: what an attacker can do is come and, observing this latent representation Z,
00:25:28
try to get [inaudible] all the information about
00:25:32
the other sensitive attributes. Again, classical machine learning didn't address
00:25:36
it as such. Or the attacker can come and, in exactly the same way, try to reconstruct the data
00:25:43
[inaudible] from the representation. So actually, all
00:25:47
aspects of privacy [inaudible], right? Again, classical machine learning didn't do
00:25:52
anything about it. And this is a passive, if I may say so, inference attacker,
00:25:56
but it may be even worse: here we have an active adversary.
00:26:00
Once you have any model from the list I gave you before, any of these models, and you have trained
00:26:06
it [inaudible], for example, for the classification task: you get the image, you pass it through
00:26:10
the encoder, then the decoder, and you have correctly recognised the object or person. An attacker might say:
00:26:16
okay, I want to fool it; I want to create an adversarial sample that is perceptually very,
00:26:22
very close to the original data, so a human cannot see the difference [inaudible],
00:26:27
but I modify something, basically [inaudible], such that when this
00:26:31
image comes to the input of the system, it gives a completely different result:
00:26:36
it recognises a different identity, or maybe [inaudible]
00:26:38
whatever, forbidden [inaudible], et cetera, et cetera. And
00:26:43
our system was not trained initially for that, because we
00:26:46
didn't try to address such adversarial attacks, but apparently
00:26:50
our systems are very, very vulnerable to this. [inaudible]
00:26:53
showed that even if you have these recognition systems embedded into some
00:26:59
text generation system [inaudible], you can modify the input, and the text generation system
00:27:04
can generate a completely different description of the image when you feed in this
00:27:07
image with even a little bit of noise or some modification, and it's a misinterpretation. So
00:27:11
you can imagine all the consequences and dangers of such kinds of attacks. Okay.
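The talk does not name a specific attack. As one standard example, here is the fast gradient sign method (FGSM, Goodfellow et al.) against a toy logistic-regression classifier: every input coordinate is moved by epsilon in the direction that increases the loss. The weights and input are made up, and a linear model is assumed so that the input gradient of the cross-entropy loss has the closed form (sigmoid(w.x + b) - y) * w.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def score(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if sigmoid(score(w, b, x)) >= 0.5 else 0

def sign(g):
    return (g > 0) - (g < 0)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign method for logistic regression with cross-entropy loss.
    For this model, d(loss)/dx = (sigmoid(w.x + b) - y) * w."""
    p = sigmoid(score(w, b, x))
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

w, b = [1.0, -2.0, 0.5], 0.0   # hypothetical trained weights
x, y = [0.5, 0.1, 0.2], 1      # input correctly classified as class 1
x_adv = fgsm(w, b, x, y, eps=0.15)
print(predict(w, b, x), predict(w, b, x_adv))
print(max(abs(a - c) for a, c in zip(x_adv, x)))  # largest per-coordinate change
```

For a linear score, the worst-case change under a per-coordinate budget epsilon is epsilon times the L1 norm of w (here 0.15 x 3.5 = 0.525). That is enough to push the score from 0.4 to -0.125 and flip the decision. In high-dimensional inputs such as images, a per-coordinate change this small can be nearly invisible while its total effect grows with the dimension.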
00:27:16
Therefore, to summarise in conclusion: classical machine learning was not created with,
00:27:22
actually didn't consider, these two aspects of privacy and, uh, adversarial robustness
00:27:26
initially. So that's why we want to bring these aspects in now.
00:27:32
[inaudible] As for privacy,
00:27:37
i think uh the rules was one of the first guys
00:27:40
um who started to think about this from the information bottleneck perspectives
00:27:45
and we try to engage this attribute ass into thinking using the same
00:27:50
as medical products as informational so the question is how to design this optimal
00:27:56
transformation this marker that would keep all information about all the c. about
00:28:01
the classification about identity but try to remove all filter out all information about
00:28:08
other sensitive introduced from how the representations in these guys were really
00:28:12
separable you can do it into handcrafted way but if you do not
00:28:16
know how the two attributes between c. and asked are statistically depended it's
00:28:21
modern obvious task at all so that's why they call it is open
00:28:24
it optimisation problem and once you've station by the uh optimisation so therefore
00:28:29
What was done in several papers by [name unclear]
00:28:32
was an attempt to solve this trade-off problem:
00:28:36
you want to preserve classification accuracy,
00:28:39
you want a low-complexity, compressed representation, and at the same time you want to
00:28:44
minimise leakage about the sensitive attributes. I think that's clear. Here I would like to mention two
00:28:50
concurrent works. The first one gives a theoretical justification and a complete optimisation framework,
00:28:56
and the second one was a more ad hoc attempt to demonstrate how it can be done in practice,
00:29:01
and it uses a different
00:29:04
approximation: not the variational approximation
00:29:08
of mutual information, but a contrastive learning approach.
00:29:14
So, about the first one: CLUB is the complexity-leakage-utility bottleneck. So basically what
00:29:20
was done in this work, and you will recognise it again (I slightly modified it with respect to the original paper),
00:29:26
is exactly these two terms: this one is trying
00:29:29
to minimise the link between the data and the representation, so we are trying to compress the data,
00:29:33
yet we are trying to retain in the representation
00:29:37
all information about the classification attribute. That's what we
00:29:40
want to reach. At the same time, we add this term, trying to minimise
00:29:47
any mutual information, any link, between the representation and S, the sensitive
00:29:52
attribute. So we are compressing the data and minimising the information in the representation about S
00:29:57
while preserving the utility information. Of course, it works very nicely
00:30:02
if there is no statistical dependence between S and C.
00:30:06
The paper gives a lot of different diagrams explaining all
00:30:09
these combinations, so I really advise anybody interested in the more detailed
00:30:13
story to read this paper and see all the theoretical derivations.
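The trade-off can be illustrated on a toy discrete example. The sketch below scores an encoder with a generic Lagrangian, complexity - beta * utility + lambda * leakage, that is I(X;Z) - beta * I(C;Z) + lambda * I(S;Z); this exact weighting is an assumption for illustration, not the formulation of the CLUB paper. It takes X = (C, S) with binary C and S, a deterministic encoder Z = f(C, S), and computes each mutual-information term exactly in bits. It shows the point made above: keeping only C costs nothing in leakage when S and C are independent, but the same encoder leaks about S once S and C are correlated.

```python
import math
from collections import defaultdict

def mutual_information(joint):
    """I(A;B) in bits, from a dict mapping (a, b) -> p(a, b)."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)

def pair_distribution(p_cs, f, g):
    """Joint distribution of (f(c, s), g(c, s)) when (C, S) ~ p_cs."""
    out = defaultdict(float)
    for (c, s), p in p_cs.items():
        out[(f(c, s), g(c, s))] += p
    return dict(out)

def trade_off(p_cs, encoder, beta=1.0, lam=1.0):
    """Return (complexity, utility, leakage, objective); lower objective is better."""
    complexity = mutual_information(pair_distribution(p_cs, lambda c, s: (c, s), encoder))
    utility = mutual_information(pair_distribution(p_cs, lambda c, s: c, encoder))
    leakage = mutual_information(pair_distribution(p_cs, lambda c, s: s, encoder))
    return complexity, utility, leakage, complexity - beta * utility + lam * leakage

def keep_c(c, s):
    """Encoder that keeps only the utility attribute."""
    return c

def keep_all(c, s):
    """Encoder that keeps everything."""
    return (c, s)

# C and S independent fair bits, versus S equal to C with probability 0.9.
independent = {(c, s): 0.25 for c in (0, 1) for s in (0, 1)}
correlated = {(0, 0): 0.45, (1, 1): 0.45, (0, 1): 0.05, (1, 0): 0.05}

print(trade_off(independent, keep_c))    # (1.0, 1.0, 0.0, 0.0)
print(trade_off(independent, keep_all))  # (2.0, 1.0, 1.0, 2.0)
print(trade_off(correlated, keep_c))     # leakage is about 0.531 bits
```

In the correlated case no deterministic encoder of this toy form can keep C without leaking about S, which is why the trade-off parameters matter.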
00:30:17
So it was tested in different settings.
00:30:22
A very nice example: take Colored MNIST, the digits zero, one, two,
00:30:27
three, and so on, where the digit class is the utility and the colour is the sensitive attribute, just as an example.
00:30:32
So it was possible to show that the basic idea, trying
00:30:35
to compress in a general way
00:30:40
and even to avoid reconstruction, works. Here
00:30:44
you can see that you can preserve in the reconstruction the digits
00:30:48
as they were in the original, but you can completely eliminate the colour information.
00:30:53
So they all lose their colour, since colour is the sensitive attribute, and I
00:30:56
cannot reconstruct it. In a second setting, which can be generalised with a trade-off parameter,
00:31:01
it was also possible to demonstrate that you can even prevent reconstruction from the latent space.
00:31:07
But again, this should be traded off: what is the loss in terms of
00:31:11
classification? But this is also an option. So this framework is very general; it is a very
00:31:15
nice toolbox to play with, so that's my advice.
00:31:22
At the same time, this framework assumes that you can train the encoder and
00:31:27
decoder end-to-end and shape your representations, which is rather theoretical. But in practice,
00:31:31
you may come to foundation models, any of the foundation models. For example, suppose that
00:31:37
you have something pre-trained on a very large corpus of data,
00:31:41
maybe pre-trained by Facebook, for example
00:31:44
DINO version one or two, or the MSN network, or whatever, trained in a contrastive way. So you have it,
00:31:51
and it gives you these representations here. But we have seen that these representations are not
00:31:58
privacy-preserving. So basically the problem is this. Imagine that we
00:32:04
insert into the network a very small projection head, just several layers,
00:32:09
and we say: okay, let's train this head in such a way, just
00:32:12
post-hoc training, as in this paper,
00:32:16
that it preserves all the utility information here and tries
00:32:20
to filter out all information that might cause privacy
00:32:23
issues. So that's the difference. And we demonstrated that
00:32:27
this approach is also powerful in terms of scalability.
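The talk trains a small projection head on top of a frozen foundation model. As a much simpler stand-in (a swapped-in linear technique, not the method from the paper), the sketch below shows the same post-hoc idea with linear algebra only: estimate the direction that separates the two values of a binary sensitive attribute S in the frozen embeddings (the difference of group means) and project it out, leaving the orthogonal directions, and hence the utility information they carry, untouched. The toy two-dimensional embeddings, where the first coordinate carries the class C and the second carries S, are made up for illustration.

```python
def mean(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def sensitive_direction(embeddings, s_labels):
    """Unit vector from the mean embedding of group S=0 to that of group S=1."""
    g0 = [e for e, s in zip(embeddings, s_labels) if s == 0]
    g1 = [e for e, s in zip(embeddings, s_labels) if s == 1]
    d = [b - a for a, b in zip(mean(g0), mean(g1))]
    norm = dot(d, d) ** 0.5
    return [x / norm for x in d]

def project_out(e, u):
    """Remove the component of embedding e along the unit vector u."""
    k = dot(e, u)
    return [x - k * ui for x, ui in zip(e, u)]

# Toy frozen embeddings: coordinate 0 carries the class C, coordinate 1 carries S.
embeddings = [[1.0, 0.0], [1.0, 1.0], [-1.0, 0.0], [-1.0, 1.0]]
s_labels = [0, 1, 0, 1]
u = sensitive_direction(embeddings, s_labels)
cleaned = [project_out(e, u) for e in embeddings]
print(cleaned)  # [[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]]
```

After the projection the two S groups are indistinguishable while the sign of the first coordinate still gives C. Removing one linear direction does not rule out a non-linear attacker recovering S, so this is only a first-order illustration of the idea.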
00:32:31
Here you see the original data, and this is the reconstruction from these representations. So you see,
00:32:37
it is not the case that if you already have some SSL-trained network, it will be
00:32:42
privacy-secure. Some people claim this, but it's not the case: you can still reconstruct the images almost perfectly.
00:32:48
Some people say: well, if we talk about privacy, we
00:32:51
can maybe simply add noise to the representation as a way of
00:32:55
obfuscation. You see that you can still reconstruct a little bit, maybe
00:32:59
not so nicely, but if you push the training further, you can probably do better.
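The noise-addition baseline mentioned here is easy to sketch. Below, additive Gaussian noise with standard deviation sigma is applied to a representation; sigma is the only knob, and the mean squared distortion it introduces grows roughly as sigma squared. Because the noise is not tied to any secret, it typically degrades utility and privacy together. The representation is random toy data, not a real model output.

```python
import random

def obfuscate(z, sigma, rng):
    """Noise-addition obfuscation: v = z + n with n ~ N(0, sigma^2) per coordinate."""
    return [x + rng.gauss(0.0, sigma) for x in z]

def mse(a, b):
    """Mean squared difference between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

rng = random.Random(0)
z = [rng.uniform(-1.0, 1.0) for _ in range(10_000)]   # toy representation
for sigma in (0.1, 0.5, 1.0):
    v = obfuscate(z, sigma, rng)
    print(sigma, round(mse(z, v), 3))   # distortion is close to sigma ** 2
```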
00:33:03
There are also metrics, but I don't want to go into detail, so
00:33:06
you can judge visually. So here is the accuracy preservation
00:33:11
without any obfuscation, here with obfuscation by the addition
00:33:15
of noise, and here, if you train it this way,
00:33:18
with the head I presented that tries to preserve the classification: it's almost the same, almost no loss.
00:33:25
But in terms of reconstruction,
00:33:29
the attacker can only reconstruct something like
00:33:32
an average image for
00:33:35
all inputs: you can see there is a person, and so on,
00:33:39
so there is still some correlation, but basically you can see that we cannot reconstruct the original.
00:33:44
So, by imposing different constraints, you can play with this and see the different interplay between them.
00:33:50
Okay, so as you see, yes, it's possible to do it,
00:33:54
but we need to take into account the nature of
00:33:57
the sensitive attributes and the privacy requirements, and of the
00:34:00
utility attributes and how they relate,
00:34:04
and we need to train the system end-to-end, from the beginning, to ensure that. So it's doable,
00:34:09
it's super cool, but what about scalability, and what about more practical applications? That is where
00:34:15
the second part of my talk comes in. Here I would like to raise one
00:34:21
concern. In the setting of the classical machine learning problem,
00:34:24
when we talk about the defender and the adversarial learner, normally
00:34:29
you never talk about a key. Even in adversarial learning, have you ever heard about a key?
00:34:36
Typically, the starting point when you talk about
00:34:39
an adversarial application is that
00:34:44
the defender, the person who designed the system, and the attacker
00:34:47
are basically in the same conditions. Maybe someone has
00:34:51
more training data and someone less, maybe someone has more computational resources or not,
00:34:57
but basically that's the setting. Now take the classical definition
00:35:02
of cryptographic security and Kerckhoffs's principle: a system must remain secure even if everything about it except the key is public.
00:35:05
The main idea is that if you want to address the problem from this
00:35:08
principle, you need to ensure that you, as the defender, have an
00:35:11
information advantage over the attacker. And how to create that advantage
00:35:16
is what I call privacy preservation based on secrecy.
00:35:21
So let's start from the setting I showed
00:35:24
before: we train the system as in classical machine learning, so you are trying to get
00:35:29
some latent representation, and from the latent representation you try to classify
00:35:33
or to reconstruct. But what if we inject here
00:35:39
one module, an obfuscation module based on a symmetric shared
00:35:43
key, which will produce from Z
00:35:48
a new output V that will be in the public domain? Then,
00:35:52
even if the attacker has access to V,
00:35:55
the attacker will not be capable of deducing the sensitive attributes,
00:35:59
will not be capable of classifying, and will not be
00:36:02
capable of reconstructing, because the attacker does not know the key,
00:36:07
and the entropy of the key, meaning the number of different combinations, is relatively large.
00:36:12
The attacker may know the key space but does not know the particular key, and with
00:36:16
an incorrect key it will be very difficult to succeed. So only the
00:36:21
parties who have access to the key will be capable of doing this. The point is that it
00:36:26
can be done separately, or it can also be done jointly, where we train all of this together.
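A minimal sketch of such a keyed obfuscation module follows. The talk does not specify the transform, so the choice here is an assumption: a secret key seeds a random permutation and random sign flips of the latent vector Z, producing the public output V (the function names `keyed_obfuscate` and `keyed_deobfuscate` are hypothetical). The key holder can invert it exactly; without the key, V is a scrambled vector that a classifier or decoder trained on Z does not understand. Python's `random` module is not cryptographically secure and a permutation with sign flips is not a strong cipher, so this only illustrates the information-advantage idea.

```python
import random

def keyed_transform(key, dim):
    """Derive a permutation and a sign pattern of length dim from the secret key."""
    rng = random.Random(key)
    perm = list(range(dim))
    rng.shuffle(perm)
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    return perm, signs

def keyed_obfuscate(z, key):
    """Public output V: v[i] = signs[i] * z[perm[i]]."""
    perm, signs = keyed_transform(key, len(z))
    return [s * z[p] for p, s in zip(perm, signs)]

def keyed_deobfuscate(v, key):
    """Invert the keyed transform; this recovers Z only with the correct key."""
    perm, signs = keyed_transform(key, len(v))
    z = [0.0] * len(v)
    for i, (p, s) in enumerate(zip(perm, signs)):
        z[p] = v[i] * s   # s is +1 or -1, so multiplying again undoes the sign flip
    return z

z = [0.1 * i for i in range(16)]                 # toy latent representation Z
v = keyed_obfuscate(z, key=20231011)             # what is published
print(keyed_deobfuscate(v, key=20231011) == z)   # True: authorised party
print(keyed_deobfuscate(v, key=12345) == z)      # wrong key: Z is not recovered
```

In a real deployment the key would come from a proper secret source, and the module could be trained jointly with the encoder, as mentioned in the talk.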
00:36:32
This approach was covered in yet another paper; I think it was the journal version of this paper.
00:36:38
And we demonstrated that this approach can also be
00:36:40
useful not only for privacy but also for robustness against adversarial attacks, because
00:36:45
without the key, the attacker cannot propagate
00:36:48
the adversarial distortions in a useful way.
00:36:52
Okay, so cryptography does play a role in machine learning here. It is still just the beginning, but it should
00:36:58
be researched further, and I have recently seen some publications from the group of [name unclear],
00:37:04
and I'm very glad that they are working in this direction
00:37:07
and introducing keys. So now, in the
00:37:11
main part, I would like to cover several applications. I don't have time to talk about all of them, so maybe just the most remarkable ones.
00:37:18
Before I talk about biometrics: again, there is probably no need to
00:37:23
give a full overview, since in many aspects you know it better than me,
00:37:27
but just to emphasise, for the people who
00:37:31
are not familiar with it: it consists of two main stages.
00:37:34
First we extract some features, some feature vectors, and then we try to do something with these features.
00:37:40
As for extraction, to be sure, there are many modern methods based on supervised learning;
00:37:45
for those of you who have been in the field a little longer, there are ArcFace, CosFace, et cetera,
00:37:51
and more recently there are self-supervised methods as well.
00:37:56
Template protection is basically based on cancelable biometrics, and
00:38:00
cancelable biometric systems mean that we are not working with the biometric itself; we are working with it plus added randomness,
00:38:06
a key, or with constructions such as hashes, fuzzy commitment, helper data, et cetera.
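One classic way to build a cancelable template is BioHashing-style random projection, sketched below as an illustration (it is not necessarily the scheme used by the speaker's group). A key seeds a set of random projection vectors, the feature vector is projected onto them, and the signs give a binary template; matching uses Hamming distance. If a template is compromised, issuing a new key gives a new, unrelated template from the same biometric. The 128-dimensional Gaussian "features" are a stand-in for the output of a real feature extractor.

```python
import random

def biohash(features, key, n_bits=64):
    """Cancelable binary template: key-seeded random projections, then sign binarisation."""
    rng = random.Random(key)
    template = []
    for _ in range(n_bits):
        r = [rng.gauss(0.0, 1.0) for _ in features]
        projection = sum(ri * fi for ri, fi in zip(r, features))
        template.append(1 if projection >= 0 else 0)
    return template

def hamming(a, b):
    """Number of differing bits between two templates."""
    return sum(x != y for x, y in zip(a, b))

rng = random.Random(0)
enrolled = [rng.gauss(0.0, 1.0) for _ in range(128)]     # toy feature vector
probe = [f + rng.gauss(0.0, 0.05) for f in enrolled]      # fresh, slightly noisy capture

t_old = biohash(enrolled, key=1)
print(hamming(t_old, biohash(probe, key=1)))     # small: same person, same key
print(hamming(t_old, biohash(enrolled, key=2)))  # around n_bits / 2: re-issued with a new key
```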
00:38:10
Okay, biometrics is a very well-established field, and there are requirements for protected biometric templates.
00:38:17
In two words, protected biometrics should ensure that, if something
00:38:20
is compromised, the template is easily replaceable by other representations;
00:38:24
that we are not capable of reconstructing, from this
00:38:27
template, the original data, which is exactly what I demonstrated before; and,
00:38:31
in biometrics, we need to be sure that it's not easy to link the data
00:38:36
between different users. And at the same time, we should satisfy the same performance as
00:38:40
before. So in any biometric formulation we have two stages. The first is enrolment,
00:38:47
where we have the biometric data and the identity C, and maybe some privacy protection applied to the template,
00:38:53
which is stored in the database. But the trick is that before it goes to the database,
00:38:58
we add randomness to the parameters, for example by modifying some features;
00:39:04
I've seen several papers on this matter, or you simply concatenate a key
00:39:09
to the data. In the verification stage, when you have Y as a probe,
00:39:13
you assume that you have the same key. Here you have your
00:39:16
representation, and then you can do identification, meaning to say who this person is,
00:39:20
or, if this person says "I am so-and-so", the system can confirm: yes,
00:39:23
it is this person, or reject the claim. That is binary verification.
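The two stages can be sketched as a small template database. Assumptions, all hypothetical: the protection step is a key-seeded sign flip of each feature (which preserves cosine similarity, so matching still works on protected templates), a single application key is held by the matcher and kept outside the stored records, and the similarity threshold of 0.8 is arbitrary. Verification is the 1:1 yes/no decision on a claimed identity; identification is the 1:N search for the best match.

```python
import math
import random

def protect(features, key):
    """Hypothetical keyed protection: key-seeded sign flips (preserve cosine similarity)."""
    rng = random.Random(key)
    return [f * rng.choice((-1.0, 1.0)) for f in features]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class TemplateDB:
    def __init__(self, key, threshold=0.8):
        self._key = key            # application key, kept outside the stored records
        self.threshold = threshold
        self.records = {}          # identity -> protected template only

    def enrol(self, identity, features):
        self.records[identity] = protect(features, self._key)

    def verify(self, claimed_identity, probe):
        """1:1 verification: accept (True) or reject (False) the claimed identity."""
        template = self.records[claimed_identity]
        return cosine(template, protect(probe, self._key)) >= self.threshold

    def identify(self, probe):
        """1:N identification: best-matching identity above the threshold, else None."""
        protected_probe = protect(probe, self._key)
        best, best_score = None, self.threshold
        for identity, template in self.records.items():
            score = cosine(template, protected_probe)
            if score >= best_score:
                best, best_score = identity, score
        return best

rng = random.Random(1)
alice = [rng.gauss(0.0, 1.0) for _ in range(64)]
bob = [rng.gauss(0.0, 1.0) for _ in range(64)]
db = TemplateDB(key=777)
db.enrol("alice", alice)
db.enrol("bob", bob)
probe = [f + rng.gauss(0.0, 0.1) for f in alice]   # new capture of Alice
print(db.verify("alice", probe), db.verify("bob", probe), db.identify(probe))  # True False alice
```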
00:39:29
But at the same time, we should not forget what the attacker can do. The attacker can
00:39:32
come and say: from this protected version, I might try to extract
00:39:36
the sensitive attributes, I might try to run a reconstruction attack, I might perform
00:39:41
linkage analysis and link some users. So all these aspects should be carefully taken
00:39:45
into account. Besides that, we should not forget about the other side:
00:39:50
an attacker, or maybe people, who will come and physically present a fake sample,
00:39:56
or physically present another person's trait. So these details should also be well distinguished and taken into
00:40:01
account when we design both the template extraction and the template protection schemes.
00:40:07
One thing that we encountered in many applications when we started to work in this domain
00:40:11
concerns medical records. We worked on this
00:40:15
in collaboration with a hospital in Geneva,
00:40:22
and also on the storage of very large image collections, where
00:40:28
people want to collect the images but do not want to share them, and the images are very large.
00:40:32
So the idea is a representation based on compressed,
00:40:35
privacy-preserving representations. Imagine you have an encoder to compress the
00:40:39
data. You compress the data, and the encoder
00:40:43
gives you an output that is a continuous representation.
00:40:48
We apply quantisation here in a very special way, so that the output is sparsified:
00:40:53
if you have continuous values here, after quantisation you have
00:40:56
zeros and non-zeros, plus or minus one, at each of these positions. Then the
00:41:00
obfuscation is applied here: based on the key, you add noisy components,
00:41:05
plus or minus ones, in secret locations. So if the attacker
00:41:09
does not know where these components were added, he or she cannot
00:41:12
reconstruct the data; but if you know the key, you can eliminate
00:41:16
this noise, this obfuscation, and you can reconstruct.
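The compression-plus-key mechanism described here can be sketched as follows. A continuous representation is quantised into a sparse ternary code (most entries 0, the largest-magnitude ones +1 or -1), and the key selects secret positions where random +1/-1 decoys are added. One simplifying assumption, not stated in the talk, makes exact removal possible: the secret positions are excluded from the top-k selection, so an authorised decoder that regenerates them from the key can simply zero them out. All function names, sizes and the key value are hypothetical.

```python
import random

def secret_positions(key, dim, n_noise):
    """Positions reserved for decoys, derived from the shared secret key."""
    return set(random.Random(key).sample(range(dim), n_noise))

def sparse_ternary(z, k, excluded=frozenset()):
    """Keep the k largest-magnitude entries outside `excluded` as +1/-1; the rest become 0."""
    candidates = [i for i in range(len(z)) if i not in excluded]
    top = sorted(candidates, key=lambda i: abs(z[i]), reverse=True)[:k]
    code = [0] * len(z)
    for i in top:
        code[i] = 1 if z[i] > 0 else -1
    return code

def protect_code(z, k, key, n_noise, rng):
    """Public code: sparse ternary code plus random +/-1 decoys at the secret positions."""
    secret = secret_positions(key, len(z), n_noise)
    code = sparse_ternary(z, k, excluded=secret)
    for i in secret:
        code[i] = rng.choice((-1, 1))
    return code

def recover_code(public_code, key, n_noise):
    """Authorised side: regenerate the secret positions from the key and zero them out."""
    secret = secret_positions(key, len(public_code), n_noise)
    return [0 if i in secret else c for i, c in enumerate(public_code)]

rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(256)]          # continuous encoder output
public = protect_code(z, k=16, key=2023, n_noise=32, rng=rng)
clean = sparse_ternary(z, 16, excluded=secret_positions(2023, 256, 32))
print(sum(c != 0 for c in public))            # 48: 16 true components + 32 decoys
print(recover_code(public, 2023, 32) == clean)  # True: the key holder removes the decoys
```

Without the key, an attacker sees 48 non-zero entries and cannot tell which 16 carry the signal.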
00:41:20
So again, there is an advantage: if you are an authorised user, you can communicate,
00:41:25
compress, and decompress; if you are not authorised, you are the attacker, and you
00:41:28
cannot reconstruct. And the last application, but not the least: I want to talk about physical
00:41:33
objects. We understand that this is important for humans, for biometrics, but
00:41:37
it is also important for physical objects. Physical objects represent different aspects
00:41:41
of our life. I'm not talking only about luxury watches or brands;
00:41:45
it is also important for
00:41:50
electronics, et cetera.
00:41:53
Basically, we apply the
00:41:57
same privacy and security principles to the protection of physical objects. So how is it done?
00:42:02
In a nutshell, here is why classical methods do not work. Basically,
00:42:06
just as humans possess unique biometrics, fingerprints, iris, et cetera,
00:42:11
physical objects possess the same kind of biometrics, but they
00:42:15
have to be observed at the microstructure scale.
00:42:18
If you have this microstructure representation, you can extract exactly the
00:42:22
same kind of template as in biometrics, and then proceed in the same way.
00:42:26
We are also working in this direction: we have an automatic line that
00:42:30
extracts the unique features, which are stored in the database, and then users
00:42:34
can send this information for verification. We have the whole record: from which place in
00:42:39
the world it was done, at what time, with what outcome, et cetera.
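A toy version of this enrol-verify-log loop might look like the sketch below. Everything in it is a hypothetical illustration: the microstructure "fingerprint" of an object is a random vector, a scan is compared with the enrolled template by Pearson correlation against an arbitrary threshold of 0.7, and every check is logged with its place, time and outcome, as described in the talk. The class and field names are invented for the example.

```python
import math
import random
from dataclasses import dataclass, field

def correlation(a, b):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

@dataclass
class Verification:
    object_id: str
    place: str
    timestamp: str
    authentic: bool
    score: float

@dataclass
class ObjectRegistry:
    threshold: float = 0.7
    templates: dict = field(default_factory=dict)   # object id -> enrolled features
    log: list = field(default_factory=list)         # every verification, with context

    def enrol(self, object_id, features):
        self.templates[object_id] = list(features)

    def verify(self, object_id, scan, place, timestamp):
        score = correlation(self.templates[object_id], scan)
        record = Verification(object_id, place, timestamp, score >= self.threshold, score)
        self.log.append(record)
        return record.authentic

rng = random.Random(7)
item = [rng.gauss(0.0, 1.0) for _ in range(500)]        # enrolled microstructure features
registry = ObjectRegistry()
registry.enrol("item-001", item)
genuine = [f + rng.gauss(0.0, 0.3) for f in item]       # new scan of the same object
fake = [rng.gauss(0.0, 1.0) for _ in range(500)]        # counterfeit with a different microstructure
print(registry.verify("item-001", genuine, "Geneva", "2023-10-11T08:00"))  # True
print(registry.verify("item-001", fake, "Lausanne", "2023-10-11T09:00"))   # False
print(len(registry.log))                                                     # 2
```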
00:42:42
So, to summarise, it is exactly the same scenario as for biometrics, but as the input
00:42:48
of the system we use the microstructures of physical objects, and it can be applied to any object.
00:42:55
There are a lot of benefits: with a mobile phone you can authenticate the item, you can
00:43:00
track and trace, you can have blockchain records, and it can be
00:43:03
used also for direct marketing and analysis, et cetera.
00:43:08
In conclusion, as you see, the interplay between these two aspects
00:43:13
and different applications is interesting, especially when we talk about the future.
00:43:19
In the future, many machine learning techniques will require techniques similar to those
00:43:23
used in biometrics, meaning some keys, some cryptosystems, and not
00:43:27
only end-to-end optimisation frameworks, as it is
00:43:31
mostly done right now. And a lot of new challenges
00:43:34
will come; they have already arrived with AI-generated content: distinguishing whether content is generated or not.



Conference Program

(Keynote Talk) Privacy-Preserving Machine Learning : theoretical and practical considerations
Prof. Slava Voloshynovskiy, University of Geneva, professor with the Department of Computer Science and head of the Stochastic Information Processing group
Oct. 11, 2023 · 7:55 a.m.
5 minutes Q&A - Privacy-Preserving Machine Learning : theoretical and practical considerations
Prof. Slava Voloshynovskiy, University of Geneva, professor with the Department of Computer Science and head of the Stochastic Information Processing group
Oct. 11, 2023 · 8:40 a.m.
Enabling Digital Sovereignty With ML
Vladimir Vujovic, Senior Digital Innovation Manager, SICPA
Oct. 11, 2023 · 8:44 a.m.
5 minutes Q&A - Enabling Digital Sovereignty With ML
Vladimir Vujovic, Senior Digital Innovation Manager, SICPA
Oct. 11, 2023 · 8:58 a.m.
Privacy-Enhanced Computation in the Age of AI
Dr. Dimitar Jechev, Co-founder and CTO of Inpher
Oct. 11, 2023 · 9:01 a.m.
5 minutes Q&A - Privacy-Enhanced Computation in the Age of AI
Dr. Dimitar Jechev, Co-founder and CTO of Inpher
Oct. 11, 2023 · 9:20 a.m.
Privacy by Design Age Verification & Online Child Safety
Dr. Onur Yürüten, Head of Age Assurance Solutions and Senior ML Engineer in Privately
Oct. 11, 2023 · 9:26 a.m.
5 minutes Q&A - Privacy by Design Age Verification & Online Child Safety
Dr. Onur Yürüten, Head of Age Assurance Solutions and Senior ML Engineer in Privately
Oct. 11, 2023 · 9:41 a.m.
(Keynote Talk) Biometrics in the era of AI: From utopia to dystopia?
Dr. Catherine Jasserand, KU Leuven (Belgium), Marie Skłodowska-Curie fellow at Biometric Law Lab
Oct. 11, 2023 · 11:06 a.m.
5 minutes Q&A - Biometrics in the era of AI: From utopia to dystopia?
Dr. Catherine Jasserand, KU Leuven (Belgium), Marie Skłodowska-Curie fellow at Biometric Law Lab
Oct. 11, 2023 · 11:42 a.m.
AI and Privacy
Alexandre Jotterand, CIPP/E, CIPM, attorney-at-law, partner at id est avocats
Oct. 11, 2023 · 11:48 a.m.
5 minutes Q&A - AI and Privacy
Alexandre Jotterand, CIPP/E, CIPM, attorney-at-law, partner at id est avocats
Oct. 11, 2023 · 12:06 p.m.
Preliminary Perspectives on the Ethical Implications of GenAI
Julien Pache, A Partner at Ethix and Venture Partner at Verve Ventures
Oct. 11, 2023 · 12:12 p.m.
5 minutes Q&A - Preliminary Perspectives on the Ethical Implications of GenAI
Julien Pache, A Partner at Ethix and Venture Partner at Verve Ventures
Oct. 11, 2023 · 12:30 p.m.
AI & Media: Can You Still Trust Information
Mounir Krichane, Director of the EPFL Media Center
Oct. 11, 2023 · 12:32 p.m.
5 minutes Q&A - AI & Media: Can You Still Trust Information
Mounir Krichane, Director of the EPFL Media Center
Oct. 11, 2023 · 12:54 p.m.
(Keynote Talk) Unlocking the Power of Artificial Intelligence for Precision Medicine with Privacy-Enhancing Technologies
Prof. Jean Louis Raisaro, CHUV-UNIL, assistant professor of Biomedical Informatics and Data Science at the Faculty of Biology and Medicine and the head of the Clinical Data Science Group at the Biomedical Data Science Center
Oct. 11, 2023 · 1:22 p.m.
5 minutes Q&A - Unlocking the Power of Artificial Intelligence for Precision Medicine with Privacy-Enhancing Technologies
Prof. Jean Louis Raisaro, CHUV-UNIL, assistant professor of Biomedical Informatics and Data Science at the Faculty of Biology and Medicine and the head of the Clinical Data Science Group at the Biomedical Data Science Center
Oct. 11, 2023 · 1:50 p.m.
Genomics, AI and Privacy
Julien Duc, Co-Founder and Co-CEO of Nexco Analytics
Oct. 11, 2023 · 2:01 p.m.
5 minutes Q&A - Genomics, AI and Privacy
Julien Duc, Co-Founder and Co-CEO of Nexco Analytics
Oct. 11, 2023 · 2:18 p.m.
How trust & transparency lead the success of an Idiap student's Master's project in fraud detection
Raphaël Lüthi, Machine Learning and AI Lead at Groupe Mutuel
Oct. 11, 2023 · 2:22 p.m.
5 minutes Q&A - How trust & transparency lead the success of an Idiap student's Master's project in fraud detection
Raphaël Lüthi, Machine Learning and AI Lead at Groupe Mutuel
Oct. 11, 2023 · 2:38 p.m.