Transcriptions
Note: this content has been automatically generated.
00:00:00
[inaudible]
00:00:05
um, so of course, let me introduce our speaker, professor [name inaudible] from the university of geneva;
00:00:12
um, he's a professor with the department of computer
00:00:14
science and the head of the stochastic information processing group,
00:00:19
and he will be talking to us about privacy-preserving machine learning,
00:00:23
giving both theoretical and practical considerations.
00:00:29
um, yeah, well, so you have about thirty-five minutes and then five minutes for questions, okay?
00:00:36
right, okay. thank you very much for your nice introduction.
00:00:41
so for me it is an immense pleasure to be here; thank you
00:00:43
very much for the invitation, and especially to speak about privacy and security
00:00:49
at this reputable place, where you
00:00:52
have a room full of professors [inaudible]
00:00:55
[inaudible]
00:01:03
and today, uh, i would like to share with you some of our findings
00:01:07
in this domain, and, uh, i will try to address
00:01:10
this problem from both theoretical and practical perspectives. you may imagine how challenging it is
00:01:16
to speak about this huge area of ai and privacy and to give you, in the very short time of
00:01:21
thirty minutes, all the details. so i will try to do my best, trying to explain maybe not so much mathematically but
00:01:27
with my hands, and hopefully it works; if it doesn't work, don't hesitate to ask questions afterwards. so first of
00:01:33
all, i would like to acknowledge the contribution
00:01:37
of my great collaborators and students, and
00:01:41
the first one, behrooz razeghi, who contributed most of the ideas
00:01:46
with all his contributions and innovation; now he is a member of the group at idiap.
00:01:53
also, many of the results i am going to
00:01:56
present here were actually obtained with his direct participation and
00:02:01
initiative. then i would like to thank [names inaudible],
00:02:05
and [name inaudible], who is also here, and
00:02:10
[inaudible]
00:02:16
we have also had a great collaboration with colleagues from
00:02:21
harvard university, with professor flavio calmon, one of the leading people in privacy,
00:02:26
especially when it concerns theoretical aspects of machine learning; then deniz gündüz from imperial college;
00:02:31
and also a colleague who contributed to the work on
00:02:36
autoencoders and normalizing flows; by the way, he did his phd here, and now he is one of the researchers at deepmind.
00:02:43
okay, so i will start first with a very light justification, probably
00:02:49
known to all of you, about the need to have privacy
00:02:54
and security in machine learning, and i will try to give it from
00:02:57
the perspective of first introducing the general concept of ai, or learning from observations,
00:03:04
just so that you better understand from it which
00:03:07
methods it involves and where privacy and security come together. and then,
00:03:11
um, once we understand that it is really very important
00:03:14
to have this aspect properly addressed, we will come to two schemes:
00:03:19
one based on optimization, which we can call learnable obfuscation, and a second one based on shared secrecy.
00:03:24
i deliberately will not talk about cryptographic approaches based on homomorphic encryption, because
00:03:30
anyway it would not be possible to cover them here, and i suppose several speakers will
00:03:34
address this. so i will only focus on these two approaches, and then i will give you
00:03:38
a non-exhaustive list of different applications where privacy
00:03:43
and security, uh, do play a very important role, and
00:03:46
a few use cases; maybe some of them will be quite unexpected for you.
00:03:50
let's see what the outcome will be. so before i start, i would like
00:03:55
to summarize something that is quite old news, but nevertheless, in recent years we observe
00:04:01
several important tendencies, and these tendencies are
00:04:05
very important to emphasize for our presentation today.
00:04:08
so the first one: we observe the evolution
00:04:11
from the convolutional networks towards the transformers, which basically allows
00:04:15
learning better representations, and since we talk about the transformers,
00:04:20
especially in the last two or three years, the progress is very impressive.
00:04:24
there is also a paradigm, the so-called masked image modeling, which is one of the types of training
00:04:29
that actually produces top results, especially in the visual tasks that we are mostly concerned about in biometrics
00:04:35
here, and it actually deserves a lot of attention and still more study.
00:04:40
also, allow me to mention what's happening right now in
00:04:43
generative ai, which concerns both natural language and also images,
00:04:48
and where we observe an evolution from, say, autoencoders, gans, and
00:04:54
normalizing flows towards the recently booming frameworks such as the denoising diffusion probabilistic models, ddpms,
00:05:00
that produce really impressive results and actually raise, at the same time, some concerns.
00:05:06
also, in inference models we observe an evolution from
00:05:09
the supervised learning methods towards the self-supervised learning methods,
00:05:14
and these are extremely important for privacy and security, yet they have their own issues; i will speak about them later.
00:05:20
also, uh, what is important is that all these concepts led to the appearance of new business models.
00:05:27
so for example, when you have a learned model, based on this learned model you
00:05:30
can provide a service. or, based on this, there is the foundation model, which
00:05:34
might have quite an important part to play for privacy: you can train something on a very
00:05:39
large public database, and then you try to fine-tune it, to adapt it to
00:05:43
the sensitive or privacy-sensitive data. okay, i am trying not to be exhaustive here, but
00:05:49
i think there is no need to convince you that privacy and security are both extremely important in many applications nowadays,
00:05:56
whether you speak about surveillance, personalized services, biometrics, or e-commerce.
00:06:02
i will not go into the details of that; i am pretty sure you read
00:06:05
a lot of blogs and information about this, especially
00:06:10
just to emphasize how very important it is for the roles we want to pursue, as we
00:06:16
want to drive towards a digital economy, towards
00:06:20
digital security, towards digital twins, and so on.
00:06:25
okay, i think there are several points, before we start, to
00:06:29
be emphasized about privacy and security; there are actually three main points.
00:06:35
the first point is that when we have some data, we can train some
00:06:39
models based on this data; i am only talking in the context of machine learning and ai.
00:06:44
basically, we need to ensure that the data that we use
00:06:47
for training the model satisfies certain confidentiality, uh,
00:06:53
and what really concerns us is that the data were not modified, not tampered with,
00:06:59
and we should be sure that nothing has happened to this data;
00:07:02
otherwise we might have a very big problem of biases in our models. therefore,
00:07:07
at the same time, we need to ensure that all issues related to unauthorized
00:07:11
access to the data, and misuse of the data, are properly handled. once you have
00:07:16
the model trained and we have a user of the model or of some service, we need to ensure
00:07:21
that the information that comes to the model, as the user
00:07:23
communicates or performs some services, is also properly protected,
00:07:28
meaning that the information flow from the user to the model, to the server,
00:07:31
the information coming back, and how this information is stored on the server and
00:07:36
on other nodes, should also be properly protected.
00:07:41
and last but not least, when we are talking about machine learning, we definitely need to take into account
00:07:46
not only the simple attacks you typically address in the real world, where
00:07:50
someone might be, um, counterfeiting an identity, or maybe they can try to
00:07:55
copy some specific features of, uh, of a personal object,
00:08:00
but recently we observe the appearance of gans,
00:08:05
uh, the generative adversarial networks, and we also observe the appearance
00:08:09
of some examples, the so-called adversarial examples, that might be very harmful
00:08:13
for some machine learning classifiers. so these aspects of privacy and security
00:08:17
from this perspective should be handled properly.
00:08:20
please interrupt me, but i think that is it in general for the introduction. okay, right, okay. so, as i said before, i will primarily
00:08:28
focus on the two approaches to privacy and security, that is, one based on learnable obfuscation, let's keep
00:08:34
looking at this term, and also shared secrecy, which is maybe not very common, but i will try to explain what
00:08:40
i mean by that. so okay, let's start with a little bit of math; i
00:08:45
hope it will not be so difficult; again, it is just a small introduction.
00:08:50
so i will use this diagram in general to speak about the issues of
00:08:54
privacy and security, because the diagram gives an idea about the main
00:09:00
variables and about the main quantities we are interested in. so i will start
00:09:05
with the notion of utility data. for example, let's think about biometrics, maybe
00:09:10
the simplest case for us. the variable c can be some label,
00:09:14
representing some identity of the person, right: my name, my id number,
00:09:18
and so on, and x is my representation; in this case the representation is given, like,
00:09:24
by the face of the person, but it can be any representation of the data, or,
00:09:29
if we are talking about a human, it can be voice, dna, or whatever
00:09:32
characteristics, say a fingerprint; or we might think also about physical objects, i will talk
00:09:38
about it later, like a physical unclonable function, where you need representations of the object as
00:09:42
well. so it is a very generic data representation that we can acquire by some sensing. at
00:09:46
the same time, what is going to be very, very important: we also talk about the
00:09:50
sensitive data s, and it depends on the task and application we are trying to
00:09:56
address; it can be some sensitive representation like medical records, it can be my ethnicity, it
00:10:01
can be my emotions, any sort of data that i would like to preserve, or
00:10:06
to which i would like to prevent an attacker's access. and what we are trying to do: given this original representation x, we are trying to derive yet
00:10:12
visual presentation x. we're trying to use yet
00:10:16
another presentation that is enough innovation in z.
00:10:20
that might be in some sense more vastly different the the the two
00:10:25
different miss alignments mean probation like variation maybe more come part maybe more secure
00:10:30
and maybe more privacy a fancy to well maybe it from them shimmering perspective
00:10:34
can be in terms of the basic sufficient the basics it will contain all information
00:10:39
that is needed for us to sort of adoption cost down daunting task mean decompression authentication
00:10:44
eh indication classifications that they're so therefore would be do you see this
00:10:49
small circle this year that is in my notation will be it should be
00:10:52
learned looking ability training so we are trying to work design and marker
00:10:58
that please hold innovation so weeks is input that's conditions e. it is output
00:11:03
anti r. some parameters of this marker i will talk about them in a bit more detail so right now we're talking about that
00:11:09
an abstract function that transforms our input x to z through some mapping.
00:11:14
and basically, once we have the z, this representation, we can think of
00:11:18
what it can be useful for. of course, it might be useful for tasks
00:11:21
related to some inference, maybe classification; maybe we want to store it for indexing,
00:11:26
for search; it can be done, i will talk about this later, in a supervised or a self-supervised way;
00:11:31
or it can be used for compression, so from z we can just
00:11:34
decompress it and get an estimation of x. okay, so we are training an
00:11:40
encoder, we are training a decoder, and depending on what our task is, we define
00:11:43
the decoder: for example, a decoder for the classification that maps z to the estimate c-hat,
00:11:49
and for the reconstruction, a decoder that maps z to an estimate x-hat as the output of the
00:11:53
framework. so therefore, from one side, you want z to satisfy sufficiency for the
00:11:59
solution of the downstream target task, and at the same time to be compact, to
00:12:03
satisfy some, say, compression rate, to not exceed the number of bits that you want to allocate.
00:12:09
right now everything is hypothetical; we will go into the details later. but for now these seem to be quite
00:12:15
unrelated tasks, classification and compression. but to introduce some
00:12:20
basis, instead of just doing some heuristic stuff, we can introduce some
00:12:24
intuition, a mathematical intuition, so that we can address this problem strictly from a theoretical
00:12:30
point of view, where you can see whether a solution is optimal or not optimal according to a certain framework.
00:12:36
an actual breakthrough in machine learning was a framework introduced
00:12:40
by the group of naftali tishby, which was the information bottleneck;
00:12:44
probably some of you have heard about it, and it was initially introduced
00:12:48
for the classification task: how to compress the input representation and then classify it.
00:12:53
and the essence of the framework for the classification task was formulated as follows. so
00:12:58
they are trying to optimize some functional, a two-term objective function, where
00:13:04
in our case it is mutual information. so, a word about mutual information for the
00:13:08
people who do not know: i draw x as a set on one side and z on the other side;
00:13:13
then if the mutual information is very large, the two sets are very strongly overlapping;
00:13:18
if the mutual information is very small, the overlap is only a small
00:13:24
common area; and if they are independent, there is basically no overlap at
00:13:28
all. so basically, if you have a minimization here, we are trying to
00:13:33
make the mutual information between x and z
00:13:36
as small as possible, meaning we are trying to suppress the information in
00:13:40
z about the x, so it can be interpreted as a sort of compression,
00:13:44
though not necessarily; there are different ways to minimize the mutual information.
00:13:48
of course, we cannot calculate it exactly, and we can only minimize the mutual information up to some bound,
00:13:53
but the objective, what we do with it, can be made quite simple. because
00:13:58
we have the second term, which tells us that we can
00:14:02
go on and we can try to minimize the mutual information between x and z,
00:14:06
but only to such a limit that z still preserves all the necessary information
00:14:12
between z and c, where c is our classification task. meaning,
00:14:17
just to interpret it: if you are given a very huge image, and in
00:14:22
this image you have only the information that characterizes the person, say the biometrics,
00:14:27
then i filter out everything that does not serve my classification
00:14:31
and keep only what is needed in my representation z, and nothing else. okay, and that is what the first term does.
00:14:37
so the first term is a minimization: here we are minimizing this. but the second term comes with a minus sign:
00:14:43
we are trying to maximize it, and it is weighted by this parameter beta, which plays the role of a trade-off for us.
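In symbols, the information bottleneck objective described here is usually written as follows (this is my reconstruction from the spoken description; beta is the trade-off parameter mentioned above):

```latex
\min_{\theta}\; \underbrace{I(X;Z)}_{\text{compression}} \;-\; \beta\, \underbrace{I(Z;C)}_{\text{classification relevance}}
```

the first term suppresses information about x in z, and the second term, entering with the minus sign, keeps z informative about the class c.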
00:14:50
so that was the original work of tishby et al., but it only addressed the inference,
00:14:54
the classification task, and it did not tell us anything about a framework for the compression task, rate-distortion.
00:15:00
and as you know, autoencoders represent a really quite important
00:15:03
part of, uh, different applications. so that is why, starting from
00:15:07
around two thousand seventeen to nineteen,
00:15:11
our group just extended this, and at the same time others did similar work.
00:15:14
now it is so simple and so beautiful: if you place c here, that is the classification; if you place
00:15:21
x, that is, like, the reconstruction. so basically, in this diagram, okay, you go from x to z,
00:15:26
but preserve in z exactly the information needed to decompress. so imagine you have a jpeg-
00:15:32
like compressor: you have an autoencoder, you can compress and then try to decompress; it behaves like that, but
00:15:38
here you have x as input and the whole thing is learnable. again, i am trying to oversimplify, but it is as
00:15:45
simple as that. okay, so those are the two frameworks we have. and then the main problem with this approach is how to
00:15:52
go about it, how to practically express this term, the mutual information. it is a very
00:15:56
nice intuition, right, what i am saying, but how do you apply it in practice,
00:15:59
and especially, in practice this is plagued by several problems. the problem is that for
00:16:04
some distributions we do not have many samples, many training examples; again, we are not openai,
00:16:09
we are not google deepmind, and we do not have four hundred billion training examples, right;
00:16:14
we need to do something with thousands, or hundreds of thousands. so that is a very big challenge.
00:16:20
secondly, the dimensionality of the data is very large, right; it is not discrete data
00:16:24
where we can compute the mutual information in a frequency-counting, empirical-probability sense; it is continuous. so that is why
00:16:30
there are several approaches; i just selected three of them here, but maybe
00:16:36
you can find more. one is the very, very famous contrastive approach, the very
00:16:41
influential work of van den oord et al., that is cpc, probably you have heard about it,
00:16:46
which is the basis of the contrastive learning methods and ssl nowadays. the second one,
00:16:51
and don't forget about gans: building on them, in the practical applications of
00:16:55
the group of yoshua bengio there is mine, which stands, as in the title of the paper,
00:17:01
for mutual information neural estimation. and finally, the approach that was initiated by alex alemi from
00:17:07
google brain and then developed by our group, first in this paper, and then by other groups
00:17:12
in more extended versions; so that is where we have our own flavor of the variational approximation
00:17:17
of the mutual information; that is how it started from those very early days of approximations. okay, now,
00:17:24
in order to bring us to a fundamental understanding of the problem of attacks on machine
00:17:29
learning systems, i would like to briefly review the different approaches to how to train an ai
00:17:36
system, how to train machine learning. it depends on what kind of data we have, and it
00:17:41
depends on what we have in the database: labels, no labels, and what kind of labels, meaning whether
00:17:47
the labels are per image, "cat", "box", et cetera, or we have weak labels, basically "the cat is sitting somewhere",
00:17:53
"the cat is happy", et cetera, et cetera. okay, so there are several approaches:
00:17:58
supervised and unsupervised, self-supervised, weakly-labelled learning, and
00:18:01
semi-supervised, an especially interesting one. so instead of giving
00:18:05
the definitions, i propose to go through visual drawings from which it will be clear what we are talking about.
00:18:10
so, to start, historically the first group of methods are the, um,
00:18:14
supervised techniques, where you are given the labeled data x and c,
00:18:19
and these labeled data are represented by the n training samples, right. so that is
00:18:25
the approach where you have x representing the image and c representing its label,
00:18:31
okay, for example cat, box, et cetera, or some other object or whatever, and here the images, you
00:18:36
know, can be represented by their color components, rgb or yuv, et cetera.
00:18:42
and as i said, the main framework is to build this encoder that maps x to z,
00:18:48
where z is a latent representation, and then, for example for the classification task, there is
00:18:54
the decoder part that produces from z, like, an estimation of the c; c-hat is the estimation of the class. so
00:19:00
once we have these labeled data, we are trying to train both the encoder and the decoder in such a way
00:19:06
that once our system is presented some sort of new image, it will
00:19:09
be able to produce a label that is as close as possible
00:19:13
to the true one, even for inputs outside the training data set. now, as i promised, i will just decipher what is inside here.
00:19:19
so basically, we are mostly speaking about deep neural network implementations,
00:19:24
whether it is resnets or transformers, but in a nutshell, if you
00:19:27
have the data x, you apply some transformation; mostly it
00:19:31
is a mapping: imagine you multiply by some matrix, then
00:19:34
you add the bias, you apply some nonlinearity, again you apply the same stages, and you do this many,
00:19:39
many, many times; it is as simple as that. and basically all the parameters theta are the transformation matrices, the
00:19:46
biases; theta is just the matrices and the biases, that is all, right. so
00:19:51
when we are talking about training the encoder and the decoder, it means that we need to find all
00:19:55
of these matrices by learning from the examples, okay.
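As a rough sketch of "multiply by a matrix, add a bias, apply a nonlinearity, many times", here is a minimal two-layer encoder in NumPy; the layer sizes and random weights are purely illustrative, not from the talk:

```python
import numpy as np

def relu(a):
    # elementwise nonlinearity applied after each affine map
    return np.maximum(a, 0.0)

def encoder(x, params):
    # params is a list of (W, b) pairs: theta = all the matrices and biases
    z = x
    for W, b in params:
        z = relu(z @ W + b)  # multiply by a matrix, add a bias, nonlinearity
    return z

rng = np.random.default_rng(0)
# toy sizes: 8-dim input -> 16 hidden -> 4-dim latent z
params = [(rng.standard_normal((8, 16)) * 0.1, np.zeros(16)),
          (rng.standard_normal((16, 4)) * 0.1, np.zeros(4))]
x = rng.standard_normal(8)
z = encoder(x, params)
print(z.shape)  # the latent representation z, here 4-dimensional
```

Training would then mean adjusting all the (W, b) pairs, i.e. theta, by gradient descent on a task loss.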
00:20:02
okay, so, but now something interesting: this training for the supervised classification was important at the time it was developed, well,
00:20:08
and it still is, but now comes an interesting part in terms of biometrics, and in particular it is
00:20:13
interesting for the following fact: once we have trained the encoder, we do not care anymore about the second part,
00:20:19
the decoder. we have this latent representation, and to this latent representation we can apply even non-learnable
00:20:24
functions, for example just a simple thresholding or sign function, and you can get some representation that is binary, right.
00:20:30
so it is a binary representation: it can be short, it can be informative, it can be robust; it might be
00:20:35
interpreted as a sort of hash or template that is robust to distortions,
00:20:39
ah, and that is it, okay. so again, it is not a cryptographic hash; it
00:20:43
does not satisfy the same principles, but the community refers to this as a
00:20:47
robust hash or a perceptual hash, or as a template, okay.
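A minimal sketch of that post-hoc binarization step; note that the "encoder" here is stubbed out as a fixed random projection, purely for illustration:

```python
import numpy as np

def binary_template(z):
    # non-learnable sign/threshold function on top of the latent vector:
    # each latent coordinate becomes one bit of the "perceptual hash"
    return (z >= 0.0).astype(np.uint8)

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 16))             # stand-in for a trained encoder
x = rng.standard_normal(8)                   # original representation
x_noisy = x + 0.01 * rng.standard_normal(8)  # slightly distorted copy

h = binary_template(x @ W)
h_noisy = binary_template(x_noisy @ W)
# a robust template should barely change under a small distortion
print(np.sum(h != h_noisy), "bits differ out of", h.size)
```

A real system would measure the Hamming distance between templates to decide whether two inputs match.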
00:20:53
now, how do autoencoders work? oh, don't worry, it is another framework that is often used when we have only
00:20:56
data without the labels; so you see, here we have the data x but we do not have labels.
00:21:00
so the idea is that we are trying to train an encoder and a decoder to process the same data and to map it back,
00:21:06
but we are not doing it in a trivial way, otherwise we would learn nothing but the identity mapping; we just
00:21:11
control the size of the latent space, or we impose different constraints on the latent space, and
00:21:16
these latent-space constraints can be, as people started with, the vae, the beta-vae, the infovae,
00:21:21
and so on and so on; there is a whole zoo of all the different methods,
00:21:27
but they all try somehow to bring this latent space to some manageable form, with some interpretability,
00:21:34
mostly following, again, this first paper, and then it was developed and extended in many, many papers; again, here are
00:21:40
two of them, and in this paper you can find even more information.
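To make the "bottleneck" idea concrete, here is the simplest possible autoencoder: a linear one, whose optimal encoder/decoder pair is given by a truncated SVD. This is a textbook illustration of the principle, not one of the methods from the talk:

```python
import numpy as np

rng = np.random.default_rng(2)
# toy data: 100 samples living near a 3-dimensional subspace of R^10
basis = rng.standard_normal((3, 10))
X = rng.standard_normal((100, 3)) @ basis + 0.01 * rng.standard_normal((100, 10))

def linear_autoencoder(X, k):
    # the optimal linear encoder/decoder of width k uses the
    # top-k right singular vectors of the data matrix
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V = Vt[:k].T          # encoder: R^10 -> R^k; decoder: its transpose
    Z = X @ V             # latent codes (the bottleneck)
    return Z @ V.T        # reconstruction

err3 = np.linalg.norm(X - linear_autoencoder(X, 3))
err1 = np.linalg.norm(X - linear_autoencoder(X, 1))
print(err3, "<", err1)  # a wider bottleneck reconstructs better
```

Deep autoencoders replace the linear maps with learned nonlinear ones, but the compress-then-reconstruct structure is the same.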
00:21:42
now, something that is really, right now, nowadays, very interesting and
00:21:48
attractive: that is self-supervised learning. so the problem is very
00:21:50
similar in formulation to this one: we have only data, without any labels,
00:21:54
but instead of asking for an encoder and a decoder, we just have two encoders. and how could we possibly
00:22:00
train these? we take the data x, we create
00:22:04
two very similar augmentations, two very similar views of the same data,
00:22:08
and then we say: both are coming from the same image, so naturally they should be close, not only
00:22:14
in the image space but also in the latent space. so we impose a similarity constraint on the latent space,
00:22:20
pushing two images coming from the same origin to be very close, and if we
00:22:24
have some other images, they should be pushed apart. that is the principle of contrastive
00:22:27
learning. how to formalize it in practice, that is another question; there are different definitions
00:22:33
of the contrastive losses, but the most interesting one, and the one used nowadays, is infonce.
00:22:38
and for those who are interested, i cannot give all the references because there
00:22:41
is no space, but look up the names like simclr,
00:22:45
from hinton's group, as well as, mostly from the facebook side,
00:22:49
byol, dino, barlow twins, moco,
00:22:53
msn, the masked siamese networks, and the masked autoencoders; there have already been plenty of them.
00:22:58
so if you are interested, you can find how these systems are trained nowadays.
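For the InfoNCE loss mentioned above, here is a compact NumPy sketch; the encoder is a fixed random matrix, and the batch size, dimensions, and temperature are arbitrary choices of mine:

```python
import numpy as np

def info_nce(z1, z2, tau=0.1):
    # z1[i] and z2[i] are latent codes of two views of the same image i;
    # every z2[j], j != i, acts as a negative for z1[i]
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                        # cosine similarity / temperature
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))             # matched pairs sit on the diagonal

rng = np.random.default_rng(3)
W = rng.standard_normal((8, 4))                     # stand-in encoder
x = rng.standard_normal((16, 8))                    # a batch of 16 "images"
view1 = (x + 0.05 * rng.standard_normal(x.shape)) @ W   # two augmented views
view2 = (x + 0.05 * rng.standard_normal(x.shape)) @ W
loss_aligned = info_nce(view1, view2)
loss_random = info_nce(view1, rng.standard_normal((16, 8)) @ W)
print(loss_aligned, "<", loss_random)  # matched views give a lower loss
```

Minimizing this loss pulls the two views of the same image together in latent space and pushes the other images in the batch apart, which is exactly the similarity constraint described above.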
00:23:03
and then, last but not least, i will just say the name clip,
00:23:07
and many people really see how it became the main engine, especially when we are talking about text-to-image generative models.
00:23:13
so the idea here is that, instead of having two encoders that process the same type of input, say images,
00:23:18
one encoder is fed with the images and one encoder is fed with the text, and we
00:23:22
try to make them match: when the image and the text describe the same object, we try to make their
00:23:28
latent representations accord, be semantically close, describing the same thing across the different modalities; so they should be close.
00:23:34
okay, and finally, last but not least, what i think is quite interesting for
00:23:38
privacy and security is a generalization of these approaches, which was published relatively recently, about two years ago: that
00:23:44
is the semi-supervised learning, where you have a lot of, a lot of,
00:23:48
unlabeled data and just a few labeled examples, okay; that is what we call semi-supervised.
00:23:53
and the idea is actually very simple: you are trying to train one shared encoder
00:23:59
with a latent space, and from this latent space to solve several tasks at the same time.
00:24:03
for all the data without labels, you ask it to reconstruct the data,
00:24:07
working like an autoencoder, as you have seen before,
00:24:10
and for the data that comes with the labels, it tries to produce a match with
00:24:15
the label that is given; so for example, for the data given with a label, it should produce
00:24:20
that one label, whatever it is. and because the normalized softmax output can be anything
00:24:25
in form, it should be a one-hot encoding: the winning class should be the one on top.
00:24:30
and quite surprisingly, you will see that this simple trick gives a super boost, especially when you have
00:24:35
a very small amount of labels. so you can imagine that for privacy this can be a very interesting setup;
00:24:41
we did not especially target privacy here, but it demonstrates the possibilities of the system.
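The "several tasks from one latent space" idea boils down to summing two losses: reconstruction over all samples plus classification over the few labeled ones. A minimal sketch, with made-up toy data and a weighting factor lam that is my own illustrative choice:

```python
import numpy as np

def mse(a, b):
    return np.mean((a - b) ** 2)

def cross_entropy(p, one_hot):
    # p: softmax outputs; one_hot: target labels
    return -np.mean(np.sum(one_hot * np.log(p + 1e-12), axis=1))

def semi_supervised_loss(x_hat, x, p_labeled, y_one_hot, lam=1.0):
    # reconstruction term over ALL samples (autoencoder part)
    # + classification term over the few labeled ones
    return mse(x_hat, x) + lam * cross_entropy(p_labeled, y_one_hot)

rng = np.random.default_rng(4)
x = rng.standard_normal((32, 8))                  # 32 samples, mostly unlabeled
x_hat = x + 0.1 * rng.standard_normal(x.shape)    # pretend reconstructions
p = np.full((4, 3), 1 / 3)                        # softmax outputs, 4 labeled samples
y = np.eye(3)[[0, 1, 2, 0]]                       # their one-hot labels
print(semi_supervised_loss(x_hat, x, p, y))
```

Gradient descent on this combined loss trains the shared encoder on both objectives at once.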
00:24:45
now i am coming to the important part: now it is clear
00:24:49
how the systems are trained; once they are trained, it is important
00:24:53
to address privacy and/or robustness to adversarial attacks, so i will talk about these aspects now. so
00:25:00
what about the privacy? privacy is considered in the
00:25:03
sense that we have our, let's say, trained system: we were
00:25:06
trying to train it, say, to do the classification of a
00:25:10
specific object, or the reconstruction, in the setup i just explained,
00:25:15
but in the whole pipeline we considered, classical machine learning never talked about what else leaks from the representation.
00:25:21
now let us talk about privacy: what can the attacker do? the attacker, observing this representation z,
00:25:28
might try to extract from it, looking for all the information about
00:25:32
the other sensitive attributes; again, classical machine learning did not address
00:25:36
this as such. or the attacker can, in exactly the same way, try to reconstruct the data,
00:25:43
so that is then the data reconstruction attack. so actually all these
00:25:47
aspects of privacy are arising here; again, classical machine learning did not do
00:25:52
anything about them. but this is the passive, if i may say, inferential attacker.
00:25:56
but it may be even worse: there is the active adversarial attacker.
00:26:00
then, once you have any model from the list i gave you before, all these fine models, you have a trained
00:26:06
system, for example, for the classification task: so you get the image, you pass it through
00:26:10
the encoder, you pass it through the decoder, and you have a very correctly recognized object or person. the attacker may say:
00:26:16
okay, i want to fool it; i want to create an adversarial sample that is perceptually very,
00:26:22
very close to my original data, so a human cannot see that anything has been tampered with,
00:26:27
but i will modify something such that, basically, when this
00:26:31
image comes to the input of my system, it gives a completely different result:
00:26:36
i mean, it can give a different identity, or maybe point to
00:26:38
whatever forbidden output with high probability, basically whatever the attacker decides, et cetera, et cetera. and
00:26:43
our system was not trained initially for that, because we
00:26:46
did not try to address such adversarial perturbations, and as a result
00:26:50
our systems are very, very vulnerable to this. and as it has been
00:26:53
shown, even if you have an image recognition system embedded into some
00:26:59
text generation, so that what it recognizes, the text generation system describes, you can modify the image and the system
00:27:04
can generate a completely different description of the image: when you show it this
00:27:07
image with even a little noise or a little modification added, there is a misinterpretation. so
00:27:11
you can imagine all the consequences and dangers of such kinds of attacks, okay.
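The adversarial-example idea can be sketched in a few lines with the classic fast gradient sign method (FGSM) on a toy linear classifier; FGSM is my illustrative choice here, the talk does not name a specific attack:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(5)
w = rng.standard_normal(16)            # weights of a toy linear classifier
x = rng.standard_normal(16)
x = x if x @ w > 0 else -x             # ensure x is classified as class 1

# FGSM: the score x.w grows along w, so stepping each "pixel" by eps
# in the direction -sign(w) maximally lowers the score under an
# L-infinity budget, while the image itself barely changes
eps = 0.3
x_adv = x + eps * np.sign(-w)

print("clean score:", sigmoid(x @ w))
print("adversarial score:", sigmoid(x_adv @ w))
print("max pixel change:", np.max(np.abs(x_adv - x)))
```

The same gradient-sign idea, applied through a deep network via backpropagation, is what produces the imperceptible perturbations described above.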
00:27:16
so, therefore, to summarize in conclusions: classical machine learning, as it was created,
00:27:22
actually did not consider these two aspects, privacy and, uh, the aspect of robustness,
00:27:26
initially; so that is why we want to bring these aspects into focus now.
00:27:32
since we are at the core of this topic, let me take the argument for privacy first.
00:27:37
i think our group was, uh, one of the first
00:27:40
who started to think about this from the information bottleneck perspective,
00:27:45
and we tried to integrate this attribute s into this thinking, using the same
00:27:50
mathematical apparatus, information-theoretically. so the question is how to design this optimal
00:27:56
transformation, this encoder, that would keep all the information about the c, about
00:28:01
the classification, about the identity, but try to remove, to filter out, all the information about
00:28:08
the other sensitive attributes. if the representations of these attributes were nicely
00:28:12
separable, you could do it in a handcrafted way, but if you do not
00:28:16
know how the two attributes, c and s, are statistically dependent, it is
00:28:21
not an obvious task at all. so that is why we call it an open
00:28:24
optimization problem, and once it is stated, we solve it by optimization.
00:28:29
what was done in several papers on this
00:28:32
was an attempt to solve the trade-off problem between
00:28:36
what you want: you want to preserve classification accuracy,
00:28:39
you want to get, um, low complexity of the compressed representation, and at the same time
00:28:44
minimise the leakage about the sensitive attributes. I think it's clear. And here I would like to point to two
00:28:50
concurrent works: the first work gave a theoretical justification and a complete optimisation,
00:28:56
and the second one was a more ad hoc attempt to demonstrate how it can be done without
00:29:01
directly applying this, using a different, um,
00:29:04
approximation: not the variational approximation of mutual
00:29:08
information, but using a contrastive learning approach. So what about the first one, the CLUB?
00:29:14
The CLUB is the trade-off between the complexity, leakage and utility bottlenecks; that is the vision. So basically what
00:29:20
was done in this work, if you recognise it (again, I slightly modified it with respect to the original paper),
00:29:26
is exactly these terms: this one, the, uh, term trying
00:29:29
to minimise the link between X and Z, where you can see we are trying to compress the data,
00:29:33
yet we are trying to retain in the representation
00:29:37
all the information about the classification attribute. So that is what we
00:29:40
want to reach. At the same time, and this is a loss term with a minus sign, we are trying to minimise
00:29:47
any mutual information, any link, between the representation and S, the, uh, sensitive
00:29:52
attribute. Right? So we are compressing the data, keeping the information in the representation about the class,
00:29:57
but removing the sensitive information. Of course it will work very nicely
00:30:02
and very cleanly if between S and C there is no statistical dependence,
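To make these competing terms concrete, here is a minimal toy sketch (mine, not code from the paper): on discrete samples it estimates the utility term I(Z;C) and the leakage term I(Z;S) for two candidate representations, one that copies both attributes and one that keeps only the class. The variable names and the toy data are my own assumptions.

```python
from collections import Counter
from math import log2

def mutual_information(pairs):
    """I(A;B) in bits, estimated from a list of (a, b) samples."""
    n = len(pairs)
    p_ab = Counter(pairs)
    p_a = Counter(a for a, _ in pairs)
    p_b = Counter(b for _, b in pairs)
    return sum(
        (c / n) * log2((c / n) / ((p_a[a] / n) * (p_b[b] / n)))
        for (a, b), c in p_ab.items()
    )

# Toy data: each sample has a class c (utility) and an independent
# sensitive attribute s; 10 samples per (c, s) combination.
data = [(c, s) for c in range(4) for s in range(2) for _ in range(10)]
z1 = [(c, s) for (c, s) in data]       # leaky representation: copies everything
z2 = [(c, None) for (c, s) in data]    # class-only representation

utility_z1 = mutual_information([(z, c) for z, (c, s) in zip(z1, data)])
leak_z1    = mutual_information([(z, s) for z, (c, s) in zip(z1, data)])
utility_z2 = mutual_information([(z, c) for z, (c, s) in zip(z2, data)])
leak_z2    = mutual_information([(z, s) for z, (c, s) in zip(z2, data)])
```

Both representations carry the full 2 bits about the class, but only the second one drives the leakage term to zero, which is exactly the regime the bottleneck objective is trying to reach by optimisation rather than by hand.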
00:30:06
and the paper gives a lot of different diagrams explaining all
00:30:09
these combinations. So I really, uh, advise, if somebody is interested in the more detailed
00:30:13
story, to read this paper and to see all the special regimes, yeah, in the spectrum.
00:30:17
So the authors were playing with it in different settings.
00:30:22
For example, a very nice one: you can take coloured MNIST, with the digits zero, one, two,
00:30:27
three up to nine as the class, and colour as the sensitive attribute, just as an example.
00:30:32
So it was possible to show the key idea: you can try
00:30:35
to compress and, uh, in a general way
00:30:40
even prevent reconstruction. But there are smarter things: with the right strategy,
00:30:44
you can see that you can keep, you can preserve, in the reconstruction all the digits
00:30:48
as they were in the original, but you can completely eliminate the colour information.
00:30:53
Right? So all of them lose their colour, the obvious sensitive attribute is removed, and I
00:30:56
cannot reconstruct it. This setting can be generalised by playing with the parameters either way,
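As a trivial illustration of the "handcrafted way" mentioned earlier: when the sensitive attribute (here, colour) is known and cleanly separable, it can be stripped by hand, for example by discarding hue while keeping intensity; the learned bottleneck matters precisely when the dependence between C and S is not this obvious. The pixel encoding below is a made-up toy, not the coloured-MNIST pipeline from the paper.

```python
def strip_colour(image):
    """Handcrafted obfuscation: keep per-pixel intensity (the digit
    shape survives) and discard hue (the sensitive attribute)."""
    return [(r + g + b) / 3 for (r, g, b) in image]

# Two "digits" with the same shape but different colours.
red_seven  = [(200, 0, 0), (150, 0, 0), (0, 0, 0)]
blue_seven = [(0, 0, 200), (0, 0, 150), (0, 0, 0)]

# After obfuscation the colour attribute is gone, while the
# intensity pattern (usable for digit classification) remains.
assert strip_colour(red_seven) == strip_colour(blue_seven)
```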
00:31:01
and it was also demonstrated that you can even prevent reconstruction from the latent space.
00:31:07
Right, but again this should be traded off: okay, what is the loss in terms of
00:31:11
the classification? Right, but this is also an option. So this framework is very general; it is a very
00:31:15
nice toolbox, and, uh, a framework to play with. So maybe one more remark: at the
00:31:22
same time, this is a framework that assumes that you can train the encoder and
00:31:27
decoder and somehow compress your representations. It is very theoretical, but maybe in practice
00:31:31
you might come to foundation models, any of the foundation models. So for example, suppose that
00:31:37
you have something pre-trained on a very large corpus of data,
00:31:41
maybe pre-trained by, uh, by Facebook for you, for example,
00:31:44
you know, version one or two, or an MSN network, or whatever, trained in the contrastive way. So you have it,
00:31:51
and it gives you these representations here, right? But still, we have seen that this does not mean it is
00:31:58
privacy-preserving in itself. So basically, yeah, the idea is: imagine here we are
00:32:04
inserting into the network a very small projection adapter, just several layers,
00:32:09
and you are saying, okay, let's train this small network in such a way, just a
00:32:12
post hoc adapter training, right, in the spirit of this paper,
00:32:16
that it would preserve here all the utility information and try
00:32:20
to filter out all the information whose presence causes the privacy
00:32:23
issues. Okay, so that's the difference, and it was demonstrated that
00:32:27
this approach is also feasible in terms of scalability.
00:32:31
Here you see the original data, and this is the reconstruction from these representations. So you see, uh,
00:32:37
it's not the case that if you already have some SSL-trained network it will be
00:32:42
privacy-secure, as some people claim; you still can reconstruct the image almost perfectly.
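A minimal sketch of the post hoc adapter idea: train a small head on top of a frozen, pretrained encoder. Everything here (the stand-in encoder, the toy task, plain logistic-loss SGD) is my assumption, and the anti-leakage loss term from the talk is omitted for brevity; the point shown is only that the backbone stays untouched while the adapter learns the utility task.

```python
import random
from math import tanh

random.seed(0)

def frozen_encoder(x):
    # Stand-in for a frozen pretrained backbone (e.g. an SSL network):
    # its parameters are never updated during adapter training.
    return [x, x * x, 1.0]

def sigmoid(t):
    return 0.5 * (1.0 + tanh(t / 2.0))  # numerically stable logistic

# Toy labelled data for the utility task: predict the sign of x.
data = [(x, 1 if x > 0 else 0)
        for x in (random.uniform(-3, 3) for _ in range(200))]

# Small trainable adapter: one linear layer on top of the frozen
# features, trained post hoc; only `w` is ever updated.
w = [0.0, 0.0, 0.0]
for _ in range(100):
    for x, y in data:
        z = frozen_encoder(x)
        p = sigmoid(sum(wi * zi for wi, zi in zip(w, z)))
        g = p - y  # gradient of the logistic loss w.r.t. the logit
        w = [wi - 0.1 * g * zi for wi, zi in zip(w, z)]

acc = sum(
    (sigmoid(sum(wi * zi for wi, zi in zip(w, frozen_encoder(x)))) > 0.5) == (y == 1)
    for x, y in data
) / len(data)
```

In the scheme from the talk, a second, adversarial term would be added to the same loop to push information about the sensitive attribute out of the adapter's output.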
00:32:48
Some people are saying, well, if we talk about potential privacy leaks, we
00:32:51
can maybe simply add noise to the representation, just as a way of
00:32:55
obfuscation. But you see, still you can reconstruct a little bit, maybe
00:32:59
not so nicely, but maybe if you push more training you can do better.
00:33:03
And you see the metrics; I didn't want to go into detail, so
00:33:06
we can judge visually about that. So this slide shows the accuracy preservation
00:33:11
without any obfuscation, then the obfuscation by addition
00:33:15
of noise, and then, if you train the way
00:33:18
I presented, with this adapter here that tries to preserve the classification, it is almost the same, almost no loss.
00:33:25
But in terms of reconstruction, what you get
00:33:29
to reconstruct, you see, is essentially,
00:33:32
let's say, an average image: an
00:33:35
average female or an average male person, and
00:33:39
so the reconstructions may be somewhat correlated with attributes, but basically you can see that we cannot reconstruct the identity.
00:33:44
So basically, by imposing different constraints you can play with this, and you can see the different interplays between them.
00:33:50
Okay, so now, as you see, yes, it's possible to do it,
00:33:54
but, uh, we need to take care of the essence of
00:33:57
the attributes, the privacy, uh, and the
00:34:00
utility, um, of what we would like the system to be,
00:34:04
and we need to train the system from the beginning to the end to ensure that. So it's doable,
00:34:09
it's super cool, but what about the scalability and more practical aspects of application? That is what
00:34:15
I will cover in the second part of my talk. So here I would like to raise one
00:34:21
concern about the settings of classical machine learning problem
00:34:24
formulations: when we are talking about the defender in machine learning, normally
00:34:29
you never talk about a key. Even in adversarial learning we never talk about it. Have you ever heard about a key
00:34:36
as, typically, the first starting point when you talk about
00:34:39
a particular application? No. Because in this case it is always assumed that
00:34:44
the defender, the person who designed the system, and the attackers
00:34:47
are playing basically in the same conditions. Maybe someone has
00:34:51
more training data and somebody has less; maybe there are people who have more compute, and so on.
00:34:57
Right? But basically this does not satisfy the classical definition
00:35:02
of security if we adapt it to the Kerckhoffs principle, right?
00:35:05
So the main idea: if you want to address it from this
00:35:08
principle, then you need to ensure that you, as the defender, have an
00:35:11
information advantage over the attacker. And how to create it? That is
00:35:16
the approach that I call privacy preservation based on, let's say, secrecy.
00:35:21
So let's start, uh, let's start with the, uh, setup that was covered
00:35:24
before. So we train the system in the classical machine learning way: you try to
00:35:29
get some latent representation, and from the latent representation you try to classify
00:35:33
or you try to reconstruct. But what if we inject here
00:35:39
one module that will have, let's say, a symmetric sharing of
00:35:43
the key with the obfuscation module, so that it will produce from the representation
00:35:48
a new output V that will be in the public domain? And since
00:35:52
the, uh, attacker has access only to this V,
00:35:55
the attacker will not be capable of deducing the sensitive attribute,
00:35:59
likewise will not be capable of classifying, and likewise
00:36:02
not capable of reconstructing, because the attacker does not know this key,
00:36:07
and the entropy of the key, meaning the number of different combinations, is relatively large.
00:36:12
And you may say, okay, but the attacker knows the space and just doesn't know the particular key we used;
00:36:16
still, like in correct cryptography, it will be very difficult to attack. So that's why only the
00:36:21
parties who have access to this key will be capable of doing all this. The point is that it
00:36:26
can be done post hoc, or otherwise it can also be done jointly, where we train all of this together.
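One way to picture such a keyed obfuscation module is a key-dependent, invertible scrambling of the representation. The permutation-plus-sign construction below is my own toy stand-in for whatever the actual module would learn, not the scheme from the paper: the key holder inverts the transform exactly, while a party without the key sees only a scrambled vector.

```python
import random

def keyed_obfuscate(z, key):
    """Obfuscate a representation with a secret key: apply a
    key-dependent permutation and sign pattern (toy sketch)."""
    rng = random.Random(key)
    perm = list(range(len(z)))
    rng.shuffle(perm)
    signs = [rng.choice((-1, 1)) for _ in z]
    return [signs[i] * z[perm[i]] for i in range(len(z))]

def keyed_deobfuscate(v, key):
    """Authorised side: regenerate the same permutation and signs
    from the key and invert the transform exactly."""
    rng = random.Random(key)
    perm = list(range(len(v)))
    rng.shuffle(perm)
    signs = [rng.choice((-1, 1)) for _ in v]
    z = [0.0] * len(v)
    for i in range(len(v)):
        z[perm[i]] = signs[i] * v[i]
    return z

z = [0.3, -1.2, 0.7, 2.5, -0.4, 1.1]   # latent representation
v = keyed_obfuscate(z, key=1234)        # public output V
assert keyed_deobfuscate(v, key=1234) == z   # key holder recovers Z exactly
v_wrong = keyed_deobfuscate(v, key=9999)     # a wrong key guess almost
                                             # surely yields a scrambled vector
```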
00:36:32
And this approach was covered in yet another paper, I think in the journal version of this paper.
00:36:38
And it was demonstrated that this approach can be also
00:36:40
useful not only for privacy but also for robustness against attacks, because
00:36:45
without knowing the key the attacker cannot propagate
00:36:48
the distortions in any useful way.
00:36:52
Okay, so cryptography here does play a role for machine learning. It is still just the beginning, but it should
00:36:58
be considered as a resource, and I have already seen recently some publications from other groups,
00:37:04
and I am very glad that people are moving in this direction,
00:37:07
that they are also introducing the keys. So now, in the
00:37:11
main part, I would like to cover several applications; okay, I don't have time to talk about all of them, maybe the most remarkable ones.
00:37:18
And before I talk about biometrics, let me say again, there is probably no need to
00:37:23
give the full overview about that; in many aspects you know it better than me,
00:37:27
but just to emphasise, for the people who
00:37:31
are not familiar with it, that it consists of two main stages:
00:37:34
first you extract some features, some vectors, and then we try to do something with these features, be it
00:37:40
a template extraction. To be sure, there are many, many modern self-supervised learning methods, as we have seen recently, and
00:37:45
metric learning methods, for those who are in the field a little longer: the well-known face recognition losses, et cetera.
00:37:51
And more recently there are self-supervised methods alone. What I will
00:37:56
tell you is basically based on the so-called cancelable biometrics,
00:38:00
and cancelable biometrics means that we are not playing with the biometrics itself: we are playing with it together with a randomised,
00:38:06
let's say secret, key, okay, and a hash-like construction, like the fuzzy commitment, or the helper-data
00:38:10
schemes, et cetera. Okay, biometrics is very well established, and there are requirements for, uh, protected biometric templates:
00:38:17
in two words, protected biometrics should ensure that if something
00:38:20
is compromised, the template should be easy to replace (renewability of the representations);
00:38:24
one should not be capable of reconstructing from this
00:38:27
template the original data back, exactly what I demonstrated before; and
00:38:31
in biometrics we also need to be sure that it's not easy to link the data
00:38:36
across different systems and users; and also, uh, basically we should at the same time satisfy the same performance
00:38:40
requirements as before. So in a bit more formal formulation, we have two stages. The enrolment is done once:
00:38:47
we take the biometric data of identity i, and maybe some key, and bind it into the template,
00:38:53
and it is stored in the database; but the attacker, as said before, has access to the database.
00:38:58
Here we add the randomness through the parameters, for example by modifying some features,
00:39:04
I have seen several papers, uh, on this matter, or you simply concatenate the key
00:39:09
to the data. At the verification stage, when you have Y as a probe,
00:39:13
you assume that you have the same key; here you have your
00:39:16
representation, and then you can do identification, meaning, say, who is this person,
00:39:20
or, if this person is claiming a certain identity, the system can confirm, yes,
00:39:23
it is this person, or reject the claim; that is binary verification. But at
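The enrolment/verification flow just described can be sketched with a deliberately simplified cancelable template: a keyed transform (here just XOR) of binarised features, with threshold matching at verification. This toy only shows the interface (enrol with a key, verify with the same key, re-issue with a new key); it is not a secure construction, and all values below are made up.

```python
def protect(features, key):
    """Bind a secret key into a binary feature vector
    (toy cancelable-template transform: bitwise XOR)."""
    return [f ^ k for f, k in zip(features, key)]

def verify(probe, template, key, threshold=2):
    """Accept if the probe, transformed with the same key,
    is within `threshold` bit flips of the stored template."""
    trial = protect(probe, key)
    dist = sum(a != b for a, b in zip(trial, template))
    return dist <= threshold

enrolled = [1, 0, 1, 1, 0, 0, 1, 0]     # binarised features at enrolment
key      = [0, 1, 1, 0, 1, 0, 0, 1]     # user's secret key
template = protect(enrolled, key)        # what the database stores

noisy_probe = [1, 0, 1, 0, 0, 0, 1, 0]  # same person, 1 bit of sensor noise
impostor    = [0, 1, 0, 0, 1, 1, 0, 1]  # a different person

assert verify(noisy_probe, template, key)       # genuine user accepted
assert not verify(impostor, template, key)      # impostor rejected
new_template = protect(enrolled, [1] * 8)       # renewability: re-issue
assert new_template != template                 # with a fresh key
```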
00:39:29
the same time we should not forget what an attacker can do: the attacker can
00:39:32
come and say, oh, from this public version I might try to extract the
00:39:36
sensitive attributes, I might try to run the reconstruction attack, or I might perform the
00:39:41
linkability analysis and link some users. So all these aspects should be carefully taken
00:39:45
into account. Okay, so besides that, we should not forget about the other side: an
00:39:50
attacker, or maybe, uh, people who will come and practically impersonate a person, yes, and
00:39:56
physically present another one. So these details should also be well distinguished and taken into
00:40:01
account when we design both the template extraction and the template protection schemes. Uh, one thing
00:40:07
that we encountered in many applications when we started to work in this domain
00:40:11
is about medical records. We are working with hospital people; that work is
00:40:15
a continuation of a collaboration established with a hospital in Geneva,
00:40:22
and it also concerns the storage of very large collections, where
00:40:28
people want to collect but do not want to share, because the images are very huge.
00:40:32
So the representation here is based on a compressed,
00:40:35
privacy-preserving representation. So imagine you have an encoder to compress the
00:40:39
data and a decoder; you compress the data, but instead of whatever
00:40:43
the encoder gives you here as output, which is a continuous representation,
00:40:48
we apply here a quantisation in a very special way, so that the output is sparsified
00:40:53
and ternarised: where you had continuous values, here you have
00:40:56
zeros or, let's say, plus and minus ones at each of these points. Then the
00:41:00
obfuscation is applied here based on the key: you add noise components,
00:41:05
plus or minus ones, in the secret locations, so that the attacker, if he
00:41:09
or she does not know where this, uh, obfuscation was added, cannot
00:41:12
reconstruct it; whereas if you have the knowledge of this key, you can eliminate
00:41:16
this noise, this obfuscation, and you can reconstruct. So, in short,
00:41:20
it basically again gives an advantage: if you are an authorised user, you can communicate,
00:41:25
you can compress and decompress; if you are not authorised, you are the attacker, and you
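A toy version of this keyed ternary scheme might look as follows; the threshold, the number of noised positions, and the additive noise model are my assumptions, not the exact design from the talk. The key deterministically selects the secret locations, so the authorised side can regenerate and subtract the noise exactly.

```python
import random

def ternarize(x, thr=0.5):
    """Sparsifying ternary quantiser: continuous values -> {-1, 0, +1}."""
    return [0 if abs(v) < thr else (1 if v > 0 else -1) for v in x]

def _key_noise(key, n, n_noised):
    """Key-dependent secret locations and +/-1 noise values."""
    rng = random.Random(key)
    pos = rng.sample(range(n), n_noised)
    return {p: rng.choice((-1, 1)) for p in pos}

def obfuscate(t, key, n_noised=3):
    """Add +/-1 noise at the key-selected secret positions."""
    noise = _key_noise(key, len(t), n_noised)
    return [v + noise.get(i, 0) for i, v in enumerate(t)]

def deobfuscate(v, key, n_noised=3):
    """Authorised side: regenerate the same noise from the key and remove it."""
    noise = _key_noise(key, len(v), n_noised)
    return [u - noise.get(i, 0) for i, u in enumerate(v)]

z = [0.9, -0.1, -1.4, 0.2, 0.8, -0.7, 0.05, 1.2]  # continuous encoder output
t = ternarize(z)                                   # sparse ternary representation
v = obfuscate(t, key=42)                           # what actually gets stored
assert deobfuscate(v, key=42) == t                 # key holder recovers exactly
assert v != t                                      # stored form is perturbed
```

A real scheme would keep the obfuscated values inside the ternary alphabet; here they are allowed to leave it just to keep the sketch short.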
00:41:28
cannot reconstruct. And the last but not least application: so far we talked only about humans,
00:41:33
and we understand that protection is important for humans, with biometrics, but
00:41:37
it is also important for physical objects; physical objects represent different aspects
00:41:41
of our life, yes. I am not talking only about luxury watches and branded goods; it is,
00:41:45
uh, also a worry for many other products, and it is also important for
00:41:50
electronics and, uh, other sectors, et cetera.
00:41:53
And basically, uh, right now there is a trend in the world to apply the
00:41:57
same privacy and security principles to the protection of physical objects. So how is it done?
00:42:02
I will say in one sentence why the classical methods will not work directly. Basically,
00:42:06
humans possess unique, uh, biometrics: fingerprint, iris, et cetera;
00:42:11
physical objects do possess the same kind of biometrics, but they
00:42:15
have to be observed at their microstructure, at the micro scale. Okay,
00:42:18
and then, if you have this microstructure representation, you can extract exactly the
00:42:22
same kind of template, in principle, as in biometrics, and then you can start.
00:42:26
So we are working also in this direction: we acquire the object somehow, we
00:42:30
extract the unique features, they are stored in the database, and then users
00:42:34
can send this information to verify; you have the whole record of from which place in
00:42:39
the world it was done, with time stamps, with the outcomes of authentication, et cetera.
00:42:42
And basically, to summarise, it is exactly the same scenario as for biometrics, but as the input
00:42:48
of the system we are using the microstructures of the physical objects, and that can be applied to any object as well. It
00:42:55
gives a lot of clear benefits: with a mobile phone you can check whether an item is authentic or fake, you can
00:43:00
track and trace, you can have blockchain records, and it can be,
00:43:03
uh, also used for direct marketing and analytics, et cetera.
00:43:08
And in conclusion, as you see, uh, it is interesting to have the interplay between these two aspects
00:43:13
and the different applications, and especially keep in mind, when we talk about the future:
00:43:19
the future is that many machine learning techniques will require similar techniques
00:43:23
as used in biometrics, meaning some keys, some crypto systems, not
00:43:27
only trying to do it within the optimisation frameworks, as it is
00:43:31
done mostly right now. And a lot of new challenges
00:43:34
will come; they have already arrived with, uh, AI-generated content: to distinguish where this content was generated.