7 months ago - EVE Online - Direct link

Transcript (by Youtube)


9s so as you see we very much share this
13s positive vision of Hilmar
15s hello FanFest thanks a lot for having us
18s here it's great to be back in Reykjavik
20s after this rather lengthy COVID break
24s it has been a long time
26s but we were not idling through these years
29s so much so that in 2020 we introduced
33s the third iteration of project Discovery
35s inside Eve and as you will see with
38s quite amazing results
41s today I'm here with two of my fellow
43s partners in crime with Jérôme Waldispühl
46s from McGill University
47s and Ryan Brinkman from Dotmatics and the
51s University of British Columbia
53s and these great brains will bring the
56s real scientific beef to our talk today
58s so I try to be rather short but I have
62s some really interesting news that I must
64s share with you
67s so
68s some of you might know me
70s um I'm the CEO and co-founder of a small
73s Swiss company that we named Massively
75s Multiplayer Online Science or MMOS we're
79s the ones who kind of seduced CCP on this
82s crazy adventure which became a very
85s successful and long-term collaboration
88s in the field of Citizen science
91s and the idea was to use this amazing
94s force of collective human cognitive
97s capabilities that you guys the eve
100s Community provides
102s to advance science
105s now I'm also an adjunct at McGill
107s University
108s and also I'm maybe an honorary EVE dev I
113s mean I have an EVE dev shirt which I
116s really appreciate but I think the
120s real utmost test will be whether I get a
123s Viking sword from Hilmar for the ten years
126s here so well that's to be seen
130s so Project Discovery it was really a
132s hell of a nine years actually this story
133s started in 2014 with a call to Andie
137s Nordgren CCP Seagull and that call started
141s a series of events that led up to what
143s we know today as Project Discovery
146s it was not obvious it was not without
149s risks or challenges to pull off such an
152s unprecedented collaboration
155s but really with the amazing help of the
159s teams at CCP and with your continuous
162s support which
165s started right from the beginning right
166s in 2015 when we introduced our first vague
171s ideas about this concept at Fanfest
174s all this bore fruit and in 2016 we
177s finally released Project Discovery the
180s first iteration was a collaboration with
181s the Human Protein Atlas a team led by
184s Emma Lundberg
185s and we asked players to classify to
189s improve the labeling on a massive data
192s set that they have of these beautiful
194s immunofluorescence microscopy images of
197s human cells
198s this is a data set that is used by tens
200s of thousands of researchers worldwide
204s now this project already broke records
207s so I
208s think you might remember that the
210s ultimate reward for Project Discovery
213s was the Sisters of EVE combat suit which
216s was supposed to be handed out
219s some three or four weeks after launch
221s the first one was unlocked in
224s 17 hours and the second one six minutes
225s later so yeah the EVE community proved
228s to be quite something
231s now in 2017 we switched to hunting for
234s exoplanets and we had the pleasure and
238s honor to have Nobel laureate
241s Professor Michel Mayor on the project
244s from the University of Geneva
246s he is the one who discovered the first
248s exoplanet
249s and I vividly remember that after his
253s talk
254s Professor Mayor's talk at Fanfest
257s Bergur CCP Burger came to me and said
260s that he out-nerded the EVE players which
263s sounded like the ultimate EVE compliment
267s and in 2020 we jumped on a new project
270s in a time of urgency and need
273s this was to help improve our scientific
276s tool set in Immunology that can help
278s covid research and so much more in the
281s future
282s now this was a milestone project
284s because we managed to show that
286s with everything in place
289s we can act as a rapid action force
292s and just to give you an idea
295s from the time that Bergur sent me an
298s email
299s asking you know is there anything that we
301s can help with to the time when we launched the
304s project it took only six weeks that's
307s quite remarkable
310s now I mentioned to you that you will
311s hear about what your work
314s collectively has amounted
317s to
318s soon
320s and believe me it's nothing short of
323s groundbreaking
326s so this is where we are now everybody is
328s happily clustering blood data in Eve
331s but before I pass on the mic I want to
333s talk a bit about the future
338s sorry I couldn't resist so this might be
340s the only time
342s one more thing so there is a thing that
345s came up consistently during all these
348s years
349s from you guys and from other
351s gamer communities as well
354s and it seems that in 2024 finally we'll
357s be able to respond to this request
363s in 2024 we're launching a new mobile
366s platform a new mobile citizen science
368s platform
369s uh the place science iOS and Android app
372s and its supporting API that will
375s substantially increase our capabilities
377s to serve the scientific Community with
380s this tool set that we've been building for
382s almost 10 years now
384s now for you primarily what it means is
387s that
388s when you're not
390s in front of EVE at your PCs and you're on
393s the go you can still play with your
395s favorite Eve minigame project discovery
398s and you can help analyze scientific data
401s you can collect points and with all
403s these points you can get back to EVE
405s and reclaim your rewards
407s sort of like a project Discovery
410s companion app if you will
413s but it's more than that
415s we will have not only Project
417s Discovery on this platform but it will
419s act as a host for different citizen
422s science minigames some of these will be
425s co-created with our partners in the game
427s industry some of them will be just plain
429s old citizen science apps
432s these games will
435s vastly improve our capabilities to
438s open up this platform for other research
441s initiatives
444s and what's important for you is that
447s regardless of which project you
450s contribute to
451s you get the same achievements same
453s points and the same sweet rewards inside
456s Eve
459s now with all this we really hope that
462s this will become sort of a platform for
465s like-minded players players for whom
469s doing good by playing is not an oxymoron
473s and the reason why I believe in this is
475s that well partly because
479s these last years I think I talked to
481s over a hundred game developers
485s none of them discarded this idea on the
488s contrary everybody was super inspired
491s the reality is that
495s the fact that we pulled this off Project
497s Discovery with CCP or Borderlands Science
499s with Gearbox
501s is almost a miracle
503s game developers are super swamped
506s and I guess you know better than me
510s these endless
512s things you want to see in the game to be
516s fixed to be introduced to be changed an
520s endless wish list that you could curate
522s for EVE devs so
525s with all the dedication that CCP has
528s for this project there are only 24 hours
531s a day and seven days a week so
534s with this step we believe that we can
536s relieve a lot of pressure on the game
539s devs in-house
541s and again we can make it much more
544s accessible for other games and other
547s gamer communities to join forces with us
549s and be part of this story
553s and finally
555s we at MMOs will do our best to steer
559s this app in a good direction and to do
561s all the development and to serve as many
563s research projects as possible but one
566s thing we want to avoid is
569s us becoming a bottleneck in this
572s process
573s so what we'll do is we'll take this API
576s that we're developing and we'll open
579s this up for partnering research
581s institutions
582s and this is the grand vision that
585s in the future you might be playing with
588s some of the well-known citizen science
591s projects out there and you're still
593s contributing to the same achievements
595s and points and Eve rewards and this is
598s not just wishful thinking I've been
599s talking to many of the major citizen
602s science projects out there and everybody
603s is super excited about this possibility
609s so this is what we're planning and I
612s believe if we're doing a good job
614s we'll probably write citizen science
616s history again together so I hope you're
619s as excited as we are
624s there is one caveat that I have to
628s mention which is that
630s of course at this point nothing is
632s carved in stone there's a lot of things
633s to sort out so things might and can
637s change in the following months as we
639s progress with this project also look out
641s for the eve newsletter because we'll
644s send out a link to sign up for beta
646s testing in the coming month
651s and finally I cannot stress enough that
655s without your continuous support
658s enthusiasm and encouragement project
661s Discovery would not exist
664s so a huge thank you to all of you who
666s became citizen scientists through EVE
669s Online
670s thank you and please welcome Jérôme
672s Waldispühl
686s I'm a professor of computer science at
689s McGill University
691s I'm working in bioinformatics and human
693s computer interaction
695s part of my research is to blend together
697s computers and the human brain to solve the
701s most fundamental and pressing scientific
703s challenges
705s so you may want to ask why do we need a
708s human brain in the age of computers and
710s AI and I will try to partially answer your
712s question here
715s so in science we try to solve
717s fundamental problems like protein
719s folding or finding patterns in DNA
722s you never have access to the ground truth
725s you can only measure some different
727s parameters that potentially can estimate
730s the quality of the answer you're
731s providing
733s the problem here is that these
735s parameters are not directly comparable
738s so the best you can do is to compute a
741s set of solutions that are all optimizing
744s these parameters and that's what we call
746s the Pareto front
750s so if you look at this graph here you
754s can measure parameters like the size
756s or the density and what you try to
757s compute is this green line here the
759s problem is that the parameters we're
760s estimating are not perfect
763s so in truth what you want to compute is
765s the region near this Pareto front where
767s the solutions are near optimal
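The Pareto front and the "near optimal" region around it can be sketched in a few lines of Python. This is only an illustration, not code from the talk: it assumes two objectives where lower is better, and the names `dominates`, `pareto_front`, `near_front` and the tolerance `eps` are made up for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # two objective values; lower is better (assumption)


def dominates(a: Point, b: Point) -> bool:
    """True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def near_front(points: List[Point], front: List[Point], eps: float) -> List[Point]:
    """Keep points within eps of some front member on every objective."""
    return [
        p for p in points
        if any(p[0] <= f[0] + eps and p[1] <= f[1] + eps for f in front)
    ]


# Hypothetical candidate solutions scored on two imperfect parameters.
candidates = [(1.0, 5.0), (2.0, 3.0), (4.0, 1.0), (3.0, 4.0), (2.2, 3.1), (5.0, 5.0)]
front = pareto_front(candidates)
shortlist = near_front(candidates, front, eps=0.25)
```

The shortlist is the part a program can produce on its own. Picking the true solution out of it is the step the talk hands to people.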
770s and this arguably a computer can do
772s very well but this is where things start
774s to be complicated
775s because eventually in this set of
777s solutions we want to find
780s the true solutions and how do we do that
784s we do that by calling out to scientists
788s so scientists sit look at different
791s solutions and different parameters and
794s all agree together about where the
797s ground truth should be
800s and what it generates is something that
803s a computer cannot do it generates trust
807s the problem we have in science is we have
809s massive amounts of data ridiculous amounts
812s of data
813s so it's not possible for a small
815s research team to go through this data even
817s if we know it's fundamental it's
819s essential for good research
822s and this is where citizen science aims
824s to solve the problem
826s the principle of citizen science is to
828s gather a lot of people
831s that will look at the data and then we
833s rely on human pattern recognition
835s skills to identify the most promising
838s solutions and with the agreement generated
841s by the citizen scientists
843s we identify the ground truth
845s solutions that experts usually find
848s and then suddenly we can generate trust
851s again for massive amounts of data
855s one problem that is fascinating me that
858s is ubiquitous in science is clustering
861s in a more abstract form it's about
864s finding patterns within distributions of
868s data
870s and identifying these patterns is not
873s simple it cannot be embedded into a
875s single mathematical function
878s sometimes we want to identify something
880s that is dense sometimes something that has certain
881s rules of continuity there are many
883s different parameters and the only way to
885s really robustly identify the solution
888s is to do it by having many people humans
893s agreeing together on where the solution is
895s so we had been working on that in my lab
897s for a long time
900s on solving the clustering problem
903s and on creating robust solutions
904s involving crowdsourcing but we had one
907s major issue engagement
909s we never managed to have enough people
911s sitting here on the table looking at the
912s same data and agreeing together
915s so how do we solve that
917s this is where project Discovery enters
920s in 2016 when I heard for the first time
922s about project Discovery I was amazed for
925s the first time in my life I understood
927s that something can change science for
930s the first time in my life I understood
931s there was
933s thousands tens of thousands hundreds of
936s thousands of people that could be here
938s looking at the same data and solve
940s this problem and most importantly what
943s convinced me it was the key for that was
946s the enthusiasm generated by the
948s community in general
949s so I gave a call to Attila and said I'm a
951s fan your project can change science I
955s just want to hear more about it and
957s eventually we discussed and we became
959s friends and then
962s the pandemic enters
964s 2020 Attila calls me
967s and tells me well we have an issue yes I
971s know uh and I'm in touch with CCP and
975s CCP wants to help they want to do
977s something we have a fantastic tool
979s project discovery that that already
981s achieved uh scientific Milestones but
985s they want to help and they want to make
987s something fantastic do you have an idea
990s yeah I do have an idea I know a problem
993s that can make a difference in biomedical
994s research we know that it works and most
997s importantly
998s I know an expert in the field that can
1001s help us to make it real and this is
1004s where Ryan Brinkman enters
1016s hi I'm Ryan I'm a scientist I'm also a
1019s gamer a pretty hardcore gamer I beat Eve
1022s about three years ago by not playing
1025s it scares the crap out of me
1028s um because I'm a data scientist and I
1031s like my life
1033s um so I do flow cytometry bioinformatics
1035s and as part of that I go
1038s around the world I've given hundreds of
1039s talks in front of scientists and I go up
1041s here and tell them the cool stuff I've
1043s done and at the end of it they applaud
1046s um I feel good
1048s um and you and this is for you if you're
1049s listening in Soviet Russia
1053s I'm here as a presenter telling you
1056s about the cool stuff that you've done
1057s and I just want to applaud you
1062s for the last Thousand Days thousands of
1065s you have been playing Project Discovery
1067s you've analyzed millions of plots
1071s and the data that you're generating is
1073s going to change the world it's going to
1075s help us find cures for cancer it's going
1077s to help us find cures for all kinds of
1079s diseases and it couldn't have happened
1081s without the people in this audience who
1083s played project discovery
1087s thank you
1090s thank you so much and not just for me
1092s all the data that has been submitted is
1094s going to be released for any use
1096s whatsoever by other scientists around
1098s the world and I know this is coming and
1100s they are so very excited
1102s it's incredible
1105s um
1106s the reason why this is so important is
1109s because the data you're analyzing is
1111s going to help us understand the immune
1113s system so right now as you're sitting
1115s there
1116s your bone marrow is pumping out stem
1118s cells these white blood cells
1120s differentiate into many many different
1122s types of cells
1124s we can understand the functions of these
1126s cells by looking at the proteins on the
1128s cell surface
1130s and these proteins on the cell
1132s surface tell us what these cells are
1134s doing right now in your blood there's
1136s some cells in your blood called t cells
1139s and these are the
1140s cells in your blood that
1142s if you've gotten COVID before
1145s um help mount your immune response
1148s to the disease there's other cells in
1150s your body that can be re-engineered to
1152s cure cancer
1154s and in 2018
1156s there was a Nobel Prize given out
1158s for the discovery of CAR T therapies
1160s there are now six FDA-approved drugs that
1163s can cure cancer we're not going to find
1166s a single cure for cancer but CAR T
1168s therapy is probably the brightest thing
1170s that's ever happened in in the history
1171s of understanding this disease there's
1173s other cells in your body that can detect
1176s bacteria that have come into your into
1178s your system and Target them so other
1180s cells called natural killer cells can
1182s find them and attack them and get rid of
1184s them if we didn't have a functioning
1186s immune system we'd all either be living
1187s in bubbles or be dead
1189s and flow cytometry is a technology that
1192s helps us understand this
1194s so how this technology works is we come
1196s up to somebody sneak up on them stab
1197s them
1198s take their blood and then what we do is
1201s we label proteins on the cell surface
1203s with antibodies that are conjugated to
1206s fluorochromes such that when these cells
1209s pass one by one past a laser in a flow
1211s cytometer they glow
1212s and the amount of light that these cells
1214s give off is proportional to the amount
1217s of protein they have on that cell
1218s surface that we've labeled them with and
1220s we label them with all kinds of
1221s different fluorescently conjugated
1223s antibodies so we can see what's there
1226s so if you played project Discovery
1227s you've seen plots that are shown here
1229s like on the right hand side of the
1231s screen so the cells on the very right
1233s have 10 100 a thousand times more of
1235s that protein on the cell surface
1237s and because of that we can infer the
1239s function of those cells while the cells
1241s on the left have ten a hundred a thousand
1243s times less and then by drawing the boxes
1246s around these the gates that the scientists
1248s are drawing right now all over the world
1249s and enumerating how many cells sit in that box
1252s we can infer you know how much
1255s that percentage changes from sick versus
1256s healthy treated versus untreated and
1259s this is how we understand how the immune
1260s system works flow cytometry is the only
1262s technology that allows us to do this in
1264s the proper way
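The gating step described here, drawing a box on log-scaled axes and counting what falls inside, can be sketched roughly as follows. This is an illustration under stated assumptions, not the software used in the project: it uses a rectangular gate on two markers, a plain log10 scale, strictly positive intensities, and made-up sample values. The names `in_gate` and `percent_in_gate` are hypothetical.

```python
import math
from typing import List, Tuple

Cell = Tuple[float, float]  # fluorescence intensity for two markers (must be > 0)


def in_gate(cell: Cell, lo: Tuple[float, float], hi: Tuple[float, float]) -> bool:
    """Check whether a cell falls inside a rectangular gate given in log10 units."""
    x, y = math.log10(cell[0]), math.log10(cell[1])
    return lo[0] <= x <= hi[0] and lo[1] <= y <= hi[1]


def percent_in_gate(cells: List[Cell], lo: Tuple[float, float],
                    hi: Tuple[float, float]) -> float:
    """Percentage of cells inside the gate; 0.0 for an empty sample."""
    if not cells:
        return 0.0
    inside = sum(1 for c in cells if in_gate(c, lo, hi))
    return 100.0 * inside / len(cells)


# Made-up intensities, purely for illustration.
healthy = [(10.0, 10.0), (20.0, 15.0), (1500.0, 2000.0), (12.0, 900.0)]
treated = [(1200.0, 5000.0), (3000.0, 3000.0), (15.0, 20.0), (10.0, 10.0)]

# Gate covering cells with both markers between 10^3 and 10^4.
gate_lo, gate_hi = (3.0, 3.0), (4.0, 4.0)
healthy_pct = percent_in_gate(healthy, gate_lo, gate_hi)
treated_pct = percent_in_gate(treated, gate_lo, gate_hi)
```

Comparing `healthy_pct` with `treated_pct` is the "how much that percentage changes" readout. Real analyses use free-form polygons and many more than two markers.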
1266s it sucks
1267s it sucks for lots of reasons the
1269s analysis part sucks
1271s um it's very complex we're looking at 40
1273s dimensional data on a two-dimensional
1275s screen and the scientists have to try to
1277s figure out the right Pathways to go
1279s through that data and find all those
1280s cell populations it's time consuming to
1283s analyze one sample from a patient now
1286s there's many plots they have to look at
1288s to make that analysis not just a
1290s single one it could take 30 minutes to
1292s two and a half hours that a scientist is
1294s looking at that plot to understand for
1296s example if that patient has minimal
1298s residual disease and is going to die of
1300s cancer we're making diagnoses life and
1302s death decisions based on flow cytometry
1305s data analysis and it's so important
1308s it's not cheap to pay these people
1310s doctors and scientists not me get a
1313s lot of money to do their work or you
1315s can Outsource it to another country and
1316s wait three months to get the answer back
1318s and that sucks and between scientists
1321s it's a challenging thing if you
1323s play the game you know where exactly
1325s do we want to put those gates and
1326s even professional scientists people who
1329s do this for a living have trouble and
1331s they can get up to 32 percent
1332s difference between two scientists
1335s and there's just so much data out there
1337s that there's no way they can possibly
1338s look through it all
1340s so obviously if you've been awake for
1342s the last three years AI machine learning
1345s is the way to go to solve this problem
1346s so all we have to do to do that is go
1348s out and get the data that the scientists
1350s have analyzed and then use that to train
1352s an algorithm because you need data to
1354s train machine learning it's in the name
1356s right machine learning well the problem
1358s is scientists don't share their data of
1362s how they've analyzed that and there's
1363s lots of reasons for that and we won't go
1365s into it in today's talk but there's
1367s no way we can get the amount of data you
1369s need to train machine learning
1370s algorithms to make this possible
1373s that's where project Discovery is CH oh
1376s wait I didn't have this but it's a good
1377s one this is where project discovery
1378s changed the game
1380s yeah
1381s yeah
1384s um
1385s but as smart as all of you are and
1387s there's some really really smart people
1388s that play Eve online I get that but
1391s there's no way that we could teach all
1392s of you the biology that the scientists
1395s have in their head for how to trace
1397s through these patterns of data and the
1399s reasons why they're getting certain
1401s populations
1402s there's just too much
1405s information in there but um it's
1408s kind of like Shakespeare and not that
1410s you guys are monkeys but with enough of
1412s you we can write Shakespeare
1416s and the hypothesis that we had
1419s at the beginning of this whole story
1421s when COVID came out is that if I
1424s give you guys plots like this you can
1427s probably figure out what the things are
1430s without actually understanding anything
1432s about the biology
1433s and you guys did it it's incredible and
1437s not only that I just have to thank
1438s the CCP people who are here who
1441s at the very very start of COVID in the
1442s space of a few weeks re-engineered the same
1445s kind of technology and the same kinds of
1447s plots that professionals who do this for
1449s a living
1451s do within Eve online and it looks just
1454s like this and the Hope was that if I
1456s give you guys some pictures that look
1457s like this and say can you draw circles
1460s around the important bits you would and
1462s you do again and again and again 24
1465s hours a day tens of thousands of you are
1467s doing this and you're doing a really
1469s fantastic freaking job and please don't
1471s stop
1472s at the lowest level in this plot is 6.5
1476s million plots per month
1479s it's incredible it's amazing
1482s it was so much data that we had to write
1485s new algorithms you guys are just going
1487s through it so fast we had to figure
1489s out better ways to give you data so we
1490s had to invent new algorithms and I'm
1493s sorry for those of you who were playing this
1494s right at the beginning of covid you
1495s probably saw a lot of the plots that
1496s looked the same because we didn't have a
1499s way to find heterogeneous plots so you
1502s would see things that were different so
1503s because of this we had to invent whole
1505s new ways to figure out how to show you
1507s data so you could give us data and so we
1509s just published a paper on this and
1510s hopefully if you've been playing the game
1511s recently you've started to see more
1514s patterns in the data and that's really
1516s important for us to be able to
1517s train machine learning algorithms
1519s because it's that variety that's going
1521s to allow us to do more and more with
1523s that
1525s so for those of you who play the game
1527s now is my chance to tell you
1529s some things that we've learned that
1530s maybe can up your game
1533s is what I'm showing here is the same
1536s plot that's been analyzed by five
1538s different people and yeah I have to
1539s apologize where you see those lines
1541s overlapping that's not really the way it
1543s is it's just when you have 400 million
1544s plots you can't do things perfectly it's
1547s just we don't have enough compute time
1548s so we had to make some
1549s shortcuts in how we drew those
1550s boundaries just assume those aren't
1552s overlapping so you can see on the left
1554s hand plot somebody drew two then
1556s somebody else drew three and four and five
1559s and six these aren't the droids you're
1561s looking for
1563s um here's another plot same kind of
1565s thing somebody's drawn three and then
1567s somebody else Drew four five six seven
1568s Gates around the data this is also not
1571s really what you're looking for some of
1573s you people are amazing
1577s and I've given talks like
1578s I said all over the world on
1580s this and I show scientists what you guys
1582s have done this I'm not making this up is
1585s better than expert scientists do there's
1588s lots of reasons for that the main one is
1590s they're so focused on the biology and
1591s the parts that are interesting they
1593s ignore everything else
1595s and that's sad because there's there's
1598s many things we can do with
1599s flow cytometry data we can make
1600s diagnosis on patients and in order to
1603s make that diagnosis we have to find
1604s specific cell populations and see how
1606s much those have changed and the reason
1608s why we look at those specific
1610s populations is because some professor
1611s and their grad student and their grad
1613s student and their grad student for tens
1615s of tens and tens of years have studied
1617s specific cell populations and
1618s understood those are important
1621s there's lots of information in this data
1623s there's so much information that they
1626s can't look through it all but because of
1627s the way you are analyzing this you're
1629s putting boxes around all the things and
1632s from that not only can we do diagnosis
1634s of patients we can do discovery of new
1637s things and that's the way we're going to
1639s not only find new drugs but also to make
1641s sure that drugs that you that people are
1643s going to get are safer so they're not
1645s just targeting the things that we know
1647s about but there's something that we
1648s call off-target effects where the new drug
1651s that we give you hits something in the
1652s immune system that we never knew to look
1654s at but then you're going to end up
1655s growing a new toe or something and
1657s that's going to make you sad
1659s so what you see here for example this
1661s this person this is where I shine the
1662s laser in somebody else's eye you can
1665s I don't know where it's it's the laser
1668s doesn't really work you can see the
1670s person on the very bottom right what
1672s they've done is they've drawn a gate
1674s around every feature so you have to
1677s remember the order on these axes is
1679s a log order so the cells on the left
1682s have tens of thousands of times more
1685s of that protein than the cells on the
1687s other side that I said left versus right
1689s so because it's so much more protein on
1692s that we're inferring those cells have a
1694s different function there's no truth in
1696s this there is no right answer there is
1698s no way that these cells can say hey this
1700s is the function I have until you sort
1701s them out in a tube and do some kinds of
1703s experiments on them but you can't do
1705s that until you put the Box around them
1706s so you can put them into a well and do
1708s experiments and so this is amazing
1712s and lots of you are doing this and I
1714s don't know how you knew to do this
1715s because we didn't really tell you and
1716s one thing that we're hoping
1718s to do is start giving better
1719s instructions to allow you to do this but
1721s enough of you are doing this it's
1722s amazing
1724s so not only just one or two or three
1727s people are giving us the right answers
1729s but enough of you
1731s are doing this so we might get 15 or
1733s more people that are doing the really
1735s kind of perfect Gates that we want and
1738s then what we can do is we can make a
1739s consensus out of all those people to get
1742s closer to the truth than any one of you
1745s guys can do by yourself and
1747s that's the consensus of multiple
1749s gamers that we're using in part to train
1752s machine learning algorithms
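One simple way to build a consensus out of many players' gates is a per-cell vote, sketched below. The talk does not describe the actual consensus method, so this is an assumed stand-in: each submission is reduced to the set of cell indices that one player's gate encloses, and a cell is kept when more than a chosen fraction of players included it. `consensus_gate` and `min_agreement` are hypothetical names.

```python
from collections import Counter
from typing import List, Set


def consensus_gate(submissions: List[Set[int]], min_agreement: float = 0.5) -> Set[int]:
    """Return the cells that more than min_agreement of the players enclosed."""
    votes = Counter(cell for gate in submissions for cell in gate)
    needed = min_agreement * len(submissions)
    return {cell for cell, count in votes.items() if count > needed}


# Five hypothetical players gating the same population of cells.
players = [
    {1, 2, 3, 4},
    {2, 3, 4},
    {2, 3, 4, 5},
    {3, 4, 9},
    {2, 3, 4},
]
agreed = consensus_gate(players)
```

Cells that only one player picked up (1, 5 and 9 here) drop out, which is how a consensus can end up closer to the truth than any individual submission.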
1756s so not only are there fan fests for
1758s gamers who play Eve there's Fan Fest for
1761s scientists we call them conferences
1764s and
1765s um we go there and
1767s give talks about science and I get on
1769s stage and say we're doing some cool
1771s stuff and look what I've invented
1773s two years ago we played Eve online in
1777s Philadelphia and we set it up in a booth
1779s and we had professional flow cytometry
1781s people there there were about as many of them
1783s there as there are people here
1786s and we had them play so you can see them
1787s they're playing the same game you guys
1789s played
1791s and so we did something we compared
1794s the professional scientists
1796s against the gamers now we kind of gamified
1799s the game that they were playing they
1801s only had a minute and 30 seconds to do
1803s as many plots as they could so you guys
1805s can stare at it as long as you want if
1807s we allowed the scientists to do that
1808s they would still be there today because
1810s they want to get things exactly right
1813s it's not an exact comparison but the
1815s punch line is
1817s the statistics the p-value that we got
1819s on the results of you versus the
1821s scientists you see where this is going
1823s you did better
1825s statistically better
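The talk does not say which statistical test produced that p-value. As one common option for comparing two small groups of scores, here is a sketch of an exact one-sided permutation test. The function name and all score values are invented for illustration and are not results from the study.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def permutation_p_value(a: Sequence[float], b: Sequence[float]) -> float:
    """Exact one-sided permutation test on the difference of means.

    Returns the fraction of relabelings whose mean(a) - mean(b) is at least
    as large as the observed one. Only practical for small samples.
    """
    pooled = list(a) + list(b)
    observed = mean(a) - mean(b)
    n = len(pooled)
    hits = total = 0
    for idx in combinations(range(n), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(n) if i not in chosen]
        # Small tolerance so the observed split always counts as a hit.
        if mean(group_a) - mean(group_b) >= observed - 1e-12:
            hits += 1
        total += 1
    return hits / total


# Made-up accuracy scores, for illustration only.
gamers = [0.90, 0.80, 0.85, 0.95]
scientists = [0.60, 0.70, 0.65, 0.75]
p = permutation_p_value(gamers, scientists)
```

A small `p` means that a gap this large would rarely appear if the group labels were shuffled at random.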
1831s and I got shivers I have
1834s shivers right now it's amazing because
1836s that gave us the confidence that the
1838s data you're producing is going to
1840s allow us scientists me and others who
1843s are going to use this data to do amazing
1845s things
1847s so that's the data that we're using to
1848s train the machine learning algorithms
1850s and this is great this is really
1851s simplified broken down data that we
1854s can just feed thousands and thousands
1856s and millions of plots and train machine
1858s learning algorithms and we've done it
1861s we have two proof of concept machine
1863s learning algorithms that we developed
1864s using completely different machine
1866s learning approaches
1868s that can identify every feature as
1871s editable polygons the same way as if they
1873s were done by hand down all possible
1875s hierarchies and like I said this is
1878s going to enable two things it's going to
1880s enable diagnosis of patients we hope if
1882s it gets good enough at least in a very
1885s broad sense more importantly it's going
1888s to enable Discovery at scale in a way
1891s that is simply not possible with the
1892s Technologies and the software that we
1894s have available today
1896s and we can put this into software
1899s so that somebody can click a button and
1901s Gates appear that fast and that's
1904s trained exactly on data that came out of
1906s you guys they can get the gates that
1908s fast remember how fast so there might
1910s be 20 or so plots that they have to go
1912s through we can generate them that quick
1914s it's a game changer I've shown this to
1917s exec senior level management at
1920s pharmaceutical companies and they cannot
1922s wait to get their hands on this software
1924s because again this is going to allow
1925s them to look at off-target effects of
1927s drugs and get drugs faster and cheaper
1929s that cure so many diseases of the immune
1931s system
1933s so this is the result that we can get
1935s out of machine learning algorithms
1937s trained on Project Discovery data I have
1939s shown this to scientists again they
1941s can't believe how good this is It's
1943s amazing
1945s now is it always that good so this is
1948s one other example of how the machine
1950s learning algorithms can pump out data
1952s and it looks pretty good I'm really
1955s happy with this result but it's actually
1958s one of the worst performing result
1960s scores that we had it's important that
1962s we show the bad data so the F1 score
1964s that I'm showing here you guys love math
1966s maybe it's a harmonic mean of
1968s sensitivity and specificity you can kind
1970s of figure it out as a percentage it's
1972s not a very good mark 57 you kind of want
1975s it to be like one or 0.57 you want it to
1979s be one so why is it so bad
1981s because scientists can't make up their
1983s minds of what they want
1985s so the challenge here in what you show
1987s what I'm showing on the left hand side
1989s of the screen is the gold standard and
1991s the right hand side is the result of the
1992s machine learning algorithm that was
1994s based on Project Discovery and what's
1996s shown in red is where the difference is
1998s between the truth and
2002s what the algorithm came up with
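
As a rough illustration of the score being discussed, the Python sketch below computes an F1 score by comparing a gold-standard gate with a predicted gate, event by event. F1 is conventionally defined as the harmonic mean of precision and recall, where recall is the same quantity as sensitivity. The labels are made up for illustration (1 means the event is inside the gate, 0 means outside) and are not the data shown in the talk.

```python
from typing import Sequence


def f1_score(truth: Sequence[int], predicted: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for binary in-gate labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    if tp == 0:
        # No correctly gated events: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)  # of the events we gated, how many were right
    recall = tp / (tp + fn)     # of the true events, how many we captured
    return 2 * precision * recall / (precision + recall)


# Made-up labels: the "gold standard" gate versus an algorithm's gate.
gold = [1, 1, 1, 1, 0, 0, 0, 0]
algo = [1, 1, 0, 0, 1, 0, 0, 0]
score = f1_score(gold, algo)
print(f"{score:.2f}")  # 0.57
```

A score of 1 means the two gates agree on every event; the events where the labels differ correspond to the regions drawn in red.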
2005s and so the challenge here is humans are
2008s doing two different kinds of things at
2010s the same time
2011s the humans
2014s um at the bottom of the plot where there's
2017s that smear smears are really hard
2019s where there's no definable
2021s information about where something ends
2024s and something else begins and so on the
2026s bottom of that plot
2028s what the scientists
2031s believe they want is different than
2033s what's shown on the left hand column
2035s because on the bottom they've
2037s kind of drawn the line between the
2039s populations on the left and the right
2040s and on the left hand column they said
2042s that smear in the middle goes into its
2044s own population and so in the same kind
2046s of context based on biology
2049s based on their understanding of the
2051s disease which is something you guys
2052s don't have a handle on but because this
2054s is CD14 versus CD16 we have to make
2057s different choices about what to do with
2058s those different bits they're making
2060s different choices about how to separate
2061s that stuff out you guys can never do
2063s that
2063s but
2065s um what we're what we're learning from
2066s this process is what we can do is have
2069s the scientists give us a template just
2071s give us one sample that you've done the
2073s analysis on and we'll use that to drive
2077s the machine learning approach to their
2079s version of Truth
2081s and that seems like it's going to
2084s work for us in cases where
2086s um they want to make more refined
2088s decisions and that that's going to still
2089s help us a lot because then they don't
2091s have to get 300 samples they might only
2093s have to get two or three and they'll
2095s have to do a few because not every
2097s person who's sick looks the same in
2100s the same way in their immune system and
2102s so they might have to get one healthy
2104s person one sick person and maybe one
2106s person with three heads and then we can
2108s use those templates and match them with
2110s the sample that we're trying to analyze
2112s and say which of these templates that
2114s they've analyzed do we want to use on
2116s top of the machine learning driven
2118s approach that has
2120s come out of Project Discovery and that
2122s looks like it's going to work really
2124s really really well
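
One way to picture the template matching step is the Python sketch below. It is only an illustration under stated assumptions: each sample and each scientist-analyzed template is summarized by a short feature vector (here, hypothetical fractions of events in three coarse populations), and the template closest to the new sample by Euclidean distance is the one chosen. The talk does not say how the matching is actually done, and the names and numbers here are invented.

```python
import math
from typing import Dict, List


def nearest_template(sample: List[float], templates: Dict[str, List[float]]) -> str:
    """Return the name of the template whose feature vector is closest to the sample."""
    return min(templates, key=lambda name: math.dist(sample, templates[name]))


# Hypothetical templates: one healthy donor and one sick donor, each analyzed by hand.
templates = {
    "healthy": [0.60, 0.30, 0.10],
    "sick": [0.20, 0.30, 0.50],
}

# Hypothetical new sample to be gated.
new_sample = [0.55, 0.30, 0.15]
chosen = nearest_template(new_sample, templates)
print(chosen)  # healthy
```

The chosen template would then steer the machine learning gating toward that scientist's version of truth, which is why only a handful of hand-analyzed samples are needed rather than hundreds.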
2126s not only are we going to help do the
2128s analysis of the data to find
2131s those cell populations
2133s because you guys are so awesome that's
2135s given us the idea that we can do more
2137s so the other problem that scientists
2139s have is once they identify those cell
2141s populations we have to talk about them
2144s we have to have names that we can
2145s converse in which is why we have things
2147s like T cells and B cells and natural
2150s killer cells that we can say oh that
2152s natural killer cell population is 22 so
2156s we have to have these labels on these
2157s cells well it turns out that's also not
2159s standardized when
2162s the scientists talk about them they
2164s have some words but
2167s people aren't using them in a
2168s reproducible or even computational way
2171s so we could do massive analysis
2174s so we can find everything but what did
2178s we actually find
2179s so the next stage that we're doing is
2182s not only are we doing this breadth of
2183s data that you've been doing so far
2186s we're going to go down specific pathways
2188s of data that for example in the
2189s diagnosis of cancer or a diagnosis of
2192s some other disease of the immune system
2193s we're going to go deep down these paths
2196s and you guys are going to analyze the
2197s data and because you have a full view of
2199s the data and you can put boxes around
2201s all these dots that's going to allow us
2204s to make maps computational maps and
2207s those computational Maps again we're
2208s going to give out to everybody and I've
2210s talked to basically every
2212s software company that does this
2215s we're all going to work off the same map
2217s that's going to come out of the data
2218s that you guys are providing and we're
2219s going to put the labels on there
2222s as a second step that we can get from the
2225s literature and we're going to get all
2226s the scientists in the room to agree on
2228s those labels and that way everybody in
2230s the world is going to be coming off the
2231s same map and using the same words and
2234s that way we can compute on data in a way
2236s that we just can't do today in any way
2239s very large big data analysis it's going
2243s to be amazing
2245s and you have to understand this is going
2247s to completely change the workflows that
2251s in one company alone at BMS there's 800
2253s people who do nothing but analyze flow
2255s cytometry data that's just one big
2257s pharmaceutical company that's not taking
2259s into account all the other big
2261s pharma companies that's not including
2263s all the you know thousands of people in all
2265s the academic labs not just in one site
2268s and so what has to happen is many many
2270s different kinds of things and through
2272s the work that has been done through
2273s project Discovery we're going to change
2275s that workflow and automate that such
2277s that
2279s um once the data comes off the machine
2280s we can suck that up into the cloud we
2283s can use the ml gating that's come off
2285s Project Discovery to gate everything
2287s that's in there we can put labels on
2290s those populations and then there's other
2291s stuff we can do that's not really tied
2293s to project Discovery but it is enabled
2295s because of this technology coming online
2297s such as we can do all the statistics
2299s that people want to do and then we can
2300s pull out the things that are different
2301s for example this green this green line
2304s is a patient who's had a very different
2307s reaction to a drug than everybody else
2309s this is how science works this is how
2312s discoveries are made this is how we
2314s prevent people from dying from drugs
2316s that we give them when we didn't know
2317s that there's some kind of effect because
2318s we just haven't looked at enough data
2320s this is going to fundamentally change how
2323s science is being done because it allows
2325s scientists to do science rather than all
2328s that you guys are doing right now
2330s and not only can we do that against
2333s the data that people are generating
2334s today once we have those tools in place
2336s we can go back and apply them to all the
2339s data that these pharmaceutical companies
2341s have in their databases it's just
2343s sitting there they collect it they don't
2345s throw all this out but they can't do
2347s anything with it because it's not been
2348s analyzed in a harmonized way so we can
2350s scale this technology across all their
2352s past and future data and we can put this
2355s in a standardized platform and this
2357s digitization of data is going to
2360s fundamentally change how discoveries get
2363s done it's going to fundamentally change
2364s the investment that anybody has
2369s made in data so we can return that and
2372s this is the ROI the return on investment
2374s of those discovery dollars
2376s with no
2377s extra effort because it's all automated
2379s and the beauty of this
2381s is that if somebody somewhere discovers
2383s this new cell population that I don't
2386s know
2387s toe fungus or something right and we
2389s found something that nobody's ever
2390s looked at before great it cures toe
2393s fungus
2394s um what else does it do
2397s and because of this Automation and
2399s standardization of the pipelines and the
2401s analysis and the way we can now push all
2403s this data through and have it all
2404s sitting there waiting as soon as
2406s somebody finds something
2408s we can automatically see well what else
2410s has this been seen in that we haven't
2412s looked at before because we didn't
2413s know to look there
2416s that's crazy cool
2419s so
2420s it's because of you guys this happened
2423s so many people have contributed
2425s 20,000 people did a really really
2427s really good job
2429s is basil here
2431s Maybe
2432s no what we did is we ranked everybody
2436s from one to twenty thousand who's given
2438s us data for project discovery that ended
2441s up in the really good consensus
2443s go to tinyurl.com project discovery
2446s see if your name is there email me and
2449s if you happen to be the top ranked
2451s person who's here at Fan Fest
2453s swag
2457s that would be awesome and really all of
2460s you even if you're not basil or Zarina
2462s or aberrend Oren or the vipes
2466s from me on behalf of the community
2468s thank you again so very much
2471s I also want to thank CCP for making this
2473s happen
2474s um especially David and Julia if you
2476s guys are here
2478s um really the love that they have for
2480s this project discovery that allowed this
2481s to happen
2482s couldn't have happened without them
2484s couldn't have happened without you 710,000
2487s different player accounts and
2488s counting lots of people who've been
2490s involved many many scientists are
2492s interested in this data many scientists
2494s have been
2495s um working to get this done have to
2497s thank our funders and now again thank
2499s you all so very much