over 2 years
ago -
EVE Online
Transcript (by Youtube)
1s | uh hi guys so my name is nick herring |
---|---|
3s | i'm the technical director of |
4s | infrastructure for eve online at ccp |
6s | games which is a very long title for |
8s | cosmic plumber um we're going to be |
10s | going over kind of eve online |
13s | the last 20 years of development what |
15s | that looks like what we had to work on |
17s | to kind of modernize the eve online |
20s | tech stack at least from the perspective |
21s | of the server side and the network |
23s | portions of the client side |
25s | um |
27s | and we'll go over more of what the |
29s | topology is in the evolution of that |
31s | topology kind of how you've originally |
33s | started and how it's gone from |
35s | uh |
36s | from 2003 to what we have now |
38s | and |
39s | we're going to talk about how we tried |
41s | to fundamentally or are fundamentally |
43s | changing how we actually work on eve |
45s | online |
46s | there's multiple pieces to that but the |
48s | two biggest ones being a technical |
49s | aspect and a cultural aspect |
51s | and the cultural aspect is a pretty big |
53s | part of it |
55s | and hopefully |
57s | we don't have to go too fast here |
58s | because right after this we have a round |
59s | table um but the round table is more for |
62s | anything else so any kind of quasar |
64s | specific stuff we can talk about here |
65s | hopefully at the end of this if there's |
67s | time for questions uh and then |
70s | afterwards in the round table we can talk |
71s | more about other things like quasar and |
73s | esi and how they interact and what |
74s | makes sense there and any kind of |
77s | other technology that we're using on the |
78s | server side |
80s | so we can start with |
81s | 20 years of eve development uh it was |
84s | released in 2003. you guys all know this |
88s | right now there are roughly over 2 |
90s | million changelists in perforce |
92s | that number is probably growing faster |
94s | and faster as we add more and more |
96s | automation into the ecosystem so there's |
98s | less and less humans actually making |
99s | changes to the code base |
102s | and |
102s | we've kind of added a little reference |
105s | of how much code there is and i've added |
107s | this silly reference of the ue4s so if |
110s | you take the code base of unreal engine |
112s | 4 |
112s | you can kind of get an idea of how much |
115s | of that code is is being used there it |
117s | means absolutely nothing it's just fun |
118s | to think about |
119s | um |
121s | and so if we think about like the |
122s | c/c++ that's where a lot of the |
124s | rendering code is that's where the |
126s | simulation code is |
127s | and that's a lot of where the the glue |
129s | is from uh |
131s | c to python marshalling and back and |
133s | forth |
134s | so that's roughly 1.7 million lines of |
137s | code |
138s | then the next up would be sql so a lot |
141s | of eve is run by basically sql procs a |
144s | lot of the logic is |
146s | unfortunately um |
149s | and so we can see that |
150s | just our sql code alone is the size of |
153s | ue4 |
156s | and then if we continue on |
158s | where a bulk of the logic was written uh |
161s | in python in stackless python |
163s | um we can see that there's 3.4 million |
165s | lines of code a little bit more than a |
167s | ue4 |
169s | and then if you weren't worried enough |
170s | yet |
171s | we have roughly |
173s | 53 million lines of yaml |
176s | um |
177s | this is 24 unreal engines uh worth of |
180s | code code's a strong word but um |
185s | this this is what holds everything to do |
187s | with uh how the universe is authored any |
190s | anywhere from how the spaceships are |
192s | made uh and authored as far as the |
194s | attributes are concerned and there's a |
196s | lot of work done there and and this this |
198s | would have looked a lot differently uh |
200s | from |
201s | probably about three years ago i think |
203s | where we had a team internally that was |
205s | jamming on moving all of that out of sql |
208s | into a binary file system that we could |
208s | work with and author with that way we |
211s | didn't have to deal with things like |
213s | promoting between databases and instead |
215s | the files traveled with the actual |
217s | branch itself |
219s | so that's just a little bit of idea of |
220s | kind of |
221s | the momentum that is eve online and what |
224s | we have to deal with when we want to |
226s | make a foundational change to how the |
228s | system works |
230s | and so looking at the beginning here |
232s | it is deceptively simple |
234s | uh most of you guys know about this i've |
236s | seen this in a different form |
238s | uh we have the concept of sol nodes |
242s | that is part of the monolithic code base |
243s | that we have sol nodes can take on any |
247s | role as it were it could be servicing |
250s | corporation requests alliance requests |
253s | wallet uh the actual literal location of |
256s | the solar system the simulation within |
257s | that so we can swap in and out of those |
259s | roles and a lot of the orchestration |
260s | usually happens on the sol node level |
262s | uh then we also have the proxies which |
264s | are kind of dedicated to |
267s | managing the connections coming into the |
269s | system and that's important because |
271s | ultimately that wasn't the like |
273s | that wasn't the original version of this |
276s | and and we have some other ccps here who |
279s | actually just rejoined that were in the |
280s | original team that worked on this they |
281s | can probably pick this apart a little |
282s | bit more but proxies didn't originally |
284s | exist they were built out of necessity |
287s | basically as soon as people went over |
288s | 100 people on eve they were like we have |
290s | to do something about how this connects |
292s | because ultimately |
295s | this represents kind of the the topology |
297s | of carbon io so everything is |
300s | is a mesh network so it's a guaranteed |
302s | one-hop mesh network which becomes a |
304s | quadratic problem almost immediately |
307s | when you're trying to deal with resource |
308s | management and those kind of things but |
310s | it's very powerful in the sense that a |
312s | lot of how eve works is about |
313s | deterministic routing |
316s | that's contrary to modern technologies |
318s | around things like rabbitmq or nats or |
321s | kafka those types of things where you |
322s | have more dynamic routing slightly |
324s | different but |
325s | eve does a lot of things where it knows |
327s | your character id |
328s | and by virtue of knowing your character |
330s | id it doesn't need to ask the ecosystem |
332s | where to go it can make a very educated |
335s | guess and get to the right node for the |
337s | right information and that's very |
338s | powerful early on |
340s | and all of this was kind of |
343s | tackled around |
345s | carbon io which is kind of the glue for |
346s | all of it at this point in time |
348s | and this is a homebrew protocol written |
350s | in in python so a lot of the networking |
352s | calls are all pure python and that's |
354s | where all the traffic is going and being |
355s | shaped and those kind of things |
357s | and that goes back into the desktop |
358s | client over the internet and those types |
359s | of things |
360s | another big part of this is the |
362s | combination of this with with i o |
364s | completion ports |
365s | and io completion ports are important in |
367s | this regard |
368s | because with stackless python stackless |
370s | python only does one thing at a time |
373s | and it's something that i have to remind |
374s | every engineer it can only do one thing |
377s | at a time |
378s | engineers try to do modern techniques of |
380s | like distributed locks or mutexes |
382s | whatever the case may be but it doesn't |
383s | actually matter stackless python only |
385s | does one thing at a time the other |
387s | terrifying part is that it's not only on |
388s | a single core it's on a single logical |
390s | processor |
392s | so when we're reinforcing fleet fight nodes |
394s | for example we're basically just trying |
396s | to throw as much clock speed at that |
398s | node as possible it doesn't even |
400s | actually matter how many cores we put on |
401s | it |
402s | and that's kind of one of the ultimate |
403s | limiting factors that had us start the |
405s | conversation about |
406s | what became the idea behind quasar and |
409s | how we start teasing that problem apart |
412s | um but i o completion ports are also |
415s | important in the sense that it it it |
417s | makes a step towards |
419s | deferring the management of those |
421s | sockets from python because the more |
423s | stuff that we get out of python the more |
425s | one thing at a time that we can do |
427s | um and so that defers to a kernel and |
430s | io completion ports is a is a nifty |
432s | trick it's similar to um |
435s | polling uh socket polling in linux if |
437s | you're familiar with that um there's a |
439s | push pull paradigm asymmetry there um |
443s | but that's how it works on windows oh by |
444s | the way all of this is on windows |
446s | um |
451s | um and so |
453s | as this evolved and and these players |
456s | grew and more and more things started |
458s | being built inside of this is that part |
460s | of like the python that we were talking |
461s | about grew and grew and grew under a |
462s | nice little folder called script which |
465s | basically became a large portion |
467s | of of the of the actual code base |
470s | what would end up happening is during |
471s | releases |
473s | um people started noticing that like |
475s | forums would go down or web pages would |
477s | go down like i think the eve wiki at |
479s | that time would would go down |
481s | because every release would publish |
483s | information about the details of eve so |
485s | i think back then we didn't have the sde |
487s | and we didn't have any of the apis |
489s | and so the reflex to that was to build |
491s | the xml api they're like hey this is |
494s | this is taking down our web servers and |
495s | that's bad because when we do releases |
497s | we want everybody to see all the new |
498s | information everything we're doing |
500s | so they built up the xml apis to kind of |
502s | protect from that |
503s | and what i |
505s | i have i haven't really found the exact |
507s | person or the exact reason this came |
509s | into existence that's the original |
511s | reason for it but i don't think people |
513s | realized at that point in time that what |
515s | they were doing was effectively making |
516s | one of the biggest retention mechanisms |
519s | in eve because it allowed you guys to |
521s | build on top of that then things like |
523s | evemon were born |
525s | importing things through like eft and |
528s | all those kind of things when we had all |
529s | that static data but |
531s | ultimately |
533s | this was all done xml over http um and |
537s | it was it was basically a read cache |
539s | nothing nothing fancy there and i think |
540s | if anybody remembers the xml api the |
542s | cache timers on that were horrendous |
545s | it was i think multiple hours on the |
547s | actual liveness of the data |
551s | and so |
553s | this got into things like uh |
555s | managing skill plans through through |
557s | evemon |
558s | and and you can kind of see an echo of |
560s | that with skill plans in the game right |
562s | now which oddly enough is connected to |
563s | quasar so |
565s | it's |
566s | serendipitous that one of the first |
568s | full features that was 100 percent on quasar is |
572s | actually the same third party developed |
575s | feature that was built outside with the |
578s | original xml api |
581s | and so as this kept growing we kept |
584s | adding more and more things to the |
585s | ecosystem we started getting |
588s | oauth 2 because then we had added a |
590s | launcher and we needed other websites to |
591s | federate with other information that |
592s | might be there |
594s | and so you know then we started getting |
596s | oauth 2 over http and that ecosystem |
598s | started to grow |
600s | and i've kind of simplified this so that |
601s | monolith services kind of represents |
603s | sols and proxies that's the the og |
605s | cluster if you will auth services started |
607s | growing into a suite of dot-net |
610s | applications um for managing |
614s | uh payment information or various other |
616s | things in this suite of uh of services |
618s | was actually |
620s | technically speaking the first |
622s | external service to the original cluster |
625s | and so they started dealing with a lot |
627s | of the problems without understanding |
629s | or not they understood it without really |
632s | knowing about kind of the paradigm that |
634s | a lot of people talk about today when |
635s | people talk about |
637s | monoliths to microservices like that is |
638s | a thing that almost everyone has heard |
640s | something about at this point in time |
643s | only a handful of people knew about this |
645s | or had completed it right the entire |
647s | world was at this point in time dealing |
650s | with this problem and nobody had a name |
651s | for it yet |
653s | and so this was roughly the first |
654s | cluster of services that we had live |
656s | outside the actual eve ecosystem in |
658s | their own sustained way and |
661s | also to note all of this stayed inside |
663s | of our data center so this was still |
664s | inside of |
666s | basically metal boxes right next to each |
667s | other |
669s | but the problem with those is and this |
671s | is one of the problems with people that |
673s | implement micro services |
675s | is that ultimately like yeah we got a |
676s | bunch of micro services and they're all |
677s | connected to the same database oh that's |
679s | not how that's supposed to work |
681s | like you've just created another |
682s | monolith for your database right |
684s | um and this is ultimately the trap they |
686s | fell in right |
687s | that was one of the tricky things |
692s | and so |
694s | as we started looking more into kind of |
696s | like the surface area of how these |
698s | things were growing |
700s | this then came on to introducing crest |
703s | and my history here is a little blurry |
706s | because i was actually coming into ccp |
708s | when dust was winding down |
710s | um but ultimately crest was built |
712s | because of the concept of dust |
714s | ultimately the the orbital bombardment |
716s | was done through a crest call because |
718s | that went from the psn |
719s | uh uh network into our network and |
722s | that's what basically coordinated the |
724s | the orbital bombardment strikes um |
727s | the interesting thing about crest is |
729s | that it was then trying to adopt |
731s | mentalities at that point in time and in |
733s | this case it was very very academic |
735s | mentalities this was a hypermedia |
737s | restful json if anyone remembers what |
739s | that is it's basically a |
740s | self-documenting self-referencing api |
744s | and the idea behind that was that a |
746s | robot could then navigate the api and do |
749s | what it wanted to do |
752s | humans unfortunately were doing that not |
753s | robots |
754s | so that became a lot of paperwork just |
757s | to use the api and then on top of that |
759s | dealing with all the changes and the |
761s | breakages there |
762s | the implementation here got more |
764s | interesting those |
766s | those nodes for crest were effectively a |
768s | sol node with more logic on them and so |
771s | they were susceptible to the same |
773s | scaling problems of basically the one |
775s | hop mesh network and also implemented on |
778s | top of stackless python |
781s | so |
782s | it also paved the way for write |
784s | endpoints i think crest is the i can't |
786s | remember the first write endpoint it |
787s | might have been like autopilot |
789s | or eve mail i can't remember exactly the |
792s | first one |
793s | but this also paved the way internally |
795s | for this because this was a big deal |
798s | not many people cared and i used quotes |
800s | there because i'm not saying that nobody |
802s | cared about it but they weren't really |
803s | bothered by the fact that people were |
805s | scraping data and not taking down the |
806s | service sounds great when we started |
808s | building up crest and like hey we're |
809s | gonna allow players to |
811s | automatically affect things through the |
814s | api |
815s | and then it became an entire civil war |
817s | internally on like what that meant and |
819s | how we should go about doing that |
820s | obviously the cat's out of the bag but |
822s | at that point in time |
824s | it was more about |
826s | isolating it to the difference between |
828s | what we could affect in the universe and |
829s | what was localized to the player |
832s | so that still stays to this day there's |
834s | not anything you can do even in esi |
837s | where you can affect the universe |
840s | it has the illusion of that but you |
842s | don't actually affect the universe until |
843s | we introduce things like |
845s | actually manipulating market calls that |
847s | affect inventory then things start |
849s | affecting the universe but most |
850s | everything if i remember correctly |
852s | is about endpoints that can only affect |
854s | the state of your character like |
856s | autopilot contacts eve mail |
860s | i can't remember all of them um |
863s | and so this set that precedence there |
866s | and this kind of introduced yet another |
867s | point that we were |
869s | building on and so |
871s | the problem with this growing surface |
873s | area |
874s | ultimately became performance like we |
876s | were talking about this is all built on |
877s | top of stackless python we could only |
879s | scale vertically not really horizontally |
882s | because the more nodes that we scaled |
884s | horizontally the less connections we |
886s | could deal with up front and that became |
888s | a problem that equation basically didn't |
889s | work out i think when we did the math on |
892s | that originally that number came to |
893s | around a hundred thousand |
895s | um |
896s | now we haven't been to that number yet |
898s | um but uh ultimately that was the the |
901s | proponent of that like that was what was |
904s | powering those decisions |
906s | um |
908s | and this gets into stackless python oh the |
910s | gil right so python in general whether |
912s | stackless or not doesn't matter has what |
914s | they call a global interpreter lock this |
916s | is what forces it to do the one thing at |
917s | a time but it also makes it very |
919s | powerful in the sense that you don't |
921s | have the complications of any |
923s | concurrency paradigms or primitives that |
925s | you then have to coordinate there's no |
926s | synchronization ultimately because it's |
928s | only doing one thing at a time |
932s | the database was also a problem here |
935s | um |
936s | a big part of why we implemented a lot |
938s | of the tools that we have today is |
940s | because when we started introducing esi |
943s | it basically became the scapegoat for |
945s | any problem that came up at any point in |
947s | time |
948s | to the point where i had a tally board |
949s | in the office of not esi or esi |
954s | and so the database becomes a bottleneck |
955s | for this because ultimately it's the |
956s | same problem |
958s | we have all of this concurrency |
960s | happening at a single location that can |
962s | only scale up to a certain degree and |
964s | that's that's why you read all the dev |
965s | blogs that we have even the recent one |
967s | about the hardware upgrades where we |
969s | literally have to throw metal at it to |
971s | solve some of those problems because the |
973s | complexity or the density of the actual |
975s | operations being done in the database |
976s | can only be mitigated by |
979s | faster light in this case |
983s | maintenance uh is another big one here |
987s | in order to change anything |
989s | in xml api that was great because it was |
990s | a standalone service we didn't have to |
992s | worry about |
993s | tranquility going up or down however |
996s | it had the side effect of if anyone |
998s | changed anything in the database the xml |
1001s | api didn't know about it so there was a |
1003s | lot of thrashing in the sense that |
1005s | endpoints would go down and break and |
1007s | various other things would mismatch with |
1009s | certain attributes or whatever the case |
1010s | may be we still have this problem with |
1012s | esi right now in various different |
1014s | places that we're still combating but in |
1015s | a different way |
1017s | um |
1018s | this gets into deployments um |
1020s | crest could not be affected unless we |
1022s | change like brought down tq ultimately |
1024s | uh that's one of the other big pieces |
1026s | about what we're modernizing it |
1028s | um |
1029s | uniform criticality this gets back to |
1033s | what me and ccp tuxford were talking |
1034s | about in vegas and this this talk is |
1036s | basically |
1037s | a status update of the |
1040s | talk we had in vegas |
1041s | where we were talking about the concepts |
1043s | behind this and more the technology that |
1044s | we're using |
1045s | and the developer experience that we're |
1047s | targeting uh less about where we're at |
1049s | now |
1050s | and and ultimately what the cultural |
1052s | changes need to be to achieve that |
1054s | um and uniform criticality in this sense |
1056s | means that |
1058s | everything is priority one |
1060s | and that's a problem |
1062s | hey email's not working |
1063s | well email could not work to a point |
1065s | where it starts cascading failures |
1067s | inside the cluster |
1069s | well now email is definitely priority |
1071s | one but that's the silliest thing to |
1073s | have as priority one we would rather |
1074s | just turn off email instead and deal |
1077s | with that problem and then turn it back |
1079s | on when it's ready to go oh |
1081s | unfortunately eve's not built in this |
1082s | way however our teams have become |
1084s | exceedingly efficient |
1086s | at |
1087s | building and working in this way but |
1089s | that is an immense slowdown into how |
1091s | they build into what they can |
1093s | build and then we get into the |
1094s | development aspect of this |
1097s | domain boundaries became a huge part of |
1099s | what we started talking about because |
1100s | ultimately |
1102s | when you start building something on top |
1103s | of a sol node domain boundaries |
1105s | instantly get blurry and if i could |
1109s | oversimplify what has happened over the |
1111s | last 20 years with the eve code base |
1113s | you combine things with |
1116s | a dynamic language like python |
1119s | maybe some of my little personal biases |
1120s | there you have the same you have the |
1122s | same database you have a single |
1123s | deployment mechanism and what happens |
1125s | over time is it doesn't matter how well |
1127s | you organize or build the code base |
1129s | because that's the thing about eve is |
1131s | all the core components are are well |
1133s | designed in the sense that they're like |
1135s | what people used to call service |
1136s | oriented architecture which is now |
1138s | microservices we've been doing the same |
1140s | thing since the 70s everybody just keeps |
1141s | calling it a different thing |
1143s | [Laughter] |
1146s | we found the old guy |
1151s | and so |
1152s | ultimately it didn't allow you to |
1153s | actually build those boundaries |
1155s | everything kind of blurred together and |
1157s | it became this thread that you had to |
1158s | pull at which caused all these side |
1160s | effects |
1161s | these side effects that no one did |
1163s | intentionally it's just kind of how they |
1164s | happen because if you can't isolate the |
1166s | domains that you're actually working on |
1168s | you can't really take responsibility for |
1170s | just that piece |
1171s | i mean we can talk about how many |
1173s | different types of mission systems that |
1174s | we have in eve online |
1176s | that's because when you go to look at |
1177s | them like i'm going to add this thing |
1178s | and you look at it and go nope i'm not |
1180s | touching that |
1181s | because it's connected to so many other |
1183s | things and it's almost always easier to |
1186s | build something in a separate corner and |
1187s | then some other pieces build on that |
1189s | eventually tentacles come out of it and |
1190s | everything gets woven together right and |
1193s | keep in mind this is over the course of |
1194s | 20 years right like this is not |
1196s | something somebody went you know what |
1197s | i'm going to do i'm going to connect |
1198s | every domain into a massive monolith |
1200s | that's not what people were doing |
1202s | it's very much a natural evolution of things |
1206s | and then this gets into data ownership |
1208s | and this goes back into the database and |
1209s | how the database works |
1211s | when we have things like hot tables or |
1213s | poorly planned queries |
1215s | it's because of some other services and |
1217s | what they might be querying about that |
1218s | information |
1220s | but that's kind of broken because they |
1221s | shouldn't be sharing that that shouldn't |
1222s | be part of the problem |
1224s | and data leaking out or being consumed |
1226s | by anything else shouldn't be a problem |
1227s | either like for example |
1229s | when we have the |
1231s | the auth services they were actually |
1232s | dipping into the same db and connecting |
1234s | character information and user |
1235s | information |
1236s | well that meant that anybody that wanted |
1237s | to do anything crazy with characters |
1239s | couldn't because now it affected another |
1240s | system that they didn't have any actual |
1243s | agency over |
1244s | and this is why now you're seeing more |
1246s | and more changes come through like with |
1248s | what ccp nomad was saying earlier about |
1250s | like the skills and what we want to |
1252s | change in effect there |
1253s | there's we're trying to make it easier |
1255s | to define those boundaries so that |
1257s | people can make more surgical |
1259s | foundational changes instead of just |
1261s | kind of adding on and trying to sidestep |
1264s | what's already there |
1266s | and then we get to the cognitive load |
1268s | which is more about how the developer is |
1270s | working |
1271s | and this is what i mean by |
1273s | we've conditioned our engineers |
1275s | to keep all of this in their head |
1278s | when they work on anything |
1280s | and if any of you have worked at the |
1281s | different like worked on a project that |
1283s | has automated testing versus doesn't |
1285s | have automated testing there's a |
1287s | radically different mental experience |
1289s | and motivating factor if it doesn't have |
1291s | automated testing |
1292s | i'm not incentivized to make it better |
1296s | if it does have automated testing i'm |
1297s | more incentivized to make it better and |
1299s | even make broader sweeping changes |
1301s | and going back to the millions of lines |
1303s | of code that we talked about earlier |
1305s | there's a lot of missing automated |
1306s | testing in that but that's also |
1307s | something that we're working on right |
1308s | but it's still to create a system or |
1310s | connect those systems or |
1312s | make them manifest in the |
1314s | the experience or or |
1317s | i regret saying this word but like |
1318s | illusion of gameplay right because |
1321s | that's what we're kind of getting at |
1322s | right we're we're all living in this |
1324s | wonderful fantasy of flying a spaceship |
1325s | and those kind of things but to make |
1326s | that really connect you have to then |
1329s | deal with all of these other pieces when |
1332s | it really should just be hey let's |
1334s | change the way the spaceship flies |
1341s | so |
1344s | this kind of led us all of these |
1346s | different pieces kind of led us into |
1347s | what we were talking about |
1348s | with the original idea of quasar we |
1350s | didn't we didn't know we were going to |
1351s | build quasar by the way this was kind of |
1353s | an evolution of how things went |
1356s | um |
1358s | ultimately the origin was the eve |
1359s | swagger interface or the open api |
1362s | implementation |
1364s | and |
1365s | when we started working on that the the |
1367s | vehicle for that was the actual mobile |
1369s | application |
1370s | uh the eve companion app or eve portal |
1372s | right |
1374s | and when eve portal came out it was |
1375s | mostly the fear |
1378s | was that number one we had all of these |
1379s | new devices that would come online that |
1381s | weren't necessarily connected to eve and |
1382s | it was much easier to connect to all of |
1383s | this so we needed a way to protect the |
1384s | cluster which meant we couldn't scale |
1387s | horizontally with crest because that |
1388s | would eat up resources and it |
1390s | definitely wasn't going to be an xml api |
1392s | because |
1393s | xml um |
1395s | and so |
1397s | we kind of |
1399s | discovered |
1401s | a way to sidestep |
1403s | decades of technical debt by introducing |
1406s | a message bus and that's kind of the |
1407s | core piece of where quasar started and |
1410s | we didn't know this yet but |
1411s | ultimately you know if we look at you |
1413s | know going back to talking about eve's |
1415s | original design like at the core of it |
1417s | how it's designed there's roughly about |
1420s | if you if you |
1421s | so i'm trying to speak about this in the |
1423s | sense of like a restful api but if you |
1425s | take the core monolith of eve and try to |
1427s | actually dissect what's going on there |
1429s | there's roughly about a little over 300 |
1431s | services internally to just the python |
1433s | code base it's talking to itself |
1435s | this is roughly 6,000 endpoints compared |
1438s | to the 190 that we have for esi and |
1441s | that's just to power everything that you |
1443s | see in the actual eve client |
1445s | and that's hard to keep track of |
1448s | when you don't have anything |
1450s | dictating what the domain boundaries are |
1452s | what the data ownership is |
1454s | and so that was the big reason why we |
1455s | chose things like swagger spec which |
1457s | eventually became open api because we |
1459s | wanted people to be able to have the |
1460s | conversation about |
1462s | what is it that you actually own what |
1463s | are you building against what's your |
1465s | contract that you're going to maintain |
1466s | for everyone else |
1472s | then ultimately we got into |
1474s | kubernetes in the cloud space with this |
1477s | um |
1478s | what ended up happening was |
1481s | we were trying to build things against |
1483s | our data center against hardware like we |
1485s | at one point in time we were like pxe |
1487s | booting machines into ibm blade centers |
1489s | and running kube before v1 |
1492s | um it worked it worked but it was not |
1496s | sustainable unfortunately |
1497s | um and then we kind of just one clicked |
1500s | into gke uh inside of google cloud |
1503s | that's where we kind of started our |
1504s | journey with with kubernetes |
1507s | um and that just allowed us to provision |
1510s | resources that we would never have |
1511s | access to |
1512s | to wield a lot of power that we would |
1514s | never have access to with |
1516s | things like |
1517s | we wouldn't have to worry about the link |
1519s | speed |
1520s | of what's coming into our data center uh |
1522s | versus just making a load balancer and |
1524s | everything coming in we eventually |
1525s | landed on amazon for various other |
1528s | reasons but |
1529s | that's kind of the journey that took us |
1531s | there |
1532s | and then ultimately the message bus was |
1533s | the core piece of this |
1535s | and |
1536s | we chose the message bus over a service |
1539s | mesh architecture because the |
1541s | ideas that we had about how this would |
1543s | evolve |
1546s | up front a service mesh is very |
1549s | difficult to get all the right tooling |
1551s | in place to help people debug and |
1553s | maintain whereas a message bus gives you |
1555s | a bottleneck which seems |
1557s | counterintuitive in the grand scheme of |
1558s | things but gives you a dedicated |
1560s | bottleneck to own all the pieces that |
1562s | are flowing through there and allows |
1563s | your like your upfront cost as far as |
1565s | getting other teams on board |
1567s | to go faster sooner um so this is the |
1571s | distinction that we made originally this |
1572s | is also while the world was still |
1573s | figuring out things like istio |
1575s | linkerd |
1577s | envoy ambassador all the other cool |
1579s | things that are out there now |
1582s | and we still we still talk about this |
1583s | heavily because we're now to the point |
1585s | where we're emulating a service mesh to |
1587s | a degree |
1588s | but we ultimately wanted the teams to |
1590s | not have to worry about what the ingress |
1592s | looked like we only wanted the teams to |
1594s | worry about their domain |
1595s | and the data they owned so how do we |
1598s | make it so that they only care about |
1599s | inputs and outputs that was our primary |
1601s | goal |
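The idea above, one place that every message flows through with teams only writing handlers for their own inputs and outputs, can be sketched roughly as follows. This is a toy in-process illustration, not CCP's implementation; the `MessageBus` class, the topic name, and the payload fields are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Toy in-process bus: the single choke point every message passes through."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        # Because everything flows through one place, cross-cutting concerns
        # (counting, tracing, auth) can live here instead of in every team's code.
        self.delivered = 0

    def subscribe(self, topic: str, handler: Handler) -> None:
        """A team registers interest in an input; it never sees the ingress."""
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver a message to every subscriber and return how many received it."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(message)
            self.delivered += 1
        return len(handlers)


# Hypothetical usage: a skills team only cares about its own topic.
bus = MessageBus()
train_requests: List[dict] = []
bus.subscribe("skill.train_requested", train_requests.append)
receivers = bus.publish("skill.train_requested", {"character_id": 1, "skill_id": 3300})
```

A real bus (AMQP, NATS, Google Pub/Sub) adds durability and network transport, but the shape a team codes against is the same: subscribe to inputs, publish outputs.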
1603s | this led us to protobuf |
1605s | after |
1606s | doing everything in eve portal and with |
1609s | esi and there's still esi endpoints that |
1610s | do things through json sorry let me be |
1612s | clear |
1613s | all of you guys see json on the back end |
1616s | we see some endpoints that are doing |
1618s | protobuf and we see some endpoints that |
1619s | are doing json |
1621s | when we started to build and just |
1622s | basically blitz through the esi spec |
1625s | and started building everything we built |
1626s | it all in json |
1628s | we learned real fast that was going to |
1630s | be a problem when we didn't have a |
1631s | schema to really deal with wrangling in |
1633s | all the data and that's kind of why we |
1635s | started looking at things like protobuf |
1637s | and we started looking at things like |
1638s | protobuf to deal with |
1640s | performance as well once we realized oh |
1642s | protobuf has this nice |
1643s | uh c++ uh mechanism where you |
1646s | can generate native code that can also |
1648s | do the serialization for protobuf and |
1650s | what that means is |
1651s | we're basically moving everything from |
1654s | our do one thing at a time stackless |
1656s | python of writing down messages |
1658s | and then just throwing that memory at |
1660s | c++ and saying you do this instead |
1663s | while python can go do the next single |
1664s | thing it can do which is a huge |
1666s | performance benefit for us |
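To make the schema point concrete, here is a minimal sketch of a schema-bound binary message next to free-form JSON. It uses Python's standard `struct` module as a stand-in for protobuf (real protobuf uses a different, varint-based wire format and generated classes); the `ShipDestroyed` message and its fields are hypothetical.

```python
import json
import struct
from dataclasses import asdict, dataclass

# Hypothetical fixed layout: two 64-bit ids and one 32-bit id, little-endian.
_LAYOUT = struct.Struct("<QQI")


@dataclass(frozen=True)
class ShipDestroyed:
    victim_id: int
    attacker_id: int
    solar_system_id: int

    def to_bytes(self) -> bytes:
        # pack() rejects values that do not fit the declared field types,
        # a guard rail that free-form JSON does not provide.
        return _LAYOUT.pack(self.victim_id, self.attacker_id, self.solar_system_id)

    @classmethod
    def from_bytes(cls, payload: bytes) -> "ShipDestroyed":
        return cls(*_LAYOUT.unpack(payload))


event = ShipDestroyed(victim_id=90000001, attacker_id=90000002, solar_system_id=30000142)
binary = event.to_bytes()                       # schema-bound encoding
as_json = json.dumps(asdict(event)).encode()    # schema-less encoding
decoded = ShipDestroyed.from_bytes(binary)
```

Both sides agree on the layout up front, so a malformed message fails at the boundary rather than somewhere deep inside a consumer.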
1669s | naturally |
1670s | this led us to grpc |
1673s | because ultimately when we started doing |
1674s | this we started connecting to this as a |
1676s | server everything was going great this |
1678s | is how we established like |
1679s | a lot of what you're seeing from the |
1681s | data teams a lot of the newer pipeline |
1683s | around the definitions of those events |
1685s | what's being basically fire hosed out of |
1687s | the system is coming through |
1689s | protobuf into the message bus ecosystem |
1692s | but ultimately when we started talking |
1693s | about this more and more we realized uh |
1695s | we need a way to actually connect |
1696s | between these systems what makes sense |
1698s | there we didn't want to maintain a |
1699s | protocol for this there was no point in |
1701s | that there were so many to pick from |
1703s | and protobuf became kind of the the |
1705s | anchor for this because it was just a |
1707s | hop skip and a jump away and we could |
1708s | generate grpc endpoints |
1710s | um |
1712s | then we realized oh we can put this in |
1713s | the client |
1714s | and that's where the idea of quasar |
1716s | started when we realized wait we can |
1718s | close the loop on the entire ecosystem |
1720s | and sidestep the entire legacy code base |
1724s | and keep everything inside of kube |
1726s | inside of golang inside of the message |
1727s | bus |
1728s | and not have to deal with anything |
1730s | that's going i mean we do have to deal |
1731s | with it all the time like |
1732s | it's not all sunshine and rainbows we |
1734s | still have to go in and make sure things |
1736s | connect and actually manifest in the |
1738s | universe |
1739s | the way they're supposed to |
1741s | and this then got us into domain |
1743s | services |
1744s | and |
1745s | i'm i i really don't like the word micro |
1748s | services number one because no one knows |
1750s | what it means uh but also number two it |
1752s | defines an arbitrary scope to what |
1755s | you're designing |
1757s | and this is why we talk about domain |
1758s | services because it ties it back to the |
1760s | actual data model like what are you |
1762s | actually building and what should you |
1764s | own |
1765s | an example of this and one of the first |
1767s | kind of domain services that we built for |
1768s | eve was skill plans |
1770s | skill plans owns everything that it does |
1772s | and it never touches the monolith at all |
1775s | other than sending out by by proxy of |
1778s | sending out like other events of like |
1779s | hey they want to train this skill now |
1781s | from from the skill plans |
1783s | um even that might be debatable it might |
1785s | be going through the client point is all |
1787s | of that data that you're sharing with |
1789s | your corporation with all those skill |
1790s | plans all those different pieces |
1792s | that's all completely going through |
1794s | quasar |
1795s | and we have some other services before |
1796s | that where they're going through quasar |
1798s | the activity tracker is another one of |
1799s | them but it wasn't quite doing the same |
1801s | thing |
1802s | and we can kind of point that out here |
1805s | um |
1806s | oh wait a minute |
1809s | yes |
1811s | um |
1813s | so this is kind of where we're at now |
1815s | those are tiny words um |
1819s | yeah |
1820s | and so this kind of represents where |
1821s | quasar is in in kind of the cloud |
1823s | provider that we have uh we ultimately |
1825s | have a service gateway which is the |
1827s | first piece of this puzzle and that is |
1829s | our authoritative domains this is like |
1831s | if there's an event inside of this |
1833s | uh domain it is a fact of the universe a |
1836s | ship exploded uh this guy bought |
1838s | something on the market whatever the |
1840s | case may be and this is what's normally |
1841s | referred to as east-west traffic in the |
1844s | terms of kind of your network topology |
1846s | this is usually within owned |
1849s | networks uh for that for that company |
1851s | um |
1853s | and you can kind of see here where we |
1854s | introduced the mobile client all these |
1855s | other pieces that |
1856s | eventually got pieces of quasar it |
1858s | wasn't known as quasar at that point in |
1859s | time but eventually got in |
1861s | the public gateway then represents our |
1862s | north-south traffic uh which is |
1864s | basically anything that egresses or |
1866s | ingresses between controlled networks so |
1869s | basically your guys's machine versus our |
1871s | guys's machine |
1873s | and those we treat radically differently |
1875s | because if we |
1876s | emit an event on the service gateway |
1878s | it's a fact if your client emits an |
1881s | event it needs to be statistically |
1882s | significant |
1884s | and what i mean by this like when we're |
1886s | tracking like how people use certain |
1888s | things within the client like opening or |
1890s | closing windows or whatever the case may be we |
1892s | can't trust any of that data it's coming |
1894s | from an untrusted source and you know |
1898s | clients get modified every now and then |
1900s | it seems so we have to take into account |
1902s | like what is true and what is not so |
1903s | they have to be statistically |
1904s | significant |
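The two trust levels described above can be sketched as follows. This is only an illustration under stated assumptions: "statistically significant" is reduced here to "reported by at least N distinct clients", which is a deliberate simplification, and the `EventIntake` class, method names, and threshold are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


class EventIntake:
    """Treats service-gateway events as facts and client events as weak signals."""

    def __init__(self, min_distinct_clients: int = 3) -> None:
        self.facts: List[dict] = []
        self._client_reports: Dict[str, Set[int]] = defaultdict(set)
        self.min_distinct_clients = min_distinct_clients

    def from_service_gateway(self, event: dict) -> None:
        # East-west traffic from authoritative domains: accepted as-is.
        self.facts.append(event)

    def from_public_gateway(self, client_id: int, event_name: str) -> None:
        # North-south traffic from an untrusted client: only remembered,
        # never acted on individually. Repeats from one client count once.
        self._client_reports[event_name].add(client_id)

    def significant_client_events(self) -> List[str]:
        # Assumed stand-in for a real significance test.
        return sorted(
            name
            for name, clients in self._client_reports.items()
            if len(clients) >= self.min_distinct_clients
        )


intake = EventIntake(min_distinct_clients=3)
intake.from_service_gateway({"type": "ship_destroyed", "ship_id": 42})
for client in (1, 2, 3):
    intake.from_public_gateway(client, "window_opened:market")
for _ in range(100):
    intake.from_public_gateway(99, "window_opened:fitting")  # one noisy client
```

A single modified client spamming an event does not move the needle; only agreement across many clients does.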
1907s | and this is the part where we started |
1909s | talking about internally |
1911s | where ultimately the desktop client |
1913s | isn't the only client and this started |
1916s | opening the door for how we talk about |
1918s | the future of |
1919s | of how eve works and what happens and |
1921s | then what we build |
1923s | um where we started talking about eve |
1924s | portal the websites the third-party apps |
1927s | that you guys are constantly building um |
1929s | all of those pieces it it means that you |
1931s | could play potentially play eve |
1934s | from more than just the desktop client |
1937s | so part of this was proprietary to |
1939s | standards we're talking about things |
1940s | like our original like carbon io that |
1943s | proprietary python |
1946s | protocol going into things like protobuf |
1948s | grpc those kind of things over amqp or |
1950s | google pub sub or nats or whatever the |
1952s | case may be i skipped ahead that's the |
1954s | message bus one |
1956s | but that ecosystem is also the big part |
1958s | of this when we talk about things like |
1960s | what ccp no man was talking about with |
1962s | the air career program right |
1965s | all of those pieces that we're doing |
1966s | there aren't specifically for the air |
1968s | career program right now they are |
1971s | but all of those extra events all of |
1972s | those things that are being tracked |
1973s | those are all pieces that we can reuse |
1975s | within that ecosystem so the more and |
1977s | more pieces that we have this is kind of |
1978s | the original uh |
1980s | ignition of of the activity tracker |
1982s | while the activity tracker didn't act on |
1984s | these things it still tracked all of |
1985s | them and then it gave us all of this |
1987s | extra information on how to react and |
1989s | how to build upon that information |
1991s | that's already been thrown around the |
1993s | the uh the ecosystem |
1995s | and and we prototyped these a while back |
1996s | i think |
1997s | i think there was even a fanfest where |
1999s | we put up arbitrary data or that |
2001s | arbitrary data we put prototype data |
2003s | on a killmail system and everybody lost |
2005s | their minds over logi or logi info on |
2007s | the uh on the killmails um |
2010s | that is also something that we're |
2011s | looking to do and proceed to but like |
2013s | that's part of this evolution and part |
2014s | of the performance pieces that we're |
2016s | talking about here |
2018s | this also gave us a ubiquitous language |
2020s | this was one of the biggest problems |
2022s | that we had internally |
2023s | you could build a service or any of |
2025s | those pieces and go to a separate team |
2027s | and then go look at that and go i can't |
2028s | use that when they really could but |
2030s | there was no ubiquitous language to |
2031s | communicate that so protobuf gives us |
2034s | that ubiquitous language in the entire |
2035s | ecosystem where we can go and say hey |
2037s | i'm going to make a call here and that |
2039s | service doesn't care who it is or what |
2041s | it's for it doesn't have to care about |
2043s | something inside of that python module |
2045s | mutating it to something else or |
2047s | changing something that shouldn't need |
2048s | to or somebody else deploys a different |
2049s | version of that our teams are now |
2051s | building around the concept of you own |
2053s | this api you need to keep this api |
2055s | working and if we want to change that |
2057s | that's a conversation around the actual |
2059s | language and the domain which then gets |
2061s | us to our domain services |
2063s | which is the piece more around |
2065s | what do you own |
2067s | what do you iterate on those kind of |
2068s | things an example of this is skill plans |
2070s | again |
2071s | where |
2073s | we were talking about modifying how |
2076s | skills work where it's no longer a |
2078s | queue you're dumping skill points into |
2080s | uh like you're accruing skill points and |
2083s | then you do with those what you want you |
2084s | don't have to actually plan that out |
2086s | and the evolution of skill plans might |
2088s | be |
2089s | that it just becomes the domain service |
2090s | for skills that might be the natural |
2092s | evolution of that |
2094s | we've yet to see that because we're |
2095s | still learning these pieces and again |
2097s | these are the services that are still |
2099s | kind of the first ones of their kind |
2101s | inside of quasar |
2104s | so ultimately kind of what did we learn |
2105s | from this |
2106s | it gets into the micro versus domain |
2108s | um |
2109s | and this kind of gets into |
2111s | the delineation be |
2113s | because like |
2115s | the the biggest problem like if you |
2117s | think about it |
2118s | abstractly when you have a game engine |
2120s | involved in anything |
2122s | that is instantly a monolith the client |
2123s | for that for that game is a monolith |
2125s | there's not much you can do around that |
2127s | there's a lot of talks around that you |
2129s | hear about micro uis or micro front ends |
2132s | or those kind of things that might be |
2133s | the next evolution that we'll see |
2136s | but this is basically the difference |
2137s | between what we like we couldn't use any |
2139s | of these technologies in eve because |
2141s | all of those things were detached like |
2142s | if you use spotify the little bar at the |
2144s | bottom of it was its own |
2146s | http call that went to a separate |
2147s | service whereas in eve that connects to |
2150s | the proxy which goes to the sol nodes |
2151s | routes over information goes to the same |
2152s | database and comes back through |
2154s | everything was connected right and so |
2156s | that's the big difference for us and we |
2157s | want to concentrate on the domains uh |
2159s | not the individual mechanisms |
2162s | and then learning the difference between |
2164s | a message bus and a service mesh |
2166s | kind of getting to the nuances of |
2168s | dealing with connectivity ingress how |
2170s | players connected and kind of getting |
2173s | that off the table so our devs could |
2175s | concentrate on other things |
2176s | and then getting to api |
2178s | representing the team boundary |
2180s | not like i you know we kind of have |
2182s | evolved from this uh building features |
2185s | in the sense of i need to build all of |
2187s | these pieces because i need all the |
2188s | pieces of this feature in order to make |
2189s | this thing which the side effect of |
2191s | that over time is you have a lot of |
2193s | things that are very similar and you |
2195s | don't evolve the existing ones |
2197s | as opposed to we own the api for |
2199s | characters do you need more data need to |
2201s | change the way that something works then |
2203s | there's a team that can have a |
2204s | conversation with that and usually |
2205s | that's over a pr over protobuf |
2208s | and ultimately |
2210s | new technology is easy culture is not |
2212s | and i and i say that it's a relative |
2214s | statement like there's a lot of complex |
2216s | things that we're doing with technology |
2218s | but the thing that surprised us the most |
2220s | was kind of people's reaction to that |
2221s | new technology |
2223s | some people jumped right in other people |
2224s | it kind of reflected some deficiencies |
2226s | that we had and kind of the processes |
2228s | that we were doing were again going back |
2230s | to automated testing where we were |
2231s | pulling people into the spotlight of |
2233s | like cool where's your test and they're |
2234s | going i don't have any you can't you |
2236s | have to add those things in this |
2238s | ecosystem and so evolving that culture |
2241s | to understand like what the progress of |
2242s | those types of things would be |
2245s | and this is the question i've gotten on |
2247s | different podcasts and streams that i've |
2248s | talked on |
2250s | why why concentrate on this why not |
2252s | build more features when i do all of |
2253s | this like |
2255s | this is a holistic approach to how we |
2258s | need to fundamentally fix a lot of |
2260s | different things in the ecosystem over |
2262s | time over 20 years of teams isolating |
2265s | and what the features that they only |
2266s | need to build |
2267s | and kind of the the turbulence and |
2269s | natural ups and downs of a company and |
2272s | people and people's lives in real life |
2274s | and those kind of things |
2275s | ultimately we need to fundamentally |
2277s | change |
2278s | how we're working and in order to do |
2280s | that we need to change the technology of |
2282s | what we're actually building upon |
2284s | because if we need to fundamentally we |
2286s | need to fundamentally change how eve |
2287s | works and we can't do that unless we |
2289s | change how we work |
2291s | um and so quasar is kind of the |
2292s | fundamental stepping stone uh that we're |
2295s | using to build more and maintain more of |
2297s | the the eve universe |
2300s | the end |
2301s | thanks |
2302s | [Applause] |
2313s | do we do questions here |
2317s | ah |
2319s | i couldn't tell if that was somebody who |
2320s | had the authority to say that or not |
2326s | yeah go ahead the old |
2329s | guys daddy |
2331s | [Laughter] |
2340s | ah |
2341s | yeah |
2343s | yeah so the question is like nadia the |
2344s | new graphical editor |
2346s | uh that's being used to build a lot of the |
2348s | content is it using quasar it is not |
2350s | specifically using quasar because the |
2351s | majority of that is client-side |
2353s | experience mechanisms that's going on |
2355s | but it is hooked into a lot of the event |
2357s | loops that are flying into quasar so we |
2360s | can observe a lot of what's happening |
2362s | there and so as that team does more and |
2364s | more that's kind of outside a |
2366s | unique uh experience for a single player |
2370s | because that's ultimately like so far |
2372s | the npe is that um once it starts |
2375s | kind of going outside of that scope it |
2377s | will probably wander more into our |
2379s | territory as far as what we need to |
2380s | support |
2384s | you talked about uh |
2396s | i wouldn't say blameless |
2398s | no so i mean we so we try to do we try |
2401s | to do retros for that and i would argue |
2403s | that a lot of the the team that works on |
2404s | quasar and the infrastructure teams in |
2406s | general um |
2408s | there are |
2409s | elements of sre there |
2412s | where |
2413s | so we do we do rotations on call |
2415s | rotations but we kind of combine that |
2417s | with like if you're on call |
2419s | i'm not gonna care if you don't get your |
2421s | primary project done i want you to |
2422s | concentrate on like answering people's |
2424s | questions of course if there's alerts |
2425s | something melts down |
2426s | all those kind of things but if |
2428s | everything is quiet it's kind of one of |
2430s | those things of what's making the most |
2431s | noise make it stop making noise |
2433s | so we kind of have that sre mentality in |
2435s | that sense |
2437s | the other aspect of that is we're big |
2438s | fans of slos |
2440s | um and trying to keep track of those |
2442s | things and seeing things before they |
2444s | catch fire like being able to see the |
2445s | smoke before the fire is pretty powerful |
2448s | and that has to do with a lot of the |
2449s | tooling that we've introduced to the |
2450s | ecosystem not only just quasar but the |
2452s | the original code base of eve online |
2453s | with things like |
2455s | sentry honeycomb grafana prometheus |
2458s | there's tons of stuff like that yeah |
2461s | so the |
2463s | simulation obviously still needs to run |
2471s | no thankfully not |
2473s | i checked |
2475s | so |
2477s | what like what |
2479s | is plan in terms of |
2481s | splitting out more and more of those |
2483s | non-simulation |
2485s | services into those domain services |
2487s | right so the question is is what is kind |
2489s | of the forward plan of simulation |
2493s | based services versus |
2495s | non-simulation-based services ultimately |
2498s | this was the original idea |
2501s | people don't necessarily agree with me |
2502s | on this but i don't |
2504s | usually call eve complex it's very dense |
2507s | there's just a lot there |
2508s | like if you look at email it's not |
2510s | complex |
2511s | but in the grand scheme of things it's |
2513s | it there's just a lot more going on what |
2515s | it can interact with and those types of |
2516s | things |
2517s | so the general idea was that we clear |
2519s | the table |
2520s | of all of these services so that people |
2522s | could think a bit bigger about what they |
2524s | could actually build and even that is |
2525s | kind of the core of what our team is for |
2527s | we're we're supposed to be a force |
2528s | multiplier for the developers and the |
2531s | more specifically the feature teams that |
2532s | are building all the stuff that you guys |
2534s | actually do on a daily basis like this |
2536s | is why i say cosmic plumber if you know |
2538s | that it's a problem then you don't ever |
2540s | think about the plumbing in your house |
2541s | until it's busted right |
2542s | um |
2544s | with the simulation pieces |
2546s | we have some theories about that that |
2547s | we're very interested in testing |
2549s | because when we talk about the |
2550s | performance piece of quasar that's kind |
2552s | of one of the one of the big pieces |
2553s | about grpc |
2555s | when we look at fleet fights |
2556s | like we're estimating around 30 percent |
2559s | of the performance there is spent on |
2561s | multiplexing serialization and |
2563s | transmission well iocp because it defers |
2566s | it to the kernel gets rid of a bit of |
2568s | that but it doesn't because it still |
2569s | needs to interact with the socket |
2571s | serialization is still in python which |
2573s | is very slow |
2575s | and multiplexing meaning |
2577s | 7000 people in a system one person goes |
2579s | bang and there's 6999 other messages |
2582s | that we need to send |
2583s | what we've done with quasar with grpc |
2586s | mechanisms in the server is that it's |
2588s | offloaded to a separate thread so |
2589s | basically because of the python to |
2593s | c++ mechanism in protobuf that |
2595s | just comes stock with it we just have to |
2597s | marshal |
2598s | memory over and then we have a separate |
2600s | thread |
2601s | crazy eve has a separate thread to do |
2604s | something where it's actually doing the |
2606s | serialization and the transmission so we |
2608s | get it for free |
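The fan-out and offload described above can be sketched like this. It is a rough illustration, not CCP's code: `json.dumps` stands in for native protobuf serialization, and the `OffloadedSender` class and `fan_out_recipients` helper are hypothetical. Note that in pure Python the worker thread still shares the interpreter lock with the main loop; the benefit in the talk comes from the serialization running in native code outside Python.

```python
import json
import queue
import threading
from typing import Iterable, List, Tuple


def fan_out_recipients(pilots_in_system: Iterable[int], source: int) -> List[int]:
    """Everyone in the system except the pilot who caused the event."""
    return [pilot for pilot in pilots_in_system if pilot != source]


class OffloadedSender:
    """Main loop only enqueues; a worker thread serializes and 'transmits'."""

    def __init__(self) -> None:
        self._outbox: "queue.Queue" = queue.Queue()
        self.sent: List[Tuple[int, bytes]] = []  # stand-in for the network
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, recipient: int, message: dict) -> None:
        # Cheap hand-off: the caller goes straight back to simulation work.
        self._outbox.put((recipient, message))

    def _run(self) -> None:
        while True:
            item = self._outbox.get()
            if item is None:  # sentinel: stop the worker
                break
            recipient, message = item
            payload = json.dumps(message, sort_keys=True).encode()
            self.sent.append((recipient, payload))

    def close(self) -> None:
        """Flush everything already queued, then stop the worker."""
        self._outbox.put(None)
        self._worker.join()


sender = OffloadedSender()
for pilot in fan_out_recipients(range(7000), source=0):
    sender.enqueue(pilot, {"event": "bang", "source": 0})
sender.close()
```

One shot in a 7,000-pilot system produces 6,999 outgoing messages, which is why moving serialization and transmission off the main loop matters so much in fleet fights.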
2611s | and then we lean heavily on the message |
2613s | bus ecosystem which is where the dynamic |
2616s | mechanisms come in and there's a bigger |
2617s | conversation that we could have around |
2618s | like |
2619s | you'd be surprised the the features |
2622s | that require the most complex routing |
2624s | mechanisms one of them that highlights |
2626s | this is shared bookmarks |
2628s | from a routing perspective that becomes |
2629s | a nightmare real fast |
2631s | and it's one of the few things that are |
2633s | actually implemented on the proxy side |
2634s | because it needs all of that information |
2637s | i don't answer your question |
2641s | i think more of it |
2645s | is that like would it make sense for a |
2647s | team to say you know what i want to get |
2648s | |
2653s | right |
2654s | so would we would we preemptively move |
2656s | things over into quasar um yes if |
2659s | there's a vehicle for it we're just not |
2662s | there to go and like we're gonna |
2663s | refactor everything |
2665s | no one's gonna sign up for that right um |
2669s | so it comes with a vehicle like what are |
2671s | we doing like for example a lot of the |
2673s | work that we've done under the hood for |
2675s | chat to get it out of xmpp in the |
2677s | current state that it's in |
2679s | is that behind the scenes we've had to |
2680s | build a presence service |
2683s | which has to know where people are in |
2685s | eve at all times authoritatively which |
2688s | hilariously we found is very difficult |
2691s | um so but we need something to motivate |
2693s | those types of changes then we'll go |
2694s | back into them like if if the if the |
2696s | skills revamp starts touching on things |
2698s | like characters and there's enough |
2700s | traffic there we might want to pull |
2701s | characters into a service inside of |
2702s | quasar but that would be |
2705s | significant open-heart surgery right |
2707s | so what you're saying is |
2710s | our services |
2714s | yes go that way |
2719s | uh i saw a talk a few years ago |
2722s | similarly titled |
2723s | and uh |
2724s | the speaker talked about adopting |
2727s | my memory |
2730s | yeah was it did it have an orange bus |
2732s | icon in this in the |
2734s | yeah that was being tuxford in vegas |
2749s | oh i could ramble about this for a while |
2750s | but the short version of that is |
2751s | basically beam is kube before kube was a |
2754s | thing |
2755s | um and the only difference really is |
2757s | kind of the api |
2759s | and this is kind of the trend that i'm |
2761s | seeing in technology in general is that |
2763s | the implementation doesn't really matter |
2764s | it's the apis that matter so prometheus |
2766s | for example everyone loves the apis for |
2768s | prometheus and how to aggregate data and |
2770s | how to transmit remote data those kind |
2772s | of things everyone also hates the |
2774s | implementation of prometheus because it |
2776s | eats all the ram and most people don't |
2778s | take into consideration cardinality and |
2780s | those types of things so for things like |
2781s | erlang elixir and beam like that whole |
2783s | ecosystem is actually quite amazing but |
2785s | it's not compatible with anything |
2788s | current in that sense and it also |
2790s | doesn't provide a good |
2792s | external control plane with kind of what |
2794s | the rest of the world is used to i think |
2795s | that's the big difference like we tried |
2797s | running you know |
2799s | uh beam inside of kube but doesn't make |
2800s | sense because beam wants to own the |
2802s | hardware and then it clusters itself and |
2804s | all those nice things but that it |
2805s | requires your entire ecosystem to be |
2807s | inside of erlang or elixir and that's |
2809s | kind of where the success of kube came |
2810s | from because it gave it gave everyone |
2812s | primitives to to have that ubiquitous |
2814s | language to have a conversation across |
2816s | basically the entire globe that's why it |
2818s | caught |
2829s | fire one system and i know there's been |
2832s | significant hardware upgrades |
2836s | so much as the technology needs to |
2838s | develop what have you guys done or what |
2841s | you guys think needs to be done to |
2849s | so how do |
2850s | i'm oversimplifying but how do we make |
2852s | things go faster with quasar |
2854s | um |
2854s | ultimately this kind of goes back to |
2856s | what we were talking about earlier when we were |
2857s | talking about the the effects of |
2860s | transmission serialization multiplexing |
2861s | those types of things um |
2864s | what i'm trying to do and we're toying |
2866s | with and then playing with the idea of |
2868s | is |
2869s | sending simulation frames over quasar |
2871s | because we know one that's already |
2873s | significantly faster just over the wire |
2876s | it's significantly faster |
2879s | theoretically we know we can then free |
2881s | up 30 percent of the processing time during a |
2884s | massive fleet fight that that is our |
2885s | upper bounds of what we could |
2886s | potentially bring to the table but that |
2888s | is a |
2889s | non-trivial project |
2891s | and literally reassembling the train as |
2893s | it's going down the tracks um so |
2896s | we haven't engaged in any of this yet |
2898s | and again this comes with the clearing |
2899s | the table concept of like we'll keep |
2901s | moving things off the table which |
2903s | in effect will give us some certain |
2905s | percentage of there's less things that |
2906s | this needs to do |
2908s | but in the grand scheme of things you at |
2909s | the end of the day you still wind up |
2911s | with a node dedicated to jita |
2913s | and and it doesn't matter how many |
2915s | services we take away at that point |
2916s | there's still a node dedicated to jita |
2918s | even if it's only for the uh simulation |
2921s | aspect of it |
2923s | um so that's kind of why we're toying |
2924s | with the ideas behind like we could send |
2926s | simulation frames over this and get a 30 percent |
2929s | bump in in how we're doing things and |
2931s | there might be some other things in |
2932s | there like the things that we've talked |
2933s | about in the past and this is all |
2934s | theoretical |
2936s | eve is 100 percent accurate |
2938s | but it doesn't necessarily need to be |
2939s | and i know that's a terrifying statement |
2941s | um |
2942s | because when you have seven thousand |
2944s | people shooting that one guy at some |
2945s | point time you gotta go he's dead like |
2947s | stop |
2948s | stop counting the bullets he's gone |
2951s | uh but eve keeps going yep still dead |
2954s | still still dead |
2955s | um so there's there's maybe some other |
2957s | like philosophical things that we could |
2958s | take on like how we deal with the rules |
2960s | engine and the simulation in that regard |
2962s | but this is all theory crafting because |
2964s | again it comes back to |
2966s | the the vehicle that we have to move |
2967s | forward with those things but i'm i am |
2970s | personally chomping at the bit to find |
2971s | something to hook that to so we can |
2973s | toy with that idea |
2974s | it might the first iteration of that |
2976s | might be something like we don't send |
2979s | like the data that comes in for the gate |
2980s | holograms |
2981s | like the state on the other side of the |
2983s | gate |
2984s | we might toy with the idea of routing |
2986s | that over quasar like start simple there |
2988s | instead kind of again the clear the |
2990s | table mentality of like how far deeply |
2992s | can we go into that that space |
3006s | right so what domain services have been |
3008s | put into quasar um |
3009s | so |
3010s | there is a chat service that we haven't |
3011s | rolled out yet it's kind of been a |
3013s | shadow service |
3014s | some players have already found that |
3017s | um |
3018s | and skill plans is another one activity |
3021s | tracker was the original one but |
3022s | activity tracker is not necessarily a |
3024s | 100% quasar service in the sense that |
3026s | uh it smuggles data through the original |
3029s | carbonio connections uh because it was |
3031s | built before we had the connectivity to |
3033s | the client |
3034s | so we're like oh we can consume and |
3035s | track all these events that are coming |
3036s | in but we can't tell anybody so we just |
3039s | sent it back through the server itself |
3040s | down to the client that's something that |
3041s | we could uh probably renovate or that |
3043s | will come with the changes that we're |
3045s | doing for the air career program um the |
3047s | air career program will be another one |
3049s | that's 100% uh quasar uh the one we were |
3051s | talking about earlier which isn't really |
3052s | player-facing but the |
3054s | um |
3056s | presence management which we normally |
3058s | were doing in xmpp |
3060s | which |
3062s | fun fact |
3063s | 90% of the traffic in xmpp for us is |
3065s | presence not chat |
3067s | it's just telling everyone where they |
3069s | are that's that's the biggest |
3070s | multiplexing problem that we have |
3072s | um there's probably some other ones i'm |
3074s | forgetting but those are the |
3076s | i think uh |
3079s | data |
3081s | oh yeah for like the data pipelines for |
3084s | uh for data and analytics um |
3086s | do we still do the recommendation stuff |
3089s | yeah |
3090s | the recommendation like so the |
3091s | recommendations that you get the three |
3093s | recommendations that you get if you get |
3094s | into is that still feature flagged i |
3096s | can't remember |
3098s | it's for everybody so yeah so those |
3099s | three recommendations that come in |
3100s | that's actually closing the loop from |
3102s | the client to quasar to |
3105s | the data cube if you will warehouse lake |
3108s | i don't know any of the data terms um |
3110s | and then that's coming back through |
3111s | quasar to the client saying hey you want |
3112s | to do one of these three things based on |
3114s | what you've been doing in eve |
3116s | uh i think that's it you know my guys in |
3119s | here that i can |
3120s | yeah we'll just stop there |
3122s | last question |
3124s | that guys |
3149s | right |
3150s | so the question is like if we have all |
3152s | these things emitting events and parts |
3153s | of them come down and go back up how do |
3155s | we deal with integrity in that regard so |
3158s | there are massive papers that you can |
3160s | read on that that are really boring but |
3161s | event sourcing is the answer to that |
3163s | question |
3164s | um |
3165s | a lot of how we deal with that is mostly |
3168s | a little bit more than best effort |
3170s | delivery and what i mean by that is best |
3172s | effort is usually like it's on the |
3173s | socket good luck um |
3175s | so we also have a little bit more than |
3177s | that where we do a lot of disk queuing |
3178s | and mechanisms for like publisher |
3180s | confirms with rabbitmq so we basically |
3182s | say hey rabbitmq send this to people and |
3184s | tell me when the first guy got it and if |
3187s | that doesn't happen it goes to disk and |
3188s | we retry so this usually manifests in |
3191s | that fail state as a thundering herd |
3193s | or a stampede basically um where i think |
3197s | we've talked about this on twitter |
3198s | during certain interesting situations |
3200s | where it's like yep we're now draining |
3201s | 50 million events because something fell |
3204s | over |
3205s | but that's the big difference between |
3206s | like i was saying earlier with uh |
3208s | the events in the universe that are |
3210s | facts |
3211s | those are the ones that we treat uh |
3214s | with more respect i guess if you will uh |
3216s | those are the ones that we trust if that |
3218s | thing comes through it's true whereas |
3219s | the events that are coming from the |
3220s | client you have to be statistically |
3222s | significant because if it falls over we |
3224s | don't care |
3227s | they do that's true yes yes they do yeah |
3234s | indeed slo is service level objective uh |
3237s | there's also slis which are indicators |
3240s | and these are different from alerts in |
3241s | general this gets into sre stuff but |
3244s | ultimately it's more about |
3246s | knowing that your system is trending |
3247s | poorly versus something terrible has |
3249s | already happened |
3251s | yeah i think you said that was the last |
3252s | one all right thanks guys appreciate it |
3254s | we also have a roundtable after this |