CCP_DeNormalized



10 Mar

Comment

I’m told VXRail was looked into some years ago. At the time the idea of having to buy entire compute+storage nodes to expand didn’t make sense for our needs.

It would be very cool to see TQDB as a virtual SQL cluster on one of these though - 4 TB of memory and 8 TB of storage - spread over multiple compute nodes.

Comment

Love and hate is right :slight_smile:

A few years ago, just months before our super devs fixed our NUMA issues for good (no jinx), we were evaluating new DB boxes and had 2 single-socket servers in the lineup for testing.

An Intel 24/48 core box and an EPYC 64/128 core box. Due to whatever was going on within the code base at the time, we could not get the cluster to start with the single NUMA node. Blocking chains would tie up all the worker threads, max out the CPU, and threadpool waits would starve the simulation out - sols unable to heartbeat. We so hated it…

The EPYC on the other hand made short work of it - was it the fact that it had 128 cores vs. the 48 on the Intel box? Or was it due to Windows being unable to address all cores in a single NUMA node and instead splitting them into 2 NUMA nodes?

Numa to the ...

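For readers wondering about that split: Windows caps a processor group at 64 logical processors, so a 128-thread chip gets presented as two groups (which SQL Server then treats much like two NUMA nodes), while a 48-thread chip fits in one. A minimal sketch of the arithmetic (illustrative only - `processor_groups` is a made-up helper, not a Windows API call):

```python
# Sketch of how Windows splits logical processors into processor groups.
# Assumption for illustration: the documented 64-logical-processor cap
# per group; everything else here is invented for the example.
import math

MAX_GROUP_SIZE = 64  # Windows limit on logical processors per group

def processor_groups(logical_cores: int) -> int:
    """Number of processor groups needed for this logical core count."""
    return math.ceil(logical_cores / MAX_GROUP_SIZE)

print(processor_groups(48))   # Intel 24-core/48-thread box -> 1 group
print(processor_groups(128))  # EPYC 64-core/128-thread box -> 2 groups
```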

09 Mar

Comment

I mix up terms sometimes, so it depends on who you are talking to and the context :slight_smile:

SOL Server would typically be either the physical or virtual server that our code runs on. A sol node would be one instance of the application code running on the actual SOL Server.

Comment

Typically we use SOL as an interchangeable term for solar system node - a Windows-based server that can run one or more in-game solar systems/services.

For TQ, all sols are physical servers; most test servers run sols as virtual machines.

Comment

We’re planning a SQL only blog for a later date where we can share more info on the various configs we have going on.

The param sniffing stuff - we haven't gotten deep into that yet, but that's basically where the DB team is at now: SQL 2022 testing and excitement. It's got a ton of great features that'll be useful not only to us DBAs but to our data engineering team as well.
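For context on the sniffing issue mentioned above: the classic behavior is that the plan compiled for the first parameter value gets cached and reused for every later value, even when a different plan would suit. A toy model of that (pure illustration in Python - none of these names are SQL Server internals):

```python
# Toy model of parameter sniffing. Assumption for illustration: one
# cached plan per procedure, chosen by a made-up optimizer rule.
def choose_plan(rows_matching: int) -> str:
    # Hypothetical rule: few matching rows -> index seek, many -> scan.
    return "index_seek" if rows_matching < 1000 else "table_scan"

plan_cache: dict[str, str] = {}

def run_query(proc_name: str, rows_matching: int) -> str:
    # The plan is "sniffed" from the first execution's parameter and
    # reused for all later calls, whatever their parameters look like.
    if proc_name not in plan_cache:
        plan_cache[proc_name] = choose_plan(rows_matching)
    return plan_cache[proc_name]

print(run_query("GetOrders", 10))       # compiles: index_seek
print(run_query("GetOrders", 500000))   # reuses index_seek - poor fit here
```

SQL Server 2022's Parameter Sensitive Plan optimization targets exactly this pattern by allowing multiple plan variants for one cached query.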


08 Mar

Comment

My current negotiations with Ops have us at a new DB VM with 512 GB of RAM to start with. We’ll see how it goes from there :slight_smile:

They don’t generally like it when I start a convo haha

We do have several virtual DBs in production though, and we’re always looking to migrate away from bare metal where it makes sense (one of the new VMs is a remote AlwaysOn read replica).

Comment

We have a few test servers running Windows/SQL 2022 - all running fine at the moment, but I do recall hearing about some issues with a Windows patch causing reboot loops.


20 Apr

Comment

Hmm, looking back at my notes I don’t think we actually managed a full start!

We knew we could (we thought we could) if we disabled ESI to allow for a softer startup, but I don’t think we went forward with that test.

* Sept 28, 2020 - TQ failover testing on Intel28 - no luck - still fails after manual soft NUMA
* Sept 23, 2020 - TQ failover to Intel28 - failure

Comment

I’ve not done any real in-depth checking into which is more stable.

They both seem fairly reliable though, and I’d guess instability comes more from us changing things and bugs being fixed/introduced than actual underlying platform issues.

It’s difficult to baseline and compare things over time when the code base changes daily

Comment

We’re indeed heavily invested in stored procs - not to mention a few Enterprise edition features of MSSQL - in particular table partitions for swapping out entire blocks of data for deletes.

Dev teams have been using other database tech for their features for some years now, however, so we’re far from ONLY using MS SQL.

There’s at least Cosmos DB/Postgres and a few other cloud-based managed DB services being used.

Horses for courses and such
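For anyone curious why the partition trick mentioned above makes bulk deletes cheap: switching a partition out to a staging table is a metadata-only operation, so no rows are rewritten. A conceptual sketch (Python standing in for T-SQL; the table layout and names are invented):

```python
# Conceptual sketch of partition switching. Assumption: a table modeled
# as partitions keyed by date range - this is an analogy, not MSSQL code.
table = {
    "2023_Q1": ["row"] * 1_000_000,
    "2023_Q2": ["row"] * 1_000_000,
}
staging: dict[str, list] = {}

def switch_partition_out(partition_key: str) -> None:
    # Analogous to ALTER TABLE ... SWITCH PARTITION: the whole data block
    # moves to a staging table by reference; no rows are touched.
    staging[partition_key] = table.pop(partition_key)

switch_partition_out("2023_Q1")
print(len(table))               # 1 partition left in the main table
print(len(staging["2023_Q1"]))  # all 1,000,000 rows moved by reference
```

In real T-SQL this corresponds to `ALTER TABLE ... SWITCH PARTITION ... TO ...` followed by truncating or dropping the staging table, instead of a row-by-row `DELETE`.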

Comment

Our next blog will focus on the software side, but you are right, it is Windows 2019 and it’s Standard.

I don’t think the DC version gives us anything unless it’s being used as a hypervisor host.

Comment

Next blog will focus on the software side of the database config with all that fun stuff.


19 Apr

Comment

Peak is around 15,000, while our average is around 8,000.

Comment

We had another AMD dual-socket 8-core CPU box in to play with, but it ended up having weird HBA issues so we had to rule it out.

There was also a single-socket 28-core Intel box we tested - at the time EVE’s code base overran it mercilessly on cold starts. We could barely get the cluster started :slight_smile:


28 Sep

Comment

Hi all, last week and today we were testing a new Intel 28 core single socket server for the DB. It didn’t go well last week and today’s test was no better sadly. We’ve likely ruled this box out for now.

For the last few months we have been running on a monster AMD server with a single socket and 128 logical cores (EPYC 7742) - and we’re finalizing our tests of other hardware. Looking to be all done this week!

For a very brief time last week, the TQDB cluster was made up of 5 servers with a combined resource total of:

280 logical CPU cores
8 TB of memory

:slight_smile:


29 Jun

Comment

Most of the DB is on SSD already, but we have some secondary files with low frequency data on a pool of spindle disks (100+ disks). More fun news on that to come as well!


27 Jun

Comment

We are planning on upgrading soon (tm! ha!) - and you are right, this is one of several DB boxes we’ll be evaluating to see which direction we should go.

Do we want hundreds of low-clock-speed cores, or mere dozens of high-clock-speed cores?

And speaking of speed… today was our first automated DT (daily downtime) since putting this AMD box into production midweek, and not surprisingly, it is our fastest DT in over 4 years!


24 Jun

Comment

We’ll do a devblog at a minimum and have had other people mention a stream… will look into it further!

CCP RAM and I did a presentation a few years ago at fanfest - it’s a mix of how we came to CCP and daily Operational tasks/tech talk at the end: https://www.youtube.com/watch?v=w8rGZCj6rgQ

Comment

We’ve dropped from 30% CPU to well under 10% - at this time of day things are typically calm. CPU is rarely balanced across cores due to the nature of how SQL works - it could be using a single core for a call, or a parallel plan that would hit multiple cores.

I can say that we’re seeing much more balanced NUMA nodes (groups of cores) which is one of the major things we were after here compared to our old setup


Comment

This is actually just a proof of concept… it’s a loaner box to see how we like it and how the AMD architecture will work for EVE (I love our vendors!). But I can’t imagine we would ever fill the other socket, at least not with another CPU like this.