
Three Best Practices for Building a High-Performance Graph Database

CrowdStrike® employees like to say that there is big data, huge data and our data. To date, we have collected, analyzed and stored more than 15 petabytes of data, generated through hundreds of billions of daily security events. At the center of this massive data repository is CrowdStrike Threat Graph®, our custom-built, cutting-edge security analytics tool that collects high-fidelity telemetry from millions of endpoints around the globe and indexes it for quick and efficient access. Threat Graph currently manages approximately 2 trillion vertices, a truly staggering number.

When creating CrowdStrike Threat Graph, the sheer volume of data wasn’t our only concern. Unlike many organizations, our data flow is not tied to predictable behavior patterns or the normal rhythm of business hours. As a result, our system needed to anticipate and absorb unexpected bursts of traffic in a very short period of time. Compounding matters was the fact that no existing tool on the market could perform at the level we needed — which suggested that what we were trying to achieve was at best unprecedented, and at worst, impossible.  

Spoiler: It was possible. In this article, we discuss how Threat Graph was designed to operate with tremendous speed, flexibility and agility, and we share three best practices for engineers who may be tasked with building something similar.

1. Overcome Limitations Through Innovation

As part of our evaluation process, we tested multiple off-the-shelf graph database solutions, all of which failed to live up to our team’s needs. We also consulted with many of the developers of these graph databases, trying to determine if there was a way to adapt them to deliver at our scale. This process did not identify a solution, but it did uncover a winning approach — to identify key elements of existing tools that, when taken together, could create the next-generation graph database we needed.

For example, one of the concepts that we adopted was an “append only” approach to data collection. As the name implies, this means that records are never updated — only added. Rather than incur the higher cost of a typical read-modify-write operation, we instead decided that any modifications to a record would create a new record. This property helped reduce our database’s overall latency and consequently increased our throughput.
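
To make the idea concrete, here is a minimal, hypothetical sketch of append-only mutation in Python. The record layout and function names are illustrative, not CrowdStrike's actual implementation:

```python
import time

# Append-only store: a "mutation" never rewrites an existing record;
# it simply appends a new, timestamped one. (Illustrative names only.)
log = []

def mutate(vertex_id, properties):
    """Record a change as a brand-new row instead of a read-modify-write."""
    log.append({
        "vertex_id": vertex_id,
        "ts": time.time(),
        "props": properties,
    })

def current_state(vertex_id):
    """Reconstruct the latest state by folding newer rows over older ones."""
    state = {}
    for row in sorted((r for r in log if r["vertex_id"] == vertex_id),
                      key=lambda r: r["ts"]):
        state.update(row["props"])  # last write wins per property
    return state

mutate("proc:1A3dffeb", {"name": "cmd.exe"})
mutate("proc:1A3dffeb", {"parent": "explorer.exe"})
state = current_state("proc:1A3dffeb")  # both properties survive
```

The write path never touches existing rows, which is what keeps write latency low and predictable; the merge cost is deferred to the (less frequent, more controllable) read path.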

For our purposes, this approach made sense because we needed a system that could support large bursts of writes. Meanwhile, our rate of read requests was relatively low and much more controllable. We place tremendous value on identifying malicious behavior quickly, but we need to have the data in order to make this determination. As such, we design systems like Threat Graph to favor collection and storage, equipping the necessary “dials” to keep the higher-cost analysis of this data at a predictable rate. In building Threat Graph this way, we identified a valuable trade-off — one that aligns the priorities of the technology with the business strategy.

2. Complexity Is the Enemy of Scale

At the outset of this project, the main issue we needed to address was managing an extremely large volume of data with a highly unpredictable write rate. At the time, we needed to analyze a few million events per day — a number that we knew would grow and is now in the hundreds of billions.

The project was daunting, which is why we decided to step back and think not about how to scale, but how to simplify. We determined that by creating a data schema that was extraordinarily simple, we would be able to create a strong and versatile platform from which to build. So our team focused on iterating and refining until we got the architecture down to something that was simple enough to scale almost endlessly.

In programming, one of the classic ways of representing a graph-like structure is a simple adjacency list, and that is essentially how we designed the schema for Threat Graph.

Figure 1. Example schema

Figure 1 shows a representation of what a simplified Threat Graph schema could look like. One of the first things to notice is the use of a single table to represent both vertices and edges. This makes it possible to retrieve all data about a desired vertex, including vertex details and edge relationships, by executing exactly one query against exactly one table. Furthermore, that query can perform a sequential read, because the data is indexed contiguously based on the common vertex ID shared by vertex details and edges. This is possible because of the presence of a column (Type in this example) that allows us to distinguish what each row of data represents.
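
As a rough illustration, the single-table layout can be sketched in a few lines of Python. The column positions and row values below are hypothetical examples modeled on Figure 1, not the actual Threat Graph schema:

```python
# One table holds both vertices and edges. Each row is
# (vertex_id, type, adjacent_id, timestamp, data); vertex rows leave
# adjacent_id empty, while edge rows fill it in. (Hypothetical data.)
table = [
    ("user:jsmith", "UserVertex",      None,           "2019-03-31T07:13:42", {"name": "J. Smith"}),
    ("user:jsmith", "UserHostEdge",    "host:SDC-1",   "2019-03-31T07:13:42", {}),
    ("user:jsmith", "UserProcessEdge", "proc:1A3dffeb", "2019-03-31T07:14:02", {}),
]

def read_vertex(vertex_id):
    """One sequential scan of one table returns details AND edges,
    because rows sharing a vertex_id are stored contiguously."""
    return [row for row in table if row[0] == vertex_id]

rows = read_vertex("user:jsmith")
details = [r for r in rows if r[1].endswith("Vertex")]  # the vertex itself
edges = [r for r in rows if r[1].endswith("Edge")]      # its relationships
```

Because every row for a vertex shares the same leading key, a real index keeps them physically adjacent and the "scan" becomes one contiguous range read rather than a filter over the whole table.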

The highlighting of the rows represents a few key concepts of what this schema is able to achieve:

  • Yellow rows highlight the “append-only” nature of this data model. Note that two rows have the same Vertex ID and are both of Type ProcessVertex, and that the timestamps differ between the two rows. Imagine that we receive one request to mutate data about proc:1A3dffeb at 2019-03-31T07:13:42, which is the first record we’ve ever received about this process vertex. Later, we receive another request to mutate the data about proc:1A3dffeb. In this scenario, rather than overwrite the first row, we instead append a new row. If someone wants to know the current state of this vertex, we need to read both rows of Type ProcessVertex and have a strategy for handling collisions when two rows both want to mutate the same property.

  • Cyan rows represent how we store all of the data for a vertex, including its details and edge relationships, clustered together so they’re available to read efficiently. The UserVertex row with a Vertex ID of user:jsmith represents the detail data for the vertex itself and would store this detail in the Data column. Immediately under it are a UserHostEdge and a UserProcessEdge that share the same Vertex ID user:jsmith. Notice, however, that unlike the vertex rows, the rows that represent edges have a value in the Adjacent ID column. The adjacency column is what makes this an adjacency list, and it is what enables us to represent relationships between entities. Another cool callout is that the edges can also have their own data present in the Data column, which can be used to help inform traversal paths or just provide additional context.

  • Red rows demonstrate how we account for “deleting” data. You’ll notice that the very first red row has Type DeleteVertex for Vertex ID host:SDC-65113. Threat Graph guarantees that a delete marker, if it exists, will be the first row read when querying for anything about a vertex. This guarantee comes as a result of the way we sort data in the index, ensuring delete records have a lower sort value than other records. Now we can short-circuit and stop reading any additional rows, since this vertex is now considered deleted. This also demonstrates how we handle deletes in an append-only data model: we actually represent a delete by creating a new record.
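
Putting the append-only and delete-marker concepts together, a read over this schema might resolve a vertex roughly as follows. This is a hypothetical sketch with made-up row tuples, assuming rows arrive pre-sorted so that any delete marker comes first:

```python
# Resolve a vertex's current state from its (type, timestamp, data) rows.
# Hypothetical sketch: the index sort guarantees a DeleteVertex marker,
# if present, is the first row read, so the reader can short-circuit.
def resolve_vertex(rows):
    if rows and rows[0][0] == "DeleteVertex":
        return None  # vertex is deleted; stop reading immediately
    state = {}
    for _type, _ts, data in sorted(rows, key=lambda r: r[1]):
        state.update(data)  # later mutation wins on property collisions
    return state

live = resolve_vertex([
    ("ProcessVertex", "2019-03-31T07:13:42", {"name": "cmd.exe"}),
    ("ProcessVertex", "2019-03-31T08:01:10", {"name": "cmd.exe", "ppid": 4}),
])
dead = resolve_vertex([
    ("DeleteVertex",  "2019-04-01T00:00:00", {}),
    ("HostVertex",    "2019-03-30T11:00:00", {"os": "Windows"}),
])
```

Note the collision strategy here is simple last-write-wins by timestamp; any deterministic merge rule works, as long as every reader applies the same one.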

What we’ve noticed over the years is that the simplicity of our design has really stood the test of time. One of the reasons CrowdStrike has such a stellar scalability record is that our architecture is able to adapt to the needs of the market. Being clever and complicated might seem like an achievement in the moment, but it is much more difficult to manage over the long term.

3. Shrink the Feedback Cycle

One of the guiding principles of Threat Graph is that we need to have visibility into change. At CrowdStrike, we’re dealing with the security of our customers — and by extension, we are protecting millions of their customers by stopping breaches — so it’s critical that we know what will happen as a result of every action we take.

One of the simplest ways we gain visibility is by leveraging our metrics package. We use this information to make performance projections about our expectations before we deploy. Then, once we release, we monitor the results in real time, gather feedback on the changes introduced into production and confirm the outcomes are in line with our expectations. This tight cycle of feedback is incredibly useful when operating at this scale.

In a general sense, this concept is nothing new. However, what makes CrowdStrike different is that we have made significant investments in the tooling around our metrics collection and visualization systems. We have comprehensive tooling that helps us understand our key performance indicators (KPIs) and anticipate what they mean for our customers.

In following these three best practices, our team was able to create a solution that addressed our need to process a staggering amount of data without compromising speed or accuracy. Perhaps more importantly, the simplicity of our design enhances our ability to adapt and evolve Threat Graph to meet the demands of the future. Finally, the processes we have in place help ensure that we maintain visibility into how those changes will affect our system performance. As we see it, Threat Graph is a cutting-edge application — and our job is to make sure it stays that way.

Does this work sound interesting to you? Visit CrowdStrike’s Engineering and Technology page to learn more about our engineering team, our culture and current open positions.  



Former JAIC commander explains his philosophy on leadership

Lt. Gen. Jack Shanahan in one of his last acts as the JAIC commander joins Leaders and Legends with host Aileen Black.

Interview Highlights:

Lt. Gen. Shanahan shared his thoughts on leadership the day before he took his terminal leave after a distinguished 36-year career in the military. He shared his insight on his approach to managing through a crisis, including his experience on Project Maven.

Shanahan pointed out that “crisis is a fact of life, it is going to happen,” and shared the core qualities a leader must possess during a crisis. The first point he stressed is communication: “You can’t over-communicate in a crisis.” The second is honesty: “Tell people what is going on, what is going well as well as what is not. People need to hear the truth and how you plan on handling it.” He said that people “thirst for leadership” in a crisis, and stressed that leaders need to be action-oriented, authentic and have a steady hand. “People are resilient and optimistic if you provide people with a vision that there is a way through the crisis,” he shared.

When asked who or what was the biggest influence on his leadership style, he said he has learned from all the leaders he has worked with over his career. “I learned the good and the bad from every leader I have worked with, but my core leadership I learned from my parents. From a historical leader, I would cite Winston Churchill. He was a leader that had resilience and wit, with an action-oriented style.”

When asked about his leadership style and how he approaches key decisions, he shared that “how you approach decisions changes as the size of an organization grows.” Shanahan has given this advice to leaders at every level of his organizations: “Only make the decisions you need to make. If you make decisions that people below you should be making, or decisions for your superiors, you are wasting your time. You have to ask yourself: if you gave them the proper guidance, then they should be making the decision. This is also called commander’s intent.” He shared his favorite advice from a leader, Gen. Robert Neller, USMC: “No friction, no traction. I don’t expect you to agree on everything I want to do. I want to run the decision through the crucible of friction to find the best answer.”

When asked about his key philosophy on leadership, he said, “People first, mission always.” If you live by this approach, people will know you care and the mission will always be in focus, he stressed.

Shanahan closed the show with his advice for the next generation:

  • Be persistent and work hard
  • Be willing to adapt and be a lifelong learner
  • Keep the passion and flames going. Don’t lose sight of your passion.
  • Strive every day to be the best and learn every day how to be even better.

Lt. Gen. John N.T. “Jack” Shanahan is the director, Joint Artificial Intelligence Center, Office of the Department of Defense Chief Information Officer, the Pentagon, Arlington, Virginia. General Shanahan is responsible for accelerating the delivery of artificial intelligence-enabled capabilities, scaling the department-wide impact of AI and synchronizing AI activities to expand joint force advantages.




A Look At AI Benchmarking For Mobile Devices In a Rapidly Evolving Ecosystem

MojoKid writes: AI and machine learning performance benchmarks have been well explored in the data center, but are fairly new and unestablished for edge devices like smartphones. While AI implementations on phones are typically limited to inferencing tasks like speech-to-text transcription and camera image optimization, there are real-world neural network models employed on mobile devices and accelerated by their dedicated processing engines. A deep-dive look at HotHardware into three popular AI benchmarking apps for Android shows that not all platforms are created equal, and that performance results can vary wildly depending on the app used for benchmarking. Generally speaking, it all hinges on which neural networks (NNs) the benchmarks are testing and what precision is being tested and weighted. Most mobile apps that currently employ some level of AI make use of INT8 (quantized) models. While INT8 offers less precision than FP16 (floating point), it's also more power-efficient and offers enough precision for most consumer applications. Typically, Qualcomm Snapdragon 865-powered devices offer the best INT8 performance, while Huawei's Kirin 990 in the P40 Pro 5G offers superior FP16 performance. Since INT8 precision for NN processing is more common in today's mobile apps, it could be said that Qualcomm has the upper hand, but the landscape in this area is ever-evolving to be sure.
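
For background, the INT8 quantization the summary refers to can be illustrated with a generic symmetric-quantization sketch in Python. This shows the general technique (mapping floating-point weights onto 8-bit integers plus a scale factor), not any benchmark app's actual code:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric quantization: map the weight range onto [-127, 127]."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; rounding error is bounded by scale/2."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.003, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
w_approx = dequantize(q, s)  # close to w, with small rounding error
```

The trade-off in the summary falls out directly: the INT8 tensor is a quarter the size of an FP32 one (and half of FP16), and integer arithmetic is cheaper on mobile NPUs, at the cost of the rounding error visible in `w_approx`.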

Read more of this story at Slashdot.


Rust Enters 'Top 20' Popularity Rankings For the First Time

Programming language Rust has entered the top 20 of the Tiobe popularity index for the first time, but it's still five spots behind systems programming rival Go. ZDNet reports: There's growing interest in the use of memory-safe Rust for systems programming to build major platforms, in particular at Microsoft, which is exploring it for Windows and Azure with the goal of wiping out memory bugs in code written in C and C++. Amazon Web Services is also using Rust for performance-sensitive components in Lambda, EC2, and S3. Rust has seen its ranking rise considerably on Tiobe, from 38 last year to 20 today. Tiobe's index is based on searches for a language on major search engines, so it doesn't mean more people are using Rust, but it shows that more developers are searching for information about the language. For the fifth year straight, Rust was voted the most loved programming language by developers in Stack Overflow's 2020 survey. This year, 86% of developers said they are keen to use Rust, but just 5% actually use it for programming. On the other hand, it could become more widely used thanks to Microsoft's public preview of its Rust library for the Windows Runtime (WinRT), which makes it easier for developers to write cross-platform Windows apps and drivers in Rust.



Coronavirus Patients Lose Senses of Taste, Smell -- and Haven't Gotten Them Back

An anonymous reader quotes a report from The Wall Street Journal: Clinicians racing to understand the novel disease are starting to discern an unusual trend: one common symptom -- the loss of smell and taste -- can linger months after recovery. Doctors say it is possible some survivors may never taste or smell again. Out of 417 patients who suffered mild to moderate forms of Covid-19 in Europe, 88% and 86% reported taste and smell dysfunctions, respectively, according to a study published in April in the European Archives of Oto-Rhino-Laryngology. Most patients said they couldn't taste or smell even after other symptoms were gone. Preliminary data showed at least a quarter of people regained their ability to taste and smell within two weeks of other symptoms dissipating. The study said long-term data are needed to assess how long this can last in people who didn't report an improvement. Anyone who has had the sniffles knows a stuffy nose impedes smell and taste; the novel coronavirus's ability to break down smell receptors is puzzling because it occurs without nasal congestion. One theory is that the "olfactory receptors that go to the brain -- that are essentially like a highway to the brain -- commit suicide so they can't carry the virus to the brain," said Danielle Reed, associate director of the Monell Chemical Senses Center. "It could be a healthy reaction to the virus. If that doesn't work, maybe people do get sicker," she said. "It might be a positive takeaway from what is obviously a devastating loss to people."


