When you first start learning about data governance, it often seems like a hairball of tightly knit ideas where you can’t understand any one piece until you’ve studied and learned the whole thing. I’m not an expert by any stretch, but I’ve wrestled with learning about data governance long enough to find a way of…
Category Archives: Platform Design
Layers of Data Infrastructure 3: Storage
In my last two posts I’ve explored the high-level design decisions related to two of the three layers that define each pipeline stage of each category of data use cases: Control and Compute. The Control layer defines how the user interacts with the system, while the Compute layer defines how the system does the work.
Data Infrastructure Layers 2: Compute
In my last post I described how you can think of your organization’s data infrastructure as a grid of blocks defined by category of use case and stage of the pipeline. Each block can be further broken down into three layers: Control, Compute and Storage. Last time I briefly described these layers, then discussed different…
Data Infrastructure Layers 1: Control
In my last two posts, I started to break down the types of areas where an organization might need to deploy data tools/infrastructure along two axes: the categories of common use cases and the stages that you’ll encounter in most of these use cases. You can think of these as defining a grid of functionality.
Common Stages of Data Workflows
I want to start going into more details of the categories of data use cases that I introduced in my last post. When you think of each use case, it’s easy to focus on a fairly narrow piece of it – typically the most interesting parts. But within each use case there are a number…
Categories of Data Use Cases
As the head of software engineering at a small startup with ambitions to grow much larger, I think a lot about how to design data infrastructure that will both address our immediate needs and adapt to future needs. I’ve seen what happens at large companies when each team has their own set of data infrastructure: …