~/devreads

#data-engineering

14 posts

9 Jun

Patrick Lam 9 min read

How Airbnb’s data engineers and analytics engineers built a consistent and flexible data modeling framework to support the expansion into Homes, Experiences, and Services. By: Patrick Lam, Namrata Lamba, Jamie Stober. With the May 2025 Summer Release, Airbnb redesigned its app, relaunched Experiences, and debuted Services, pushing us beyond our traditional Homes focus. For the data teams,…

data-engineering · analytics-engineering · technology · data-modeling · data-architecture

3 Jun

Poorva Patil 6 min read

Photo by Corinne Kutz on Unsplash. Before we knew better: Our orchestration system started as a simple internal solution to manage event pipelines and trigger downstream jobs. Over time, as more workflows and dependencies were added, it gradually evolved into a tightly coupled monolithic scheduler that became increasingly difficult to understand and maintain. Understanding how a workflow executed often meant…

etl · apache-airflow · aws · data-engineering · software-architecture

5 May

Mahendran Vasagam 13 min read

Excerpt: By 2024, Slack’s data platform had accumulated 700+ SSH-based operators orchestrating critical data pipelines. We’re talking daily search indexing that processed terabytes of data, analytics jobs powering business intelligence, the whole shebang. Every single one of these jobs required direct SSH access to production AWS Elastic MapReduce (EMR) clusters. We had a massive security…

uncategorized · airflow · aws · big-data · data-engineering

2 May 2025

Sameeksha Bhatia 7 min read

Load Testing APIs on Redshift & Snowflake — A Quick POC. Overview: At Helpshift, our data platform follows a Lakehouse architecture, combining the best of both data lakes and data warehouses. This architecture allows us to store and analyze large amounts of raw data in a structured and organized manner, while also providing the scalability and low-cost storage…

load-testing · data-engineering · snowflake · redshift · performance

2 Jul 2024

Nilanjana Mukherjee 9 min read

Slack Data Engineering recently migrated its data workloads from AWS EMR 5 (Spark 2/Hive 2 processing engines) to EMR 6 (Spark 3 processing engine). In this blog, we will share our migration journey, challenges, and the performance gains we observed in the process. This blog aims to assist Data Engineers, Data Infrastructure Engineers, and Product…

uncategorized · analytics · aws · big-data · data-engineering

8 May 2024

Lakshmi Mohan 8 min read

The Data Engineering team is responsible for Slack’s data lake, analytics dashboards, and other data services. The team’s mission is to empower users to leverage data to make decisions quickly, accurately, and easily. Slack’s data lake grew from under a petabyte to more than 100 petabytes in recent years and now spans millions of tables.…

data-engineering

8 Jan 2024

Bisman Sodhi 4 min read

Hi, my name is Bisman and I studied Computer Science at the University of California, Santa Barbara. During the summer of 2022, I had the most amazing experience working as a Software Engineer Intern on Strava’s Data Platform Team. In the first few weeks, I learned the tools my team uses and then spent the rest of the time working on my…

software-engineering · data-platforms · data-engineering

9 Oct 2023

28 Apr 2023

Lou Kratz 7 min read

(cover image from ThisisEngineering RAEng) Let’s face it: software is easier to write than maintain. This is why we, as software engineers, prefer to just “rip it out and start over” instead of trying to understand what another developer (or our past self) was thinking. We seem to have collectively forgotten that “programs must be […]

uncategorized · artificial intelligence · aws · aws sagemaker · data engineering

13 Sept 2022

19 Oct 2021

Joe Minichino 6 min read

A real data lake. Traditional Data Engineering relies on products such as Airflow, Hadoop, Spark and Spark-based architectures, or similar technologies. These are still viable solutions for a number of reasons, not least the fact that Data Engineers are few and far between, and the vast majority of them will be familiar with the above technologies or similar products/frameworks. Go…

golang · aws-lambda · serverless-architecture · data-engineering · aws-athena

17 Aug 2021

Samuel Bock 8 min read

Reinventing how the world does work inevitably creates a lot of data. Each year, Slack’s scale has increased and the volume of data ingested and stored has kept pace. To make it possible to understand relationships within our data, we’ve invested heavily in an automated data lineage framework. This facilitates producer/consumer coordination, improves risk mitigation,…

uncategorized · big-data · data-engineering

28 Jul 2021

Sarah Henkens 10 min read

With the release of Slack Connect, people can now collaborate both with internal employees and external organizations in the same channel. To make this as smooth as possible, Slack does predictive email analysis to classify and recommend the best way for a user to work with people they want to collaborate with. To accomplish this,…

uncategorized · algorithms · data-engineering · infrastructure

16 Nov 2020