~/devreads

#aws

42 posts

9 Jun

Vinorth Varatharasan 1 min read

FinOps lessons learned: opening AWS Cost Explorer on a data lake costing $6,500/month reveals three anomalies: AWS KMS at $4,000/month, phantom licenses for developers who have left, and 10 redundant CI/CD pipelines. Cloud cost optimization: $100,000 in annual savings. No tools, no migration. Just curiosity. Checklist included.

cloud and platform, bonne pratique, devops, sre, aws

3 Jun

Poorva Patil 6 min read

Before we knew better: Our orchestration system started as a simple internal solution to manage event pipelines and trigger downstream jobs. Over time, as more workflows and dependencies were added, it gradually evolved into a tightly coupled monolithic scheduler that became increasingly difficult to understand and maintain. Understanding how a workflow executed often meant…

etl, apache-airflow, aws, data-engineering, software-architecture

28 May

Shaurya Kethireddy 14 min read

In early 2023, Slack faced a foundational challenge: serving Large Language Models (LLMs) at enterprise scale with the security, reliability, and performance our customers expect. Over three years, we evolved from basic infrastructure to orchestrating a sophisticated multi-cloud architecture. We didn’t just want shiny new models; we needed a system resilient to regional outages and…

uncategorized, aws, backend, cloud-computing, collaboration

13 May

Susannah McCloskey 3 min read

On a recent software development project that already planned to use AWS, we used AWS Cognito for authentication. Cognito is Amazon’s managed identity platform for web and mobile apps, offering features like MFA, password reset flows, and sign-in. On paper, it’s a strong fit for projects already using AWS. In practice, the rough edges cost […]

aws, authentication, aws cognito, mfa

5 May

Mahendran Vasagam 13 min read

By 2024, Slack’s data platform had accumulated 700+ SSH-based operators orchestrating critical data pipelines. We’re talking daily search indexing that processed terabytes of data, analytics jobs powering business intelligence, the whole shebang. Every single one of these jobs required direct SSH access to production AWS Elastic MapReduce (EMR) clusters. We had a massive security…

uncategorized, airflow, aws, big-data, data-engineering

3 May

17 Dec 2025

12 Dec 2025

Jin Kim 7 min read

At the recent AWS re:Invent, Docker focused on a very real developer problem: how to run AI agents locally without giving them access to your machine, credentials, or filesystem. With AWS introducing Kiro, Docker demonstrated how Docker Sandboxes and MCP Toolkit allow developers to run agents inside isolated containers, keeping host environments and secrets out...

community, company, products, aws, docker mcp catalog

23 Oct 2025

Archie Gunasekara 13 min read

Last year, I wrote a blog post titled Advancing Our Chef Infrastructure, where we explored the evolution of our Chef infrastructure over the years. We talked about the shift from a single Chef stack to a multi-stack model, and the challenges that came with it – from updating how we handle cookbook uploads to navigating…

uncategorized, aws

28 May 2025

Alistair Forrester Burrowes 5 min read

Supporting developers to debug and resolve issues with datastores in the Self-Service ecosystem. Welcome to the third blog post of our Self-Service Datastore series, where we share our journey towards creating a more efficient and reliable way to manage datastores at Zendesk. Previous blog posts: “Unlocking Efficiency: A New Era for Datastore Provisioning” and “Simplifying Datastore Provisioning with Kubernetes Operators”. We…

self-service, kubernetes-operator, storage, aws

21 Dec 2024

2 Dec 2024

Bruno Marques 6 min read

Welcome to the second blog post of our Self-Service Datastore series, where we share our journey towards creating a more efficient and reliable way to manage datastores at Zendesk. In today’s dynamic application development landscape, the ability to swiftly provision datastores is crucial for maintaining agility and delivering exceptional user experiences. Provisioning encompasses all steps involved in requesting a…

aws, kubernetes-operator, self-service, storage

17 Sept 2024

Archie Gunasekara 12 min read

At Slack, we manage tens of thousands of EC2 instances that host a variety of services, including our Vitess databases, Kubernetes workers, and various components of the Slack application. The majority of these instances run on some version of Ubuntu, while a portion operates on Amazon Linux. With such a vast infrastructure, the critical question…

uncategorized, aws, infrastructure

2 Jul 2024

Nilanjana Mukherjee 9 min read

Slack Data Engineering recently underwent data workload migration from AWS EMR 5 (Spark 2/Hive 2 processing engine) to EMR 6 (Spark 3 processing engine). In this blog, we will share our migration journey, challenges, and the performance gains we observed in the process. This blog aims to assist Data Engineers, Data Infrastructure Engineers, and Product…

uncategorized, analytics, aws, big-data, data-engineering

24 Apr 2024

Someswar Bhowmick 5 min read

The Bazaarvoice notification system stands as a testament to cutting-edge technology, designed to seamlessly dispatch transactional email messages (post-interaction email or PIE) on behalf of our clients. The heartbeat of our system lies in the constant influx of new content, driven by active content solicitations. Equipped with an array of tools, including email message styling, default […]

software architecture, aws, cloud, engineering, scalability

18 Apr 2024

Kelly Moran 6 min read

At Slack, we’ve long been conservative technologists. In other words, when we invest in leveraging a new category of infrastructure, we do it rigorously. We’ve done this since we debuted machine learning-powered features in 2016, and we’ve developed a robust process and skilled team in the space. Despite that, over the past year we’ve been…

uncategorized, aws, engineering, infrastructure, machine-learning

12 Dec 2023

Archie Gunasekara 10 min read

We are heavy users of Amazon Elastic Compute Cloud (EC2) at Slack: we run approximately 60,000 EC2 instances across 17 AWS regions while operating hundreds of AWS accounts. A multitude of teams own and manage our various instances. The Instance Metadata Service (IMDS) is an on-instance component that can be used to gain an…

uncategorized, aws, cloud-computing, infrastructure, security

5 Nov 2023

28 Apr 2023

Lou Kratz 7 min read

Let’s face it: software is easier to write than maintain. This is why we, as software engineers, prefer to just “rip it out and start over” instead of trying to understand what another developer (or our past self) was thinking. We seem to have collectively forgotten that “programs must be […]

uncategorized, artificial intelligence, aws, aws sagemaker, data engineering

18 Apr 2023

Tinder 7 min read

Authored by: Rojan Rijal, Tinder Security Labs | Johnny Nipper, Sr. Director | Tanner Emek, Sr Engineering Manager. Summary: In 2021, GitHub released support for OpenID Connect (OIDC) for GitHub Actions (GHA), allowing developers to securely interact with their infrastructure resources in Amazon Web Services (AWS) and other major cloud service providers. The OIDC support allows GHA jobs to retrieve…

security, github, aws, oidc

24 Jan 2023

Archie Gunasekara 9 min read

Slack launched GovSlack in July 2022. With GovSlack, government agencies, and those they work with, can enable their teams to seamlessly collaborate in their digital headquarters, while keeping security and compliance at the forefront. Using GovSlack includes the following benefits: supports key government security standards, such as FedRAMP High, DoD IL4, and ITAR; runs in…

automation, aws, infrastructure

25 Oct 2022

Archie Gunasekara 14 min read

At Slack, we use Terraform for managing our Infrastructure, which runs on AWS, DigitalOcean, NS1, and GCP. Even though most of our infrastructure is running on AWS, we have chosen to use Terraform as opposed to using an AWS-native service such as CloudFormation so that we can use a single tool across all of our…

uncategorized, automation, aws, infrastructure

26 Apr 2022

30 Mar 2022

Johnathan Ishmael 7 min read

BBC Online - A year with serverless. It’s been a little over a year since I published my last two blog posts, in which I outlined the process we went through to choose the technology for BBC Online and the steps we took to optimise serverless for our use. Recently my colleague Graeme has published a blog post on the…

aws, aws-lambda, cloud, serverless

1 Nov 2021

Luciano Mammino 8 min read

This post explains how to conditionally create resources in AWS CDK using CfnCondition. It provides a practical example of creating an S3 bucket based on an SSM parameter value. The post covers defining a condition, attaching it to a low-level CDK construct, and importing the conditionally created resource.

aws, cdk, javascript, typescript
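
The conditional-resource idea summarized in the excerpt above can be sketched at the CloudFormation level, which is what a CDK `CfnCondition` ultimately synthesizes to. This is a minimal illustration in plain Python rather than the post's own CDK code: the logical IDs (`CreateBucketFlag`, `ShouldCreateBucket`, `ConditionalBucket`), the SSM parameter name, and the expected value `"true"` are all hypothetical.

```python
import json


def conditional_bucket_template(ssm_parameter_name: str, expected_value: str) -> dict:
    """Build a CloudFormation template whose S3 bucket exists only if an SSM value matches.

    All logical IDs below are illustrative assumptions, not names from the original post.
    """
    return {
        # The SSM-backed parameter type resolves the stored value at deploy time.
        "Parameters": {
            "CreateBucketFlag": {
                "Type": "AWS::SSM::Parameter::Value<String>",
                "Default": ssm_parameter_name,
            }
        },
        # The condition compares the resolved value with the expected one.
        "Conditions": {
            "ShouldCreateBucket": {
                "Fn::Equals": [{"Ref": "CreateBucketFlag"}, expected_value]
            }
        },
        # Attaching the condition to the resource makes its creation conditional.
        "Resources": {
            "ConditionalBucket": {
                "Type": "AWS::S3::Bucket",
                "Condition": "ShouldCreateBucket",
            }
        },
    }


template = conditional_bucket_template("/example/create-bucket", "true")
print(json.dumps(template, indent=2))
```

In CDK the same shape would presumably come from defining the condition and assigning it to the low-level bucket construct, as the excerpt describes.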

29 Oct 2021

Saurabh Jain 9 min read

Pinion - The Load Framework, Part 2. This post is the 2nd part of the “Pinion - The Load Framework” series; if you have not read the 1st post, it is worth reading first. In this post, we are going to cover the following topics: how Pinion uses Delta Lake for SCD operations; the small file problem with Delta…

cloud, spark, aws, delta-lake, optimization

20 Oct 2021

6 Aug 2021

22 Jun 2021

Luciano Mammino 12 min read

The boto3 Python SDK allows intercepting requests before they are sent to AWS through an event handler system. This article shows how to use it to gzip the payload of PutMetricData requests sent to CloudWatch.

python, aws
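
A rough sketch of the interception technique the excerpt above describes, kept to the standard library so it runs without boto3. The `FakeRequest` class is a hypothetical stand-in for the request object a handler would receive, and the event name in the comment is an assumption about how the handler would be registered, not something confirmed by the excerpt.

```python
import gzip


class FakeRequest:
    """Hypothetical stand-in for the outgoing request passed to an event handler."""

    def __init__(self, body: bytes):
        self.data = body
        self.headers = {}


def gzip_request_body(request, **kwargs):
    """Compress the request body in place and mark it as gzip-encoded."""
    body = request.data
    if isinstance(body, str):
        body = body.encode("utf-8")
    request.data = gzip.compress(body)
    request.headers["Content-Encoding"] = "gzip"


# With boto3 installed, registration might look like this (event name is an assumption):
#   client = boto3.client("cloudwatch")
#   client.meta.events.register("before-sign.cloudwatch.PutMetricData", gzip_request_body)

# Demonstrate the handler on a repetitive payload, which compresses well.
request = FakeRequest(b"Action=PutMetricData&Namespace=Demo&" * 50)
original_body = request.data
gzip_request_body(request)
print(len(original_body), "->", len(request.data), "bytes")
```

Compressing before the request is signed matters because the signature has to cover the bytes that are actually sent.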

17 Jul 2020

Joe Minichino 10 min read

You need a Data Lake. The context: Teamwork has been around for more than 10 years, starting out as a project management and work collaboration platform and later expanding into other areas, such as help-desk, chat, document management and CRM software. As the company has grown and evolved, data has grown, changed, expanded, diversified, fragmented, then changed again. Analytics in…

aws, data-lake, big-data

6 May 2020

24 Jul 2019

Alex Smolen 3 min read

At Clever, we lock down code access to customer data using AWS IAM roles with session policies. In Clever’s microservice AWS architecture, each service has a unique IAM role with access to the AWS resources it needs: S3 buckets, DynamoDB tables, and so on. Our services are multi-tenant and customer data is separated via logical […]

aws, security
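
To make the session-policy idea in the excerpt above concrete, here is a minimal sketch of building one. A session policy is passed when assuming a role, and the resulting permissions are the intersection of the role's own policy and the session policy. The bucket name, customer ID, and the per-customer key prefix layout are hypothetical; the excerpt does not say how Clever actually partitions its data.

```python
import json


def build_session_policy(bucket: str, customer_id: str) -> str:
    """Return a session policy (JSON string) limiting S3 access to one customer's prefix.

    The prefix-per-customer layout is an illustrative assumption.
    """
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{customer_id}/*",
            }
        ],
    }
    return json.dumps(policy)


# With boto3, this string would be supplied as the Policy argument to sts.assume_role(...).
session_policy = build_session_policy("example-bucket", "customer-123")
print(session_policy)
```

Because the session policy can only narrow what the role already allows, a bug in the calling code cannot widen access beyond the service's IAM role.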

21 Oct 2018

Luciano Mammino 17 min read

The AWS Solutions Architect Associate exam covers a wide range of AWS services. This post shares helpful notes and tips for studying key concepts like EC2, S3, VPC, DynamoDB, and more. It provides advice on the exam mindset and lists official and unofficial preparation resources. The notes summarize important details around provisioned throughput, instance types, database replication and more that…

aws

5 May 2018

6 Feb 2018

Alex Smolen 4 min read

At Clever, one of our tenets is “Always a Student”, and in that spirit of learning we wanted to share the changes we made to fix memory allocation issues in AWS Elastic Container Service related to swappiness. Swappiness is a Linux kernel setting that specifies how likely it is for a page in memory to be […]

aws, debugging

16 Dec 2017

14 Sept 2017

11 Aug 2017

19 Jun 2017

21 Apr 2017

11 Apr 2014

Nathaniel Eliot 1 min read

CloudFormation is a powerful tool for building large, coordinated clusters of AWS resources. It has a sophisticated API, capable of supporting many different enterprise use-cases and scaling to thousands of stacks and resources. However, there is a downside: the JSON interface for specifying a stack can be cumbersome to manipulate, especially as your organization grows […]

open source, uncategorized, aws, cli, cloudformation

22 Jun 2013

Ernest Mueller 3 min read

Greetings all! In the world of SaaS, wiser men than I have referred to Operations as the “Secret Sauce” that distinguishes you from your competition. As manager of one of our DevOps teams, I wanted to talk to you about how Bazaarvoice uses the cloud and how we engineer our systems for maximum reliability. You […]

talks, aws, infrastructure