~/devreads

#devops

62 posts

9 Jun

Vinorth Varatharasan 1 min read

Tech Lead experience report: taking over a failing AWS data lake. Five years of technical debt, no tests, no monitoring, a team of two juniors with no product owner. Four levers: clean up (monorepo, Terraform, CI/CD, Python), coaching, pair programming, collective ownership. Result: 5,000 tests, a Snowflake/dbt datamart, $100,000 in FinOps savings, an autonomous team.

data and ai, best practice, devops, coaching, transformation

Vinorth Varatharasan 1 min read

FinOps experience report: opening AWS Cost Explorer on a data lake costing $6,500/month reveals three anomalies: AWS KMS at $4,000/month, phantom licenses for developers who have left, and 10 redundant CI/CD pipelines. Cloud cost optimization: $100,000 in annual savings. No tooling, no migration. Just curiosity. Checklist included.

cloud and platform, best practice, devops, sre, aws

3 Jun

1 Jun

Basile du Plessis 1 min read

IT departments have multiplied tools, automation, and cloud services to accelerate delivery. But the real bottleneck is no longer the code; it is the processes around it: approvals, security, architecture, production. Platform Engineering is emerging to address this tension, with a product-oriented, organizational, and strategic approach.

cloud and platform, devops, strategy, transformation, platform as a service

28 May

24 May

10 May

SitePoint Team 1 min read

Maxim AI vs DeepEval vs LangSmith vs QA Wolf: Which AI Agent Testing Framework Should You Trust With Production in 2026? A comprehensive guide with practical implementation details, on SitePoint.

ai, programming, devops

2 May

20 Apr

SitePoint Team 1 min read

Advanced guide to deploying Claude Code as a fully autonomous agent for software engineering tasks. Covers agent scaffolding, multi-turn reasoning loops, error recovery, and integration with existing CI/CD pipelines. Includes real-world examples of agents handling full feature development cycles. Claude Code as an Autonomous Agent: Advanced Workflows (2026) on SitePoint.

ai, programming, devops

28 Aug 2025

Kamal Kumar 1 min read

In production environments, debugging alerts can sometimes feel like finding a needle in a haystack. Over the years, I’ve found the OSI (Open Systems Interconnection) model to be a reliable guide during Root Cause Analysis (RCA) of production issues. What is the OSI Model? The OSI model is a conceptual framework that standardizes the functions of a telecommunication or computing…

production-debugging, sre, rca, devops, osi-model

25 Aug 2025

Kovi 9 min read

Discover how Bazaarvoice migrated millions of UGC records from RDS MySQL to AWS Aurora, at scale and with minimal user impact. Learn about the technical challenges, strategies, and outcomes that enabled this ambitious transformation in reliability, performance, and cost efficiency. Bazaarvoice ingests and serves millions of user-generated content (UGC) items: reviews, ratings, questions, answers, and […]

database, devops

30 Jun 2025

8 Jun 2025

Jeffrey Theobald 6 min read

Are you ready for more self-service datastore adventures? If you haven’t already, have a look at our previous entries in this series: “Unlocking Efficiency: A New Era for Datastore Provisioning”, “Simplifying Datastore Provisioning with Kubernetes Operators”, and “Resolving Incidents With The Remote Incident Console”. They’re a fun read. The story so far: last time, in Simplifying Datastore Provisioning with Kubernetes Operators…

storage, credentials, self-service, devops, kubernetes

14 Apr 2025

Dan Carton 5 min read

In the world of DevOps and Developer Experience (DevXP), speed and efficiency can make a big difference on an engineer’s day-to-day tasks. Today, we’ll dive into how Slack’s DevXP team took some existing tools and used them to optimize an end-to-end (E2E) testing pipeline. This lowered build times and reduced redundant processes, saving both time…

uncategorized, ci-cd, developer-experience, developer-productivity, devops

16 Dec 2024

Zhengyu Shen 12 min read

Overview The past few months have been exciting times for Slack’s CI infrastructure. After years of developer frustration with Jenkins (everything from security issues to downtime to generally poor UX) internal pressure led us to move a majority of Slack’s CI jobs from Jenkins to GitHub Actions. My intern project at Slack this summer involved…

uncategorized, ci-cd, devops, devtools, machine-learning

31 Aug 2023

Edgar Trujillo 4 min read

On the racetrack of building ML applications, traditional software development steps are often overtaken. Welcome to the world of MLOps, where unique challenges meet innovative solutions and consistency is king. At Bazaarvoice, training pipelines serve as the backbone of our MLOps strategy. They underpin the reproducibility of our model builds. A glaring gap existed, however, […]

artificial intelligence, big data, devops, open source, software architecture

28 Jun 2023

Joyce Lin 2 min read

It’s not just developers who rely on APIs. DevOps engineers and data engineers also use APIs for many reasons, including to manage cloud infrastructure. For example, you can programmatically manage resources, configure services, and perform operations using APIs. Let’s review other reasons to use cloud APIs. Reasons to use cloud APIs In addition to providing a management console and SDKs,…

infrastructure, devops, software-development, gcp, api

21 Mar 2023

Tricia Bogen 9 min read

This blog post discusses the strategies that Slack uses to manage the lifecycle (development, support, and eventual retirement) of infrastructure projects, through the lens of the migration through three successive internal “platform” offerings. Our challenges Circa 2020, our Cloud Engineering team (now evolved into multiple teams responsible for narrower aspects) was responsible for managing our…

uncategorized, cloud-computing, collaboration, devops, infrastructure

16 Mar 2023

Jacob 3 min read

A conversation with engineers who help run Blinkit Chinthakunta Sumanth Kumar Reddy is an SDE 3 at Blinkit. He joined us in March 2021 and has since helped us build a resilient application platform at Blinkit. He currently works as a part of Software Resilience Engineering (SRE)–enabling scalable database migrations for Blinkit’s applications. Tell us about your background and your…

quick-commerce, devops, site-reliability-engineer, culture, people-at-blinkit

23 Feb 2023

Jacob 3 min read

A conversation with engineers who help run Blinkit Jay Dihenkar is a Staff Engineer at Blinkit. He joined us in December 2020 and has helped different teams manage and streamline their build and release processes. He is currently working towards continuously improving the reliability, scalability, observability, developer productivity, and other such aspects of a software system critical for ensuring that…

people-at-blinkit, culture, devops, quick-commerce, site-reliability-engineer

24 Oct 2022

Tamas Kadlecsik 9 min read

This case study shows how we reformed a scale-up's dev processes after uncovering severe discrepancies between the official and real way of getting things done. The post Do your engineers do what you think they do? appeared first on RisingStack Engineering.

case study, devops

29 Aug 2022

12 Jul 2022

20 Dec 2021

Junmin Liu 9 min read

Building Felix, the Design System for Groupon. Several new features have been released on Groupon.com recently, such as the QR code in the navigation bar to download the app, and a banner carousel to display multiple banner messages within a single view. In the past, similar product features might take 2–3 sprints to complete, but now all of these…

design-systems, devops, designer, developer

20 Oct 2021

19 Oct 2021

10 Aug 2021

Nathan Leiby 5 min read

Why multi-region sessions? Each year leading up to Back to School (our busiest season), Clever’s engineering team invests in our highest traffic systems to make sure we can handle user growth and new traffic patterns. During 2020–2021, SAML auth at Clever grew from <10% of our login related traffic to about 40% of our traffic! For this […] The post…

devops, resiliency

7 Jul 2021

RisingStack Engineering 13 min read

The main drawback of Ceph storage is that you have to host and manage it yourself. In this post, we'll check two different approaches to deploying Ceph. The post How to Deploy a Ceph Storage to Bare Virtual Machines appeared first on RisingStack Engineering.

devops, edited

25 Jun 2021

2 Jun 2021

7 Jul 2020

Janos Kubisch 6 min read

Learn how to distribute and run Jmeter tests along multiple droplets on DigitalOcean using Terraform, Ansible, and bash scripting - to automate the process. The post Distributed Load Testing with Jmeter appeared first on RisingStack Engineering.

devops, edited

20 Jan 2020

24 Aug 2019

Schakko 1 min read

Shortly after I had started work on nerdhood.de I built a deployment pipeline. The bash-based build script for my Laravel application was easy, but triggering the deployment itself turned out to be more difficult than expected. In the end I built something with two AWS Lambda functions, SNS, an […] The post Deploying with SSH using GitHub Actions appeared…

ci cd, devops

9 Apr 2019

Schakko 2 min read

We are currently in the process of migrating our alerting infrastructure from OMD to Atlassian’s OpsGenie. Most of the features (SMS, phone call etc.) worked out of the box but we struggled with pushing alerts back into our on-premises Jira instance. Enable logging of POST requests OpsGenie does not provide […] The post Using Atlassian OpsGenie with a localized on-premises…

devops

12 Mar 2019

24 Jan 2019

21 Jan 2019

Schakko 2 min read

I am currently working on a Jenkins declarative pipeline to connect the Jenkins builds with Kubernetes, Helm and Netflix Spinnaker. One of the TODOs has been to deploy different artifacts (e.g. a helm chart my-chart-0.0.1.tar.gz) to an AWS S3-compatible bucket inside a Minio installation with the help of pipeline-aws-plugin. When running withAWS(endpointUrl: […] The post Receiving “com.amazonaws.services.s3.model.AmazonS3Exception: Not Found” when using Jenkins’…

devops

18 Jan 2019

patrick.sullivan 5 min read

Are you working on an agile team? Odds are high that you probably are. Whether you do Scrum/Kanban/lean/extreme, you are all about getting work done with the least resistance possible. Heck, if you are still on Waterfall, you care about that. But how well are you doing? Do you know? Is that something a developer […]

culture, developer portal, devops, open source, agile

16 Oct 2018

Janos Kubisch 8 min read

This post highlights some git features that might be less used/known, but can end up saving you when things go south in the codebase. The post Git Catastrophes and Tips to Avoid Them appeared first on RisingStack Engineering.

devops, edited

5 Sept 2018

14 Aug 2018

Tamas Kadlecsik 7 min read

In this post, I’m going to show you how to debug a Node.js app in a Docker container to catch bugs that cannot be revealed in any other way. The post How to Debug a Node.js app in a Docker Container appeared first on RisingStack Engineering.

devops, edited

10 Apr 2018

16 Mar 2018

Ben Adida 8 min read

On Tuesday, Wednesday, and Thursday, March 6th-8th, 2018, Clever logins failed for all customers: 1h on Tuesday, 1h15 on Wednesday, and almost 5h on Thursday. This was Clever’s single worst outage ever in length, repeatedness, and impact. This postmortem is the first of many public steps we’ll be taking to ensure Clever is a service […] The post Postmortem on…

devops

13 Mar 2018

7 Mar 2018

28 Feb 2018

14 Feb 2018

18 Dec 2017

12 Dec 2017

Tamas Kadlecsik 10 min read

In this article I’ll walk you through how we perform consumer driven contract testing in our Node.js microservices architecture with the Pact framework. The post Consumer Driven Contract Testing with Pact appeared first on RisingStack Engineering.

devops, edited

11 Dec 2017

Nathan Leiby 6 min read

tl;dr: Try out microplane! It’s a CLI tool to make changes across many repos. The Problem At Clever, we’ve embraced microservices. They promote modularity, which leads to simpler code bases and lets our engineers move quickly and independently. They are easier to deploy, which helps us build towards incremental, frequent deploys and continuous delivery. In […] The post Mo Repos,…

devops, golang, git, github

22 Nov 2017

19 Nov 2017

Henrik Warne 5 min read

During my career as a software developer, I have seen the release frequency increasing steadily. When I started, it would take 12 to 18 months for new features to reach the customer. Years later the frequency increased, so deployment to … Continue reading →

work, continuous delivery, deployment, devops

11 Oct 2017

13 Sept 2017

14 Jul 2017

21 Apr 2017

23 Jan 2017

jona fenocchi 7 min read

At Bazaarvoice, we’re big fans of cloud. Real big. We’re also fans of DevOps. There’s been a lot of discussion over the past several years about “What is DevOps?” Usually, this term is used to describe Systems Engineers and Site Reliability Engineers (sometimes called Infrastructure Engineers, Systems Engineers, Operations Engineers or, in the most unfortunate […]

culture, devops

7 Feb 2016

Schakko 2 min read

You may have already heard that InfluxDB 0.10 GA was published a few days ago. In my case the most interesting improvement is the much higher compression rate: at the moment my co-workers at NeosIT and I are collecting performance data from four internal virtual machines. Have been running […] The post Migrating InfluxDB from 0.9.6 to 0.10.0 GA…

devops, influxdb

13 Oct 2015

Schakko 4 min read

My employer NeosIT offers a web-based SMS notifying solution for organizations with security roles, named ZABOS. Over the last few months we extended the ZABOS application to support digital alerting through POCSAG. After some problems with a third-party component we implemented the ability to collect all POCSAG telegrams delivered […] The post Collecting and visualizing metrics with statsd, InfluxDB…

devops

8 Apr 2015

Nathan Leiby 7 min read

Sometimes it’s obvious what code has to change, but it’s painfully hard to prove you’ve fixed it. When’s the last time a conceptually simple fix took you hours longer than planned, because you could not get the project running locally to verify your change worked? I just want to change a little CSS on […] The post Aviator: locally…

devops

23 Jan 2014

Dave Cheney 2 min read

This post is a rant about a word. A rant about a word that had a clear meaning but has been appropriated for something wholly less meaningful. The word is of course Devops. Over the last few years, as the practice itself has grown in prominence, its description has become diluted beyond the point of recovery. […]

small ideas, devops

28 Jul 2013

Schakko 5 min read

You know it: an old project gets reactivated because your customer needs a new feature or has found a bug. Meanwhile, the responsible developers have been reassigned to new projects and have no time to finish the task. You are currently not at 100 percent workload so you get the […] The post DevOps: Stop coding and get your stuff…

work, devops, devs, documentation, ops