Tech Lead retrospective: taking over a failing AWS data lake. Five years of technical debt, no tests, no monitoring, a team of two juniors with no PO. Four levers: clean up (monorepo, Terraform, CI/CD, Python), coaching, pair programming, collective ownership. Result: 5,000 tests, a Snowflake/dbt datamart, $100,000 in FinOps savings, an autonomous team.
#devops
62 posts
9 Jun
FinOps retrospective: opening AWS Cost Explorer on a data lake costing $6,500/month reveals three anomalies: AWS KMS at $4,000/month, phantom licenses for developers who have left, and 10 redundant CI/CD pipelines. Cloud cost optimization: $100,000 in annual savings. No tooling, no migration. Just curiosity. Checklist included.
3 Jun
How a screen-level performance metric reshaped platform decisions, engineering ownership, and release discipline Continue reading on Expedia Group Technology »
1 Jun
IT departments have multiplied tools, automation, and cloud to speed up delivery. But the real bottleneck is no longer the code: it is the processes around it, such as validations, security, architecture, and production. Platform Engineering is emerging to address this tension, with a product-driven, organizational, and strategic approach.
28 May
Comprehensive guide covering this topic with practical implementation details. Automating Code Review with DeepSeek in GitHub Actions on SitePoint.
24 May
ECC supercharges Anthropic's Claude Code with 60 specialized agents, 232 skills, 75 commands, and a security scanner running 1,282 tests — plus multi-harness support across Codex, Cursor, OpenCode, and GitHub Copilot. Everything Claude Code: Turn Your AI Coding Agent Into a Production Engineering Platform on SitePoint.
10 May
Maxim AI vs DeepEval vs LangSmith vs QA Wolf: Which AI Agent Testing Framework Should You Trust With Production in 2026?
SitePoint: Comprehensive guide covering this comparison, with practical implementation details. Maxim AI vs DeepEval vs LangSmith vs QA Wolf: Which AI Agent Testing Framework Should You Trust With Production in 2026? on SitePoint.
2 May
If you spend any time in Linux forums, you've seen DistroWatch's Page Hit Ranking cited as proof one distro is "more popular" than another. It isn't. Continue reading...
20 Apr
Advanced guide to deploying Claude Code as a fully autonomous agent for software engineering tasks. Covers agent scaffolding, multi-turn reasoning loops, error recovery, and integration with existing CI/CD pipelines. Includes real-world examples of agents handling full feature development cycles. Claude Code as an Autonomous Agent: Advanced Workflows (2026) on SitePoint.
28 Aug 2025
In production environments, debugging alerts can sometimes feel like finding a needle in a haystack. Over the years, I’ve found the OSI (Open Systems Interconnection) model to be a reliable guide during Root Cause Analysis (RCA) of production issues. What is the OSI Model? The OSI model is a conceptual framework that standardizes the functions of a telecommunication or computing…
25 Aug 2025
Discover how Bazaarvoice migrated millions of UGC records from RDS MySQL to AWS Aurora – at scale and with minimal user impact. Learn about the technical challenges, strategies, and outcomes that enabled this ambitious transformation in reliability, performance, and cost efficiency Bazaarvoice ingests and serves millions of user-generated content (UGC) items—reviews, ratings, questions, answers, and […]
30 Jun 2025
You may be new to this series, and if so, welcome! I encourage you to start at the beginning of our datastore journey and see the blog post “Unlocking Efficiency: A New Era for Datastore Provisioning”. Already up to date in our series? MAGICAL! Then let’s continue with a quick recap. Where are we? We…
8 Jun 2025
Are you ready for more self-service datastore adventures? If you haven’t already, have a look at our previous entries in this series: “Unlocking Efficiency: A New Era for Datastore Provisioning”, “Simplifying Datastore Provisioning with Kubernetes Operators”, and “Resolving Incidents With The Remote Incident Console”. They’re a fun read. The story so far: Last time, in Simplifying Datastore Provisioning with Kubernetes Operators…
14 Apr 2025
In the world of DevOps and Developer Experience (DevXP), speed and efficiency can make a big difference on an engineer’s day-to-day tasks. Today, we’ll dive into how Slack’s DevXP team took some existing tools and used them to optimize an end-to-end (E2E) testing pipeline. This lowered build times and reduced redundant processes, saving both time…
16 Dec 2024
Overview The past few months have been exciting times for Slack’s CI infrastructure. After years of developer frustration with Jenkins (everything from security issues to downtime to generally poor UX) internal pressure led us to move a majority of Slack’s CI jobs from Jenkins to GitHub Actions. My intern project at Slack this summer involved…
31 Aug 2023
On the racetrack of building ML applications, traditional software development steps are often overtaken. Welcome to the world of MLOps, where unique challenges meet innovative solutions and consistency is king. At Bazaarvoice, training pipelines serve as the backbone of our MLOps strategy. They underpin the reproducibility of our model builds. A glaring gap existed, however, […]
28 Jun 2023
It’s not just developers who rely on APIs. DevOps engineers and data engineers also use APIs for many reasons, including to manage cloud infrastructure. For example, you can programmatically manage resources, configure services, and perform operations using APIs. Let’s review other reasons to use cloud APIs. Reasons to use cloud APIs In addition to providing a management console and SDKs,…
21 Mar 2023
This blog post discusses the strategies that Slack uses to manage the lifecycle (development, support, and eventual retirement) of infrastructure projects, through the lens of the migration through three successive internal “platform” offerings. Our challenges Circa 2020, our Cloud Engineering team (now evolved into multiple teams responsible for narrower aspects) was responsible for managing our…
16 Mar 2023
A conversation with engineers who help run Blinkit Chinthakunta Sumanth Kumar Reddy is an SDE 3 at Blinkit. He joined us in March 2021 and has since helped us build a resilient application platform at Blinkit. He currently works as a part of Software Resilience Engineering (SRE)–enabling scalable database migrations for Blinkit’s applications. Tell us about your background and your…
23 Feb 2023
A conversation with engineers who help run Blinkit Jay Dihenkar is a Staff Engineer at Blinkit. He joined us in December 2020 and has helped different teams manage and streamline their build and release processes. He is currently working towards continuously improving the reliability, scalability, observability, developer productivity, and other such aspects of a software system critical for ensuring that…
24 Oct 2022
This case study shows how we reformed a scale-up's dev processes after uncovering severe discrepancies between the official and real way of getting things done. The post Do your engineers do what you think they do? appeared first on RisingStack Engineering.
29 Aug 2022
A second update on our Gitea migration. It's short in text, but contains a set of 10 videos recorded on the 10th of August.
12 Jul 2022
First installment of a series about moving Blender’s development from Phabricator to Gitea.
20 Dec 2021
— Building Felix, the Design System for Groupon Several new features have been released on Groupon.com recently, such as the QR code in the navigation bar to download the app, and a banner carousel to display multiple banner messages within a single view. In the past, similar product features might take 2–3 sprints to complete, but now all of these…
20 Oct 2021
About a year ago, I wrote a blog post called Building the Next Evolution of Cloud Networks at Slack. In it, we discussed how Slack’s AWS infrastructure has evolved over the years and the pain points that drove us to spin up a brand-new network architecture redesign project called Whitecastle. If you have not had…
19 Oct 2021
10 Aug 2021
Why multi-region sessions? Each year leading up to Back to School (our busiest season), Clever’s engineering team invests in our highest traffic systems to make sure we can handle user growth and new traffic patterns. During 2020–2021, SAML auth at Clever grew from <10% of our login related traffic to about 40% of our traffic! For this […] The post…
7 Jul 2021
The main drawback of a Ceph storage is that you have to host and manage it yourself. In this post, we'll check two different approaches of deploying Ceph. The post How to Deploy a Ceph Storage to Bare Virtual Machines appeared first on RisingStack Engineering.
25 Jun 2021
This Ansible tutorial teaches the basics of this open-source software provisioning, configuration management, and application-deployment tool. The post Getting Started with Ansible Tutorial – Automate your Infrastructure appeared first on RisingStack Engineering.
2 Jun 2021
A new package called runtime-env-cra allows you to handle environment variables in a quick and easy way with create-react-apps. The post Handling runtime environment variables in create-react-apps appeared first on RisingStack Engineering.
7 Jul 2020
Learn how to distribute and run Jmeter tests along multiple droplets on DigitalOcean using Terraform, Ansible, and bash scripting - to automate the process. The post Distributed Load Testing with Jmeter appeared first on RisingStack Engineering.
20 Jan 2020
In this tutorial we show how you can generate a static site with Hugo and Netlify in an easy and fast way. The post Generating a Static Site with Hugo + Netlify in 15 minutes appeared first on RisingStack Engineering.
24 Aug 2019
Shortly after I had started the work on nerdhood.de I built a deployment pipeline. The bash-based build script for my Laravel application was easy, but triggering the deployment itself turned out to be more difficult than expected. In the end I built something with two AWS Lambda functions, SNS, an […] The post Deploying with SSH using GitHub Actions appeared…
9 Apr 2019
We are currently in the process of migrating our alerting infrastructure from OMD to Atlassian’s OpsGenie. Most of the features (SMS, phone call etc.) worked out of the box but we struggled with pushing alerts back into our on-premises Jira instance. Enable logging of POST requests OpsGenie does not provide […] The post Using Atlassian OpsGenie with a localized on-premises…
12 Mar 2019
In this post we show how we solved a DNS resolution issue for a client. The tools & methods we used can be useful in case you face a similar issue later. The post Case Study: Nameserver Issue Investigation using curl, dig+trace & nslookup appeared first on RisingStack Engineering.
24 Jan 2019
I’m a fan of Travis CI and use it for continuous integration across pretty much all my open-source projects on GitHub. From time to time, I need to obtain a URL to a file in the repository in my build, e.g. to point a particular tool to it, in a way that respects branches as well as pull requests...
21 Jan 2019
Receiving “com.amazonaws.services.s3.model.AmazonS3Exception: Not Found” when using Jenkins’ pipeline-aws-plugin and s3Upload step with Minio
Schakko: I am currently working on a Jenkins declarative pipeline to connect the Jenkins builds with Kubernetes, Helm and Netflix Spinnaker. One of the TODOs has been to deploy different artifacts (e.g. a helm chart my-chart-0.0.1.tar.gz) to an AWS S3-compatible bucket inside a Minio installation with help of pipeline-aws-plugin. When running withAWS(endpointUrl: […] The post Receiving “com.amazonaws.services.s3.model.AmazonS3Exception: Not Found” when using Jenkins’…
18 Jan 2019
Are you working on an agile team? Odds are high that you probably are. Whether you do Scrum/Kanban/lean/extreme, you are all about getting work done with the least resistance possible. Heck, if you are still on Waterfall, you care about that. But how well are you doing? Do you know? Is that something a developer […]
16 Oct 2018
This post highlights some git features that might be less used/known, but can end up saving you when things go south in the codebase. The post Git Catastrophes and Tips to Avoid Them appeared first on RisingStack Engineering.
5 Sept 2018
In this post, we'll take a look at how to prepare, how you should drive the co-operation and what kind of services you can expect from IT outsourcing companies. The post Advice for Working with Professional Services Companies ( IT Outsourcing ) appeared first on RisingStack Engineering.
14 Aug 2018
In this post, I’m going to teach how you can debug a Node.js app in a Docker container to catch bugs that cannot be revealed in any other way. The post How to Debug a Node.js app in a Docker Container appeared first on RisingStack Engineering.
10 Apr 2018
The post DevOps 101 (not just) from a Node.js Perspective appeared first on RisingStack Engineering.
16 Mar 2018
On Tuesday, Wednesday, and Thursday, March 6th-8th, 2018, Clever logins failed for all customers: 1h on Tuesday, 1h15 on Wednesday, and almost 5h on Thursday. This was Clever’s single worst outage ever in length, repeatedness, and impact. This postmortem is the first of many public steps we’ll be taking to ensure Clever is a service […] The post Postmortem on…
13 Mar 2018
How to get started with Angular? What are the core libraries? Read our tips & tricks to kickstart your Angular projects, and become a front-end ninja! The post AngularJS to Angular – a brief history with some tips to get started! appeared first on RisingStack Engineering.
7 Mar 2018
The post Integrating legacy and CQRS appeared first on RisingStack Engineering.
28 Feb 2018
The post When should you use CQRS? appeared first on RisingStack Engineering.
14 Feb 2018
The post Event sourcing vs CRUD appeared first on RisingStack Engineering.
18 Dec 2017
Let's see how to use pattern matching and query params with Pact for advanced contract testing in a Node.js microservices architecture. The post Advanced Contract Testing – Pact Verification with Pattern Matching appeared first on RisingStack Engineering.
12 Dec 2017
In this article I’ll walk you through how we perform consumer driven contract testing in our Node.js microservices architecture with the Pact framework. The post Consumer Driven Contract Testing with Pact appeared first on RisingStack Engineering.
11 Dec 2017
tl;dr: Try out microplane! It’s a CLI tool to make changes across many repos. The Problem At Clever, we’ve embraced microservices. They promote modularity, which leads to simpler code bases and lets our engineers move quickly and independently. They are easier to deploy, which helps us build towards incremental, frequent deploys and continuous delivery. In […] The post Mo Repos,…
22 Nov 2017
In this Rust tutorial, I’m going to walk you through the steps of writing a modern, fast and safe native Node.js module. The post Writing fast and safe native Node.js modules with Rust appeared first on RisingStack Engineering.
19 Nov 2017
During my career as a software developer, I have seen the release frequency increasing steadily. When I started, it would take 12 to 18 months for new features to reach the customer. Years later the frequency increased, so deployment to … Continue reading →
11 Oct 2017
Monitoring gives us observability in microservices systems. In this article we theorize what kind of monitoring & instrumentation we'll need in 2018. The post The Future of Microservices Monitoring & Instrumentation appeared first on RisingStack Engineering.
13 Sept 2017
This post explains timings in an HTTP request and shows how to measure them in Node.js to discover performance bottlenecks in client/server to server comms. The post Understanding & Measuring HTTP Timings with Node.js appeared first on RisingStack Engineering.
14 Jul 2017
21 Apr 2017
23 Jan 2017
At Bazaarvoice, we’re big fans of cloud. Real big. We’re also fans of DevOps. There’s been a lot of discussion over the past several years about “What is DevOps?” Usually, this term is used to describe Systems Engineers and Site Reliability Engineers (sometimes called Infrastructure Engineers, Systems Engineers, Operations Engineers or, in the most unfortunate […]
7 Feb 2016
You may have already heard that InfluxDB 0.10 GA was published a few days ago. In my case the most interesting improvement is the much higher compression rates: At the moment my co-workers of NeosIT and I are collecting performance data from four internal virtual machines. Have been running […] The post Migrating InfluxDB from 0.9.6 to 0.10.0 GA…
13 Oct 2015
My employer NeosIT offers a web-based SMS notifying solution for organizations with security roles named ZABOS. In the last months we extended the ZABOS application to support digital alerting through POCSAG. After some problems with a third party component we implemented the ability to collect all POCSAG telegrams delivered […] The post Collecting and visualizing metrics with statsd, InfluxDB…
8 Apr 2015
Sometimes it’s obvious what code has to change, but it’s painfully hard to prove you’ve fixed it. When’s the last time a conceptually simple fix took you hours longer than planned, because you could not get the project running locally to verify your change worked? I just want to change a little CSS on […] The post Aviator: locally…
23 Jan 2014
This post is a rant about a word. A rant about a word that had a clear meaning but has been appropriated for something wholly less meaningful. The word is of course Devops. Over the last few years, as the practice itself has grown in prominence, its description has become diluted beyond the point of recovery. […]
28 Jul 2013
You know it: an old project gets reactivated because your customer needs a new feature or has found a bug. Meanwhile, the responsible developers have been reassigned to new projects and have no time to finish the task. You are currently not at a 100 percent workload so you get the […] The post DevOps: Stop coding and get your stuff…