DEVOPS
DEVOPS, agility, testing, deployment, security
Curated by Mickael Ruau
Scooped by Mickael Ruau

Definition of Ready: Dangerous or Necessary? – Serious Scrum

Is this a divisive topic? 3 Serious Scrum Editors compare perspectives.

Mickael Ruau's insight:

 The risk associated with an assumption of simplicity is a core principle of Dave Snowden’s Cynefin sense-making framework, and I encourage you to delve into that topic if you have not done so already.

 

(...)

With an immature team, this DoR gives the team the feeling that it has all the requirements for the PBI. The team enters a tunnel during the Sprint: it no longer interacts with the PO, and the only feedback it receives comes at the Sprint Review.

“Guys, it’s not at all what I expected!” the PO will say, and the developers will just answer: “But we followed the acceptance criteria!”

Sound familiar? Often the DoR doesn’t encourage collaboration between the developers and the PO during the Sprint. It only encourages the collaboration needed to mark the PBI as “Ready”, that is, to define the requirements. It gives me the feeling that we are working on mini Waterfall projects of two weeks.

Scooped by Mickael Ruau

Antifragile: How to Thrive in a World of Chaos and Uncertainty

The antifragile benefits from the unexpected

Antifragile things grow and get stronger under volatility and stress, up to a certain point. Antifragile systems feed on disorder and uncertainty to become better.

Taleb relates antifragility to the Lernaean Hydra of Greek mythology. The Hydra was a mythical monster with several heads. If a hero cut off one of its heads, two grew back in its place. The Hydra grew stronger with adversity.

So what makes something antifragile?

To become antifragile, it is better to be small and agile, so as to stay flexible in chaotic and volatile times. If I were sailing through a foggy sea with hidden icebergs, I would rather be a passenger in a tiny, maneuverable boat than on a giant, lethargic ocean liner.

The response to variability and stress is built intrinsically into the antifragile system. Unlike fragile things, which require an external response to protect them from stress, antifragile systems can protect themselves. Our skeletal system and the evolutionary process are good examples of a response to variability.

Antifragile things build in redundancy. For the antifragile, thriving in randomness is the goal, which often requires giving up efficiency in favor of layers of redundancy.
Scooped by Mickael Ruau

The Dangers of a Definition of Ready

Although not as popular as a Definition of Done, some Scrum teams use a Definition of Ready to control what product backlog items can enter an iteration.
Mickael Ruau's insight:

A Definition of Ready Is Not Always a Good Idea

So some of the rules our bouncer establishes seem like good ideas. For example, I have no objection against a team deciding not to bring into an iteration stories that are over a certain size.

But some other rules I commonly see on a Definition of Ready can cause trouble—big trouble—for a team. I’ll explain.

A Definition of Ready can be thought of like a gate into the iteration. A set of rules is established and our bouncer ensures that only stories that meet those rules are allowed in.

If these rules include saying that something must be 100 percent finished before a story can be brought into an iteration, the Definition of Ready becomes a huge step towards a sequential, stage-gate approach. This will prevent the team from being agile.

Scooped by Mickael Ruau

Collaboration between tester and developer within an agile team using a continuous integration chain

This article was written for and first published in the April 2019 issue of Programmez! magazine.

Scooped by Mickael Ruau

Why Do I Need TOSCA If I’m Using Kubernetes? Part II of II

In the first part of this series, titled “Why Do I Need * If I’m Using Kubernetes?”, I discussed why I believe the general sentiment that everything should move to Kubernetes is shortsighted and misguided.

Scooped by Mickael Ruau

Postmortem Culture: Learning from Failure

Reward Postmortem Outcomes

When well written, acted upon, and widely shared, postmortems are an effective vehicle for driving positive organizational change and preventing repeat outages. Consider the following strategies to incentivize postmortem culture.

Reward action item closeout

If you reward engineers for writing postmortems, but not for closing the associated action items, you risk an unvirtuous cycle of unclosed postmortems. Ensure that incentives are balanced between writing the postmortem and successfully implementing its action plan.

Reward positive organizational change

You can incentivize widespread implementation of postmortem lessons by presenting postmortems as an opportunity to expand impact across an organization. Reward this level of impact with peer bonuses, positive performance reviews, promotion, and the like.

Highlight improved reliability

Over time, an effective postmortem culture leads to fewer outages and more reliable systems. As a result, teams can focus on feature velocity instead of infrastructure patching. It’s intrinsically motivating to highlight these improvements in reports, presentations, and performance reviews.

Hold up postmortem owners as leaders

Celebrating postmortems through emails or meetings, or by giving the authors an opportunity to present lessons learned to an audience, can appeal to individuals that appreciate public accolades. Setting up the owner as an “expert” on a type of failure and its avoidance can be rewarding for many engineers who seek peer acknowledgment. For example, you might hear someone say, “Talk to Sara, she’s an expert now. She just coauthored a postmortem where she figured out how to fix that gap!”

Gamification

Some individuals are incentivized by a sense of accomplishment and progress toward a larger goal, such as fixing system weaknesses and increasing reliability. For these individuals, a scoreboard or burndown of postmortem action items can be an incentive. At Google, we hold “FixIt” weeks twice a year. SREs who close the most postmortem action items receive small tokens of appreciation and (of course) bragging rights. Figure 10-3 shows an example of a postmortem leaderboard.

Figure 10-3. Postmortem leaderboard
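
The scoreboard idea described above can be sketched in a few lines. This is only an illustration, not the tooling used at Google: the `ClosedItem` record, its field names, and the sample tracking IDs are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ClosedItem:
    """One closed postmortem action item (hypothetical record shape)."""
    tracking_id: str
    closed_by: str


def leaderboard(items, top=3):
    """Return (engineer, closed_count) pairs, highest count first."""
    counts = Counter(item.closed_by for item in items)
    # Sort by count descending, then by name, so ties are deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top]


# Made-up sample data for illustration only.
closed = [
    ClosedItem("AI-101", "sara"),
    ClosedItem("AI-102", "kim"),
    ClosedItem("AI-103", "sara"),
    ClosedItem("AI-104", "alex"),
    ClosedItem("AI-105", "kim"),
    ClosedItem("AI-106", "sara"),
]

for rank, (engineer, count) in enumerate(leaderboard(closed), start=1):
    print(f"{rank}. {engineer}: {count} action items closed")
```

Counting closed items rather than written postmortems keeps the incentive on closeout, which is the balance the excerpt asks for.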
Mickael Ruau's insight:
Why Is This Postmortem Better?

This postmortem exemplifies several good writing practices.

Clarity

The postmortem is well organized and explains key terms in sufficient detail. For example:

Glossary

  • A well-written glossary makes the postmortem accessible and comprehensible to a broad audience.

Action items

  • This was a large incident with many action items. Grouping action items by theme makes it easier to assign owners and priorities.

Quantifiable metrics

  • The postmortem presents useful data on the incident, such as cache hit ratios, traffic levels, and duration of the impact. Relevant sections of the data are presented with links back to the original sources. This data transparency removes ambiguity and provides context for the reader.

Concrete action items

A postmortem with no action items is ineffective. These action items have a few notable characteristics:

Ownership

  • All action items have both an owner and a tracking number.

Prioritization

  • All action items are assigned a priority level.

Measurability

  • The action items have a verifiable end state (e.g., “Add an alert when more than X% of our machines have been taken away from us”).

Preventative action

  • Each action item “theme” has Prevent/Mitigate action items that help avoid outage recurrence (for example, “Disallow any single operation from affecting servers spanning namespace/class boundaries”).
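
As a rough sketch of the first three characteristics (ownership, prioritization, measurability), an action item can be modeled as a record that is checked before the postmortem is circulated. The `ActionItem` class, the `Priority` levels, and the `problems` helper are hypothetical names chosen for this example; they are not taken from the book.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Priority(Enum):
    """Hypothetical priority levels; use whatever scale your tracker defines."""
    P0 = 0
    P1 = 1
    P2 = 2


@dataclass
class ActionItem:
    title: str
    owner: str                    # Ownership: who is accountable
    tracking_id: str              # Ownership: ticket or bug number
    priority: Optional[Priority]  # Prioritization
    end_state: str                # Measurability: a verifiable condition


def problems(item):
    """List the characteristics this action item is still missing."""
    issues = []
    if not item.owner.strip():
        issues.append("missing owner")
    if not item.tracking_id.strip():
        issues.append("missing tracking number")
    if item.priority is None:
        issues.append("missing priority")
    if not item.end_state.strip():
        issues.append("missing verifiable end state")
    return issues


complete = ActionItem(
    title="Alert on machine loss",
    owner="sara",
    tracking_id="AI-101",
    priority=Priority.P1,
    end_state="Alert fires when more than X% of our machines are taken away",
)
vague = ActionItem("Be more careful", "", "AI-102", Priority.P2, "")

print(problems(complete))  # no issues reported
print(problems(vague))     # reports the missing owner and end state
```

The fourth characteristic, having Prevent/Mitigate items per theme, applies to a group of items rather than a single one, so it is left out of this per-item check.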

Blamelessness

The authors focused on the gaps in system design that permitted undesirable failure modes. For example:

Things that went poorly

  • No individual or team is blamed for the incident.

Root cause and trigger

  • Focuses on “what” went wrong, not “who” caused the incident.

Action items

  • Are aimed at improving the system instead of improving people.

Depth

Rather than only investigating the proximate area of the system failure, the postmortem explores the impact and system flaws across multiple teams. Specifically:

Impact

  • This section contains lots of details from various perspectives, making it balanced and objective.

Root cause and trigger

  • This section performs a deep dive on the incident and arrives at a root cause and trigger.

Data-driven conclusions

  • All of the conclusions presented are based on facts and data. Any data used to arrive at a conclusion is linked from the document.

Additional resources

  • These present further useful information in the form of graphs. Graphs are explained to give context to readers who aren’t familiar with the system.

Promptness

The postmortem was written and circulated less than a week after the incident was closed. A prompt postmortem tends to be more accurate because information is fresh in the contributors’ minds. The people who were affected by the outage are waiting for an explanation and some demonstration that you have things under control. The longer you wait, the more they will fill the gap with the products of their imagination. That seldom works in your favor!

Conciseness

The incident was a global one, impacting multiple systems. As a result, the postmortem recorded and subsequently parsed a lot of data. Lengthy data sources, such as chat transcripts and system logs, were abstracted, with the unedited versions linked from the main document. Overall, the postmortem strikes a balance between verbosity and readability.

Scooped by Mickael Ruau

Frugal innovation: designing better with less!

Thinking about new uses, together with the spirit of frugal innovation or of “Life Hacking”, invites us to regard everything as an “ephemeral shell”: what counts is what we can make of it! Generate alternatives and work from a shifted perspective. We need to break out of a binary system:

High-end for the Happy Few,
Low-end for the less privileged.

Envision progress differently.
Scooped by Mickael Ruau

How Uber, Airbnb, and Etsy Attracted Their First 1,000 Customers - HBS Working Knowledge - Harvard Business School

Thales Teixeira studies three of the most successful “platform” startups to understand the chicken-and-egg challenge of how companies can attract their first customers.
Scooped by Mickael Ruau

SLO, SLA, SLI Oh My! Creating them can be easy

Service level objectives and agreements are a great way to create accountability in all layers of your organization.
Scooped by Mickael Ruau

On Infrastructure at Scale: A Cascading Failure of Distributed Systems

At Target, we run a heterogeneous infrastructure in our datacenters (and many other places), where we have multiple different backend hosting infrastructures for workloads. Most of this is a legacy…
Scooped by Mickael Ruau

OpenStack Collaboration made in heaven with Heat, Mistral, Neutron an…

Cross-project collaboration is something the OpenStack community has embraced for a long time. Common libraries like Oslo reduce the time and effort needed to build a new service. Another way this manifests is in new OpenStack services being built on existing services to solve a higher-level use case.

In this talk we present how the band of projects comprising Mistral, Tacker, Neutron, Heat, TOSCA-parser and Barbican came together to build an industry-leading ETSI NFV Orchestrator that leveraged the best of these projects. Each of these projects brought in critical functionality needed for the final product. You will learn how, when strung together, this solution follows the classic Microservices design pattern that the industry is rapidly adopting.

Scooped by Mickael Ruau

Network function virtualization - Wikipedia

Network functions virtualization (also network function virtualization or NFV) is a network architecture concept that uses the technologies of IT virtualization to virtualize entire classes of network node functions into building blocks that may connect, or chain together, to create communication services.

Scooped by Mickael Ruau

Dying well: How to conduct an effective post mortem (failure analysis)

The process of analysing a failure is called a “post mortem”, “retrospective”, “after action report”, “failure analysis” or a host of other names. We’ll stick to “post mortem” for this post as it’s perhaps the most common.

Complex Systems

To understand why software fails, it’s perhaps worth first stepping back and examining what we mean by “software”.

A software service provides some functionality to its users that allows them to do something valuable. For example, a word processor may help users form, view, render and print a document. Or a website might allow the purchase of a nice shirt.

Each of these components of software is essentially unimaginably complex.

Mickael Ruau's insight:

 

The nature of failure in complex systems

Any system of sufficient complexity is in a constant state of partial failure (degradation). Generally speaking, in both the human and technical expression of the system there’s a level of “tolerated failure”. This might include things like:

  • Technical: A single replica in a multi-node web service stops responding properly and needs to be restarted
  • Technical: Network congestion at an intermediary increases ~5% of application response times
  • Technical: In periods of high traffic, the database might reject connections it knows it cannot tolerate, leading to user errors
  • Human: The business owner may choose to take on technical debt to push a feature quickly to market
  • Human: A developer may choose to build in addressing technical debt in feature work

However, occasionally multiple failures chain together and create some sort of catastrophe or “incident”. These failures almost never have a single root cause but are rather the result of conflicting pressures or discrepancies in the models of software.

The process of responding to these catastrophes is called “incident response”.

Scooped by Mickael Ruau

[1404.3056] Principles of Antifragile Software

The goal of this paper is to study and define the concept of "antifragile software". For this, I start from Taleb's statement that antifragile systems love errors, and discuss whether traditional software dependability fits into this class. The answer is somewhat negative, although adaptive fault tolerance is antifragile: the system learns something when an error happens, and always improves. Automatic runtime bug fixing is changing the code in response to errors, fault injection in production means injecting errors in business critical software. I claim that both correspond to antifragility. Finally, I hypothesize that antifragile development processes are better at producing antifragile software systems.
Scooped by Mickael Ruau

From Scrum Master to VP of Engineering: why job titles matter

by Marco Massenzio

Over the past 20 years I’ve been a Senior Engineering Manager at Google. I’ve been a VP of Engineering at a couple of startups and Director of Engineering at a couple more. And I’m now a Senior Architect at Apple.

As a consequence, in the last ten years I have directly hired more than 100 people in various technical functions, at various levels of seniority, for teams of all sizes.

Additionally, I’ve had decisive influence over whether to take senior executives on board (or not).

And I can tell you one thing for sure: your job title matters. A lot.

Scooped by Mickael Ruau

SAFe: the Release Train Engineer, keystone of the agile train

The Release Train Engineer (RTE) is the keystone of a SAFe agile train; discover the traits that make a good RTE.
Mickael Ruau's insight:

The RTE, Scrum Master of the Scrum Masters

For Scrum practitioners, the most visible and most easily grasped role of the RTE is that of Scrum Master of the Scrum Masters. Twice a week the RTE organizes ART Syncs (see the article on ART Syncs). The equivalent of a Scrum of Scrums with the Scrum Masters and the Product Owners, this event aims to identify anything that could block all or part of the agile train and, if necessary, to define workarounds.

The RTE sees to it that the Scrum Masters of the various teams build their skills and take proper ownership of the events they must facilitate together.

The RTE, facilitator of the main program-level events

The RTE relies on the Scrum Masters to facilitate the main SAFe event, namely PI Planning. A real high mass that brings together all of the agile teams and the program's sponsors, this event gathers dozens of people who must collaborate effectively over two days to rough out the work for several Sprints. The RTE and the Scrum Masters make sure the teams stay aligned during the two team breakout sessions. To do so, they must coordinate closely and quickly throughout the two days. Facilitating a PI Planning is like watching a dozen pans of milk on the stove, all close to boiling over... it takes a real ability to juggle and unshakeable calm.

To prepare this event, the RTE must ensure the quality of the Features supplied by the Product Managers and System Architects, the logistics of the room, the supplies, the presence of the stakeholders, the schedule of the various talks... such a mobilization of people cannot be held up by a logistical detail.
Preparing a PI Planning takes real anticipation!

Apart from this major event, the RTE facilitates the ART Sync, the Inspect and Adapt, the System Demos and potentially Release Management as well.

Scooped by Mickael Ruau

CFTL book – Organizing testing in Agile

This article was written for and published in the CFTL book "Les tests en agile". The layout is therefore considerably better in the book. Nevertheless, the content remains the same.

Mickael Ruau's insight:

Here is a diagram that illustrates the difference in scale between releases to production under the V-model and under agile methods:

 

Scooped by Mickael Ruau

Building Fabric Infrastructure for an OpenStack Private Cloud « ipSpace.net blog

An attendee in my Building Next-Generation Data Center online course was asked to deploy numerous relatively small OpenStack clou
Scooped by Mickael Ruau

Example project – Customer Development: Co-founders wanted (part 5) | Ywan van Loon

This article is part of a series of articles that serve as an example of how to use Customer Development in practice. If you didn’t read the introduction, please read it first.

Now we start with “State Your Business Model Hypotheses”, the first part of Customer Discovery. We fill in the checklists, which are all based on Steve Blank's Customer Development. This part describes checklists number 9, 10, 11 and 12. Part 3 describes checklists number 3, 4 and 5, and part 4 describes checklists number 4 6, 7 and 8.

Scooped by Mickael Ruau

The Lean Startup Circle Wiki / Validation Tools

Due To the Frequency of "Why Is No One Leaving Their E-Mail on My Landing Page" Question: Always Start With Conversations with Prospects

 

Blog posts on Customer/Product Validation (see also Customer Interview Template and Resources )

Scooped by Mickael Ruau

An Engineer’s Guide To SLA, SLO, and SLI.

Engineers want software systems to be massive, yet be agile, to perform at the highest class, and to not compromise on security. They want software with the ability to scale, be simple in design…
Mickael Ruau's insight:

In summary,
1. SLIs are ways for engineers to communicate quantitative data about systems.
2. SLOs are designed to provide a certain level of service, defined using SLIs.
3. SLAs are exchanged on the basis of understanding the SLOs which teams adopt.
4. If user behaviour is not included in these definitions, they remain deficient.
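
To make the first three points of the summary concrete, here is a minimal sketch, assuming a request-based availability SLI. The function names, the `ServiceLevels` class, and the sample targets are illustrative assumptions, not taken from the article. The rule that the SLA is no stricter than the SLO is a common convention, encoded here as an assumption as well.

```python
from dataclasses import dataclass


def availability_sli(good_requests, total_requests):
    """SLI: the measured fraction of requests served successfully."""
    if total_requests == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_requests / total_requests


@dataclass(frozen=True)
class ServiceLevels:
    slo_target: float  # internal objective, expressed in terms of the SLI
    sla_target: float  # level promised to customers, based on the SLO

    def __post_init__(self):
        # Assumption: the external promise is never stricter than the
        # internal objective it is derived from.
        if self.sla_target > self.slo_target:
            raise ValueError("SLA target should not exceed the SLO target")

    def meets_slo(self, sli):
        return sli >= self.slo_target

    def meets_sla(self, sli):
        return sli >= self.sla_target


levels = ServiceLevels(slo_target=0.999, sla_target=0.995)
sli = availability_sli(good_requests=9_970, total_requests=10_000)

print(f"SLI = {sli:.4f}")
print("SLO met:", levels.meets_slo(sli))
print("SLA met:", levels.meets_sla(sli))
```

With these sample numbers the service misses its internal objective while still honoring the customer-facing agreement, which is the early warning an SLO is meant to give. Point 4 is a reminder that the definition of a "good" request should reflect what users actually experience.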

Scooped by Mickael Ruau

SLA vs KPI: Service Level Agreements vs. Key Performance Indicators

What is the difference between Service Level Agreement measurements and Key Performance Indicators? Well, although sometimes they are referred to as synonyms, there are a few differences.

Mickael Ruau's insight:

Let us take a few examples that outline the differences. Consider a help desk service.

  • SLA examples (for a particular customer): reaction time, resolution time, compliance with agreed deadlines
  • KPI examples (organization or service oriented): average reaction time for all customers, service desk employee load, incoming ticket volume trend, required capacity to fulfil SLA promises to customers

To sum it up: SLAs are about the minimal, expected and agreed quality of a service provided to a customer, whereas KPIs are about desired operational efficiency and organizational goals. It is important to measure both service level compliance and key performance indicators in order to keep promises and excel in service quality.
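
The help desk example can be sketched as two measurements over the same tickets: one scoped to a single customer's agreement, the other across the whole organization. The `Ticket` record, the customer names, and the agreed reaction times below are made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Ticket:
    customer: str
    reaction_minutes: float


# Hypothetical per-customer SLA: maximum agreed reaction time in minutes.
SLA_MAX_REACTION = {"acme": 30, "globex": 60}


def sla_compliance(tickets, customer):
    """SLA view: share of one customer's tickets answered within that
    customer's agreed reaction time. Returns None if there are no tickets."""
    own = [t for t in tickets if t.customer == customer]
    if not own:
        return None
    limit = SLA_MAX_REACTION[customer]
    return sum(t.reaction_minutes <= limit for t in own) / len(own)


def kpi_average_reaction(tickets):
    """KPI view: average reaction time across all customers."""
    return mean(t.reaction_minutes for t in tickets)


tickets = [
    Ticket("acme", 20.0),
    Ticket("acme", 40.0),
    Ticket("globex", 50.0),
    Ticket("globex", 10.0),
]

print("acme SLA compliance:", sla_compliance(tickets, "acme"))
print("globex SLA compliance:", sla_compliance(tickets, "globex"))
print("average reaction time (KPI):", kpi_average_reaction(tickets))
```

In this sample the organization-wide average looks acceptable while one customer's agreement is breached on half of its tickets, which is why the excerpt recommends measuring both.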

Scooped by Mickael Ruau

The Site Reliability Engineering Journey

Tayllan's adventures through the worlds of Web Development, Software Architecture, DevOps and SRE!
Scooped by Mickael Ruau

What is a Virtual Network Function or VNF?

Virtual network function or VNF, often used interchangeably w/ network functions virtualization (NFV), offers a new way to design & manage network services.
Scooped by Mickael Ruau

Network Virtualization - ScienceDirect

Multiple tenants (customers) are hosted in some of the large cloud data center environments. Not only are these tenants provided dedicated virtual mac…