DEVOPS
DEVOPS, agility, testing, deployment, security
Curated by Mickael Ruau
Scooped by Mickael Ruau

The Dangers of a Definition of Ready

Although not as popular as a Definition of Done, some Scrum teams use a Definition of Ready to control what product backlog items can enter an iteration.
Mickael Ruau's insight:

A Definition of Ready Is Not Always a Good Idea

So some of the rules our bouncer establishes seem like good ideas. For example, I have no objection against a team deciding not to bring into an iteration stories that are over a certain size.

But some other rules I commonly see on a Definition of Ready can cause trouble—big trouble—for a team. I’ll explain.

A Definition of Ready can be thought of like a gate into the iteration. A set of rules is established and our bouncer ensures that only stories that meet those rules are allowed in.

If these rules include saying that something must be 100 percent finished before a story can be brought into an iteration, the Definition of Ready becomes a huge step towards a sequential, stage-gate approach. This will prevent the team from being agile.
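The gate metaphor can be sketched in a few lines of Python, assuming a Definition of Ready is nothing more than a list of named checks that a story must pass before entering the iteration. The `Story` fields, the 8-point size cap, and the rule names are hypothetical illustrations, not taken from the article. The sketch contrasts a size-only gate with a gate that also demands upfront work be 100 percent finished.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Story:
    title: str
    points: int            # hypothetical size estimate in story points
    design_complete: bool  # hypothetical "100% finished upfront" flag

# A rule is a (name, predicate) pair evaluated against a story.
Rule = tuple[str, Callable[[Story], bool]]

def is_ready(story: Story, rules: list[Rule]) -> tuple[bool, list[str]]:
    """Return whether the story passes every rule, plus the names of failed rules."""
    failed = [name for name, check in rules if not check(story)]
    return (not failed, failed)

MAX_POINTS = 8  # hypothetical size cap chosen by the team

# Lightweight gate: only a size limit, the kind of rule the article finds reasonable.
size_only: list[Rule] = [("small enough", lambda s: s.points <= MAX_POINTS)]

# Stage-gate style: additionally requires upfront design to be fully complete.
stage_gate: list[Rule] = size_only + [("design 100% done", lambda s: s.design_complete)]

story = Story("Export report as CSV", points=5, design_complete=False)
print(is_ready(story, size_only))   # (True, [])
print(is_ready(story, stage_gate))  # (False, ['design 100% done'])
```

With the second rule set, the same small story is held outside the iteration until the upfront work is finished, which is the sequential hand-off the article warns about.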


Collaboration Between Tester and Developer in an Agile Team Using a Continuous Integration Pipeline


This article was written for, and first published in, the April 2019 issue of the magazine Programmez!


Why Do I Need TOSCA If I’m Using Kubernetes? Part II of II

In the first part of this series, titled “Why Do I Need * If I’m Using Kubernetes?”, I discussed why I believe the general sentiment that everything should move to Kubernetes is shortsighted and misguided.


Postmortem Culture: Learning from Failure

Reward Postmortem Outcomes

When well written, acted upon, and widely shared, postmortems are an effective vehicle for driving positive organizational change and preventing repeat outages. Consider the following strategies to incentivize postmortem culture.

Reward action item closeout

If you reward engineers for writing postmortems, but not for closing the associated action items, you risk an unvirtuous cycle of unclosed postmortems. Ensure that incentives are balanced between writing the postmortem and successfully implementing its action plan.

Reward positive organizational change

You can incentivize widespread implementation of postmortem lessons by presenting postmortems as an opportunity to expand impact across an organization. Reward this level of impact with peer bonuses, positive performance reviews, promotion, and the like.

Highlight improved reliability

Over time, an effective postmortem culture leads to fewer outages and more reliable systems. As a result, teams can focus on feature velocity instead of infrastructure patching. It’s intrinsically motivating to highlight these improvements in reports, presentations, and performance reviews.

Hold up postmortem owners as leaders

Celebrating postmortems through emails or meetings, or by giving the authors an opportunity to present lessons learned to an audience, can appeal to individuals that appreciate public accolades. Setting up the owner as an “expert” on a type of failure and its avoidance can be rewarding for many engineers who seek peer acknowledgment. For example, you might hear someone say, “Talk to Sara, she’s an expert now. She just coauthored a postmortem where she figured out how to fix that gap!”

Gamification

Some individuals are incentivized by a sense of accomplishment and progress toward a larger goal, such as fixing system weaknesses and increasing reliability. For these individuals, a scoreboard or burndown of postmortem action items can be an incentive. At Google, we hold “FixIt” weeks twice a year. SREs who close the most postmortem action items receive small tokens of appreciation and (of course) bragging rights. Figure 10-3 shows an example of a postmortem leaderboard.

Figure 10-3. Postmortem leaderboard
Mickael Ruau's insight:
Why Is This Postmortem Better?

This postmortem exemplifies several good writing practices.

Clarity

The postmortem is well organized and explains key terms in sufficient detail. For example:

Glossary

  • A well-written glossary makes the postmortem accessible and comprehensible to a broad audience.

Action items

  • This was a large incident with many action items. Grouping action items by theme makes it easier to assign owners and priorities.

Quantifiable metrics

  • The postmortem presents useful data on the incident, such as cache hit ratios, traffic levels, and duration of the impact. Relevant sections of the data are presented with links back to the original sources. This data transparency removes ambiguity and provides context for the reader.

Concrete action items

A postmortem with no action items is ineffective. These action items have a few notable characteristics:

Ownership

  • All action items have both an owner and a tracking number.

Prioritization

  • All action items are assigned a priority level.

Measurability

  • The action items have a verifiable end state (e.g., “Add an alert when more than X% of our machines have been taken away from us”).

Preventative action

  • Each action item “theme” has Prevent/Mitigate action items that help avoid outage recurrence (for example, “Disallow any single operation from affecting servers spanning namespace/class boundaries”).
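The characteristics listed above map naturally onto a record type. Below is a minimal Python sketch, assuming each action item carries a theme, a kind, and optional fields for owner, tracking number, priority, and verifiable end state. The `DETECT` category, the `P0`/`P1` labels, the ticket IDs, and the owner name are hypothetical. The sketch flags items missing any required characteristic and themes that have no Prevent/Mitigate item.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Kind(Enum):
    PREVENT = "prevent"
    MITIGATE = "mitigate"
    DETECT = "detect"  # hypothetical extra category

@dataclass
class ActionItem:
    theme: str
    description: str
    kind: Kind
    owner: Optional[str] = None        # ownership
    tracking_id: Optional[str] = None  # ticket or bug number
    priority: Optional[str] = None     # e.g. "P1" (hypothetical scheme)
    done_when: Optional[str] = None    # verifiable end state

def missing_fields(item: ActionItem) -> list[str]:
    """Names of the required characteristics this item lacks."""
    required = ("owner", "tracking_id", "priority", "done_when")
    return [name for name in required if not getattr(item, name)]

def themes_lacking_prevention(items: list[ActionItem]) -> set[str]:
    """Themes that contain no Prevent or Mitigate action item."""
    kinds_by_theme = defaultdict(set)
    for item in items:
        kinds_by_theme[item.theme].add(item.kind)
    return {theme for theme, kinds in kinds_by_theme.items()
            if not kinds & {Kind.PREVENT, Kind.MITIGATE}}

items = [
    ActionItem("capacity", "Add an alert when more than X% of our machines are taken away",
               Kind.DETECT, owner="alice", tracking_id="BUG-1", priority="P1",
               done_when="alert fires in a staged test"),
    ActionItem("blast radius", "Disallow any single operation spanning namespace/class boundaries",
               Kind.PREVENT, tracking_id="BUG-2", priority="P0",
               done_when="validator rejects such operations"),
]
print(missing_fields(items[0]))          # []
print(missing_fields(items[1]))          # ['owner']
print(themes_lacking_prevention(items))  # {'capacity'}
```

A check like this could run whenever the postmortem document is reviewed, so an unowned or unmeasurable item is caught before the postmortem is closed.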

Blamelessness

The authors focused on the gaps in system design that permitted undesirable failure modes. For example:

Things that went poorly

  • No individual or team is blamed for the incident.

Root cause and trigger

  • Focuses on “what” went wrong, not “who” caused the incident.

Action items

  • Are aimed at improving the system instead of improving people.

Depth

Rather than only investigating the proximate area of the system failure, the postmortem explores the impact and system flaws across multiple teams. Specifically:

Impact

  • This section contains lots of details from various perspectives, making it balanced and objective.

Root cause and trigger

  • This section performs a deep dive on the incident and arrives at a root cause and trigger.

Data-driven conclusions

  • All of the conclusions presented are based on facts and data. Any data used to arrive at a conclusion is linked from the document.

Additional resources

  • These present further useful information in the form of graphs. Graphs are explained to give context to readers who aren’t familiar with the system.

Promptness

The postmortem was written and circulated less than a week after the incident was closed. A prompt postmortem tends to be more accurate because information is fresh in the contributors’ minds. The people who were affected by the outage are waiting for an explanation and some demonstration that you have things under control. The longer you wait, the more they will fill the gap with the products of their imagination. That seldom works in your favor!

Conciseness

The incident was a global one, impacting multiple systems. As a result, the postmortem recorded and subsequently parsed a lot of data. Lengthy data sources, such as chat transcripts and system logs, were abstracted, with the unedited versions linked from the main document. Overall, the postmortem strikes a balance between verbosity and readability.


Frugal Innovation: Designing Better with Less!

Thinking about new uses, and about the spirit of frugal innovation or “Life Hacking”, invites us to see everything as an “ephemeral shell”, valued for what we can make of it! Generate alternatives and work from an unexpected angle. We must break out of a binary system:

High-end for the happy few,
Low-end for the less privileged.

Envision progress differently.

How Uber, Airbnb, and Etsy Attracted Their First 1,000 Customers - HBS Working Knowledge - Harvard Business School

Thales Teixeira studies three of the most successful “platform” startups to understand the chicken-and-egg challenge of how companies can attract their first customers.

SLO, SLA, SLI Oh My! Creating them can be easy

Service level objectives and agreements are a great way to create accountability in all layers of your organization.

On Infrastructure at Scale: A Cascading Failure of Distributed Systems

At Target, we run a heterogeneous infrastructure in our datacenters (and many other places), where we have multiple different backend hosting infrastructure for workloads. Most of this is a legacy…

OpenStack Collaboration made in heaven with Heat, Mistral, Neutron an…

Cross-project collaboration is something the OpenStack community has embraced for a long time. Common libraries like Oslo reduce the time and effort needed to build a new service. Another way this manifests is in new OpenStack services being built on top of existing services to solve a higher-level use case.

In this talk we present how a band of projects, Mistral, Tacker, Neutron, Heat, TOSCA-parser, and Barbican, came together to build an industry-leading ETSI NFV Orchestrator that leverages the best of each. Each of these projects brought critical functionality needed for the final product. You will learn how, when strung together, this solution follows the classic microservices design pattern that the industry is rapidly adopting.


Network function virtualization - Wikipedia

Network functions virtualization (also network function virtualization or NFV) is a network architecture concept that uses the technologies of IT virtualization to virtualize entire classes of network node functions into building blocks that may connect, or chain together, to create communication services.
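The idea of chaining virtualized functions into a service can be illustrated with a toy Python model, assuming each virtual network function is simply a function from packet to packet, or to `None` when it drops the traffic. The `firewall`, `nat`, and `chain` helpers and the addresses (documentation ranges) are hypothetical; real NFV deployments chain VNFs through the network under an orchestrator, not as in-process function calls.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional

@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    dport: int

# A VNF maps a packet to a (possibly rewritten) packet, or None to drop it.
VNF = Callable[[Packet], Optional[Packet]]

def firewall(blocked_ports: set[int]) -> VNF:
    def vnf(p: Packet) -> Optional[Packet]:
        return None if p.dport in blocked_ports else p
    return vnf

def nat(public_ip: str) -> VNF:
    def vnf(p: Packet) -> Optional[Packet]:
        return replace(p, src=public_ip)  # rewrite the source address
    return vnf

def chain(*vnfs: VNF) -> VNF:
    """Compose VNFs into one service; stop as soon as one drops the packet."""
    def service(p: Packet) -> Optional[Packet]:
        for vnf in vnfs:
            p = vnf(p)
            if p is None:
                return None
        return p
    return service

branch_service = chain(firewall({23}), nat("203.0.113.10"))
print(branch_service(Packet("10.0.0.5", "198.51.100.7", 443)))
# Packet(src='203.0.113.10', dst='198.51.100.7', dport=443)
print(branch_service(Packet("10.0.0.5", "198.51.100.7", 23)))  # None
```

Because each function only depends on the packet it receives, the same building blocks can be reordered or reused in other chains, which is the composability NFV aims for.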


Architecture of NFV Platform for Orchestrating Cloud-based & vBranch Managed Services

Solving reliability fears with service level objectives (Google)


From Scrum Master to VP of Engineering: why job titles matter

by Marco Massenzio

Over the past 20 years I’ve been a Senior Engineering Manager at Google. I’ve been a VP of Engineering at a couple of startups and Director of Engineering at a couple more. And I’m now a Senior Architect at Apple.

As a consequence, in the last ten years I have directly hired more than 100 people in various technical functions, at various levels of seniority, for teams of all sizes.

Additionally, I’ve had decisive influence over whether to take senior executives on board (or not).

And I can tell you one thing for sure: your job title matters. A lot.


SAFe: The Release Train Engineer, Keystone of the Agile Train

The Release Train Engineer (RTE) is the keystone of a SAFe agile train. Discover the traits that make a good RTE.
Mickael Ruau's insight:

The RTE, Scrum Master of the Scrum Masters

For Scrum practitioners, the RTE's first visible and easily grasped role is that of Scrum Master of the Scrum Masters. Twice a week, the RTE runs ART Syncs (see the article on ART Syncs). The equivalent of a Scrum of Scrums with the Scrum Masters and Product Owners, this event aims to identify anything that could block all or part of the agile train and, if necessary, to define workarounds.

The RTE oversees the upskilling of the Scrum Masters across the teams and makes sure they take proper ownership of the events they must facilitate together.

The RTE, facilitator of the main program-level events

The RTE relies on the Scrum Masters to facilitate SAFe's main event, PI Planning. A true high mass that brings together all the agile teams and the program's business stakeholders, this event gathers dozens of people who must collaborate effectively over two days to clear the ground for work spanning several Sprints. The RTE and the Scrum Masters keep the teams aligned during the two team breakout sessions, which requires close, fast coordination throughout both days. Facilitating a PI Planning is like watching a dozen pans of milk on the stove, all close to boiling over: it takes real juggling skills and Olympian calm.

To prepare this event, the RTE must ensure the quality of the Features supplied by the Product Managers and System Architects, as well as the room logistics, supplies, stakeholder attendance, and the agenda of the various presentations. Mobilizing so many people cannot be held up by a logistical detail.
Preparing a PI Planning requires real foresight!

Beyond this major event, the RTE facilitates the ART Sync, the Inspect and Adapt, the System Demos, and potentially Release Management as well.


CFTL Book: Organizing Testing in Agile


This article was written for, and published in, the CFTL book "Les tests en agile" (Testing in Agile). The layout is therefore noticeably better in the book, but the content is the same.

Mickael Ruau's insight:

Here is a diagram for grasping the difference in scale between production releases under the V-model and under agile methods:

[Diagram: production releases, V-model vs. agile methods]


Building Fabric Infrastructure for an OpenStack Private Cloud « ipSpace.net blog

An attendee in my Building Next-Generation Data Center online course was asked to deploy numerous relatively small OpenStack clouds…

Example project – Customer Development: Co-founders wanted (part 5) | Ywan van Loon


This article is part of a series of articles, as an example how to use Customer Development in practice. If you didn’t read the introduction, please read it first.

Now we start with “State your business model hypotheses”, the first part of Customer Discovery, by filling in the checklists. The checklists are all based on Steve Blank’s Customer Development. In this part we describe checklists 9, 10, 11, and 12. Part 3 describes checklists 3, 4, and 5, and part 4 describes checklists 6, 7, and 8.


The Lean Startup Circle Wiki / Validation Tools


Due To the Frequency of "Why Is No One Leaving Their E-Mail on My Landing Page" Question: Always Start With Conversations with Prospects

Blog posts on Customer/Product Validation (see also Customer Interview Template and Resources )


An Engineer’s Guide To SLA, SLO, and SLI.

Engineers want software systems to be massive, yet be agile, to perform at the highest class, and to not compromise on security. They want software with the ability to scale, be simple in design…
Mickael Ruau's insight:

In summary,
1. SLIs are ways for engineers to communicate quantitative data about systems.
2. SLOs are designed to provide a certain level of service, defined using SLIs.
3. SLAs are exchanged on the basis of understanding the SLOs which teams adopt.
4. If user behaviour is not included in these definitions, they remain deficient.
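The relationship in points 1 to 3 can be made concrete with a small Python sketch, assuming the common convention of defining an SLI as the ratio of good events to valid events over a window. The `Window` type, the request counts, and the 99.9% (SLO) and 99% (SLA) targets are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Window:
    """Event counts for one measurement window (e.g. one week)."""
    good_events: int   # e.g. requests answered successfully and fast enough
    valid_events: int  # all requests that count toward the SLI

def sli(w: Window) -> float:
    """SLI: proportion of good events among valid events."""
    if w.valid_events == 0:
        return 1.0  # no traffic: treat the window as compliant (a design choice)
    return w.good_events / w.valid_events

def meets_slo(w: Window, slo_target: float) -> bool:
    """SLO: an internal target the SLI should meet over the window."""
    return sli(w) >= slo_target

def sla_breached(w: Window, sla_target: float) -> bool:
    """SLA: a contractual target, usually looser than the SLO, with consequences if missed."""
    return sli(w) < sla_target

week = Window(good_events=998_700, valid_events=1_000_000)
print(f"SLI = {sli(week):.2%}")             # SLI = 99.87%
print(meets_slo(week, slo_target=0.999))    # False: internal objective missed
print(sla_breached(week, sla_target=0.99))  # False: contract still honored
```

Keeping the SLA target looser than the SLO gives the team a margin: the internal objective is missed and acted on before any customer contract is breached.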


SLA vs KPI: Service Level Agreements vs. Key Performance Indicators

What is the difference between Service Level Agreement measurements and Key Performance Indicators? Well, although sometimes they are referred to as synonyms, there are a few differences.

Mickael Ruau's insight:

Let us take a few examples that outline the differences. Consider a help desk service.

  • SLA examples (for a particular customer): reaction time, resolution time, compliance to agreed deadlines
  • KPI examples (organization or service oriented): average reaction time for all customers, service desk employee load, incoming ticket volume trend, required capacity to fulfil SLA promises to customers

To sum it up: SLAs are about the minimal, expected, and agreed quality of a service provided to a customer, whereas KPIs are about desired operational efficiency and organizational goals. It is important to measure both service level compliance and key performance indicators in order to keep promises and excel in service quality.
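The help desk example can be sketched in Python to show the difference in scope: the SLA metric is computed per customer against that customer's agreed limit, while the KPI aggregates across all customers. The customer names, the per-customer limits, and the ticket data are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Ticket:
    customer: str
    reaction_minutes: float

# Hypothetical per-customer SLAs: maximum reaction time agreed in each contract.
SLA_REACTION_LIMIT = {"acme": 30, "globex": 60}

def sla_compliance(tickets: list[Ticket], customer: str) -> float:
    """SLA view: share of one customer's tickets answered within that customer's limit."""
    own = [t for t in tickets if t.customer == customer]
    if not own:
        return 1.0
    limit = SLA_REACTION_LIMIT[customer]
    return sum(t.reaction_minutes <= limit for t in own) / len(own)

def kpi_average_reaction(tickets: list[Ticket]) -> float:
    """KPI view: organization-wide mean reaction time across all customers."""
    return mean(t.reaction_minutes for t in tickets)

tickets = [Ticket("acme", 20), Ticket("acme", 45), Ticket("globex", 50), Ticket("globex", 10)]
print(sla_compliance(tickets, "acme"))    # 0.5
print(sla_compliance(tickets, "globex"))  # 1.0
print(kpi_average_reaction(tickets))      # 31.25
```

Here the organization-wide average looks healthy while one customer's SLA is only half met, which illustrates why both kinds of measurement are needed.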


The Site Reliability Engineering Journey

Tayllan's adventures through the worlds of Web Development, Software Architecture, DevOps and SRE!

What is a Virtual Network Function or VNF?

Virtual network function, or VNF, often used interchangeably with network functions virtualization (NFV), offers a new way to design and manage network services.

Network Virtualization - ScienceDirect

Multiple tenants (customers) are hosted in some of the large cloud data center environments. Not only are these tenants provided dedicated virtual mac…

SRE vs. DevOps: competing standards or close friends?


DevOps emerged as a culture and a set of practices that aims to reduce the gaps between software development and software operation. However, the DevOps movement does not explicitly define how to succeed in these areas. In this way, DevOps is like an abstract class or interface in programming. It defines the overall behavior of the system, but the implementation details are left up to the author.

SRE, which evolved at Google to meet internal needs in the early 2000s independently of the DevOps movement, happens to embody the philosophies of DevOps, but has a much more prescriptive way of measuring and achieving reliability through engineering and operations work. In other words, SRE prescribes how to succeed in the various DevOps areas. For example, the table below illustrates the five DevOps pillars and the corresponding SRE practices:

Mickael Ruau's insight:

DevOps pillar → SRE practice

  • Reduce organization silos → Share ownership with developers by using the same tools and techniques across the stack
  • Accept failure as normal → Have a formula for balancing accidents and failures against new releases
  • Implement gradual change → Encourage moving quickly by reducing costs of failure
  • Leverage tooling & automation → Encourages "automating this year's job away" and minimizing manual systems work to focus on efforts that bring long-term value to the system
  • Measure everything → Believes that operations is a software problem, and defines prescriptive ways for measuring availability, uptime, outages, toil, etc.
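In SRE practice, the "formula for balancing accidents and failures against new releases" is usually called an error budget: the amount of unreliability an SLO tolerates. A minimal Python sketch, assuming a time-based availability SLO over a 30-day window and a simple policy of pausing releases once the budget is spent; the 99.9% target, the window, and the `can_release` policy are illustrative assumptions, not a prescription from the article.

```python
def error_budget_minutes(slo: float, window_minutes: int) -> float:
    """Downtime the SLO tolerates over the window: (1 - SLO) x window length."""
    return (1.0 - slo) * window_minutes

def can_release(slo: float, window_minutes: int, downtime_minutes: float) -> bool:
    """Illustrative policy: keep shipping while some error budget remains."""
    return downtime_minutes < error_budget_minutes(slo, window_minutes)

WINDOW = 30 * 24 * 60  # a 30-day window in minutes (43,200)

budget = error_budget_minutes(0.999, WINDOW)
print(round(budget, 1))                                 # 43.2
print(can_release(0.999, WINDOW, downtime_minutes=12))  # True
print(can_release(0.999, WINDOW, downtime_minutes=50))  # False: pause features, fix reliability
```

The budget turns "accept failure as normal" into a number both developers and operators can agree on: while budget remains, moving fast is acceptable; once it is spent, reliability work takes priority.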

CNCF Cloud Native Interactive Landscape

Filter and sort by GitHub stars, funding, commits, contributors, hq location, and tweets. Updated: 2019-06-19 04:36:39Z