The Continuous Delivery at Criteo

Meetup - June 21, 2017

1 / 59

Agenda

 

  1. A 9-year-old history

  2. How to reinvent your pipeline?

  3. Towards a better CD

  4. The CD implementation at Criteo

2 / 59
  1. Main steps of the last 9 years
  2. How we changed our pipeline at Criteo
  3. The ideal pipeline from a theoretical point of view
  4. Concretely at Criteo

First, a few words about me and Criteo

About me — Emmanuel Debanne

 

Until now

  • 2002: Canon (Print software)
  • 2009: ThePresentFriend (Gift recommendation)
  • 2010: Ullink (Trading software)
  • 2012: Criteo (Online Advertising), QA, DevTools
3 / 59

About Criteo

"Real-Time Digital Advertising That Works"

criteo_display

  • 130 countries

  • 11K advertisers

  • 16K publishers

  • Listed on the NASDAQ since October 2013

  • 90% retention rate

  • R&D: 21% of the workforce

4 / 59

Among other things, Criteo is proud of its retention rate.

A bit of history

  • 2005: Criteo is founded
  • 2008: Start to grow
  • 2011: Weekly integration of branches, 90 min to build
  • 2012 June: Weekly integration lasts 6 working days

      → The mono-repo is split. NuGet packages are introduced.

  • 2012 November: 1 month to update a library.
  • 2012 December: Deployment freeze
  • 2013 April: 4 months to catch up

      → Build from Source (BFS) is introduced.
      → Dedicated project is created.

  • 2013 Q2 to Q4: Migration to BFS:
    • Add missing tests
    • Catch up on the NuGet lag
    • Create the tooling
5 / 59

Who hasn't founded a recommendation start-up?

A large part of the build was related to the checkout. The build consisted of Bash scripts calling MSBuild.

A lot of work went into deciding the ownership of each repo.

No new features for several months.

A bit of history

 

  • And iterate:

moabs_timeline

6 / 59

A bit of history

commits_to_prod

7 / 59

A bit of history

 

 

 

Questions?

8 / 59

How to change?

  • Be endorsed by the management

  • Have a dedicated team

  • Have the right level of competence

  • Deliver the same quality as for other products

  • Define the right boundary between the CD team and the other teams:

    • Allow customizations that don't jeopardize the future
    • Allow changes but under supervision
    • Be the owner of key parts
9 / 59

The full commitment of R&D is needed to adapt the code and increase code coverage.

The change will take time:

  • weeks of planning,
  • months of engineering.

The team proposes plans for:

  • implementation,
  • migration

The project requires a real owner:

  • get the whole picture,
  • support other teams,
  • advocate and commit to a solution.

Knowledge of a configuration management tool, such as Chef or Puppet, will be necessary to set up the build cluster.
Optimizing the builds may require a good knowledge of build tools.

There will be performance issues to solve.
Adequate code coverage is required, as the tooling will be maintained and updated in the coming years.

The right level of responsibility is necessary. Of course, devs should still be allowed to modify the build of their projects.

How to change?

 

 

 

Questions?

10 / 59

Towards an ideal pipeline

 

 

ideal_pipeline

11 / 59

High level representation of a CD pipeline.

The main interface from the point of view of the user is the review.

"Pre-submit" and "post-submit" terms come from the Gerrit tool. Pre-submit means "before-sharing", "pre-merge"...

The goal of the pre-submit is to avoid bad and late surprises in the post-submit. It is a gatekeeper before the code is shared.

Towards an ideal pipeline

 

 

Important qualities of a CD pipeline:

  • scalable

  • fast

  • safe

  • developer friendly

12 / 59

Let's look at each of these points in detail.

Ideal pipeline: Scalable

Dependency hell

Usually, with multiple repos, each repo is built independently from the others. Versioned artifacts are published and used as dependencies.

This leads to dependency hell:

  • Difficult to upgrade a dependency in all repos.
  • Several different versions to upgrade (or patch if an upgrade is too risky).

Solved by building from source

With the "build from source" paradigm, dependencies are no longer consumed as versioned artifacts (such as NuGet packages or Maven artifacts) but are built from source.

13 / 59

Ideal pipeline: Scalable (→ BFS)

 

Advantages of BFS:

  • Enforcement of a unique version among all clients.
  • Smoother upgrades (both for internal and external libraries).
  • Earlier discovery of issues.

        → Makes it possible to scale!

Drawbacks of BFS:

  • More difficult to change an internal or external library (in the short term).
  • Requires the use of "feature toggles" for progressive roll-out.
  • Need to handle flaky tests.
  • Can break the build of all repos (limited thanks to "partial" builds and a "fast revert" policy).
14 / 59
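
The "feature toggles" drawback listed above can be illustrated with a minimal sketch. With BFS every client builds against the single HEAD version of a library, so a new behaviour usually ships disabled and is enabled progressively. The function names, the toggle name and the hashing scheme below are assumptions made for illustration, not Criteo's implementation.

```python
import hashlib


def is_enabled(toggle: str, user_id: str, rollout_percent: int) -> bool:
    """Return True if `toggle` is active for `user_id`.

    Hypothetical helper: each user is hashed into a stable bucket in
    [0, 100), so the same user always gets the same answer for a toggle,
    and raising `rollout_percent` only ever adds users.
    """
    digest = hashlib.sha256(f"{toggle}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent


def render_banner(user_id: str, rollout_percent: int) -> str:
    # Old and new code paths coexist in the same build.
    if is_enabled("new-banner", user_id, rollout_percent):
        return "new banner"
    return "old banner"
```

Setting the percentage to 0 keeps the old path for everyone, and 100 enables the new path for everyone, without any rebuild of the clients.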

Transition

Is everyone convinced we need BFS?
Let's see how to implement it, starting from multiple repositories...

Ideal pipeline: Scalable (→ BFS)

 

Situation before BFS, with independent repos:

multi_repos_1

 Exercise: How to migrate to BFS?

15 / 59

As an example, we consider 3 pieces of code that are owned by 3 different teams.

Transition

If you've been to Devoxx, or if you visit the website about "trunk based development" (where Google, Facebook, Netflix and Uber (iOS app) are mentioned), the answer seems quite obvious.

Ideal pipeline: Scalable (→ BFS)

 

Advertised as solved by a mono-repo:

multi_repos_2

But what about the ownership and fine-grained reviews?
16 / 59

Ideal pipeline: Scalable (→ BFS)

Actually, what we need is:

  • fine-grained ownership/reviews
  • a way to test and merge a big change on the whole code base

multi_repos_3

17 / 59

Ideal pipeline: Scalable (→ BFS)

At Google:

multi_repos_4

At Criteo:

multi_repos_5

18 / 59

Ideal pipeline: Scalable (→ BFS)

 

Features requiring tooling Mono-repo Multi-repos
Partial checkout/build (on dev's machine)
Complete checkout (on CI builders) -
Reviews/commits relationships
Ownership (optional)
Support of open-source repos -
Secrecy of sensitive data -
Fast checkout -
19 / 59

Transition

Big mono-repos are difficult to maintain. Git was designed for the Linux kernel, but some companies have a much larger code base.

Ideal pipeline: Scalable (→ BFS)

About repo performance

What is a big big repo?

               Linux kernel   Facebook   Windows   Google
Nb of files    60 K           200 K      3.5 M     9 M (no data, conf, doc)
Nb of lines    20 M           65 M       50 M      2000 M

(caution: rough estimates...)

20 / 59

Estimates for the HEAD of some mono-repos.

Ideal pipeline: Scalable (→ BFS)

vfs_for_monorepos

21 / 59

Ideal pipeline: Scalable (→ BFS)

 

 

Conclusion

BFS does not require a mono-repo

BFS does require tooling

22 / 59

Transition

  • Who is applying BFS?
  • Who is applying BFS with a mono-repo?
  • Who is applying BFS with multiple repos?
  • Who would like to apply BFS with a mono-repo?
  • Who would like to apply BFS with multiple repos?

Ideal pipeline: Safe

Be exhaustive

  • Put the build environment and external dependencies into one of the built-from-source repos.

Execute the right tests

  • Question: Where should you test that "the application should start"?

  • Ideally all tests should be run at presubmit
           → Forget about usual categories: Unit, Functional, Integration, etc.

formula_roi_test

23 / 59

A new compiler or a new external dependency should be validated at presubmit.

Ideal pipeline: Fast

 

 

Presubmits ≤ ... min

Revert ≤ 10 min

Commit to prod availability ≤ 1 hour

24 / 59

Ideal pipeline: Developer friendly

Make your clients autonomous by providing:

  • training
  • exhaustive and empowering documentation

Examples of documentation pages:

  • Creating a Git repository
  • Working with IntelliJ
  • Creating a hotfix
  • Adding an external dependency
  • Upgrading an external dependency

Even trivial things should be documented!

25 / 59

Do not mix the user manual with your own internal doc.

Ideal pipeline: Developer friendly

Easy to use:

  • Provide smart tooling.
    • to bootstrap a project
    • to upgrade dependencies
    • to check out the clients of your code
    • etc.
  • Allow pipeline extension by providing helpers.
    Example:
filerHelper.upload(commitsFilename, "${newMoabPath}/commits.json")
26 / 59

Not all developers like to dig into .csproj or pom.xml files. This requires skills.
The tooling can check the projects, provide guidance...

Common pieces are shared (credential usage, priority management, user rights, ...)

Ideal pipeline: Developer friendly

Easy to use:

  • Provide a DSL for easier integration into the pipeline.
    Example:
ContinuousIntegration.jmoabProject {
    gerritProject('identification/cactus') {
        replicateTo('git@gitlab.criteois.com:identification/cactus.git')
    }
    marathonApp('cactus') {
        continuousPreprodDeployment {
            configureTestingJob {
                steps {
                    shell("""\
                        [...]
                    """.stripIndent())
                }
            }
        }
    }
}
27 / 59

This is easier than having to know about idempotency, timeouts, clean-up, failure emails, log rotation, scripting languages...

Ideal pipeline: Developer friendly

 

 

Easy to fix:

  • Provide precise feedback about failures (pinpoint the failing commits).

  • Propose or perform auto-reverts.

28 / 59

Towards an ideal pipeline

 

 

 

Questions?

29 / 59

The CD implementation at Criteo

Content

  • The code base
  • The MOABs
  • Partial checkouts
    • In C#
    • In Maven
    • In Gradle
  • The SQL projects
  • Optimized build
  • Encountered issues with BFS
  • Job scheduler
  • The pipeline of the CD infrastructure
  • The deployment to prod
30 / 59

The CD implementation at Criteo

 

 

Our code base is mainly made of 3 pipelines:

             Repositories   Projects   Tools/Services/Apps
C#           130            1000       120
Java/Scala   240            500        190
Chef         150            -          -
31 / 59

The CD implementation at Criteo

 

moab_pipeline

32 / 59

CD at Criteo: The MOABs

MOAB = build of all repos taking the last HEADs

moab_from_heads_2

33 / 59

MOAB stands for "Mother Of All Builds", not "Massive Ordnance Air Blast" nor "Mother of All Bombs".

This is what occurs in the post-submit: all repos are built from source together.

Transition

This is what occurs on the CI builders. Now let's focus on the developer's machine.

CD at Criteo: The MOABs

moab_from_heads_1

"app-a" can be built with its dependencies:

bfs checkout --with-dependencies app-a
      → Clone the repositories according to the dependency graph of the last MOAB.

Or, with MOAB artifacts, by doing a partial checkout:
bfs checkout app-a
34 / 59
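
As a rough sketch of what "clone the repositories according to the dependency graph" could involve, the function below walks a dependency graph and returns the target plus all of its transitive dependencies. The graph contents and the function name are hypothetical; how the real `bfs` tool stores and reads the graph of the last MOAB is not described in these slides.

```python
from __future__ import annotations


def repos_to_clone(target: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Return `target` plus all its transitive dependencies, sorted by name."""
    seen: set[str] = set()
    stack = [target]
    while stack:
        repo = stack.pop()
        if repo in seen:
            continue  # already visited through another path
        seen.add(repo)
        stack.extend(depends_on.get(repo, []))
    return sorted(seen)


# Hypothetical dependency graph recorded by the last MOAB.
graph = {
    "app-a": ["lib-a", "lib-b"],
    "lib-a": ["lib-c"],
    "lib-b": [],
    "lib-c": [],
}
```

With a partial checkout, only the requested repo is cloned and the other entries of this list are taken from the MOAB artifacts instead.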

CD at Criteo: The MOABs

partial_checkout

35 / 59

CD at Criteo: The MOABs

MOABs are partially green

lib-a lib-b lib-c
MOAB 1000
MOAB 1001
MOAB 1002

Upgrade of internal dependencies

bfs refresh-moab
      → Fetch the artifacts of the most recent MOAB: 1001.
36 / 59

CD at Criteo: Partial checkout

 

Avoids having to:

  • check out the whole code base

  • rebuild everything

  • have a huge solution in the IDE

  • get broken source code

37 / 59

CD at Criteo: Partial checkout

 

Implementations

  • C#

  • Maven

  • Gradle

38 / 59

CD at Criteo: Partial checkout with C#

Expression of internal dependencies

apps/app-a/app-a.csproj:

<ItemGroup>
<Reference Include="lib-a" />
</ItemGroup>

Priority of local artifacts over MOAB artifacts

<AssemblySearchPaths>
$(LocalBinariesFolder);$(MoabCacheFolder);$(AssemblySearchPaths)
</AssemblySearchPaths>
39 / 59

app-a depends on lib-a.dll.
The directory of the MOAB artifacts is part of the search paths.

CD at Criteo: Partial checkout with C#

Project Graph

Project = "lib-a",
"libs\lib-a\lib-a.csproj",
"{0000000-0000-0000-0000-0000000000}"
EndProject
Project = "app-a",
"apps\app-a\app-a.csproj",
"{1111111-1111-1111-1111-1111111111}"
ProjectSection(ProjectDependencies) = postProject
{0000000-0000-0000-0000-0000000000} = {0000000-0000-0000-0000-0000000000}
EndProjectSection
EndProject

<Target Name="CopyBuildOutput" AfterTargets="Build">
  <Copy SourceFiles="$(TargetPath)" DestinationFolder="$(LocalBinariesFolder)" />
</Target>
40 / 59

The tool used to build on the CI and to create the solution on dev's machine parses the .csproj and decides the build order of the projects.
This is done by matching the required DLLs with the assemblies built by the local .csproj.
The artifacts of a project are provided to the other projects by copying them in a priority directory.
This directory is the first one of the artifact search paths.
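
The matching step described in these notes can be sketched as a topological sort. In this simplified model each project is reduced to the assembly it produces and the assemblies it references; the dictionary layout and the function name are assumptions, and the real tool parses the .csproj files instead.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def build_order(projects: dict[str, dict]) -> list[str]:
    """Order local projects so that each one is built after the local
    projects producing the assemblies it references."""
    # Map each assembly name to the local project that builds it.
    producer = {p["assembly"]: name for name, p in projects.items()}
    sorter = TopologicalSorter()
    for name, p in projects.items():
        # References with no local producer come from the MOAB artifacts.
        local_deps = [producer[ref] for ref in p["references"] if ref in producer]
        sorter.add(name, *local_deps)
    return list(sorter.static_order())


# Hypothetical partial checkout: lib-b is not checked out locally.
projects = {
    "apps/app-a/app-a.csproj": {"assembly": "app-a", "references": ["lib-a", "lib-b"]},
    "libs/lib-a/lib-a.csproj": {"assembly": "lib-a", "references": []},
}
```

Here `lib-b` has no local producer, so it does not constrain the order and would be resolved from the MOAB cache folder through the search paths.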

CD at Criteo: Partial checkout with Maven

Expression of internal dependencies

apps/app-a/pom.xml:

<project>
  <artifactId>app-a</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency><artifactId>lib-a</artifactId></dependency>
    <dependency><artifactId>lib-b</artifactId></dependency>
  </dependencies>
</project>

libs/lib-a/pom.xml:

<project>
  <artifactId>lib-a</artifactId>
  <version>1.0</version>
</project>
41 / 59

CD at Criteo: Partial checkout with Maven

Priority of local artifacts over MOAB artifacts

Bill of materials (BOM) of MOAB 1000:

<project>
  <artifactId>moab</artifactId>
  <version>1000</version>
  <packaging>pom</packaging>
  <dependencyManagement>
    <dependencies>
      <dependency><artifactId>lib-a</artifactId><version>1000</version></dependency>
      <dependency><artifactId>lib-b</artifactId><version>1000</version></dependency>
      <dependency><artifactId>app-a</artifactId><version>1000</version></dependency>
    </dependencies>
  </dependencyManagement>
</project>
42 / 59

The MOAB BOM (bill of materials) is generated at each MOAB.
It is called a BOM to make it clear that it only provides version numbers.
It declares the artifacts built for a given MOAB in the "dependencyManagement" section.

CD at Criteo: Partial checkout with Maven

<project>
  <modules>
    <module>lib-a</module>
    <module>app-a</module>
  </modules>
  <dependencyManagement>
    <dependencies>
      <!-- Import the MOAB BOM -->
      <dependency>
        <artifactId>moab</artifactId>
        <version>1000</version>
        <scope>import</scope>
        <type>pom</type>
      </dependency>
      <!-- Force the checked-out projects to be considered locally -->
      <dependency><artifactId>lib-a</artifactId><version>1.0</version></dependency>
      <dependency><artifactId>app-a</artifactId><version>1.0</version></dependency>
      <!-- Version of lib-b is taken from the MOAB BOM -->
    </dependencies>
  </dependencyManagement>
</project>
43 / 59

The pom at the workspace root declares the modules (repos) and imports the BOM of a MOAB.
The artifacts of the MOAB are at version 1000. Those locally compiled are at version 1.0 (convention).
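
The outcome of this setup can be modelled in a few lines: versions come from the MOAB BOM unless the artifact is checked out locally, in which case the conventional local version wins. This is only a model of the resulting version table, not of Maven's resolution algorithm, and the function name is hypothetical.

```python
from __future__ import annotations


def resolve_versions(
    moab_bom: dict[str, str],
    checked_out: list[str],
    local_version: str = "1.0",
) -> dict[str, str]:
    """Return the version used for each artifact in the workspace."""
    versions = dict(moab_bom)      # start from the imported MOAB BOM
    for artifact in checked_out:   # entries declared in the root pom override it
        versions[artifact] = local_version
    return versions


bom_1000 = {"lib-a": "1000", "lib-b": "1000", "app-a": "1000"}
workspace = resolve_versions(bom_1000, checked_out=["lib-a", "app-a"])
```

In this example `lib-a` and `app-a` resolve to the locally built 1.0, while `lib-b` keeps version 1000 and is downloaded as a MOAB artifact.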

CD at Criteo: Partial checkout with Gradle

Expression of internal dependencies

apps/app-a/build.gradle:

dependencies {
    compile name: 'lib-a', version: '1.0'
}
44 / 59

CD at Criteo: Partial checkout with Gradle

Priority of local artifacts over MOAB artifacts

Map<String, Project> modulesToProject = [
    'lib-a': project(':libs/lib-a'),
    ...
]
subprojects {
    configurations.all {
        resolutionStrategy {
            // Executed after all projects are read
            dependencySubstitution.all { substitution ->
                String moduleId = "${substitution.requested.group}:${substitution.requested.module}"
                Project dependentProject = modulesToProject.get(moduleId)
                if (dependentProject != null) {
                    // Replace the artifact dependency with the local project
                    substitution.useTarget(dependentProject)
                }
            }
        }
    }
}
45 / 59

The Gradle scripts know which artifact is built for each project. A map associating artifacts and the projects that build them can be computed.

Gradle allows substitution rules to be defined. Thus, a dependency on an artifact can be replaced by a dependency on a project. This works in the IDE too.

CD at Criteo: The SQL projects

sql_projects

46 / 59

CD at Criteo: Optimized build

Dependency graph

graph_img_1

Exercise: How to build and test this?

Parallelized build

graph_img_2

Distributed build

graph_img_3

47 / 59

r2 has already been compiled, so it can be taken from the cache.
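
One way to picture the parallelized build with a cache is to group repos into stages, where every repo of a stage has all its dependencies available. The sketch below does this with Python's standard `graphlib`; the graph, the repo names other than r2, and the function name are hypothetical, and deciding whether a cached artifact is still valid is left out.

```python
from __future__ import annotations

from graphlib import TopologicalSorter


def build_stages(depends_on: dict[str, list[str]], cached=frozenset()) -> list[list[str]]:
    """Group repos into stages that can each be built in parallel
    (or spread over several builders). Repos listed in `cached` are
    skipped because their artifacts are reused."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all dependencies are done
        to_build = [repo for repo in ready if repo not in cached]
        if to_build:
            stages.append(to_build)
        sorter.done(*ready)
    return stages


# Hypothetical graph: r3 depends on r1 and r2, r4 depends on r3.
graph = {"r1": [], "r2": [], "r3": ["r1", "r2"], "r4": ["r3"]}
```

Without a cache this graph needs three stages; when r2 is already in the cache, the first stage shrinks to r1 only.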

CD at Criteo: Issues with BFS

 

  • The pipeline cannot be branched.

  • A review that impacts many repositories requires many rebase iterations.

  • BFS is not easily polyglot: it struggles to handle dependencies across multiple languages (JS, Python...)

  • Flaky tests.  Exercise: How to get rid of flaky tests?

    • Retry until they pass.
    • Or retry until they fail.
    • Or provide reports.
48 / 59

Branches might be legitimate for big changes in the pipeline (e.g. new compiler) that should be completely tested (including a deployment in prod).

A tool is needed to discover the dependencies between repos and implement the partial checkout.
We did it on bower dependencies for front-end projects.
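
The dependency discovery behind a partial checkout can be pictured with a minimal Python sketch. It assumes each repo declares its direct dependencies in a map; the repo names and the `DEPENDENCIES` map are hypothetical and do not describe the actual Criteo tool. The repos to check out are the transitive closure starting from the repo under review.

```python
from typing import Dict, List, Set

# Hypothetical direct dependencies between repos (illustration only).
DEPENDENCIES: Dict[str, List[str]] = {
    "web-app": ["ui-widgets", "api-client"],
    "ui-widgets": ["design-tokens"],
    "api-client": [],
    "design-tokens": [],
    "batch-jobs": ["api-client"],
}


def repos_to_checkout(root: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return the root repo plus everything it depends on, transitively."""
    needed: Set[str] = set()
    stack = [root]
    while stack:
        repo = stack.pop()
        if repo in needed:
            continue  # already visited; this also protects against cycles
        needed.add(repo)
        stack.extend(deps.get(repo, []))
    return needed
```

A review on `web-app` would then only need `repos_to_checkout("web-app", DEPENDENCIES)` instead of every repo.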

Of course you will have flaky tests. We have 65 000 unit tests, for example.

Reports can be produced by running the tests 100 times to detect the flaky ones, then sending a report every 2 weeks, for instance.
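
A minimal Python sketch of that detection loop, assuming a test is a plain callable that raises on failure. The names `classify_test` and `flaky_tests` are hypothetical; this is an illustration, not the tool used at Criteo. A test is reported as flaky when it both passes and fails across identical runs.

```python
from typing import Callable, Dict, List


def classify_test(test: Callable[[], None], runs: int = 100) -> str:
    """Run a test repeatedly and label it 'stable', 'failing' or 'flaky'."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except Exception:
            failures += 1
    if failures == 0:
        return "stable"
    if failures == runs:
        return "failing"
    return "flaky"  # passed at least once and failed at least once


def flaky_tests(tests: Dict[str, Callable[[], None]], runs: int = 100) -> List[str]:
    """Return the sorted names of the tests whose outcome changes between runs."""
    return sorted(name for name, test in tests.items()
                  if classify_test(test, runs) == "flaky")
```

The resulting list is what a periodic report would contain; consistently failing tests are left out because they are broken rather than flaky.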

CD at Criteo: Job scheduler

 

 

 

  • 1 Jenkins server and 92 builders (Linux/Windows/Mac)

  • 2500 jobs expressed with the Job DSL plugin

  • 30K executions / day

49 / 59

CD at Criteo: Job scheduler

The jobs of the MOAB pipeline: moab_jobs

Question: How can a prepare-build be executed if many e2e-* jobs are still queued?

50 / 59
  • prepare-build: associate a MOAB id with the set of sha1s of the heads of all repos
  • *-build: compile and test a repo
  • finalize-build: aggregate the results
  • comment-reviews: post feedback in the reviews
  • send-mails: warn about failing commits
  • e2e-triggers-*: check the state of the test environment
  • e2e-*: execute the e2e tests for each app
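
The prepare-build step can be pictured as taking a snapshot. Below is a minimal Python sketch, assuming MOAB ids are increasing integers and that the heads are available as a repo-to-sha1 mapping. The `MoabRegistry` class and its methods are hypothetical names for illustration only.

```python
from typing import Dict


class MoabRegistry:
    """Associates increasing MOAB ids with snapshots of repo heads (hypothetical model)."""

    def __init__(self, first_id: int = 1000) -> None:
        self._next_id = first_id
        self._snapshots: Dict[int, Dict[str, str]] = {}

    def prepare_build(self, heads: Dict[str, str]) -> int:
        """Freeze the current heads under a new MOAB id and return that id."""
        moab_id = self._next_id
        self._next_id += 1
        self._snapshots[moab_id] = dict(heads)  # copy: later commits must not alter it
        return moab_id

    def sha1s(self, moab_id: int) -> Dict[str, str]:
        """Return the repo -> sha1 mapping recorded for a MOAB id."""
        return dict(self._snapshots[moab_id])
```

Every downstream job (`*-build`, `e2e-*`) can then be parameterized by the MOAB id alone and resolve the exact sha1s from it.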

CD at Criteo: Job scheduler

No dedicated machines thanks to the Priority Sorter plugin

  • Weighted Fair Queuing
  • Interleave the jobs
  • Improved Round-Robin

Job name     Priority
hotfix       1
presubmit    10
e2etest      50

Job name     BUILD_ID   Queue order
presubmit    100        1010
presubmit    101        1020
e2etest      300        1021
presubmit    102        1030
presubmit    103        1040
hotfix       200        1041
hotfix       201        1042
presubmit    104        1050
presubmit    105        1060
presubmit    106        1070
e2etest      301        1071
...          ...        ...
51 / 59
  • replaces the default FIFO queue in Jenkins.
  • defines the execution rate of a job compared to the other jobs.
  • inspired by an algorithm used for network or CPU schedulers.
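
The general idea of weighted fair queuing can be sketched in a few lines of Python. This is a simplified textbook-style illustration, not the Priority Sorter plugin's actual algorithm, and its tags are not meant to reproduce the "Queue order" values of the slide. The `FairQueue` class is a hypothetical name. Each submission gets a tag equal to its start point plus the job's priority, so a job with a small priority value gets many slots while a job with a large value is still served regularly.

```python
import heapq
from itertools import count
from typing import Dict, List, Tuple


class FairQueue:
    """Simplified weighted fair queue: a lower priority value means a higher execution rate."""

    def __init__(self, priorities: Dict[str, int]) -> None:
        self.priorities = priorities
        self.virtual_time = 0                  # tag of the last item taken from the queue
        self.last_tag: Dict[str, int] = {}     # last tag handed out per job name
        self._heap: List[Tuple[int, int, str, int]] = []
        self._seq = count()                    # tie-breaker keeps FIFO order for equal tags

    def submit(self, job_name: str, build_id: int) -> int:
        """Queue a build and return the tag that decides its position."""
        start = max(self.virtual_time, self.last_tag.get(job_name, 0))
        tag = start + self.priorities[job_name]
        self.last_tag[job_name] = tag
        heapq.heappush(self._heap, (tag, next(self._seq), job_name, build_id))
        return tag

    def pop(self) -> Tuple[str, int]:
        """Take the build with the smallest tag."""
        tag, _, job_name, build_id = heapq.heappop(self._heap)
        self.virtual_time = tag
        return job_name, build_id
```

With the priorities of the slide, a hotfix submitted after a batch of presubmit and e2etest builds still comes out first, and the e2etest builds are interleaved instead of starving.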

CD at Criteo: Job scheduler

Exercise: finalize-build has been executed with MOABs 1000, 1001, 1002. e2e-my-slow-tests with MOAB 1000 is still executing.
How can the execution with MOAB 1001 be bypassed so that MOAB 1002 is executed directly?

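Cancel the queued job with MOAB 1001 and trigger a new one with MOAB 1002:

import hudson.model.*
import jenkins.model.Jenkins

def nextJob = Jenkins.instance.getJob('e2e-my-slow-tests')
// Cancel the run still waiting in the queue (MOAB 1001), if there is one.
def queued = nextJob.queueItem
queued?.doCancelQueue()
// Parameters are passed as a build action, hence the ParametersAction wrapper.
def params = new ParametersAction([new StringParameterValue('MOAB_ID', build.resolve('MOAB_ID'))])
nextJob.scheduleBuild2(0, new Cause.UpstreamCause(build), params)

Better: replace the parameters of the queued item so that it keeps its position in the queue:

if (queued != null) {
    queued.replaceAction(params)
} else {
    nextJob.scheduleBuild2(0, new Cause.UpstreamCause(build), params)
}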
52 / 59
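
The difference between the two answers can be modelled outside Jenkins. The Python sketch below is a toy model with hypothetical names (`PendingQueue`, `e2e-other-tests`) that does not use the Jenkins API. It keeps one pending MOAB id per job: cancelling and rescheduling sends the job to the back, while replacing the parameter in place preserves its position.

```python
from typing import Dict, List, Tuple


class PendingQueue:
    """Toy queue holding at most one pending MOAB id per job, in queue order."""

    def __init__(self) -> None:
        self._pending: Dict[str, int] = {}  # dicts preserve insertion order

    def cancel_and_requeue(self, job: str, moab_id: int) -> None:
        """Drop the pending entry, then append a new one at the back."""
        self._pending.pop(job, None)
        self._pending[job] = moab_id

    def replace_in_place(self, job: str, moab_id: int) -> None:
        """Update the MOAB id; an already queued job keeps its position."""
        self._pending[job] = moab_id

    def order(self) -> List[Tuple[str, int]]:
        return list(self._pending.items())
```

In both cases MOAB 1001 is never executed; only the second variant avoids penalizing the slow job in the queue.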

CD at Criteo: Pipeline of the infra

The CD cluster is managed by Chef, like the production clusters.

  • 16 nodes in "preprod"
  • 168 nodes in "prod"
  • 3 datacenters

Detail of the "prod":

  • 92 builders: 30 Linux, 58 Windows, 4 Mac
  • 34 sandboxes made of 2 hosts: 1 Linux + 1 Windows
  • Gerrit
  • Jenkins
  • Nexus
  • Filer
  • SonarQube
  • and instances for resilience
53 / 59

We talked about the CD but not about the infra that runs it.

This infra also has a CD pipeline.

This is the same kind of pipeline as for our 20000+ machines in prod.

CD at Criteo: Pipeline of the infra

2 kinds of repositories:

  • 140 repos "Shared cookbooks": When a commit is merged, a commit that bumps the cookbook version is automatically proposed as a review in the repositories that manage Chef clusters.

  • 8 repos "Chef clusters": Apply "shared cookbooks" and "cluster cookbooks" to a set of nodes.

54 / 59

CD at Criteo: Pipeline of the infra

chef_pipeline

55 / 59

A "prod" branch exists to push to prod.

A knife plugin called "knife-deploy" has been developed to validate the convergence.
It updates the Chef server and checks the convergence of all non-downtimed nodes.
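
The convergence check can be pictured with a minimal Python sketch. The deck does not describe the internals of knife-deploy, so the `Node` model and both function names are hypothetical. The rule shown is the one stated above: every node that is not downtimed must have applied the expected version.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Node:
    name: str
    applied_version: str      # version applied by the node's last successful run
    downtimed: bool = False   # downtimed nodes are excluded from the check


def unconverged_nodes(nodes: Iterable[Node], expected_version: str) -> List[str]:
    """Names of the non-downtimed nodes that have not applied the expected version."""
    return [n.name for n in nodes
            if not n.downtimed and n.applied_version != expected_version]


def has_converged(nodes: Iterable[Node], expected_version: str) -> bool:
    """True when every non-downtimed node runs the expected version."""
    return not unconverged_nodes(nodes, expected_version)
```

Returning the list of lagging nodes, rather than only a boolean, gives the pipeline something actionable to display when a deployment does not converge.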

CD at Criteo: Pipeline of the infra

 

 

 

Exercise: Does this pipeline implement "Build from source"?

56 / 59

CD at Criteo: The deployment

Same principles as the CI pipeline:

  • Suppress toil
  • Allow scaling

A dedicated application keeps track of:

  • what is in production
  • where
  • since when
  • why
57 / 59
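
The four questions above map naturally onto a record type. The Python sketch below is only an illustration of that idea; the `Deployment` fields and the `in_production` helper are hypothetical and do not describe the actual application.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class Deployment:
    app: str                # what
    version: str            # what, precisely
    location: str           # where
    deployed_at: datetime   # since when
    reason: str             # why


def in_production(history: List[Deployment], location: str) -> Dict[str, Deployment]:
    """Latest deployment of each app at a given location."""
    latest: Dict[str, Deployment] = {}
    for deployment in sorted(history, key=lambda d: d.deployed_at):
        if deployment.location == location:
            latest[deployment.app] = deployment  # later records overwrite earlier ones
    return latest
```

Keeping the full history, instead of only the current state, is what makes the "since when" and "why" questions answerable after the fact.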

CD at Criteo: The deployment

deployment_pipeline

App configurations are stored in a repository and generated from a DSL.

Deployments are monitored by checking SLOs.

58 / 59

Questions?

 

 

 

 

Thanks for your attention!

 

 

 

59 / 59
