First, a few words about me and Criteo
Meetup - June 21, 2017
A 9-year-old history
How to reinvent your pipeline?
Towards a better CD
The CD implementation at Criteo
First, a few words about me and Criteo
"Real-Time Digital Advertising That Works"
130 countries
11K advertisers
16K publishers
Listed on the NASDAQ since October 2013
90% retention rate
R&D: 21% of the workforce
Among other things, Criteo is proud of its retention rate.
Who has not founded a start-up about recommendation?
A big part of the build was related to the checkout: Bash scripts calling MSBuild.
A lot of work went into deciding the ownership of each repo.
→ The mono-repo is split. NuGet packages are introduced.
→ Build from Source (BFS) is introduced.
→ A dedicated project is created.
Be endorsed by the management
No new features for several months.
The whole R&D needs to be dedicated to adapting the code and increasing code coverage.
Have a dedicated team
The change will take time:
The team proposes plans for:
The project requires a real owner:
Have the right level of competence
Knowledge of a configuration management tool, like Chef or Puppet, will be necessary to set up the build cluster.
Optimizing builds may require good knowledge of build tools.
Deliver the same quality as for other products
Performance issues to solve.
Proper code coverage is required, as the tooling will be maintained and updated in the coming years.
Define the right frontier between the CD team and the other teams:
The right split of responsibilities is necessary. Of course, devs should still be allowed to modify the build of their projects.
High level representation of a CD pipeline.
From the user's point of view, the main interface is the review.
The terms "pre-submit" and "post-submit" come from Gerrit. Pre-submit means "before sharing", "pre-merge"...
The goal of the pre-submit is to avoid bad and late surprises in the post-submit. It is a gatekeeper before the code is shared.
Important qualities of a CD pipeline:
scalable
fast
safe
developer friendly
Let's look at each of these points in detail.
Usually, with multiple repos, each repo is built independently of the others. Versioned artifacts are published and used as dependencies.
This leads to dependency hell:
With the "build from source" paradigm, dependencies are no longer versioned artifacts (NuGet packages or Maven artifacts) but are built from source.
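To make the contrast concrete, here is a minimal Python sketch of the diamond conflict behind dependency hell, and of why BFS removes it. The repo names and version numbers are hypothetical. With versioned artifacts, each repo pins the versions it consumes; under BFS, every dependency points at the current source, so there is a single version of each library.

```python
# Hypothetical repos: each versioned repo pins the versions it consumes.
# The sketch assumes the dependency graph is acyclic.
VERSIONED_DEPS = {
    "app-a": {"lib-b": "1.0", "lib-c": "2.3"},
    "lib-c": {"lib-b": "2.0"},  # lib-c was published against a newer lib-b
}


def collect_required_versions(root, deps):
    """Walk the dependency tree from `root`; return every version requested per library."""
    required = {}
    stack = [root]
    while stack:
        current = stack.pop()
        for lib, version in deps.get(current, {}).items():
            required.setdefault(lib, set()).add(version)
            stack.append(lib)
    return required


def conflicts(root, deps):
    """Libraries requested at more than one version: the diamond of dependency hell."""
    required = collect_required_versions(root, deps)
    return {lib: sorted(versions) for lib, versions in required.items() if len(versions) > 1}


def to_build_from_source(deps):
    """BFS view: drop the version pins, every dependency is the current source."""
    return {repo: {lib: "source" for lib in libs} for repo, libs in deps.items()}


print(conflicts("app-a", VERSIONED_DEPS))                        # {'lib-b': ['1.0', '2.0']}
print(conflicts("app-a", to_build_from_source(VERSIONED_DEPS)))  # {}
```

The price of the second result is that lib-c must compile and pass its tests against the current lib-b, which is exactly what the BFS pipeline checks.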
Advantages of BFS:
→ Makes it possible to scale!
Drawbacks of BFS:
Is everyone convinced we need BFS?
Let's see how to implement it from multi repositories...
Situation before BFS, with independent repos:
As an example, we consider 3 pieces of code owned by 3 different teams.
Exercise: How to migrate to BFS?
If you've been to Devoxx, or if you visit the "trunk-based development" website (where Google, Facebook, Netflix and Uber (iOS app) are mentioned), the answer seems obvious.
Advertised as solved by a mono-repo:
Actually, what we need is:
At Google:
Features requiring tooling | Mono-repo | Multi-repos |
---|---|---|
Partial checkout/build (on dev's machine) | ✓ | ✓ |
Complete checkout (on CI builders) | - | ✓ |
Reviews/commits relationships | ✓ | ✓ |
Ownership | ✓ | (optional) |
Support of open-sources repos | ✓ | - |
Secrecy of sensitive data | ✓ | - |
Fast checkout | ✓ | - |
Big mono-repos are difficult to maintain. Git was designed for the Linux kernel, but some companies have a much larger code base.
What is a big big repo?
Linux kernel | Windows | |||
---|---|---|---|---|
Nb of files | 60 K | 200 K | 3.5 M | 9 M (no data, conf, doc) |
Nb of lines | 20 M | 65 M | 50 M | 2000 M |
(caution: rough estimates...)
Estimates are for the HEAD of some mono-repos.
BFS does not require a mono-repo
BFS does require tooling
New compiler, new external dependency: these should be validated at presubmit.
Question: Where should you test that "the application starts"?
Ideally, all tests should run at presubmit.
→ Forget about the usual categories: unit, functional, integration, etc.
Presubmits ≤ ... min
Revert ≤ 10 min
Commit to prod availability ≤ 1 hour
Make your clients autonomous by providing:
Examples of documentation pages:
Even trivial things should be documented!
Do not mix the user manual with your internal documentation.
Easy to use:
Not all developers like to dig into .csproj or pom.xml files; this requires skills.
The tooling can check the projects and provide guidance...
filerHelper.upload(commitsFilename, "${newMoabPath}/commits.json")
Common pieces are factored out (credential usage, priority management, user rights, ...)
ContinuousIntegration.jmoabProject {
    gerritProject('identification/cactus') {
        replicateTo('git@gitlab.criteois.com:identification/cactus.git')
    }
    marathonApp('cactus') {
        continuousPreprodDeployment {
            configureTestingJob {
                steps {
                    shell("""\
                        [...]
                    """.stripIndent())
                }
            }
        }
    }
}
This is easier than having to know about idempotency, timeouts, clean-up, failure emails, log rotation, the scripting language...
Easy to fix:
Provide precise feedback about failures (pinpoint the failing commits).
Propose or perform auto-reverts.
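Pinpointing the failing commit among those merged between a green and a red MOAB can be done by bisection. A minimal sketch, assuming a hypothetical `is_green(commit)` callback that rebuilds and tests the code base up to that commit, and assuming that once a commit breaks the build, later states stay broken:

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_green: Callable[[str], bool]) -> str:
    """Return the first commit whose build fails.

    Assumes the state before commits[0] was green, the state after commits[-1]
    is red, and that a breakage persists in all later commits.
    """
    lo, hi = 0, len(commits) - 1  # the culprit lies in commits[lo..hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_green(commits[mid]):
            lo = mid + 1  # mid is fine: the culprit comes after it
        else:
            hi = mid  # mid is broken: the culprit is mid or an earlier commit
    return commits[lo]


# Hypothetical history: c5 broke the build.
commits = [f"c{i}" for i in range(1, 9)]
print(first_bad_commit(commits, lambda c: int(c[1:]) < 5))  # c5
```

With 8 candidate commits, 3 builds are enough; the culprit is then the natural candidate for an auto-revert.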
Our code base is mainly organized into 3 pipelines:
Pipeline | Repositories | Projects | Tools/Services/Apps |
---|---|---|---|
C# | 130 | 1000 | 120 |
Java/Scala | 240 | 500 | 190 |
Chef | 150 | - | - |
MOAB = build of all repos at their latest HEADs.
MOAB stands for Mother Of All Builds, not Massive Ordnance Air Blast, nor Mother of All Bombs.
This is what occurs in the post-submit: all repos are built from source together.
This is what occurs on the CI builders. Now let's focus on the developer's machine.
"app-a" can be built with its dependencies:
bfs checkout --with-dependencies app-a
bfs checkout app-a
MOAB | lib-a | lib-b | lib-c |
---|---|---|---|
MOAB 1000 | ✔ | ✔ | ✔ |
MOAB 1001 | ✔ | ✔ | ✘ |
MOAB 1002 | ✔ | ✘ | ✔ |
bfs refresh-moab
Avoids:
checking out the whole code base
rebuilding everything
having a huge solution in the IDE
getting broken source code
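The slides do not say how `bfs refresh-moab` picks a MOAB. A minimal sketch of one plausible rule, stated here as an assumption: take the most recent MOAB in which every artifact needed by the local checkout was built successfully. `pick_moab` is a hypothetical name, and the statuses mirror the MOAB 1000 to 1002 table above.

```python
# Build status of each library in recent MOABs (True = built successfully).
MOAB_STATUS = {
    1000: {"lib-a": True, "lib-b": True, "lib-c": True},
    1001: {"lib-a": True, "lib-b": True, "lib-c": False},
    1002: {"lib-a": True, "lib-b": False, "lib-c": True},
}


def pick_moab(status, needed):
    """Return the most recent MOAB whose needed artifacts were all built, else None."""
    for moab_id in sorted(status, reverse=True):
        if all(status[moab_id].get(lib, False) for lib in needed):
            return moab_id
    return None


# A checkout that only needs lib-a and lib-b can use MOAB 1001,
# even though lib-c is broken there.
print(pick_moab(MOAB_STATUS, ["lib-a", "lib-b"]))  # 1001
```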
C#
Maven
Gradle
apps/app-a/app-a.csproj:
<ItemGroup>
  <Reference Include="lib-a" />
</ItemGroup>
<PropertyGroup>
  <!-- Local build outputs first, then the MOAB artifacts, then the default paths -->
  <AssemblySearchPaths>$(LocalBinariesFolder);$(MoabCacheFolder);$(AssemblySearchPaths)</AssemblySearchPaths>
</PropertyGroup>
app-a depends on lib-a.dll
The directory of the MOAB artifacts is part of the search paths.
Project = "lib-a", "libs\lib-a\lib-a.csproj", "{0000000-0000-0000-0000-0000000000}"
EndProject
Project = "app-a", "apps\app-a\app-a.csproj", "{1111111-1111-1111-1111-1111111111}"
    ProjectSection(ProjectDependencies) = postProject
        {0000000-0000-0000-0000-0000000000} = {0000000-0000-0000-0000-0000000000}
    EndProjectSection
EndProject
<Target Name="CopyBuildOutput" AfterTargets="Build">
  <!-- Publish the built assembly into the priority directory -->
  <Copy SourceFiles="$(TargetPath)" DestinationFolder="$(LocalBinariesFolder)" />
</Target>
The tool used to build on the CI and to generate the solution on the developer's machine parses the .csproj files and decides the build order of the projects.
This is done by matching the required DLLs with the assemblies built by the local .csproj files.
The artifacts of a project are made available to the other projects by copying them into a priority directory.
This directory comes first in the assembly search paths.
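The matching step can be sketched in a few lines of Python. This is not Criteo's tool: the project contents are hypothetical, the sketch assumes each assembly is named after its .csproj file, and it ignores the MSBuild XML namespace that older project files carry. References that no local project produces (lib-b here) add no ordering constraint; they are resolved from the MOAB cache through the search paths.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

# Hypothetical checkout: app-a references lib-a (checked out) and lib-b (not checked out).
PROJECTS = {
    "libs/lib-a/lib-a.csproj": "<Project><ItemGroup/></Project>",
    "apps/app-a/app-a.csproj": (
        "<Project><ItemGroup>"
        '<Reference Include="lib-a" /><Reference Include="lib-b" />'
        "</ItemGroup></Project>"
    ),
}


def assembly_name(path):
    # Assumption: the assembly is named after the project file.
    return path.rsplit("/", 1)[-1].removesuffix(".csproj")


def build_order(projects):
    """Order local projects so that each one is built after the projects it references."""
    produced_by = {assembly_name(path): path for path in projects}
    graph = {}
    for path, xml in projects.items():
        refs = [ref.get("Include") for ref in ET.fromstring(xml).iter("Reference")]
        # Only references built locally create a dependency between projects.
        graph[path] = {produced_by[ref] for ref in refs if ref in produced_by}
    return list(TopologicalSorter(graph).static_order())


print(build_order(PROJECTS))  # ['libs/lib-a/lib-a.csproj', 'apps/app-a/app-a.csproj']
```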
apps/app-a/pom.xml:
<project>
  <artifactId>app-a</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency><artifactId>lib-a</artifactId></dependency>
    <dependency><artifactId>lib-b</artifactId></dependency>
  </dependencies>
</project>
libs/lib-a/pom.xml:
<project>
  <artifactId>lib-a</artifactId>
  <version>1.0</version>
</project>
Bill of materials (BOM) of MOAB 1000:
<project>
  <artifactId>moab</artifactId>
  <version>1000</version>
  <packaging>pom</packaging>
  <dependencyManagement>
    <dependencies>
      <dependency><artifactId>lib-a</artifactId><version>1000</version></dependency>
      <dependency><artifactId>lib-b</artifactId><version>1000</version></dependency>
      <dependency><artifactId>app-a</artifactId><version>1000</version></dependency>
    </dependencies>
  </dependencyManagement>
</project>
The MOAB BOM (bill of materials) is generated for each MOAB.
It is called a BOM to make it clear that it only provides version numbers.
It declares the artifacts built by a given MOAB in its "dependencyManagement" section.
<project>
  <modules>
    <module>lib-a</module>
    <module>app-a</module>
  </modules>
  <dependencyManagement>
    <dependencies>
      <!-- Import the MOAB BOM -->
      <dependency>
        <artifactId>moab</artifactId>
        <version>1000</version>
        <scope>import</scope>
        <type>pom</type>
      </dependency>
      <!-- Force the checked-out projects to be considered locally -->
      <dependency><artifactId>lib-a</artifactId><version>1.0</version></dependency>
      <dependency><artifactId>app-a</artifactId><version>1.0</version></dependency>
      <!-- Version of lib-b is taken from the MOAB BOM -->
    </dependencies>
  </dependencyManagement>
</project>
The pom at the workspace root declares the modules (repos) and imports the BOM of a MOAB.
The artifacts from the MOAB are at version 1000; the locally compiled ones are at version 1.0 (by convention).
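The resulting version resolution can be modeled in a few lines. This is a simplified Python model, not Maven itself: it assumes, as Maven does for dependencyManagement, that versions declared directly in the root pom take precedence over those imported from a BOM. The BOM content mirrors the MOAB 1000 example above.

```python
# Versions declared by the BOM of MOAB 1000.
BOM_1000 = {"lib-a": "1000", "lib-b": "1000", "app-a": "1000"}


def resolve_versions(bom, local_modules, requested):
    """Return the version used for each requested artifact in the workspace.

    Versions imported from the MOAB BOM are the default; modules checked out
    locally are declared directly in the root pom at 1.0 and take precedence.
    """
    managed = dict(bom)
    managed.update({module: "1.0" for module in local_modules})
    return {artifact: managed[artifact] for artifact in requested}


# app-a depends on lib-a and lib-b; lib-a and app-a are checked out locally.
print(resolve_versions(BOM_1000, ["lib-a", "app-a"], ["lib-a", "lib-b"]))
# {'lib-a': '1.0', 'lib-b': '1000'}
```

Using a fixed 1.0 for local modules keeps the poms of the repos stable: only the imported BOM version changes when moving to a newer MOAB.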
apps/app-a/build.gradle:
dependencies {
    compile name: 'lib-a', version: '1.0'
}
Map<String, Project> modulesToProject = [
    'lib-a': project(':libs:lib-a'),
    ...
]
subprojects {
    configurations.all {
        resolutionStrategy {
            // Executed after all projects are read
            dependencySubstitution.all { DependencySubstitution dependencySubstitution ->
                def requested = dependencySubstitution.requested
                // Only external module dependencies can be replaced by a project
                if (requested instanceof ModuleComponentSelector) {
                    // The map is keyed by module name (e.g. 'lib-a')
                    Project dependentProject = modulesToProject.get(requested.module)
                    if (dependentProject != null) {
                        dependencySubstitution.useTarget(dependentProject)
                    }
                }
            }
        }
    }
}
The Gradle scripts know which artifact is built by each project, so a map associating artifacts with the projects that build them can be computed.
Gradle makes it possible to define substitution rules: a dependency on an artifact can be replaced by a dependency on a project. This works in the IDE too.
Dependency graph
Parallelized build
Distributed build
r2 has already been compiled, so it can be taken from the cache.
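The three steps, plus the cache, can be sketched with Python's `graphlib`. This is an illustration under stated assumptions, not Criteo's build tool: the repos and revisions are hypothetical, and the cache key is assumed to hash a repo's source revision together with the keys of its dependencies, so an upstream change invalidates everything downstream. Each wave holds repos that can be built in parallel, or dispatched to different builders.

```python
import hashlib
from graphlib import TopologicalSorter

# Hypothetical repos: r3 depends on r1 and r2, r4 depends on r3.
GRAPH = {"r1": set(), "r2": set(), "r3": {"r1", "r2"}, "r4": {"r3"}}
SOURCES = {"r1": "r1@c42", "r2": "r2@c17", "r3": "r3@c08", "r4": "r4@c99"}


def cache_key(repo, graph, sources, keys):
    """Hash a repo's source revision with the keys of its dependencies."""
    digest = hashlib.sha256(sources[repo].encode())
    for dep in sorted(graph[repo]):
        digest.update(keys[dep].encode())
    return digest.hexdigest()


def plan(graph, sources, cache):
    """Split the build into waves of independent repos; cached repos are skipped."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    keys, waves = {}, []
    while sorter.is_active():
        wave = []
        for repo in sorted(sorter.get_ready()):
            keys[repo] = cache_key(repo, graph, sources, keys)
            if keys[repo] not in cache:
                wave.append(repo)  # must be (re)built
            sorter.done(repo)
        waves.append(wave)
    return waves, keys


waves, keys = plan(GRAPH, SOURCES, cache=set())
print(waves)  # [['r1', 'r2'], ['r3'], ['r4']]
# r2 has already been compiled: its artifact is taken from the cache.
print(plan(GRAPH, SOURCES, cache={keys["r2"]})[0])  # [['r1'], ['r3'], ['r4']]
```

Waves keep the sketch short; a real scheduler would start r3 as soon as r1 and r2 are done instead of waiting for a whole wave.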
The pipeline cannot be branched.
Branches might be legitimate for big changes in the pipeline (e.g. a new compiler) that should be fully tested (including a deployment in prod).
A review that impacts many repositories requires many rebase iterations.
BFS is not easily polyglot: handling dependencies across multiple languages (JS, Python...) is hard.
A tool is needed to discover the dependencies between repos and implement the partial checkout.
We did it for Bower dependencies in front-end projects.
Flaky tests. Exercise: How to get rid of flaky tests?
Of course you will have flaky tests: we have 65,000 unit tests, for example.
Reports can be produced by running the tests 100 times to detect the flaky ones, and sent every 2 weeks, for instance.
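The detection part of such a report can be sketched as follows. This is a minimal sketch, assuming each test is a callable that returns True when it passes; the test names are hypothetical. A test that always fails is broken rather than flaky, so only tests with mixed outcomes are reported, together with their observed failure rate.

```python
import itertools
from collections import Counter


def find_flaky(tests, runs=100):
    """Run every test `runs` times; report those that both pass and fail."""
    report = {}
    for name, test in tests.items():
        outcomes = Counter(test() for _ in range(runs))
        if outcomes[True] and outcomes[False]:
            report[name] = outcomes[False] / runs  # observed failure rate
    return report


# Hypothetical tests; the flaky one fails deterministically every third run.
_flaky_outcomes = itertools.cycle([True, True, False])
TESTS = {
    "stable": lambda: True,
    "broken": lambda: False,
    "flaky": lambda: next(_flaky_outcomes),
}

print(find_flaky(TESTS))  # {'flaky': 0.33}
```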
1 Jenkins server and 92 builders (Linux/Windows/Mac)
2500 jobs expressed with the Job DSL plugin
30K executions / day
The jobs of the MOAB pipeline:
Question: How to execute a prepare-build if many e2e-* jobs are still queued?
No dedicated machines thanks to the Priority Sorter plugin
Job name | Priority |
---|---|
hotfix | 1 |
presubmit | 10 |
e2etest | 50 |
Job name | BUILD_ID | Queue order |
---|---|---|
presubmit | 100 | 1010 |
presubmit | 101 | 1020 |
e2etest | 300 | 1021 |
presubmit | 102 | 1030 |
presubmit | 103 | 1040 |
hotfix | 200 | 1041 |
hotfix | 201 | 1042 |
presubmit | 104 | 1050 |
presubmit | 105 | 1060 |
presubmit | 106 | 1070 |
e2etest | 301 | 1071 |
... | ... | ... |
Exercise:
finalize-build has been executed with MOABs 1000, 1001 and 1002. e2e-my-slow-tests with MOAB 1000 is still executing.
How can the execution with MOAB 1001 be bypassed, to execute directly with MOAB 1002?
Cancel the queued job with MOAB 1001 and trigger a new one with MOAB 1002:
import hudson.model.*
import jenkins.model.Jenkins

def nextJob = Jenkins.instance.getJob('e2e-my-slow-tests')
def queued = nextJob.queueItem  // pending execution (MOAB 1001), or null
queued?.doCancelQueue()
// scheduleBuild2 expects Actions: wrap the parameter values in a ParametersAction
def params = new ParametersAction([new StringParameterValue('MOAB_ID', build.resolve('MOAB_ID'))])
nextJob.scheduleBuild2(0, new Cause.UpstreamCause(build), params)
Better: replace the parameters of the queued item, so that it does not lose its position in the queue:
if (queued != null) {
    queued.replaceAction(params)
} else {
    nextJob.scheduleBuild2(0, new Cause.UpstreamCause(build), params)
}
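Why the second snippet is better comes down to queue position. A toy Python model, not Jenkins' actual `Queue` implementation: `BuildQueue` and its methods are hypothetical names. Cancelling and rescheduling sends the e2e job to the back of the queue, behind the presubmits queued meanwhile, while replacing the parameters in place keeps its turn.

```python
from dataclasses import dataclass, field


@dataclass
class BuildQueue:
    """Toy FIFO queue; each item is [job name, parameters]."""
    items: list = field(default_factory=list)

    def cancel_and_reschedule(self, job, params):
        # First approach: drop the pending item and append a new one at the end.
        self.items = [item for item in self.items if item[0] != job]
        self.items.append([job, params])

    def schedule_or_replace(self, job, params):
        # Better approach: swap the parameters of the pending item in place.
        for item in self.items:
            if item[0] == job:
                item[1] = params
                return
        self.items.append([job, params])
```

For instance, with `[["e2e-my-slow-tests", {"MOAB_ID": "1001"}], ["presubmit", {"BUILD_ID": "105"}]]` queued, `schedule_or_replace` leaves e2e-my-slow-tests first with MOAB 1002, whereas `cancel_and_reschedule` moves it behind the presubmit.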
The CD cluster is managed by Chef, like the production clusters.
We talked about the CD, but not about the infrastructure that runs it.
This infrastructure also has a CD pipeline.
It is the same kind of pipeline as for our 20,000+ machines in prod.
Detail of the "prod":
2 kinds of repositories:
140 "Shared cookbooks" repos: when a commit is merged, a commit that bumps the cookbook version is automatically proposed as a review in the repositories that manage the Chef clusters.
8 repos "Chef clusters": Apply "shared cookbooks" and "cluster cookbooks" to a set of nodes.
A "prod" branch exists to push to prod.
A knife plugin called "knife-deploy" has been developed to validate the convergence.
It updates the Chef server and checks the convergence of all non-downtimed nodes.
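How knife-deploy detects convergence is not described here. A minimal sketch of the check, assuming the time of each node's last successful Chef run is known (None when unknown); `unconverged` and the node names are hypothetical. Downtimed nodes are skipped, as described for knife-deploy.

```python
from datetime import datetime, timezone


def unconverged(last_success, deployed_at, downtimed):
    """Non-downtimed nodes that have not run Chef successfully since the deployment."""
    return sorted(
        node
        for node, last_ok in last_success.items()
        if node not in downtimed and (last_ok is None or last_ok < deployed_at)
    )


# Hypothetical cluster state after updating the Chef server at 10:00 UTC.
DEPLOYED_AT = datetime(2017, 6, 21, 10, 0, tzinfo=timezone.utc)
LAST_SUCCESS = {
    "node-1": datetime(2017, 6, 21, 10, 5, tzinfo=timezone.utc),
    "node-2": datetime(2017, 6, 21, 9, 55, tzinfo=timezone.utc),
    "node-3": datetime(2017, 6, 21, 9, 0, tzinfo=timezone.utc),  # downtimed
    "node-4": None,
}

print(unconverged(LAST_SUCCESS, DEPLOYED_AT, downtimed={"node-3"}))  # ['node-2', 'node-4']
```

The deployment would be considered converged once this list is empty, possibly after a timeout-bounded wait.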
Exercise: Does this pipeline implement "Build from source"?
Same principles as the CI pipeline:
A dedicated application keeps track of:
App configurations are stored in a repository and generated with a DSL.
Deployment is monitored via checks of SLOs.
Thanks for your attention!