Head for the Cloud

Keeping the Cloud simple!

Validating Python code with a CodeCatalyst pipeline

2024-01-18 11 min read AWS Walkthroughs

CodeCatalyst is a unified development environment created by AWS.

It has many features, such as blueprints to assist in writing code, integrated Git repositories, dev environments that can be pre-defined, and now AI integration. However, for me, one of the most useful things is being able to define and use pipelines stored in the code repository.

Pipelines are one of the most commonly used tools for those of us working with code and Cloud. They allow us to automate tasks that run when we make changes to our code, whether that’s checking that our code works, building artefacts and packages, or deploying to our environments.

In this post, I’ll share an example pipeline that we can use to validate some Python code as we work on it. Whilst this post doesn’t cover all options available when working with pipelines, it should be enough to explain how the pipeline works and how you can modify it for your own workflows.

To streamline the post, I’ll assume you understand how CodeCatalyst works and can create and work with code in repositories.

Our example Python Code

For this post, I’m using some example code from https://github.com/headforthecloud/example-python-lambda which defines a simple function that could be used as an AWS Lambda function. The code is structured like:

Screenshot showing repository structure

The code for the main function lambda_function.py, shown below, just sets up some logging, outputs a message and returns a status code to indicate it ran successfully. It also includes a couple of functions that could be used to demonstrate testing:

#! /usr/bin/env python
""" An example lambda function """

import os
import json
import logging


# define a logger using logging library. If LOG_LEVEL is not set, default to INFO.
# otherwise use value of LOG_LEVEL
logger = logging.getLogger()
logger.setLevel(os.getenv('LOG_LEVEL', 'INFO'))


def lambda_handler(event, context):
    """ define a lambda_handler function that takes in an event and a context """
    logger.info("Hello from Lambda!")

    return {
        "statusCode": 200,
        "body": json.dumps(event)
    }


def add_x_y(x, y):
    """ This is a simple function that adds two numbers together and returns the result. """
    return x + y


def multiply_x_y(x, y):
    """ This is a simple function that multiplies two numbers together and returns the result. """
    return x * y


# if this file is run directly, run the lambda_handler function with dummy event and context
if __name__ == '__main__':
    lambda_handler(None, None)

We also have some code written using the PyTest framework, in a folder called tests.
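As a rough idea of what such tests might look like, here is a minimal sketch for the two helper functions. The test names are my own assumption rather than the contents of the repository, and the two functions are copied inline only so that the snippet runs on its own; in the repository the tests would import them from lambda_function instead.

```python
"""Sketch of PyTest-style tests for the helper functions (hypothetical test names)."""


def add_x_y(x, y):
    """Copy of the helper from lambda_function.py, inlined so this runs standalone."""
    return x + y


def multiply_x_y(x, y):
    """Copy of the helper from lambda_function.py, inlined so this runs standalone."""
    return x * y


def test_add_x_y():
    """Addition should work for positive and negative inputs."""
    assert add_x_y(2, 3) == 5
    assert add_x_y(-1, 1) == 0


def test_multiply_x_y():
    """Multiplication should work, including multiplying by zero."""
    assert multiply_x_y(2, 3) == 6
    assert multiply_x_y(4, 0) == 0
```

PyTest collects any function whose name starts with test_, so no extra registration is needed beyond placing a file like this in the tests folder.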

What will the pipeline do?

In this example, we’re going to perform a set of actions which are typical of a pipeline used with Python:

1. Linting - we do this to make sure our code meets general best practices in terms of code, and that it should at least run. For this, we’ll use a well-known tool called PyLint.

2. Vulnerability Scanning - we do this to try and make sure our code doesn’t contain any security issues such as secret values, possible SQL injection routes etc. We will use a tool called Bandit for this.

3. Automated Testing - we want to make sure that our code performs as expected. To this end, we’ll use the PyTest framework and check that our tests work and that we test an appropriate amount of our code.

4. Reporting - for each step, we will use the CodeCatalyst functionality to generate reports showing the outcome of each step and whether it was successful.

Creating a Pipeline

There are two approaches to generating or modifying a pipeline with CodeCatalyst - either via a visual editor built into https://codecatalyst.aws or working in the repository and defining a pipeline using a YAML file in the .codecatalyst/workflows folder.

The full definition for the pipelines can be found here

For this example, I’ll use the latter approach, working with a file called .codecatalyst/workflows/python-testing-pipeline.yaml. If you’d like to see the full file, it’s available in GitHub

General Configuration

Firstly, we’re going to define where and when the pipeline will run with this code:

SchemaVersion: "1.0"

Name: python-testing-pipeline

Compute:
  Type: EC2

Triggers:
  - Type: Push

With this, we’re saying that the pipeline will be called python-testing-pipeline, and that it will be executed using EC2 compute (we could also use Lambda).

We’re also going to define that the pipeline should be triggered every time changes are pushed to the repository. We could also have workflows triggered when working with a pull request, or even on a scheduled basis.

Running Actions

Once we’ve defined when and where the pipeline runs, we need to tell it what steps to carry out - to do this we’ll use an Actions section, which will have a number of these items:

  • A name
  • Identifier - these are equivalent to GitHub actions - in fact we can use some GitHub actions (see here for more info.). In our examples, we’ll use the aws/build@v1 and aws/managed-test@v1 actions (these are functionally equivalent and interchangeable).
  • Inputs - in this case, we’re going to use these to specify that we want to retrieve our code from the WorkflowSource i.e. the repository containing the pipeline, but we could also specify that we want to use artefacts that might contain saved files.
  • Configuration steps - we’ll use these to list the specific actions we want to perform in the pipeline. With the build and managed-test actions, we provide a list of Run steps which use the Linux shell bash to execute the provided commands.
  • Outputs - in our example pipeline, we’ll use these to define Reports that will feed back on the results of our actions in CodeCatalyst.

Linting our code

For our first action, we’re going to check that our code meets the best practices for Python. In this case, as mentioned earlier we’re going to use PyLint and our pipeline will carry out the following steps:

  • Specify that we want to use the code from our repository
  • Install Pylint using pip
  • Ensure that we have a location we can use to store the results of our linting
  • Run pylint and capture the results in the folder created in the previous step
  • Upload the results as a report to CodeCatalyst, using the PYLINTJSON format and defining our success criteria which will control if this pipeline step is successful. In this example, we can specify what level of issues are allowed within a set of categories.

To perform the above, we can use this code:

Actions:
  Linting:
    Identifier: aws/build@v1.0.0
    Inputs:
      Sources:
        - WorkflowSource
    Configuration:
      Steps:
        - Run: |
            echo "Installing pylint"
            pip install --user pylint
            export PATH=$PATH:~/.local/bin            
        - Run: |
            echo "Check testresults folder exists"
            if [ ! -d tests/testresults ]
            then
              mkdir tests/testresults
            fi            
        - Run: |
            echo "Linting Python"
            pylint *py tests/*py > tests/testresults/pylint-output.py            
    Outputs:
      Reports:
        PyLintResults:
          Format: PYLINTJSON
          IncludePaths:
            - tests/testresults/pylint-output.py
          SuccessCriteria:
            StaticAnalysisQuality:
              Severity: HIGH
              Number: 1
            StaticAnalysisSecurity:
              Severity: MEDIUM
              Number: 1
            StaticAnalysisBug:
              Severity: MEDIUM
              Number: 1

PyLint configuration

We can control which checks PyLint carries out by using a configuration file called .pylintrc. In our example, we’ll use this setup:

[BASIC]
good-names=i,j,k,x,y,ex,Run,_
fail-under=0.1

[FORMAT]
max-line-length=120
indent-string='    '

[REPORTS]
output-format=json
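With output-format=json, PyLint writes a JSON list in which each entry describes one message. To get a feel for what ends up in the report, the sketch below counts messages by type. The sample data is made up for illustration and count_by_type is a hypothetical helper of my own, not part of PyLint or CodeCatalyst.

```python
import json
from collections import Counter

# Made-up sample in the shape PyLint emits when output-format=json is set.
SAMPLE_OUTPUT = """
[
  {"type": "convention", "module": "lambda_function", "obj": "", "line": 1,
   "column": 0, "path": "lambda_function.py",
   "symbol": "missing-module-docstring",
   "message": "Missing module docstring", "message-id": "C0114"},
  {"type": "warning", "module": "lambda_function", "obj": "", "line": 4,
   "column": 0, "path": "lambda_function.py",
   "symbol": "unused-import",
   "message": "Unused import os", "message-id": "W0611"}
]
"""


def count_by_type(pylint_json):
    """Return how many messages of each type appear in PyLint's JSON output."""
    messages = json.loads(pylint_json)
    return Counter(message["type"] for message in messages)


if __name__ == "__main__":
    for message_type, total in sorted(count_by_type(SAMPLE_OUTPUT).items()):
        print(f"{message_type}: {total}")
```

To try this against a real run, read the contents of tests/testresults/pylint-output.py and pass them to count_by_type in place of the sample.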

Vulnerability scanning

We’re also going to add a section to our actions to check that we don’t have any security issues in our code, such as including secrets or allowing SQL injection. To do this, we’re going to use a tool called Bandit.

The steps are very similar to those from the linting:

  • Specify that we want to use the code from our repository
  • Install Bandit using pip
  • Ensure that we have a location we can use to store the results of our scans
  • Run bandit and capture the results in the folder created in the previous step. We’ll output the results in a standard format used by scanning tools called sarif
  • Upload the results as a report to CodeCatalyst, using the SARIFSA format and defining our success criteria which will control if this pipeline step is successful. Again, we’ll specify what criteria are needed for a successful run.

To perform the above, we can use this code:

  vuln_scan:
    Identifier: aws/build@v1.0.0
    Inputs:
      Sources:
        - WorkflowSource
    Configuration:
      Steps:
        - Run: |
            echo "Installing bandit"
            pip install --user bandit bandit-sarif-formatter
            export PATH=$PATH:~/.local/bin            
        - Run: |
            echo "Check testresults folder exists"
            if [ ! -d tests/testresults ]
            then
              mkdir tests/testresults
            fi            
        - Run: |
            echo "Running Bandit"
            bandit -r . --format sarif --output tests/testresults/bandit-output.sarif --exit-zero            
    Outputs:
      Reports:
        BanditResults:
          Format: SARIFSA
          IncludePaths:
            - tests/testresults/bandit-output.sarif
          SuccessCriteria:
            StaticAnalysisFinding:
              Severity: MEDIUM
              Number: 2
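SARIF is a JSON-based format, so the file Bandit produces can be inspected with a few lines of Python. The sketch below counts findings by their SARIF level. The sample document is invented for illustration and heavily trimmed, and count_by_level is a hypothetical helper; how CodeCatalyst maps these levels onto its own severities is not something I have verified.

```python
import json
from collections import Counter

# Invented, trimmed-down sample following the general SARIF 2.1.0 layout.
SAMPLE_SARIF = """
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "Bandit"}},
      "results": [
        {"ruleId": "B105", "level": "warning",
         "message": {"text": "Possible hardcoded password"}},
        {"ruleId": "B608", "level": "error",
         "message": {"text": "Possible SQL injection"}}
      ]
    }
  ]
}
"""


def count_by_level(sarif_text):
    """Count the results in a SARIF document, grouped by their level."""
    document = json.loads(sarif_text)
    levels = Counter()
    for run in document.get("runs", []):
        for result in run.get("results", []):
            # Assumption: treat a missing level as "warning".
            levels[result.get("level", "warning")] += 1
    return levels


if __name__ == "__main__":
    for level, total in sorted(count_by_level(SAMPLE_SARIF).items()):
        print(f"{level}: {total}")
```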

Automated testing

Whilst our other steps check our code from a static viewpoint, we also want to be sure that it works as we expect, so we’ll add a step found in most pipelines: automated testing to validate that our code behaves the way we want.

In our example, we’re going to use the popular PyTest framework, which will use code stored in the tests folder to check functionality - for this example, we’re going to have a single, simple test to demonstrate how this can be done.

As well as understanding whether our code passes the provided tests, we want to understand how much of our code has been tested, so we’ll also capture what is known as code coverage which records which lines of our code have been tested.

Again our steps will follow the now familiar process of installing any required tools, executing them, and then capturing the results as a report within CodeCatalyst using the following code:

  unit_tests:
    Identifier: aws/managed-test@v1.0.0
    Inputs:
      Sources:
        - WorkflowSource
    Configuration:
      Steps:
        - Run: |
            echo "Installing pytest"
            pip install --user pytest pytest-cov
            export PATH=$PATH:~/.local/bin            
        - Run: |
            echo "Check testresults folder exists"
            if [ ! -d tests/testresults ]
            then
              mkdir tests/testresults
            fi            
        - Run: |
            echo "Check for requirements"
            if [ -r requirements.txt ]
            then
              pip install --user -r requirements.txt
            fi            
        - Run: |
            echo "Running PyTest"
            python -m pytest            
    Outputs:
      Reports:
        PyTestResults:
          Format: JUNITXML
          IncludePaths:
            - tests/testresults/junit.xml
          SuccessCriteria:
            PassRate: 100
        CodeCoverage:
          Format: COBERTURAXML
          IncludePaths:
            - tests/testresults/coverage.xml
          SuccessCriteria:
            LineCoverage: 80
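To make the PassRate criterion a little more concrete, the sketch below works out a pass rate from a JUnit XML file of the kind PyTest writes. This is only my approximation of the idea: the sample XML is made up, pass_rate is a hypothetical helper, and I have not verified exactly how CodeCatalyst calculates its own figure (for instance, how it treats skipped tests).

```python
import xml.etree.ElementTree as ET

# Made-up sample resembling the summary attributes PyTest writes with --junitxml.
SAMPLE_JUNIT = """
<testsuites>
  <testsuite name="pytest" errors="0" failures="1" skipped="0" tests="4">
  </testsuite>
</testsuites>
"""


def pass_rate(junit_xml):
    """Return the percentage of tests that neither failed nor errored."""
    root = ET.fromstring(junit_xml)
    total = 0
    failed = 0
    # iter() also yields the root element, so a bare <testsuite> root works too.
    for suite in root.iter("testsuite"):
        total += int(suite.get("tests", 0))
        failed += int(suite.get("failures", 0)) + int(suite.get("errors", 0))
    if total == 0:
        return 0.0
    return 100.0 * (total - failed) / total


if __name__ == "__main__":
    print(f"Pass rate: {pass_rate(SAMPLE_JUNIT)}%")
```

With one failure out of four tests, this sample would fall short of the PassRate: 100 criterion used in the pipeline.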

PyTest configuration

With PyTest, we’re going to use two configuration files. .pytest.ini is used to define where our tests are and what output we’ll generate from the tests - our example looks like:

[pytest]
log_level = INFO
addopts = 
    -v --no-header --cov=.
    --junitxml=tests/testresults/junit.xml
    --cov-report=xml:tests/testresults/coverage.xml
    --cov-report=term-missing

testpaths = tests

and we’ll also use a .coveragerc file to tell the coverage tooling (used by PyTest through the pytest-cov plugin) not to include our test files when calculating code coverage via:

[run]
omit = ./tests/*
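The coverage report is written in the Cobertura XML format, where the root element carries an overall line-rate between 0 and 1. As a rough sketch of how the LineCoverage criterion of 80 relates to that file, the code below converts line-rate into a percentage and compares it with a threshold. The sample XML is made up, and both function names are hypothetical helpers of my own.

```python
import xml.etree.ElementTree as ET

# Made-up sample: only the root attributes matter for this sketch.
SAMPLE_COVERAGE = """
<coverage line-rate="0.875" branch-rate="0" lines-covered="7" lines-valid="8">
  <packages/>
</coverage>
"""


def line_coverage_percent(coverage_xml):
    """Return overall line coverage as a percentage from Cobertura XML."""
    root = ET.fromstring(coverage_xml)
    return float(root.get("line-rate", 0)) * 100


def meets_threshold(coverage_xml, threshold=80):
    """Check whether line coverage is at least the given percentage."""
    return line_coverage_percent(coverage_xml) >= threshold


if __name__ == "__main__":
    print(f"Line coverage: {line_coverage_percent(SAMPLE_COVERAGE)}%")
    print(f"Meets 80% threshold: {meets_threshold(SAMPLE_COVERAGE)}")
```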

Action ordering

As defined here, there are no constraints on the ordering of the linting, scanning and testing steps, so they will run in parallel.

However, if we want a step to run only after a previous step has completed, we can add a DependsOn clause to the action. For example, if we wanted our unit_tests action to run only if the linting step worked, we could change our action definition to include the following lines:

  unit_tests:
    DependsOn:
      - Linting
    Identifier: aws/managed-test@v1.0.0
...
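To illustrate the effect of DependsOn on ordering, the sketch below groups the three actions into batches that could run in parallel, using Python's standard graphlib module. This is purely an illustration of dependency ordering under the example above; it is not a description of how CodeCatalyst schedules actions internally, and run_order is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each action maps to the set of actions it depends on.
DEPENDENCIES = {
    "Linting": set(),
    "vuln_scan": set(),
    "unit_tests": {"Linting"},
}


def run_order(dependencies):
    """Group actions into batches; actions within a batch have no unmet dependencies."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Everything returned here is ready to run at the same time.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for number, batch in enumerate(run_order(DEPENDENCIES), start=1):
        print(f"Batch {number}: {', '.join(batch)}")
```

Here Linting and vuln_scan land in the first batch, and unit_tests waits for the second because it depends on Linting.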

Running the pipeline

Once we’ve created our pipeline and committed it to the code repository in CodeCatalyst along with our Python code, we should have a file structure that looks something like:

Complete repository structure

With all of this in place, CodeCatalyst should recognise that it needs to run the pipeline anytime there are changes to the code in the repository, including the pipeline configuration file. These runs are visible in the CodeCatalyst console under CI/CD > Workflows as shown below:

Screenshot showing example workflow runs

Each pipeline will be listed using the name defined at the start of the configuration, along with each run, showing the status of the run, a run ID, a commit ID that triggered the run, and which repository and branch were used.

Clicking on the ID of the run will take us to the details for that particular run, looking something like this:

Screenshot of run details

As mentioned earlier, you can see that because we didn’t define any dependencies between the steps, they ran in parallel. We can also see whether the run was successful, the commit ID that triggered the run, along with when the run started, and how long it took.

We can also click on any of the steps to see the details of each step, including the output from any commands:

Screenshot of run details

Reporting

As well as being able to see whether a workflow run was successful, we can see any reports generated by clicking on Reports, either in the sidebar or the details of the run screen.

Screenshot of workflow reports

The screen above shows the reports generated by our workflow, when they were generated, if they were successful (as long as we defined criteria to specify what success means), along with repository details, the workflow action step that created the report and the type of data the report contains.

These reports are, in my view, one of the items that helps CodeCatalyst stand out - it’s very simple to define what reports are being generated, what type of data they contain, and what constitutes a successful report.

Clicking through on the report name takes you to the detailed report data:

Screenshot of example Code Coverage report

In the example above, showing code coverage (i.e. how much of the code has been tested), we can see what the success criteria were and how much of the code we’ve tested, both as a summary and on a per-file basis.

CodeCoverage details.

From the report, we’re also able to click through to the individual files to see which lines were tested or not:

Screenshot showing which lines were tested in a file

Conclusion

In my opinion, CodeCatalyst is a useful development tool - by integrating many of the tools required in the SDLC (Software Development Life Cycle), it can provide a very functional working space.

In this example, we’ve concentrated on how we can define and run pipelines when we make changes to our code, and how we can report on the outcomes from those changes - an area that I think CodeCatalyst is particularly strong in.

If you have any questions, comments or suggestions for other tasks we could use in the pipelines, use the comment box below.