68 commits
db8e047
changes to project-1 and project-3
thedave42 Sep 15, 2021
31d6cec
new changes to detect
thedave42 Sep 15, 2021
4379617
only run scan on top level dirs with changes
thedave42 Sep 16, 2021
06ab50d
put diff.txt in right place
thedave42 Sep 17, 2021
1db87fe
Get diff.txt from right place
thedave42 Sep 17, 2021
54acbe8
remove target-dir exclude
thedave42 Sep 17, 2021
0b9e655
right git diff command
thedave42 Sep 17, 2021
0a5eb10
is checkout wrong?
thedave42 Sep 17, 2021
5ad63b6
Ensure only directories are listed
thedave42 Sep 17, 2021
d6442ec
Merge pull request #1 from thedave42/run-only-changes
thedave42 Sep 17, 2021
d1216cf
Changed 3 dirs and a file
thedave42 Sep 17, 2021
1396c7f
one code scan workflow for all one for changes
thedave42 Sep 17, 2021
1b9ca6f
scan only changes on pr, scan all on push, schedule
thedave42 Sep 17, 2021
7d1ed23
do it all in one workflow file
thedave42 Sep 17, 2021
8c058e4
fix yaml syntax error
thedave42 Sep 17, 2021
34f1e53
fixes for running all vs only changes
thedave42 Sep 17, 2021
cd7394b
separate jobs for changed vs all
thedave42 Sep 17, 2021
a563fbc
upload diff as workflow artifact
thedave42 Sep 17, 2021
ebb0834
fix yaml
thedave42 Sep 17, 2021
6b04ec5
do not stop other matrix jobs if one fails
thedave42 Sep 17, 2021
045e042
Merge pull request #2 from thedave42/workflow-testing
thedave42 Sep 17, 2021
59c6107
Update README.md
thedave42 Sep 17, 2021
53931eb
Update code-scanning.yml
thedave42 Jun 2, 2022
cc15c30
test what happens with a language matrix
thedave42 Aug 17, 2022
50ab9f4
add script to list langauges found in diff.txt
thedave42 Aug 17, 2022
bd7ed6a
remove all workflow referece to language config
thedave42 Aug 18, 2022
645edcc
remove all reference to workflow config
thedave42 Aug 18, 2022
b62b541
See if it picks up python
thedave42 Aug 18, 2022
49c24b7
Try again
thedave42 Aug 18, 2022
96592a5
all codeql supported languages by extension
thedave42 Aug 18, 2022
38a065f
testing non-rec matrix
thedave42 Aug 18, 2022
1df457d
fixes to use non-rec matrix
thedave42 Aug 18, 2022
8f43960
fix error with non-rec matrix
thedave42 Aug 18, 2022
a5820d9
getting closer
thedave42 Aug 18, 2022
3249d81
test non-rec matrix hard code
thedave42 Aug 18, 2022
75eaaf3
testing again
thedave42 Aug 18, 2022
a4cdcff
another try
thedave42 Aug 18, 2022
1b0e300
trying again
thedave42 Aug 18, 2022
3109265
testing again
thedave42 Aug 18, 2022
6583b89
build matrix different
thedave42 Aug 18, 2022
b63aca9
new test
thedave42 Aug 18, 2022
7de6b05
dynamic matrix test
thedave42 Aug 18, 2022
a4d22a7
Revert "Update code-scanning.yml"
thedave42 Aug 18, 2022
86e1224
Updates to use script
thedave42 Aug 18, 2022
139d2a8
Merge pull request #20 from thedave42/dynamic-languages-configuration
thedave42 Aug 18, 2022
5267922
changes to dynamic
thedave42 Aug 18, 2022
39c586a
Merge pull request #21 from thedave42/dynamic-languages-configuration
thedave42 Aug 18, 2022
dd8bef1
fixes for dynamic scannign
thedave42 Aug 18, 2022
513f8a1
Changes to use new scripts
thedave42 Aug 18, 2022
12f8f97
Updates to logging
thedave42 Aug 18, 2022
4926a3e
More logging updates
thedave42 Aug 18, 2022
8ae1a27
Fix for workflow
thedave42 Aug 18, 2022
b7ed154
Fixes for changed workflow
thedave42 Aug 18, 2022
597b8d8
Fix analysis categories
thedave42 Aug 18, 2022
4041948
Clean up workflow file
thedave42 Aug 18, 2022
7cb819e
More accurate file extension identification
thedave42 Aug 19, 2022
e1cbd07
Clean up comments
thedave42 Aug 19, 2022
e8a1d67
testing streamlined flow
thedave42 Aug 19, 2022
52d0c25
testing streamline
thedave42 Aug 19, 2022
3c4fafa
test streamline
thedave42 Aug 19, 2022
6eb8bec
test streamline
thedave42 Aug 19, 2022
35879e8
test streamline
thedave42 Aug 19, 2022
3563401
testing streamlining
thedave42 Aug 19, 2022
ca862ba
more streamlining
thedave42 Aug 19, 2022
295b9d3
more streamlining
thedave42 Aug 19, 2022
9e02eb1
testing autobuild streamline
thedave42 Aug 19, 2022
f6deb54
more streamlining of autobuild
thedave42 Aug 19, 2022
70afe1d
Update README.md
thedave42 Dec 14, 2022
5 changes: 5 additions & 0 deletions .github/codeql/codeql-config-javascript.yml
@@ -0,0 +1,5 @@
name: "CodeQL config"

paths:
- project-1
- project-3
4 changes: 4 additions & 0 deletions .github/codeql/codeql-config-python.yml
@@ -0,0 +1,4 @@
name: "CodeQL config"

paths:
- python-project
58 changes: 58 additions & 0 deletions .github/scripts/list-all
@@ -0,0 +1,58 @@
#!/usr/bin/env python3
#
# This script prints a JSON representation of an include matrix that will create a job for each top level folder/language combination detected in the repository.
#
# {"include": [{"target-dir": ".github", "languages": "javascript"}, {"target-dir": "project with spaces", "languages": "javascript"}, {"target-dir": "project-1", "languages": "javascript"}, {"target-dir": "python-project", "languages": "python"}]}
#

from genericpath import isdir
import json
import os
import glob

javascript = [".js", ".jsx", ".mjs", ".es", ".es6", ".htm", ".html", ".xhtm", ".xhtml", ".vue", ".hbs", ".ejs", ".njk", ".json", ".yaml", ".yml", ".raml", ".xml"]
typescript = [".ts", ".tsx", ".mts", ".cts"]
c_and_cplus = [".cpp", ".c++", ".cxx", ".hpp", ".hh", ".h++", ".hxx", ".c", ".cc", ".h"]
csharp = [".sln", ".csproj", ".cs", ".cshtml", ".xaml"]
golang = [".go"]
python_lang = [".py"]
java = [".java"]
ruby = [".rb", ".erb", ".gemspec", "Gemfile"]

outlines = dict()
outlines["include"] = set()

def serialize_sets(obj):
    if isinstance(obj, set):
        l = list()
        for item in obj:
            if isinstance(item, tuple):
                l.append(dict((x, y) for x, y in item))
        return l

def find_in_list(list, string):
    for item in list:
        if string.strip().endswith(item):
            return True
    return False

for line in glob.glob('**', recursive=True):
    path = line.split('/')[0]
    if find_in_list(javascript, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "javascript"}).items()))
    if find_in_list(typescript, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "javascript"}).items()))
    if find_in_list(c_and_cplus, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "cpp"}).items()))
    if find_in_list(csharp, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "csharp"}).items()))
    if find_in_list(golang, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "go"}).items()))
    if find_in_list(python_lang, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "python"}).items()))
    if find_in_list(java, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "java"}).items()))
    if find_in_list(ruby, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "ruby"}).items()))

print(json.dumps(outlines, default=serialize_sets))
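The per-language `if` chain in list-all could also be written as one suffix table. The following is only a sketch of that idea, not the script from this PR: `SUFFIXES` and `languages_for` are hypothetical names, the table assumes `".c"` and `".cc"` are meant to be separate entries, and TypeScript suffixes are folded into `javascript` because the script reports both under that language.

```python
# Suffix table based on the lists in list-all (assumption: ".c" and ".cc"
# are separate entries). Tuples are used so str.endswith can take them directly.
SUFFIXES = {
    "javascript": (".js", ".jsx", ".mjs", ".es", ".es6", ".htm", ".html",
                   ".xhtm", ".xhtml", ".vue", ".hbs", ".ejs", ".njk", ".json",
                   ".yaml", ".yml", ".raml", ".xml",
                   ".ts", ".tsx", ".mts", ".cts"),
    "cpp": (".cpp", ".c++", ".cxx", ".hpp", ".hh", ".h++", ".hxx",
            ".c", ".cc", ".h"),
    "csharp": (".sln", ".csproj", ".cs", ".cshtml", ".xaml"),
    "go": (".go",),
    "python": (".py",),
    "java": (".java",),
    "ruby": (".rb", ".erb", ".gemspec", "Gemfile"),
}


def languages_for(path):
    """Return the set of languages whose suffix list matches the path."""
    name = path.strip()  # diff.txt lines end with a newline
    return {lang for lang, suffixes in SUFFIXES.items() if name.endswith(suffixes)}


if __name__ == "__main__":
    print(sorted(languages_for("project-1/add.js")))
```

Adding a language then means adding one table row rather than another `if` block.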
58 changes: 58 additions & 0 deletions .github/scripts/list-changed
@@ -0,0 +1,58 @@
#!/usr/bin/env python3
#
# This script prints a JSON representation of an include matrix - based on the output of git diff - that will create a job for each top level folder/language combination detected in the changed files.
#
# {"include": [{"target-dir": ".github", "languages": "javascript"}, {"target-dir": "project with spaces", "languages": "javascript"}, {"target-dir": "project-1", "languages": "javascript"}, {"target-dir": "python-project", "languages": "python"}]}
#

from genericpath import isdir
import json
import os

javascript = [".js", ".jsx", ".mjs", ".es", ".es6", ".htm", ".html", ".xhtm", ".xhtml", ".vue", ".hbs", ".ejs", ".njk", ".json", ".yaml", ".yml", ".raml", ".xml"]
typescript = [".ts", ".tsx", ".mts", ".cts"]
c_and_cplus = [".cpp", ".c++", ".cxx", ".hpp", ".hh", ".h++", ".hxx", ".c", ".cc", ".h"]
csharp = [".sln", ".csproj", ".cs", ".cshtml", ".xaml"]
golang = [".go"]
python_lang = [".py"]
java = [".java"]
ruby = [".rb", ".erb", ".gemspec", "Gemfile"]

lines = list(open("./.github/scripts/diff.txt").readlines())
outlines = dict()
outlines["include"] = set()

def serialize_sets(obj):
    if isinstance(obj, set):
        l = list()
        for item in obj:
            if isinstance(item, tuple):
                l.append(dict((x, y) for x, y in item))
        return l

def find_in_list(list, string):
    for item in list:
        if string.strip().endswith(item):
            return True
    return False

for line in lines:
    path = line.split('/')[0]
    if find_in_list(javascript, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "javascript"}).items()))
    if find_in_list(typescript, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "javascript"}).items()))
    if find_in_list(c_and_cplus, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "cpp"}).items()))
    if find_in_list(csharp, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "csharp"}).items()))
    if find_in_list(golang, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "go"}).items()))
    if find_in_list(python_lang, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "python"}).items()))
    if find_in_list(java, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "java"}).items()))
    if find_in_list(ruby, line) and (os.path.isdir(path)):
        outlines["include"].add(tuple(dict({"target-dir": path, "languages": "ruby"}).items()))

print(json.dumps(outlines, default=serialize_sets))
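Both scripts store each matrix entry as a tuple of dict items in a set and convert it back through `default=serialize_sets`. A possibly simpler way to get the same de-duplication, sketched here under the assumption that entries are collected as plain `(target-dir, language)` pairs, is to sort the set and build the dicts at the end. `build_matrix` is a hypothetical helper, not part of this PR.

```python
import json


def build_matrix(pairs):
    """Turn (target-dir, language) pairs into an include matrix dict."""
    # set() removes duplicates; sorted() makes the output order deterministic
    unique = sorted(set(pairs))
    return {
        "include": [
            {"target-dir": target_dir, "languages": language}
            for target_dir, language in unique
        ]
    }


if __name__ == "__main__":
    pairs = [
        ("project-1", "javascript"),
        ("python-project", "python"),
        ("project-1", "javascript"),  # duplicate, dropped by set()
    ]
    print(json.dumps(build_matrix(pairs)))
```

Sorting is not required by the workflow, but it keeps the job order stable between runs, which makes workflow logs easier to compare.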
95 changes: 62 additions & 33 deletions .github/workflows/code-scanning.yml
@@ -1,4 +1,4 @@
-name: Code scanning
+name: Code scanning for all apps

 #
 # Scan the code using CodeQL whenever new commits are pushed to the main branch
@@ -18,48 +18,71 @@ on:
     paths-ignore:
       - 'docs/**'
       - '*'
+  schedule:
+    - cron: "35 13 * * 2"
+  workflow_dispatch:

+
 jobs:
-  generate-dir-list:
+  generate-scan-list:
+    # Find all the top level directories in the repository and use them for the scan
+    # when the workflow is not triggered by a pull_request
+    #
     name: Generate directory list
     runs-on: ubuntu-latest
     outputs:
-      dir-list: ${{steps.find-dirs.outputs.dir-list}}
+      matrix: ${{steps.set-matrix.outputs.matrix}}

     steps:
-      - name: Checkout repository
-        uses: actions/checkout@v2
+      - name: Checkout repository
+        uses: actions/checkout@v3
+        with:
+          fetch-depth: 0

-      #
-      # Generate a JSON array containing all non-hidden subdirectories of the
-      # repository's root directory and store it as a job output so it can be
-      # consumed by all downstream jobs depending on this one. For more
-      # information about this, visit:
-      #
-      # https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions#jobsjob_idoutputs
-      #
-      - name: Find existing directories
-        id: find-dirs
-        run: |
-          echo "::set-output name=dir-list::$(./.github/scripts/list-dirs)"
+      - name: Find all apps
+        if: ${{ github.event_name != 'pull_request'}}
+        id: find-all
+        run: |
+          echo "::set-output name=all::$(./.github/scripts/list-all)"
+      - name: Find changed apps
+        if: ${{ github.event_name == 'pull_request'}}
+        id: find-changed
+        run: |
+          git diff --name-only origin/$GITHUB_BASE_REF $GITHUB_SHA >./.github/scripts/diff.txt
+          echo "::set-output name=changed::$(./.github/scripts/list-changed)"

-  codeql:
-    name: Scan code with CodeQL
-    needs: generate-dir-list
+      - name: Setup scanning matrix
+        id: set-matrix
+        env:
+          ALL: ${{ steps.find-all.outputs.all }}
+          CHANGED: ${{ steps.find-changed.outputs.changed }}
+        run: |
+          echo "::set-output name=matrix::$ALL$CHANGED"
+          echo "::notice::All set to $ALL"
+          echo "::notice::Changed set to $CHANGED"
+
+      - name: Upload diff as artifact
+        if: ${{ github.event_name == 'pull_request'}}
+        uses: actions/upload-artifact@v2
+        with:
+          name: diff
+          path: |
+            ./.github/scripts/diff.txt
+
+
+
+  codeql-scan:
+    name: Scanning ${{matrix.target-dir}} (${{ matrix.languages }}) with CodeQL
+    needs: generate-scan-list
     runs-on: ubuntu-latest
     strategy:
-      matrix:
-        target-dir: ${{fromJson(needs.generate-dir-list.outputs.dir-list)}}
-        #
-        # Prevent the creation of jobs for directories where code scanning is
-        # not necessary/desired.
-        #
-        exclude:
-          - target-dir: docs
+      fail-fast: false
+      matrix: ${{ fromJson(needs.generate-scan-list.outputs.matrix) }}

     steps:
       - name: Checkout repository
-        uses: actions/checkout@v2
+        uses: actions/checkout@v3

       #
       # Build the configuration file for CodeQL to instruct it to only scan the
@@ -70,16 +93,22 @@ jobs:
       #
       - name: Build CodeQL config file
         env:
-          TARGET_DIR: ${{matrix.target-dir}}
+          TARGET_DIR: ${{ matrix.target-dir }}
         run: |
           cp .github/codeql/codeql-config-template.yml codeql-config.yml
           sed -i 's@__TARGET_DIR__@'"$TARGET_DIR"'@' codeql-config.yml

       - name: Initialize CodeQL
-        uses: github/codeql-action/init@v1
+        uses: github/codeql-action/init@v2
         with:
           config-file: codeql-config.yml
-          languages: javascript
+          languages: ${{ matrix.languages }}

+      - name: Attempting build
+        if: ${{ (matrix.languages == 'cpp' || matrix.languages == 'csharp' || matrix.languages == 'java') }}
+        uses: github/codeql-action/autobuild@v2
+
       - name: Perform CodeQL analysis
-        uses: github/codeql-action/analyze@v1
+        uses: github/codeql-action/analyze@v2
+        with:
+          category: ${{ matrix.target-dir }}-${{ matrix.languages }}
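The "Setup scanning matrix" step relies on only one of the two "Find ..." steps running for a given event, so concatenating `$ALL$CHANGED` yields whichever JSON string was produced. A small Python sketch of that selection, and of the per-job analysis category, follows. The function names are hypothetical and the sketch only mirrors the string handling; it does not call any GitHub API.

```python
import json


def select_matrix(all_output, changed_output):
    """Mimic `$ALL$CHANGED`: the output of a skipped step is an empty string."""
    raw = all_output + changed_output
    # json.loads raises ValueError if both outputs are empty or both are set
    return json.loads(raw)


def analysis_category(entry):
    """Mirror `category: <target-dir>-<languages>` for one matrix entry."""
    return f"{entry['target-dir']}-{entry['languages']}"


if __name__ == "__main__":
    # Pull request run: find-all was skipped, find-changed produced JSON
    changed = '{"include": [{"target-dir": "project-1", "languages": "javascript"}]}'
    matrix = select_matrix("", changed)
    for entry in matrix["include"]:
        print(analysis_category(entry))
```

The distinct category per directory/language pair is what keeps the parallel jobs' results separate when they are uploaded for the same commit.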
2 changes: 2 additions & 0 deletions .gitignore
@@ -0,0 +1,2 @@
.github/scripts/diff.txt
**/node_modules/**
24 changes: 9 additions & 15 deletions README.md
@@ -1,5 +1,7 @@
 # Parallel code scanning with CodeQL

+https://github.com/thedave42/parallel-code-scanning/labels/documentation
+
 If you have a large repository containing various independent projects (a
 "monorepo"), the time taken to scan your code with CodeQL can be significantly
 reduced by splitting the scanning work into various parallel jobs which will
@@ -20,6 +22,11 @@ this repository (e.g. `project-4`) requires no changes to the workflow file as a
 dedicated code scanning job will be automatically generated for it when the
 workflow is executed.

+If the workflow is triggered by a pull request, the list of subdirectories that
+will be scanned is limited to the subdirectories that contain changes. The
+changes are based on a `git diff` between the base and head refs specified
+in the pull request.
+
 This strategy is possible because GitHub Actions workflows accept JSON input to
 define a job matrix, and the JSON contents can be generated during the
 workflow's execution. In other words, the job matrix can be defined dynamically.
@@ -33,20 +40,7 @@ general capabilities of CodeQL before doing this.

 ## Answers to common questions

-**1.** _Even if files in only one subdirectory in the repository are changed,
-code scanning jobs will be generated for all subdirectories containing software
-projects, which is wasteful. Is it possible to limit the generation of jobs so
-that only subdirectories with modified files will be scanned?_
-
-Yes. The list of subdirectories which is used as input for the code scanning job
-matrix is produced by a [script](./.github/scripts/list-dirs) which simply
-outputs all subdirectories under the repository's root directory. This script
-can be modified in any way you want, so you can use [`git
-diff`](https://stackoverflow.com/questions/50440420/git-diff-only-show-which-directories-changed)
-to build a list containing only subdirectories with modified files and use that
-list as input for the job matrix generation.
-
-**2.** _Every code scanning job checks out the repository in parallel. If a
+**1.** _Every code scanning job checks out the repository in parallel. If a
 change is made to the repository during that time (e.g. a subdirectory is added
 or removed, or a file in a pre-existing subdirectory is modified), you
 essentially have a race condition which is not being properly handled._
@@ -63,4 +57,4 @@ very first job which is executed in the workflow and then consuming that
 artifact in all downstream jobs. The
 [`actions/upload-artifact`](https://github.com/actions/upload-artifact) and
 [`actions/download-artifact`](https://github.com/actions/download-artifact)
-actions will help you accomplish this.
+actions will help you accomplish this.
1 change: 1 addition & 0 deletions project with spaces/app with spaces.js
@@ -0,0 +1 @@
console.log("Hello World");
1 change: 0 additions & 1 deletion project-1/add.js
@@ -1,5 +1,4 @@
import createMathOperation from './.internal/createMathOperation.js'

/**
* Adds two numbers.
*
24 changes: 24 additions & 0 deletions python-project/list-changed-dirs.py
@@ -0,0 +1,24 @@
#!/usr/bin/env python3

#
# This script prints a JSON array containing the top-level directories that
# contain changed files, as listed in ./.github/scripts/diff.txt. As an
# example, if the diff touches files under "foo", "bar" and "baz", the output
# will be (the order of the directories is not necessarily alphabetical):
#
# ["foo", "bar", "baz"]
#
from genericpath import isdir
import json
import os

lines = list(open('./.github/scripts/diff.txt').readlines())
outlines = set()

#only add items that are directories
for line in lines:
    path = line.split('/')[0]
    if (os.path.isdir(path)):
        outlines.add(path)

print(json.dumps(list(outlines)))
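The script above reads a `diff.txt` that the workflow writes beforehand. As a rough alternative sketch, the same list could be derived by calling `git diff --name-only` directly. The helper names below are hypothetical, and the sketch differs from the script in one respect: it keeps any path that has a directory component instead of checking `os.path.isdir`, so a directory deleted by the change would still be listed.

```python
import subprocess


def top_level_dirs(changed_paths):
    """Return the sorted top-level directories of repo-relative file paths."""
    dirs = set()
    for path in changed_paths:
        path = path.strip()
        if "/" in path:  # files in the repository root have no directory part
            dirs.add(path.split("/", 1)[0])
    return sorted(dirs)


def changed_paths(base_ref, head_ref="HEAD"):
    """List changed files between two refs.

    Assumes it runs inside a git checkout where both refs are available
    (for example after a checkout with full history).
    """
    result = subprocess.run(
        ["git", "diff", "--name-only", base_ref, head_ref],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    sample = ["project-1/add.js", "README.md", "project with spaces/app with spaces.js"]
    print(top_level_dirs(sample))
```

Keeping the path parsing in a pure function separate from the `git` call makes it possible to test the directory logic without a repository.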
39 changes: 39 additions & 0 deletions python-project/list-changed-langs.py
@@ -0,0 +1,39 @@
#!/usr/bin/env python3
#
# This script prints a JSON array containing all the supported CodeQL programming languages based on the file extension
#
# ["foo", "bar", "baz"]
#
import json

javascript = [".js", ".jsx", ".mjs", ".es", ".es6", ".htm", ".html", ".xhtm", ".xhtml", ".vue", ".hbs", ".ejs", ".njk", ".json", ".yaml", ".yml", ".raml", ".xml"]
typescript = [".ts", ".tsx", ".mts", ".cts"]
c_and_cplus = [".cpp", ".c++", ".cxx", ".hpp", ".hh", ".h++", ".hxx", ".c", ".cc", ".h"]
csharp = [".sln", ".csproj", ".cs", ".cshtml", ".xaml"]
golang = [".go"]
python_lang = [".py"]
java = [".java"]
ruby = [".rb", ".erb", ".gemspec", "Gemfile"]


lines = list(open("./.github/scripts/diff.txt").readlines())
outlines = set()

def find_in_list(list, string):
    for item in list:
        if item in string:
            return True
    return False

# add a language when a changed file matches one of its extensions
for line in lines:
    if find_in_list(javascript, line):
        outlines.add("javascript")
    if find_in_list(typescript, line):
        outlines.add("javascript")


print(json.dumps(list(outlines)))
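This script matches extensions with `item in string`, while list-all and list-changed use `endswith`. The difference matters because a substring test also fires on longer extensions that merely contain a shorter one. A minimal sketch of the two behaviours, with hypothetical function names:

```python
def matches_substring(suffixes, path):
    """Substring test, as in this script: true if a suffix occurs anywhere."""
    return any(suffix in path for suffix in suffixes)


def matches_suffix(suffixes, path):
    """Suffix test, as in list-all and list-changed: true only at the end."""
    return path.strip().endswith(tuple(suffixes))


if __name__ == "__main__":
    c_family = [".c", ".h"]
    path = "src/app.cs"  # a C# file, not C
    print(matches_substring(c_family, path))  # ".c" occurs inside ".cs"
    print(matches_suffix(c_family, path))     # the path does not end with ".c"
```

With the substring test a C# file would be counted as C/C++, which would start an unnecessary `cpp` job for that directory.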
2 changes: 1 addition & 1 deletion .github/scripts/list-dirs → python-project/list-dirs.py
@@ -12,4 +12,4 @@
 import glob
 import json

-print(json.dumps(glob.glob("*/")).replace("/", ""))
+print(json.dumps(glob.glob("*/")).replace("/", ""))
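The one-liner lists directories with `glob.glob("*/")` and then strips every `/` from the serialized JSON. An alternative sketch that avoids editing the JSON text is to read directory entries with `os.scandir` and skip hidden names explicitly. `list_top_level_dirs` is a hypothetical name and this is not the script from the PR.

```python
import json
import os


def list_top_level_dirs(root="."):
    """Return the sorted names of non-hidden directories directly under root."""
    with os.scandir(root) as entries:
        return sorted(
            entry.name
            for entry in entries
            if entry.is_dir() and not entry.name.startswith(".")
        )


if __name__ == "__main__":
    print(json.dumps(list_top_level_dirs()))
```

Unlike the glob version, the result is sorted, so the order of generated jobs does not depend on filesystem ordering.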