Commit 95895d6

anonrig and CarlosEduR authored
optimize url::can_parse method (#1106)
* optimize url::can_parse method
* update clang-tools to 22
* create AGENTS.md
* remove unused methods
* update comments & abi-check
* bump SOVERSION to 5 for intentional ABI break
* fix clang-tidy-22 warnings: noexcept-escape, unchecked-optional-access, throwing-static-init
* address fuzzing issues
* fix throwing-static-init false positive and add clang-tidy to run-clangcldocker.sh
* fix docker clang-tidy: generate compile_commands.json on host, run tidy in container
* fix gen_compile_commands: drop -stdlib=libc++ when using host GCC
* wipe stale cmake cache before gen_compile_commands to drop old CXX_FLAGS
* fix docker clang-tidy: install cmake+ninja in container, use clang++-22 to match CI exactly
* wipe build-clang-tidy before docker cmake to avoid generator mismatch
* install clang-22 and libc++-22-dev in docker tidy container
* reduce apt-get verbosity with -qq flag
* add git to docker deps for CPM to clone gtest
* suppress apt/docker verbosity, fix SSL certs, cache CPM downloads on host
* exclude vendored gtest from clang-tidy and update ExcludeHeaderFilterRegex
* scope clang-tidy to src/ only, fix git safe.directory, simplify docker setup
* fix all clang-tidy issues: scope to ada.cpp, NOLINT false positives, fix stringview usage, update AGENTS.md
* remove .cpm-cache from repo, add to .gitignore
* add regression tests for extra-slash fuzzer crashes (ws:///..., ws://////5...)
* fix % in host: return nullopt to defer to full parser; add regression tests for all fuzzer crashes
* fix port leading-zeros: strip zeros before pl>5 check; add regression tests
* fix IPv4 fast path bypassing port validation; add regression test
* Update CMakeLists.txt
* Update CMakeLists.txt
* Update CMakeLists.txt
* add shortcuts for can_parse slow path
* optimize even further (#1111)
* Fix error in optimized can_parse (#1118)
* Improve consistency in optimized can_parse (#1119)

---------

Co-authored-by: Carlos Sousa <40635471+CarlosEduR@users.noreply.github.com>
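The "strip zeros before pl>5 check" item in the commit message can be illustrated with a small sketch. The helper name `port_digits_ok_sketch` is hypothetical and is not taken from ada's sources; the sketch assumes the input is the text that follows `:` in the authority, and that an empty string means "no port". It only shows why the order of the two steps matters: a port such as `0000000080` has more than five characters but is still port 80.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper for illustration only (not part of ada's API).
// Returns true when `port` is empty or is a decimal number <= 65535,
// ignoring any leading zeros.
inline bool port_digits_ok_sketch(std::string_view port) {
  // Every character must be an ASCII digit.
  for (char c : port) {
    if (c < '0' || c > '9') return false;
  }
  // Strip leading zeros first, so "0000000080" is judged as "80".
  size_t first = port.find_first_not_of('0');
  if (first == std::string_view::npos) return true;  // empty or all zeros
  port.remove_prefix(first);
  // Only now is a length check meaningful: 65535 has five digits.
  if (port.size() > 5) return false;
  // At most five digits remain, so the value fits easily in 32 bits.
  uint32_t value = 0;
  for (char c : port) value = value * 10 + uint32_t(c - '0');
  return value <= 65535;
}
```

If the length check ran before the zeros were stripped, `0000000080` would be rejected even though a full parser would accept it, which is the kind of disagreement the regression tests mentioned above are meant to catch.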
1 parent 5bb6647 commit 95895d6

35 files changed

Lines changed: 1266 additions & 670 deletions

.clang-tidy

Lines changed: 2 additions & 1 deletion
@@ -5,12 +5,13 @@ Checks: >
   -bugprone-narrowing-conversions,
   -bugprone-suspicious-include,
   -bugprone-unhandled-exception-at-new,
+  -bugprone-throwing-static-initialization,
   clang-analyzer-*,
   -clang-analyzer-security.insecureAPI.DeprecatedOrUnsafeBufferHandling,
 
 # Turn all the warnings from the checks above into errors.
 WarningsAsErrors: '*'
 # Check first-party (non-system, non-vendored) headers.
 HeaderFilterRegex: '.*'
-ExcludeHeaderFilterRegex: 'build/_deps/'
+ExcludeHeaderFilterRegex: '(build[^/]*/_deps/|\.cpm-cache/|/expected\.h$)'
 SystemHeaders: false

.github/workflows/abi-check.yml

Lines changed: 15 additions & 2 deletions
@@ -37,8 +37,21 @@ jobs:
       - name: Find latest release tag
         id: baseline
         run: |
-          # Find the most recent vX.Y.Z tag reachable from the current commit's history
-          LATEST_TAG=$(git tag --list 'v*.*.*' --sort=-version:refname | head -1)
+          # Find the most recent vX.Y.Z tag that does NOT point to the current HEAD.
+          # Excluding HEAD ensures we compare against a previous release even when a
+          # new release tag was just pushed to main alongside this workflow run.
+          CURRENT_SHA=$(git rev-parse HEAD)
+          LATEST_TAG=$(git tag --merged HEAD --list 'v*.*.*' --sort=-version:refname | while IFS= read -r tag; do
+            TAG_SHA=$(git rev-parse "${tag}^{}" 2>/dev/null)
+            if [ "$TAG_SHA" != "$CURRENT_SHA" ]; then
+              echo "$tag"
+              break
+            fi
+          done)
+          if [ -z "$LATEST_TAG" ]; then
+            echo "No previous release tag found — cannot establish a baseline."
+            exit 1
+          fi
           echo "Latest release tag: $LATEST_TAG"
           echo "tag=$LATEST_TAG" >> "$GITHUB_OUTPUT"
 

.github/workflows/lint_and_format_check.yml

Lines changed: 8 additions & 4 deletions
@@ -29,7 +29,7 @@ jobs:
       - name: Run clang-format
         uses: jidicula/clang-format-action@654a770daa28443dd111d133e4083e21c1075674 # v4.18.0
         with:
-          clang-format-version: '17'
+          clang-format-version: '22'
           fallback-style: 'Google'
 
       - uses: chartboost/ruff-action@e18ae971ccee1b2d7bbef113930f00c670b78da4 # v1.0.0
@@ -38,11 +38,15 @@ jobs:
           version: 0.6.0
 
       - name: Install clang-tidy and libc++
-        run: sudo apt-get update && sudo apt-get install -y clang-tidy-20 libc++-20-dev libc++abi-20-dev
+        run: |
+          wget -qO- https://apt.llvm.org/llvm-snapshot.gpg.key | sudo tee /etc/apt/trusted.gpg.d/apt.llvm.org.asc
+          echo "deb http://apt.llvm.org/noble/ llvm-toolchain-noble-22 main" | sudo tee /etc/apt/sources.list.d/llvm.list
+          sudo apt-get update
+          sudo apt-get install -y clang-tidy-22 libc++-22-dev libc++abi-22-dev
 
       - name: Run clang-tidy
         run: >
-          cmake -B build -DADA_TESTING=ON -DADA_USE_UNSAFE_STD_REGEX_PROVIDER=ON -DCMAKE_CXX_CLANG_TIDY=clang-tidy-20 -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_CXX_FLAGS="-stdlib=libc++" &&
+          cmake -B build -DADA_TESTING=ON -DADA_USE_UNSAFE_STD_REGEX_PROVIDER=ON -DCMAKE_CXX_CLANG_TIDY=clang-tidy-22 -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -DCMAKE_CXX_FLAGS="-stdlib=libc++" &&
           cmake --build build -j=4
         env:
-          CXX: clang++-20
+          CXX: clang++-22

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -4,6 +4,9 @@ build-*/
 *-build-*
 _fuzz_check/
 
+# CPM package cache (used by tools/run-clangcldocker.sh docker fallback)
+.cpm-cache/
+
 # Python cache
 __pycache__
 venv

CLAUDE.md renamed to AGENTS.md

Lines changed: 13 additions & 0 deletions
@@ -2,6 +2,19 @@
 
 This guide provides instructions for building, testing, and benchmarking the Ada URL parser library using CMake.
 
+## Pre-commit Checklist
+
+Always run the clang-format and clang-tidy script before committing:
+
+```bash
+bash tools/run-clangcldocker.sh
+```
+
+This runs clang-format on all tracked source files and clang-tidy on `src/ada.cpp`
+(the single translation unit that includes all first-party code). The script uses
+the locally installed LLVM 22 toolchain when available, otherwise falls back to the
+`xianpengshen/clang-tools:22` Docker image automatically.
+
 ## Quick Reference
 
 ```bash

benchmarks/bench_protocol.cpp

Lines changed: 5 additions & 5 deletions
@@ -21,8 +21,8 @@
 #include "counters/bench.h"
 
 template <class Function1, class Function2>
-counters::event_aggregate shuffle_bench(Function1 &&function1,
-                                        Function2 &&function2,
+counters::event_aggregate shuffle_bench(Function1&& function1,
+                                        Function2&& function2,
                                         size_t min_repeat = 300,
                                         size_t min_time_ns = 400'000'000,
                                         size_t max_repeat = 1000000,
@@ -77,7 +77,7 @@ constexpr uint64_t scheme_keys[] = {
 
 // branchless load of up to 5 characters into a uint64_t, padding with zeros if
 // n < 5
-inline uint64_t branchless_load5(const char *p, size_t n) {
+inline uint64_t branchless_load5(const char* p, size_t n) {
   uint64_t input = (uint8_t)p[0];
   input |= ((uint64_t)(uint8_t)p[n > 1] << 8) & (0 - (uint64_t)(n > 1));
   input |= ((uint64_t)(uint8_t)p[(n > 2) * 2] << 16) & (0 - (uint64_t)(n > 2));
@@ -131,7 +131,7 @@ std::optional<SchemeType> get_scheme_type(std::string_view scheme) noexcept {
   return std::nullopt;
 }
 
-double pretty_print(const std::string &name, size_t num_values,
+double pretty_print(const std::string& name, size_t num_values,
                     counters::event_aggregate agg) {
   printf("%-50s : ", name.c_str());
   printf(" %5.3f ns ", agg.fastest_elapsed_ns() / double(num_values));
@@ -269,7 +269,7 @@ void collect_benchmark_results(size_t number_strings) {
   gen.seed(42);  // reset seed to ensure same shuffle for all benchmarks
 }
 
-int main(int argc, char **argv) {
+int main(int argc, char** argv) {
   if (!counters::has_performance_counters()) {
     printf(
         "Performance counters not available, you may need to run with sudo.\n");

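The `branchless_load5` hunk above shows only the first three bytes of the load. As a reading aid, here is a minimal standalone sketch of the same technique. The first three statements mirror the lines visible in the diff; the two statements for bytes 3 and 4 are an extrapolation of that pattern and are an assumption, not copied from the repository. The names `branchless_load5_sketch` and `load5_reference` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sketch: pack up to the first five bytes of p into a uint64_t, byte i going
// to bits [8*i, 8*i + 8), without data-dependent branches. Requires n >= 1.
inline uint64_t branchless_load5_sketch(const char* p, size_t n) {
  assert(n >= 1);  // p[0] is read unconditionally
  uint64_t input = (uint8_t)p[0];
  // (n > k) is 0 or 1. Used as an index multiplier it falls back to p[0]
  // when byte k is absent; used as 0 - x it gives an all-zeros or all-ones
  // mask that discards or keeps the shifted byte.
  input |= ((uint64_t)(uint8_t)p[n > 1] << 8) & (0 - (uint64_t)(n > 1));
  input |= ((uint64_t)(uint8_t)p[(n > 2) * 2] << 16) & (0 - (uint64_t)(n > 2));
  // Assumed continuation of the pattern (not visible in the diff):
  input |= ((uint64_t)(uint8_t)p[(n > 3) * 3] << 24) & (0 - (uint64_t)(n > 3));
  input |= ((uint64_t)(uint8_t)p[(n > 4) * 4] << 32) & (0 - (uint64_t)(n > 4));
  return input;
}

// Plain loop with the same result, for comparison.
inline uint64_t load5_reference(const char* p, size_t n) {
  uint64_t out = 0;
  for (size_t i = 0; i < n && i < 5; i++) {
    out |= (uint64_t)(uint8_t)p[i] << (8 * i);
  }
  return out;
}
```

Because every index is either 0 or a position known to be below `n`, the sketch never reads past the input, and short schemes such as `ws` are padded with zero bytes, which is what lets the result be compared against precomputed 64-bit keys.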
benchmarks/competitors/servo-url/servo_url.h

Lines changed: 4 additions & 4 deletions
@@ -16,13 +16,13 @@ struct Url;
 
 extern "C" {
 
-Url *parse_url(const char *raw_input, size_t raw_input_length);
+Url* parse_url(const char* raw_input, size_t raw_input_length);
 
-void free_url(Url *raw);
+void free_url(Url* raw);
 
-const char *parse_url_to_href(const char *raw_input, size_t raw_input_length);
+const char* parse_url_to_href(const char* raw_input, size_t raw_input_length);
 
-void free_string(const char *);
+void free_string(const char*);
 } // extern "C"
 
 } // namespace servo_url

fuzz/can_parse.cc

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 #include "ada.cpp"
 #include "ada.h"
 
-extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   FuzzedDataProvider fdp(data, size);
   std::string source = fdp.ConsumeRandomLengthString(256);
   std::string base_source = fdp.ConsumeRandomLengthString(256);

fuzz/idna.cc

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@
 #include "ada.cpp"
 #include "ada.h"
 
-extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   FuzzedDataProvider fdp(data, size);
   std::string source = fdp.ConsumeRandomLengthString(256);
   std::string source2 = fdp.ConsumeRandomLengthString(64);

fuzz/parse.cc

Lines changed: 5 additions & 5 deletions
@@ -8,8 +8,8 @@
 #include "ada.cpp"
 #include "ada.h"
 
-bool is_valid_utf8_string(const char *buf, size_t len) {
-  const uint8_t *data = reinterpret_cast<const uint8_t *>(buf);
+bool is_valid_utf8_string(const char* buf, size_t len) {
+  const uint8_t* data = reinterpret_cast<const uint8_t*>(buf);
   uint64_t pos = 0;
   uint32_t code_point = 0;
   while (pos < len) {
@@ -93,7 +93,7 @@ bool is_valid_utf8_string(const char *buf, size_t len) {
 }
 
 // Exercise all getters and boolean predicates on ada::url
-static void exercise_url_predicates(const ada::url &u) {
+static void exercise_url_predicates(const ada::url& u) {
   volatile size_t length = 0;
   length += u.get_href().size();
   length += u.get_origin().size();
@@ -119,7 +119,7 @@ static void exercise_url_predicates(const ada::url &u) {
 }
 
 // Exercise all getters and boolean predicates on ada::url_aggregator
-static void exercise_aggregator_predicates(const ada::url_aggregator &u) {
+static void exercise_aggregator_predicates(const ada::url_aggregator& u) {
   volatile size_t length = 0;
   length += u.get_href().size();
   length += u.get_origin().size();
@@ -150,7 +150,7 @@ static void exercise_aggregator_predicates(const ada::url_aggregator &u) {
   (void)u.to_diagram();
 }
 
-extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
+extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
   FuzzedDataProvider fdp(data, size);
   std::string source = fdp.ConsumeRandomLengthString(256);
   std::string base = fdp.ConsumeRandomLengthString(256);
