129 changes: 129 additions & 0 deletions .cursor/rules/commit-message.mdc
@@ -0,0 +1,129 @@
---
description: Use when user wants to git commit
globs:
alwaysApply: false
---
# Commit

Create well-formatted commits that comply with Conventional Commits v1.0.0 and pass `@commitlint/config-conventional`.

## Features:
- Runs pre-commit checks by default (lint, build, generate docs)
- Automatically stages files if none are staged
- Uses the Conventional Commits format and validates message structure
- Suggests splitting commits for different concerns

## Usage:
- `/commit` - Standard commit with pre-commit checks
- `/commit --no-verify` - Skip pre-commit checks

## Message Format

```
<type>[optional scope][!]: <description>

[optional body]

[optional footer(s)]
```

### Unified Rules (Spec + Commitlint)
1. Start with `type[optional scope][!]: subject`.
2. `type` MUST be lower-case and one of: `build`, `chore`, `ci`, `docs`, `feat`, `fix`, `perf`, `refactor`, `revert`, `style`, `test`.
3. Use `feat` for new features; use `fix` for bug fixes.
4. `scope` MAY be provided in parentheses and SHOULD be lower-case (e.g., `fix(parser):`).
5. `subject` MUST be present, written in imperative mood, and SHOULD NOT end with a period.
6. Keep header length ≤ 72 characters where practical.
7. If a body is present, it MUST be separated by a blank line; body is free-form.
8. If footers are present, they MUST be separated by a blank line; use tokens like `Refs`, `Closes`, `Reviewed-by`.
9. Footer tokens MUST use `-` instead of spaces, except `BREAKING CHANGE`, which MAY contain a space. `BREAKING-CHANGE` is synonymous with `BREAKING CHANGE`.
10. Breaking changes MUST be indicated either by `!` in the header or by a `BREAKING CHANGE:` footer. If `!` is used, the subject SHOULD describe the breaking change.
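
The header rules above can be sketched as a small check. This is a minimal illustration, not how commitlint itself is implemented: `checkHeader`, `HEADER_PATTERN`, and the 72-character default are hypothetical names and choices made for this example, and only the header line is inspected.

```typescript
// Hypothetical helper: validates only the first line of a commit message.
const ALLOWED_TYPES: readonly string[] = [
  "build", "chore", "ci", "docs", "feat", "fix",
  "perf", "refactor", "revert", "style", "test",
];

// Lower-case type, optional "(scope)", optional "!", then ": " and a subject.
const HEADER_PATTERN = /^([a-z]+)(?:\(([^()\s]+)\))?(!)?: (.+)$/;

interface HeaderCheck {
  valid: boolean;
  problems: string[];
}

function checkHeader(header: string, maxLength = 72): HeaderCheck {
  const parsed = HEADER_PATTERN.exec(header);
  if (parsed === null) {
    return { valid: false, problems: ["header does not match type(scope)!: subject"] };
  }

  const problems: string[] = [];
  const type = parsed[1] ?? "";
  const scope = parsed[2]; // undefined at runtime when no scope was given
  const subject = parsed[4] ?? "";

  if (ALLOWED_TYPES.indexOf(type) === -1) {
    problems.push(`type "${type}" is not allowed`);
  }
  if (scope !== undefined && scope !== scope.toLowerCase()) {
    problems.push("scope should be lower-case");
  }
  if (subject.endsWith(".")) {
    problems.push("subject should not end with a period");
  }
  if (header.length > maxLength) {
    problems.push(`header is longer than ${maxLength} characters`);
  }
  return { valid: problems.length === 0, problems };
}
```

For instance, `checkHeader("feat(api)!: send an email to the customer when a product is shipped")` reports no problems, while `checkHeader("wip: try something")` flags the type. Body and footer rules (blank-line separation, footer tokens) are outside the scope of this sketch.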

### Types
- feat: Introduces a new feature (MINOR)
- fix: Patches a bug (PATCH)
- build: Build system or external dependencies
- chore: Other changes that don’t modify src or test files
- ci: CI configuration files and scripts
- docs: Documentation only changes
- perf: Improves performance without functional change
- refactor: Code change that neither fixes a bug nor adds a feature
- revert: Reverts a previous commit
- style: Changes that do not affect the meaning of the code
- test: Adding or correcting tests

### Breaking Changes
- Indicate with `!` after type/scope, e.g., `feat(api)!: ...`, or add a footer:
  - `BREAKING CHANGE: <description>`
- `BREAKING-CHANGE` is synonymous with `BREAKING CHANGE` in footers.

### Canonical Examples
```
feat: allow provided config object to extend other configs

BREAKING CHANGE: `extends` key in config file is now used for extending other config files
```

```
feat!: send an email to the customer when a product is shipped
```

```
feat(api)!: send an email to the customer when a product is shipped
```

```
chore!: drop support for Node 6

BREAKING CHANGE: use JavaScript features not available in Node 6.
```

```
docs: correct spelling of CHANGELOG
```

```
fix: prevent racing of requests

Introduce a request id and a reference to latest request. Dismiss
incoming responses other than from latest request.

Remove timeouts which were used to mitigate the racing issue but are
obsolete now.

Reviewed-by: Z
Refs: #123
```

## Process:
1. Check for staged changes (`git status`)
2. If no staged changes, review and stage appropriate files
3. Run pre-commit checks (unless --no-verify)
4. Analyze changes to determine commit type
5. Generate descriptive commit message
6. Include scope if applicable: `type(scope): description`
7. Add body for complex changes explaining why
8. Add footers (e.g., `BREAKING CHANGE:`) if needed
9. Execute commit

## Quick Validation Checklist
- [ ] Type is allowed and lower-case
- [ ] Optional scope is lower-case and in parentheses
- [ ] Subject present, imperative, no trailing period
- [ ] Header ideally ≤ 72 chars
- [ ] Blank line before body (if any)
- [ ] Blank line before footer(s) (if any)
- [ ] Breaking change indicated via `!` or `BREAKING CHANGE:` footer

## Best Practices:
- Keep commits atomic and focused
- Write in imperative mood ("Add feature" not "Added feature")
- Explain why, not just what
- Reference issues/PRs when relevant
- Split unrelated changes into separate commits
- Prefer recognized types; avoid nonstandard types that hinder tooling
- Follow the spec for breaking changes and footers; prefer concise, clear subjects

References:
- Conventional Commits 1.0.0 — https://www.conventionalcommits.org/en/v1.0.0/
- @commitlint/config-conventional — https://github.com/conventional-changelog/commitlint/tree/master/%40commitlint/config-conventional
178 changes: 178 additions & 0 deletions .cursor/rules/lessons-learned.mdc
@@ -0,0 +1,178 @@
---
alwaysApply: true
---

# Lessons Learned - Kindle Parsing Bug Fix

## Character Encoding Issues

### Problem
- HTML content from Kindle exports contains special characters that don't match test expectations
- Non-breaking spaces (`&nbsp;` / `\u00A0`) appear identical to regular spaces but are different characters
- Curly quotes (`‘ ’` / `“ ”`) vs straight quotes (`'` / `"`)
- En/em dashes (`–` / `—`) vs regular hyphens (`-`)

### Solution
- Always implement text normalization for HTML parsing
- Create a `normalizeText()` function that handles common character encoding differences
- Apply normalization to both content text and metadata (chapter names, etc.)

### Code Pattern
```typescript
const normalizeText = (text: string): string => {
  return text
    .replace(/\u00A0/g, " ") // Non-breaking space -> regular space
    .replace(/\u2019/g, "'") // Right curly single quote / apostrophe
    .replace(/\u2018/g, "'") // Left curly single quote
    .replace(/\u201D/g, '"') // Right curly double quote
    .replace(/\u201C/g, '"') // Left curly double quote
    .replace(/\u2013/g, "-") // En dash
    .replace(/\u2014/g, "-") // Em dash
    .trim();
};
```
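
As a usage sketch of the pattern above: the normalizer is repeated here in a condensed form (character classes instead of one `replace` per code point) so the example is self-contained, and the sample strings are made up for illustration rather than taken from a real Kindle export.

```typescript
// Condensed variant of the normalizer, repeated for self-containment.
const normalizeText = (text: string): string =>
  text
    .replace(/\u00A0/g, " ") // Non-breaking space
    .replace(/[\u2018\u2019]/g, "'") // Curly single quotes
    .replace(/[\u201C\u201D]/g, '"') // Curly double quotes
    .replace(/[\u2013\u2014]/g, "-") // En and em dashes
    .trim();

// Hypothetical exported text: NBSPs, curly quotes, and an em dash.
const exported = "\u00A0\u201CDon\u2019t panic\u201D \u2014 chapter\u00A01 ";
// What a test written with a plain keyboard would expect.
const expected = '"Don\'t panic" - chapter 1';

console.log(exported === expected); // false
console.log(normalizeText(exported) === expected); // true
```

The raw comparison fails even though both strings look the same when rendered, which is the kind of mismatch described under Problem.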

## Testing Best Practices

### Always Test Features/Fixes
- **Never skip testing** - every feature or fix must have corresponding tests
- **Test with real data** - use actual HTML fixtures from different languages/formats
- **Test edge cases** - empty content, missing elements, malformed HTML
- **Test character encoding** - especially for international content

### Test Structure
- Test the first N items (e.g., first 5 highlights) to ensure parsing works consistently
- Validate all fields: text, color, page, location, chapter, notes
- Test both positive cases (valid data) and negative cases (missing/invalid data)

### Test Setup
- Use Vitest with jsdom environment for DOM parsing tests
- Disable plugins that conflict with test environment (e.g., logseq plugin)
- Use absolute paths for fixture files to avoid path resolution issues

## Parsing Robustness

### Language Agnostic Design
- Don't rely on specific text strings like "Highlight" or "Note"
- Use structural elements (CSS classes, DOM hierarchy) for identification
- Support multilingual labels for page numbers, locations, etc.

### Error Handling
- Always check for null/undefined before accessing properties
- Use optional chaining (`?.`) instead of non-null assertions (`!`)
- Provide fallback values for missing data
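
A minimal sketch of these three points, assuming a hypothetical intermediate shape (`RawHighlight`) in which any field may be missing. The type names and the `"Unknown chapter"` fallback are illustrative and not taken from the plugin's code.

```typescript
// Hypothetical shape of a partially parsed highlight; every field may be absent.
interface RawHighlight {
  text?: string | null;
  meta?: { page?: string | null; chapter?: string | null } | null;
}

interface Highlight {
  text: string;
  page: number | null;
  chapter: string;
}

function toHighlight(raw: RawHighlight | null | undefined): Highlight {
  // Optional chaining short-circuits to undefined instead of throwing.
  const page = Number.parseInt(raw?.meta?.page ?? "", 10);
  return {
    text: raw?.text?.trim() ?? "",
    // parseInt yields NaN for missing or non-numeric input; map that to null.
    page: Number.isNaN(page) ? null : page,
    // `||` also replaces an empty string, which `??` would keep.
    chapter: raw?.meta?.chapter?.trim() || "Unknown chapter",
  };
}
```

Calling `toHighlight(null)` returns a fully populated object with fallbacks rather than throwing, so downstream code never has to repeat the null checks.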

### DOM Navigation
- Use `nextElementSibling` and `previousElementSibling` for related content
- Check element classes rather than text content for identification
- Handle cases where expected elements might be missing
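
The navigation pattern can be sketched as follows. To keep the example runnable without a browser or jsdom, it uses a minimal `ElementLike` interface that real DOM `Element` objects also satisfy. The class name `noteText` and the function name `textForHeading` are assumptions made for illustration; check the actual export markup before relying on them.

```typescript
// Minimal structural stand-in for the DOM Element members used below.
interface ElementLike {
  className: string;
  textContent: string | null;
  nextElementSibling: ElementLike | null;
}

// Returns the text that follows a heading element, identified by class
// rather than by any language-specific label. Returns null when the
// expected sibling is missing or has a different class.
function textForHeading(heading: ElementLike): string | null {
  const next = heading.nextElementSibling;
  if (next === null) {
    return null;
  }
  const classes = next.className.split(/\s+/);
  if (!classes.some((name) => name === "noteText")) {
    return null;
  }
  return next.textContent?.trim() ?? null;
}

// Hand-built nodes standing in for parsed HTML.
const textNode: ElementLike = {
  className: "noteText",
  textContent: " Some highlighted passage ",
  nextElementSibling: null,
};
const headingNode: ElementLike = {
  className: "noteHeading",
  textContent: "heading in any language",
  nextElementSibling: textNode,
};

console.log(textForHeading(headingNode)); // "Some highlighted passage"
console.log(textForHeading(textNode)); // null
```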

## Code Quality

### Linting
- Always run linter after making changes
- Fix regex usage flagged by the linter (use `RegExp.prototype.exec()` instead of `String.prototype.match()` when the pattern has no global flag)
- Remove unnecessary type assertions
- Use optional chaining where appropriate
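
As an illustration of the `exec()` and optional-chaining points, here is a small sketch that pulls a trailing number out of a label. `trailingNumber` and the sample labels are hypothetical. The lint rule involved is likely `@typescript-eslint/prefer-regexp-exec`, though the source does not name it.

```typescript
// Extracts the number at the end of a label such as "Location 150".
function trailingNumber(label: string): number | null {
  // Without the `g` flag, exec() returns the same data as String#match(),
  // but it is the form the lint rule prefers.
  const found = /(\d+)\s*$/.exec(label);
  // Optional chaining avoids a non-null assertion on the match array.
  const digits = found?.[1];
  return digits === undefined ? null : Number.parseInt(digits, 10);
}

console.log(trailingNumber("Highlight - Location 150")); // 150
console.log(trailingNumber("no number here")); // null
```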

### Documentation
- Document complex parsing logic with comments
- Explain the reasoning behind structural decisions
- Note language-specific considerations

## Key Takeaways

1. **Character encoding is critical** for international content
2. **Always test with real data** from the target environment
3. **Structural parsing** is more reliable than text-based parsing
4. **Error handling** should be defensive and graceful
5. **Test coverage** should include multiple languages and edge cases
6. **Code quality** (linting, type safety) prevents future bugs

# Lessons Learned - Kindle Parsing Bug Fix

## Character Encoding Issues

### Problem
- HTML content from Kindle exports contains special characters that don't match test expectations
- Non-breaking spaces (`&nbsp;` / `\u00A0`) appear identical to regular spaces but are different characters
- Curly quotes (`‘ ’` / `“ ”`) vs straight quotes (`'` / `"`)
- En/em dashes (`–` / `—`) vs regular hyphens (`-`)

### Solution
- Always implement text normalization for HTML parsing
- Create a `normalizeText()` function that handles common character encoding differences
- Apply normalization to both content text and metadata (chapter names, etc.)

### Code Pattern
```typescript
const normalizeText = (text: string): string => {
  return text
    .replace(/\u00A0/g, " ") // Non-breaking space -> regular space
    .replace(/\u2019/g, "'") // Right curly single quote / apostrophe
    .replace(/\u2018/g, "'") // Left curly single quote
    .replace(/\u201D/g, '"') // Right curly double quote
    .replace(/\u201C/g, '"') // Left curly double quote
    .replace(/\u2013/g, "-") // En dash
    .replace(/\u2014/g, "-") // Em dash
    .trim();
};
```

## Testing Best Practices

### Always Test Features/Fixes
- **Never skip testing** - every feature or fix must have corresponding tests
- **Test with real data** - use actual HTML fixtures from different languages/formats
- **Test edge cases** - empty content, missing elements, malformed HTML
- **Test character encoding** - especially for international content

### Test Structure
- Test the first N items (e.g., first 5 highlights) to ensure parsing works consistently
- Validate all fields: text, color, page, location, chapter, notes
- Test both positive cases (valid data) and negative cases (missing/invalid data)

### Test Setup
- Use Vitest with jsdom environment for DOM parsing tests
- Disable plugins that conflict with test environment (e.g., logseq plugin)
- Use absolute paths for fixture files to avoid path resolution issues

## Parsing Robustness

### Language Agnostic Design
- Don't rely on specific text strings like "Highlight" or "Note"
- Use structural elements (CSS classes, DOM hierarchy) for identification
- Support multilingual labels for page numbers, locations, etc.

### Error Handling
- Always check for null/undefined before accessing properties
- Use optional chaining (`?.`) instead of non-null assertions (`!`)
- Provide fallback values for missing data

### DOM Navigation
- Use `nextElementSibling` and `previousElementSibling` for related content
- Check element classes rather than text content for identification
- Handle cases where expected elements might be missing

## Code Quality

### Linting
- Always run linter after making changes
- Fix regex usage flagged by the linter (use `RegExp.prototype.exec()` instead of `String.prototype.match()` when the pattern has no global flag)
- Remove unnecessary type assertions
- Use optional chaining where appropriate

### Documentation
- Document complex parsing logic with comments
- Explain the reasoning behind structural decisions
- Note language-specific considerations

## Key Takeaways

1. **Character encoding is critical** for international content
2. **Always test with real data** from the target environment
3. **Structural parsing** is more reliable than text-based parsing
4. **Error handling** should be defensive and graceful
5. **Test coverage** should include multiple languages and edge cases
6. **Code quality** (linting, type safety) prevents future bugs

39 changes: 39 additions & 0 deletions .github/workflows/pr.yaml
@@ -0,0 +1,39 @@
name: PR CI

on:
  pull_request:
    types: [opened, synchronize, reopened, edited, ready_for_review]

jobs:
  ci:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '20'
          cache: 'npm'

      - name: Install dependencies
        run: npm install

      - name: Commitlint
        uses: wagoid/commitlint-github-action@v6
        with:
          configFile: commitlint.config.mjs

      - name: Lint
        run: npm run lint

      - name: Test
        run: npm run test

      - name: Build
        run: npm run build


5 changes: 4 additions & 1 deletion .github/workflows/publish.yaml
@@ -20,7 +20,10 @@ jobs:
       - name: install dependencies
         run: |
           npm install
-      - name: build and test
+      - name: test
+        run: |
+          npm run test
+      - name: build
         run: |
           npm run build
       - name: Install zip
2 changes: 2 additions & 0 deletions README.md
@@ -29,3 +29,5 @@ If you find that something isn't working right then I'm always happy to hear it

## ☕ Thank you!
A big thank you to the creators of the awesome logseq application :)

<a href="https://www.buymeacoffee.com/nicdun" rel="nofollow"><img src="https://user-images.githubusercontent.com/3909046/150683481-be070424-7bb0-4dd7-a3cb-43b5605163f5.png" alt="buymeacoffee-button" style="max-width: 100%;"></a>
3 changes: 3 additions & 0 deletions commitlint.config.js
@@ -0,0 +1,3 @@
export default {
  extends: ["@commitlint/config-conventional"],
};