fix: use absolute URL for canonical link in redirect pages#6413
fix: use absolute URL for canonical link in redirect pages#6413adoodevv wants to merge 3 commits intoros2:rollingfrom
Conversation
The redirect_html_fragment was building the canonical <link> by concatenating html_baseurl with a relative URI from get_relative_uri(). This produces broken canonical URLs on mirrors that build without sphinx-multiversion, because the relative path contains '..' segments that resolve incorrectly against the base URL. Fix by computing the canonical URL as an absolute path directly from html_baseurl and the canonical document's path, which is always correct regardless of how the site is built. Fixes ros2#6112
Can you speak to this a bit more? Can you explain why you are running a third party mirror? We already have trouble getting search engines to point to the correct resources in the canonical documentation and multiple third party mirrors seems like it could exacerbate that issue. If it is a website performance issue that seems like something we can help solve. |
Thanks for the review! To clarify - I'm not running a third-party mirror. This fix was prompted by the issue reported in #6112, where @tfoote observed that That URL 404s because it's missing the The intent of this fix isn't to encourage mirrors - it's to make the canonical tag generation correct so that any mirror that does exist (regardless of whether we approve of it) at least points search engines to the right official page, rather than a 404. A correct canonical tag is the mechanism that tells search engines to prefer the official docs over the mirror, so fixing it actually helps the SEO problem you described rather than making it worse. |
|
@adoodevv thanks so much for the clarification and appreciate the help. I'll take a look. |
You're welcome. Thank you too. |
tfoote
left a comment
There was a problem hiding this comment.
Sorry this change doesn't resolve the issue.
If you follow the basic approach of make html in the README.md https://github.com/ros2/ros2_documentation#building-html
The resultant build doesn't include the distro in the canonical url testing with your branch
I see build/html/Guides/Ament-CMake-Documentation.html: <link rel="canonical" href="https://docs.ros.org/en/How-To-Guides/Ament-CMake-Documentation.html" />
Which should be
build/html/Guides/Ament-CMake-Documentation.html: <link rel="canonical" href="https://docs.ros.org/en/rolling/How-To-Guides/Ament-CMake-Documentation.html" />
The basic build needs to default in the distro version. We'll probably be best served to encode rolling as the default and then make sure the multiversion overrides it.
Fixed - html_baseurl now defaults to rolling so plain make html builds produce the correct versioned canonical URL. Verified locally: build/html/Guides/Ament-CMake-Documentation.html now shows https://docs.ros.org/en/rolling/How-To-Guides/Ament-CMake-Documentation.html. |
Plain Running Sphinx v8.2.3 loading translations [en]... done loading pickled environment... The configuration has changed (3 options: 'html_css_files', 'html_js_files', 'html_static_path') done building [mo]: targets for 0 po files that are out of date writing output... building [html]: targets for 0 source files that are out of date updating environment: 0 added, 0 changed, 0 removed reading sources... looking for now-outdated files... none found no targets are out of date. preparing documents... done copying assets... copying downloadable files... [ 10%] The-ROS2-Project/Marketing/documents/ros2-brochure-a4-web.pdf copying downloadable files... [ 20%] The-ROS2-Project/Marketing/documents/ros2-brochure-a4-print.pdf copying downloadable files... [ 30%] The-ROS2-Project/Marketing/documents/ros2-brochure-ltr-web.pdf copying downloadable files... [ 40%] The-ROS2-Project/Marketing/documents/ros2-brochure-ltr-print.pdf copying downloadable files... [ 50%] Tutorials/Advanced/Discovery-Server/scripts/generate_discovery_packages.bash copying downloadable files... [ 60%] Tutorials/Advanced/Discovery-Server/scripts/discovery_packets.py copying downloadable files... [ 70%] Tutorials/Advanced/Discovery-Server/scripts/no_intraprocess_configuration.xml copying downloadable files... [ 80%] Tutorials/Advanced/Simulators/Webots/Code/my_world.wbt copying downloadable files... [ 90%] Tutorials/Intermediate/URDF/documents/r2d2.urdf.xml copying downloadable files... [100%] Tutorials/Intermediate/URDF/documents/r2d2.rviz copying static files... Writing evaluated template result to /home/adoodevv/Documents/ros2_documentation/build/html/_static/documentation_options.js Writing evaluated template result to /home/adoodevv/Documents/ros2_documentation/build/html/_static/language_data.js Writing evaluated template result to /home/adoodevv/Documents/ros2_documentation/build/html/_static/basic.css Writing evaluated template result to /home/adoodevv/Documents/ros2_documentation/build/html/_static/js/versions.js Writing evaluated template result to /home/adoodevv/Documents/ros2_documentation/build/html/_static/copybutton.js copying static files: done copying extra files... copying extra files: done copying assets: done generating indices... genindex done writing additional pages... search done dumping search index in English (code: en)... done dumping object inventory... done sphinx-sitemap-ros: sitemap.xml was generated for URL https://docs.ros.org/en/rolling in /home/adoodevv/Documents/ros2_documentation/build/html/sitemap.xml build succeeded. The HTML pages are in build/html. builds were leaving html_baseurl as 'https://docs.ros.org/en' (no version), causing redirect pages to generate canonical links missing the distro prefix. - Default html_baseurl to 'https://docs.ros.org/en/rolling' so basic builds produce correct canonical URLs out of the box - Update smv_rewrite_configs to replace the full URL (not append) when sphinx-multiversion runs with a specific distro - Fix redirect_html_fragment to use an absolute canonical_abs_url instead of concatenating base_url with a relative path Fixes ros2#6112
|
Since the real fix is in the second commit. The tests clearly aren't catching the problem or confirming that they're resolved. And are the changes from the first commit necessary. They seem more complex that I would expect is necessary. Can you please review this to reduce it down to a minimum changes necessary and show that the tests fail before the change and pass after. |
Previous tests reimplemented the redirect logic inside the test file, meaning they would pass even if conf.py was wrong. Tests now read and exercise the actual redirect_html_fragment template and format call from conf.py directly, so they genuinely fail before the fix and pass after.
Rewrote the tests - they now read the actual On whether both changes are necessary: the |
Description
The
redirect_html_fragmentinconf.pywas building the<link rel="canonical">tag by concatenatinghtml_baseurlwith the output ofget_relative_uri(), which returns a relative path. This produces broken canonical URLs on third-party mirrors that build withoutsphinx-multiversion, because the relative path contains..segments that resolve incorrectly against the base URL.For example, the mirror at
ros.ncnynl.comwas generating:instead of the correct:
The fix computes the canonical URL as an absolute path directly from
html_baseurland the canonical document's path, which is always correct regardless of how the site is built. The relative URI is kept for the<meta refresh>and JS redirect since those work within the built site structure.A pytest test covering the regression and the fix is included.
Fixes #6112
Did you use Generative AI?
Yes, Claude (Anthropic) was used to help draft the tests.
Additional Information
The
<meta refresh>and JS redirect intentionally still use the relative URI — only the<link rel="canonical">needed to be changed to an absolute URL.