I've just gone through adding a notebook to the toolbox and wanted to offer some feedback. Apologies if this is a bit long and/or opinionated. I'd be very happy to have a chat on Teams at your convenience if anything below is unclear!
Minor changes, additions etc. are included in a pull request; however, I thought some bigger-picture points might be worth adding here, since they pertain to higher-level things that might have design/UX implications.
All comments are regarding the information provided in `CONTRIBUTING.md` and maybe links therein, as an idea of what a naive user (me!) might manage to work out looking at things fresh. It probably represents what someone in a bit of a hurry might see.
- There is a bit of muddling in the document over the "internal" UKCEH way of doing things (becoming a collaborator on the `data-science-toolbox` repo) vs. the fork-and-PR approach that non-UKCEH folks will presumably need to follow. Perhaps this could be streamlined so that the fork-and-PR is the standard route? It doesn't seem that much more complicated and then the document might be easier to follow.
- The document has instructions and then an additional details section which reiterates a lot of the same material. Is it worth merging those into one place? Or making the first set of instructions much briefer, with links to the second? It's hard to know which one to follow (or to come back to the right place when following), plus you now have two places to update and keep in sync when changing the documentation.
- The "Jupyter Book Description" section seems to be aimed at people contributing to the code in `data-science-toolbox` rather than those contributing notebooks (as is the case with the rest of the document). Maybe it's worth separating that into a different file and having a `CONTRIBUTING-NOTEBOOKS.md` and a `CONTRIBUTING-CORE.md` (but with better names :)) to make this clear?
- Is it necessary to both create an issue and open a pull request? Or is the latter enough? Maybe that's an important bit of the process for you, I'm not sure, but minimizing steps will always make people happier.
- Creating remote branches via GitHub: I'd never done this before, preferring the `git branch blah` / `git checkout blah` / `git push -u origin blah` process.
- I've linked to the miniconda installation page in my PR. It's perhaps worth covering this in more detail, as at least for me (R+Linux user, Python non-enthusiast) it was the most fraught bit of the process.
- I realise folks might have very different setups, but is it worth setting up a template repository which would include things like a template `CITATION.cff` file, some reminders like including the `.ipynb`, a template `.qmd`/`.Rmd` etc.?
- On that note, having a bit of guidance for R users might be nice (I'd be happy to write this but I'm not sure where it would go at the moment), as it seems that for RMarkdown/Quarto users there is a very easy path to generate the `.ipynb` files that we need that doesn't involve knowing anything about Jupyter. This again might encourage uptake.
- Really up to you, but in the long term you might prefer to link to the issues in this repo in the various places where you currently list your e-mail address. That can reduce duplication, as multiple users with the same issue can help each other, either interactively or by reading previous questions.
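
For reference, the local branch-creation route I mentioned above looks roughly like this. It's only a sketch: `new_remote_branch` is a hypothetical helper name, `blah` is a placeholder branch name, and I'm assuming the remote is called `origin`.

```shell
# Hypothetical helper: create a branch locally, switch to it,
# and publish it to the remote (assumed here to be named "origin").
new_remote_branch() {
    git branch "$1" &&            # create the branch from the current commit
    git checkout "$1" &&          # switch the working tree to it
    git push -u origin "$1"       # push it and set up upstream tracking
}
```

Calling `new_remote_branch blah` from inside a clone should leave you on `blah` with it tracking `origin/blah`, without touching the GitHub web interface.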
Ultra-minor stuff
- You refer to both "UKCEH Data Science Book" and "Data Science Toolbox", which is a bit confusing. Is it one or the other, or both?
- The link to https://jez-carter.github.io/UKCEH_Data_Science_Book/notebooks/methods/template.html seems to be broken.
- The link to "Jupyter Books 101" seems to go to a weirdly long Google search URL that doesn't seem to go anywhere for me.