Conversation
Signed-off-by: 5hojib <yesiamshojib@gmail.com>
Signed-off-by: 5hojib <yesiamshojib@gmail.com>
Signed-off-by: 5hojib <yesiamshojib@gmail.com>
Reviewer's GuideAdds included-extension support, RSS-aware behavior, CPU/FFmpeg throttling, Telegram client refactor, and a variety of robustness and UX improvements across downloads, uploads, RSS, and configuration handling. Class diagram for listener, config, and extension filteringclassDiagram
class Config {
+str EXCLUDED_EXTENSIONS
+str INCLUDED_EXTENSIONS
+str DEFAULT_UPLOAD
+dict FFMPEG_CMDS
+str NAME_SUBSTITUTE
}
class Globals {
+list excluded_extensions
+list included_extensions
+int cpu_no
+int threads
+str cores
+str DOWNLOAD_DIR
}
class Listener {
+bool is_leech
+bool is_rss
+bool stop_duplicate
+bool select
+bool same_dir
+str up_dest
+str dir
+str up_dir
+str name
+int size
+list excluded_extensions
+list included_extensions
+bool name_sub
+method before_start()
+method on_download_complete()
+method proceed_ffmpeg(dl_path, gid)
+method proceed_watermark(up_path)
+method substitute(path)
}
class TaskListener {
+method on_download_complete()
+method on_upload_complete()
}
class FilesUtils {
+async remove_excluded_files(fpath, ee)
+async remove_non_included_files(fpath, ie)
}
class GDriveDownload {
+str G_DRIVE_DIR_MIME_TYPE
+method _download_folder(folder_id, path, folder_name)
+method _download_file(file_id, path, filename, mime_type)
}
class GDriveClone {
+int total_files
+int proc_bytes
+method _clone_folder(folder_name, folder_id, dest_id)
+method _copy_file(file_id, dest_id)
}
class RcloneTransfer {
-int _sa_index
-int _sa_number
-bool _use_service_accounts
+method _get_updated_command(source, destination, method)
}
Config <.. Globals : initializes
Globals <.. Listener : uses
Listener <|-- TaskListener
TaskListener --> FilesUtils : calls
TaskListener --> Listener : uses fields
GDriveDownload --> Listener : uses included_extensions
GDriveClone --> Listener : uses included_extensions
RcloneTransfer --> Listener : uses included_extensions
FilesUtils ..> Globals : uses excluded_extensions
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Summary of ChangesHello @5hojib, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request focuses on a significant refactoring of core components, primarily renaming the Telegram client module and standardizing file paths across the application. It introduces new file filtering capabilities with Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Hey - I've found 8 issues, and left some high level feedback:
- Hard-coding
cpu_no = 8,threads = 4, andcores = "0,1,2,3"inbot/__init__.pyinstead of deriving them fromos.cpu_count()makes the FFmpeg/taskset logic non-portable and potentially suboptimal on different machines; consider computing these dynamically from the actual CPU topology. - The new URL concatenation for index links (e.g.,
f"{INDEX_URL}findpath?id={dir_id}"instead of adding"/findpath...") assumesINDEX_URL/index_urlsalways end with a trailing slash; it would be safer to normalize or join these paths to avoid double/missing slashes with existing configs. - Several places switched from catching specific exceptions to a bare
except:(e.g.,rss_monitor,update_nzb_options, parts of the NZB status and yt-dlp code), which can unintentionally swallow critical errors likeKeyboardInterruptand make debugging harder; consider limiting these to concrete exception types where possible.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- Hard-coding `cpu_no = 8`, `threads = 4`, and `cores = "0,1,2,3"` in `bot/__init__.py` instead of deriving them from `os.cpu_count()` makes the FFmpeg/taskset logic non-portable and potentially suboptimal on different machines; consider computing these dynamically from the actual CPU topology.
- The new URL concatenation for index links (e.g., `f"{INDEX_URL}findpath?id={dir_id}"` instead of adding `"/findpath..."`) assumes `INDEX_URL`/`index_urls` always end with a trailing slash; it would be safer to normalize or join these paths to avoid double/missing slashes with existing configs.
- Several places switched from catching specific exceptions to a bare `except:` (e.g., `rss_monitor`, `update_nzb_options`, parts of the NZB status and yt-dlp code), which can unintentionally swallow critical errors like `KeyboardInterrupt` and make debugging harder; consider limiting these to concrete exception types where possible.
## Individual Comments
### Comment 1
<location> `bot/helper/mirror_leech_utils/download_utils/yt_dlp_download.py:134` </location>
<code_context>
if not from_queue:
await self._listener.on_download_start()
- if self._listener.multi <= 1:
+ if self._listener.multi <= 1 and not listener.is_rss:
await send_status_message(self._listener.message)
</code_context>
<issue_to_address>
**issue (bug_risk):** Undefined `listener` reference will raise at runtime in `_on_download_start`.
`listener` isn’t defined in this scope, so this condition will raise a `NameError` on first execution. To match the other helpers that gate status messages on RSS, this should reference `self._listener.is_rss`:
```python
if self._listener.multi <= 1 and not self._listener.is_rss:
await send_status_message(self._listener.message)
```
</issue_to_address>
### Comment 2
<location> `bot/modules/bot_settings.py:226-231` </location>
<code_context>
drives_ids.insert(0, value)
elif key == "INDEX_URL":
if drives_names and drives_names[0] == "Main":
- index_urls[0] = value.strip("/")
+ index_urls[0] = value
else:
- index_urls.insert(0, value.strip("/"))
</code_context>
<issue_to_address>
**issue (bug_risk):** Changing INDEX_URL normalization breaks later concatenation with `findpath`.
Previously `INDEX_URL` was normalized with `strip('/')` and concatenated as `f"{INDEX_URL}/findpath..."`, which guaranteed a single `/`.
Now you store `value` as‑is and build links as `f"{INDEX_URL}findpath?id=..."`, so behavior depends on whether the configured URL ends with `/` (e.g. `https://index` → `https://indexfindpath...`).
Either normalize and keep `"/findpath"`, or always force a trailing slash when storing/formatting, e.g.:
```python
INDEX_URL = INDEX_URL.rstrip("/") + "/"
share_url = f"{INDEX_URL}findpath?id={dir_id}"
```
</issue_to_address>
### Comment 3
<location> `bot/helper/listeners/task_listener.py:659-662` </location>
<code_context>
INDEX_URL = Config.INDEX_URL
if INDEX_URL:
- share_url = f"{INDEX_URL}/findpath?id={dir_id}"
+ share_url = f"{INDEX_URL}findpath?id={dir_id}"
buttons.url_button("Index Link", share_url)
if mime_type.startswith(("image", "video", "audio")):
- share_urls = (
- f"{INDEX_URL}/findpath?id={dir_id}&view=true"
- )
+ share_urls = f"{INDEX_URL}findpath?id={dir_id}&view=true"
buttons.url_button("🌐 View Link", share_urls)
button = buttons.build_menu(2)
</code_context>
<issue_to_address>
**issue (bug_risk):** Direct concatenation of INDEX_URL with `findpath` assumes trailing slash and can produce invalid URLs.
If `INDEX_URL` doesn’t already end with `/`, this will produce an invalid URL like `https://indexfindpath?...`, and the recent change to stop stripping slashes makes this more likely.
Please normalize `INDEX_URL` once (e.g., ensure a single trailing `/`) or use `urljoin` when building these links so correctness doesn’t depend on how the config value is typed.
</issue_to_address>
### Comment 4
<location> `bot/helper/common.py:655-656` </location>
<code_context>
else:
self.tag, id_ = text[1].split("Tag: ")[1].split()
- self.user = self.message.from_user = await self.client.get_users(id_)
+ self.user = self.message.from_user = await self.client.get_users(
+ int(id_)
+ )
self.user_id = self.user.id
</code_context>
<issue_to_address>
**question (bug_risk):** Casting RSS tag ID to int may break non‑numeric identifiers.
Previously `get_users(id_)` worked with whatever was parsed from the tag (e.g., username or stringified ID). Forcing `int(id_)` assumes the tag is always a numeric Telegram ID and will raise `ValueError` for `@username` or other non-numeric values, breaking existing feeds.
If both numeric IDs and usernames should be supported, you could fall back to the original string on `ValueError`:
```python
try:
id_or_username = int(id_)
except ValueError:
id_or_username = id_
self.user = self.message.from_user = await self.client.get_users(id_or_username)
```
Alternatively, if only numeric IDs are valid, enforce and validate that earlier in the parsing logic with a clear error message.
</issue_to_address>
### Comment 5
<location> `bot/__init__.py:73` </location>
<code_context>
LOGGER = getLogger(__name__)
-cpu_no = os.cpu_count()
+cpu_no = 8
+threads = 4
+cores = "0,1,2,3"
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Hard‑coding `cpu_no`, `threads`, and `cores` reduces portability and may misconfigure ffmpeg usage.
These fixed values are now fed directly into ffmpeg/`taskset`. On machines with different core counts, this can oversubscribe CPUs or reference non‑existent cores, causing failures. Please derive them from `os.cpu_count()` (optionally with a config override) and compute `threads`/`cores` dynamically with sensible limits.
Suggested implementation:
```python
LOGGER = getLogger(__name__)
_cpu_count = os.cpu_count() or 1
# Allow overriding detected CPU count via environment, but never exceed the real count
cpu_no = int(os.getenv("CPU_NO", _cpu_count))
if cpu_no > _cpu_count:
cpu_no = _cpu_count
# Default threads: do not exceed detected/overridden CPU count; allow env override
_default_threads = max(1, min(cpu_no, _cpu_count))
threads = int(os.getenv("FFMPEG_THREADS", _default_threads))
if threads > cpu_no:
threads = cpu_no
# Core pinning: allow explicit override, otherwise pin to the first `threads` cores
_cores_override = os.getenv("FFMPEG_CORES")
if _cores_override:
cores = _cores_override
else:
cores = ",".join(str(i) for i in range(min(threads, _cpu_count)))
```
1. Ensure `import os` is present near the top of `bot/__init__.py`:
- If missing, add `import os` alongside the other imports.
2. If your project uses a configuration system instead of environment variables, you may want to replace the `os.getenv(...)` calls with the appropriate config accessors (e.g., `settings.CPU_NO`, etc.) while keeping the same fallback and clamping logic.
</issue_to_address>
### Comment 6
<location> `docs/CONFIGURATIONS.md:22` </location>
<code_context>
| `SUDO_USERS` | `str` | User IDs with sudo permission. Separate by spaces. |
| `UPLOAD_PATHS` | `dict` | Dict with upload paths. Example: `{"path 1": "remote:", "path 2": "gdrive id", ...}` |
| `DEFAULT_UPLOAD` | `str` | `rc` for `RCLONE_PATH`, `gd` for `GDRIVE_ID`. Default: `rc`. [Read More](https://github.com/anasty17/mirror-leech-telegram-bot/tree/master#upload). |
+| `INCLUDED_EXTENSIONS` | `str` | File extensions to be uploaded/cloned. `EXCLUDED_EXTENSIONS` will be ignore if you filled this ! Separate them by spaces. |
| `EXCLUDED_EXTENSIONS` | `str` | File extensions to skip during processing. Separate by spaces. |
| `INCOMPLETE_TASK_NOTIFIER`| `bool` | Notify after restart for incomplete tasks. Requires `DATABASE_URL` and the bot to be in a supergroup. Default: `False`. |
</code_context>
<issue_to_address>
**issue (typo):** Fix grammar: use "ignored" instead of "ignore" in this sentence.
Change "`EXCLUDED_EXTENSIONS` will be ignore" to "`EXCLUDED_EXTENSIONS` will be ignored" to correct the grammar.
```suggestion
| `INCLUDED_EXTENSIONS` | `str` | File extensions to be uploaded/cloned. `EXCLUDED_EXTENSIONS` will be ignored if you filled this ! Separate them by spaces. |
```
</issue_to_address>
### Comment 7
<location> `bot/helper/common.py:289` </location>
<code_context>
self.up_dest = Config.UPLOAD_PATHS[self.up_dest]
- if self.ffmpeg_cmds and not isinstance(self.ffmpeg_cmds, list):
+ if self.ffmpeg_cmds:
if self.user_dict.get("FFMPEG_CMDS", None):
ffmpeg_dict = deepcopy(self.user_dict["FFMPEG_CMDS"])
</code_context>
<issue_to_address>
**issue (complexity):** Consider extracting small helper functions for FFmpeg command resolution, upload destination normalization, permission checks, and FFmpeg command building to simplify branching and reduce duplication.
You can reduce the new complexity significantly by extracting a few small helpers. Here are concrete, focused refactors that keep behavior but simplify branching.
---
### 1. FFmpeg command resolution (`self.ffmpeg_cmds`)
Right now `before_start` mixes:
* source selection (user vs config)
* normalization of `self.ffmpeg_cmds` entries
* placeholder substitution
You can split this into 3 helpers:
```python
def _get_ffmpeg_source_cmds(self) -> dict:
return (
deepcopy(self.user_dict.get("FFMPEG_CMDS"))
or deepcopy(Config.FFMPEG_CMDS or {})
)
def _flatten_ffmpeg_keys(self) -> list[str]:
keys = []
for key in self.ffmpeg_cmds:
if isinstance(key, tuple):
keys.extend(key)
else:
keys.append(key)
return keys
def _expand_ffmpeg_cmds(self, source_cmds: dict, keys: list[str]) -> list[str]:
cmds: list[str] = []
ff_vars = self.user_dict.get("FFMPEG_VARIABLES", {})
for key in keys:
if key not in source_cmds:
continue
for ind, template in enumerate(source_cmds[key]):
variables = set(findall(r"\{(.*?)\}", template))
if not variables:
cmds.append(template)
continue
values = ff_vars.get(key, {}).get(str(ind), {})
if Counter(variables) == Counter(values.keys()):
cmds.append(template.format(**values))
return cmds
```
Then in `before_start`:
```python
if self.ffmpeg_cmds:
source_cmds = _get_ffmpeg_source_cmds(self)
if not source_cmds:
self.ffmpeg_cmds = []
else:
keys = _flatten_ffmpeg_keys(self)
self.ffmpeg_cmds = _expand_ffmpeg_cmds(self, source_cmds, keys)
```
This removes the nested `ffmpeg_dict is None` logic and makes each concern testable in isolation.
---
### 2. Upload destination normalization (`self.up_dest`)
The destination and token-prefix logic can be encapsulated so `before_start` doesn’t have to know about gdrive vs rclone vs prefixes:
```python
def resolve_upload_destination(up_dest: str, user_dict: dict) -> str:
# resolve alias
user_paths = user_dict.get("UPLOAD_PATHS") or {}
if up_dest in user_paths:
up_dest = user_paths[up_dest]
elif Config.UPLOAD_PATHS and up_dest in Config.UPLOAD_PATHS:
up_dest = Config.UPLOAD_PATHS[up_dest]
if up_dest in ("rcl", "gdl", "gofile"):
return up_dest
# normalize tokens/prefixes
if is_gdrive_id(up_dest):
if not up_dest.startswith(("mtp:", "tp:", "sa:")) and user_dict.get("USER_TOKENS"):
up_dest = f"mtp:{up_dest}"
elif is_rclone_path(up_dest):
if not up_dest.startswith("mrcc:") and user_dict.get("USER_TOKENS"):
up_dest = f"mrcc:{up_dest}"
up_dest = up_dest.strip("/")
return up_dest
```
Use it like:
```python
if chosen_service not in ["yt", "gofile"] and not self.up_dest:
raise ValueError(...)
self.up_dest = resolve_upload_destination(self.up_dest, self.user_dict)
if self.up_dest not in ["rcl", "gdl"]:
await self.is_token_exists(self.up_dest, "up")
```
This clarifies the flow: resolve alias → normalize path → verify token.
---
### 3. Transmission / permission checks
The two branches (user session vs bot session) can share a common validator and reduce repetition of “turn off user_transmission / hybrid_leech” logic:
```python
async def _validate_chat_admin(self, chat, client_user_id, require_strict: bool, what: str) -> bool:
if chat is None:
LOGGER.warning("%s: chat not found", what)
return False
if chat.type.name not in ["SUPERGROUP", "CHANNEL", "GROUP", "FORUM"]:
LOGGER.warning("%s: destination is not a group/channel", what)
return False
if not chat.is_admin:
LOGGER.warning("%s: client is not admin in this chat", what)
return False
member = await chat.get_member(client_user_id)
if not (member.privileges.can_manage_chat and member.privileges.can_delete_messages):
if require_strict:
LOGGER.warning(
"%s: enable manage chat and delete messages for this client",
what,
)
return False
return True
```
Then:
```python
if self.user_transmission:
chat = await TgClient.user.get_chat(self.up_dest)
ok = await self._validate_chat_admin(
chat, TgClient.user.me.id, True, "user session"
)
if not ok:
self.user_transmission = False
self.hybrid_leech = False
if not self.user_transmission or self.hybrid_leech:
chat = await self.client.get_chat(self.up_dest)
if chat is None:
if self.user_transmission:
self.hybrid_leech = False
else:
raise ValueError("Chat not found!")
elif chat.type.name in ["SUPERGROUP", "CHANNEL", "GROUP", "FORUM"]:
ok = await self._validate_chat_admin(
chat, self.client.me.id, False, "bot"
)
if not ok and not self.user_transmission:
raise ValueError("You don't have enough privileges in this chat!")
if not ok:
self.hybrid_leech = False
else:
...
```
This keeps all the rules but centralizes them in one place.
---
### 4. `proceed_ffmpeg` command building
You now have duplicated “replace `mltb` → resolve telegram links” both for single-file and multi-file branches. A helper that builds the actual command for a given file path keeps `proceed_ffmpeg` easier to reason about:
```python
def _build_ffmpeg_cmd_for_file(self, base_cmd: list[str], file_path: str, inputs: dict) -> list[str]:
cmd = base_cmd.copy()
input_indexes = [i for i, v in enumerate(cmd) if v == "-i"]
for index in input_indexes:
value = cmd[index + 1]
if value.startswith("mltb"):
cmd[index + 1] = file_path
elif is_telegram_link(value):
msg = asyncio.get_event_loop().run_until_complete(get_tg_link_message(value))[0]
file_dir = asyncio.get_event_loop().run_until_complete(temp_download(msg))
inputs[index + 1] = file_dir
cmd[index + 1] = file_dir
return cmd
```
Use it for single-file:
```python
var_cmd = self._build_ffmpeg_cmd_for_file(cmd, file_path, inputs)
res = await ffmpeg.ffmpeg_cmds(var_cmd, file_path)
```
And for multi-file loop:
```python
for f_path in self.files_to_proceed:
if not self._matches_ext(f_path, ext, is_video, is_audio):
continue
self.proceed_count += 1
var_cmd = self._build_ffmpeg_cmd_for_file(cmd, f_path, inputs)
...
res = await ffmpeg.ffmpeg_cmds(var_cmd, f_path)
```
You can move the ext/type check into `_matches_ext` for further clarity if desired.
---
These extractions are small, localized changes that should keep all behavior intact while reducing branching and duplication in the touched areas.
</issue_to_address>
### Comment 8
<location> `bot/helper/mirror_leech_utils/download_utils/yt_dlp_download.py:239` </location>
<code_context>
else:
self._ext = f".{audio_format}"
+ if not self._listener.is_leech or self._listener.thumbnail_layout:
+ self.opts["writethumbnail"] = False
+
</code_context>
<issue_to_address>
**issue (complexity):** Consider separating thumbnail download, embedding, and storage decisions into dedicated flags and helper methods instead of overloading the `writethumbnail` option in multiple places.
You can decouple the thumbnail concerns and make this easier to reason about without changing behavior by:
1. Introducing internal, derived flags (`_download_thumbnail`, `_embed_thumbnail`) instead of overloading `writethumbnail`.
2. Isolating the `outtmpl` construction (and `start_path`) into a helper that uses those flags.
3. Keeping `keep_thumb` only as “where to store thumbnails”, not “whether we write them”.
### 1. Derive internal thumbnail flags once
Right now `self.opts["writethumbnail"]` is used as:
- user option,
- “we already have thumbnail” flag for `EmbedThumbnail`,
- implicit driver of `keep_thumb` via `_set_options`.
You can keep all behavior but separate concerns:
```python
def _compute_thumbnail_flags(self, options):
# Preserve existing override behavior
writethumbnail_opt = options.get("writethumbnail", self.opts.get("writethumbnail", False))
# Listener rules: same logic as today
if not self._listener.is_leech or self._listener.thumbnail_layout:
writethumbnail_effective = False
else:
writethumbnail_effective = bool(writethumbnail_opt)
# Directory decision kept separate
self.keep_thumb = bool(options.get("writethumbnail")) or self.keep_thumb
# Internal flags used everywhere else
self._download_thumbnail = writethumbnail_effective
self._embed_thumbnail = self._ext in [
".mp3", ".mkv", ".mka", ".ogg", ".opus",
".flac", ".m4a", ".mp4", ".mov", ".m4v",
]
# Reflect effective value back to yt-dlp options
self.opts["writethumbnail"] = writethumbnail_effective
```
Call this early in `add_download`:
```python
if options:
self._compute_thumbnail_flags(options)
self._set_options(options)
else:
self._compute_thumbnail_flags({})
```
And drop the inline override:
```python
- if not self._listener.is_leech or self._listener.thumbnail_layout:
- self.opts["writethumbnail"] = False
```
### 2. Centralize `outtmpl` building
Currently `outtmpl` is built with `thumbnail` in all branches. You can only add the `thumbnail` key when `_download_thumbnail` is true, and reuse long templates to avoid duplication:
```python
def _build_outtmpl(self, path, base_name, options):
start_path = path if self.keep_thumb else f"{path}/yt-dlp-thumb"
if self.is_playlist:
default_tpl = "%(title,fulltitle,alt_title)s%(season_number& |)s%(season_number&S|)s%(season_number|)02d%(episode_number&E|)s%(episode_number|)02d%(height& |)s%(height|)s%(height&p|)s%(fps|)s%(fps&fps|)s%(tbr& |)s%(tbr|)d.%(ext)s"
self.opts["outtmpl"] = {
"default": f"{path}/{self._listener.name}/{default_tpl}",
}
if self._download_thumbnail:
self.opts["outtmpl"]["thumbnail"] = f"{start_path}/{default_tpl}"
return
if "download_ranges" in options:
section_tpl = "%(section_number|)s%(section_number&.|)s%(section_title|)s%(section_title&-|)s%(title,fulltitle,alt_title)s %(section_start)s to %(section_end)s.%(ext)s"
self.opts["outtmpl"] = {
"default": f"{path}/{base_name}/{section_tpl}",
}
if self._download_thumbnail:
self.opts["outtmpl"]["thumbnail"] = f"{start_path}/{section_tpl}"
return
meta_related = any(
key in options
for key in [
"writedescription",
"writeinfojson",
"writeannotations",
"writedesktoplink",
"writewebloclink",
"writelink",
"writeurllink",
"writesubtitles",
"write_all_thumbnails",
]
)
if meta_related:
self.opts["outtmpl"] = {
"default": f"{path}/{base_name}/{self._listener.name}",
}
else:
self.opts["outtmpl"] = {
"default": f"{path}/{self._listener.name}",
}
if self._download_thumbnail:
self.opts["outtmpl"]["thumbnail"] = f"{start_path}/{base_name}.%(ext)s"
```
Then replace the existing `outtmpl` block with:
```python
base_name, ext = ospath.splitext(self._listener.name)
# ... length trimming, etc ...
self._build_outtmpl(path, base_name, options)
```
### 3. Simplify postprocessor configuration
With the above flags, the postprocessor setup becomes clearer and avoids using `writethumbnail` as a proxy:
```python
if self._download_thumbnail:
self.opts["postprocessors"].append(
{
"format": "jpg",
"key": "FFmpegThumbnailsConvertor",
"when": "before_dl",
}
)
if self._embed_thumbnail:
self.opts["postprocessors"].append(
{
"already_have_thumbnail": self._download_thumbnail,
"key": "EmbedThumbnail",
}
)
```
You keep all existing behavior:
- Listener state still overrides user `writethumbnail`.
- `keep_thumb` still controls directory location.
- `EmbedThumbnail` still knows whether a thumbnail was downloaded.
But each responsibility—“should we download?”, “should we embed?”, “where do we store?”—is encoded in a single, explicitly named place instead of via cross-coupled `writethumbnail`/`keep_thumb`/`outtmpl`.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
bot/helper/mirror_leech_utils/download_utils/yt_dlp_download.py
Outdated
Show resolved
Hide resolved
| self.user = self.message.from_user = await self.client.get_users( | ||
| int(id_) |
There was a problem hiding this comment.
question (bug_risk): Casting RSS tag ID to int may break non‑numeric identifiers.
Previously get_users(id_) worked with whatever was parsed from the tag (e.g., username or stringified ID). Forcing int(id_) assumes the tag is always a numeric Telegram ID and will raise ValueError for @username or other non-numeric values, breaking existing feeds.
If both numeric IDs and usernames should be supported, you could fall back to the original string on ValueError:
try:
id_or_username = int(id_)
except ValueError:
id_or_username = id_
self.user = self.message.from_user = await self.client.get_users(id_or_username)Alternatively, if only numeric IDs are valid, enforce and validate that earlier in the parsing logic with a clear error message.
|
|
||
| LOGGER = getLogger(__name__) | ||
|
|
||
| cpu_no = os.cpu_count() |
There was a problem hiding this comment.
suggestion (bug_risk): Hard‑coding cpu_no, threads, and cores reduces portability and may misconfigure ffmpeg usage.
These fixed values are now fed directly into ffmpeg/taskset. On machines with different core counts, this can oversubscribe CPUs or reference non‑existent cores, causing failures. Please derive them from os.cpu_count() (optionally with a config override) and compute threads/cores dynamically with sensible limits.
Suggested implementation:
LOGGER = getLogger(__name__)
_cpu_count = os.cpu_count() or 1
# Allow overriding detected CPU count via environment, but never exceed the real count
cpu_no = int(os.getenv("CPU_NO", _cpu_count))
if cpu_no > _cpu_count:
cpu_no = _cpu_count
# Default threads: do not exceed detected/overridden CPU count; allow env override
_default_threads = max(1, min(cpu_no, _cpu_count))
threads = int(os.getenv("FFMPEG_THREADS", _default_threads))
if threads > cpu_no:
threads = cpu_no
# Core pinning: allow explicit override, otherwise pin to the first `threads` cores
_cores_override = os.getenv("FFMPEG_CORES")
if _cores_override:
cores = _cores_override
else:
cores = ",".join(str(i) for i in range(min(threads, _cpu_count)))- Ensure
import osis present near the top ofbot/__init__.py:- If missing, add
import osalongside the other imports.
- If missing, add
- If your project uses a configuration system instead of environment variables, you may want to replace the
os.getenv(...)calls with the appropriate config accessors (e.g.,settings.CPU_NO, etc.) while keeping the same fallback and clamping logic.
| | `SUDO_USERS` | `str` | User IDs with sudo permission. Separate by spaces. | | ||
| | `UPLOAD_PATHS` | `dict` | Dict with upload paths. Example: `{"path 1": "remote:", "path 2": "gdrive id", ...}` | | ||
| | `DEFAULT_UPLOAD` | `str` | `rc` for `RCLONE_PATH`, `gd` for `GDRIVE_ID`. Default: `rc`. [Read More](https://github.com/anasty17/mirror-leech-telegram-bot/tree/master#upload). | | ||
| | `INCLUDED_EXTENSIONS` | `str` | File extensions to be uploaded/cloned. `EXCLUDED_EXTENSIONS` will be ignore if you filled this ! Separate them by spaces. | |
There was a problem hiding this comment.
issue (typo): Fix grammar: use "ignored" instead of "ignore" in this sentence.
Change "EXCLUDED_EXTENSIONS will be ignore" to "EXCLUDED_EXTENSIONS will be ignored" to correct the grammar.
| | `INCLUDED_EXTENSIONS` | `str` | File extensions to be uploaded/cloned. `EXCLUDED_EXTENSIONS` will be ignore if you filled this ! Separate them by spaces. | | |
| | `INCLUDED_EXTENSIONS` | `str` | File extensions to be uploaded/cloned. `EXCLUDED_EXTENSIONS` will be ignored if you filled this ! Separate them by spaces. | |
| else: | ||
| self._ext = f".{audio_format}" | ||
|
|
||
| if not self._listener.is_leech or self._listener.thumbnail_layout: |
There was a problem hiding this comment.
issue (complexity): Consider separating thumbnail download, embedding, and storage decisions into dedicated flags and helper methods instead of overloading the writethumbnail option in multiple places.
You can decouple the thumbnail concerns and make this easier to reason about without changing behavior by:
- Introducing internal, derived flags (
_download_thumbnail,_embed_thumbnail) instead of overloadingwritethumbnail. - Isolating the
outtmplconstruction (andstart_path) into a helper that uses those flags. - Keeping
keep_thumbonly as “where to store thumbnails”, not “whether we write them”.
1. Derive internal thumbnail flags once
Right now self.opts["writethumbnail"] is used as:
- user option,
- “we already have thumbnail” flag for
EmbedThumbnail, - implicit driver of
keep_thumbvia_set_options.
You can keep all behavior but separate concerns:
def _compute_thumbnail_flags(self, options):
# Preserve existing override behavior
writethumbnail_opt = options.get("writethumbnail", self.opts.get("writethumbnail", False))
# Listener rules: same logic as today
if not self._listener.is_leech or self._listener.thumbnail_layout:
writethumbnail_effective = False
else:
writethumbnail_effective = bool(writethumbnail_opt)
# Directory decision kept separate
self.keep_thumb = bool(options.get("writethumbnail")) or self.keep_thumb
# Internal flags used everywhere else
self._download_thumbnail = writethumbnail_effective
self._embed_thumbnail = self._ext in [
".mp3", ".mkv", ".mka", ".ogg", ".opus",
".flac", ".m4a", ".mp4", ".mov", ".m4v",
]
# Reflect effective value back to yt-dlp options
self.opts["writethumbnail"] = writethumbnail_effectiveCall this early in add_download:
if options:
self._compute_thumbnail_flags(options)
self._set_options(options)
else:
self._compute_thumbnail_flags({})And drop the inline override:
- if not self._listener.is_leech or self._listener.thumbnail_layout:
- self.opts["writethumbnail"] = False2. Centralize outtmpl building
Currently outtmpl is built with thumbnail in all branches. You can only add the thumbnail key when _download_thumbnail is true, and reuse long templates to avoid duplication:
def _build_outtmpl(self, path, base_name, options):
start_path = path if self.keep_thumb else f"{path}/yt-dlp-thumb"
if self.is_playlist:
default_tpl = "%(title,fulltitle,alt_title)s%(season_number& |)s%(season_number&S|)s%(season_number|)02d%(episode_number&E|)s%(episode_number|)02d%(height& |)s%(height|)s%(height&p|)s%(fps|)s%(fps&fps|)s%(tbr& |)s%(tbr|)d.%(ext)s"
self.opts["outtmpl"] = {
"default": f"{path}/{self._listener.name}/{default_tpl}",
}
if self._download_thumbnail:
self.opts["outtmpl"]["thumbnail"] = f"{start_path}/{default_tpl}"
return
if "download_ranges" in options:
section_tpl = "%(section_number|)s%(section_number&.|)s%(section_title|)s%(section_title&-|)s%(title,fulltitle,alt_title)s %(section_start)s to %(section_end)s.%(ext)s"
self.opts["outtmpl"] = {
"default": f"{path}/{base_name}/{section_tpl}",
}
if self._download_thumbnail:
self.opts["outtmpl"]["thumbnail"] = f"{start_path}/{section_tpl}"
return
meta_related = any(
key in options
for key in [
"writedescription",
"writeinfojson",
"writeannotations",
"writedesktoplink",
"writewebloclink",
"writelink",
"writeurllink",
"writesubtitles",
"write_all_thumbnails",
]
)
if meta_related:
self.opts["outtmpl"] = {
"default": f"{path}/{base_name}/{self._listener.name}",
}
else:
self.opts["outtmpl"] = {
"default": f"{path}/{self._listener.name}",
}
if self._download_thumbnail:
self.opts["outtmpl"]["thumbnail"] = f"{start_path}/{base_name}.%(ext)s"Then replace the existing outtmpl block with:
base_name, ext = ospath.splitext(self._listener.name)
# ... length trimming, etc ...
self._build_outtmpl(path, base_name, options)3. Simplify postprocessor configuration
With the above flags, the postprocessor setup becomes clearer and avoids using writethumbnail as a proxy:
if self._download_thumbnail:
self.opts["postprocessors"].append(
{
"format": "jpg",
"key": "FFmpegThumbnailsConvertor",
"when": "before_dl",
}
)
if self._embed_thumbnail:
self.opts["postprocessors"].append(
{
"already_have_thumbnail": self._download_thumbnail,
"key": "EmbedThumbnail",
}
)You keep all existing behavior:
- Listener state still overrides user
writethumbnail. keep_thumbstill controls directory location.EmbedThumbnailstill knows whether a thumbnail was downloaded.
But each responsibility—“should we download?”, “should we embed?”, “where do we store?”—is encoded in a single, explicitly named place instead of via cross-coupled writethumbnail/keep_thumb/outtmpl.
There was a problem hiding this comment.
Code Review
This pull request introduces a wide range of improvements, including new features like configurable file extensions, bug fixes for robustness, and significant enhancements to media processing and resource usage. The changes are extensive and touch upon many parts of the codebase, aligning with the new runtime layout. My review focuses on improving robustness, fixing a few potential bugs, and addressing some maintainability and security concerns. Overall, this is a substantial and valuable update.
| drives_names.append(temp[0].replace("_", " ")) | ||
| if len(temp) > 2: | ||
| index_urls.append(temp[2]).strip("/") | ||
| index_urls.append(temp[2]) |
There was a problem hiding this comment.
| msg += f"<b><a href={furl}>Drive Link</a></b>" | ||
| if index_url: | ||
| url = f"{index_url}/findpath?id={file.get('id')}" | ||
| url = f"{index_url}findpath?id={file.get('id')}" |
There was a problem hiding this comment.
Removing the slash here makes the URL construction fragile. If index_url is provided without a trailing slash, this will result in a broken URL (e.g., http://example.comfindpath...). It's better to handle this robustly by ensuring a slash is always present.
| url = f"{index_url}findpath?id={file.get('id')}" | |
| url = f"{index_url.rstrip('/')}/findpath?id={file.get('id')}" |
| elif key == "INDEX_URL": | ||
| if drives_names and drives_names[0] == "Main": | ||
| index_urls[0] = value.strip("/") | ||
| index_urls[0] = value |
There was a problem hiding this comment.
| drives_names.append(temp[0].replace("_", " ")) | ||
| if len(temp) > 2: | ||
| index_urls.append(temp[2].strip("/")) | ||
| index_urls.append(temp[2]) |
There was a problem hiding this comment.
Removing .strip("/") makes the URL construction fragile. If a user provides an INDEX_URL without a trailing slash, the generated links will be broken (e.g., http://example.comfindpath...). To make this more robust, you should normalize the URL here. A good approach is to ensure the URL always has a single trailing slash.
| index_urls.append(temp[2]) | |
| index_urls.append(temp[2].rstrip('/') + '/') |
| del self.options[index] | ||
| if bulk_start or bulk_end: | ||
| del self.options[index + 1] | ||
| del self.options[index] |
There was a problem hiding this comment.
This logic for deleting the argument of the -b flag seems incorrect. If -b is present, del self.options[index] removes it. Then, if bulk_start or bulk_end is true, del self.options[index] is called again. This will delete the element that is now at index, which would be the next option, not necessarily the argument for -b. This could lead to unexpected behavior by deleting the wrong argument. The previous logic del self.options[index + 1] was also flawed if -b was the last argument. A more robust way to handle this would be to rebuild the options list, excluding the -b flag and its argument if it exists.
bot/core/startup.py
Outdated
| try: | ||
| no = (await sabnzbd_client.get_config())["config"]["misc"] | ||
| nzb_options.update(no) | ||
| except: |
There was a problem hiding this comment.
Using a bare except: is generally discouraged as it can catch and hide unexpected errors, including system-exiting exceptions like KeyboardInterrupt or SystemExit. It's better to catch specific exceptions that you expect to handle, such as aiohttp.ClientError or other connection-related errors from sabnzbd_client.
| except: | |
| except Exception: |
| WORKDIR /usr/src/app | ||
| RUN chmod 777 /usr/src/app | ||
| WORKDIR /app | ||
| RUN chmod 777 /app |
There was a problem hiding this comment.
Using chmod 777 is insecure as it gives read, write, and execute permissions to everyone on the /app directory. This could lead to security vulnerabilities. It's better to use more restrictive permissions. For example, if the application needs to be run by a non-root user, you could change the ownership of the directory and set more appropriate permissions.
bot/modules/rss.py
Outdated
| html = res.text | ||
| break | ||
| except Exception: | ||
| except: |
There was a problem hiding this comment.
Using a bare except: is discouraged as it can catch and hide unexpected errors, including system-exiting exceptions like KeyboardInterrupt. It's better to catch specific exceptions that you expect to handle, such as httpx.RequestError or other connection-related errors.
| except: | |
| except Exception: |
bot/modules/rss.py
Outdated
| await sleep(10) | ||
| except Exception: | ||
| raise RssShutdownException("Rss Monitor Stopped!") from None | ||
| except: |
There was a problem hiding this comment.
Using a bare except: is discouraged as it can catch and hide unexpected errors, including system-exiting exceptions like KeyboardInterrupt. It's better to catch RssShutdownException specifically if that's the intended exception to catch here, or Exception for general application errors.
| except: | |
| except Exception: |
bot/modules/users_settings.py
Outdated
| x = x.lstrip(".") | ||
| value.append(x.strip().lower()) | ||
| elif option == "INDEX_URL": | ||
| value = value |
Signed-off-by: 5hojib <yesiamshojib@gmail.com>
* fix: resolve all ruff issues This commit resolves all outstanding ruff linting and formatting issues across the codebase. It addresses a variety of problems, including unused variables, bare excepts, incorrect type comparisons, and import ordering. These changes improve code quality, readability, and maintainability. * format: auto-format code by ruff. Signed-off-by: 5hojib <yesiamshojib@gmail.com> * update * format: auto-format code by ruff. Signed-off-by: 5hojib <yesiamshojib@gmail.com> --------- Signed-off-by: 5hojib <yesiamshojib@gmail.com> Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com> Co-authored-by: 5hojib <yesiamshojib@gmail.com>
Summary by Sourcery
Add support for user- and globally-configurable included file extensions, improve media processing and resource usage, harden integrations with external services, and update paths and client configuration for the new runtime layout.
New Features:
Bug Fixes:
Enhancements:
Build:
Documentation: