Commit 5710282

committed
readme update
1 parent ba20666 commit 5710282

2 files changed

Lines changed: 219 additions & 40 deletions

README.md

Lines changed: 85 additions & 40 deletions
@@ -1,41 +1,67 @@
1-
# ArxivFlow
2-
Workflow - Periodic Track on Arxiv.org Paper
1+
# ArxivFlow - Periodic Tracking of arXiv Papers
2+
3+
English | [中文](README_CN.md)
34

45
Author: Tiger, a member of [HKUST Dial](https://github.com/HKUSTDial)
56

67
Last update: September 09, 2025
78

8-
# Objectives
9-
This workflow serves for tracking daily updates in Arxiv.org. Paper info will be preprocessed and concluded by a series of modules. Finally, it will post to a group chat in Feishu for reading. The target audience is for education and research community.
9+
## 🎯 Objectives
10+
This workflow tracks daily updates on arXiv.org. Paper information is preprocessed and summarized by a series of modules, then posted to a Feishu group chat for reading. The target audience is the education and research community.
11+
12+
> 💰 Cost: less than 0.05 CNY per workflow execution.
13+
14+
## ✨ Key Features
15+
16+
- 📚 Automatically fetch latest arXiv papers
17+
- 🤖 AI-powered paper summarization and filtering
18+
- 📱 Auto-send to Feishu group chat
19+
- ⏰ GitHub Actions automated scheduling
20+
- 🛠️ Local debugging script support
21+
22+
## 📋 Prerequisites
23+
24+
Before getting started, please ensure you have prepared the following accounts and services:
25+
26+
1. **[Dify](https://dify.ai/) account** - Free registration for building AI workflows
27+
2. **LLM Provider API** - [DeepSeek API](https://platform.deepseek.com/api_keys) is recommended (cost-effective)
28+
3. **[Jina](https://jina.ai/) API key** - For web content extraction, new users get 1M free credits
29+
4. **Feishu Group Bot Webhook** - For message pushing
30+
31+
## 🚀 Quick Start
32+
33+
### Step 1: Set Up Dify Workflow
34+
35+
1. **Open Dify Console**
36+
- Log in to Dify and find the "Studio" tab
37+
![](image/dify_studio.png)
1038

11-
> Cost: less than 0.05 CNY per workflow execution.
39+
2. **Import Workflow**
40+
- Create a new workflow by importing [this DSL file](dsl/ArxivDairy.yml)
41+
- This DSL file contains the complete logic for paper fetching, processing, and pushing
1242

13-
# Prerequisites
14-
- A [Dify](https://dify.ai/) account, sign up for free plan.
15-
- A LLM provider API (I use [DeepSeek API](https://platform.deepseek.com/api_keys))
16-
- A [Jina](https://jina.ai/) API key (1M Free credits for new users)
17-
- A Group Bot webhook (For Feishu group chat robot)
43+
3. **Configure Environment Variables**
44+
- Configure necessary environment variables in workflow settings
45+
![](image/env.png)
46+
- See the **Environment Variables Configuration** section below for details
1847

19-
# How to build your own workflow?
48+
4. **Get API Token**
49+
- Get your workflow API token from workflow settings
50+
- This token will be used for automated scheduling
2051

21-
## Step 1: Setup Dify Workflow
22-
1. Open your Dify Cloud and find the "Studio" tab.![](image/dify_studio.png)
23-
2. Create a new workflow via import [this](dsl/ArxivDairy.yml) DSL file
24-
3. Configure the environment variables in the workflow. ![](image/env.png)
25-
4. Get your workflow API token from the workflow settings
52+
### Step 2: Set Up Automated Scheduler (Recommended)
2653

27-
## Step 2: Setup Automated Scheduler (Recommended)
28-
The project now includes an integrated scheduler that runs your workflow automatically. Since your Dify workflow already handles Feishu messaging internally, the scheduler simply executes the workflow and logs the results.
54+
The project provides an integrated scheduler that can trigger Dify-side workflows on a schedule.
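
As a rough illustration of what one scheduled trigger might look like, the sketch below builds (but does not send) a request to Dify's workflow-run endpoint from a token, base URL, and inputs. The helper name `build_run_request`, the `user` value, and the `blocking` response mode are assumptions made for illustration; the project's own scheduler is started with `npm start` and may work differently.

```python
import json
import urllib.request


def build_run_request(token, base_url="https://api.dify.ai/v1", inputs=None):
    """Build a POST request for Dify's workflow-run endpoint (hypothetical helper)."""
    body = {
        "inputs": inputs or {},         # workflow input variables
        "response_mode": "blocking",    # assumption: wait for the run to finish
        "user": "arxivflow-scheduler",  # assumption: any stable caller identifier
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/workflows/run",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The request is only constructed here; nothing is sent over the network.
req = build_run_request("your_workflow_token_here")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would perform the actual call, which consumes a real workflow execution.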
2955

30-
### Quick Setup:
31-
1. **Configure GitHub Secrets**: Go to your repository Settings > Secrets and variables > Actions > New repository secret, and add:
32-
- `DIFY_TOKENS`: Your Dify workflow API token(s) - separate multiple tokens with `;`
33-
- `DIFY_BASE_URL`: (Optional) Dify API URL, defaults to `https://api.dify.ai/v1`
34-
- `DIFY_INPUTS`: (Optional) JSON format input variables if your workflow requires them
56+
#### Quick Setup:
3557

36-
2. **Enable GitHub Actions**: Go to the Actions tab in your repository and enable workflows
58+
1. **Configure GitHub Secrets**:
59+
- Go to repository Settings > Secrets and variables > Actions > New repository secret
60+
- Add secret `DIFY_TOKENS`: Your Dify workflow API token (separate multiple tokens with `;`)
3761

38-
3. **Automatic Execution**: The scheduler runs daily at 06:30 Beijing Time automatically
62+
2. **Enable GitHub Actions**: Go to the repository's Actions tab and enable workflows
63+
64+
3. **Automatic Execution**: The scheduler runs automatically according to the timing rules defined in [dify-scheduler.yml](.github/workflows/dify-scheduler.yml). For cron syntax details, see [cron.help](https://cron.help/).
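
GitHub Actions evaluates `schedule` cron expressions in UTC, so a Beijing-time target has to be shifted by eight hours before it goes into the workflow file. The helper below is a minimal sketch of that conversion; the function name and the 06:30 example time are illustrative and are not read from `dify-scheduler.yml`.

```python
def beijing_to_utc_cron(hour, minute):
    """Return a daily UTC cron expression for a Beijing-time clock time (hypothetical helper)."""
    if not (0 <= hour < 24 and 0 <= minute < 60):
        raise ValueError("hour must be 0-23 and minute must be 0-59")
    utc_hour = (hour - 8) % 24  # Beijing is UTC+8 and has no daylight saving time
    return f"{minute} {utc_hour} * * *"


# 06:30 in Beijing is 22:30 UTC on the previous day.
print(beijing_to_utc_cron(6, 30))  # 30 22 * * *
```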
3965

4066
### Manual Execution:
4167
- **GitHub Actions**: Go to Actions tab > "Dify ArxivFlow Scheduler" > "Run workflow"
@@ -47,7 +73,8 @@ The project now includes an integrated scheduler that runs your workflow automat
4773
npm start
4874
```
4975

50-
## Final Result
76+
### 📱 Final Result
77+
5178
![](image/feishu_demo.png)
5279

5380
The scheduler will automatically:
@@ -56,29 +83,47 @@ The scheduler will automatically:
5683
- ❌ Report any errors to GitHub Actions logs
5784
- 🔄 Support multiple workflows if needed
5885

59-
# Scheduler Configuration
60-
61-
## Environment Variables
62-
Configure these as GitHub repository secrets or local environment variables:
86+
## 🔧 Environment Variables Configuration
6387

64-
### Required Variables:
65-
- `DIFY_TOKENS`: Your Dify workflow API token(s). For multiple workflows, separate with `;`
88+
### GitHub Actions Secrets (Required):
89+
- `DIFY_TOKENS`: Your Dify workflow API token(s); for multiple workflows, separate tokens with `;`
6690

67-
### Optional Variables:
91+
### Optional Configuration:
6892
- `DIFY_BASE_URL`: Dify API base URL (default: `https://api.dify.ai/v1`)
69-
- `DIFY_INPUTS`: JSON format input variables for workflows (default: `{}`)
93+
- `DIFY_INPUTS`: Workflow input variables in JSON format (default: `{}`)
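
The variables above could be read along the following lines. This is a minimal sketch rather than the project's scheduler code: `load_scheduler_config` is a hypothetical helper, and only the variable names, the `;` separator, and the defaults come from this README.

```python
import json
import os

DEFAULT_BASE_URL = "https://api.dify.ai/v1"


def load_scheduler_config(env=None):
    """Parse scheduler settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    # Multiple workflow tokens are separated with ';'; blank entries are ignored.
    tokens = [t.strip() for t in env.get("DIFY_TOKENS", "").split(";") if t.strip()]
    if not tokens:
        raise ValueError("DIFY_TOKENS is required")
    base_url = env.get("DIFY_BASE_URL") or DEFAULT_BASE_URL
    inputs = json.loads(env.get("DIFY_INPUTS") or "{}")
    if not isinstance(inputs, dict):
        raise ValueError("DIFY_INPUTS must be a JSON object")
    return {"tokens": tokens, "base_url": base_url, "inputs": inputs}


config = load_scheduler_config({"DIFY_TOKENS": "token-a; token-b"})
print(config["tokens"])  # ['token-a', 'token-b']
```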
94+
95+
### Dify Workflow Internal Environment Variables:
96+
- `FEISHU_DEV` / `FEISHU_PROD`: Feishu Group Bot Webhook for testing/production environments
97+
- `JINA`: API key for crawling arXiv search results
98+
- `KEYWORDS`: Keywords for arXiv paper search, comma-separated
99+
- The number of keywords and the sending frequency need to match the timing rules in GitHub Actions
100+
- Example: to send 4 pushes per day, `KEYWORDS` needs 4 keywords and the timing rules need 4 time points
101+
- `PAPER_NUM_MAX`: Maximum number of papers per message (limited by Feishu message length)
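
Because the keyword count has to line up with the number of scheduled runs, a quick consistency check before deploying can catch mismatches. The sketch below assumes the schedule is written down as a plain list of `HH:MM` time points; `check_keywords_match_schedule` and the sample keywords are hypothetical, and how the workflow pairs each run with a keyword is not specified here.

```python
def check_keywords_match_schedule(keywords, time_points):
    """Return the parsed keyword list, or raise if its length differs from the schedule."""
    # KEYWORDS is comma-separated; surrounding whitespace and empty entries are dropped.
    keyword_list = [k.strip() for k in keywords.split(",") if k.strip()]
    if len(keyword_list) != len(time_points):
        raise ValueError(
            f"{len(keyword_list)} keyword(s) but {len(time_points)} scheduled run(s)"
        )
    return keyword_list


# Illustrative values only: 4 keywords matched against 4 daily time points.
keywords = check_keywords_match_schedule(
    "LLM, text-to-SQL, RAG, data agents",
    ["06:30", "10:30", "14:30", "18:30"],
)
print(len(keywords))  # 4
```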
102+
103+
## 🛠️ Debugging Scripts
70104

71-
## Original Dify Workflow Env Vars:
72-
- `FEISHU_DEV` / `FEISHU_PROD`: Webhook of Feishu Group Bot for testing/deployment
73-
- `JINA`: Web crawler API key for Arxiv.org
74-
- `KEYWORDS`: Comma-separated keywords for Arxiv query
75-
- `PAPER_NUM_MAX`: Maximum number of papers per message (Feishu message limits)
105+
The `/scripts` folder contains scripts for local debugging and testing, simulating the processes used in Dify Workflow:
76106

107+
- **`jina_extract.py`**: Simulates Jina API calls and paper information extraction logic
108+
- **`sample.text`**: Sample data returned by Jina API for local testing
109+
- **`extracted_papers.json`**: Example of structured paper data after extraction; it serves as input for the downstream LLM analysis in the workflow
77110

78-
# Acknowledgement
111+
These scripts help you test and debug paper extraction logic without consuming API credits.
112+
113+
### Usage for Local Development:
114+
```bash
115+
cd scripts
116+
python jina_extract.py
117+
```
118+
119+
## 🤝 Acknowledgement
79120
- Dify Official Guidance: [Link](https://docs.dify.ai/docs/workflow/overview)
80121
- Feishu - How to use Bot in Group Chat: [Link (Chinese)](https://www.feishu.cn/hc/zh-CN/articles/360024984973-%E5%9C%A8%E7%BE%A4%E7%BB%84%E4%B8%AD%E4%BD%BF%E7%94%A8%E6%9C%BA%E5%99%A8%E4%BA%BA?from=in-im-bot)
81122
- AWS Workshop: Lab3-使用Dify构建AI Workflow: [Link (Chinese)](https://catalog.us-east-1.prod.workshops.aws/workshops/2c19fcb1-1f1c-4f52-b759-0ca4d2ae2522/zh-CN)
82123
- arXiv Category: [Link](https://arxiv.org/category_taxonomy)
83124
- Dify Schedule Project: [Link](https://github.com/leochen-g/dify-schedule) - Inspiration for the automated scheduler implementation
84125

126+
## 📄 License
127+
128+
MIT License - See [LICENSE](LICENSE) file
129+

README_CN.md

Lines changed: 134 additions & 0 deletions
@@ -0,0 +1,134 @@
1+
# ArxivFlow - Periodic Tracking of arXiv Paper Updates
2+
3+
[English](README.md) | 中文
4+
5+
6+
Author: Tiger, from the [HKUST Dial](https://github.com/HKUSTDial) lab
7+
8+
Last update: September 9, 2025
9+
10+
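## 🎯 Objectives
11+
12+
This workflow tracks daily paper updates on arXiv.org. Paper information is preprocessed and summarized by a series of modules, then posted to a Feishu group chat for reading. The target audience is members of the education and research community.
13+
14+
> 💰 Cost: less than 0.05 CNY per workflow execution
15+
16+
## ✨ Key Features
17+
18+
- 📚 Automatically fetch the latest arXiv papers
19+
- 🤖 AI-powered paper summarization and filtering
20+
- 📱 Automatic delivery to a Feishu group chat
21+
- ⏰ Scheduled execution via GitHub Actions
22+
- 🛠️ Local debugging script support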
23+
24+
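## 📋 Prerequisites
25+
26+
Before getting started, please make sure you have the following accounts and services ready:
27+
28+
1. **[Dify](https://dify.ai/) account** - Free registration, used for building AI workflows
29+
2. **LLM provider API** - [DeepSeek API](https://platform.deepseek.com/api_keys) is recommended (cost-effective)
30+
3. **[Jina](https://jina.ai/) API key** - Used for web content extraction; new users get 1M free credits
31+
4. **Feishu group bot webhook** - Used for message pushing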
32+
33+
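## 🚀 Quick Start
34+
35+
### Step 1: Set Up Dify Workflow
36+
37+
1. **Open Dify Console**
38+
- Log in to Dify and find the "Studio" tab
39+
![](image/dify_studio.png)
40+
41+
2. **Import Workflow**
42+
- Create a new workflow by importing the [DSL file](dsl/ArxivDairy.yml)
43+
- This DSL file contains the complete logic for paper fetching, processing, and pushing
44+
45+
3. **Configure Environment Variables**
46+
- Configure the necessary environment variables in the workflow settings
47+
![](image/env.png)
48+
- See the **Environment Variables Configuration** section below for details
49+
50+
4. **Get API Token**
51+
- Get your workflow API token from the workflow settings
52+
- This token is used later for automated scheduling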
53+
54+
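### Step 2: Set Up Automated Scheduler (Recommended)
55+
56+
The project provides an integrated scheduler that can trigger Dify-side workflows on a schedule.
57+
58+
#### Quick Setup:
59+
60+
1. **Configure GitHub Secrets**
61+
- Go to the repository's Settings > Secrets and variables > Actions > New repository secret
62+
- Add the secret `DIFY_TOKENS`: your Dify workflow API token (separate multiple tokens with `;`)
63+
64+
2. **Enable GitHub Actions**: Go to the repository's Actions tab and enable workflows
65+
66+
3. **Automatic Execution**: The scheduler runs automatically according to the timing rules in [dify-scheduler.yml](.github/workflows/dify-scheduler.yml). For cron syntax details, see [cron.help](https://cron.help/)
67+
68+
#### Manual Execution:
69+
70+
- **GitHub Actions**: Actions tab > "Dify ArxivFlow Scheduler" > "Run workflow"
71+
- **Local Testing**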
72+
```bash
73+
npm install
74+
# Set environment variables
75+
export DIFY_TOKENS="your_workflow_token_here"
76+
npm start
77+
```
78+
79+
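## 🔧 Environment Variables Configuration
80+
81+
### GitHub Actions Secrets (Required):
82+
- `DIFY_TOKENS`: Your Dify workflow API token(s); for multiple workflows, separate tokens with `;`
83+
84+
### Optional Configuration:
85+
- `DIFY_BASE_URL`: Dify API base URL (default: `https://api.dify.ai/v1`)
86+
- `DIFY_INPUTS`: Workflow input variables in JSON format (default: `{}`)
87+
88+
### Dify Workflow Internal Environment Variables:
89+
- `FEISHU_DEV` / `FEISHU_PROD`: Feishu group bot webhooks for the testing/production environments
90+
- `JINA`: API key used to crawl arXiv search results
91+
- `KEYWORDS`: Keywords for the arXiv paper search, comma-separated
92+
- The number of keywords and the sending frequency need to match the timing rules in GitHub Actions
93+
- Example: to send 4 pushes per day, `KEYWORDS` needs 4 keywords and the timing rules need 4 time points
94+
- `PAPER_NUM_MAX`: Maximum number of papers per message (limited by Feishu message length)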
95+
96+
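## 🛠️ Debugging Scripts
97+
98+
The `/scripts` folder contains scripts for local debugging and testing that simulate the processing steps in the Dify workflow:
99+
100+
- **`jina_extract.py`**: Simulates Jina API calls and the paper information extraction logic
101+
- **`sample.text`**: Sample data returned by the Jina API, for local testing
102+
- **`extracted_papers.json`**: Example of structured paper data after extraction; it serves as input for the downstream LLM analysis in the workflow
103+
104+
These scripts help you test and debug the paper extraction logic without consuming API credits.
105+
106+
### Usage for Local Debugging: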
107+
```bash
108+
cd scripts
109+
python jina_extract.py
110+
```
111+
112+
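## 📱 Final Result
113+
114+
![](image/feishu_demo.png)
115+
116+
The scheduler will automatically:
117+
- ✅ Execute your Dify workflow daily
118+
- 📊 Log execution results and status
119+
- ❌ Report any errors to the GitHub Actions logs
120+
- 🔄 Support multiple workflows if needed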
121+
122+
123+
## 🤝 Acknowledgement
124+
125+
- Dify Official Guidance: [Link](https://docs.dify.ai/docs/workflow/overview)
126+
- Feishu - How to use Bot in Group Chat: [Link (Chinese)](https://www.feishu.cn/hc/zh-CN/articles/360024984973-%E5%9C%A8%E7%BE%A4%E7%BB%84%E4%B8%AD%E4%BD%BF%E7%94%A8%E6%9C%BA%E5%99%A8%E4%BA%BA?from=in-im-bot)
127+
- AWS Workshop: Lab3-使用Dify构建AI Workflow: [Link (Chinese)](https://catalog.us-east-1.prod.workshops.aws/workshops/2c19fcb1-1f1c-4f52-b759-0ca4d2ae2522/zh-CN)
128+
- arXiv Category Taxonomy: [Link](https://arxiv.org/category_taxonomy)
129+
- Dify Schedule Project: [Link](https://github.com/leochen-g/dify-schedule) - Inspiration for the automated scheduler implementation
130+
131+
## 📄 License
132+
133+
MIT License - See the [LICENSE](LICENSE) file
134+
