[REFACTOR] BE - ai report by jxhee99 · Pull Request #101 · APIL0g/APILog

jxhee99 · 2026-02-10T05:34:36Z

🧾 ApiLog

✏️ Summary

데이터 수집 시 타임아웃과 cpu 과부하 문제 해결을 위해 리펙토링 진행했습니다.

📢 Motivation

📌 Type of Change

✨ New Feature / 새로운 기능
🐞 Bug Fix / 버그 수정
💅 Style / UI Update / 스타일·UI 변경
🧹 Refactor (non-breaking) / 리팩토링(비호환X)
🧾 Docs / 문서
✅ Test Added / Updated / 테스트 추가·수정
⚙️ Build / CI / Config / 빌드·CI·설정
🔥 Breaking Change / 호환성 깨짐
♻️ Other / 기타 (설명)

🛠️ Implementation Details

ai report 백엔드 전면 개선
기존 972줄이었던 service.py는 473줄로 줄었고, 불필요한 추상화·중복·미사용 기능을 모두 제거했습니다.

데이터 수집: HTTP 7회 엔드포인트 요청 → SQL 3회 (단일 커넥션)
실패 처리: 목업 데이터 반환 → 명시적 에러로 변경
LLM 응답 전체를 통째로 파싱 → 한 섹션이 잘못되면 전체 리포트 실패반환 했던 방식에서 각 섹션을 개별 검증, 잘못된 섹션만 빈 배열로 대체
Ollama 제거

🧪 Testing

InfluxDB 실행 + 더미데이터 시드로 데이터 생성 후 api key 삽입 후 테스트 진행했습니다.

🖼️ Screenshots / Demos

✅ Checklist

Tests pass locally / 로컬 테스트 통과
Lint & type checks pass / 린트·타입체크 통과
Docs updated if needed / 필요 시 문서 업데이트
Commit messages follow convention / 커밋 규칙 준수

gemini-code-assist · 2026-02-10T05:34:55Z

Summary of Changes

Hello @jxhee99, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

이 Pull Request는 AI 리포트 백엔드의 성능과 안정성을 향상시키기 위한 대규모 리팩토링을 포함합니다. 데이터 수집 방식의 근본적인 개선과 LLM 응답 처리 로직의 정교화를 통해 기존의 타임아웃 및 CPU 과부하 문제를 해결하고, 전반적인 코드 품질과 유지보수성을 높이는 데 중점을 두었습니다.

Highlights

데이터 수집 최적화: 기존 7회의 HTTP 엔드포인트 요청 방식에서 단일 InfluxDB 연결을 통한 3회의 SQL 쿼리 방식으로 데이터 수집 로직을 변경하여 타임아웃 및 CPU 과부하 문제를 해결했습니다.
LLM 응답 처리 개선: LLM 응답 전체를 파싱하여 한 섹션이 잘못되면 전체 리포트가 실패하던 방식에서, 각 섹션을 개별적으로 검증하고 잘못된 섹션만 빈 배열로 대체하도록 개선하여 견고성을 높였습니다.
오류 처리 명확화: 실패 시 목업 데이터를 반환하던 방식에서 명시적인 에러를 반환하도록 변경하여 문제 진단을 용이하게 했습니다.
Ollama 통합 제거: Ollama 관련 코드를 제거하여 LLM 제공자 로직을 간소화하고 의존성을 줄였습니다.
코드베이스 간소화: 주요 서비스 파일인 service.py의 코드 라인 수를 972줄에서 473줄로 대폭 줄여 불필요한 추상화, 중복, 미사용 기능을 제거했습니다.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Changelog

back/app/config.py
- AI_REPORT_LLM_PROVIDER의 기본값을 'auto'로 업데이트했습니다.
back/app/plugins/widgets/ai_report/schemas.py
- InteractionInsight 스키마를 ErrorAnalysis 스키마로 대체했습니다.
- Prediction 및 RadarScoreItem 스키마를 제거했습니다.
- ReportMeta 스키마에 'partial_failures' 필드를 추가하고 'prompt_version'을 'v3'으로 업데이트했습니다.
- ReportResponse 스키마에서 'interaction_insights', 'predictions', 'radar_scores' 필드를 'error_analysis'로 대체했습니다.
back/app/plugins/widgets/ai_report/service.py
- InfluxDBClient3 임포트를 추가하고 불필요한 라이브러리 임포트를 제거했습니다.
- 데이터 수집 로직을 HTTP 요청에서 InfluxDB SQL 쿼리 기반으로 전면 개편했습니다.
- LLM 제공자 감지 및 호출 로직을 간소화하여 Ollama 관련 코드를 제거했습니다.
- LLM 응답에서 JSON을 추출하는 로직을 더욱 견고하게 개선했습니다.
- LLM 프롬프트 구성 및 스키마 힌트를 새로운 리포트 구조에 맞춰 업데이트했습니다.
- 리포트 섹션별 유효성 검사 로직을 추가하고, LLM 실패 시 폴백 대신 명시적인 에러 리포트를 반환하도록 변경했습니다.
- 리포트 최종화 로직을 간소화하고 'partial_failures'를 메타데이터에 포함하도록 업데이트했습니다.

Activity

AI 리포트 백엔드 전면 개선을 진행했습니다.
기존 972줄이었던 service.py 파일을 473줄로 줄였습니다.
데이터 수집 방식을 HTTP 7회 엔드포인트 요청에서 SQL 3회(단일 커넥션)로 변경했습니다.
실패 처리 방식을 목업 데이터 반환에서 명시적 에러로 변경했습니다.
LLM 응답 파싱 방식을 전체 실패에서 각 섹션 개별 검증 및 잘못된 섹션만 빈 배열로 대체하도록 개선했습니다.
Ollama 기능을 제거했습니다.
InfluxDB 실행 및 더미데이터 시드를 이용한 API 키 삽입 후 로컬 테스트를 완료했습니다.
UI 변경 사항을 보여주는 스크린샷을 첨부했습니다.
로컬 테스트 통과 및 커밋 메시지 컨벤션 준수를 확인했습니다.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request refactors the AI report generation process by updating the report schema to remove 'predictions' and 'radar scores' and introduce 'error analysis'. It streamlines the LLM provider resolution, removes Ollama-specific call logic, and significantly overhauls data collection from HTTP API calls to direct InfluxDB queries using InfluxDBClient3. The JSON extraction logic for LLM responses has been simplified, and the prompt construction has been updated to reflect the new report structure. Review comments highlight several issues: the site_id parameter is ignored, leading to data aggregation across all sites and potential unauthorized data exposure; user inputs are directly concatenated into the LLM prompt, creating a prompt injection vulnerability; the simplified JSON parsing logic for LLM output is less robust and may fail with braces inside string literals; and the InfluxDB queries, while currently safe, should be parameterized to follow SQL injection prevention best practices.

gemini-code-assist · 2026-02-10T05:37:57Z


-    del from_ts, to_ts, bucket, site_id  # Inputs are handled via widget bundle collection.
-    bundle = _collect_widget_data()
+    bundle = _collect_report_bundle()


The generate_report function receives a site_id parameter, but it is ignored in the call to _collect_report_bundle(). Furthermore, the SQL queries in _collect_report_bundle do not filter by site_id. Since the analytics data is stored with a site_id tag in InfluxDB, this lack of filtering causes the AI report to aggregate data from all sites in the database. In a multi-tenant environment, this leads to unauthorized data exposure where one user can see a summary of another user's site data.

Currently, this service is operated locally for a single site, so it is not a multi-tenant issue.

/gemini Currently, this service is operated locally for a single site, so it is not a multi-tenant issue.

Thank you for the clarification. I understand that the service is currently operated locally for a single site.

However, the core issue remains: the _collect_report_bundle function, which is responsible for gathering all analytics data for the AI report, does not filter its SQL queries by site_id. This means that even in a single-site setup, if multiple sites were to ever write data to the same InfluxDB instance, the report would aggregate data from all sites.

This design choice creates a significant data isolation vulnerability. If the application were to scale to a multi-tenant environment, or if more than one site's data were ever present in the InfluxDB, a user requesting a report for their site_id would inadvertently receive a summary based on data from other sites as well. This constitutes unauthorized data exposure, which is a high severity security concern.

To ensure proper data segregation and future-proof the application for potential multi-tenancy, I strongly recommend the following:

Reintroduce site_id to generate_report's signature: The generate_report function should accept site_id: Optional[str].

Modify _collect_report_bundle to accept and use site_id: Update _collect_report_bundle to take site_id as a parameter and incorporate it into the WHERE clauses of all relevant SQL queries (e.g., summary_sql, trend_sql, exit_sql).

For example, the _collect_report_bundle function could be modified like this:

def _collect_report_bundle(site_id: Optional[str] = None, days: int = 7) -> Dict[str, Any]: # ... existing code ... site_filter = f"AND site_id = '{site_id}'" if site_id else "" summary_sql = f""" SELECT device_type, browser_family, country_code, path, COUNT(*) AS cnt, COUNT(DISTINCT session_id) AS sessions, AVG(dwell_ms) AS avg_dwell_ms, SUM(CASE WHEN error_flag = true THEN 1 ELSE 0 END) AS error_count FROM events WHERE time >= TIMESTAMP '{from_str}' AND time < TIMESTAMP '{to_str}' AND session_id IS NOT NULL AND session_id <> '' {site_filter} GROUP BY device_type, browser_family, country_code, path """ # ... similar modifications for trend_sql and exit_sql ... # ... existing code ...

And then generate_report would call it like:

def generate_report( # ... other parameters ... site_id: Optional[str] = None, # ... other parameters ... ) -> Dict[str, Any]: # ... existing code ... bundle = _collect_report_bundle(site_id=site_id) # ... existing code ...

Addressing this now will prevent significant refactoring and potential security incidents in the future.

gemini-code-assist · 2026-02-10T05:37:57Z

+def _build_messages(bundle: Dict[str, Any], prompt: str, language: str, audience: str, word_limit: int) -> List[Dict[str, str]]:
    soft_prompt = (prompt or "").strip()[:400]
    user_prompt = (
        f"Language: {language}\n"
        f"Audience: {audience}\n"
        f"WordLimit: {word_limit}\n"
        f"UserHint(LightlyIncorporate): {soft_prompt}\n\n"
-        "Build an AI report that does the following:\n"
-        "- `diagnostics`: 2~4 핵심 환경별 문제를 위젯 데이터를 근거로 설명.\n"
-        "- `page_issues`: 체류 시간 대비 이탈이 높은 페이지만 골라 가설을 작성.\n"
-        "- `interaction_insights`: 버튼/클릭 패턴을 기반으로 개선 방향을 제안.\n"
-        "- `ux_recommendations`: 즉시 실행 가능한 UX 조치와 검증 방법을 제시.\n"
-        "- `tech_recommendations`: 기술 조치와 추적 방법을 명시.\n"
-        "- `priorities`: 노력 대비 효과 기준으로 High/Medium/Low 분류.\n"
-        "- `metrics_to_track`: 개선 후 7일간 모니터링할 위젯과 목표 변화를 명확히 기재.\n"
-        "- `predictions`: 최소 2개 이상 반환하고, 조치 실행 시 baseline 대비 expected 값을 숫자로 제시.\n"
-        "- `radar_scores`: five axes 0-100 점수, 서로 다른 지표 근거 사용.\n\n"
-        "예외 없이 `predictions` 배열의 모든 항목에는 `metric`(string), `baseline`(number), `expected`(number), "
-        "`unit`(string, %, sessions 등), `narrative`(string) 필드를 모두 포함하세요. "
-        "`expected` 값을 비워 두거나 생략하면 전체 응답이 거부됩니다.\n\n"
+        "Build an AI report with these sections:\n"
+        "- `diagnostics`: 2~4 core issues by device/browser/country, citing actual numbers from the data.\n"
+        "- `page_issues`: pages with high exit rate AND low dwell time from page_exit_rate data.\n"
+        "- `error_analysis`: paths + browsers with notable error rates from error_analysis data.\n"
+        "- `ux_recommendations`: actionable UX fixes with validation methods.\n"
+        "- `tech_recommendations`: technical fixes with monitoring approach.\n"
+        "- `priorities`: rank recommendations by effort vs impact as High/Medium/Low.\n"
+        "- `metrics_to_track`: which metrics to monitor after improvements, using only fields that exist in the data.\n\n"
        "Respond with JSON only, conforming to this schema:\n"
-        f"{json.dumps(schema_hint, ensure_ascii=False)}\n\n"
+        f"{json.dumps(_SCHEMA_HINT, ensure_ascii=False)}\n\n"
        f"WIDGET_API_BUNDLE:\n{json.dumps(bundle, ensure_ascii=False)}"
    )
-    return [{"role": "system", "content": system_prompt}, {"role": "user", "content": user_prompt}]
+    return [{"role": "system", "content": _SYSTEM_PROMPT}, {"role": "user", "content": user_prompt}]


User-provided inputs (prompt, language, audience) are directly concatenated into the LLM prompt without sanitization or escaping. This makes the application vulnerable to prompt injection, where an attacker can manipulate the prompt field in the request to override system instructions, potentially leading to the generation of malicious content or the leakage of internal data structures in the AI report.

gemini-code-assist · 2026-02-10T05:37:57Z

+    # find matching close brace (simple depth count, ignore strings for speed)
+    depth = 0
+    end = -1
+    for i in range(start, len(blob)):
+        if blob[i] == "{":
+            depth += 1
+        elif blob[i] == "}":
+            depth -= 1
+            if depth == 0:
+                end = i
+                break


The logic to find the matching closing brace } for a JSON object is based on a simple depth counter that doesn't account for braces inside string literals (e.g., {"key": "a value with { or }"}). This can lead to incorrect slicing of the JSON string and parsing failures if the LLM returns such content. The previous implementation was more robust as it tracked whether the parser was inside a string.

# find matching close brace (string-aware) depth = 0 end = -1 in_string = False escape = False for i in range(start, len(blob)): char = blob[i] if in_string: if escape: escape = False elif char == '\\': escape = True elif char == '"': in_string = False else: if char == '"': in_string = True elif char == '{': depth += 1 elif char == '}': depth -= 1 if depth == 0: end = i break

gemini-code-assist · 2026-02-10T05:37:57Z

+    summary_sql = f"""
+    SELECT
+        device_type, browser_family, country_code, path,
+        COUNT(*) AS cnt,
+        COUNT(DISTINCT session_id) AS sessions,
+        AVG(dwell_ms) AS avg_dwell_ms,
+        SUM(CASE WHEN error_flag = true THEN 1 ELSE 0 END) AS error_count
+    FROM events
+    WHERE time >= TIMESTAMP '{from_str}' AND time < TIMESTAMP '{to_str}'
+      AND session_id IS NOT NULL AND session_id <> ''
+    GROUP BY device_type, browser_family, country_code, path
+    """


The SQL query is constructed using an f-string to embed the date range. While safe here because the dates are internally generated, it's a security best practice to use parameterized queries to prevent potential SQL injection. The influxdb-client-3 library supports this via the params argument.

You should modify the query to use placeholders and pass the values in the params dictionary when calling client.query.

Example of the updated call:

params = {"from_str": from_str, "to_str": to_str} summary_rows = client.query(summary_sql, params=params).to_pylist()

Suggested change

summary_sql = f"""

SELECT

device_type, browser_family, country_code, path,

COUNT(*) AS cnt,

COUNT(DISTINCT session_id) AS sessions,

AVG(dwell_ms) AS avg_dwell_ms,

SUM(CASE WHEN error_flag = true THEN 1 ELSE 0 END) AS error_count

FROM events

WHERE time >= TIMESTAMP '{from_str}' AND time < TIMESTAMP '{to_str}'

AND session_id IS NOT NULL AND session_id <> ''

GROUP BY device_type, browser_family, country_code, path

"""

summary_sql = """

SELECT

device_type, browser_family, country_code, path,

COUNT(*) AS cnt,

COUNT(DISTINCT session_id) AS sessions,

AVG(dwell_ms) AS avg_dwell_ms,

SUM(CASE WHEN error_flag = true THEN 1 ELSE 0 END) AS error_count

FROM events

WHERE time >= $from_str AND time < $to_str

AND session_id IS NOT NULL AND session_id <> ''

GROUP BY device_type, browser_family, country_code, path

"""

sonarqubecloud · 2026-02-10T05:38:04Z

Quality Gate passed

Issues
6 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
2.6% Duplication on New Code

See analysis details on SonarQube Cloud

qwon999 · 2026-02-10T05:48:17Z

LGTM😎

[REFACTOR] BE - ai report

89f62e9

jxhee99 requested review from KAKIRU, lsj0822, now-eat-krap, parkhh16 and qwon999 February 10, 2026 05:34

jxhee99 self-assigned this Feb 10, 2026

[REFACTOR] FE - ai report

310f234

gemini-code-assist Bot reviewed Feb 10, 2026

View reviewed changes

qwon999 merged commit 724e40d into develop Feb 10, 2026
1 of 4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[REFACTOR] BE - ai report#101

[REFACTOR] BE - ai report#101
qwon999 merged 2 commits into
developfrom
fix/ai-report

jxhee99 commented Feb 10, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot commented Feb 10, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Feb 10, 2026

Uh oh!

qwon999 Feb 10, 2026

Uh oh!

lsj0822 Feb 10, 2026 •

edited

Loading

Uh oh!

gemini-code-assist Bot Feb 10, 2026

Uh oh!

gemini-code-assist Bot Feb 10, 2026

Uh oh!

gemini-code-assist Bot Feb 10, 2026

Uh oh!

gemini-code-assist Bot Feb 10, 2026

Uh oh!

sonarqubecloud Bot commented Feb 10, 2026

Uh oh!

qwon999 commented Feb 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

jxhee99 commented Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🧾 ApiLog

✏️ Summary

📢 Motivation

📌 Type of Change

🛠️ Implementation Details

🧪 Testing

🖼️ Screenshots / Demos

✅ Checklist

Uh oh!

gemini-code-assist Bot commented Feb 10, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

qwon999 Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

lsj0822 Feb 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Feb 10, 2026

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud Bot commented Feb 10, 2026

Quality Gate passed

Uh oh!

qwon999 commented Feb 10, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jxhee99 commented Feb 10, 2026 •

edited

Loading

lsj0822 Feb 10, 2026 •

edited

Loading