Commit 41a9ba6

Feat/social network analytics (#27)
1 parent 32e8e39 commit 41a9ba6

13 files changed

Lines changed: 2119 additions & 0 deletions

.github/workflows/social-network-analytics.yml

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
name: Social Network Analytics CI

on:
  push:
    paths:
      - social-network-analytics/**
      - .github/workflows/social-network-analytics.yml
  pull_request:
    paths:
      - social-network-analytics/**
      - .github/workflows/social-network-analytics.yml

jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 15
    permissions:
      contents: read
    strategy:
      fail-fast: false
      matrix:
        runner: [curl, java]

    env:
      ARCADEDB_URL: http://localhost:2480
      ARCADEDB_USER: root
      ARCADEDB_PASS: arcadedb

    steps:
      - name: Checkout
        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
        with:
          fetch-depth: 1

      - name: Set up Java
        if: matrix.runner == 'java'
        uses: actions/setup-java@be666c2fcd27ec809703dec50e508c2fdc7f6654 # v5.2.0
        with:
          java-version: '21'
          distribution: 'temurin'

      - name: Cache Maven repository
        if: matrix.runner == 'java'
        uses: actions/cache@cdf6c1fa76f9f475f3d7449005a359c84ca0f306 # v5.0.3
        with:
          path: ~/.m2
          key: ${{ runner.os }}-m2-${{ hashFiles('social-network-analytics/java/pom.xml') }}
          restore-keys: ${{ runner.os }}-m2-

      - name: Start ArcadeDB
        working-directory: social-network-analytics
        run: docker compose up -d

      - name: Setup database
        working-directory: social-network-analytics
        run: ./setup.sh

      - name: Run curl queries
        if: matrix.runner == 'curl'
        working-directory: social-network-analytics
        run: ./queries/queries.sh

      - name: Build and run Java
        if: matrix.runner == 'java'
        working-directory: social-network-analytics/java
        run: |
          mvn package --no-transfer-progress
          java -jar target/social-network-analytics.jar

      - name: Teardown
        if: always()
        working-directory: social-network-analytics
        run: docker compose down

README.md

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ and runnable demos via both `curl` and a Java program.
 | [graph-rag](./graph-rag/) | Graph RAG system combining knowledge graphs with vector search for retrieval-augmented generation | Graph traversal, Vector similarity, Full-text indexing, Neo4j Bolt, LangChain4j |
 | [fraud-detection](./fraud-detection/) | Fraud detection system unifying graph, vector, and time-series signals | Graph traversal, Vector similarity, Time-series, Cypher |
 | [realtime-analytics](./realtime-analytics/) | Unified IoT and service monitoring platform | Time-series, Graph traversal, Cypher |
+| [social-network-analytics](./social-network-analytics/) | Social network analytics with materialized view dashboards | Materialized views, Graph traversal, Time-series, Polyglot (SQL + OpenCypher) |

 ## Structure

Lines changed: 205 additions & 0 deletions
@@ -0,0 +1,205 @@
# Social Network Analytics Use Case — Design

**Date:** 2026-03-06
**Branch:** feat/social-network-analytics
**ArcadeDB version:** 26.3.1

## Overview

A social network analytics platform demonstrating ArcadeDB's materialized views (all three refresh modes), graph traversal, time-series engagement tracking, and polyglot querying (SQL + OpenCypher). Users create posts, follow each other, join groups, and interact with content — materialized views pre-compute trending posts, user post counts, and influence scores.

**Key ArcadeDB features:** Materialized views (MANUAL, INCREMENTAL, PERIODIC), Graph traversal, Time-series, Polyglot querying (SQL + OpenCypher)

## Repository Structure

```
social-network-analytics/
├── docker-compose.yml
├── setup.sh
├── sql/
│   ├── 01-schema.sql
│   ├── 02-data.sql
│   └── 03-materialized-views.sql
├── queries/
│   └── queries.sh
├── java/
│   ├── pom.xml
│   └── src/main/java/com/arcadedb/examples/SocialNetworkAnalytics.java
└── README.md
```

## Docker Compose

- Single service: `arcadedata/arcadedb:26.3.1`
- HTTP API port exposed: `2480`
- Root password via `JAVA_OPTS: "-Darcadedb.server.rootPassword=arcadedb"`
- Healthcheck: `curl -sf http://localhost:2480/api/v1/ready`, interval 5s, retries 20

## Schema (`sql/01-schema.sql`)

### Vertex Types (4)

| Type | Properties | Purpose |
|------|-----------|---------|
| `User` | `name STRING`, `handle STRING`, `joinedAt DATETIME`, `bio STRING` | People in the network |
| `Post` | `title STRING`, `body STRING`, `createdAt DATETIME`, `category STRING` | Content created by users |
| `Topic` | `name STRING`, `description STRING` | Hashtags/topics for categorization |
| `Group` | `name STRING`, `description STRING`, `createdAt DATETIME` | Communities |

### Edge Types (6)

| Type | From -> To | Purpose |
|------|-----------|---------|
| `FOLLOWS` | User -> User | Social graph |
| `CREATED` | User -> Post | Authorship |
| `LIKED` | User -> Post | Engagement |
| `SHARED` | User -> Post | Content spread |
| `TAGGED` | Post -> Topic | Content categorization |
| `MEMBER_OF` | User -> Group | Community membership |

### Document Type (1)

| Type | Properties | Purpose |
|------|-----------|---------|
| `EngagementMetric` | `postRid STRING`, `timestamp DATETIME`, `likes INTEGER`, `shares INTEGER`, `comments INTEGER` | Time-series engagement snapshots per post |

## Materialized Views (`sql/03-materialized-views.sql`)

Applied after data load so the initial refresh has data to work with.

### 1. `TrendingPosts` — PERIODIC (every 1 minute)

Pre-computes hottest posts by aggregating engagement metrics. Periodic refresh suits trending content: frequent enough for dashboards, without post-commit overhead on every interaction.

```sql
CREATE MATERIALIZED VIEW TrendingPosts
AS SELECT postRid, sum(likes) AS totalLikes, sum(shares) AS totalShares,
          sum(comments) AS totalComments,
          sum(likes) + sum(shares) * 2 + sum(comments) * 3 AS score
   FROM EngagementMetric
   GROUP BY postRid
REFRESH EVERY 1 MINUTE
```
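
The weighting in the `score` column can be checked by hand with a small in-memory sketch. The class name, record name, and sample RIDs below are illustrative assumptions, not part of the use case; only the formula (likes + 2 × shares + 3 × comments, summed per post) comes from the view definition above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TrendingScoreSketch {
    /** One EngagementMetric row; field names mirror the schema (hypothetical Java mapping). */
    record Metric(String postRid, int likes, int shares, int comments) {}

    /** Sums likes + 2 * shares + 3 * comments per post, as the view's score column does. */
    static Map<String, Integer> scores(List<Metric> metrics) {
        Map<String, Integer> byPost = new LinkedHashMap<>();
        for (Metric m : metrics) {
            int weighted = m.likes() + 2 * m.shares() + 3 * m.comments();
            byPost.merge(m.postRid(), weighted, Integer::sum);
        }
        return byPost;
    }

    public static void main(String[] args) {
        // Sample RIDs are made up for illustration.
        List<Metric> sample = List.of(
            new Metric("#1:0", 10, 2, 1),
            new Metric("#1:0", 15, 4, 3),
            new Metric("#1:1", 5, 0, 0));
        System.out.println(scores(sample)); // prints {#1:0=49, #1:1=5}
    }
}
```

Because the formula is linear, summing the weighted value per row gives the same result as weighting the per-column sums, which is how the SQL expresses it.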

### 2. `UserPostCounts` — INCREMENTAL (post-commit)

Tracks how many posts each user has created. Incremental refresh is the right fit — simple, low-cost aggregation that should always be current. `CREATED` edges run User -> Post, so the author is the edge's `out` vertex.

```sql
CREATE MATERIALIZED VIEW UserPostCounts
AS SELECT out AS userRid, count(*) AS postCount
   FROM CREATED
   GROUP BY out
REFRESH INCREMENTAL
```
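
The idea behind incremental refresh can be illustrated conceptually: each committed `CREATED` edge adjusts one counter instead of re-scanning every edge. The sketch below is a plain-Java analogy under that assumption; it does not describe how ArcadeDB implements `REFRESH INCREMENTAL` internally, and all class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IncrementalCountSketch {
    private final Map<String, Long> postCounts = new HashMap<>();

    /** Applied once per committed CREATED edge; only the author's counter changes. */
    void onCreatedEdgeCommitted(String userRid) {
        postCounts.merge(userRid, 1L, Long::sum);
    }

    long postCount(String userRid) {
        return postCounts.getOrDefault(userRid, 0L);
    }

    /** Full recompute over all edges, shown for comparison with the incremental path. */
    static Map<String, Long> recompute(List<String> authorRidPerEdge) {
        Map<String, Long> counts = new HashMap<>();
        for (String rid : authorRidPerEdge) {
            counts.merge(rid, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        IncrementalCountSketch view = new IncrementalCountSketch();
        List<String> edges = List.of("#1:0", "#1:1", "#1:0"); // made-up author RIDs
        edges.forEach(view::onCreatedEdgeCommitted);
        // Both paths agree; the incremental one did constant work per commit.
        System.out.println(view.postCount("#1:0") + " == " + recompute(edges).get("#1:0"));
    }
}
```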

### 3. `InfluenceScores` — MANUAL

Computes a composite influence score per user: follower count + total engagement on their posts. Expensive computation best refreshed on demand after bulk loads or before generating reports.

```sql
CREATE MATERIALIZED VIEW InfluenceScores
AS SELECT u.name AS userName, u.handle AS handle,
          count(DISTINCT f.@rid) AS followers,
          sum(e.likes + e.shares + e.comments) AS totalEngagement
   FROM User u
   LET followers = (SELECT FROM FOLLOWS WHERE in = u.@rid),
       posts = (SELECT FROM CREATED WHERE out = u.@rid),
       engagement = (SELECT FROM EngagementMetric WHERE postRid IN posts.in)
   GROUP BY u.name, u.handle
REFRESH MANUAL
```

*Note: The exact SQL for InfluenceScores may need adjustment during implementation based on ArcadeDB's LET/subquery support in materialized view definitions. Will validate and simplify as needed.*
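
Since the SQL may change, the intended semantics are worth pinning down independently of the query syntax. The following is a minimal in-memory sketch of the two components (distinct followers, and summed engagement across the user's posts); the record types and identifiers are assumptions made for illustration and are not taken from the implementation.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class InfluenceSketch {
    record Follows(String follower, String followed) {}   // FOLLOWS: follower -> followed
    record Created(String author, String post) {}         // CREATED: author -> post
    record Metric(String post, int likes, int shares, int comments) {}
    record Influence(long followers, long totalEngagement) {}

    static Influence influenceOf(String user, List<Follows> follows,
                                 List<Created> created, List<Metric> metrics) {
        // Followers: distinct users with a FOLLOWS edge pointing at this user.
        long followers = follows.stream()
            .filter(f -> f.followed().equals(user))
            .map(Follows::follower)
            .distinct()
            .count();
        // Posts authored by this user.
        Set<String> posts = created.stream()
            .filter(c -> c.author().equals(user))
            .map(Created::post)
            .collect(Collectors.toSet());
        // Engagement: likes + shares + comments over every snapshot of those posts.
        long engagement = metrics.stream()
            .filter(m -> posts.contains(m.post()))
            .mapToLong(m -> m.likes() + m.shares() + m.comments())
            .sum();
        return new Influence(followers, engagement);
    }

    public static void main(String[] args) {
        List<Follows> follows = List.of(new Follows("bob", "alice"), new Follows("carol", "alice"));
        List<Created> created = List.of(new Created("alice", "p1"));
        List<Metric> metrics = List.of(new Metric("p1", 3, 1, 1), new Metric("p1", 2, 1, 1));
        System.out.println(influenceOf("alice", follows, created, metrics));
    }
}
```

A reference computation like this could also serve as a cross-check when validating whatever simplified SQL the view ends up using.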

## Time-Series Design

`EngagementMetric` is a document type used as a time-series bucket. Each record is a snapshot of engagement on a post at a point in time.

Multiple entries per post across several timestamps (hour 1, 2, 3) show engagement growing over time, feeding the `TrendingPosts` materialized view and enabling time-series drill-down queries.

## Queries — Polyglot Strategy

Five labeled sections mixing SQL and OpenCypher based on what each language does best.

### SQL Queries (materialized views, time-series, aggregations)

**1. Trending Content Dashboard** — reads from the periodic materialized view
```sql
SELECT * FROM TrendingPosts ORDER BY score DESC LIMIT 10
```

**2. Engagement Time-Series** — drill into a post's engagement over time
```sql
SELECT timestamp, likes, shares, comments
FROM EngagementMetric WHERE postRid = '<rid>' ORDER BY timestamp
```

**3. Influence Leaderboard** — reads from the manual-refresh view
```sql
SELECT * FROM InfluenceScores ORDER BY totalEngagement DESC LIMIT 5
```

### OpenCypher Queries (graph traversals)

**4. Viral Spread Chain** — how a post propagated through shares
```cypher
MATCH (author:User)-[:CREATED]->(p:Post)<-[:SHARED]-(sharer:User)<-[:FOLLOWS]-(audience:User)
WHERE id(p) = '<rid>'
RETURN author.name, sharer.name, collect(audience.name) AS reachedAudience
```

**5. Community Overlap** — users in the same group who follow each other
```cypher
MATCH (a:User)-[:MEMBER_OF]->(g:Group)<-[:MEMBER_OF]-(b:User)
WHERE (a)-[:FOLLOWS]->(b)
RETURN g.name, a.name, b.name
```
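
Either language can be submitted through the same HTTP query endpoint by changing the `language` field of the request body. The sketch below builds (but does not send) such a request with the JDK's `java.net.http` client. It assumes ArcadeDB's `POST /api/v1/query/{database}` endpoint with a JSON body of `language` and `command`; the database name `SocialNetwork` is a placeholder, and the language strings `sql` and `opencypher` follow the names used in this design rather than a verified list.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class QueryRequestSketch {
    /** Minimal JSON string escaping for backslashes, quotes, and newlines. */
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n") + "\"";
    }

    /** Request body: {"language": ..., "command": ...}. */
    static String body(String language, String command) {
        return "{\"language\":" + jsonString(language) + ",\"command\":" + jsonString(command) + "}";
    }

    /** Builds a POST to /api/v1/query/{database} with HTTP Basic auth. */
    static HttpRequest build(String baseUrl, String database, String user, String pass,
                             String language, String command) {
        String credentials = Base64.getEncoder()
            .encodeToString((user + ":" + pass).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/query/" + database))
            .header("Authorization", "Basic " + credentials)
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body(language, command)))
            .build();
    }

    public static void main(String[] args) {
        // Defaults match the CI workflow's env block; "SocialNetwork" is a hypothetical database name.
        String url = System.getenv().getOrDefault("ARCADEDB_URL", "http://localhost:2480");
        String user = System.getenv().getOrDefault("ARCADEDB_USER", "root");
        String pass = System.getenv().getOrDefault("ARCADEDB_PASS", "arcadedb");
        HttpRequest sql = build(url, "SocialNetwork", user, pass, "sql",
            "SELECT * FROM TrendingPosts ORDER BY score DESC LIMIT 10");
        System.out.println(sql.method() + " " + sql.uri());
    }
}
```

Sending the request would be a call to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running server; that step is left out so the sketch runs without one.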

## Sample Data (`sql/02-data.sql`)

### Volumes

| Type | Count | Notes |
|------|-------|-------|
| User | 8 | Mix of high-influence and casual users |
| Post | 12 | Spread across users, 2-3 categories |
| Topic | 4 | Tech, Music, Sports, Travel |
| Group | 3 | Developers, Photographers, Gamers |
| FOLLOWS | ~20 | Asymmetric — some users have many followers |
| CREATED | 12 | One per post |
| LIKED | ~25 | Skewed — some posts get many likes |
| SHARED | ~10 | Concentrated on "viral" posts |
| TAGGED | ~15 | Posts tagged with 1-2 topics |
| MEMBER_OF | ~12 | Users belong to 1-2 groups |
| EngagementMetric | ~36 | 3 time-series snapshots per post |

### Data Story

- **Alice** and **Bob** are high-influence users with many followers and popular posts
- Alice's post about "AI Trends" goes viral — many shares, growing engagement over time
- A cluster of users in the Developers group follow each other, creating community overlap
- Engagement metrics show clear trends: some posts peak early, others grow steadily

## Java Program

`SocialNetworkAnalytics.java` follows the existing `tryRun()`/`printHeader()` pattern using HTTP API (`arcadedb-network`). Five query sections matching the five shell query sections. Uses `query "sql"` for materialized view and time-series queries, `query "opencypher"` for graph traversals.
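
The `tryRun()`/`printHeader()` pattern is not shown in this design, so the following is only a guess at its shape: each query section prints a header and runs inside a guard, so one failing section is reported without skipping the rest. The method signatures, the `Section` interface, and the output format are assumptions; only the two method names come from the text above.

```java
public class TryRunSketch {
    /** A query section that may throw (for example, on an HTTP or query error). */
    @FunctionalInterface
    interface Section {
        void run() throws Exception;
    }

    static void printHeader(String title) {
        System.out.println();
        System.out.println("=== " + title + " ===");
    }

    /** Runs one section; returns false on failure instead of aborting the program. */
    static boolean tryRun(String title, Section section) {
        printHeader(title);
        try {
            section.run();
            return true;
        } catch (Exception e) {
            System.err.println("FAILED: " + title + ": " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        boolean ok = tryRun("1. Trending Content Dashboard",
            () -> System.out.println("(SQL query would run here)"));
        // Non-short-circuit &= so later sections still run after a failure.
        ok &= tryRun("4. Viral Spread Chain",
            () -> System.out.println("(OpenCypher query would run here)"));
        if (!ok) {
            System.exit(1); // non-zero exit lets the CI job fail
        }
    }
}
```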

## CI Workflow

`.github/workflows/social-network-analytics.yml` following the standard matrix pattern:

```yaml
matrix:
  runner: [curl, java]
```

Steps: checkout, setup Java 21 (gated), cache ~/.m2 (gated), docker compose up, setup.sh, run queries or build/run fat JAR, docker compose down (always).

## Success Criteria

1. All three materialized view refresh modes demonstrated and working
2. Materialized views return correct pre-computed data when queried
3. Time-series engagement data feeds into the TrendingPosts view
4. SQL queries read from materialized views; OpenCypher queries traverse the graph
5. Both curl and Java runners pass in CI
