| 1 | +# Social Network Analytics Use Case — Design |
| 2 | + |
| 3 | +**Date:** 2026-03-06 |
| 4 | +**Branch:** feat/social-network-analytics |
| 5 | +**ArcadeDB version:** 26.3.1 |
| 6 | + |
| 7 | +## Overview |
| 8 | + |
| 9 | +This use case is a social network analytics platform that demonstrates ArcadeDB's materialized views (all three refresh modes), graph traversal, time-series engagement tracking, and polyglot querying (SQL + OpenCypher). Users create posts, follow each other, join groups, and interact with content, while materialized views pre-compute trending posts, per-user post counts, and influence scores. |
| 10 | + |
| 11 | +**Key ArcadeDB features:** Materialized views (MANUAL, INCREMENTAL, PERIODIC), Graph traversal, Time-series, Polyglot querying (SQL + OpenCypher) |
| 12 | + |
| 13 | +## Repository Structure |
| 14 | + |
| 15 | +``` |
| 16 | +social-network-analytics/ |
| 17 | +├── docker-compose.yml |
| 18 | +├── setup.sh |
| 19 | +├── sql/ |
| 20 | +│ ├── 01-schema.sql |
| 21 | +│ ├── 02-data.sql |
| 22 | +│ └── 03-materialized-views.sql |
| 23 | +├── queries/ |
| 24 | +│ └── queries.sh |
| 25 | +├── java/ |
| 26 | +│ ├── pom.xml |
| 27 | +│ └── src/main/java/com/arcadedb/examples/SocialNetworkAnalytics.java |
| 28 | +└── README.md |
| 29 | +``` |
| 30 | + |
| 31 | +## Docker Compose |
| 32 | + |
| 33 | +- Single service: `arcadedata/arcadedb:26.3.1` |
| 34 | +- HTTP API port exposed: `2480` |
| 35 | +- Root password via `JAVA_OPTS: "-Darcadedb.server.rootPassword=arcadedb"` |
| 36 | +- Healthcheck: `curl -sf http://localhost:2480/api/v1/ready`, interval 5s, retries 20 |
| 37 | + |
| 38 | +## Schema (`sql/01-schema.sql`) |
| 39 | + |
| 40 | +### Vertex Types (4) |
| 41 | + |
| 42 | +| Type | Properties | Purpose | |
| 43 | +|------|-----------|---------| |
| 44 | +| `User` | `name STRING`, `handle STRING`, `joinedAt DATETIME`, `bio STRING` | People in the network | |
| 45 | +| `Post` | `title STRING`, `body STRING`, `createdAt DATETIME`, `category STRING` | Content created by users | |
| 46 | +| `Topic` | `name STRING`, `description STRING` | Hashtags/topics for categorization | |
| 47 | +| `Group` | `name STRING`, `description STRING`, `createdAt DATETIME` | Communities | |
| 48 | + |
| 49 | +### Edge Types (6) |
| 50 | + |
| 51 | +| Type | From -> To | Purpose | |
| 52 | +|------|-----------|---------| |
| 53 | +| `FOLLOWS` | User -> User | Social graph | |
| 54 | +| `CREATED` | User -> Post | Authorship | |
| 55 | +| `LIKED` | User -> Post | Engagement | |
| 56 | +| `SHARED` | User -> Post | Content spread | |
| 57 | +| `TAGGED` | Post -> Topic | Content categorization | |
| 58 | +| `MEMBER_OF` | User -> Group | Community membership | |
| 59 | + |
| 60 | +### Document Type (1) |
| 61 | + |
| 62 | +| Type | Properties | Purpose | |
| 63 | +|------|-----------|---------| |
| 64 | +| `EngagementMetric` | `postRid STRING`, `timestamp DATETIME`, `likes INTEGER`, `shares INTEGER`, `comments INTEGER` | Time-series engagement snapshots per post | |
| 65 | + |
| 66 | +## Materialized Views (`sql/03-materialized-views.sql`) |
| 67 | + |
| 68 | +Applied after data load so the initial refresh has data to work with. |
| 69 | + |
| 70 | +### 1. `TrendingPosts` — PERIODIC (every 1 minute) |
| 71 | + |
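| 72 | +Pre-computes the hottest posts by summing each post's engagement metrics into a weighted score (likes count once, shares twice, comments three times). Periodic refresh suits trending content: it is frequent enough for dashboards without adding post-commit overhead to every interaction. |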
| 73 | + |
| 74 | +```sql |
| 75 | +CREATE MATERIALIZED VIEW TrendingPosts |
| 76 | + AS SELECT postRid, sum(likes) AS totalLikes, sum(shares) AS totalShares, |
| 77 | + sum(comments) AS totalComments, |
| 78 | + sum(likes) + sum(shares) * 2 + sum(comments) * 3 AS score |
| 79 | + FROM EngagementMetric |
| 80 | + GROUP BY postRid |
| 81 | + REFRESH EVERY 1 MINUTE |
| 82 | +``` |
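To make the score arithmetic concrete, here is a minimal in-memory sketch in Java of what the view's `GROUP BY postRid` aggregation works out to. It is not how ArcadeDB evaluates the view. The `Snapshot` record, the class name, and the sample RIDs are hypothetical, and the sketch assumes, as the view does, that the counts in each `EngagementMetric` row are simply added together.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TrendingScoreSketch {
  // Hypothetical stand-in for one EngagementMetric row.
  record Snapshot(String postRid, int likes, int shares, int comments) {}

  // Mirrors the view: per-post sums, then likes + shares * 2 + comments * 3.
  static Map<String, Long> scores(List<Snapshot> rows) {
    Map<String, long[]> sums = new LinkedHashMap<>();
    for (Snapshot s : rows) {
      long[] totals = sums.computeIfAbsent(s.postRid(), k -> new long[3]);
      totals[0] += s.likes();
      totals[1] += s.shares();
      totals[2] += s.comments();
    }
    Map<String, Long> result = new LinkedHashMap<>();
    sums.forEach((rid, t) -> result.put(rid, t[0] + t[1] * 2 + t[2] * 3));
    return result;
  }

  public static void main(String[] args) {
    // Sample rows are made up for illustration only.
    List<Snapshot> rows = List.of(
        new Snapshot("#1:0", 5, 1, 0),
        new Snapshot("#1:0", 8, 2, 1),
        new Snapshot("#1:1", 2, 0, 0));
    // Post #1:0: likes 13, shares 3, comments 1 -> 13 + 6 + 3 = 22.
    System.out.println(scores(rows));
  }
}
```

If the snapshots are meant to be cumulative rather than additive, summing them would over-count, so the data file and the view should agree on one interpretation.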
| 83 | + |
| 84 | +### 2. `UserPostCounts` — INCREMENTAL (post-commit) |
| 85 | + |
| 86 | +Tracks how many posts each user has created. Incremental refresh is the right fit — simple, low-cost aggregation that should always be current. |
| 87 | + |
| 88 | +```sql |
| 89 | +CREATE MATERIALIZED VIEW UserPostCounts |
| 90 | + AS SELECT out AS userRid, count(*) AS postCount |
| 91 | + FROM CREATED |
| 92 | + GROUP BY out |
| 93 | + REFRESH INCREMENTAL |
| 94 | +``` |
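The idea behind incremental refresh can be sketched in a few lines of Java: instead of recounting every `CREATED` edge, a counter is adjusted as each edge is committed or removed. This is a conceptual illustration only, not ArcadeDB's internal mechanism. The class and method names are hypothetical, and the user is taken to be the source end of the edge because `CREATED` runs User -> Post.

```java
import java.util.HashMap;
import java.util.Map;

class IncrementalPostCounts {
  private final Map<String, Long> counts = new HashMap<>();

  // Called once per committed CREATED edge; userRid is the edge's source vertex.
  void onCreatedEdgeAdded(String userRid) {
    counts.merge(userRid, 1L, Long::sum);
  }

  // Called when a CREATED edge is deleted; drops the entry when it reaches zero.
  void onCreatedEdgeRemoved(String userRid) {
    counts.computeIfPresent(userRid, (rid, n) -> n > 1 ? n - 1 : null);
  }

  long postCount(String userRid) {
    return counts.getOrDefault(userRid, 0L);
  }

  public static void main(String[] args) {
    IncrementalPostCounts view = new IncrementalPostCounts();
    view.onCreatedEdgeAdded("#4:0"); // made-up RID
    view.onCreatedEdgeAdded("#4:0");
    view.onCreatedEdgeRemoved("#4:0");
    System.out.println(view.postCount("#4:0"));
  }
}
```

Each update touches a single counter, which is why a simple count is cheap to keep current on every commit.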
| 95 | + |
| 96 | +### 3. `InfluenceScores` — MANUAL |
| 97 | + |
| 98 | +Computes a composite influence score per user from two inputs: follower count and total engagement on their posts. This is an expensive computation, so it is best refreshed on demand after bulk loads or before generating reports. |
| 99 | + |
| 100 | +```sql |
| 101 | +CREATE MATERIALIZED VIEW InfluenceScores |
| 102 | + AS SELECT u.name AS userName, u.handle AS handle, |
| 103 | + count(DISTINCT f.@rid) AS followers, |
| 104 | + sum(e.likes + e.shares + e.comments) AS totalEngagement |
| 105 | + FROM User u |
| 106 | + LET followers = (SELECT FROM FOLLOWS WHERE in = u.@rid), |
| 107 | + posts = (SELECT FROM CREATED WHERE out = u.@rid), |
| 108 | + engagement = (SELECT FROM EngagementMetric WHERE postRid IN posts.in) |
| 109 | + GROUP BY u.name, u.handle |
| 110 | + REFRESH MANUAL |
| 111 | +``` |
| 112 | + |
| 113 | +*Note: The SQL for InfluenceScores is a draft. It references the aliases `f` and `e`, which the query never defines, and it depends on ArcadeDB supporting LET subqueries inside materialized view definitions. It will be validated and simplified during implementation as needed.* |
| 114 | + |
| 115 | +## Time-Series Design |
| 116 | + |
| 117 | +`EngagementMetric` is a document type used as a time-series bucket. Each record is a snapshot of engagement on a post at a point in time. |
| 118 | + |
| 119 | +Multiple entries per post across several timestamps (hour 1, 2, 3) show engagement growing over time, feeding the `TrendingPosts` materialized view and enabling time-series drill-down queries. |
| 120 | + |
| 121 | +## Queries — Polyglot Strategy |
| 122 | + |
| 123 | +The queries are organized into five labeled sections, each using SQL or OpenCypher depending on what that language does best. |
| 124 | + |
| 125 | +### SQL Queries (materialized views, time-series, aggregations) |
| 126 | + |
| 127 | +**1. Trending Content Dashboard** — reads from the periodic materialized view |
| 128 | +```sql |
| 129 | +SELECT * FROM TrendingPosts ORDER BY score DESC LIMIT 10 |
| 130 | +``` |
| 131 | + |
| 132 | +**2. Engagement Time-Series** — drill into a post's engagement over time |
| 133 | +```sql |
| 134 | +SELECT timestamp, likes, shares, comments |
| 135 | +FROM EngagementMetric WHERE postRid = '<rid>' ORDER BY timestamp |
| 136 | +``` |
| 137 | + |
| 138 | +**3. Influence Leaderboard** — reads from the manual-refresh view |
| 139 | +```sql |
| 140 | +SELECT * FROM InfluenceScores ORDER BY totalEngagement DESC LIMIT 5 |
| 141 | +``` |
| 142 | + |
| 143 | +### OpenCypher Queries (graph traversals) |
| 144 | + |
| 145 | +**4. Viral Spread Chain** — how a post propagated through shares |
| 146 | +```cypher |
| 147 | +MATCH (author:User)-[:CREATED]->(p:Post)<-[:SHARED]-(sharer:User)<-[:FOLLOWS]-(audience:User) |
| 148 | +WHERE id(p) = '<rid>' |
| 149 | +RETURN author.name, sharer.name, collect(audience.name) AS reachedAudience |
| 150 | +``` |
| 151 | + |
| 152 | +**5. Community Overlap** — users in the same group who follow each other |
| 153 | +```cypher |
| 154 | +MATCH (a:User)-[:MEMBER_OF]->(g:Group)<-[:MEMBER_OF]-(b:User) |
| 155 | +WHERE (a)-[:FOLLOWS]->(b) |
| 156 | +RETURN g.name, a.name, b.name |
| 157 | +``` |
| 158 | + |
| 159 | +## Sample Data (`sql/02-data.sql`) |
| 160 | + |
| 161 | +### Volumes |
| 162 | + |
| 163 | +| Type | Count | Notes | |
| 164 | +|------|-------|-------| |
| 165 | +| User | 8 | Mix of high-influence and casual users | |
| 166 | +| Post | 12 | Spread across users, 2-3 categories | |
| 167 | +| Topic | 4 | Tech, Music, Sports, Travel | |
| 168 | +| Group | 3 | Developers, Photographers, Gamers | |
| 169 | +| FOLLOWS | ~20 | Asymmetric — some users have many followers | |
| 170 | +| CREATED | 12 | One per post | |
| 171 | +| LIKED | ~25 | Skewed — some posts get many likes | |
| 172 | +| SHARED | ~10 | Concentrated on "viral" posts | |
| 173 | +| TAGGED | ~15 | Posts tagged with 1-2 topics | |
| 174 | +| MEMBER_OF | ~12 | Users belong to 1-2 groups | |
| 175 | +| EngagementMetric | ~36 | 3 time-series snapshots per post | |
| 176 | + |
| 177 | +### Data Story |
| 178 | + |
| 179 | +- **Alice** and **Bob** are high-influence users with many followers and popular posts |
| 180 | +- Alice's post about "AI Trends" goes viral — many shares, growing engagement over time |
| 181 | +- A cluster of users in the Developers group follow each other, creating community overlap |
| 182 | +- Engagement metrics show clear trends: some posts peak early, others grow steadily |
| 183 | + |
| 184 | +## Java Program |
| 185 | + |
| 186 | +`SocialNetworkAnalytics.java` follows the existing `tryRun()`/`printHeader()` pattern and talks to the server over the HTTP API (`arcadedb-network`). It has five query sections matching the five shell query sections, using `query "sql"` for the materialized view and time-series queries and `query "opencypher"` for the graph traversals. |
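For orientation, the sketch below shows what a single query request looks like at the HTTP level. It is dependency-free and uses the JDK's `java.net.http` classes rather than the `arcadedb-network` client the program is designed around. The class name, the database name `SocialNetwork`, and the helper methods are assumptions made for illustration. The root password is the one set in the Docker Compose section.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class QueryRequestSketch {
  // Minimal JSON string escaping: handles backslashes and double quotes only.
  static String escape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  // Request body: the query language ("sql" or "opencypher") plus the command text.
  static String body(String language, String command) {
    return "{\"language\":\"" + escape(language) + "\",\"command\":\"" + escape(command) + "\"}";
  }

  // Builds (but does not send) a POST to the server's query endpoint with Basic auth.
  static HttpRequest request(String baseUrl, String database, String user,
                             String password, String language, String command) {
    String credentials = Base64.getEncoder()
        .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
    return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/query/" + database))
        .header("Content-Type", "application/json")
        .header("Authorization", "Basic " + credentials)
        .POST(HttpRequest.BodyPublishers.ofString(body(language, command)))
        .build();
  }

  public static void main(String[] args) {
    // "SocialNetwork" is a hypothetical database name.
    HttpRequest req = request("http://localhost:2480", "SocialNetwork", "root", "arcadedb",
        "sql", "SELECT * FROM TrendingPosts ORDER BY score DESC LIMIT 10");
    System.out.println(req.method() + " " + req.uri());
  }
}
```

Sending the request with `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())` would return the result as JSON. A real program would also use a proper JSON library instead of the hand-rolled escaping shown here.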
| 187 | + |
| 188 | +## CI Workflow |
| 189 | + |
| 190 | +`.github/workflows/social-network-analytics.yml` follows the standard matrix pattern: |
| 191 | + |
| 192 | +```yaml |
| 193 | +matrix: |
| 194 | + runner: [curl, java] |
| 195 | +``` |
| 196 | + |
| 197 | +Steps: checkout, set up Java 21 (gated on the `java` runner), cache `~/.m2` (gated on the `java` runner), `docker compose up`, `setup.sh`, run the queries (`curl` runner) or build and run the fat JAR (`java` runner), `docker compose down` (always). |
| 198 | + |
| 199 | +## Success Criteria |
| 200 | + |
| 201 | +1. All three materialized view refresh modes demonstrated and working |
| 202 | +2. Materialized views return correct pre-computed data when queried |
| 203 | +3. Time-series engagement data feeds into the TrendingPosts view |
| 204 | +4. SQL queries read from materialized views; OpenCypher queries traverse the graph |
| 205 | +5. Both curl and Java runners pass in CI |