Skip to content

fix: count(), tally(), and summarise() on keyed tables and zoomed dm now set counted/grouped columns as primary key#2389

Draft
Copilot wants to merge 6 commits intomainfrom
copilot/fix-count-primary-keys
Draft

fix: count(), tally(), and summarise() on keyed tables and zoomed dm now set counted/grouped columns as primary key#2389
Copilot wants to merge 6 commits intomainfrom
copilot/fix-count-primary-keys

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Feb 21, 2026

count() on dm_keyed_tbl dropped all primary keys instead of setting the counted columns as the new PK. Similarly, tally() and summarise() ignored group vars. The same issue affected zoomed dm objects via dm_update_zoomed() and dm_insert_zoomed().

dm <- dm_nycflights13()
flights <- dm_get_tables(dm, keyed = TRUE)$flights

# Before: no PK (Keys: — | 0 | 0)
# After:  origin as PK (Keys: `origin` | 0 | 0)
flights %>% count(origin)
  • count.dm_keyed_tbl(): Determine counting columns via group_by() + group_vars() on the ... args, pass as new pk — mirrors the approach in summarise.dm_keyed_tbl()
  • tally.dm_keyed_tbl(): Use group_vars(x) as new pk, consistent with summarise
  • count.dm_zoomed(): Set counted columns as new PK via a new_pk attribute, picked up by update_zoomed_pk() during dm_update_zoomed() / dm_insert_zoomed()
  • tally.dm_zoomed(): Set group vars as new PK via the same mechanism
  • summarise.dm_zoomed(): Set group vars as new PK via the same mechanism (previously only preserved original PK if group columns matched, but did not set non-PK group columns as new PK)
  • replace_zoomed_tbl(): Transfer new_pk attribute from zoomed table to col_tracker (keeping zoomed table clean)
  • update_zoomed_pk(): Check for new_pk attribute on tracked_cols before falling back to existing PK-preservation logic
  • Added tests covering single/multiple columns, existing PK columns, and no-column cases for both keyed tables and zoomed dm across count(), tally(), and summarise()
  • Updated existing snapshots to reflect the corrected summarise() PK behavior
Original prompt

This section details on the original issue you should resolve

<issue_title>count() doesn't keep primary keys</issue_title>
<issue_description>I tried implementing dplyr_reconstruct(), with no success so far.

library(conflicted)
library(dm)
library(tidyverse)

# pkgload::load_all()
library(dm)
library(dplyr)

dm <- dm_nycflights13()

tbls <-
  dm %>%
  dm_get_tables(keyed = TRUE)

flights <- tbls$flights
airports <- tbls$airports

by_origin <-
  flights %>%
  count(origin)

by_origin
#> # A tibble: 3 × 2
#> # Keys:     — | 0 | 4
#>   origin     n
#>   <chr>  <int>
#> 1 EWR      641
#> 2 JFK      602
#> 3 LGA      518

# PK for by_origin is missing
dm(flights, airports, by_origin) %>%
  dm_get_all_pks()
#> # A tibble: 1 × 2
#>   table    pk_col
#>   <chr>    <keys>
#> 1 airports faa

Created on 2022-07-21 by the reprex package (v2.0.1)</issue_description>

Comments on the Issue (you are @copilot in this section)

@krlmlr The PR above can be extended.

More action items:

  • rename() can't currently work, need rename.keyed_tbl()
  • check all other verbs
@TSchiefer FWIW, `count()` seems to behave properly when using a `dm_zoomed`:
suppressPackageStartupMessages({
  library(dm)
  library(dplyr)
})

dm_nycflights13() %>% dm_zoom_to(airlines) %>% count(carrier) %>% dm_update_zoomed() %>% dm_get_all_pks()
#> # A tibble: 4 × 3
#>   table    pk_col            autoincrement
#>   <chr>    <keys>            <lgl>        
#> 1 airlines carrier           FALSE        
#> 2 airports faa               FALSE        
#> 3 planes   tailnum           FALSE        
#> 4 weather  origin, time_hour FALSE
dm_nycflights13() %>% dm_zoom_to(airlines) %>% count(name) %>% dm_update_zoomed() %>% dm_get_all_pks()
#> # A tibble: 3 × 3
#>   table    pk_col            autoincrement
#>   <chr>    <keys>            <lgl>        
#> 1 airports faa               FALSE        
#> 2 planes   tailnum           FALSE        
#> 3 weather  origin, time_hour FALSE

Created on 2023-08-21 with reprex v2.0.2</comment_new>
<comment_new>@TSchiefer
Would you expect the columns used for count() to constitute a new PK? I would think that in your example no PK was actually dropped, but a potential one wasn't set.</comment_new>
<comment_new>@krlmlr
This is about dm_get_tables(keyed = TRUE) .</comment_new>
<comment_new>@TSchiefer
Yes, but I am not certain which behavior you'd like. If you use count() on a PK in a dm_keyed_tbl it will be kept as a PK:

library(conflicted)
library(dm)
library(tidyverse)

# pkgload::load_all()
library(dm)
library(dplyr)

dm <- dm_nycflights13()

tbls <-
  dm %>%
  dm_get_tables(keyed = TRUE)

flights <- tbls$flights
airports <- tbls$airports

by_faa <-
  airports %>%
  count(faa)

by_faa
#> # A tibble: 86 × 2
#> # Keys:     `faa` | 1 | 0
#>    faa       n
#>    <chr> <int>
#>  1 ALB       1
#>  2 ATL       1
#>  3 AUS       1
#>  4 BDL       1
#>  5 BHM       1
#>  6 BNA       1
#>  7 BOS       1
#>  8 BTV       1
#>  9 BUF       1
#> 10 BUR       1
#> # ℹ 76 more rows

# PK for by_origin is missing
dm(flights, airports, by_faa) %>%
  dm_get_all_pks()
#> # A tibble: 2 × 3
#>   table    pk_col autoincrement
#>   <chr>    <keys> <lgl>        
#> 1 airports faa    FALSE        
#> 2 by_faa   faa    FALSE

Created on 2023-08-21 with reprex v2.0.2

This seems consistent to me with the behavior for dm_zoomed. We can also discuss some other time.</comment_new>
<comment_new>@krlmlr
In the original post, I expect by_origin to have origin as a new primary key.</comment_new>


💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.

Copilot AI changed the title [WIP] Fix count function to retain primary keys fix: count() and tally() on keyed tables now set counted/grouped columns as primary key Feb 21, 2026
Copilot AI requested a review from krlmlr February 21, 2026 17:16
@krlmlr
Copy link
Copy Markdown
Collaborator

krlmlr commented Feb 21, 2026

@copilot: Also fix for zooming, with tests.

@krlmlr krlmlr marked this pull request as ready for review February 21, 2026 17:25
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Feb 21, 2026

@copilot: Also fix for zooming, with tests.

Fixed in b138227. Both count.dm_zoomed() and tally.dm_zoomed() now set the counted/grouped columns as the new PK when dm_update_zoomed() (or dm_insert_zoomed()) is called. The implementation stores the new PK info as an attribute on the col_tracker, which update_zoomed_pk() checks before falling back to the existing PK-preservation logic. Added tests for both zoomed count and tally PK behavior.

Copilot AI changed the title fix: count() and tally() on keyed tables now set counted/grouped columns as primary key fix: count() and tally() on keyed tables and zoomed dm now set counted/grouped columns as primary key Feb 21, 2026
@krlmlr
Copy link
Copy Markdown
Collaborator

krlmlr commented Feb 21, 2026

@copilot: Add tests (and implement if needed) for summarize() that likely suffers from the same problem.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Feb 21, 2026

@copilot: Add tests (and implement if needed) for summarize() that likely suffers from the same problem.

Fixed in b416792. summarise.dm_zoomed() had the same issue — when grouping by a non-PK column and summarising, the group vars were not set as the new PK. Added the same new_pk attribute mechanism, plus tests covering group-by-non-PK, group-by-PK, and no-groups cases. Updated existing snapshots that reflected the old (incorrect) behavior.

Copilot AI changed the title fix: count() and tally() on keyed tables and zoomed dm now set counted/grouped columns as primary key fix: count(), tally(), and summarise() on keyed tables and zoomed dm now set counted/grouped columns as primary key Feb 21, 2026
@krlmlr
Copy link
Copy Markdown
Collaborator

krlmlr commented Feb 21, 2026

This should become easier when zooming is implemented with keyed tables: #2391. Keeping open as reference and to review tests.

@krlmlr krlmlr marked this pull request as draft February 21, 2026 18:02
@krlmlr krlmlr removed their assignment Feb 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

count() doesn't keep primary keys

2 participants