docs/logical-operators/ExposesMetadataColumns.md (4 additions & 3 deletions)
@@ -4,11 +4,11 @@ title: ExposesMetadataColumns
# ExposesMetadataColumns Logical Operators
-`ExposesMetadataColumns` is an [extension](#contract) of the [LogicalPlan](LogicalPlan.md) abstraction for [logical operators](#implementations) that can [withMetadataColumns](#withMetadataColumns).
+`ExposesMetadataColumns` is an [extension](#contract) of the [LogicalPlan](LogicalPlan.md) abstraction for [logical operators](#implementations) that can [add extra metadata columns to output columns](#withMetadataColumns).
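For orientation, a minimal sketch of what this contract amounts to (paraphrased from the description above; not the exact Spark source):

```scala
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

// Paraphrased sketch: a logical operator that can rebuild itself with extra
// metadata columns (e.g. _metadata) appended to its output columns.
trait ExposesMetadataColumns extends LogicalPlan {
  def withMetadataColumns(): LogicalPlan
}
```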
docs/logical-operators/LogicalPlan.md (1 addition & 1 deletion)
@@ -208,7 +208,7 @@ resolved: Boolean
??? note "Lazy Value"
`resolved` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
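As a plain-Scala illustration of that guarantee (a made-up example, not Spark code; run it in the Scala REPL):

```scala
class Resolver {
  // The initializer body runs once, on first access; later reads reuse the cached value.
  lazy val resolved: Boolean = {
    println("resolving...") // printed only on the first access
    true
  }
}

val r = new Resolver
r.resolved // prints "resolving...", then returns true
r.resolved // returns the cached true; the initializer is not re-run
```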
docs/logical-operators/LogicalRelation.md

-`LogicalRelation` is created using [apply](#apply) factory.
+`LogicalRelation` is created using [apply](#apply) utility.
-## <span id="apply"> apply Utility
+## Create LogicalRelation { #apply }
```scala
apply(
@@ -41,44 +43,51 @@ val baseRelation: BaseRelation = ???
val data = spark.baseRelationToDataFrame(baseRelation)
```
+---
+
`apply` is used when:
-* `SparkSession` is requested for a [DataFrame for a BaseRelation](../SparkSession.md#baseRelationToDataFrame)
* [CreateTempViewUsing](CreateTempViewUsing.md) command is executed
* `FallBackFileSourceV2` logical resolution rule is executed
-* [ResolveSQLOnFile](../logical-analysis-rules/ResolveSQLOnFile.md) and [FindDataSourceTable](../logical-analysis-rules/FindDataSourceTable.md) logical evaluation rules are executed
+* `FileStreamSource` ([Spark Structured Streaming]({{ book.structured_streaming }}/datasources/file/FileStreamSource/#getBatch)) is requested to `getBatch`
* `HiveMetastoreCatalog` is requested to [convert a HiveTableRelation](../hive/HiveMetastoreCatalog.md#convertToLogicalRelation)
-* `FileStreamSource` ([Spark Structured Streaming]({{ book.structured_streaming }}/connectors/file/FileStreamSource/)) is requested to `getBatch`
+* [ResolveDataSource](../logical-analysis-rules/ResolveDataSource.md) logical analysis rule is executed (to [resolve a V1BatchSource](../logical-analysis-rules/ResolveDataSource.md#loadV1BatchSource))
+* [ResolveSQLOnFile](../logical-analysis-rules/ResolveSQLOnFile.md) and [FindDataSourceTable](../logical-analysis-rules/FindDataSourceTable.md) logical evaluation rules are executed
+* `SparkSession` is requested for a [DataFrame for a BaseRelation](../SparkSession.md#baseRelationToDataFrame)
-## <span id="refresh"> refresh
+## Refresh (Files of HadoopFsRelation) { #refresh }
-```scala
-refresh():Unit
-```
+??? note "LogicalPlan"
+
+    ```scala
+    refresh(): Unit
+    ```
-`refresh` is part of the [LogicalPlan](LogicalPlan.md#refresh) abstraction.
+    `refresh` is part of the [LogicalPlan](LogicalPlan.md#refresh) abstraction.
`refresh` requests the [FileIndex](../files/HadoopFsRelation.md#location) (of the [HadoopFsRelation](#relation)) to refresh.
-!!! note
+??? note "HadoopFsRelation Supported Only"
`refresh` does the work for [HadoopFsRelation](../files/HadoopFsRelation.md) relations only.
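A rough sketch of `refresh`, assuming `relation` is this operator's [BaseRelation](#relation) (paraphrased; not the exact Spark source):

```scala
import org.apache.spark.sql.execution.datasources.HadoopFsRelation

// refresh only does real work for file-based relations: it asks the
// HadoopFsRelation's FileIndex (location) to re-list the underlying files.
def refresh(): Unit = relation match {
  case fs: HadoopFsRelation => fs.location.refresh()
  case _                    => // no-op for other BaseRelations
}
```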
-## <span id="simpleString"> Simple Text Representation
+## Simple Text Representation { #simpleString }
-```scala
-simpleString(
-maxFields: Int):String
-```
+??? note "QueryPlan"
-`simpleString` is part of the [QueryPlan](../catalyst/QueryPlan.md#simpleString) abstraction.
+    ```scala
+    simpleString(
+      maxFields: Int): String
+    ```
+
+    `simpleString` is part of the [QueryPlan](../catalyst/QueryPlan.md#simpleString) abstraction.
`simpleString` is made up of the [output schema](#output) (truncated to `maxFields`) and the [relation](#relation):
-`computeStats` is part of the [LeafNode](LeafNode.md#computeStats) abstraction.
+```scala
+computeStats(): Statistics
+```
----
+`computeStats` is part of the [LeafNode](LeafNode.md#computeStats) abstraction.
`computeStats` takes the optional [CatalogTable](#catalogTable).
If available, `computeStats` requests the `CatalogTable` for the [CatalogStatistics](../CatalogTable.md#stats) that, if available, is requested to [toPlanStats](#toPlanStats) (with the `planStatsEnabled` flag enabled when either [spark.sql.cbo.enabled](../SQLConf.md#cboEnabled) or [spark.sql.cbo.planStats.enabled](../SQLConf.md#planStatsEnabled) is enabled).
Otherwise, `computeStats` creates a [Statistics](../cost-based-optimization/Statistics.md) with the `sizeInBytes` only to be the [sizeInBytes](../BaseRelation.md#sizeInBytes) of the [BaseRelation](#relation).
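A condensed sketch of that decision (paraphrased; `catalogTable`, `output`, `conf` and `relation` are assumed to be this operator's fields, per the links above):

```scala
import org.apache.spark.sql.catalyst.plans.logical.Statistics

// Prefer catalog statistics when they exist; otherwise fall back to the
// BaseRelation's own size estimate.
def computeStats(): Statistics =
  catalogTable
    .flatMap(_.stats) // CatalogStatistics, if any
    .map(_.toPlanStats(output, conf.cboEnabled || conf.planStatsEnabled))
    .getOrElse(Statistics(sizeInBytes = relation.sizeInBytes))
```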
+## Metadata Output Columns { #metadataOutput }
+
+??? note "LogicalPlan"
+
+    ```scala
+    metadataOutput: Seq[AttributeReference]
+    ```
+
+    `metadataOutput` is part of the [LogicalPlan](LogicalPlan.md#metadataOutput) abstraction.
+
+`metadataOutput` checks whether this [BaseRelation](#relation) is a [HadoopFsRelation](../files/HadoopFsRelation.md).
+If so, `metadataOutput` requests the [FileFormat](../files/HadoopFsRelation.md#fileFormat) (of this [BaseRelation](#relation)) for [metadata columns](../files/FileFormat.md#createFileMetadataCol).
+
+Otherwise, `metadataOutput` returns no metadata columns (`Nil`).
+
+??? note "Lazy Value"
+    `metadataOutput` is a Scala **lazy value** to guarantee that the code to initialize it is executed once only (when accessed for the first time) and the computed value never changes afterwards.
+
+    Learn more in the [Scala Language Specification]({{ scala.spec }}/05-classes-and-objects.html#lazy).
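A rough sketch of the `metadataOutput` logic above, assuming `relation` is this operator's [BaseRelation](#relation) (paraphrased; not the exact Spark source):

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.execution.datasources.HadoopFsRelation

// Only file-based relations expose the file metadata column (e.g. _metadata);
// all other relations report no metadata columns.
lazy val metadataOutput: Seq[AttributeReference] = relation match {
  case fsRelation: HadoopFsRelation =>
    Seq(fsRelation.fileFormat.createFileMetadataCol())
  case _ => Nil
}
```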
+
+## Add Metadata Columns to Output Columns { #withMetadataColumns }
+
+??? note "ExposesMetadataColumns"
+
+    ```scala
+    withMetadataColumns(): LogicalRelation
+    ```
+
+    `withMetadataColumns` is part of the [ExposesMetadataColumns](ExposesMetadataColumns.md#withMetadataColumns) abstraction.
+
+`withMetadataColumns` determines whether there are any extra [metadata columns](#metadataOutput) to be added to this operator's [output columns](#output).
+
+If so, `withMetadataColumns` creates a new `LogicalRelation` with the extra [metadata columns](#metadataOutput) added. Otherwise, `withMetadataColumns` does nothing.
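A simplified sketch of `withMetadataColumns` (paraphrased; the real implementation also has to skip metadata columns that are already present in the output):

```scala
// Hypothetical sketch: append the metadata columns only when there are new ones.
def withMetadataColumns(): LogicalRelation = {
  val newMetadata = metadataOutput.filterNot(output.contains)
  if (newMetadata.nonEmpty) copy(output = output ++ newMetadata)
  else this
}
```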
## Demo
The following are two logically-equivalent batch queries described using different Spark APIs: Scala and SQL.
@@ -114,23 +157,23 @@ val path = "../datasets/people.csv"