Commit 7a0eef0

Merge pull request #7297 from IQSS/develop
v5.1
2 parents 993d0a3 + a23548d commit 7a0eef0

65 files changed

Lines changed: 2864 additions & 1582 deletions


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ oauth-credentials.md
 
 /src/main/webapp/oauth2/newAccount.html
 scripts/api/setup-all.sh*
+scripts/api/setup-all.*.log
 
 # ctags generated tag file
 tags

conf/solr/7.7.2/schema_dv_mdb_copies.xml

Lines changed: 5 additions & 1 deletion
@@ -133,9 +133,13 @@
 <copyField source="studyAssayOtherMeasurmentType" dest="_text_" maxChars="3000"/>
 <copyField source="studyAssayOtherOrganism" dest="_text_" maxChars="3000"/>
 <copyField source="studyAssayPlatform" dest="_text_" maxChars="3000"/>
+<copyField source="studyAssayOtherPlatform" dest="_text_" maxChars="3000"/>
 <copyField source="studyAssayTechnologyType" dest="_text_" maxChars="3000"/>
+<copyField source="studyAssayOtherTechnologyType" dest="_text_" maxChars="3000"/>
 <copyField source="studyDesignType" dest="_text_" maxChars="3000"/>
+<copyField source="studyOtherDesignType" dest="_text_" maxChars="3000"/>
 <copyField source="studyFactorType" dest="_text_" maxChars="3000"/>
+<copyField source="studyOtherFactorType" dest="_text_" maxChars="3000"/>
 <copyField source="subject" dest="_text_" maxChars="3000"/>
 <copyField source="subtitle" dest="_text_" maxChars="3000"/>
 <copyField source="targetSampleActualSize" dest="_text_" maxChars="3000"/>
@@ -154,4 +158,4 @@
 <copyField source="universe" dest="_text_" maxChars="3000"/>
 <copyField source="weighting" dest="_text_" maxChars="3000"/>
 <copyField source="westLongitude" dest="_text_" maxChars="3000"/>
-</schema>
+</schema>

conf/solr/7.7.2/schema_dv_mdb_fields.xml

Lines changed: 5 additions & 1 deletion
@@ -133,9 +133,13 @@
 <field name="studyAssayOtherMeasurmentType" type="text_en" multiValued="true" stored="true" indexed="true"/>
 <field name="studyAssayOtherOrganism" type="text_en" multiValued="true" stored="true" indexed="true"/>
 <field name="studyAssayPlatform" type="text_en" multiValued="true" stored="true" indexed="true"/>
+<field name="studyAssayOtherPlatform" type="text_en" multiValued="true" stored="true" indexed="true"/>
 <field name="studyAssayTechnologyType" type="text_en" multiValued="true" stored="true" indexed="true"/>
+<field name="studyAssayOtherTechnologyType" type="text_en" multiValued="true" stored="true" indexed="true"/>
 <field name="studyDesignType" type="text_en" multiValued="true" stored="true" indexed="true"/>
+<field name="studyOtherDesignType" type="text_en" multiValued="true" stored="true" indexed="true"/>
 <field name="studyFactorType" type="text_en" multiValued="true" stored="true" indexed="true"/>
+<field name="studyOtherFactorType" type="text_en" multiValued="true" stored="true" indexed="true"/>
 <field name="subject" type="text_en" multiValued="true" stored="true" indexed="true"/>
 <field name="subtitle" type="text_en" multiValued="false" stored="true" indexed="true"/>
 <field name="targetSampleActualSize" type="text_en" multiValued="false" stored="true" indexed="true"/>
@@ -154,4 +158,4 @@
 <field name="universe" type="text_en" multiValued="true" stored="true" indexed="true"/>
 <field name="weighting" type="text_en" multiValued="false" stored="true" indexed="true"/>
 <field name="westLongitude" type="text_en" multiValued="true" stored="true" indexed="true"/>
-</fields>
+</fields>

doc/release-notes/5.0-release-notes.md

Lines changed: 5 additions & 3 deletions
@@ -302,13 +302,15 @@ Add the below JVM options beneath the -Ddataverse settings:
 
 For production environments:
 
-`/usr/local/payara5/bin/asadmin create-jvm-options "\-Ddoi.dataciterestapiurlstring=https://api.datacite.org"`
+`/usr/local/payara5/bin/asadmin create-jvm-options "\-Ddoi.dataciterestapiurlstring=https\://api.datacite.org"`
 
 For test environments:
 
-`/usr/local/payara5/bin/asadmin create-jvm-options "\-Ddoi.dataciterestapiurlstring=https://api.test.datacite.org"`
+`/usr/local/payara5/bin/asadmin create-jvm-options "\-Ddoi.dataciterestapiurlstring=https\://api.test.datacite.org"`
 
-The JVM option `doi.mdcbaseurlstring` should be deleted if it was previously set.
+The JVM option `doi.mdcbaseurlstring` should be deleted if it was previously set, for example:
+
+`/usr/local/payara5/bin/asadmin delete-jvm-options "\-Ddoi.mdcbaseurlstring=https\://api.test.datacite.org"`
 
 4. (Recommended for installations using DataCite) Pre-register DOIs

Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@

# Dataverse 5.1

This release brings new features, enhancements, and bug fixes to Dataverse. Thank you to all of the community members who contributed code, suggestions, bug reports, and other assistance across the project.

## Release Highlights

### Large File Upload for Installations Using AWS S3

The added support for multipart upload through the API and UI (Issue #6763) will allow files larger than 5 GB to be uploaded to Dataverse when an installation is running on AWS S3. Previously, only non-AWS S3 storage configurations would allow uploads larger than 5 GB.

### Dataset-Specific Stores

In previous releases, configuration options were added that allow each dataverse to have a specific store enabled. This release adds even more granularity, with the ability to set a dataset-level store.
## Major Use Cases

Newly-supported use cases in this release include:

- Users can now upload files larger than 5 GB on installations running AWS S3 (Issue #6763, PR #6995)
- Administrators will now be able to specify a store at the dataset level in addition to the Dataverse level (Issue #6872, PR #7272)
- Users will have their dataset's directory structure retained when uploading a dataset with shapefiles (Issue #6873, PR #7279)
- Users will now be able to download zip files through the experimental Zipper service when the set of downloaded files has duplicate names (Issue [#80](https://github.com/IQSS/dataverse.harvard.edu/issues/80), PR #7276)
- Users will now be able to download zip files with the proper file structure through the experimental Zipper service (Issue #7255, PR #7258)
- Administrators will be able to use new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause stale search results to not load (Issue #4225, PR #7211)
## Notes for Dataverse Installation Administrators

### New API for setting a Dataset-level Store

- This release adds a new API for setting a dataset-specific store. Learn more in the Managing Dataverse and Datasets section of the [Admin Guide](http://guides.dataverse.org/en/5.1/admin/dataverses-datasets.html).
### Multipart Upload Storage Monitoring, Recommended Use for Multipart Upload

Charges may be incurred for storage reserved for multipart uploads that are not completed or cancelled. Administrators may want to do periodic manual or automated checks for open multipart uploads. Learn more in the Big Data Support section of the [Developers Guide](http://guides.dataverse.org/en/5.1/developers/big-data-support.html).

While multipart uploads can support much larger files, and can have advantages in terms of robust transfer and speed, they are more complex than single-part direct uploads. Administrators should consider taking advantage of the options to limit use of multipart uploads to specific users by using multiple stores and configuring access to stores with high file size limits to specific Dataverses (added in 4.20) or Datasets (added in this release).
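One way to do such a check is with the AWS CLI (a sketch, not part of the release itself: the bucket name is a placeholder, and the key and upload id passed to the abort call come from the listing output):

```shell
# List multipart uploads that are still open in the bucket backing the S3 store
aws s3api list-multipart-uploads --bucket my-dataverse-bucket

# Abort a stale upload (key and upload id are taken from the listing above)
aws s3api abort-multipart-upload --bucket my-dataverse-bucket \
  --key path/to/stale/file --upload-id EXAMPLE_UPLOAD_ID
```

Running the listing on a schedule and alerting on uploads older than a few days is a simple way to catch abandoned uploads before they accrue storage charges.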
### New APIs for keeping Solr records in sync

This release adds new APIs to keep the Solr index and the DB in sync, allowing easier resolution of an issue that would occasionally cause search results to not load. Learn more in the Solr section of the [Admin Guide](http://guides.dataverse.org/en/5.1/admin/solr-search-index.html).
### Documentation for Purging the Ingest Queue

At times, it may be necessary to cancel long-running Ingest jobs in the interest of system stability. The Troubleshooting section of the [Admin Guide](http://guides.dataverse.org/en/5.1/admin/) now has specific steps.
### Biomedical Metadata Block Updated

The Life Science Metadata block (biomedical.tsv) was updated. "Other Design Type", "Other Factor Type", "Other Technology Type", and "Other Technology Platform" boxes were added. See the "Additional Upgrade Steps" below if you use this block in your installation.
## Notes for Tool Developers and Integrators

### Spaces in File Names

Dataverse installations using S3 storage will no longer replace spaces in the file names of downloaded files with the + character. If your tool or integration has any special handling around this, you may need to make further adjustments to maintain backwards compatibility while also supporting Dataverse installations on 5.1+.
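The old substitution itself is trivial to reproduce, so a tool that was undoing it can keep doing so for pre-5.1 servers only (a sketch; the variable names are illustrative, and note that a literal + in a file name is indistinguishable from a substituted space, which is presumably why the behavior was dropped):

```shell
# Pre-5.1 behavior: S3 downloads arrived with '+' where the file name had spaces.
# Tools that compensated by reversing the substitution should gate that logic
# on the server version; against 5.1+ the name already contains real spaces.
legacy_name="my+data.csv"
restored=$(printf '%s' "$legacy_name" | tr '+' ' ')
echo "$restored"   # my data.csv
```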
## Complete List of Changes

For the complete list of code changes in this release, see the [5.1 Milestone](https://github.com/IQSS/dataverse/milestone/90?closed=1) in GitHub.

For help with upgrading, installing, or general questions please post to the [Dataverse Google Group](https://groups.google.com/forum/#!forum/dataverse-community) or email [email protected].
## Installation

If this is a new installation, please see our [Installation Guide](http://guides.dataverse.org/en/5.1/installation/).
## Upgrade Instructions

0. These instructions assume that you've already successfully upgraded from Dataverse 4.x to Dataverse 5 following the instructions in the [Dataverse 5 Release Notes](https://github.com/IQSS/dataverse/releases/tag/v5.0).

1. Undeploy the previous version.

   - `<payara install path>/payara/bin/asadmin list-applications`
   - `<payara install path>/payara/bin/asadmin undeploy dataverse`

2. Stop payara, remove the generated directory, and start payara again.

   - `service payara stop`
   - remove the generated directory: `rm -rf <payara install path>/payara/payara/domains/domain1/generated`
   - `service payara start`

3. Deploy this version.

   - `<payara install path>/payara/bin/asadmin deploy <path>dataverse-5.1.war`

4. Restart payara.
### Additional Upgrade Steps

1. Update the Biomedical Metadata Block (if used), reload Solr, and run ReExportAll:

   - `wget https://github.com/IQSS/dataverse/releases/download/5.1/biomedical.tsv`
   - `curl http://localhost:8080/api/admin/datasetfield/load -X POST --data-binary @biomedical.tsv -H "Content-type: text/tab-separated-values"`
   - copy schema_dv_mdb_fields.xml and schema_dv_mdb_copies.xml to the solr server, for example into the /usr/local/solr/solr-7.7.2/server/solr/collection1/conf/ directory
   - reload Solr, for example: http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1
   - run ReExportAll to update JSON exports: <http://guides.dataverse.org/en/5.1/admin/metadataexport.html?highlight=export#batch-exports-through-the-api>
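Taken together, these steps amount to a short script (a sketch under assumptions: the Solr paths and core name are the examples given above, and the reExportAll endpoint is the one described in the linked metadata-export guide; adjust for your installation):

```shell
#!/bin/sh
set -e

# 1. Fetch and load the updated biomedical metadata block
wget https://github.com/IQSS/dataverse/releases/download/5.1/biomedical.tsv
curl http://localhost:8080/api/admin/datasetfield/load -X POST \
  --data-binary @biomedical.tsv -H "Content-type: text/tab-separated-values"

# 2. Install the updated Solr schema files and reload the core
cp schema_dv_mdb_fields.xml schema_dv_mdb_copies.xml \
  /usr/local/solr/solr-7.7.2/server/solr/collection1/conf/
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1"

# 3. Re-export all dataset metadata (see the linked guide for details)
curl http://localhost:8080/api/admin/metadata/reExportAll
```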

doc/sphinx-guides/source/admin/dataverses-datasets.rst

Lines changed: 22 additions & 0 deletions
@@ -59,6 +59,8 @@ The available drivers can be listed with::
 
     curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/dataverse/storageDrivers
 
+(Individual datasets can be configured to use specific file stores as well. See the "Datasets" section below.)
+
 Datasets
 --------
@@ -130,3 +132,23 @@ Diagnose Constraint Violations Issues in Datasets
 
 To identify invalid data values in specific datasets (if, for example, an attempt to edit a dataset results in a ConstraintViolationException in the server log), or to check all the datasets in the Dataverse for constraint violations, see :ref:`Dataset Validation <dataset-validation-api>` in the :doc:`/api/native-api` section of the User Guide.
 
+Configure a Dataset to store all new files in a specific file store
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Configure a dataset to use a specific file store (this API can only be used by a superuser) ::
+
+    curl -H "X-Dataverse-key: $API_TOKEN" -X PUT -d $storageDriverLabel http://$SERVER/api/datasets/$dataset-id/storageDriver
+
+The current driver can be seen using::
+
+    curl http://$SERVER/api/datasets/$dataset-id/storageDriver
+
+It can be reset to the default store as follows (only a superuser can do this) ::
+
+    curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE http://$SERVER/api/datasets/$dataset-id/storageDriver
+
+The available drivers can be listed with::
+
+    curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/dataverse/storageDrivers
+
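A concrete invocation of the new endpoint might look like this (illustrative only: the server, dataset id, and store label are placeholders, not values from this commit):

```shell
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER=demo.dataverse.org

# Point dataset 2347 at the store labeled "LocalStack" (superuser only)
curl -H "X-Dataverse-key: $API_TOKEN" -X PUT -d LocalStack \
  "https://$SERVER/api/datasets/2347/storageDriver"

# Confirm which driver the dataset now uses
curl "https://$SERVER/api/datasets/2347/storageDriver"
```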
doc/sphinx-guides/source/admin/mail-groups.rst

Lines changed: 2 additions & 4 deletions
@@ -33,11 +33,9 @@ To list just that Mail Domain Group, you can include the alias in the curl comma
 Creating a Mail Domain Group
 ----------------------------
 
-Mail Domain Groups can be created with a simple JSON file:
+Mail Domain Groups can be created with a simple JSON file such as domainGroup1.json:
 
 .. code-block:: json
-    :caption: domainGroup1.json
-    :name: domainGroup1.json
 
     {
         "name": "Users from @example.org",
@@ -60,7 +58,7 @@ To load it into your Dataverse installation, either use a ``POST`` or ``PUT`` re
 Updating a Mail Domain Group
 ----------------------------
 
-Editing a group is done by replacing it. Grab your group definition like the :ref:`above example <domainGroup1.json>`,
+Editing a group is done by replacing it. Grab your group definition like the domainGroup1.json example above,
 change it as you like and ``PUT`` it into your installation:
 
 ``curl -X PUT -H 'Content-type: application/json' http://localhost:8080/api/admin/groups/domain/domainGroup1 --upload-file domainGroup1.json``

doc/sphinx-guides/source/admin/solr-search-index.rst

Lines changed: 13 additions & 1 deletion
@@ -14,6 +14,18 @@ There are two ways to perform a full reindex of the Dataverse search index. Star
 Clear and Reindex
 +++++++++++++++++
 
+
+Index and Database Consistency
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Get a list of all database objects that are missing in Solr, and Solr documents that are missing in the database:
+
+``curl http://localhost:8080/api/admin/index/status``
+
+Remove all Solr documents that are orphaned (i.e. not associated with objects in the database):
+
+``curl http://localhost:8080/api/admin/index/clear-orphans``
+
 Clearing Data from Solr
 ~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -81,4 +93,4 @@ If you suspect something isn't indexed properly in solr, you may bypass the Data
 
 ``curl "http://localhost:8983/solr/collection1/select?q=dsPersistentId:doi:10.15139/S3/HFV0AO"``
 
-to see the JSON you were hopefully expecting to see passed along to Dataverse.
+to see the JSON you were hopefully expecting to see passed along to Dataverse.

doc/sphinx-guides/source/admin/troubleshooting.rst

Lines changed: 20 additions & 0 deletions
@@ -43,6 +43,26 @@ A User Needs Their Account to Be Converted From Institutional (Shibboleth), ORCI
 
 See :ref:`converting-shibboleth-users-to-local` and :ref:`converting-oauth-users-to-local`.
 
+.. _troubleshooting-ingest:
+
+Ingest
+------
+
+Long-Running Ingest Jobs Have Exhausted System Resources
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Ingest is both CPU- and memory-intensive, and depending on your system resources and the size and format of tabular data files uploaded, may render Dataverse unresponsive or nearly inoperable. It is possible to cancel these jobs by purging the ingest queue.
+
+``/usr/local/payara5/mq/bin/imqcmd -u admin query dst -t q -n DataverseIngest`` will query the DataverseIngest destination. The password, unless you have changed it, matches the username.
+
+``/usr/local/payara5/mq/bin/imqcmd -u admin purge dst -t q -n DataverseIngest`` will purge the DataverseIngest queue and prompt for your confirmation.
+
+Finally, list destinations to verify that the purge was successful:
+
+``/usr/local/payara5/mq/bin/imqcmd -u admin list dst``
+
+If you are still running Glassfish, substitute glassfish4 for payara5 above. If you have installed Dataverse in some other location, adjust the above paths accordingly.
+
 .. _troubleshooting-payara:
 
 Payara

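For unattended use (e.g. from a maintenance script), imqcmd can read the admin password from a passfile and skip the confirmation prompt. This is a sketch based on standard Open Message Queue options (``-passfile`` and ``-f``), not on anything in this commit; verify the flags against your broker version:

```shell
# Store the broker admin password (default: admin) in a protected file
printf 'imq.imqcmd.password=admin\n' > /tmp/imq.pass
chmod 600 /tmp/imq.pass

# Purge the ingest queue without an interactive prompt (-f forces the action)
/usr/local/payara5/mq/bin/imqcmd -u admin -passfile /tmp/imq.pass -f \
  purge dst -t q -n DataverseIngest

# Verify the queue is empty
/usr/local/payara5/mq/bin/imqcmd -u admin -passfile /tmp/imq.pass list dst
```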
doc/sphinx-guides/source/api/native-api.rst

Lines changed: 5 additions & 0 deletions
@@ -1654,6 +1654,11 @@ The fully expanded example above (without environment variables) looks like this
 
 Calling the destroy endpoint is permanent and irreversible. It will remove the dataset and its datafiles, then re-index the parent dataverse in Solr. This endpoint requires the API token of a superuser.
 
+Configure a Dataset to Use a Specific File Store
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``/api/datasets/$dataset-id/storageDriver`` can be used to check, configure or reset the designated file store (storage driver) for a dataset. Please see the :doc:`/admin/dataverses-datasets` section of the guide for more information on this API.
+
 Files
 -----
