The openCost team hosted three very fruitful online events in March to discuss the metadata schema for capturing cost data for journal articles with the community. We would like to take this opportunity to thank all participants once again for their lively interest as well as the many comments and suggestions on our schema. The results of the three online events are summarized here.
- External data transfer and internal data processing
- Cooperation with Folio
- Cost data in the repository or library system
- JSON
- Transformative agreements and memberships
- Collective invoice for framework agreement with OA publishers
- DFG-Monitoring
- Partial costs/cost splitting
- Voucher element
- Second publication
- Terminology
- Bibliographic block "bibliographic_information"
- Metadata schema: field "institution"
- Metadata schema: field "oa_status"
- Metadata schema: field "amount_paid"
- Metadata schema: field "date_paid"
- Metadata schema: field "publisher"
External data transfer and internal data processing
During the online discussions, a wish was expressed to the openCost team to develop an extended form of the schema to cover local or internal components as well (e.g. local components from SAP such as differentiation between centralized and decentralized payments). The internal data storage is usually broader than what should or can be exchanged. Although the planned openCost schema is primarily intended for external data exchange, theoretically an internal format could also be derived from the openCost format if certain elements are added to it (see: Issue 36).
Cooperation with Folio
To what extent is the project in exchange with Folio? Are colleagues from the Folio group actively involved in the openCost discussions, and if so, who?
Folio will have a module in which open access publication fees can be included. The openCost scheme is to be implemented there, and this data can then be output via the OAI interface.
- Contact person: Björn Muschall (UB Leipzig)
- Contact person OA subproject: Christina Prell (UB Regensburg)
Cost data in the repository or library system
A fundamental discussion arose as to whether the cost data should be stored in the repository or in the library system. The openCost team noted that cost data is part of the metadata of an article and should therefore also be stored in the repository. In the future, the data could be exchanged between the library system and the repository via the openCost format.
JSON
Is it intended to additionally generate a JSON representation of the data?
We are developing the XML schema first. The background is the OAI interface and the fact that the project partners use repositories which, in connection with OAI-PMH, have mostly established XML as the exchange format. We also do not want to complicate the discussion by putting two schemas up for debate at once. In a further step, we want to convert the XML into a JSON format (see: Issue 41).
Transformative agreements and memberships
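As a rough illustration of what such a conversion step could look like, the following Python sketch turns a small XML record into JSON using only the standard library. It is not the openCost format: apart from "amount_paid" and its "type" attribute, which are mentioned in this summary, the element names ("publication", "doi"), the "currency" attribute, and the sample values are assumptions made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample record. Only "amount_paid" and its "type" attribute
# come from the discussion; everything else is an illustrative assumption.
SAMPLE_XML = """
<publication>
  <doi>10.1234/example</doi>
  <amount_paid type="apc" currency="EUR">1500.00</amount_paid>
  <amount_paid type="vat" currency="EUR">285.00</amount_paid>
</publication>
"""


def xml_to_dict(element):
    """Convert an element to plain Python data.

    Leaf elements without attributes become strings, leaf elements with
    attributes become dicts with an extra "value" key, and child elements
    are collected in lists so that repeated elements are preserved.
    """
    node = dict(element.attrib)
    children = list(element)
    if not children:
        text = (element.text or "").strip()
        if node:
            node["value"] = text
            return node
        return text
    for child in children:
        node.setdefault(child.tag, []).append(xml_to_dict(child))
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return a JSON string with the same content."""
    root = ET.fromstring(xml_text.strip())
    return json.dumps({root.tag: xml_to_dict(root)}, indent=2)


if __name__ == "__main__":
    print(xml_to_json(SAMPLE_XML))
```

Collecting children in lists is one possible design choice: it keeps repeated elements such as several "amount_paid" entries intact, at the price of wrapping single values in a list as well.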
How do you currently capture transformative agreements and memberships in the schema?
The current draft focuses on fee-based single articles. In a next step, we will look at how transformative agreements and memberships can be integrated into the schema. At article level, it should be possible to specify this in the future via the “part_of_contract” field. Currently, there are still difficulties in uniquely identifying transformative agreements, as not all of them have a unique ID (e.g. ESAC ID) that can be referenced and stored in the schema.
Collective invoice for framework agreement with OA publishers
How should such collective invoices be handled?
As long as the individual articles have a specific amount, this is not problematic. An invoice number can be specified multiple times (see: Issue 38).
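The idea that several articles carry the same invoice number can be sketched in a few lines of Python. The record layout, the field names ("doi", "invoice_number", "amount") and all values below are hypothetical and only serve to show how per-article amounts on a collective invoice could be summed up again.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical article records; two of them belong to the same collective invoice.
RECORDS = [
    {"doi": "10.1234/a", "invoice_number": "INV-001", "amount": "1200.00"},
    {"doi": "10.1234/b", "invoice_number": "INV-001", "amount": "950.00"},
    {"doi": "10.1234/c", "invoice_number": "INV-002", "amount": "1800.00"},
]


def total_per_invoice(records):
    """Sum the article-level amounts for each invoice number."""
    totals = defaultdict(Decimal)
    for record in records:
        totals[record["invoice_number"]] += Decimal(record["amount"])
    return dict(totals)


if __name__ == "__main__":
    for invoice, total in total_per_invoice(RECORDS).items():
        print(invoice, total)
```

Decimal is used instead of float so that monetary amounts are added without rounding artifacts.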
DFG-Monitoring
Should there also be the possibility to capture the funding number in openCost?
There is still a need for further discussion here. Our preliminary consideration is that openCost forms the cost block as a supplement to other formats such as DataCite. Funding information can be obtained via the DOI. It was noted by participants that one should not rely on getting the funding information completely automated (e.g. Crossref not complete). Therefore, there should be an opportunity to add it manually at some point. At which point this should be done is also still up for discussion (see: Issue 39).
Partial costs/cost splitting
Due to higher costs, the topic of cost splitting will become increasingly important in the future. Therefore, many of the participants would welcome the possibility to mark partial amounts as such without having to name concrete sums.
In this regard, it was noted that a distinction must be made between internal and external splitting for partial amounts and postings. With internal cost splitting, only one institution reports the complete amount. In the case of external splitting, a note on the partial amount is good in principle. However, there is the problem that in many cases the individual institutions do not know anything about cost splitting. It is also often unknown how many other institutions have paid and what the individual partial amounts look like (see: Issue 2).
Voucher element
Can articles, conference contributions and the like for which nothing has been paid also be deposited (incl. justification)?
In principle, there is nothing against including an explicit zero value in the scheme. For example, we could introduce a voucher element. Memberships, etc., for which no classic APCs are calculated and therefore a zero value would have to be specified, will be regulated in the future via the “part_of_contract” field (see: Issue 35).
Second publication
Would the costs of the first publication recorded in the metadata be carried along if the article is second published on an institutional repository?
No separate entry would be made here, since the secondary publication in the repository represents a quasi inventory of the article. The green secondary publication is then done at no cost. However, if we think in the direction of the information budget and Diamond OA, the costs for operating the repository and the technical infrastructure should also be taken into account. Here we would then need additional cost types or a possibility to map this via “memberships”.
Terminology
Regarding “item”: In the discussion it was suggested to rename “item” to “invoice”, because item could be confused with an article in a journal or a book.
On the “type” attribute of “amount_paid”: Here the suggestion was made to unify the terms: “cover” should be called “cover charges” (just like “color charges” and “page charges”). It was also suggested to rename the term “charges” to “charge” (see: Issue 45 and Issue 46). In addition, one participant made a counter-proposal to rename the cost types “apc” and “hybrid-oa” to “apc-full-oa” and “apc-hybrid-oa” (see: Issue 37).
Regarding “oa_status”: Alternative designation proposals of the discussion participants instead of “open”, “hybrid”, “closed”: “open-full”, “closed”, “open-hybrid” (see: Issue 37).
Bibliographic block “bibliographic_information”
It was emphasized by the openCost team that the bibliographic block is only an auxiliary construction if there is no DOI. Instead of expanding the bibliographic block further, we would like to steer the discussion in the direction of how to get publishers to assign DOIs for scientific publications by default.
Regarding the “isPartof” field: The community suggested that IDs such as ISSNs should be included here if possible, as this would allow for more unique identification. It was also suggested to define a minimum standard of bibliographic data for this block, but to allow the possibility to provide all bibliographic data. For now, the information seems sufficient to the participants, but if the scheme is further developed, it would be necessary to consider whether there are contexts for which additional fields are needed, such as year, author, etc. In addition, it was noted that defining the field as “required” made sense for journal articles, but could be problematic for other types of publications (see: Issue 34).
Metadata schema: field “institution”
Will there be a standardization of institution names?
Standardization is not necessarily relevant for machine readability; unique IDs such as ROR are more important. However, the ROR ID is not mandated, but is recommended or suggested as the primary institution identifier and is the ideal case, as the ROR ID can be enriched.
Regarding parent-child relationships between institutions, it was asked if there is a benchmark. Should the institutions be specified as granular and small as possible, or should the parent organization be specified instead?
The ROR ID of the organization that paid should be specified.
Metadata schema: field “oa_status”
Note: The property describes the property of the journal and not the article.
Regarding the value “hybrid”: It was noted that the designation is terminologically not quite clean, since an article itself is not hybrid, but OA. Furthermore, there would be a risk of confusion with the designation in the case of books, where the term “hybrid” often means OA digital and print on paper. In the openCost schema, the term “hybrid” currently only refers to OA articles published in hybrid journals. All other articles that appear in hybrid journals receive the value “closed”.
Regarding the value “open”: It was suggested to rename the value to “gold”. In addition, it was asked whether the field is necessary at all if “apc” and “hybrid-oa” can be specified for the cost type. One participant commented that he thought it was plausible to specify the OA status at this point. If, for example, only a cover charge or similar were specified, it would not be possible to specify the OA status using cost types alone.
Diamond OA: Is it planned to map Diamond OA as well? In any case, it should be possible to include articles for which you do not pay anything. Whether this value is then called Diamond OA, however, is still up for discussion, since different definitions of Diamond OA are used.
Metadata schema: field “amount_paid”
Regarding the cost type “vat”: The value added tax is a separate cost type and can be specified with the “vat” element. We received feedback that this is not clear enough for many participants in the schema and that this should be made clearer in the documentation.
In addition, it was noted with regard to the “vat” cost type that it is not necessarily clear in the current schema which VAT amount belongs to which other cost type. The openCost team is now considering whether we should link this more strongly in the schema and perhaps build brackets to show this explicitly. In order not to complicate the schema, this is currently not planned.
Regarding the cost type “other”: Bank fees would currently have to be subsumed under “other”.
Metadata schema: field “date_paid”
Here it was discussed whether “date_invoice” would be more meaningful than “date_paid”. From a workflow perspective, it would be easier to store the invoice date. But from the point of view of cost logic, the date of the flow of funds would be the more important one (budget year breaks, etc.). Thus, the wish was expressed to be able to record both dates explicitly in order to avoid misunderstandings and also with a view to DFG monitoring (see: Issue 17).
Another possibility would be to define two optional date fields, one of which is mandatory. Other possible dates that could be specified at this point would be: “accepted date”, “submitted date”, “published date”. This would allow further evaluations (e.g.: time between submission date and invoice date; for some publishers this is only a few days, for others it can take several months; see: Issue 44).
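The rule "two optional date fields, one of which is mandatory" and the kind of evaluation mentioned above can be sketched as follows. This is only an illustration under assumptions: records are modelled as plain dictionaries with ISO 8601 date strings, and the field name "date_submitted" is hypothetical (the discussion only speaks of a "submitted date").

```python
from datetime import date


def has_required_date(record):
    """Return True if at least one of "date_paid" or "date_invoice" is present."""
    return bool(record.get("date_paid") or record.get("date_invoice"))


def days_between(record, start_field, end_field):
    """Return the number of days between two date fields, or None if one is missing."""
    start = record.get(start_field)
    end = record.get(end_field)
    if not start or not end:
        return None
    return (date.fromisoformat(end) - date.fromisoformat(start)).days


if __name__ == "__main__":
    # Hypothetical record with an invoice date but no payment date.
    record = {"date_submitted": "2022-01-10", "date_invoice": "2022-03-15"}
    print(has_required_date(record))
    print(days_between(record, "date_submitted", "date_invoice"))
```

A record without either date would fail the check, while a record with only one of the two would still be accepted.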
Metadata schema: field “publisher”
Are the publisher names standardized, e.g. by a ROR ID?
The ROR ID is not really usable for this purpose, because many publishers are represented several times (e.g. Springer Nature six times). A subproject of the Austrian project AT2OA2 is working on publisher standardization via Wikidata. However, the allocation of Wikidata IDs is still in its infancy (see: Issue 16).
In addition, the question was raised as to exactly what should be recorded in the “publisher” field. Example: For an MPDL invoice, is the MPDL recorded at this point or Springer or Wiley? This is still up for discussion.
It was also noted that from SAP’s point of view, the field would rather be named “creditor”. The name “publisher” could possibly be misleading (see: Issue 43).