| Title: | Download and wrangle publication data for Monash EBS academic staff |
|---|---|
| Description: | Provide journal rankings from Monash Business School, ABDC, CORE, Scimago and ERA 2010. Fetch publication details from ORCID, Google Scholar, PURE and from DOIs. Find CRAN packages and download statistics for specified authors. |
| Authors: | Rob Hyndman [aut, cre], Michael Lydeamore [aut], Sherry Tee [aut], Parnika Khattri [aut] |
| Maintainer: | Rob Hyndman <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.0 |
| Built: | 2026-05-07 23:04:26 UTC |
| Source: | https://github.com/numbats/spiderorchid |
This dataset contains the 2025 Journal Quality List of the Australian Business Deans Council (ABDC). You can read more about the list at https://abdc.edu.au/abdc-journal-quality-list/.
data(abdc)
An object of class tbl_df (inherits from tbl, data.frame) with 2651 rows and 7 columns.
A data frame with 2651 observations on the following 7 variables:
title: Title of the journal
publisher: Publishing house
issn: International Standard Serial Number
issn_online: ISSN Online - as ISSN, but for the online, rather than print version
year_inception: Year the journal started
field_of_research: Field of Research Code as provided by the Australian Bureau of Statistics
rank: In order of best to lowest rank: A*, A, B, or C
https://abdc.edu.au/abdc-journal-quality-list/
library(dplyr)
abdc |>
  filter(field_of_research == "4905") |>
  arrange(rank) |>
  select(title, rank)
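The rank column is described as running from A* (best) down to C, but a plain alphabetical sort is not guaranteed to place A* first. The following is a minimal sketch, using made-up toy data rather than the real abdc table, of one way to make the sort order explicit with an ordered factor. Whether the package already stores rank as a factor is not stated here, so treat this as an assumption-laden illustration.

```r
# Toy data standing in for a ranking table; the titles are invented.
journals <- data.frame(
  title = c("Journal W", "Journal X", "Journal Y", "Journal Z"),
  rank = c("B", "A*", "C", "A"),
  stringsAsFactors = FALSE
)

# Declare the documented order explicitly: best rank first.
rank_levels <- c("A*", "A", "B", "C")
journals$rank <- factor(journals$rank, levels = rank_levels, ordered = TRUE)

# order() now follows the factor levels rather than alphabetical order.
sorted <- journals[order(journals$rank), ]
print(sorted)
```

The same conversion could be applied before arrange(rank) in a dplyr pipeline if the column turns out to be a character vector.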
Two datasets are provided: core and core_journals, which contain lists of
conference and journal rankings respectively, according to the CORE executive committee.
Details of the CORE organisation and its ranking procedure are provided below.
core
core_journals
An object of class tbl_df (inherits from tbl, data.frame) with 982 rows and 2 columns.
An object of class tbl_df (inherits from tbl, data.frame) with 639 rows and 4 columns.
CORE is an association of university departments of computer science in Australia and New Zealand. Prior to 2004 it was known as the Computer Science Association, CSA.
The CORE Conference Ranking provides assessments of major conferences in the computing disciplines. The rankings are managed by the CORE Executive Committee, with periodic rounds for submission of requests for addition or reranking of conferences. Decisions are made by academic committees based on objective data requested as part of the submission process. Conference rankings are determined by a mix of indicators, including citation rates, paper submission and acceptance rates, and the visibility and research track record of the key people hosting the conference and managing its technical program. A more detailed statement categorizing the ranks A*, A, B, and C is available on the CORE website.
core is a data frame with 982 observations and two variables:
title: Title of the conference
rank: Conferences are assigned to one of the following categories:
A*: flagship conference, a leading venue in a discipline area
A: excellent conference, and highly respected in a discipline area
B: good conference, and well regarded in a discipline area
C: other ranked conference venues that meet minimum standards
core_journals is a data frame with 639 observations and 4 variables:
title: Title of the journal
field_of_research: Field of Research Code as provided by the Australian Bureau of Statistics
issn: International Standard Serial Number
rank: In order of best to lowest rank: A*, A, B, or C
https://www.core.edu.au/conference-portal
core
core_journals
This dataset contains publications since 2018, downloaded from PURE on 16 January 2025.
The data can be updated using the fetch_pure() function.
ebs_pure
A data frame with 8 variables:
character. The unique identifier for the publication in PURE.
integer. The year of publication.
character. The authors of the publication.
character. The title of the publication.
character. The journal where the publication appeared.
character. The subtype of the publication.
character. A bibliographic citation in the Harvard format.
character. The DOI of the publication where available.
https://research.monash.edu/en/organisations/econometrics-business-statistics/publications/
This is a dataset that contains the list of journal rankings from the ARC Excellence in Research for Australia 2010 round.
data(era2010)
An object of class tbl_df (inherits from tbl, data.frame) with 20712 rows and 5 columns.
A data frame with 20712 rows and the following 5 variables:
eraid: ERA ID of the journal
title: Title of the journal
issn: International Standard Serial Number
field_of_research: Field of Research Code as provided by the Australian Bureau of Statistics at the time. Note that the codes have since changed.
rank: In order of best to lowest rank: A*, A, B, or C
https://www.righttoknow.org.au/request/journal_list_relating_to_the_201
library(dplyr)
era2010 |>
  filter(field_of_research == "0104") |>
  arrange(rank)
This function searches for CRAN packages by author names. Note that some authors may use different name variations on CRAN (e.g., "Di Cook" and "Dianne Cook"), so it may be necessary to call the function with several variations.
fetch_cran(
  author_names = NULL,
  package_names = NULL,
  downloads_from = "2000-01-01",
  downloads_to = Sys.Date()
)
| Argument | Description |
|---|---|
| author_names | A character vector containing the authors' names in the form used on CRAN. |
| package_names | A character vector of package names. Ignored if |
| downloads_from | A date or character string in the format "YYYY-MM-DD" specifying the date from which to start counting downloads. Default is "2000-01-01". |
| downloads_to | A date or character string in the format "YYYY-MM-DD" specifying the last date for counting downloads. Default is the current date. |
A data frame containing metadata about each package, including total downloads
between downloads_from and downloads_to.
## Not run:
cran2024 <- fetch_cran(
  author_names = c("Michael Lydeamore", "Di Cook", "Dianne Cook", "Hyndman"),
  downloads_from = "2024-01-01",
  downloads_to = "2024-12-31"
)
## End(Not run)
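Because downloads_from and downloads_to accept either a date or a "YYYY-MM-DD" string, it can help to check the values before making a call. The sketch below uses a hypothetical helper, parse_download_date(), which is not part of the package; it only illustrates one way to catch a wrongly formatted string early using base R.

```r
# Hypothetical helper (not part of the package): accept a Date as-is,
# or parse a character string written as YYYY-MM-DD.
parse_download_date <- function(x) {
  if (inherits(x, "Date")) {
    return(x)
  }
  d <- as.Date(x, format = "%Y-%m-%d")
  if (is.na(d)) {
    stop("Expected a date in the form YYYY-MM-DD, got: ", x)
  }
  d
}

from <- parse_download_date("2024-01-01")
to <- parse_download_date(as.Date("2024-12-31"))

# The start of the download window should not come after its end.
stopifnot(from <= to)
```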
Retrieves publications for a given list of DOIs using the DOI API and formats them into a structured tibble.
fetch_doi(doi)
| Argument | Description |
|---|---|
| doi | A character vector of DOIs. |
A tibble containing the article information.
fetch_doi("10.1016/j.ijforecast.2023.10.010")
Retrieves publications for given ORCID IDs, and returns them as a tibble. Only publications with DOIs are returned. The function uses the ORCID API to fetch the DOIs, and then uses the DOI API to fetch the publication details for each DOI.
fetch_orcid(orcid_ids)
| Argument | Description |
|---|---|
| orcid_ids | A character vector of ORCID IDs. |
This function requires authentication on ORCID. If you have not previously
authenticated, it will prompt you to do so when first run. If you just
follow the prompts, you will be authenticated, but only for downloading your
own papers. If you want to download papers from other ORCID IDs, you will
need to authenticate with a 2-legged OAuth. Follow the instructions at
https://info.orcid.org/register-a-client-application-production-member-api/.
To avoid having to do this in each session, store the token obtained from
orcid_auth() in your .Renviron file by running usethis::edit_r_environ().
It should be of the form ORCID_TOKEN=<your token>.
A tibble containing all publications for the specified ORCID IDs.
## Not run:
fetch_orcid(c("0000-0003-2531-9408", "0000-0001-5738-1471"))
## End(Not run)
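The note above says the token can be stored as ORCID_TOKEN in .Renviron. As a minimal sketch, the hypothetical helper below (has_orcid_token() is an illustrative name, not a package function) checks whether that variable is visible in the current session before any ORCID request is attempted.

```r
# Hypothetical helper: TRUE when ORCID_TOKEN is set to a non-empty value.
has_orcid_token <- function() {
  nzchar(Sys.getenv("ORCID_TOKEN"))
}

if (has_orcid_token()) {
  message("ORCID_TOKEN found in the environment.")
} else {
  message("ORCID_TOKEN is not set; add it to .Renviron and restart R.")
}
```

Changes to .Renviron only take effect in a new R session, so a restart is usually needed after editing the file.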
To download data from PURE, it is necessary to have access to the
API via Simon Angus and the Astro team https://astro.monash.edu/.
The API key is stored in the environment variable PURE_API_KEY.
You can add it to your environment using usethis::edit_r_environ().
The endpoint is restricted to Monash IP addresses, so either use it on campus
or connect to the VPN before using it off campus.
fetch_pure(years)
| Argument | Description |
|---|---|
| years | A numeric vector of publication years. All publications between the minimum year and the maximum year are returned. |
A data frame containing the data fetched from the PURE API covering the specified publication years.
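The years argument is described as covering everything between the smallest and largest value supplied. The sketch below shows what that implies for a non-contiguous input, and checks that PURE_API_KEY is set. The helper env_var_is_set() is hypothetical, and how the package itself computes the range is not documented here, so this only illustrates the stated behaviour.

```r
# Hypothetical helper: TRUE when the named environment variable is non-empty.
env_var_is_set <- function(name) {
  nzchar(Sys.getenv(name))
}

# A non-contiguous request: per the documentation, every year from the
# minimum to the maximum would be covered, not just the three listed.
years <- c(2019, 2023, 2021)
covered_years <- seq(min(years), max(years))
print(covered_years)

if (!env_var_is_set("PURE_API_KEY")) {
  message("PURE_API_KEY is not set; add it to .Renviron before calling fetch_pure().")
}
```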
Retrieves publications for the given Google Scholar IDs and returns them as a structured tibble.
fetch_scholar(scholar_id)
| Argument | Description |
|---|---|
| scholar_id | A character vector of Google Scholar IDs. |
A tibble containing all publications for the specified Google Scholar IDs.
## Not run:
fetch_scholar("vamErfkAAAAJ")
## End(Not run)
Given a list of journal titles, this function will return their ranking from various lists. Data sets used are:
Monash Business School
Australian Business Deans Council
ERA 2010
CORE
SCImago
This function is used in the Journal rankings shiny app.
journal_ranking(
  title,
  source = c("monash", "abdc", "era2010", "core", "scimago"),
  fuzzy = TRUE,
  only_best = length(title) > 1,
  ...
)
| Argument | Description |
|---|---|
| title | A character vector containing (partial) journal names. |
| source | A character string indicating which ranking database to use. Default |
| fuzzy | Should fuzzy matching be used. If |
| only_best | If |
| ... | Other arguments are passed to |
A data frame containing the journal title, rank and source for each matching journal.
Rob J Hyndman
# Return ranking for individual journals or conferences
journal_ranking("Annals of Statistics")
journal_ranking("Annals of Statistics", "abdc")
journal_ranking("International Conference on Machine Learning")
journal_ranking("International Conference on Machine Learning", "core")
journal_ranking("R Journal", "scimago", only_best = TRUE)
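To give a feel for the difference between partial and fuzzy title matching, the sketch below uses base R's grep() and agrep() on a small invented list of titles. This is only an illustration of the general idea; the matching method actually used by journal_ranking() is not described here and may differ.

```r
# A small made-up list of titles to search.
titles <- c(
  "Annals of Statistics",
  "Annals of Applied Statistics",
  "Journal of Econometrics",
  "The R Journal"
)

# Partial matching: the query must appear literally inside the title.
partial <- grep("R Journal", titles, fixed = TRUE, value = TRUE)
print(partial)

# Approximate matching: allow up to 2 edits, so a misspelt query
# ("Anals") can still find a title.
fuzzy <- agrep(
  "Anals of Statistics",
  titles,
  max.distance = 2L,
  ignore.case = TRUE,
  value = TRUE
)
print(fuzzy)
```

Looser distance limits find more candidates but also more false matches, which is presumably why an option such as only_best is useful when several titles are looked up at once.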
This is a dataset that contains the list of quality journal rankings from the Monash Business School. In most cases, it follows ABDC with A* equal to Group 1 and A equal to Group 2. The "Group 1+" category contains a small set of the highest rank journals. The data set is updated from time to time when journals not on the ABDC list are classified. See https://www.intranet.monash/business/research-services/research-standards for the latest information.
data(monash)
An object of class tbl_df (inherits from tbl, data.frame) with 4489 rows and 2 columns.
A data frame with 4489 observations on the following 2 variables:
title: Title of the journal
rank: In order of best to lowest rank: Group 1+, Group 1, Group 2
Monash Business School
library(dplyr)
library(stringr)
monash |>
  filter(str_detect(title, "Statist")) |>
  arrange(rank)
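The description above says that, in most cases, ABDC A* corresponds to Group 1 and A to Group 2. That correspondence can be sketched as a named lookup vector. The names abdc_to_monash and abdc_ranks are illustrative only, and individual journals may deviate from the mapping, so it should not be read as a rule the package applies.

```r
# Stated correspondence (holds "in most cases", not for every journal).
abdc_to_monash <- c("A*" = "Group 1", "A" = "Group 2")

# Look up a few ABDC ranks; ranks without a stated equivalent give NA.
abdc_ranks <- c("A*", "A", "B")
monash_equivalent <- unname(abdc_to_monash[abdc_ranks])
print(monash_equivalent)
```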
This data was taken from https://www.scimagojr.com/journalrank.php
data(scimago)
An object of class tbl_df (inherits from tbl, data.frame) with 32193 rows and 29 columns.
A tibble with 32193 rows and 29 variables:
Year of SCImago Journal Ranking calculation.
Rank of the journal among all journals.
Database ID of the journal.
Journal's title.
Type: "journal", "book series", "trade journal", or "conference and proceedings"
ISSN journal identifier.
SCImago Journal Rank indicator. It expresses the average number of weighted citations received in the selected year by the documents published in the selected journal in the three previous years, i.e. weighted citations received in year X to documents published in the journal in years X-1, X-2 and X-3. See detailed description of SJR (PDF).
Highest quartile of the journal among all categories it belongs to.
Hirsch index of the journal. The h index expresses the journal's number of articles (h) that have received at least h citations. It quantifies both journal scientific productivity and scientific impact, and it is also applicable to scientists, countries, etc. (see the Wikipedia definition of the h-index).
Total number of published documents within a specific year. All types of documents are considered, including citable and non citable documents.
Published documents in the three previous years (selected year documents are excluded), i.e. when the year X is selected, then X-1, X-2 and X-3 published documents are retrieved. All types of documents are considered, including citable and non citable documents.
Total number of citations received by a journal to the documents published within a specific year.
Number of citations received in the selected year by a journal to the documents published in the three previous years, i.e. citations received in year X to documents published in years X-1, X-2 and X-3. All types of documents are considered.
Number of citable documents published by a journal in the three previous years (selected year documents are excluded). Exclusively articles, reviews and conference papers are considered.
Average citations per document in a 2 year period. It is computed considering the number of citations received by a journal in the current year to the documents published in the two previous years, i.e. citations received in year X to documents published in years X-1 and X-2. Comparable to Journal Impact Factor.
Average number of references per document in the selected year.
Country of the publisher.
Publisher of the journal.
Categories the journal belongs to.
Category in which the journal ranks highest by percentile.
Rank of journal in highest_category.
Highest percentile of journal in any category.
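The h index definition given in the variable list (h articles with at least h citations each) can be sketched in a few lines of base R. The function name h_index() and the citation counts are made up for illustration; this is not how SCImago or the package computes the stored value, only a worked instance of the definition.

```r
# Compute the h index from a vector of per-article citation counts.
h_index <- function(citations) {
  # Sort from most to least cited, then count how many articles have
  # at least as many citations as their position in that ordering.
  sorted <- sort(citations, decreasing = TRUE)
  sum(sorted >= seq_along(sorted))
}

# Five articles: four of them have at least 4 citations each.
example_citations <- c(10, 8, 5, 4, 3)
h <- h_index(example_citations)
print(h)
```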
Rob Hyndman
SCImago Journal & Country Rank. Retrieved from https://www.scimagojr.com/journalrank.php
This dataset contains the mappings between researcher names and their respective ORCID and Google Scholar IDs. It is useful for identifying and linking academic profiles across different platforms.
staff_ids
A data frame with 4 variables:
character. The first name of the individual.
character. The last name of the individual.
character. The ORCID identifier.
character. The Google Scholar user ID.
https://www.monash.edu/business/ebs/our-people/staff-directory
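One likely use of this table is to collect identifiers to pass on to fetch_orcid() or fetch_scholar(). The sketch below works on an invented stand-in data frame: the column names (first_name, last_name, orcid_id, scholar_id) and the IDs are hypothetical placeholders, since the actual column names are not shown here. Check names(staff_ids) before adapting it.

```r
# Invented stand-in for staff_ids; column names and IDs are placeholders.
staff <- data.frame(
  first_name = c("Ada", "Grace"),
  last_name = c("Example", "Sample"),
  orcid_id = c("0000-0000-0000-0001", NA),
  scholar_id = c(NA, "abcDEFgAAAAJ"),
  stringsAsFactors = FALSE
)

# Keep only the identifiers that are present.
orcid_ids <- staff$orcid_id[!is.na(staff$orcid_id)]
scholar_ids <- staff$scholar_id[!is.na(staff$scholar_id)]

print(orcid_ids)
print(scholar_ids)

# With the package loaded and authentication in place, these vectors could
# then be passed on, e.g. fetch_orcid(orcid_ids) or fetch_scholar(scholar_ids).
```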