Title: | Web Scraping and Bibliometric Analysis of MDPI Journals |
---|---|
Description: | Provides comprehensive tools to scrape and analyze data from the MDPI journals. It allows users to extract metrics such as submission-to-acceptance times, article types, and whether articles are part of special issues. The package can also visualize this information through plots. Additionally, 'MDPIexploreR' offers tools to explore patterns of self-citations within articles and provides insights into guest-edited special issues. |
Authors: | Pablo Gómez Barreiro [aut, cre] |
Maintainer: | Pablo Gómez Barreiro <[email protected]> |
License: | CC BY 4.0 |
Version: | 0.2.1 |
Built: | 2024-12-28 06:11:27 UTC |
Source: | https://github.com/pgomba/mdpi_explorer |
Article data extracted from MDPI journal Agriculture
agriculture
A data frame with 7,160 rows and 7 columns:
Article URL
Article type classifier
Date article was submitted to journal
Date article was accepted for publication
Article turnaround time in days (accepted date minus received date)
Year the article was accepted
Type of issue where article is published
...
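A minimal sketch of summarising a dataset like this one in base R, computing the median turnaround time per acceptance year. The column names tat and year are assumptions (check names(agriculture) for the real ones), and the toy rows only stand in for the packaged data.

```r
# Toy rows standing in for the agriculture dataset.
# Assumed column names: "tat" (turnaround, days) and "year" (acceptance year).
articles <- data.frame(
  tat  = c(30, 45, 60, 20, 25, 90),
  year = c(2021, 2021, 2021, 2022, 2022, 2022)
)

# Median turnaround time per acceptance year
tat_by_year <- aggregate(tat ~ year, data = articles, FUN = median)
print(tat_by_year)
```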
This function retrieves the URLs of all published articles in a specified journal. Users provide the journal's code (see 'MDPI_journals.rda'), and the function returns the URLs of all articles available in that journal.
article_find(journal)
journal |
A string containing the code of an MDPI journal (e.g. "agriculture") |
A vector (class: character) containing the URLs of articles from the target journal
agr_articles<-article_find("agriculture")
This function extracts key editorial information from one or more paper URLs. Specifically, it retrieves the submission, revision, and acceptance dates, as well as the article type. The function also calculates the turnaround time (the duration from submission to acceptance) and identifies whether the paper is part of a special issue.
article_info(vector, sleep = 2, sample_size, show_progress = TRUE)
vector |
A character vector of article URLs. |
sleep |
Number of seconds to wait between scraping iterations (2 by default). |
sample_size |
A number: how many papers to explore from the main vector. Leave blank to explore all. |
show_progress |
Logical. If TRUE (the default), progress is displayed while scraping. |
A data frame (class: data.frame) with the following columns:
The URL of the article from which the information is retrieved.
The classification of the article (e.g., editorial, review).
The date the article was received by the publisher.
The date the article was confirmed as revised by the publisher.
The date the article was accepted for publication.
The turnaround time, calculated as the number of days between the received and accepted dates.
The year in which the article was accepted for publication.
Indicates whether the article is part of a special issue.
Indicates whether the article's peer review is publicly available.
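The turnaround and year columns described above can be derived from the received and accepted dates roughly as follows. This is a sketch with made-up dates, not the package's internal code, and it assumes the dates have already been parsed into Date objects.

```r
# Made-up received/accepted dates for two articles
received <- as.Date(c("2023-01-10", "2023-03-01"))
accepted <- as.Date(c("2023-02-20", "2023-03-29"))

# Turnaround time: number of days from received to accepted
tat <- as.numeric(difftime(accepted, received, units = "days"))

# Year in which each article was accepted
year <- as.integer(format(accepted, "%Y"))
print(data.frame(tat, year))
```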
url <- c("https://www.mdpi.com/2073-4336/8/4/45", "https://www.mdpi.com/2073-4336/11/3/39")
info <- article_info(url, sleep = 1.5)
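The sleep and sample_size arguments follow a common polite-scraping pattern: optionally draw a random subset of URLs, then pause between requests. The sketch below illustrates only that pattern. fetch_article() and scrape_politely() are hypothetical helpers, the stub downloads nothing, and treating a NULL sample_size as "all URLs" is an assumption.

```r
# Hypothetical stub standing in for a real per-article scraper;
# it downloads nothing and just records the URL and its length.
fetch_article <- function(url) {
  data.frame(url = url, n_chars = nchar(url), stringsAsFactors = FALSE)
}

# Hypothetical helper: optional random subset, then a pause between requests.
# Treating sample_size = NULL as "use all URLs" is an assumption.
scrape_politely <- function(urls, sleep = 2, sample_size = NULL) {
  if (!is.null(sample_size) && sample_size < length(urls)) {
    urls <- sample(urls, sample_size)
  }
  results <- vector("list", length(urls))
  for (i in seq_along(urls)) {
    results[[i]] <- fetch_article(urls[i])
    # Wait between iterations, but not after the last one
    if (i < length(urls)) Sys.sleep(sleep)
  }
  do.call(rbind, results)
}

urls <- c("https://www.mdpi.com/2073-4336/8/4/45",
          "https://www.mdpi.com/2073-4336/11/3/39")
set.seed(1)
out <- scrape_politely(urls, sleep = 0, sample_size = 1)
print(out)
```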
Takes a vector of names and returns the names without abbreviated middle names, academic titles or hyphens.
clean_names(name_vector)
name_vector |
A string with names separated by commas |
A vector (class: character) containing the cleaned names
clean_names(c("Matthias M. Bauer","Thomas Garca Morrison","Wolfgang Nitsche", "Elias Biobaca L." ))
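A rough regular-expression sketch of this kind of name cleaning. simple_clean_names() is a hypothetical helper, not the package's clean_names(); the list of titles is illustrative, and replacing hyphens with a space (rather than deleting them) is an assumption.

```r
# Hypothetical helper, not the package's clean_names() implementation
simple_clean_names <- function(name_vector) {
  # Drop a few common academic titles (illustrative list)
  x <- gsub("\\b(Dr|Prof|PhD|MSc)\\b\\.?\\s*", "", name_vector, perl = TRUE)
  # Drop abbreviated names such as "M." or a trailing "L."
  x <- gsub("\\b[A-Z]\\.\\s*", "", x, perl = TRUE)
  # Replace hyphens with spaces (assumption)
  x <- gsub("-", " ", x, fixed = TRUE)
  # Collapse repeated whitespace and trim the ends
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}

simple_clean_names(c("Matthias M. Bauer", "Dr. Anna-Lena Schmidt", "Elias Biobaca L."))
```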
Deprecated: This function is deprecated and will be removed in a future version of the package. Use special_issue_info() instead. It extracts data from special issues, including guest editors' paper counts (excluding editorials), the time between the last submission and issue closure, and whether guest editors served as academic editors for any published papers.
guest_editor_info(journal_urls, sample_size, sleep = 2, show_progress = TRUE)
journal_urls |
A list of MDPI special issue URLs |
sample_size |
A number: how many special issues to explore from the main vector. Leave blank to explore all. |
sleep |
Number of seconds to wait between scraping iterations (2 by default). |
show_progress |
Logical. If TRUE (the default), progress is displayed while scraping. |
A data frame (class: data.frame) with the following columns:
The URL of the special issue from which the information is retrieved.
Number of articles contained in the special issue, not considering editorial-type articles
Number of articles in the special issue with guest editor presence
Proportion of articles in the special issue in which a guest editor is present
Time at which the special issue was or will be closed
Time at which last article present in the special issue was submitted
Numeric vector showing number of articles in which each individual guest editor is present
Number of articles in the special issue where the academic editor is a guest editor too
Difference in days between special issue closure and the latest article submission
ge_issue <- "https://www.mdpi.com/journal/plants/special_issues/5F5L5569XN"
ge_info <- guest_editor_info(ge_issue)
Article data extracted from MDPI journal Horticulturae
horticulturae
A data frame with 7,160 rows and 7 columns:
Article URL
Article type classifier
Date article was submitted to journal
Date article was accepted for publication
Article turnaround time in days (accepted date minus received date)
Year the article was accepted
Type of issue where article is published
...
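Because both bundled datasets describe articles in the same way, they can be stacked to compare journals. A minimal base-R sketch with toy rows; the tat column name is an assumption based on the column descriptions above.

```r
# Toy rows standing in for the agriculture and horticulturae datasets.
# Assumed column name: "tat" (turnaround time in days).
agri  <- data.frame(tat = c(30, 45, 60))
horti <- data.frame(tat = c(20, 35))

# Stack both datasets with a journal label
both <- rbind(
  data.frame(agri,  journal = "Agriculture",   stringsAsFactors = FALSE),
  data.frame(horti, journal = "Horticulturae", stringsAsFactors = FALSE)
)

# Median turnaround time per journal
med <- tapply(both$tat, both$journal, median)
print(med)
```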
Extracts names and codes of current MDPI journals.
MDPI_journals()
A data frame (class: data.frame) with the following columns:
Full name of the MDPI journal
Journal code used for ID and web scraping purposes
journal_table<-MDPI_journals()
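Functions such as article_find() expect a journal code rather than a full name. A small lookup sketch over a table shaped like the output above; the column names name and code, the toy rows, and the helper code_for() are assumptions, so inspect names() on the real output first.

```r
# Toy table shaped like MDPI_journals() output.
# Assumed column names: "name" (full name) and "code" (journal code).
journal_table <- data.frame(
  name = c("Agriculture", "Horticulturae", "Plants"),
  code = c("agriculture", "horticulturae", "plants"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the code for a full journal name (NA if absent)
code_for <- function(full_name, table = journal_table) {
  table$code[match(full_name, table$name)]
}

code_for("Horticulturae")
```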
Plots information obtained from article_info(). For analysis purposes, Editorial and Correction type articles are ignored.
plot_articles(articles_info, journal, type)
articles_info |
Output data frame from function article_info(). |
journal |
A string with the name of the journal for graph title purposes |
type |
One of "summary", "issues", "tat", "review" or "type", depending on the desired graph |
A plot (class: ggplot) depicting the desired information obtained from article_info()
plot_articles(agriculture,"Agriculture",type="summary")
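The same exclusion of Editorial and Correction articles can be applied before running custom analyses on article_info() output. A base-R sketch; the column name type and the exact labels "Editorial" and "Correction" are assumptions about how article types are stored.

```r
# Toy article table. Assumed column "type" with labels such as
# "Editorial" and "Correction"; check the real output before reusing.
articles <- data.frame(
  type = c("Article", "Review", "Editorial", "Article", "Correction"),
  tat  = c(40, 55, 5, 38, 2),
  stringsAsFactors = FALSE
)

# Drop the article types that plot_articles() ignores
excluded <- c("Editorial", "Correction")
kept <- articles[!articles$type %in% excluded, ]

# Number of remaining articles per type
type_counts <- table(kept$type)
print(type_counts)
```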
Calculates the number of the authors' self-citations relative to all references in an article.
selfcite_check(article_url, verbose = TRUE)
article_url |
A valid MDPI article URL |
verbose |
Logical. If TRUE (the default), informative messages are printed. |
A data frame (class: data.frame) with the following columns:
The number of referenced articles authored by any of the main article's authors
Total number of references in the article
paper_url <- "https://www.mdpi.com/2223-7747/13/19/2785"
sc <- selfcite_check(paper_url)
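The counting behind the two columns above can be sketched as follows: a reference counts as a self-citation if any of its authors also authors the article. count_self_citations() is a hypothetical helper working on made-up data, not the package's implementation. It matches names exactly, so real data would likely need normalising first (for instance with clean_names()).

```r
# Hypothetical helper, not the package's implementation.
# reference_authors: a list with one character vector of author names per reference.
count_self_citations <- function(article_authors, reference_authors) {
  # TRUE when a reference shares at least one author with the article
  is_self <- vapply(reference_authors,
                    function(refs) any(refs %in% article_authors),
                    logical(1))
  data.frame(self_citations = sum(is_self),
             total_references = length(reference_authors))
}

authors <- c("Matthias Bauer", "Wolfgang Nitsche")
refs <- list(
  c("Matthias Bauer", "Thomas Morrison"),
  c("Jane Doe"),
  c("Wolfgang Nitsche")
)
sc <- count_self_citations(authors, refs)
share <- sc$self_citations / sc$total_references
```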
Retrieves all special issues of a specified journal with URLs. Filters results by issue status (open, closed, or all) and optional year range.
special_issue_find(journal, type = "closed", years = NULL, verbose = TRUE)
journal |
MDPI journal code |
type |
"closed", "open" or "all" special issues. "closed" by default. |
years |
A vector of special issue closure years used to limit the search |
verbose |
Logical. If TRUE (the default), informative messages are printed. |
A vector of special issue URLs.
special_issue_find("covid")
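The years filter can be pictured as keeping only the special issues whose closure date falls in one of the requested years. A sketch with made-up URLs and dates; filter_by_year() is hypothetical and assumes closure dates are available as Date values.

```r
# Made-up special issue URLs and closure dates
issues <- data.frame(
  url = c("https://example.org/si/a",
          "https://example.org/si/b",
          "https://example.org/si/c"),
  closure = as.Date(c("2022-06-30", "2023-01-15", "2024-03-31")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep URLs whose closure year is in `years`;
# years = NULL keeps everything, mirroring the documented default
filter_by_year <- function(df, years = NULL) {
  if (is.null(years)) return(df$url)
  closure_year <- as.integer(format(df$closure, "%Y"))
  df$url[closure_year %in% years]
}

filter_by_year(issues, years = c(2022, 2024))
```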
Extracts data from special issues, including guest editors' paper counts (excluding editorials), the time between the last submission and issue closure, and whether guest editors served as academic editors for any published papers.
special_issue_info(journal_urls, sample_size, sleep = 2, show_progress = TRUE)
journal_urls |
A list of MDPI special issue URLs |
sample_size |
A number: how many special issues to explore from the main vector. Leave blank to explore all. |
sleep |
Number of seconds to wait between scraping iterations (2 by default). |
show_progress |
Logical. If TRUE (the default), progress is displayed while scraping. |
A data frame (class: data.frame) with the following columns:
The URL of the special issue from which the information is retrieved.
Number of articles contained in the special issue, not considering editorial-type articles
Number of articles in the special issue with guest editor presence
Proportion of articles in the special issue in which a guest editor is present
Time at which the special issue was or will be closed
Time at which last article present in the special issue was submitted
Numeric vector showing number of articles in which each individual guest editor is present
Number of articles in the special issue where the academic editor is a guest editor too
Difference in days between special issue closure and the latest article submission
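Several of these columns can be derived from per-article author lists, the guest editor names and two dates. A sketch with made-up data, not the package's code. It assumes editorials have already been removed, that names match exactly, and that the day difference is closure minus last submission (positive when the last submission came first).

```r
# Made-up author lists for the articles of one special issue
# (editorials already removed) and its guest editors
article_authors <- list(
  c("Ana Ruiz", "Li Wei"),
  c("John Smith"),
  c("Li Wei", "Marta Costa"),
  c("Peter Novak")
)
guest_editors <- c("Li Wei", "Marta Costa")

# Number of articles in which each guest editor appears
per_editor <- vapply(guest_editors, function(ge) {
  sum(vapply(article_authors, function(a) ge %in% a, logical(1)))
}, integer(1))

# Articles with at least one guest editor, and their proportion
has_ge <- vapply(article_authors, function(a) any(a %in% guest_editors), logical(1))
n_with_ge <- sum(has_ge)
ge_proportion <- n_with_ge / length(article_authors)

# Days between special issue closure and the latest submission
closure     <- as.Date("2023-12-31")
last_submit <- as.Date("2023-11-15")
day_diff <- as.numeric(difftime(closure, last_submit, units = "days"))
```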
ge_issue <- "https://www.mdpi.com/journal/plants/special_issues/5F5L5569XN"
special_info <- special_issue_info(ge_issue)
Retrieves all topics of a specified journal with URLs. Filters results by topic status (open, closed, or all) and optional year range.
topic_find(journal, type = "closed", years = NULL, verbose = TRUE)
journal |
MDPI journal code |
type |
"closed", "open" or "all" topics. "closed" by default. |
years |
A vector of topic closure years used to limit the search |
verbose |
Logical. If TRUE (the default), informative messages are printed. |
A vector of topic URLs.
topic_find("covid")
Extracts data from topics, including guest editors' paper counts (excluding editorials), the time between the last submission and topic closure, and whether guest editors served as academic editors for any published papers. Also includes the names of the journals participating in the topic.
topic_info(journal_urls, sample_size, sleep = 2, show_progress = TRUE)
journal_urls |
A list of MDPI topic URLs |
sample_size |
A number: how many topics to explore from the main vector. Leave blank to explore all. |
sleep |
Number of seconds to wait between scraping iterations (2 by default). |
show_progress |
Logical. If TRUE (the default), progress is displayed while scraping. |
A data frame (class: data.frame) with the following columns:
The URL of the topic from which the information is retrieved.
Number of articles contained in the topic, not considering editorial-type articles
Number of articles in the topic with guest editor presence
Proportion of articles in the topic in which a guest editor is present
Time at which the topic was or will be closed
Time at which last article present in the topic was submitted
Numeric vector showing number of articles in which each individual guest editor is present
Number of articles in the topic where the academic editor is a guest editor too
Difference in days between topic closure and the latest article submission
List of journals participating in the topic
ge_issue <- "https://www.mdpi.com/topics/mechanisms_resistance_plant_diseases_volume"
ge_info <- topic_info(ge_issue)
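With the list of participating journals per topic, it is straightforward to see which journals take part in the most topics. A base-R sketch over made-up data; storing the journals as one character vector per topic is an assumption about the output format.

```r
# Made-up output: one character vector of participating journals per topic
topic_journals <- list(
  c("plants", "agriculture"),
  c("agriculture", "horticulturae", "plants"),
  c("agriculture")
)

# Number of topics each journal takes part in, most frequent first
participation <- sort(table(unlist(topic_journals)), decreasing = TRUE)
print(participation)
```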