Updating data_sources.json #5938
rohitkumarbhagat wants to merge 2 commits into datacommonsorg:master
Summary of Changes

Hello @rohitkumarbhagat, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the data offerings by expanding the ...
Code Review
The pull request introduces a significant number of new data sources across various categories, primarily in 'Biomedical', 'Demographics', 'Economy', and 'Education', along with a URL update in the 'Health' section. The additions are well-structured and enhance the dataset's coverage. However, I've identified a few minor issues related to URL consistency (HTTP vs. HTTPS), a typo, and a field containing multiple URLs where a single string is expected. Addressing these will improve the data's quality and consistency.
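The HTTP vs. HTTPS inconsistency mentioned above could be caught mechanically before review. The following is a minimal sketch, not part of this repository: the helper names `iter_entries` and `find_non_https` are hypothetical, and the only thing assumed about the file layout is what the diff shows, namely objects carrying `label` and `url` keys inside `dataSources` lists. The walker is recursive so it does not depend on the top-level structure of data_sources.json.

```python
from urllib.parse import urlsplit


def iter_entries(node):
    """Yield every dict that carries a "url" key, at any nesting depth."""
    if isinstance(node, dict):
        if "url" in node:
            yield node
        for value in node.values():
            yield from iter_entries(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_entries(item)


def find_non_https(data):
    """Return (label, url) pairs whose url does not use the https scheme."""
    problems = []
    for entry in iter_entries(data):
        url = entry["url"]
        if not isinstance(url, str) or urlsplit(url).scheme != "https":
            problems.append((entry.get("label", "<no label>"), url))
    return problems


# Hypothetical sample modeled on two entries from the diff.
sample = {
    "dataSources": [
        {"label": "The Sequence Ontology", "url": "http://www.sequenceontology.org/"},
        {"label": "Disease Ontology", "url": "https://disease-ontology.org/"},
    ]
}

for label, url in find_non_https(sample):
    print(f"{label}: {url}")
```

To run it against the real file, load it with `json.load` and pass the result to `find_non_https`. A flagged URL still needs a manual check that the site actually serves HTTPS before the scheme is changed.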
"dataSources": [
  {
    "label": "The Sequence Ontology",
    "url": "http://www.sequenceontology.org/",
{
  "label": "DISEASES: Textmining",
  "url": "https://diseases.jensenlab.org/Search",
  "description": "DISEASES is a weekly updated web resource that integrates evidence on disease-gene associations from automatic text mining, manually curated literature, cancer mutation data, and genome-wide association studies. This dataset further unifies the evidence by assigning confidence scores that facilitate comparison of the different types and sources of evidence. All files start with the following four columns: gene identifier, gene name, disease identifier, and disease name. The textmining files contain the z-score, the confidence score, and a URL to a viewer of the underlying abstracts. For further details please refer to the following Open Access articles about the database: [DISEASES: Text mining and data integration of disease-gene associations](https://www.sciencedirect.com/science/article/pii/S1046202314003831) and [DISEASES 2.0: a weekly updated database of disease–gene associations from text mining and data integration](https://academic.oup.com/database/article/doi/10.1093/database/baac019/6554833?login=false). The data is made available under the [CC-BY](https://creativecommons.org/licenses/by/4.0/) license.\n\nData made available under the [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/)."
"dataSources": [
  {
    "label": "NCBI Assembly",
    "url": "https://www.ncbi.nlm.nih.gov/assembly",
The URL for the NCBI Assembly database is missing the https:// protocol. Please update it to https://www.ncbi.nlm.nih.gov/assembly/ for consistency and security.
"description": "\"The [NCBI Assembly database](https://www.ncbi.nlm.nih.gov/assembly/) provides stable accessioning and data tracking for genome assembly data. The model underlying the database can accommodate a range of assembly structures, including sets of unordered contig or scaffold sequences, bacterial genomes consisting of a single complete chromosome, or complex structures such as a human genome with modeled allelic variation. The database provides an assembly accession and version to unambiguously identify the set of sequences that make up a particular version of an assembly, and tracks changes to updated genome assemblies. The Assembly database reports metadata such as assembly names, simple statistical reports of the assembly (number of contigs and scaffolds, contiguity metrics such as contig N50, total sequence length and total gap length) as well as the assembly update history. The Assembly database also tracks the relationship between an assembly submitted to the International Nucleotide Sequence Database Consortium (INSDC) and the assembly represented in the NCBI RefSeq project\" (Kitts et al. 2016). In this import we include the metadata for all genome assemblies documented in `assembly_summary_genbank.txt` and `assembly_summary_refseq.txt`. Assemblies are stored in GenomeAssembly nodes whose information is integrated from both the GenBank and RefSeq datasets."

{
  "label": "Disease Ontology",
  "url": "https://disease-ontology.org/",
  "description": "The Disease Ontology was developed as a project by the Institute of Genome Sciences at the University of Maryland School of Medicine. It \"is a community driven, open source ontology that is designed to link disparate datasets through disease concepts\". It provides a \"standardized ontology for human disease with the purpose of providing the biomedical community with consistent, reusable and sustainable descriptions of human disease terms, phenotype characteristics and related medical vocabulary disease concepts\".\n\nThe data is made available under [C0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://disease-ontology.org/resources/citing-do)."
There is a typo in the license reference: [C0 1.0 Universal should be [CC0 1.0 Universal.
"description": "The Disease Ontology was developed as a project by the Institute of Genome Sciences at the University of Maryland School of Medicine. It \"is a community driven, open source ontology that is designed to link disparate datasets through disease concepts\". It provides a \"standardized ontology for human disease with the purpose of providing the biomedical community with consistent, reusable and sustainable descriptions of human disease terms, phenotype characteristics and related medical vocabulary disease concepts\".\n\nThe data is made available under [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://disease-ontology.org/resources/citing-do)."

},
{
  "label": "NITI India Population Projection",
  "url": "https://ndap.niti.gov.in/dataset/7208, https://ndap.niti.gov.in/dataset/7209",
The url field is a string and typically expects a single URL. This entry contains two URLs separated by a comma and space. Please consolidate this to a single primary URL, or if both are essential, consider listing the secondary URL within the description field for clarity and consistency with the schema.
"url": "https://ndap.niti.gov.in/dataset/7208",
No description provided.
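The single-URL expectation for the `url` field, raised in the last review comment, can also be checked with a few lines of Python. This is only a sketch under stated assumptions: `split_urls` and `has_single_url` are hypothetical helpers, and splitting on commas and whitespace is a heuristic, since a valid URL may itself contain a comma.

```python
import re


def split_urls(value):
    """Split a url field on commas and whitespace, dropping empty pieces."""
    return [part for part in re.split(r"[,\s]+", value.strip()) if part]


def has_single_url(value):
    """True when the field holds exactly one comma- or space-separated piece."""
    return len(split_urls(value)) == 1


# The value flagged in the review, and the suggested replacement.
flagged = "https://ndap.niti.gov.in/dataset/7208, https://ndap.niti.gov.in/dataset/7209"
suggested = "https://ndap.niti.gov.in/dataset/7208"

print(has_single_url(flagged))    # the flagged value splits into two pieces
print(has_single_url(suggested))  # the suggested value is a single piece
```

When the check fails, `split_urls` returns the pieces in order, so the first can stay in `url` and the rest can be moved into the `description` field, as the review suggests.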