The MEDLINE loader process in BioMedQuery loads the MEDLINE baseline files into a
MySQL database and saves the raw (xml.gz) and parsed (csv) files to a medline
directory that is created inside the provided output_dir.
WARNING: The baseline consists of 900+ MEDLINE files, each containing approximately 30,000 articles (upwards of 27 million articles in total), so a full baseline load will take hours to run.
The baseline files can be found here.
using BioMedQuery.DBUtils
using BioMedQuery.PubMed
using BioMedQuery.Processes
BioMedQuery has utility functions to create the database and tables. Note: creating the tables with create_tables! drops any tables that already exist in the target database.
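# Arguments: host, user, password, database name, overwrite flag
const conn = BioMedQuery.DBUtils.init_mysql_database("127.0.0.1", "root", "", "test_db", true);
# Create the MEDLINE tables (drops any tables that already exist in test_db)
BioMedQuery.PubMed.create_tables!(conn);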
# test=true loads only a sample file; raw and parsed files are written to ./medline
@time BioMedQuery.Processes.load_medline!(conn, pwd(), test=true)
Review the output of this test run in MySQL to confirm that it ran as expected.
The sample raw and parsed files should also appear in the new medline
directory inside the current directory.
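To check the files on disk without browsing the directory by hand, a small sketch like the one below counts them. The helper count_medline_files is hypothetical (it is not part of BioMedQuery) and assumes only what is stated above: raw files end in .xml.gz, parsed files end in .csv, and both live somewhere under ./medline. It walks every subdirectory, so it does not depend on how the loader lays out the files inside medline.

```julia
# Hypothetical helper (not part of BioMedQuery): count the raw (.xml.gz) and
# parsed (.csv) files found anywhere under `dir`, including subdirectories.
function count_medline_files(dir::AbstractString)
    raw = 0
    parsed = 0
    # walkdir yields (root, dirs, files) for each directory in the tree
    for (_, _, files) in walkdir(dir)
        for f in files
            if endswith(f, ".xml.gz")
                raw += 1
            elseif endswith(f, ".csv")
                parsed += 1
            end
        end
    end
    return (raw = raw, parsed = parsed)
end

# Assumes the test load above used pwd() as output_dir
medline_dir = joinpath(pwd(), "medline")
if isdir(medline_dir)
    counts = count_medline_files(medline_dir)
    println("raw files: ", counts.raw, ", parsed files: ", counts.parsed)
else
    println("No medline directory found in ", pwd())
end
```

After a test run, a nonzero count for both kinds of file suggests the download and parsing steps completed; the MySQL tables are still the authoritative check.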
To run a full load, use the same code as above but omit the test keyword argument.
The load can also be split into chunks by specifying which files to start and stop at:
pass start_file=n and end_file=p.
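One way to plan a chunked load is to generate the (start_file, end_file) pairs up front. The sketch below uses a hypothetical helper, file_batches, which is not part of BioMedQuery. It assumes that end_file is inclusive; check that against the load_medline! documentation before relying on it. The total file count is left as a parameter because the exact number depends on the baseline year.

```julia
# Hypothetical helper (not part of BioMedQuery): split the file numbers
# first_file:last_file into consecutive (start_file, end_file) chunks of at
# most batch_size files each. Assumes end_file is inclusive.
function file_batches(first_file::Integer, last_file::Integer, batch_size::Integer)
    batch_size > 0 || throw(ArgumentError("batch_size must be positive"))
    batches = Tuple{Int,Int}[]
    s = first_file
    while s <= last_file
        e = min(s + batch_size - 1, last_file)  # last chunk may be shorter
        push!(batches, (s, e))
        s = e + 1
    end
    return batches
end

# Example: 10 files in chunks of 4 gives (1, 4), (5, 8), (9, 10)
for (s, e) in file_batches(1, 10, 4)
    println("start_file=", s, ", end_file=", e)
    # With a live connection, each chunk could then be loaded with:
    # BioMedQuery.Processes.load_medline!(conn, pwd(), start_file=s, end_file=e)
end
```

Loading in chunks like this makes it easier to resume after a failure: rerun only the chunk that did not finish instead of the whole baseline.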
After loading, it is recommended that you add indexes to the tables; the add_mysql_keys!
function adds a standard set of indexes.
# Add the standard set of indexes to the loaded tables
add_mysql_keys!(conn)
This notebook was generated using Literate.jl.