Phylogenetic methods continue to advance, both in the speed of analyses and in the statistical models that are applied.
The mega-phylogeny approach (Smith et al. 2008) and related methods mine data from online resources to assemble large supermatrices that contain a few traditional markers (e.g., COI, cytB, ITS) sampled across hundreds or thousands of taxa.
Early phylogenomic studies typically compared a few species (often model organisms) for which full genome data were available. The primary difficulty with using full genome data is identifying suitable phylogenetic markers. Many genomic regions are difficult to align, and homology between genes is difficult to establish. For this reason, many studies with full genomes restrict phylogenetic analyses to transcriptomes.
In large-scale megaphylogenies -- data-mined matrices of a few traditional markers across thousands of taxa -- the proportion of missing data can exceed 90%.
In large-scale sub-genomic data sets, like RAD-seq, the proportion of missing data often ranges from 10% to 90%.
Importantly, the first type of data set is more susceptible to terraces in phylogenetic tree space (Sanderson et al. 2012), because many taxa in the phylogeny share no information, whereas in the latter there is typically still significant phylogenetic information for all taxa.
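The distinction between overall missing data and taxa that share no information can be illustrated with a small sketch. The toy presence/absence matrix and the function names `missing_proportion` and `pairs_without_overlap` are hypothetical, introduced only for illustration. Counting taxon pairs that share no locus is a crude summary of non-overlap, not a test for terraces.

```python
from itertools import combinations


def missing_proportion(matrix):
    """Fraction of taxon-by-locus cells that are missing (coded 0)."""
    total = sum(len(row) for row in matrix.values())
    missing = sum(row.count(0) for row in matrix.values())
    return missing / total


def pairs_without_overlap(matrix):
    """List taxon pairs that have no locus sampled in both taxa."""
    return [
        (a, b)
        for a, b in combinations(sorted(matrix), 2)
        if not any(x and y for x, y in zip(matrix[a], matrix[b]))
    ]


# Hypothetical toy data: rows are taxa, columns are loci,
# 1 = locus sampled for that taxon, 0 = missing.
sparse = {
    "taxonA": [1, 0, 0],
    "taxonB": [0, 1, 0],
    "taxonC": [1, 1, 0],
    "taxonD": [0, 0, 1],
}

print(f"missing: {missing_proportion(sparse):.2f}")
print("pairs sharing no locus:", pairs_without_overlap(sparse))
```

Two matrices with the same proportion of missing cells can differ greatly in how many taxon pairs share no locus, which is why the two quantities are reported separately here.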
Maximum likelihood estimates the parameters of a defined model $M$ by finding the values that maximize the likelihood, that is, the probability of the data given the model, $P(D|M)$. Different models have different numbers of parameters.
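As a minimal sketch of this idea, consider the simplest case: two aligned sequences under the Jukes-Cantor (JC69) model, which here has a single free parameter, the distance `d` in expected substitutions per site. The model choice, the counts (100 sites, 10 differences), and the helper names are assumptions made for illustration, and the golden-section search stands in for the more elaborate optimizers used by real phylogenetic software.

```python
import math


def jc69_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood of distance d given n_diff mismatches in n_sites under JC69."""
    e = math.exp(-4.0 * d / 3.0)
    p_same = 0.25 * (0.25 + 0.75 * e)  # probability of one specific identical site pattern
    p_diff = 0.25 * (0.25 - 0.25 * e)  # probability of one specific mismatched site pattern
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


def maximize(f, lo, hi, tol=1e-8):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    invphi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        x1 = b - invphi * (b - a)
        x2 = a + invphi * (b - a)
        if f(x1) > f(x2):
            b = x2  # maximum lies in [a, x2]
        else:
            a = x1  # maximum lies in [x1, b]
    return (a + b) / 2


# Hypothetical data: 10 differences observed across 100 aligned sites.
n_sites, n_diff = 100, 10
d_hat = maximize(lambda d: jc69_log_likelihood(d, n_sites, n_diff), 1e-6, 5.0)

# For JC69 the maximum also has a closed form, which serves as a check.
d_closed = -0.75 * math.log(1 - 4 * (n_diff / n_sites) / 3)
print(f"numerical estimate: {d_hat:.4f}, closed form: {d_closed:.4f}")
```

Models with more parameters (for example, unequal base frequencies or rate variation among sites) follow the same logic, but the search runs over several parameters at once instead of one.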