The Wechsler Intelligence Scale for Children-4th Edition (WISC-IV) is one of the most widely used batteries for measuring intelligence worldwide, and it is often part of the assessment of children with learning disorders such as dyslexia and dyscalculia. When conducting confirmatory factor analysis on WISC-IV data, most studies employ a higher-order model with four factors corresponding to the main indexes (verbal comprehension [VCI], perceptual reasoning [PRI], working memory [WMI], and processing speed [PSI]) and a superordinate g factor. An increasing number of studies, however, have preferred the bifactor model, in which each subtest loads directly on both the general factor g and its specific factor. The bifactor model makes it easy to estimate the reliability of the overall g factor (ωh) along with the unique reliabilities of the index scores (ωs), i.e., the reliability of the indexes after controlling for g. Previous evidence indicated that the reliability of the specific factors is poor, which limits their interpretability, and suggested that overall intelligence should be considered instead. In the present study we conducted a multigroup confirmatory factor analysis, using a bifactor model, to compare children with learning disorder (LD) and typically developing (TD) children. The WISC-IV intellectual profiles of 1617 children diagnosed with LD throughout Italy were compared with the Italian normative data, based on 2200 children, which served as the TD group. Results showed acceptable fit of the multigroup bifactor model, χ2(54) = 244.60, RMSEA = .04, SRMR = .03, CFI = .98, NNFI = .96, suggesting configural invariance. Following previous suggestions, some of the loadings of the specific factors (i.e., at least those of the PRI) had to be constrained to be equal across subtests (within each group) in order to achieve model convergence. Metric invariance, however, was not supported, χ2(18) = 76.81, p < .001.
A closer inspection revealed partial metric invariance: the loadings of PRI and of PSI on their respective subtests could be constrained to be equal across groups, whereas all other loadings differed significantly between groups. Importantly, the loadings of g on the subtests were generally weaker, and the loadings of the specific factors on their respective subtests generally stronger, in the LD group than in the TD group. This was reflected in the reliability indexes. In the TD group, ωh = .72, and for the specific indexes ωs [VCI] = .32, ωs [PRI] = .06, ωs [WMI] = .23, ωs [PSI] = .49. In the LD group, by contrast, ωh = .60, and for the specific indexes ωs [VCI] = .51, ωs [PRI] = .02, ωs [WMI] = .41, ωs [PSI] = .55. In other words, with the notable exception of the PRI, whose subtests seem to be strongly saturated by g, in the TD group the g factor seems to explain most of the subtests' variability, leaving little room for interpreting the specific factors; in the LD group, by contrast, the g factor is relatively weaker, leaving more room for interpreting the specific factors. In conclusion, although the reliability of the overall factor is quite high in both groups, it seems that in the TD group the interpretation of the full scale intelligence quotient (FSIQ) should be emphasized, whereas in children with LD the consideration of the specific intelligence profile may be relevant, supporting previous suggestions that it may provide clinically useful information.
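
For readers unfamiliar with the reliability indexes reported above, the following is a minimal sketch of how ωh and ωs are commonly computed from the standardized loadings of a bifactor model. It assumes orthogonal factors and standardized subtests, so that each uniqueness equals 1 minus the squared g loading minus the squared specific loading. The function name `bifactor_omegas`, the factor labels, and the loading values are hypothetical and purely illustrative; they are not the estimates obtained in this study.

```python
def bifactor_omegas(loadings):
    """Compute omega-hierarchical and omega-subscale coefficients.

    `loadings` maps each specific factor name to a list of
    (g_loading, specific_loading) pairs, one pair per subtest.
    Assumes standardized loadings and orthogonal factors.
    """
    total_g = 0.0             # sum of g loadings over all subtests
    total_specific_var = 0.0  # sum over factors of (sum of specific loadings)^2
    total_unique = 0.0        # sum of uniquenesses over all subtests
    omega_s = {}

    for name, pairs in loadings.items():
        sum_g = sum(g for g, _ in pairs)
        sum_s = sum(s for _, s in pairs)
        # Uniqueness of a standardized subtest under orthogonal factors.
        unique = sum(1.0 - g ** 2 - s ** 2 for g, s in pairs)

        # Share of the index-score variance due to the specific factor alone.
        omega_s[name] = sum_s ** 2 / (sum_g ** 2 + sum_s ** 2 + unique)

        total_g += sum_g
        total_specific_var += sum_s ** 2
        total_unique += unique

    # Share of the total-score variance due to g alone.
    omega_h = total_g ** 2 / (total_g ** 2 + total_specific_var + total_unique)
    return omega_h, omega_s


# Hypothetical loadings for two specific factors with two subtests each.
example = {
    "A": [(0.6, 0.4), (0.6, 0.4)],
    "B": [(0.6, 0.4), (0.6, 0.4)],
}
omega_h, omega_s = bifactor_omegas(example)
print(round(omega_h, 2), {k: round(v, 2) for k, v in omega_s.items()})
```

Under these formulas, weaker g loadings combined with stronger specific loadings lower ωh and raise ωs, which is the pattern described for the LD group relative to the TD group.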