-
Notifications
You must be signed in to change notification settings - Fork 72
Add Bartlett's and Levene's tests for homogeneity of variances #216
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Shimuuar
merged 4 commits into
haskell:master
from
PraneyaKumar:feature/137-homoscedasticity-tests
Jun 29, 2025
Merged
Changes from all commits
Commits
Show all changes
4 commits
Select commit
Hold shift + click to select a range
574827c
Add Bartlett's and Levene's tests for homogeneity of variances
PraneyaKumar 92565bc
Using vector instead of lists
PraneyaKumar fc1b002
Changed return type for Levene Test and added simple unit tests for B…
PraneyaKumar 1f0b35a
Merge branch 'haskell:master' into feature/137-homoscedasticity-tests
PraneyaKumar File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,92 @@ | ||
| {-# LANGUAGE FlexibleContexts #-} | ||
| {-| | ||
| Module : Statistics.Test.Bartlett | ||
| Description : Bartlett's test for homogeneity of variances. | ||
| Copyright : (c) Praneya Kumar, 2025 | ||
| License : BSD-3-Clause | ||
|
|
||
| Implements Bartlett's test to check if multiple groups have equal variances. | ||
| Assesses equality of variances assuming normal distribution, sensitive to non-normality. | ||
| -} | ||
| module Statistics.Test.Bartlett ( | ||
| bartlettTest, | ||
| module Statistics.Distribution.ChiSquared | ||
| ) where | ||
|
|
||
| import qualified Data.Vector.Generic as G | ||
| import qualified Data.Vector.Unboxed as U | ||
| import Statistics.Distribution (cumulative) | ||
| import Statistics.Distribution.ChiSquared (chiSquared, ChiSquared(..)) | ||
| import Statistics.Sample (varianceUnbiased) | ||
| import Statistics.Types (mkPValue) | ||
| import Statistics.Test.Types (Test(..)) | ||
|
|
||
| -- | Perform Bartlett's test for equal variances. | ||
| -- The input is a list of vectors, where each vector represents a group of observations. | ||
| -- Returns Either an error message or a Test ChiSquared containing the test statistic and p-value. | ||
| bartlettTest :: [U.Vector Double] -> Either String (Test ChiSquared) | ||
| bartlettTest groups | ||
| | length groups < 2 = Left "At least two groups are required for Bartlett's test." | ||
| | any ((< 2) . G.length) groups = Left "Each group must have at least two observations." | ||
| | any (<= 0) groupVariances = Left "All groups must have positive variance." | ||
| | otherwise = Right $ Test | ||
| { testSignificance = pValue | ||
| , testStatistics = tStatistic | ||
| , testDistribution = chiDist | ||
| } | ||
| where | ||
| -- Number of groups | ||
| k = length groups | ||
|
|
||
| -- Sample sizes for each group | ||
| ni = map G.length groups | ||
| ni' = map fromIntegral ni | ||
|
|
||
| -- Total number of observations across all groups | ||
| nTotal = sum ni | ||
|
|
||
| -- Variance for each group (unbiased estimate) | ||
| groupVariances = map varianceUnbiased groups | ||
|
|
||
| -- Pooled variance calculation | ||
| sumWeightedVars = sum [ (n - 1) * v | (n, v) <- zip ni' groupVariances ] | ||
| pooledVariance = sumWeightedVars / fromIntegral (nTotal - k) | ||
|
|
||
| -- Numerator of Bartlett's statistic | ||
| numerator = | ||
| fromIntegral (nTotal - k) * log pooledVariance - | ||
| sum [ (n - 1) * log v | (n, v) <- zip ni' groupVariances ] | ||
|
|
||
| -- Denominator correction term | ||
| sumReciprocals = sum [1 / (n - 1) | n <- ni'] | ||
| denomCorrection = | ||
| 1 + (sumReciprocals - 1 / fromIntegral (nTotal - k)) / (3 * (fromIntegral k - 1)) | ||
|
|
||
| -- Test statistic T | ||
| tStatistic = max 0 $ numerator / denomCorrection | ||
|
|
||
| -- Degrees of freedom and chi-squared distribution | ||
| df = k - 1 | ||
| chiDist = chiSquared df | ||
| pValue = mkPValue $ 1 - cumulative chiDist tStatistic | ||
|
|
||
|
|
||
| -- Example usage: | ||
| -- import qualified Data.Vector.Unboxed as U | ||
| -- import Statistics.Test.Bartlett | ||
|
|
||
| -- main :: IO () | ||
| -- main = do | ||
| -- let a = U.fromList [8.88, 9.12, 9.04, 8.98, 9.00, 9.08, 9.01, 8.85, 9.06, 8.99] | ||
| -- b = U.fromList [8.88, 8.95, 9.29, 9.44, 9.15, 9.58, 8.36, 9.18, 8.67, 9.05] | ||
| -- c = U.fromList [8.95, 9.12, 8.95, 8.85, 9.03, 8.84, 9.07, 8.98, 8.86, 8.98] | ||
|
|
||
| -- case bartlettTest [a,b,c] of | ||
| -- Left err -> putStrLn $ "Error: " ++ err | ||
| -- Right test -> do | ||
| -- putStrLn $ "Bartlett's Test Statistic: " ++ show (testStatistics test) | ||
| -- putStrLn $ "P-Value: " ++ show (testSignificance test) | ||
|
|
||
| -- Sample Output | ||
| -- Bartlett's Test Statistic: ~32 | ||
| -- P-Value: ~1e-5 | ||
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,152 @@ | ||
| {-# LANGUAGE FlexibleContexts #-} | ||
|
|
||
| {-| | ||
| Module : Statistics.Test.Levene | ||
| Description : Levene's test for homogeneity of variances. | ||
| Copyright : (c) Praneya Kumar, 2025 | ||
| License : BSD-3-Clause | ||
|
|
||
| Implements Levene's test to check if multiple groups have equal variances. | ||
| Assesses equality of variances, robust to non-normality, and versatile with mean or median centering. | ||
| -} | ||
| module Statistics.Test.Levene ( | ||
| Center(..), | ||
| levenesTest | ||
| ) where | ||
|
|
||
| import qualified Data.Vector as V | ||
| import qualified Data.Vector.Unboxed as U | ||
| import qualified Data.Vector.Algorithms.Merge as VA | ||
| import Statistics.Distribution (cumulative) | ||
| import Statistics.Distribution.FDistribution (fDistribution, FDistribution) | ||
| import Statistics.Types (mkPValue) | ||
| import Statistics.Test.Types (Test(..)) | ||
| import qualified Statistics.Sample as Sample | ||
| import Control.Exception (assert) | ||
|
|
||
| -- | Center calculation method | ||
| data Center = | ||
| Mean -- ^ Use arithmetic mean | ||
| | Median -- ^ Use median | ||
| | Trimmed Double -- ^ Trimmed mean with given proportion to cut from each end | ||
| deriving (Eq, Show) | ||
|
|
||
| -- | Trim data from both ends with error handling and performance optimization | ||
| trimboth :: (Ord a, Fractional a) | ||
| => V.Vector a | ||
| -> Double | ||
| -> Either String (V.Vector a) | ||
| trimboth vec prop | ||
| | prop < 0 || prop > 1 = Left "Proportion must be between 0 and 1" | ||
| | V.null vec = Right vec | ||
| | otherwise = do | ||
| let sorted = V.modify VA.sort vec | ||
| n = V.length sorted | ||
| lowerCut = floor (prop * fromIntegral n) | ||
| upperCut = n - lowerCut | ||
| assert (upperCut >= lowerCut) $ | ||
| Right $ V.slice lowerCut (upperCut - lowerCut) sorted | ||
|
|
||
| -- | Calculate median using pre-sorted vector | ||
| vectorMedian :: (Fractional a, Ord a) | ||
| => V.Vector a | ||
| -> Either String a | ||
| vectorMedian vec | ||
| | V.null vec = Left "Empty vector in median calculation" | ||
| | otherwise = Right $ | ||
| if odd len | ||
| then sorted V.! mid | ||
| else (sorted V.! (mid - 1) + sorted V.! mid) / 2 | ||
| where | ||
| sorted = V.modify VA.sort vec | ||
| len = V.length sorted | ||
| mid = len `div` 2 | ||
|
|
||
| -- | Main Levene's test function with full error handling | ||
| levenesTest :: Double -- ^ Significance level (alpha) | ||
| -> Center -- ^ Centering method | ||
| -> [V.Vector Double] -- ^ Input samples | ||
| -> Either String (Test FDistribution) | ||
| levenesTest alpha center samples | ||
| | alpha < 0 || alpha > 1 = Left "Significance level must be between 0 and 1" | ||
| | length samples < 2 = Left "At least two samples required" | ||
| | otherwise = do | ||
| processed <- mapM processSample samples | ||
| let (deviationsList, niList) = unzip processed | ||
| deviations = V.fromList deviationsList -- V.Vector (U.Vector Double) | ||
| ni = V.fromList niList -- V.Vector Int | ||
| zbari = V.map Sample.mean deviations -- V.Vector Double | ||
| k = V.length deviations | ||
| n = V.sum ni | ||
| zbar = V.sum (V.zipWith (\z n' -> z * fromIntegral n') zbari ni) / fromIntegral n | ||
|
|
||
| -- Numerator: Sum over (ni * (zbari - zbar)^2) | ||
| numerator = V.sum $ V.zipWith (\n z -> fromIntegral n * (z - zbar) ** 2) ni zbari | ||
|
|
||
| -- Denominator: Sum over sum((dev_ij - zbari)^2) | ||
| denominator = V.sum $ V.zipWith (\dev z -> U.sum (U.map (\x -> (x - z) ** 2) dev)) deviations zbari | ||
|
|
||
| -- Handle division by zero and invalid values | ||
| if denominator <= 0 || isNaN denominator || isInfinite denominator | ||
| then Left "Invalid denominator in W-statistic calculation" | ||
| else do | ||
| let wStat = (fromIntegral (n - k) / fromIntegral (k - 1)) * (numerator / denominator) | ||
| df1 = k - 1 | ||
| df2 = n - k | ||
| fDist = fDistribution df1 df2 | ||
| pVal = mkPValue $ 1 - cumulative fDist wStat | ||
|
|
||
| -- Validate distribution parameters | ||
| if df1 < 1 || df2 < 1 | ||
| then Left "Invalid degrees of freedom" | ||
| else Right $ Test | ||
| { testStatistics = wStat | ||
| , testSignificance = pVal | ||
| , testDistribution = fDist | ||
| } | ||
| where | ||
| -- Process samples with error handling and optimized sorting | ||
| processSample vec = case center of | ||
| Mean -> do | ||
| let dev = V.map (abs . subtract (Sample.mean vec)) vec | ||
| return (U.convert dev, V.length vec) | ||
|
|
||
| Median -> do | ||
| sortedVec <- Right $ V.modify VA.sort vec | ||
| m <- vectorMedian sortedVec | ||
| let dev = V.map (abs . subtract m) sortedVec | ||
| return (U.convert dev, V.length vec) | ||
|
|
||
| Trimmed p -> do | ||
| trimmed_for_center_calculation <- trimboth vec p | ||
| let robust_center = Sample.mean trimmed_for_center_calculation | ||
| -- Calculate deviations for ALL ORIGINAL points from the robust_center | ||
| deviations_from_robust_center = V.map (abs . subtract robust_center) vec -- Use 'vec' (original data) | ||
| -- Return deviations and the ORIGINAL sample size | ||
| return (U.convert deviations_from_robust_center, V.length vec) -- Use 'V.length vec' | ||
|
|
||
|
|
||
| -- Example usage: | ||
| -- import qualified Data.Vector as V | ||
| -- import LevenesTest (Center(..), levenesTest) | ||
| -- import Statistics.Test.Types (testStatistics, testSignificance) | ||
| -- import Statistics.Types (pValue) | ||
|
|
||
| -- main :: IO () | ||
| -- main = do | ||
| -- let a = V.fromList [8.88, 9.12, 9.04, 8.98, 9.00, 9.08, 9.01, 8.85, 9.06, 8.99] | ||
| -- b = V.fromList [8.88, 8.95, 9.29, 9.44, 9.15, 9.58, 8.36, 9.18, 8.67, 9.05] | ||
| -- c = V.fromList [8.95, 9.12, 8.95, 8.85, 9.03, 8.84, 9.07, 8.98, 8.86, 8.98] | ||
|
|
||
| -- case levenesTest (Trimmed 0.05) [a, b, c] of | ||
| -- Left err -> putStrLn $ "Error: " ++ err | ||
| -- Right test -> do | ||
| -- putStrLn $ "Levene's W Statistic: " ++ show (testStatistics test) | ||
| -- putStrLn $ "P-Value: " ++ show (pValue (testSignificance test)) | ||
| -- putStrLn $ "Reject null hypothesis at α=0.05: " ++ show (testSignificance test < 0.05) | ||
|
|
||
|
|
||
| -- Sample Output | ||
| -- Levene's W Statistic: 7.905 | ||
| -- P-Value: 0.002 | ||
| -- Reject null hypothesis at α=0.05: True |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
complCumulativeshould calculates 1-CDF without loss of precisionAFAIR χ² doesn't implement it but it may at some point