robust circular location parameter of von mises distribution
ehab a. mahmood1, habshah midi2,3, abdul ghapor hussin4 and jayanthi arasan3
1department of banking and finance / university of babylon, iraq
2faculty of science / university putra malaysia, malaysia
3institute for mathematical research/ university putra malaysia, malaysia
4 faculty of defence science and technology / national defence university of malaysia
abstract
the mean direction is a good estimator of the circular location parameter for univariate circular data. however, it is biased and can give misleading results when the data contain outliers, especially as the proportion of outliers increases. the trimmed mean is a robust method for estimating a location parameter. this paper therefore proposes a robust formula for trimming circular data. the proposed method is compared with the mean direction, the median direction and the m estimator for clean and contaminated data. the results of a simulation study and a real data set indicate that the trimmed mean direction performs best among the methods considered.
keywords
trimmed mean direction, mean direction, median direction, circular distance, m estimator.
introduction
it is well known that the circular location parameter is an important measure that summarises where the majority of a data set lies. the mean direction estimates the location parameter well, but it is neither consistent nor efficient in the presence of outliers. statistical data may contain observations that are inconsistent with the rest and that cause problems in statistical analysis; these observations are defined as outliers. the existence of outliers in statistical data leads to misleading statistical results and parameter estimates (barnett & lewis, 1978). researchers are therefore interested in identifying outliers or in using robust methods.
the von mises distribution plays the same role for circular data that the normal distribution plays for linear data, and it shares several of the normal distribution's important properties. this paper therefore considers the von mises distribution. it is a symmetric unimodal distribution with circular mean µ and concentration parameter k. the probability density function of the von mises distribution is given by (mardia & jupp, 2000):
$$g(\theta;\mu,k)=\frac{1}{2\pi I_0(k)}\,e^{k\cos(\theta-\mu)}$$
where $0\le\theta<2\pi$, $k\ge 0$ and $I_0$ denotes the modified bessel function of the first kind and order 0, which can be defined as $I_0(k)=\frac{1}{2\pi}\int_0^{2\pi}e^{k\cos\theta}\,d\theta$. the mean direction of the circular observations is calculated by (jammalamadaka & sengupta 2001):
$$\bar{\theta}=\begin{cases}\tan^{-1}(S/C) & \text{if } C>0,\ S\ge 0\\ \pi/2 & \text{if } C=0,\ S>0\\ \tan^{-1}(S/C)+\pi & \text{if } C<0\\ \tan^{-1}(S/C)+2\pi & \text{if } C\ge 0,\ S<0\\ \text{undefined} & \text{if } C=0,\ S=0\end{cases}\tag{1}$$
where $S=\sum_{i=1}^{n}\sin(\theta_i)$ and $C=\sum_{i=1}^{n}\cos(\theta_i)$. the median direction is defined as any angle $\phi$ such that half of the data points lie in the arc $[\phi,\phi+\pi)$ and the majority of the data points are nearer to $\phi$ than to $\phi+\pi$ (mardia and jupp 2000).
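as an illustration, eq. (1) and the median direction can be sketched in python with numpy. the function names, the tolerance `eps` used to decide when the sine and cosine sums are zero, and the shortcut of searching only the sample points for the median (choosing the one with the smallest total arc distance to the data) are assumptions made for this sketch and are not taken from the cited definitions.

```python
import numpy as np

def mean_direction(theta, eps=1e-12):
    """Mean direction following the piecewise form of eq. (1), in [0, 2*pi)."""
    theta = np.asarray(theta, dtype=float)
    S = np.sin(theta).sum()
    C = np.cos(theta).sum()
    if abs(C) < eps and abs(S) < eps:
        return float("nan")                    # undefined when C = S = 0
    if abs(C) < eps:
        # C = 0: pi/2 for S > 0; 3*pi/2 is the limit of the fourth case for S < 0
        return np.pi / 2 if S > 0 else 3 * np.pi / 2
    base = np.arctan(S / C)
    if C < 0:
        return base + np.pi
    if S < 0:
        return base + 2 * np.pi
    return base

def arc_distance(a, b):
    """Shortest arc length between angles a and b (radians)."""
    d = np.abs(a - b) % (2 * np.pi)
    return np.minimum(d, 2 * np.pi - d)

def median_direction(theta):
    """Assumed shortcut: the sample point with the smallest total arc distance."""
    theta = np.asarray(theta, dtype=float)
    totals = [arc_distance(theta, cand).sum() for cand in theta]
    return float(theta[int(np.argmin(totals))])
```

the piecewise rule returns the same angle as `np.arctan2(S, C) % (2 * np.pi)` whenever the mean direction is defined, which is usually the more convenient form in practice.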
in the literature, several researchers have proposed methods to detect outliers, or robust methods, for univariate circular data. collett (1980) suggested four statistics, namely l, c, d and m, to detect an outlier in univariate circular data. lenth (1981) adapted the m estimator method to estimate circular location parameters. he and simpson (1992) recommended the use of the circular median as an estimate of the circular mean when the data do not follow the von mises distribution. jammalamadaka & sengupta (2001) proposed three methods to detect outliers in univariate circular data. first, they used the p-p plot as a way of detecting outliers in circular data. second, they used the lmpi test for circular data obtained by mixing a wrapped stable distribution with a circular uniform distribution. third, they proposed a likelihood ratio testing (lrt) approach to identify outliers in circular data. abuzaid (2012) proposed several methods to identify outliers. he considered cluster analysis as a procedure to detect outliers in univariate circular data. in addition, he used the c and d statistics suggested by collett (1980) as numerical statistics, and the boxplot as a graphical method to identify outliers. laha and mahesh (2015) studied the robustness of the likelihood ratio, the circular mean and the circular trimmed mean test functions for testing hypotheses on the mean direction of two circular distributions: the von mises and the wrapped normal distributions. however, their simulation study assumed that the circular data had a single outlier and did not consider the case of many outliers. kato and eguchi (2016) suggested a procedure to estimate the location and concentration parameters simultaneously for the general case of the von mises–fisher distribution.
however, a robust method for estimating the circular location parameter when the data contain outliers is still needed. this paper therefore proposes the trimmed mean direction as an estimator of the circular location parameter, based on a formula for trimming the circular data set.
materials and methods
m estimator
lenth (1981) extended the m estimator to estimate a location parameter for circular data. let $\theta_1,\theta_2,\ldots,\theta_n$ be a random sample from the von mises distribution with circular mean µ and concentration parameter k. then the m estimator $\hat{\mu}$ is defined through the $\rho$ function by:
$$\sum_{i=1}^{n}\rho\big(t(\theta_i-\hat{\mu};k)\big)=\text{minimum}\tag{2}$$
where $t(\theta_i-\hat{\mu};k)$ is a periodic function that in some sense standardizes the values of $(\theta_i-\hat{\mu})$ according to the concentration parameter k. by differentiating eq. (2), we obtain:
$$-k\sum_{i=1}^{n}w_i\sin(\theta_i-\hat{\mu})=0$$
where:
$$w_i=w\big(t(\theta_i-\hat{\mu};k)\big)=\frac{\psi\big(t(\theta_i-\hat{\mu};k)\big)}{t(\theta_i-\hat{\mu};k)}\tag{3}$$
the $\psi$ function is given by:
$$\psi_H(t)=\begin{cases}t & |t|\le c\\ c\,\operatorname{sign}(t) & |t|>c\end{cases}\tag{4}$$
where c is a constant. the m estimator method is summarized as follows. first, set all $w_i=1$. second, compute the weighted mean resultant length $\bar{R}_w$, where $\bar{R}_w=(\bar{C}_w^2+\bar{S}_w^2)^{1/2}$, $\bar{C}_w=\sum_{i=1}^{n}w_i\cos\theta_i\big/\sum_{i=1}^{n}w_i$ and $\bar{S}_w=\sum_{i=1}^{n}w_i\sin\theta_i\big/\sum_{i=1}^{n}w_i$. then, estimate the concentration parameter k according to the following approximate formula:
$$\hat{k}=A^{-1}(\bar{R}_w)=\left[2(1-\bar{R}_w)+\frac{(1-\bar{R}_w)^2\,(0.48794-0.82905\,\bar{R}_w-1.3915\,\bar{R}_w^2)}{\bar{R}_w}\right]^{-1}\tag{5}$$
this approximation has an absolute error of less than 0.005 for $\bar{R}_w\ge 0.12$. next, calculate new weights using eq. (3). the procedure is iterated until convergence. finally, the circular location parameter is estimated by: $\cos\hat{\mu}=\bar{C}_w/\bar{R}_w$, $\sin\hat{\mu}=\bar{S}_w/\bar{R}_w$.
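the iteration described above can be sketched in python as follows. this is a minimal illustration under stated assumptions, not lenth's implementation: the text does not give the standardizing function t, so the sketch assumes $t(\phi;k)=\operatorname{sign}(\phi)\sqrt{2k(1-\cos\phi)}$; the tuning constant `c=1.5`, the stopping tolerance, the iteration cap and the function names are likewise hypothetical choices.

```python
import numpy as np

def inv_A(r):
    """Approximate inverse of A(k), as in eq. (5)."""
    r = min(max(r, 1e-8), 1 - 1e-8)            # guard against division by zero
    return 1.0 / (2 * (1 - r)
                  + (1 - r) ** 2 * (0.48794 - 0.82905 * r - 1.3915 * r ** 2) / r)

def huber_weight(t, c):
    """w(t) = psi_H(t) / t from eqs. (3) and (4): 1 if |t| <= c, else c / |t|."""
    a = np.abs(t)
    return np.where(a <= c, 1.0, c / np.maximum(a, 1e-12))

def m_estimate(theta, c=1.5, tol=1e-8, max_iter=100):
    """Iteratively reweighted estimate of the circular location (radians)."""
    theta = np.asarray(theta, dtype=float)
    w = np.ones_like(theta)                     # step 1: all weights equal to 1
    mu_old = None
    for _ in range(max_iter):
        C = np.sum(w * np.cos(theta)) / np.sum(w)
        S = np.sum(w * np.sin(theta)) / np.sum(w)
        R = np.hypot(C, S)                      # weighted mean resultant length
        mu = np.arctan2(S, C) % (2 * np.pi)     # cos(mu) = C/R, sin(mu) = S/R
        k = inv_A(R)                            # eq. (5)
        d = np.angle(np.exp(1j * (theta - mu))) # signed difference in (-pi, pi]
        t = np.sign(d) * np.sqrt(2 * k * (1 - np.cos(d)))  # assumed form of t
        w = huber_weight(t, c)                  # new weights, eq. (3)
        if mu_old is not None and abs(np.angle(np.exp(1j * (mu - mu_old)))) < tol:
            break
        mu_old = mu
    return float(mu), float(k)
```

for a sample clustered around one direction with a single distant observation, the weight of the distant observation falls below 1, so the estimate is pulled towards it less than the unweighted mean direction is.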
trimmed mean direction
the trimmed mean is a robust method for estimating the location parameter of linear data. it is calculated after eliminating a proportion of the largest and smallest values, where the trimming proportion $\alpha\in[0,0.5)$ (maronna et al. 2006). circular data, however, have no maximum or minimum, because 0 and $2\pi$ denote the same direction. statistical methods designed for linear data therefore cannot be applied directly to circular data, and a circular data set cannot be trimmed according to its largest and smallest values. this section therefore proposes a robust formula for trimming. outliers are expected to lie far from the circular mean (mardia and jupp 2000). however, the circular median is more efficient than the circular mean when the circular data contain outliers (ducharme and milasevic 1987). it is therefore proposed to use the circular distance between each observation and the circular median as the measure for trimming the circular data. mahmood et al. (2016) proposed the rcdu statistic to detect outliers in univariate circular data. they suggested calculating the circular distance dist(i) as follows:
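if $0\le\text{med}\le\pi$:
$$\text{dist}(i)=\begin{cases}|\theta_i-\text{med}| & \text{if } |\theta_i-\text{med}|\le\pi\\ 2\pi-\theta_i+\text{med} & \text{if } |\theta_i-\text{med}|>\pi\end{cases}\tag{6}$$
if $\pi<\text{med}\le 2\pi$:
$$\text{dist}(i)=\begin{cases}|\theta_i-\text{med}| & \text{if } |\theta_i-\text{med}|\le\pi\\ 2\pi-\text{med}+\theta_i & \text{if } |\theta_i-\text{med}|>\pi\end{cases}\tag{7}$$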
the cut-off point is given as cut rcdu = max(dist). hence, it is proposed to trim any circular data point whose distance is greater than the cut-off point.
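the trimming rule can be sketched in python as follows. the sketch assumes that the median direction `med` and the cut-off point `cutoff` have already been obtained and are passed in by the caller; the function names are hypothetical, and angles are taken to be in radians on [0, 2π).

```python
import numpy as np

def circ_dist(theta, med):
    """Circular distance of each observation from the median, eqs. (6) and (7)."""
    theta = np.asarray(theta, dtype=float)
    diff = np.abs(theta - med)
    if med <= np.pi:
        wrapped = 2 * np.pi - theta + med       # second case of eq. (6)
    else:
        wrapped = 2 * np.pi - med + theta       # second case of eq. (7)
    return np.where(diff <= np.pi, diff, wrapped)

def mean_direction(theta):
    """Mean direction in [0, 2*pi), computed with arctan2."""
    theta = np.asarray(theta, dtype=float)
    return float(np.arctan2(np.sin(theta).sum(), np.cos(theta).sum()) % (2 * np.pi))

def trimmed_mean_direction(theta, med, cutoff):
    """Drop points farther than `cutoff` from `med`, then average the rest."""
    theta = np.asarray(theta, dtype=float)
    keep = circ_dist(theta, med) <= cutoff
    return mean_direction(theta[keep]), keep
```

both branches give the length of the shorter arc between an observation and the median, so an observation at 6.2 is treated as close to a median of 0.2 even though the two values are far apart on the linear scale.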
results and discussion
simulation
in this section, simulation studies are conducted to examine the performance of the trimmed mean direction under the proposed trimming method. the data are simulated from the von mises distribution with mean direction 0 and five values of the concentration parameter (k = 2, 4, 6, 8 and 10) for two sample sizes, n = 20 and 60. three cases are examined: clean data (without outliers), 5% contamination and 10% contamination. a circular observation $\theta$ is contaminated according to the following formula:
$$\theta_c=\theta+\lambda\pi \pmod{2\pi}\tag{8}$$
where $\lambda$ is the degree of contamination ($0\le\lambda\le 1$).
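one way to generate such samples is sketched below in python, using numpy's von mises generator. the text does not say which observations are contaminated or which value of the degree of contamination was used, so the random choice of contaminated positions, the rounding of the number of outliers, the seed and the function name are assumptions of this sketch.

```python
import numpy as np

def simulate_contaminated(n, k, prop, lam, mu=0.0, seed=0):
    """Draw n von Mises angles and shift a proportion `prop` of them by lam*pi."""
    rng = np.random.default_rng(seed)
    clean = rng.vonmises(mu, k, size=n) % (2 * np.pi)   # map to [0, 2*pi)
    m = int(round(prop * n))                            # number of outliers
    idx = rng.choice(n, size=m, replace=False)          # assumed: random positions
    contaminated = clean.copy()
    contaminated[idx] = (clean[idx] + lam * np.pi) % (2 * np.pi)   # eq. (8)
    return clean, contaminated, idx
```

with n = 20 and 5% contamination this shifts one observation, and with n = 60 and 10% contamination it shifts six.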
two measures are considered to assess performance: the estimated bias,
$$\text{estimated bias}=|\hat{\mu}-\mu|,$$
and the circular mean deviation,
$$\text{cmd}=\pi-\frac{1}{n}\sum_{i=1}^{n}\big|\pi-|\theta_i-\hat{\mu}|\big|.$$
the results are compared with those of the mean direction, the median direction and the m estimator. the simulation is repeated 10000 times. the results for n = 20 and n = 60 are shown in figures 1 and 2 respectively. for clean data, the estimators are unbiased, with the exception of the m estimator, and their cmd values do not differ. in the presence of outliers, however, the mean direction, the median direction and the m estimator are biased in both the estimate and the cmd, and the bias increases with the proportion of contamination. in contrast, for k > 3 the trimmed mean direction shows no bias in the estimate or in the cmd in all cases. hence, the proposed trimming formula is successful in estimating the circular location parameter of univariate circular data, both for clean data and in the presence of outliers.
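the two measures can be written directly in python. the sketch below follows the formulas literally, so the bias is the plain absolute difference and is not wrapped around the circle; the function names are hypothetical.

```python
import numpy as np

def estimated_bias(mu_hat, mu):
    """Estimated bias |mu_hat - mu| (radians), as written in the text."""
    return abs(mu_hat - mu)

def cmd(theta, mu_hat):
    """Circular mean deviation: pi - mean(|pi - |theta_i - mu_hat||)."""
    theta = np.asarray(theta, dtype=float)
    return float(np.pi - np.mean(np.abs(np.pi - np.abs(theta - mu_hat))))
```

the inner expression turns each linear difference into the length of the shorter arc, so the cmd is the average arc distance between the observations and the estimate.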
fig. 1 bias and cmd for clean and contaminated data (5% and 10% contamination), n = 20
fig. 2 bias and cmd for clean and contaminated data (5% and 10% contamination), n = 60
example
data on the directions of 14 frogs were collected from the mud flats of an abandoned stream meander near indianola, mississippi. these data were analysed by collett (1980), who identified observation 14 as an outlier. at the 5% level, with estimated concentration parameter $\hat{k}=2.18$, the cut-off point is approximately 2.86. the estimates given by the mean direction with the outlier, the median direction, the m estimator, the mean direction without the outlier and the trimmed mean direction, together with their cmd values, are given in table 1. the results differ noticeably because of the effect of the outlier. moreover, the trimmed mean direction of the original data gives the same results as the mean direction after the outlier is deleted.
table 1 comparing measures of circular location parameters and cmd (frog data)
method                   data                          estimation   cmd
mean direction           original (with outlier)       2.55         0.653
mean direction           modified (without outlier)    2.53         0.471
median direction         original (with outlier)       2.45         0.651
median direction         modified (without outlier)    2.37         0.479
m estimator              original (with outlier)       2.54         0.651
m estimator              modified (without outlier)    2.51         0.470
trimmed mean direction   original (with outlier)       2.53         0.471
trimmed mean direction   modified (without outlier)    2.53         0.471
conclusion
in practice, circular data are commonly contaminated with outliers. in the presence of outliers, the mean direction is not a consistent or efficient estimator of the circular location parameter. to overcome this problem, methods for detecting outliers and robust estimation methods have been proposed. this paper has suggested a robust trimming formula for calculating the trimmed mean direction. the performance of the trimmed mean direction was examined in simulation studies on clean and contaminated data and on a real data set, and compared with that of the mean direction, the median direction and the m estimator. the trimmed mean direction performed well in all cases and gave the best results according to the measures used in this paper.
references
abuzaid, a.h. 2012. analysis of mother's day celebration via circular statistics. the philippine statistician 61: 39–52.
barnett, v. and lewis, t. 1978. outliers in statistical data. new york and london: wiley.
collett, d. 1980. outliers in circular data. applied statistics 29: 50–57.
ducharme, g.r. and milasevic, p. 1987. some asymptotic properties of the circular median. communications in statistics – theory and methods 16(3): 659–664.
he, x. and simpson, d.g. 1992. robust direction estimation. the annals of statistics 20(1): 351–369.
jammalamadaka, s.r. and sengupta, a. 2001. topics in circular statistics. singapore: world scientific publishing.
kato, s. and eguchi, s. 2016. robust estimation of location and concentration parameters for the von mises–fisher distribution. statistical papers 57: 205–234.
laha, a.k. and mahesh, k.c. 2015. robustness of tests for directional data. statistics 49(3): 522–536.
lenth, r.v. 1981. robust measures of location for directional data. technometrics 23(1): 77–81.
mahmood, e.a., rana, s., hussin, a.g. and midi, h. 2016. detection of outliers in univariate circular data using robust circular distance. submitted to journal of modern applied statistical methods.
mardia, k.v. and jupp, p.e. 2000. directional statistics. chichester: john wiley & sons ltd.
maronna, r.a., martin, r.d. and yohai, v.j. 2006. robust statistics: theory and methods. chichester: john wiley & sons ltd.