21. Egghe, L.: Mathematical study of h-index sequences.
In: Information processing and management. 45(2009) no.2, pp.288-297.
Abstract: This paper studies mathematical properties of h-index sequences as developed by Liang [Liang, L. (2006). h-Index sequence and h-index matrix: Constructions and applications. Scientometrics, 69(1), 153-159]. For practical reasons, Liang studies such sequences where time goes backwards, although it is more logical to let time go forward (real career periods). Both types of h-index sequences are studied here and their interrelations are revealed. We show cases where these sequences are convex, linear and concave. We also show that, when one of the sequences is convex, the other one is concave, showing that the reverse-time sequence, in general, cannot be used to derive similar properties of the (difficult to obtain) forward-time sequence. We show that both sequences are the same if and only if the author produces the same number of papers per year. If the author produces an increasing number of papers per year, then Liang's h-sequences are above the "normal" ones. All these results are also valid for g- and R-sequences. The results are confirmed by the h-, g- and R-sequences (forward and reverse time) of the author.
Subject area: Informetrics
Object: h-index

22. Egghe, L. ; Rousseau, R.: An h-index weighted by citation impact.
In: Information processing and management. 44(2008) no.2, pp.770-780.
Abstract: An h-type index is proposed which depends on the citations received by the articles belonging to the h-core. This weighted h-index, denoted as h_w, is presented in a continuous setting and in a discrete one. It is shown that in a continuous setting the new index enjoys many good properties. In the discrete setting some small deviations from the ideal may occur.
Object: h-index

23. Egghe, L. ; Ravichandra Rao, I.K.: Study of different h-indices for groups of authors.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.8, pp.1276-1281.
Abstract: In this article, for any group of authors, we define three different h-indices. First, there is the successive h-index h_2 based on the ranked list of authors and their h-indices h_1 as defined by Schubert (2007). Next, there is the h-index h_P based on the ranked list of authors and their number of publications. Finally, there is the h-index h_C based on the ranked list of authors and their number of citations. We present formulae for these three indices in Lotkaian informetrics from which it also follows that h_2 < h_P < h_C. We give a concrete example of a group of 167 authors on the topic of optical flow estimation. Besides these three h-indices, we also calculate the two-by-two Spearman rank correlation coefficients and prove that these rankings are significantly related.
Subject area: Informetrics
Object: h-index

24. Egghe, L.: The influence of transformations on the h-index and the g-index.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.8, pp.1304-1312.
Abstract: In a previous article, we introduced a general transformation on sources and one on items in an arbitrary information production process (IPP). In this article, we investigate the influence of these transformations on the h-index and on the g-index. General formulae that describe this influence are presented. These are applied to the case where the size-frequency function is Lotkaian (i.e., is a decreasing power function). We further show that the h-index of the transformed IPP belongs to the interval bounded by the two transformations of the h-index of the original IPP, and we also show that this property does not hold for the g-index.
Subject area: Informetrics
Object: h-index ; g-index

25. Egghe, L. ; Liang, L. ; Rousseau, R.: Fundamental properties of rhythm sequences.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.9, pp.1469-1478.
Abstract: Fundamental mathematical properties of rhythm sequences are studied. In particular, a set of three axioms for valid rhythm indicators is proposed, and it is shown that the R-indicator satisfies only two out of three but that the R′-indicator satisfies all three. This fills a critical, logical gap in the study of these indicator sequences. Matrices leading to a constant R′-sequence are called baseline matrices. They are characterized as matrices with constant w-year diachronous impact factors. The relation with classical impact factors is clarified. Using regression analysis, matrices with a rhythm sequence that is on average equal to 1 (smaller than 1, larger than 1) are characterized.
Subject area: Informetrics

26. Egghe, L.: Mathematical theory of the h- and g-index in case of fractional counting of authorship.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.10, pp.1608-1616.
Abstract: This article studies the h-index (Hirsch index) and the g-index of authors in the case where authorship of the cited articles is counted in a fractional way. There are two ways to do this: one counts the citations to these papers in a fractional way, or one counts the ranks of the papers in a fractional way as credit for an author. In both cases, we define the fractional h- and g-indexes, and we present inequalities (both upper and lower bounds) between these fractional h- and g-indexes and their corresponding unweighted values (also involving, of course, the coauthorship distribution). Wherever applicable, examples and counterexamples are provided. In a concrete example (the publication citation list of the present author), we make explicit calculations of these fractional h- and g-indexes and show that they are not very different from the unweighted ones.
Subject area: Informetrics
Object: h-index ; g-index

27. Egghe, L. ; Ravichandra Rao, I.K.: The influence of the broadness of a query of a topic on its h-index : models and examples of the h-index of N-grams.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.10, pp.1688-1693.
(Brief communication)
Abstract: The article studies the influence of the query formulation of a topic on its h-index. In order to generate pure random sets of documents, we used N-grams (N variable) to measure this influence: strings of zeros, truncated at the end. The databases used are WoS and Scopus. The formula h = T**(1/alpha), proved in Egghe and Rousseau (2006), where T is the number of retrieved documents and alpha is Lotka's exponent, is confirmed as being a concavely increasing function of T. We also give a formula for the relation between h and N, the length of the N-gram: h = D*10**(-N/alpha), where D is a constant; this is a convexly decreasing function, which is found in our experiments. Nonlinear regression on h = T**(1/alpha) gives an estimate of alpha, which can then be used to estimate the h-index of the entire database (Web of Science [WoS] and Scopus): h = S**(1/alpha), where S is the total number of documents in the database.
Subject area: Informetrics
Object: h-index
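
Illustration: the two relations quoted in the abstract of record 27 can be evaluated directly. The following is a minimal Python sketch, not taken from the article. The function names h_from_T and h_from_N are hypothetical, and the negative exponent in h = D*10**(-N/alpha) is an assumption inferred from the abstract's description of h as a decreasing function of N.

```python
def h_from_T(T, alpha):
    """h-index predicted from T retrieved documents: h = T**(1/alpha)."""
    return T ** (1.0 / alpha)


def h_from_N(D, N, alpha):
    """h-index predicted from N-gram length N: h = D * 10**(-N/alpha).

    The minus sign is an assumption (see lead-in); D is a constant
    that would have to be fitted to data.
    """
    return D * 10.0 ** (-N / alpha)


if __name__ == "__main__":
    # For alpha > 1, h grows with T but ever more slowly (concave increase).
    for T in (100, 1000, 10000):
        print(T, h_from_T(T, alpha=2.0))
    # Each extra character in the N-gram shrinks h by a factor 10**(1/alpha).
    for N in (1, 2, 3):
        print(N, h_from_N(D=50.0, N=N, alpha=2.0))
```

The values alpha = 2.0 and D = 50.0 are placeholders chosen only to make the shapes of the two curves visible.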

28. Egghe, L.: The measures precision, recall, fallout and miss as a function of the number of retrieved documents and their mutual interrelations.
In: Information processing and management. 44(2008) no.2, pp.856-876.
Abstract: In this paper, for the first time, we present global curves for the measures precision, recall, fallout and miss as a function of the number of retrieved documents. Different curves apply for different retrieval systems, for which we give exact definitions in terms of a retrieval density function: perverse retrieval, perfect retrieval, random retrieval and normal retrieval. This extends results of Buckland and Gey and of Egghe in the following sense: mathematically more advanced methods yield a better insight into these curves, more types of retrieval are considered and, very importantly, the theory is developed for the "complete" set of measures: precision, recall, fallout and miss. Next we study the interrelationships between precision, recall, fallout and miss in these different types of retrieval, again extending results of Buckland and Gey (incl. a correction) and of Egghe. In the case of normal retrieval we prove that precision as a function of recall and recall as a function of miss are concavely decreasing relationships, while recall as a function of fallout is a concavely increasing relationship. We also show, by producing examples, that the relationships between fallout and precision, miss and precision, and miss and fallout are not always convex or concave.

29. Egghe, L.: A model for the size-frequency function of coauthor pairs.
In: Journal of the American Society for Information Science and Technology. 59(2008) no.13, pp.2133-2137.
Abstract: Lotka's law was formulated to describe the number of authors with a certain number of publications. Empirical results (Morris & Goldstein, 2007) indicate that Lotka's law is also valid if one counts the number of publications of coauthor pairs. This article gives a simple model proving this to be true, with the same Lotka exponent, if the number of coauthored papers is proportional to the number of papers of the individual coauthors. Under the assumption that this number of coauthored papers is more than proportional to the number of papers of the individual authors (to be explained in the article), we can prove that the size-frequency function of coauthor pairs is Lotkaian with an exponent that is higher than that of the Lotka function of individual authors, a fact that is confirmed in experimental results.
Subject area: Informetrics
Object: Lotka's law

30. Egghe, L. ; Rousseau, R. ; Rousseau, S.: TOP-curves.
In: Journal of the American Society for Information Science and Technology. 58(2007) no.6, pp.777-785.
Abstract: Several characteristics of classical Lorenz curves make them unsuitable for the study of a group of top-performers. TOP-curves, defined as a kind of mirror image of TIP-curves used in poverty studies, are shown to possess the properties necessary for adequate empirical ranking of various data arrays, based on the properties of the highest performers (i.e., the core). TOP-curves and essential TOP-curves, also introduced in this article, simultaneously represent the incidence, intensity, and inequality among the top. It is shown that the TOP-dominance partial order, introduced in this article, is stronger than the Lorenz dominance order. In this way, this article contributes to the study of cores, a central issue in applied informetrics.
Subject area: Informetrics

31. Egghe, L.: Dynamic h-index : the Hirsch index in function of time.
In: Journal of the American Society for Information Science and Technology. 58(2007) no.3, pp.452-454.
Abstract: Given a group of articles and a fixed present time, we can determine the unique number h such that h articles received h or more citations while the other articles received no more than h citations each. In this article, the time dependence of the h-index is determined. This is important to describe the expected career evolution of a scientist's work or of a journal's production in a fixed year.
Subject area: Informetrics
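
Illustration: the definition of h given in the abstract of record 31 can be turned into a few lines of code. The following is a minimal Python sketch for a fixed point in time; the function name h_index and the sample citation counts are hypothetical, and the time-dependent model studied in the article is not reproduced here.

```python
def h_index(citations):
    """Return the largest h such that h articles have at least h citations each."""
    # Rank articles from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank   # the article at this rank still has enough citations
        else:
            break      # all later articles have even fewer citations
    return h


if __name__ == "__main__":
    # Made-up citation counts for five articles.
    print(h_index([10, 8, 5, 4, 3]))
```

Recomputing h_index on the citation counts observed at successive dates would give an empirical h-sequence for one author or journal.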

32. Egghe, L.: Untangling Herdan's law and Heaps' law : mathematical and informetric arguments.
In: Journal of the American Society for Information Science and Technology. 58(2007) no.5, pp.702-709.
Abstract: Herdan's law in linguistics and Heaps' law in information retrieval are different formulations of the same phenomenon. Stated briefly and in linguistic terms, they say that vocabulary size is a concavely increasing power law of text size. This study investigates these laws from a purely mathematical and informetric point of view. A general informetric argument shows that the problem of proving these laws is, in fact, ill-posed. Using the more general terminology of sources and items, the author shows by presenting exact formulas from Lotkaian informetrics that the total number T of sources is not only a function of the total number A of items, but is also a function of several parameters (e.g., the parameters occurring in Lotka's law). Consequently, it is shown that a fixed T (or A) value can lead to different possible A (respectively, T) values. Limiting the T(A)-variability to increasing samples (e.g., in a text, as done in linguistics), the author then shows, in a purely mathematical way, that for large sample sizes T ~ A**phi, where phi is a constant, phi < 1 but close to 1; hence, roughly, Heaps' or Herdan's law can be proved without using any linguistic or informetric argument. The author also shows that for smaller samples, phi is not a constant but essentially decreases, as confirmed by practical examples. Finally, an exact informetric argument on random sampling in the items shows that, in most cases, T = T(A) is a concavely increasing function, in accordance with practical examples.
Subject area: Informetrics
Object: Herdan's law ; Heaps' law

33. Egghe, L.: Existence theorem of the quadruple (P, R, F, M) : precision, recall, fallout and miss.
In: Information processing and management. 43(2007) no.1, pp.265-272.
Abstract: In an earlier paper [Egghe, L. (2004). A universal method of information retrieval evaluation: the "missing" link M and the universal IR surface. Information Processing and Management, 40, 21-30] we showed that, given an IR system, and if P denotes precision, R recall, F fallout and M miss (re-introduced in the paper mentioned above), we have the following relationship between P, R, F and M: P/(1-P) * (1-R)/R * F/(1-F) * (1-M)/M = 1. In this paper we prove the (more difficult) converse: given any four rational numbers in the interval ]0, 1[ satisfying the above equation, there exists an IR system such that these four numbers (in any order) are the precision, recall, fallout and miss of this IR system. As a consequence we show that any three rational numbers in ]0, 1[ represent any three measures taken from precision, recall, fallout and miss of a certain IR system. We also show that this result holds for two numbers instead of three.
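
Illustration: the relationship between P, R, F and M quoted in the abstract of record 33 can be checked on a single retrieval result. The following is a minimal Python sketch under stated assumptions: the four counts a, b, c, d (retrieved relevant, retrieved non-relevant, non-retrieved relevant, non-retrieved non-relevant) and the function names are hypothetical, and miss is taken to be the fraction of non-retrieved documents that are relevant, which is an assumption not spelled out in the abstract.

```python
from fractions import Fraction


def prfm(a, b, c, d):
    """Precision, recall, fallout and miss from the four cell counts.

    a: retrieved and relevant        b: retrieved, not relevant
    c: not retrieved, relevant       d: not retrieved, not relevant
    """
    P = Fraction(a, a + b)  # precision
    R = Fraction(a, a + c)  # recall
    F = Fraction(b, b + d)  # fallout
    M = Fraction(c, c + d)  # miss (assumed definition, see lead-in)
    return P, R, F, M


def identity_product(P, R, F, M):
    """Left-hand side of P/(1-P) * (1-R)/R * F/(1-F) * (1-M)/M = 1."""
    return (P / (1 - P)) * ((1 - R) / R) * (F / (1 - F)) * ((1 - M) / M)


if __name__ == "__main__":
    # Made-up counts; all four must be positive so that P, R, F, M lie in ]0, 1[.
    P, R, F, M = prfm(30, 20, 10, 940)
    print(P, R, F, M, identity_product(P, R, F, M))
```

Exact rational arithmetic is used so that the product comes out as exactly 1 rather than as a floating-point approximation.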

34. Egghe, L.: Expansion of the field of informetrics : the second special issue.
In: Information processing and management. 42(2006) no.6, pp.1405-1407.
Note: Introduction to a "Special Issue on Informetrics"
Subject area: Informetrics

35. Egghe, L.: Empirical and combinatorial study of country occurrences in multi-authored papers.
In: Information - Wissenschaft und Praxis. 57(2006) no.8, pp.427-432.
Abstract: Papers written by several authors can be classified according to the countries of the author affiliations. The empirical part of this paper consists of two datasets. One dataset consists of 1,035 papers retrieved via the search "pedagog*" in the years 2004 and 2005 (up to October) in Academic Search Elite, which is a case where phi(m) = the number of papers with m = 1, 2, 3, ... authors is decreasing; hence most of the papers have a low number of authors. Here we find that #j,m = the number of times a country occurs j times in an m-authored paper, j = 1, ..., m-1, is decreasing and that #m,m is much higher than all the other #j,m values. The other dataset consists of 3,271 papers retrieved via the search "enzyme" in the year 2005 (up to October) in the same database, which is a case of a non-decreasing phi(m): most papers have 3 or 4 authors and we even find many papers with a much higher number of authors. In this case we show again that #m,m is much higher than the other #j,m values but that #j,m is no longer decreasing in j = 1, ..., m-1, although #1,m is (apart from #m,m) the largest number amongst the #j,m. The combinatorial part gives a proof of the fact that #j,m decreases for j = 1, ..., m-1, supposing that all cases are equally possible. This shows that the first dataset conforms more closely to this model than the second dataset. Explanations for these findings are given. From the data we also find the (we think: new) distribution of the number of papers with n = 1, 2, 3, ... countries (i.e. where there are n different countries involved amongst the m (>= n) authors of a paper): a fast decreasing function, e.g. a power law with a very large Lotka exponent.
Subject area: Informetrics

36. Egghe, L.: Properties of the n-overlap vector and n-overlap similarity theory.
In: Journal of the American Society for Information Science and Technology. 57(2006) no.9, pp.1165-1177.
Abstract: In the first part of this article the author defines the n-overlap vector whose coordinates consist of the fraction of the objects (e.g., books, N-grams, etc.) that belong to 1, 2, ..., n sets (more generally: families) (e.g., libraries, databases, etc.). With the aid of the Lorenz concentration theory, a theory of n-overlap similarity is conceived together with corresponding measures, such as the generalized Jaccard index (generalizing the well-known Jaccard index in case n = 2). Next, the distributional form of the n-overlap vector is determined assuming certain distributions of the object and of the set (family) sizes. In this section the decreasing power law and decreasing exponential distribution are explained for the n-overlap vector. Both item (token) n-overlap and source (type) n-overlap are studied. The n-overlap properties of objects indexed by a hierarchical system (e.g., books indexed by numbers from a UDC or Dewey system or by N-grams) are presented in the final section. The author shows how the results given in the previous section can be applied as well as how the Lorenz order of the n-overlap vector is respected by an increase or a decrease of the level of refinement in the hierarchical system (e.g., the value N in N-grams).

37. Egghe, L.: Expansion of the field of informetrics : origins and consequences.
In: Information processing and management. 41(2005) no.6, pp.1311-1316.
Note: Introduction to a "Special Issue on Informetrics"
Subject area: Informetrics

38. Egghe, L.: Relations between the continuous and the discrete Lotka power function.
In: Journal of the American Society for Information Science and Technology. 56(2005) no.7, pp.664-668.
Abstract: The discrete Lotka power function describes the number of sources (e.g., authors) with n = 1, 2, 3, ... items (e.g., publications). As in econometrics, informetrics theory requires functions of a continuous variable j, replacing the discrete variable n. Now j represents item densities instead of number of items. The continuous Lotka power function describes the density of sources with item density j. The discrete Lotka function one obtains from data, obtained empirically; the continuous Lotka function is the one needed when one wants to apply Lotkaian informetrics, i.e., to determine properties that can be derived from the (continuous) model. It is, hence, important to know the relations between the two models. We show that the exponents of the discrete Lotka function (if not too high, i.e., within limits encountered in practice) and of the continuous Lotka function are approximately the same. This is important to know in applying theoretical results (from the continuous model), derived from practical data.
Subject area: Informetrics
Object: Lotka's law

39. Egghe, L.: The power of power laws and an interpretation of Lotkaian informetric systems as self-similar fractals.
In: Journal of the American Society for Information Science and Technology. 56(2005) no.7, pp.669-675.
Abstract: Power laws as defined in 1926 by A. Lotka are increasing in importance because they have been found valid in varied social networks including the Internet. In this article some unique properties of power laws are proven. They are shown to characterize functions with the scale-free property (also called self-similarity property) as well as functions with the product property. Power laws have other desirable properties that are not shared by exponential laws, as we indicate in this paper. Specifically, Naranan (1970) proves the validity of Lotka's law based on the exponential growth of articles in journals and of the number of journals. His argument is reproduced here and a discrete-time argument is also given, yielding the same law as that of Lotka. This argument makes it possible to interpret the information production process as a self-similar fractal and show the relation between Lotka's exponent and the (self-similar) fractal dimension of the system. Lotkaian informetric systems are self-similar fractals, a fact revealed by Mandelbrot (1977) in relation to nature, but this is also true for random texts, which exemplify a very special type of informetric system.
Subject area: Informetrics
Object: Lotka's law

40. Egghe, L.: Zipfian and Lotkaian continuous concentration theory.
In: Journal of the American Society for Information Science and Technology. 56(2005) no.9, pp.935-945.
Abstract: In this article concentration (i.e., inequality) aspects of the functions of Zipf and of Lotka are studied. Since both functions are power laws (i.e., they are mathematically the same), it suffices to develop one concentration theory for power laws and apply it twice for the different interpretations of the laws of Zipf and Lotka. After a brief repetition of the functional relationships between Zipf's law and Lotka's law, we prove that Price's law of concentration is equivalent to Zipf's law. A major part of this article is devoted to the development of continuous concentration theory, based on Lorenz curves. The Lorenz curve for power functions is calculated and, based on this, so are some important concentration measures such as those of Gini and Theil and the variation coefficient. Using Lorenz curves, it is shown that the concentration of a power law increases with its exponent, and this result is interpreted in terms of the functions of Zipf and Lotka.
Subject area: Informetrics
Object: Zipf's law ; Lotka's law