Extracting bibliographic information

Copyright 2016, Pedro Belin Castellucci

This work is licensed under a Creative Commons Attribution 4.0 International License.

BibTeX is a reference management software for formatting lists of references. The BibTeX tool is typically used together with the LaTeX document preparation system (...). The name is a portmanteau of the word bibliography and the name of the TeX typesetting software.

BibTeX makes it easy to cite sources in a consistent manner, by separating bibliographic information from the presentation of this information, similarly to the separation of content and presentation/style supported by LaTeX itself. (Source: Wikipedia)

One of the file extensions associated with BibTeX is .bib. In this notebook we will parse a .bib file to build a profile of publications over time. This will motivate us to learn a bit about strings, files and dictionaries in Julia.

Strings

Strings in Julia are written between double quotes. Single quotes are reserved for single characters (Char).


In [25]:
exampleString = "Hello"


Out[25]:
"Hello"

We can check its type.


In [26]:
typeof(exampleString)


Out[26]:
String

It is also possible to iterate over strings in the same way we iterate over arrays. In fact, a string can be thought of as a sequence of characters (Char).


In [27]:
for c in exampleString
    println(c, " is a ", typeof(c))
end


H is a Char
e is a Char
l is a Char
l is a Char
o is a Char
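To make the "sequence of characters" view concrete, here is a minimal sketch that turns a string into an actual array of Char values and accesses individual characters. The variable names are only illustrative, and the indexing shown assumes plain ASCII text such as this example; Julia indexes strings by byte position, so non-ASCII text needs more care.

```julia
word = "Hello"

# collect gathers the characters of the string into an array of Char.
chars = collect(word)
println(chars)

println(length(word))  # number of characters in the string
println(word[1])       # first character (indices start at 1)
println(word[end])     # last character
```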

Concatenation and methods contains, strip, split and parse

We will not dig deep into strings in Julia, but will take a look at the following methods: contains, strip, split and parse. Later, we will use the contains method to check if a string contains a certain substring. Before that, let us show how to concatenate two strings.


In [28]:
"Hello, " * "Julia!"


Out[28]:
"Hello, Julia!"

Now, an example of the contains method.


In [29]:
println(contains("Hello", "el"))
println(contains("Hello", "EL"))


true
false
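Since the comparison is case sensitive, one possible way to search while ignoring case is to lowercase both strings first. This is only a sketch with illustrative variable names. It assumes a Julia version where contains(haystack, needle) is available; on versions 0.7 to 1.4 the equivalent call is occursin(needle, haystack).

```julia
haystack = "Hello"
needle = "EL"

# Lowercase both sides so that the comparison ignores case.
found = contains(lowercase(haystack), lowercase(needle))
println(found)
```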

The strip method removes characters from both ends of a string. By default it removes whitespace (e.g. '\n', '\t', ' '), but we can also specify the characters to be removed. The signatures of these two variants can be listed with methods.


In [30]:
methods(strip)


Out[30]:
2 methods for generic function strip:

Let us see an example of the version without additional arguments.


In [31]:
stringExample = "  string to be stripped     \n"
strip(stringExample)


Out[31]:
"string to be stripped"

For the second use, we can specify the characters which will be stripped.


In [32]:
stringExample = "pppHello, World!..."
strip(stringExample, ['p', '.'])


Out[32]:
"Hello, World!"

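When only one side of the string should be cleaned, the related functions lstrip and rstrip can be used in the same way. The sketch below reuses the example string from above; the names leftOnly and rightOnly are only illustrative.

```julia
stringExample = "pppHello, World!..."

# lstrip removes the given characters from the left end only.
leftOnly = lstrip(stringExample, ['p', '.'])
println(leftOnly)

# rstrip removes the given characters from the right end only.
rightOnly = rstrip(stringExample, ['p', '.'])
println(rightOnly)
```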
The split method is used to separate the "words" of a "phrase". By default it splits on whitespace.


In [33]:
split("We have a phrase here.")


Out[33]:
5-element Array{SubString{String},1}:
 "We"    
 "have"  
 "a"     
 "phrase"
 "here." 

But we can also specify the separator between the "words". Let us see an example of that:


In [34]:
split("Now, let us separate the phrases. We will use the dot as separator. That's all we need", '.')


Out[34]:
3-element Array{SubString{String},1}:
 "Now, let us separate the phrases" 
 " We will use the dot as separator"
 " That's all we need"              

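Note that the pieces returned by split keep the space that followed each dot. A possible way to clean them, sketched here with illustrative variable names, is to apply strip to every piece.

```julia
text = "Now, let us separate the phrases. We will use the dot as separator. That's all we need"

# Split on the dot, then strip the surrounding whitespace of each piece.
phrases = [strip(p) for p in split(text, '.')]

for p in phrases
    println(p)
end
```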
Finally, the parse method is used to convert a string into a value of another type, such as a number. We can check all its uses with methods.


In [35]:
methods(parse)


Out[35]:
14 methods for generic function parse:

Let us see a simple example.


In [36]:
stringExample = "123456"
println(typeof(stringExample))

intValue = parse(Int64, stringExample)
println(typeof(intValue))


String
Int64
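The four methods can be combined. As a rough sketch, and not necessarily the approach taken later in this notebook, the snippet below extracts an integer from a hypothetical line in the style of a .bib field. The sample line and the variable names are assumptions made for illustration, and contains is assumed to be available in the Julia version in use.

```julia
line = "year = {2014}"

if contains(line, "year")
    # Separate the field name from its value at the equals sign.
    field, value = split(line, '=')
    # Remove spaces, braces and a possible trailing comma around the number.
    yearValue = parse(Int64, strip(value, [' ', '{', '}', ',']))
    println(yearValue)
end
```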

Exercise: Using the functions we have just seen, write code to perform the following tasks:

  1. Remove the leading and trailing zeros of an integer number provided as a string.
  2. Separate the user and the domain of an e-mail address provided as user@domain.com.
  3. Build the e-mail address (user@domain.com) given the user and the domain.

In [41]:
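# 1. Remove leading and trailing zeros.
intExample = "000123000"
println(strip(intExample, '0'))

# 2. Separate user and domain (assumes the address ends in ".com").
userDomain = "user@domain.com"
user, domainCom = split(userDomain, '@')
domain = domainCom[1:end-4]
println(user)
println(domain)

# 3. Build the address back from its parts.
println(user * "@" * domain * ".com")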


123
user
domain
user@domain.com
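The solution above drops the last four characters of the domain, which only works when the address ends in ".com". A possible generalisation, sketched here with illustrative names, splits the domain at its last dot using rsplit with a limit of two pieces.

```julia
address = "user@domain.com"

# Separate the user from the rest of the address.
user, host = split(address, '@')

# rsplit starts from the right; limit=2 splits only at the last dot.
name, suffix = rsplit(host, '.'; limit=2)
println(name)
println(suffix)

# Rebuild the address from its three parts.
println(user * "@" * name * "." * suffix)
```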

We will use what we have learnt so far to help us parse a .bib file. But first we need to learn a bit about reading files.

Reading files

The safest way to work with files in Julia is an open ... do ... end block, which closes the file automatically when the block finishes, even if an error occurs inside it.


In [ ]:
open("example.bib") do fd
    # Do something with the file...
end
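To see what the block form takes care of, the sketch below compares it with opening and closing a file by hand. To keep the example self-contained it first writes a small hypothetical file to a temporary path; the file contents and the variable names are assumptions for illustration only.

```julia
# Hypothetical sample file, written only to make this sketch self-contained.
path = tempname()
open(path, "w") do fd
    write(fd, "@article{Key2014,\nyear = {2014}\n}\n")
end

# Manual version: we must remember to close the file ourselves.
fd = open(path)
try
    println(readline(fd))
finally
    close(fd)  # runs even if reading fails
end

# Block version: the file is closed automatically when the block ends.
# The value of the last expression in the block is returned by open.
nLines = open(path) do fd
    n = 0
    for line in eachline(fd)
        n += 1
    end
    n
end
println(nLines)

rm(path)  # remove the temporary file
```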

Here, we will read the file example.bib line by line.


In [42]:
open("example.bib") do fd
    for line in eachline(fd)
        # chomp makes this work whether or not eachline keeps the trailing newline
        println(chomp(line))
    end
end


@article{Aqlan2014,
author = {Aqlan, Faisal and Lam, Sarah S. and Ramakrishnan, Sreekanth},
issn = {09255273},
journal = {International Journal of Production Economics},
keywords = {Configure-to-order production,Dynamic batch size,Optimization,Process layout,Product layout,Server manufacturing,Simulation},
month = feb,
pages = {51--61},
publisher = {Elsevier},
title = {{An integrated simulation-optimization study for consolidating production lines in a configure-to-order production environment}},
volume = {148},
year = {2014}
}
@article{Battaia2013,
author = {Batta\"{\i}a, Olga and Dolgui, Alexandre},
journal = {International Journal of Production Economics},
number = {2},
pages = {259--277},
title = {{A taxonomy of line balancing problems and their solution approaches}},
volume = {142},
year = {2013}
}
@article{Bernard2011,
author = {Bernard, Carole and Boyle, Phelim},
journal = {The European Journal of Finance},
number = {3},
pages = {169--196},
title = {{Monte Carlo methods for pricing discrete Parisian options}},
volume = {17},
year = {2011}
}
@article{Brunet2012,
author = {Brunet, Robert and Guill\'{e}n-Gos\'{a}lbez, Gonzalo and P\'{e}rez-Correa, J. Ricardo and Caballero, Jos\'{e} Antonio and Jim\'{e}nez, Laureano},
issn = {00981354},
journal = {Computers \& Chemical Engineering},
keywords = {biotechnological processes,hybrid simulation-optimization,mixed-integer dynamic optimization},
month = feb,
pages = {125--135},
publisher = {Elsevier Ltd},
title = {{Hybrid simulation-optimization based approach for the optimal design of single-product biotechnological processes}},
volume = {37},
year = {2012}
}
@article{Figueira2014,
author = {Figueira, Gon\c{c}alo and Almada-Lobo, Bernardo},
issn = {1569190X},
journal = {Simulation Modelling Practice and Theory},
keywords = {Classification,Hybrid methods,Review,Simulation–optimization,Taxonomy,optimization,simulation},
month = aug,
pages = {118--134},
publisher = {Elsevier B.V.},
title = {{Hybrid simulation-optimization methods: A taxonomy and discussion}},
volume = {46},
year = {2014}
}
@article{Lee2012,
author = {Lee, Choonsik and Kim, Kwang Pyo and Long, Daniel J. and Bolch, Wesley E.},
journal = {Medical physics},
number = {4},
pages = {2129--2146},
title = {{Organ doses for reference pediatric and adolescent patients undergoing computed tomography estimated by Monte Carlo simulation}},
volume = {39},
year = {2012}
}
@article{Ludwig2011,
author = {Ludwig, M. and Huege, T.},
journal = {Astroparticle Physics},
number = {6},
pages = {438--446},
title = {{REAS3: Monte Carlo simulations of radio emission from cosmic ray air showers using an "end-point" formalistm}},
volume = {34},
year = {2011}
}
@inproceedings{Mahajan2004,
author = {Mahajan, Prasad S. and Ingalls, Ricki G.},
booktitle = {Proceedings of the 2004 Winter Simulation Conference},
pages = {663--671},
title = {{Evaluation of methods used to detect warm-up period in steady state simulation}},
year = {2004}
}
@article{Moreira2015,
author = {Moreira, Mayron C\'{e}sar O. and Miralles, Cristobal and Costa, Alysson M.},
journal = {Computers \& Operations Research},
pages = {64--73},
title = {{Model and heuristics for the Assembly Line Worker Integration and Balancing Problem}},
volume = {54},
year = {2015}
}
@incollection{Morettin2003,
author = {Morettin, Pedro A. and Bussab, Wilton de O.},
booktitle = {Estat\'{\i}stica b\'{a}sica},
pages = {323--355},
title = {{Testes de hip\'{o}teses}},
year = {2003}
}
@article{Papadopoulos1996,
author = {Papadopoulos, H T and Heavey, C},
journal = {European Journal of Operational Research},
keywords = {queueing},
number = {1},
pages = {1--27},
title = {{Queueing theory in manufacturing systems analysis and design: A classification of models for production and transfer lines}},
volume = {92},
year = {1996}
}
@article{Srivastav2011,
author = {Srivastav, R K and Srinivasan, K and Sudheer, K P},
isbn = {9791191174},
issn = {0022-1694},
journal = {Journal of Hydrology},
number = {3-4},
pages = {209--225},
publisher = {Elsevier B.V.},
title = {{Simulation optimization framework for multi-season hybrid stochastic models}},
volume = {404},
year = {2011}
}
@article{Tekiner2010,
author = {Tekiner, Hatice and Colt, David W. and Felder, Frank A.},
journal = {Electric Power Systems Research},
number = {12},
pages = {1394--1405},
title = {{Multi-period multi-objective electricity generation expansion planning problem with Monte Carlo simulation}},
volume = {80},
year = {2010}
}
@article{Tempelmeier2001,
author = {Tempelmeier, Horst and B\"{u}rger, Malte},
journal = {IIE Transactions},
number = {4},
pages = {293--302},
title = {{Performance evaluation of unbalanced flow lines with general distributed processing times, failures and imperfect production}},
volume = {33},
year = {2001}
}
@misc{Amorim2011,
abstract = {The pressure of reducing costs in supply chains forces companies to take an integrated view of their production and distribution processes. In perishable goods besides the cost issue there is an important freshness concern that shall not be disregarded. This challenging logistic problem involves several tightly interrelated production planning, scheduling, distribution and routing problems, considering more than one objective. Even when considered as independent from the other ones, each of the mentioned problems is of large combinatorial complexity. This research aims to allow companies to increase the value added by their perishable goods in supply chains through a combination of production and distribution planning on a multi-objective framework solved by hybrid solution approaches. Hence, along with realistic and integrated mathematical programming models, new state-of-the-art hybrid multi-objective optimization algorithms are to be developed.},
author = {Amorim, Pedro Sanches},
pages = {52},
title = {{Research Project Proposal : Multi-Objective Optimization for Integrated Production and Distribution of Perishable Goods}},
year = {2011}
}
@article{AraujoFelipeFB2012,
abstract = {In this article, we introduce two new variants of the Assembly Line Worker Assignment and Balancing Problem (ALWABP) that allow parallelization of and collaboration between heterogeneous workers. These new approaches suppose an additional level of complexity in the Line Design and Assignment process, but also higher flexibility; which may be particularly useful in practical situations where the aim is to progressively integrate slow or limited workers in conventional assembly lines. We present linear models and heuristic procedures for these two new problems. Computational results show the efficiency of the proposed approaches and the efficacy of the studied layouts in different situations.},
author = {{Ara\'{u}jo, Felipe F B} and Costa, Alysson M and Miralles, Crist\'{o}bal},
journal = {International Journal of Production Economics},
keywords = {Assembly line balancing,Disabled integration,Heterogeneous workers},
number = {1},
pages = {483--495},
title = {{Two extensions for the assembly line worker assignment and balancing problem : parallel stations and collaborative approach .}},
volume = {140},
year = {2012}
}
@article{Baren1962,
author = {Barten, K},
journal = {Journal of Industrial Engineering},
number = {6},
pages = {703--710},
title = {{A queuing simulator for determining optimum invetory levels in a sequential process}},
volume = {14},
year = {1962}
}
@article{Boysen2007,
abstract = {Assembly lines are special flow-line production systems which are of great importance in the industrial production of high quantity standardized commodities. Recently, assembly lines even gained importance in low volume production of customized products (mass-customization). Due to high capital requirements when installing or redesigning a line, its con- figuration planning is of great relevance for practitioners. Accordingly, this attracted attention of many researchers, who tried to support real-world configuration planning by suited optimization models (assembly line balancing problems). In spite of the enormous academic effort in assembly line balancing, there remains a considerable gap between requirements of real configuration problems and the status of research. To ease communication between researchers and practitioners, we provide a classification scheme of assembly line balancing. This is a valuable step in identifying remaining research chal- lenges which might contribute to closing the gap.},
author = {Boysen, N and Fliedner, M and Scholl, A},
issn = {03772217},
journal = {European Journal of Operational Research},
keywords = {assembly line balancing,classification,configuration of assembly lines},
month = dec,
number = {2},
pages = {674--693},
title = {{A classification of assembly line balancing problems}},
volume = {183},
year = {2007}
}
@article{Boysen2008,
abstract = {This paper discusses a two stage graph-algorithm, which was designed to solve line balancing problems including prac- tice relevant constraints (GALBP), such as parallel work stations and tasks, cost synergies, processing alternatives, zoning restrictions, stochastic processing times or U-shaped assembly lines. Unlike former procedures, the presented approach can be easily modified to incorporate all of the named extensions. It is not only possible to select and solve single classes of constraints, but rather any combination of them with just slight modifications.},
author = {Boysen, Nils and Fliedner, Malte},
issn = {03772217},
journal = {European Journal of Operational Research},
keywords = {General assembly line balancing (GALBP),Production,Shortest-path algorithm},
month = jan,
number = {1},
pages = {39--56},
title = {{A versatile algorithm for assembly line balancing}},
volume = {184},
year = {2008}
}
@article{Boysen2008a,
abstract = {Assembly lines are flow-line production systems which are of great importance in the industrial production of high quantity standardized commodities and more recently even gained importance in low volume production of customized products. Due to high capital requirements when installing or redesigning a line, configuration planning is of great relevance for practitioners. Accordingly, this attracted the attention of researchers, who tried to support practical configuration planning by suited optimization models. In spite of the great amount of extensions of basic assembly line balancing (ALB) there remains a gap between requirements of real configuration problems and the status of research. This gap might result from research papers focusing on just a single or only a few practical extensions at a time. Real-world assembly systems require a lot of these extensions to be considered simultaneously. This paper structures the vast field of ALB according to characteristic practical settings and highlights relevant model extensions which are required to reflect real-world problems. By doing so, open research challenges are identified and the practitioner is provided with hints on how to single out suited balancing procedures for his type of assembly system.},
author = {Boysen, Nils and Fliedner, Malte and Scholl, Armin},
issn = {09255273},
journal = {International Journal of Production Economics},
keywords = {assembly line balancing,classification,configuration of assembly lines},
month = feb,
number = {2},
pages = {509--528},
title = {{Assembly line balancing: Which model to use when?}},
volume = {111},
year = {2008}
}
@inproceedings{Chaves2009,
address = {Undine, Italy},
author = {Chaves, Antonio Augusto and Lorena, Luiz Antonio Nogueira and Miralles, Cristobal},
booktitle = {6th International Workshop, HM 2009},
pages = {1--14},
publisher = {Springer Berlin Heidelberg},
title = {{Hybrid Metaheuristic for the Assembly Line Worker Assignment and Balancing Problem}},
year = {2009}
}
@book{Checkland1981,
author = {Checkland, Peter},
publisher = {John Wiley \& Sons, Ltd},
title = {{Systems Thinking, Systems Practice}},
year = {1981}
}
@inproceedings{Cortez2011,
abstract = {Assembly lines are flow oriented production systems where parts of a product are assembled in different workstations. Depending on the context, the planning and operation of these lines give rise to different combinatorial optimization problems. In this study, two variants are of particular interest: the mixed-model assembly line sequencing problem, where different versions of a product must be sequenced in the line; and the assembly line worker assignment and balancing problem, where workers in the line present different characteristics. In this article, we study these two variants in an integrated fashion. The problem is defined and a linear mathematical model is introduced. Moreover, a hybrid heuristic approach is developed and tested on small-scale instances. The computational experiments show that the proposed method is both fast and acurate.},
address = {Ubatuba, Brasil},
author = {Cortez, P\^{a}mela Michele C and Costa, Alysson M},
booktitle = {Anais do XLIII SBPO},
keywords = {assembly lines,disabled workers,multi-models},
pages = {2046--2055},
title = {{A mathematical model and a hybrid heuristic for sequencing mixed-model assembly lines with disabled workers}},
year = {2011}
}
@article{Das2010,
abstract = {A computer simulation model was used to evaluate a bowl versus inverted bowl assembly line arrangement for normal and exponential distributions and variances equal to 1 and 16. The model was developed on the basis of a realistic case problem and applied to a six- station assembly line. The results show that the inverted bowl is superior to the bowl arrangement for a normal distribution in terms of the total elapsed time evaluation criterion; however, with an exponential distribution, the bowl was found better than the inverted bowl for the same criterion. On the basis of the average percentage of working time and the average time in the system evaluation criteria, the bowl was found superior to the inverted bowl for a normal distribution. Similar results were obtained for an exponential distribution with a variance equal to 1, but no definitive inference could be made with a variance equal to 16.},
author = {Das, Biman and Garcia-Diaz, Alberto and MacDonald, Corinne A. and Ghoshal, Kalyan K.},
issn = {0268-3768},
journal = {The International Journal of Advanced Manufacturing Technology},
keywords = {1 introduction and literature,an assembly line consists,assembly line balancing,bowl,bowl and inverted,computer simulation,of a series of,review,variable operation times,where,workstations},
month = mar,
number = {1-4},
pages = {15--24},
title = {{A computer simulation approach to evaluating bowl versus inverted bowl assembly line arrangement with variable operation times}},
volume = {51},
year = {2010}
}
@article{Das2012,
abstract = {Abstract Purpose – The purpose of this paper is to develop a computer simulation model to evaluate increasing versus decreasing mean operation times assembly line arrangement for normal and exponential distributions and the variances equal to 1 and 16. Design/methodology/approach – The model was developed on the basis of a realistic case problem and applied to a six-station assembly line. The evaluation criteria were: the minimization of the total elapsed time; the maximization of the average percentage of working time; and the minimization of the average time in the system. Findings – The increasing mean operation times line arrangement is superior to the decreasing mean operation times line arrangement for the normal and exponential distributions and the variances equal to 1 and 16, in terms of the total elapsed time and the average percentage of the working time evaluation criteria. The decreasing mean operation times lines is marginally superior to the increasing operation times line for the normal distribution for the variances equal to 1 and 16, in terms of the average time in the system evaluation criterion. The above inference can be made for the exponential distribution for the variance 16, but no definitive conclusion can be made for the variance 1. Overall, the increasing mean operation times line arrangement has proven to be superior to the decreasing operation times line arrangement for both the stated distributions and variances, in terms of the important evaluation criteria. Originality/value – The paper contributes to the computer simulation approach to solving assembly line problems that deal with the impact of normally and exponentially distributed operation times, with variances equal to 1 and 16, on the increasing and decreasing mean operation times assembly line arrangements. Keywords},
author = {Das, Biman and Garcia-Diaz, Alberto and MacDonald, Corinne A. and Ghoshal, Kalyan K.},
issn = {1741-038X},
journal = {Journal of Manufacturing Technology Management},
keywords = {Assembly line balancing,Assembly lines,Computer simulation,Increasing and decreasingmean operation times,Operations management,Stochastic operation times},
number = {6},
pages = {806--822},
title = {{Evaluation of alternative assembly line arrangements with stochastic operation times: A computer simulation approach}},
volume = {23},
year = {2012}
}
@article{Das2010a,
abstract = {Purpose – The purpose of this paper is to develop a computer simulation model to evaluate the bowl phenomenon and the allocation at the end of the line of stations with either greater mean operation times or higher variability of operation times. Design/methodology/approach – The model was developed on the basis of a realistic case problem and applied to a six-station assembly line. The evaluation criteria were the: minimization of the total elapsed time; maximization of the average percentage of working time; and minimization of the average time in the system. Findings – The performance of an assembly line with independently normally distributed operation times could be improved by applying the bowl phenomenon. The allocation of large operation mean times to stations located near the end of the line did not produce improved results. Instead a more balanced allocation proved to be more significantly effective. On the other hand, the assignment of larger variability of operation times to the stations near the end of the line improved the performance of the assembly line. Originality/value – The investigation contributed to the computer simulation approach to solving assembly line problems that dealt with the impact of normally distributed operation times on the bowl phenomenon and assembly lines with increasing mean operation times and higher variability of operation times at the end of the line of stations.},
author = {Das, Biman and Sanchez-Rivas, Jesus M. and Garcia-Diaz, Alberto and MacDonald, Corinne a.},
isbn = {1741038101107},
issn = {1741-038X},
journal = {Journal of Manufacturing Technology Management},
keywords = {assembly lines,journal of manufacturing technology,operations and production management,paper type research paper,process planning,simulation},
number = {7},
pages = {872--887},
title = {{A computer simulation approach to evaluating assembly line balancing with variable operation times}},
volume = {21},
year = {2010}
}
@article{Davis1966,
author = {Davis, L.},
journal = {International Journal of Production Research},
number = {3},
title = {{Pacing effects of manned assembly lines}},
volume = {4},
year = {1966}
}
@article{Dudley1963,
author = {Dudley, N.A.},
journal = {International Journal of Production Research},
number = {2},
pages = {137--144},
title = {{Work-time distributions}},
volume = {2},
year = {1963}
}
@article{El-Rayah979,
author = {El-Rayah, T.E.},
journal = {International Journal of Production Research},
number = {1},
pages = {61--75},
title = {{The efficiency of balanced and unbalanced production lines}},
volume = {17},
year = {1979}
}
@article{Hillier1966,
abstract = {Presented the bowl phenomenon},
author = {Hillier, F. S. and Boling, Ronald W.},
journal = {Journal of Industrial Engineering},
number = {12},
pages = {651--658},
title = {{The effect of some design factors on the efficiency of production lines with variable operation times}},
volume = {17},
year = {1966}
}
@article{Hillier1993,
abstract = {The bowl phenomenon provides a way of increasing the throughput of some production line systems with variable processing times by purposely unbalancing the line in a certain manner. However, previously available numerical results for applying the bowl phenomenon have been quite limited. We extend these numerical results here by obtaining the optimal allocation of work for somewhat larger cases than previously considered. We also develop some guidelines and data for extrapolating these results to estimate the optimal allocation of work for even larger production lines that are beyond the reach of exact solution methods. The results cover a broad cross-section of values of both buffer capacities and the coefficient of variation of processing times.},
author = {Hillier, Frederick S. and So, K. C.},
journal = {International Journal of Production Research},
number = {4},
pages = {811--822},
title = {{Some data for applying the bowl phenomenon to large production line systems}},
volume = {31},
year = {1993}
}
@article{Hillier1996,
abstract = {The bowl phenomenon provides a way of increasing the throughput of some production line systems with variable processing times by purposely unbalancing the line in a certain manner. However, achieving this increase in throughput depends on correctly identifying the values of the system parameters to estimate the optimal amount of unbalance and then actually being able to assign work to stations according to the optimal bowl allocation. In this paper we study the robustness of the bowl phenomenon by examining the effect of inaccurately estimating the optimal amount of unbalance and the effect of deviating from the optimal bowl allocation. Our results show that the bowl phenomenon is relatively robust in the sense that fairly large errors (even 50\%) in the amount of unbalance still provide most of the potential improvement in throughput over a perfectly balanced line. Moreover, the throughput still exceeds that of a perfectly balanced line in most cases even when the work allocation to each station deviates from the optimal bowl allocation by as much as 10\%. We also address the question of whether the optimal bowl allocation or the balanced line provides a more robust 'target' when assigning work to stations. When the deviations from these two targets are of the same magnitude, we found that the optimal bowl allocation target yields the larger throughput in most cases, where the average difference between their throughputs is roughly the same as the difference between the optimal throughput and the throughput of a balanced line. Furthermore, for the same magnitude of deviation, the throughput depends more heavily on the direction of the deviation from the balanced line than that from the optimal bowl allocation, so that the risk of a substantially reduced throughput is much larger when using the balanced line as the target. Therefore, the optimal bowl allocation provides a much more robust target than the balanced line. Keywords:},
author = {Hillier, Frederick S and So, Kut C},
journal = {European Journal Of Operational Research},
keywords = {manufacturing,production line design,queueing},
number = {1979},
pages = {496--515},
title = {{On the robustness of the bowl phenomenon}},
volume = {2217},
year = {1996}
}
@article{Hillier1967,
abstract = {This paper considers a queuing system consisting of N service channels in series where each channel has an exponential or Erlang holding time and (except for the first channel) a finite queue, and where the input process is such that the first queue is never empty. The measures considered are the steady-state mean output rate and mean number of customers in the system (excluding the first queue). First, a procedure is described, for obtaining these measures, that is relatively efficient computationally. Second, an exceptionally efficient procedure is developed for approximating the mean output rate for the case of exponential holding times. It is demonstrated that this procedure provides an excellent approximation for most cases and that it is computationally feasible for large problems. Third, extensive new numerical results are obtained.},
author = {Hillier, Frederick S. and Boling, Ronald W.},
journal = {Operations Research},
number = {2},
pages = {286--303},
title = {{Finite Queues in Series with Exponential or Erlang Service Times-A Numerical Approach}},
volume = {15},
year = {1967}
}
@article{Hillier1979,
abstract = {This paper provides results related to the optimal design of unpaced production lines. It has been shown previously that unbalancing an unpaced production line in an appropriate way will increase its production rate. Results are presented here which show how the optimal allocation of work between stations changes with respect to — the number of work stations in the line, — the limit on the amount of work-in-progress, and — the variance of station operation times. An analysis of results given here demonstrate the following system characteristics. When the number of stations in the production line increases, the average amount of unbalance in the optimal allocation remains about the same, but the increase in mean production rate obtained by using the optimal allocation rather than the balanced line becomes substantially larger. If the operation times are highly variable (exponential distribution), the effect of increasing in-process storage space is to substantially decrease the average amount of unbalance in the optimal allocation but to only slightly decrease the resulting improvement over the balanced line. If the amount of in-process storage space is very small, the effect of decreasing the variability of operation times is to decrease these same quantities but at a surprisingly slow rate. On the other hand, the effect of simultaneously increasing in-process storage space and decreasing the variability of operation times is to very rapidly decrease both the optimal unbalance and the resulting improvement. The model used to characterize an unpaced production line system is the classical queueing system with finite queues in series.},
author = {Hillier, Frederick S. and Boling, Ronald W.},
journal = {Management Science},
keywords = {: inventory/production: stochastic systems,production/scheduling: line balancing,queues},
number = {8},
pages = {721--728},
title = {{On the Optimal Allocation of Work in Symmetrically Unbalanced Production Line Systems with Variable Operation Times}},
volume = {25},
year = {1979}
}
@article{Hillier1995,
abstract = {We consider tandem queueing systems that can be formulated as a continuous-time Markov chain, and investigate how to maximize the throughput when the queue capa- cities are limited. We consider various constrained optimization problems where the decision variables are of one or more of the following types: (1) expected service times, (2) queue capacities, and (3) the number of servers at the respective stations. After sur- veying our previous studies of this kind, we open up consideration of three new pro- blems by presenting some numerical results that should give some insight into the general form of the optimal design.},
author = {Hillier, Frederick S. and So, Kut C.},
issn = {0257-0130},
journal = {Queueing Systems},
keywords = {bowl phenomenon,optimal design,tandem queues},
month = sep,
number = {3-4},
pages = {245--266},
title = {{On the optimal design of tandem queueing systems with finite buffers}},
volume = {21},
year = {1995}
}
@article{Hillier2013,
abstract = {This article considers the optimal design of unpaced assembly lines. Two key decisions in designing an unpaced assembly line are the allocation ofwork to the stations and the allocation of buffer storage space between the stations.To the best of the author’s knowledge, this is the first article to jointly optimize both the allocation of workload and the allocation of buffer spaces simultaneously when the objective is to maximize the revenue from throughput minus the cost of work-in-process inventory. Exact solutions are provided for small lines (three or four stations) with a fixed kind of processing time distribution (exponential or Erlang). Ten observations are made about the characteristics of the allocation of workload and buffer spaces.Heuristics are suggested for designing lines with more stations or different processing time distributions. A simulation study is done to test the observations and heuristics for longer lines and different processing time distributions (lognormal). Significant savings can be achieved by jointly optimizing both the workload and the buffer space allocations.},
author = {Hillier, Mark},
issn = {0740-817X},
journal = {IIE Transactions},
keywords = {applications,heuristics,inventory,line balancing,production,queueing,simulation},
month = may,
number = {5},
pages = {516--527},
title = {{Designing unpaced production lines to optimize throughput and work-in-process inventory}},
volume = {45},
year = {2013}
}
@article{Hillier2006,
abstract = {Much of the previous research on unpaced production lines with variable processing times has addressed either the issue of the allocation of work to the stations or the allocation of storage space to the buffers between stations. Our focus is instead on the simultaneous optimization of the workload allocation and the buffer allocation. We use a basic cost-based model that includes both revenue per unit of throughput and cost per unit of buffer space. Using both exponential and Erlang processing times, exact numerical results are obtained for the underlying Markov chain for cases where the number of states range up to over 2000 000. We investigate how the bowl phenomenon for workload allocation and the storage bowl phenomenon for buffer allocation interact when performing both allocations simultaneously. We also find counterexamples to a conjecture previously published in the literature that a balanced buffer allocation is optimal when the total number of buffer spaces is an integer multiple of the number of buffers.},
author = {Hillier, Mark. and Hillier, Frederick S.},
journal = {IIE Transactions},
number = {1},
pages = {39--51},
title = {{Simultaneous optimization of work and buffer space in unpaced production lines with random processing times}},
volume = {38},
year = {2006}
}
@article{Hong2013,
abstract = {Often in lean manufacturing, multiple products are produced in U-shaped manufacturing cells to simulta- neously achieve product variety and production efficiency. We examine two design issues for mixed-model U-lines: work rules (a first-come-first-serve rule and a crossover-and-return rule) and inventory flow choices (direct flow and buffered flow). Simulation results indicated that throughput and labor utilization can be improved by implementing a “buffer” with the first-come-first-serve rule. Interestingly, the effectiveness of an inventory flow choice was dependent on the work rule (interaction effect), and relationships among performance dimensions differed across the designs.},
author = {Hong, Yunsook and Visich, John K. and Pinto, Peter a. and Khumawala, Basheer M.},
issn = {09696016},
journal = {International Transactions in Operational Research},
keywords = {inventory control,lean manufacturing,mixed-model,productivity,u-line design,work rules},
month = jun,
number = {6},
pages = {917--936},
title = {{Evaluation of mixed-model U-line operational designs}},
volume = {20},
year = {2013}
}
@article{Hunt956,
author = {Hunt, Gordon C.},
journal = {Operations Research},
pages = {674--583},
title = {{Sequential arrays of waiting lines}},
volume = {4},
year = {1956}
}
@article{Kala1973,
author = {Kala, R. and Hitchings, G.},
journal = {International Journal of Production Research},
number = {2},
title = {{The effects of performance time variance on a balanced, four-station manual assembly line}},
volume = {11},
year = {1973}
}
@article{Karwan1989,
abstract = {A 1985 study in this journal examined the "bowl phenomenon" and claimed to demonstrate that Hllher and Boling's original results (1967) were due to an assumption of unrealistically large variations m processing times. A closer look at these findings reveals that they are derived from a flawed experiment that cannot possibly serve to verify whether or not average task times at work stations on an assembly line should be selected in a bowl distribution. We offer further arguments in clarification of Hilher and Boling's original work based both on our own simulation experiments as well as recent studies by other authors. We also comment on the need to orient future efforts toward practical integration of the bowl phenomenon result with other findings that relate to improving the throughput of unpaced assembly hnes.},
author = {Karwan, K and Philipoom, P},
issn = {02726963},
journal = {Journal of Operations Management},
month = jan,
number = {1},
pages = {48--54},
title = {{A note on ``Stochastic unpaced line design: Review and further experimental results''}},
volume = {8},
year = {1989}
}
@article{Kong2013,
abstract = {This study investigates the properties of the optimal assignment of two special workers in a limited-cycle model with multiple periods. Due to the competition among enterprises, meeting the scheduled delivery date becomes a necessity. In most arbitrary cases, the result and efficiency of a certain period of a production cycle are influenced not only by the risks that exist in the current period but also by the risks that existed in the foregoing ones. Moreover, the risk itself is also greatly affected by the risks that existed in the earlier periods. This kind of model is called a limited-cycle model with multiple periods. In this paper, we consider the properties of the optimal worker assignment that minimises the total expected risk in a limited-cycle model with multiple periods. This study is useful in describing the properties of the optimal worker assignment theoretically and determining an optimal cycle time.},
author = {Kong, Xianda and Sun, Jing and Yamamoto, Hisashi and Matsui, Masayuki},
journal = {Asian Journal Management Science and Applications},
keywords = {expected cost.,limited-cycle model,multiple periods,optimal worker assignment,processing efficiency,scheduling},
number = {1},
pages = {96--120},
title = {{Optimal worker assignment with two special workers in a limited-cycle model with multiple periods}},
volume = {1},
year = {2013}
}
@article{Kottas1981,
abstract = {Unlike earlier simulation studies of unpaced production lines which generally concentrated on steady-state behavior, we are concerned with their transient characteristics. This paper (1) presents an approach for studying and monitoring transient behavior, (2) reports some important transient operating characteristics previously overlooked, and (3) demonstrates that the conflicting results reported in some of the earlier simulation studies are due to an inadequate accounting for transient behavior. Thus, this paper illustrates the importance of having a better understanding of transient behavior in unpaced lines, and presents our initial steps towards dealing with it.},
author = {Kottas, John F. and Lau, Hon-Shiang},
issn = {02726963},
journal = {Journal of Operations Management},
month = feb,
number = {3},
pages = {155--164},
title = {{Some problems with transient phenomena when simulating unpaced lines}},
volume = {1},
year = {1981}
}
@article{Lau1992,
abstract = {For an unpaced line with equal mean station processing time for all stations, we study how the line's utilization factor is affected by different patterns of allocating processing-time variances among the stations. A literature review shows that earlier results on this question are ambiguous and inconclusive. Using long simulation runs (up to 240000 production units) and considering ‘larger’ lines with up to 19 stations and 6 units/buffer, we identified three desirable variance-allocation characteristics: ‘bowl-shape’, ‘symmetry’ and ‘spike-shape’. Conceptual generalization of these characteristics enable us to explain and reconcile the seemingly contradictory performance of a wide variety of variance-allocation patterns considered in this as well as the earlier studies.},
author = {Lau, Hon-Shiang},
journal = {European Journal of Operational Research},
number = {3},
pages = {345--356},
title = {{On balancing variances of station processing times in unpaced lines}},
volume = {61},
year = {1992}
}
@phdthesis{Malaki2012,
abstract = {In the present fierce global competition, poor responsiveness, low flexibility to meet the uncertainty of demand, and the low efficiency of traditional assembly lines are adequate motives to persuade manufacturers to adopt highly flexible production tools such as cross-trained workers who move along the assembly line while carrying out their planned jobs at different stations [1]. Cross-trained workers can be applied in various models in assembly lines. A novel model which taken into consideration in many industries nowadays is called the linear walking worker assembly line and employs workers who travel along the line and fully assemble the product from beginning to end [2]. However, these flexible assembly lines consistently endure imbalance in their stations which causes a significant loss in the efficiency of the lines. The operational time variability is one of the main sources of this imbalance [3] and is the focus of this study which investigated the possibility of decreasing the mentioned loss by arranging workers with different variability in a special order in walking worker assembly lines. The problem motivation comes from the literature of unbalanced lines which is focused on bowl phenomenon. Hillier and Boling [4] indicated that unbalancing a line in a bowl shape could reach the optimal production rate and called it bowl phenomenon. This study chose a conceptual design proposed by a local automotive company as a case study and a discrete event simulation study as the research method to inspect the questions and hypotheses of this research. The results showed an improvement of about 2.4\% in the throughput due to arranging workers in a specific order, which is significant compared to the fixed line one which had 1 to 2 percent improvement. In addition, analysis of the results concluded that having the most improvement requires grouping all low skill workers together. 
However, the pattern of imbalance is significantly effective in this improvement concerning validity and magnitude. Keywords:},
author = {Malaki, Afshin Amini},
pages = {73},
school = {J\"{o}nk\"{o}ping University},
title = {{A Study of the Effects of Operational Time Variability in Assembly Lines with Linear Walking Workers}},
year = {2012}
}
@article{Mirabedini2013,
abstract = {In this study, a simulation optimization method is applied in order to find the optimal design of a U-shape assembly line. The optimality criterion is the minimum number of needed stations. While many previous works use deterministic models to solve this problem, a simulation approach is applied in this study to consider the stochastic nature of the problem. On other hand, when we use a simulation method, a better understanding of system behavior can be obtained through the evolution of system. Another case that is considered in this study is the failures of conveyers which happen in the real world and the fatigue of operators is considered too. The procedure is as follows: first, an initial design of system is obtained by an optimizer (here Genetic Algorithm) with an initial given parameters. Second, the output of the optimizer is used to implement a simulation model in Visual Slam. Third, after running the model in simulator, the desired outputs are evaluated and the necessary changes will be made to optimizer parameters. Fourth, again the optimizer is used to generate new design with new parameters.},
author = {Mirabedini, Seyed Nima and Mina, Hassan and Iranmanesh, Seyed Hossein and Saleckpay, Babak},
journal = {Research Journal of Applied Sciences, Engineering and Technology},
keywords = {assembly line balancing problem,mathematical programming,simulation,single line},
number = {15},
pages = {2846--2858},
title = {{Optimization of a Single Model U-SLAB with Stochastic Duration with Integration of Genetic Algorithm and Computer Simulation}},
volume = {6},
year = {2013}
}
@article{Miralles2007,
author = {Miralles, Crist\'{o}bal and Garcia-Sabater, J and Andr\'{e}s, C and Card\'{o}s, M},
journal = {International Journal of Production Economics},
number = {1-2},
pages = {187--197},
title = {{Advantages of assembly lines in sheltered work centers for disabled: a case study}},
volume = {110},
year = {2007}
}
@phdthesis{Moreira2011,
author = {Moreira, Mayron C\'{e}sar de Oliveira},
keywords = {Linhas de produ\c{c}\~{a}o. Trabalhadores,deficientes. Otimiza\c{c}\~{a}o inteira mista. Metaheur\'{\i}st},
pages = {90},
school = {Universidade de S\~{a}o Paulo},
title = {{Balanceamento de linhas de produ\c{c}\~{a}o com trabalhadores deficientes}},
year = {2011}
}
@article{Moreira2013,
author = {Moreira, Mayron C\'{e}sar O. and Costa, Alysson M.},
issn = {09255273},
journal = {International Journal of Production Economics},
keywords = {assembly lines,hybrid algorithm,job rotation},
month = feb,
number = {2},
pages = {552--560},
title = {{Hybrid heuristics for planning job rotation schedules in assembly lines with heterogeneous workers}},
volume = {141},
year = {2013}
}
@article{Muth1987,
abstract = {A novel method of analysing serial production lines has been developed. This method lets one compute the throughput rate of lines composed of dissimilar stations, as well as for a large class of distributions of station service times. Several distribution-free models of 3-station lines are presented. These models are used to compute the throughput rate of unbalanced lines in which the sum of the mean service times is constant. Results are shown as contour plots of constant throughput rate. The bowl phenomenon is reviewed in the light of this capacity to model with a greater degree of freedom.},
author = {Muth, Enginhard J. and Alkaff, Abdullah},
doi = {0.1080/00207548708919831},
journal = {International Journal of Production Research},
number = {2},
pages = {161--173},
title = {{The bowl phenomenon revisited}},
volume = {25},
year = {1987}
}
@article{Otto2013,
abstract = {Recently, the importance of correctly designed computational experiments for testing algorithms has been a subject of extended discussions. Whenever real-world data is lacking, generated data sets provide a substantive methodological tool for experiments. Focused research questions need to base on special- ized, randomized and sufficiently large data sets, which are sampled from the population of interest. We integrate the generation of data sets into the process of scientific testing. Until now, no appropriate generators or systematic data sets have been available for the assembly line balancing problem (ALBP). Computational experiments were mostly based on very limited data sets unsystematically collected from the literature and from some real-world cases. As a consequence, former performance analyses often come to contradictory conclusions and lack on statistical evidence. We introduce SALBPGen, a new instance generator for the simple ALBP which can be applied and extended to any generalized ALBP, too. Unlike most generators, SALBPGen takes into account usual prop- erties of precedence graphs in manufacturing. It is very flexible and able to create instances with very diverse structures under full control of the experiment’s designer. We also propose new challenging data sets, as shown with the new direct measure of instance’s hardness called trickiness. By two exemplary computational experiments, we illustrate how important insights can be gained with the help of the systematically generated data sets.},
author = {Otto, Alena and Otto, Christian and Scholl, Armin},
doi = {10.1016/j.ejor.2012.12.029},
issn = {03772217},
journal = {European Journal of Operational Research},
keywords = {Assembly line balancing,Benchmark data set,Complexity measures,Precedence graph,Scheduling,Structure analysis},
month = jul,
number = {1},
pages = {33--45},
title = {{Systematic data generation and test design for solution algorithms on the example of SALBPGen for assembly line balancing}},
volume = {228},
year = {2013}
}
@article{Patterson1964,
author = {Patterson, R.L.},
journal = {Journal of Industrial Engineering},
number = {4},
pages = {188--193},
title = {{Markov processes occurring in the theory of traffic flow through an N-Stage stochastic service system}},
volume = {15},
year = {1964}
}
@article{Payne1972,
author = {Payne, S. and Slack, N. and R., Wild},
journal = {International Journal of Production Research},
number = {1},
title = {{A note on the operating characteristics of balanced and unbalanced production flow lines}},
volume = {10},
year = {1972}
}
@book{Pidd1998,
address = {Chichester, UK},
author = {Pidd, M.},
edition = {4th},
isbn = {0470092300},
publisher = {Wiley},
title = {{Computer simulation in management science}},
year = {1998}
}
@article{Rao1975,
abstract = {This paper presents an analytical study of several aspects of two-stage production systems with variable operation times and provision for intermediate storage. For the exponential service times assumed at one of the stages, the set of simultaneous equations satisfied by the steady-state probabilities are shown to involve the Laplace transform of the density function at the other stage and its various order derivatives. An analysis of this set of equations leads to a recursive solution for the mean production rate of a system with any number of storages. The realistic cases of Erlang and normal density functions are worked out in detail. It turns out that for moderate coefficients of variation the production rates for these two distributions differ only marginally. That this is not generally true is illustrated by considering a uniform distribution, for which the results are significantly different. The problem of balancing the production system is discussed at some length. It is shown that the production rate improves on allotting a slightly higher load to the less variable stage.},
author = {Rao, Nori Praska},
journal = {IEE Transactions},
number = {4},
pages = {414--421},
title = {{Two-Stage Production Systems with Intermediate Storage}},
volume = {7},
year = {1975}
}
@article{Rao1976,
abstract = {' Bowl phenomenon ' refers to the increase in production rate obtained by unbalancing a series production system such that the service time increases progressively on either side of the central stage(s). While such a result is valid for production systems with otherwise identical stages, earlier studies have suggested that a different effect may come into play when the stages of the system differ in their variability. In the simplest case of n two-stage system, production rate could be improved by shifting a part of the work load from the more variable stage to the less variable one. From an analysis of three-stage systems with all possible combinations of exponential and deterministic stages, it is shown in the present paper that optimum unbalancing results from a superposition of these two effects. This loads in some cases to a large improvement in the production rate. For the large differences in coefficients of variation considered in this paper, the ' variability imbalance ' clearly plays a decisive role and outweighs the ' bowl phenomenon ' when they act in opposing directions. For a three-stage system with a uniform stage sandwiched between two exponential stages, it is shown that the two effects exactly cancel one another when the coefficient of variation of the central stage is nearly 05.},
author = {Rao, Nori Praska},
journal = {International Journal of Production Research},
number = {4},
pages = {437--443},
title = {{A generalization of the 'bowl phenomenon' in series production systems}},
volume = {14},
year = {1976}
}
@book{Rapp2009,
author = {Rapp, Donald},
edition = {2009},
isbn = {0387876294},
pages = {274},
publisher = {Copernicus},
title = {{Bubbles, Booms, and Busts: The Rise and Fall of Financial Assets}},
year = {2009}
}
@book{Robinson2004,
author = {Robinson, Stewart},
isbn = {0470847727},
publisher = {John Wiley \& Sons, Ltd},
title = {{Simulation : The Practice of Model Development and Use}},
year = {2004}
}
@book{Scholl1999,
author = {Scholl, Armin},
edition = {2},
isbn = {3790811807},
pages = {318},
publisher = {Physica},
title = {{Balancing and Sequencing of Assembly Lines}},
year = {1999}
}
@inproceedings{Shaaban2011,
abstract = {In this paper we study the operating behaviour and performance of reliable, unpaced and unbalanced serial production lines with either imbalanced service time means, unequal coefficients of variation, or uneven buffer capacities. The lines were simulated with various values of line length, buffer storage size, degree of imbalance, coefficient of variation, along with a number of imbalance configurations. The primary measures of efficiency were idle time and average buffer level. Output data from the discrete event simulation of such lines under their steady-state mode of operation were analyzed using a set of statistical methods. Various relationships between the independent and response variables, rankings of configurations and comparisons with balanced lines were obtained. For the mean processing times imbalance, it turned out that a bowl-shaped arrangement provides smaller idle time amounts and lower average buffer levels than those of a balanced line counterpart. As regards the variability imbalance, it was found that the best configurations are respectively, a bowl allocation and a monotone decreasing order, with the first resulting in decreased idle times and the second leading to lower average buffer levels than those of a balanced line. As far as the buffer size imbalance is concerned, it was concluded that the most advantageous patterns that generate lower idle times and average buffer levels as compared to a balanced line are to respectively distribute total available buffer capacity as evenly as possible along the buffers and to allocate more buffer capacity towards the end of the line.},
author = {Shaaban, Sabry and Mcnamara, Tom and Atil, Ahmed},
booktitle = {Proceedings for the Northeast Region Decision Sciences Institute},
keywords = {imbalanced,one source imbalance,simulation,unpaced serial production lines},
title = {{The behaviour of unpaced production lines with unequal mean processing times, variability, or buffer capacities}},
year = {2011}
}
@article{Smunt1989,
abstract = {In the May 1985 issue of the Journal of Operattons Management, we published a paper containing two major segments (Smunt and Perkins (1985)). The first segment provtded a comprehensive review of previously published research on unpaced assembly lines. Two of the primary references in this review were path-breaking studies by Hillier and Bohng (1966) and Hillier and Bohng (1979). These studies introduced the idea of the "bowl phenomenon," which suggests that line output can be increased (compared to a balanced line) by unbalancing the line with high service times placed at the beginning and end of the line and low service times placed in the middle of the line. This pair of studies also verified the existence of the bowl phenomenon for exponential and Erlang service times with line lengths up to five stations and buffer capacities between stations from zero to four units.},
author = {Smunt, T and Perkins, W},
issn = {02726963},
journal = {Journal of Operations Management},
month = jan,
number = {1},
pages = {55--62},
title = {{Stochastic unpaced line design: A reply}},
volume = {8},
year = {1989}
}
@article{Smunt1985,
abstract = {Previous design studies of unpaced assembly lines that exhibit stochastic task times indicate that an unbalanced allocation of task times results in optimal output rates. In this article, we present a comprehensive review of the previous literature on this topic and discuss the results of simulation experiments that test the bowl distribution for unbalancing unpaced lines. The simulation experiment was designed to test the bowl distribution in more realistic environments than previously tested and illustrates that a balanced line configuration is as good as or better than an unbalanced line configuration when task times are modeled with more typical values of variance. Stochastic unpaced assembly line research employs both simulation and analytical approaches to test the allocation of buffer capacity and task times to work stations. Analytical models are utilized to investigate simple line designs with exponential or Erlang task time distributions. Simulation is used for longer lines and for normal task time distributions. From the review of the previous research using both approaches, we note five major findings: 1) unbalancing task time allocation is optimal when task time variation is large; 2) unbalanced allocation of buffer storage capacity improves line output rate when task time variation is large; 3) output rate of an unpaced line decreases as the number of sequential workstations increases; 4) output rate increases as more buffer storage capacity is available; and 5) output rate decreases as the task time variation increases. Most of the previous research on unpaced lines investigated lines with few workstations and large task time variation. Empirical research by Dudley (6) suggests that variation of task times in practice is much less than variations employed in previous unpaced line studies. 
We present the results from simulation experiments that model longer unpaced lines with lower levels of task time variance of the magnitude that is likely to occur in practice. The results of our simulation experiments verify the benefits of using the bowl distribution for task time allocation when line lengths are short and task times experience large variance. However, when line lengths are extended or task time variation is reduced, the use of the bowl distribution for unbalancing the line degrades the line's efficiency. In these situations, the optimal task time allocation is a balanced line. Two important implications for managers follow from the results of our experiments: 1) that unpaced line output rate is relatively insensitive to moderate variations from optimal task time allocations when buffer storage is limited; and 2) that perfectly balanced line designs are optimal for most cases in practice.},
author = {Smunt, Timothy L. and Perkins, William C.},
issn = {02726963},
journal = {Journal of Operations Management},
month = may,
number = {3},
pages = {351--373},
title = {{Stochastic unpaced line design: Review and further experimental results}},
volume = {5},
year = {1985}
}
@article{So1989,
abstract = {Previous work on optimal allocation of work to production line systems has found that the throughput of a production line is maximized by deliberately unbalancing the line in an appropriate way when the processing time distribution is exponential or Erlang. A recent simulation study suggests that unbalancing the production line may not improve the efficiency of the line when the processing times are normally distributed. However, it appears that the unbalanced work allocation being studied may not be chosen appropriately in the simulation study and therefore, no significant improvement (and actually a decrease) in performance was observed when compared with the perfectly balanced work allocation. This study performs a similar set of simulation experiments of the previous study with normally distributed processing times, but uses more appropriate unbalanced work allocations to determine whether the efficiency of a production line can be improved if the line is unbalanced appropriately.},
author = {So, K. C.},
journal = {International Journal of Production Research},
number = {4},
pages = {717--729},
title = {{On the efficiency of unbalancing production lines}},
volume = {27},
year = {1989}
}
@article{Tempelmeier2003,
abstract = {In this paper we consider the problems faced by an industrial planner who is responsible for the design of real-life asynchronous production lines under sto- chastic conditions that may be due to breakdowns or random processing times. Basedon real-life system data, it is shown that a number of available algorithms for the performance evaluation of a given system configuration as well as an algorithm for determining the optimum buffer configuration can be successfully applied in industrial practice.},
author = {Tempelmeier, Horst},
issn = {0020-7543},
journal = {International Journal of Production Research},
month = jan,
number = {1},
pages = {149--170},
title = {{Practical considerations in the optimization of flow production systems}},
volume = {41},
year = {2003}
}
@article{Yarmand2013,
abstract = {In this paper we consider the problem of allocating servers to maximize throughput for tandem queues with no buffers. We propose an allocation method that assigns servers to stations based on the mean service times and the current number of servers assigned to each station. A number of simu- lations are run on different configurations to refine and verify the algorithm. The algorithm is proposed for stations with exponentially distributed ser- vice times, but where the service rate at each station may be different. We also provide some initial thoughts on the impact on the proposed allocation method of including service time distributions with different coefficients of variation.},
author = {Yarmand, Mohammad H. and Down, Douglas G.},
issn = {03772217},
journal = {European Journal of Operational Research},
keywords = {server allocation,tandem queue,zero buffer},
month = nov,
number = {3},
pages = {596--603},
title = {{Server allocation for zero buffer tandem queues}},
volume = {230},
year = {2013}
}

Our goal is to find the lines that give the year of each publication and count the number of publications in each year. Dictionaries are a convenient way to store this information, so let us take a look at them.

Dictionaries

Dictionaries are data structures that associate a key with a value. A simple example is a word dictionary: each word is a key and its meaning is the associated value. Another example would be a mapping between an account number and the name of the account owner. In Julia, it looks like the following:


In [43]:
accountNameDict = Dict{Int, String}(1 => "John Doe", 7 => "Mary Doe", 3 => "Mary John", 10 => "Doe Doe")


Out[43]:
Dict{Int64,String} with 4 entries:
  7  => "Mary Doe"
  10 => "Doe Doe"
  3  => "Mary John"
  1  => "John Doe"

To access the element with a specific key, we can do as follows:


In [44]:
accountNameDict[7]


Out[44]:
"Mary Doe"

If we try to access a key that is not in the dictionary we will get a KeyError exception.


In [45]:
accountNameDict[2]


KeyError: key 2 not found

 in getindex(::Dict{Int64,String}, ::Int64) at ./dict.jl:688
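
When it is not known in advance whether a key exists, the exception can be avoided by checking first or by asking for a fallback value. The following is a minimal sketch using the standard functions haskey and get; the dictionary accounts and the fallback "unknown" are hypothetical names chosen only for this illustration.

```julia
# Hypothetical dictionary mirroring the example above.
accounts = Dict{Int, String}(1 => "John Doe", 7 => "Mary Doe")

# haskey checks for a key without raising an exception.
println(haskey(accounts, 7))   # key 7 exists
println(haskey(accounts, 2))   # key 2 does not exist

# get returns the stored value, or the given default when the key is missing.
# It does not modify the dictionary.
owner = get(accounts, 2, "unknown")
println(owner)
println(length(accounts))      # the dictionary still has 2 entries
```

Note that get only reads from the dictionary; the missing key is not added.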

Adding a new element (key, value) to a dictionary is straightforward.


In [46]:
accountNameDict[2] = "Julia Prog"
accountNameDict


Out[46]:
Dict{Int64,String} with 5 entries:
  7  => "Mary Doe"
  10 => "Doe Doe"
  2  => "Julia Prog"
  3  => "Mary John"
  1  => "John Doe"

The collections of keys and values of a dictionary are available through the methods keys and values.


In [47]:
println(keys(accountNameDict))
println(values(accountNameDict))


[7,10,2,3,1]
String["Mary Doe","Doe Doe","Julia Prog","Mary John","John Doe"]
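
Besides keys and values, a dictionary can be traversed pair by pair. The sketch below assumes a small example dictionary (accounts is a name chosen for illustration) and shows the (key, value) loop form together with collect, which turns the key collection into an ordinary array.

```julia
# A small dictionary used only for this illustration.
accounts = Dict{Int, String}(1 => "John Doe", 7 => "Mary Doe")

# Each iteration yields one key together with its value.
# The order of iteration is not guaranteed.
for (number, name) in accounts
    println("Account $number belongs to $name")
end

# collect copies the keys into an array that can be indexed or sorted.
numbers = collect(keys(accounts))
println(numbers)
```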

Exercise: Consider an array of elements $[(a_i, b_i)]$. Write an algorithm to build a dictionary such that each $a_i$ is a key and $b_i$ is the corresponding value.


In [49]:
# You can use this array as an input example for the exercise:
array = [(1, "1"), (2, "2"), (4, "4"), (10, "10")]

d = Dict{Int, String}()
for elem in array
    d[elem[1]] = elem[2]
end
d


Out[49]:
Dict{Int64,String} with 4 entries:
  4  => "4"
  10 => "10"
  2  => "2"
  1  => "1"
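
The loop above spells out the algorithm. As a possible shortcut, the Dict constructor also accepts a collection of (key, value) tuples or a generator of pairs. The sketch below uses names of our own choosing (pairsArray, fromTuples, fromGenerator).

```julia
# Same input as in the exercise, under a different (illustrative) name.
pairsArray = [(1, "1"), (2, "2"), (4, "4"), (10, "10")]

# Dict can be built directly from a collection of 2-tuples.
fromTuples = Dict(pairsArray)

# Equivalent construction with a generator of key => value pairs.
fromGenerator = Dict(a => b for (a, b) in pairsArray)

println(fromTuples == fromGenerator)
```

The explicit loop is still useful when the value has to be computed or updated along the way, as in the counting example that follows.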

Putting it all together

In our example, we want to associate each year (key) with the number of publications in that year (value). So, the idea of the algorithm is to iterate over each line of a .bib file, looking for year lines and parsing them to build our {year => number of publications} dictionary. Let us implement it.


In [50]:
yearCountDict = Dict{Int, Int}()

open("example.bib") do fd 
    for line in eachline(fd)  # Iterating over the lines.
        
        # If it is a "year" line:
        if contains(line, "year")
            
            # Getting the string after = sign:
            year = split(line, "=")[end]
            
            # Stripping the string:
            year = strip(year, ['\n', ' ', '{', '}'])
            
            # Parsing to Int (is it really needed?):
            year = parse(Int, year)
            
            # The get method, as used here, returns the default value 0
            # if the key <year> is not found in the dictionary 'yearCountDict':
            count = get(yearCountDict, year, 0)
            
            yearCountDict[year] = count + 1
        end
    end
end

# Printing the dictionary:
for (year, count) in yearCountDict 
    println("$year => $count")
end


2004 => 2
2015 => 1
2003 => 2
2009 => 2
1976 => 1
1966 => 2
1993 => 1
2010 => 3
1956 => 1
1972 => 1
1992 => 1
1998 => 1
2014 => 2
1973 => 1
1975 => 1
1962 => 1
1967 => 1
2008 => 2
1996 => 2
2007 => 2
1987 => 1
1989 => 3
1985 => 1
1995 => 1
1981 => 2
2011 => 7
2001 => 1
1963 => 1
2013 => 8
1979 => 2
1999 => 1
2012 => 5
2006 => 1
1964 => 1
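
One caveat of the approach above: contains(line, "year") is also true for any line that merely mentions the word, such as an abstract, and parse would then throw an error. A possible way to make the extraction stricter is a regular expression anchored at the field name. The sketch below is only an illustration under our own assumptions: it targets Julia 1.x, the helper names extractYear and countYears and the sample entries are hypothetical, and the sample is read from an IOBuffer instead of example.bib.

```julia
# Hypothetical helper: returns the year of a "year = ..." line as an Int,
# or nothing if the line is not a year field.
function extractYear(line)
    # Field name at the start of the line, an = sign, then a 4-digit number
    # possibly preceded by non-digit characters such as { or a quote.
    m = match(r"^\s*year\s*=\D*(\d{4})"i, line)
    m === nothing && return nothing
    return parse(Int, m.captures[1])
end

# Hypothetical helper: counts the publications per year found in a stream.
function countYears(io)
    counts = Dict{Int, Int}()
    for line in eachline(io)
        year = extractYear(line)
        if year !== nothing
            counts[year] = get(counts, year, 0) + 1
        end
    end
    return counts
end

# Made-up entries, used only to exercise the functions.
sample = """
@article{a,
abstract = {Results over a ten year horizon.},
year = {2013}
}
@article{b,
title = {{Another year}},
year = 2013,
}
@article{c,
year = "1999"
}
"""

counts = countYears(IOBuffer(sample))
println(counts)
```

For a real file, the same function could be called as open(countYears, "example.bib").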

Note that the keys in the dictionary are not sorted. If we want to print the dictionary sorted by keys, we can do as follows:


In [51]:
for year in sort(collect(keys(yearCountDict)))
    println("$year => $(yearCountDict[year])") 
end


1956 => 1
1962 => 1
1963 => 1
1964 => 1
1966 => 2
1967 => 1
1972 => 1
1973 => 1
1975 => 1
1976 => 1
1979 => 2
1981 => 2
1985 => 1
1987 => 1
1989 => 3
1992 => 1
1993 => 1
1995 => 1
1996 => 2
1998 => 1
1999 => 1
2001 => 1
2003 => 2
2004 => 2
2006 => 1
2007 => 2
2008 => 2
2009 => 2
2010 => 3
2011 => 7
2012 => 5
2013 => 8
2014 => 2
2015 => 1
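
An alternative is to collect the (key => value) pairs and sort them directly, which avoids a second lookup in the dictionary. This is a minimal sketch on a small made-up dictionary; the names counts, sortedPairs and byCount are ours.

```julia
# A small dictionary used only for this illustration.
counts = Dict(2013 => 8, 1956 => 1, 2011 => 7)

# collect yields an array of key => value pairs; first picks the key of a pair.
sortedPairs = sort(collect(counts), by = first)

for (year, n) in sortedPairs
    println("$year => $n")
end

# Sorting by the value instead (last picks the value), largest count first.
byCount = sort(collect(counts), by = last, rev = true)
println(byCount)
```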

We can do a better job with the output. Julia has several plotting libraries; we are going to use Plots to draw a bar chart of the information we gathered. To install it, you can use the following:


In [ ]:
Pkg.add("Plots");

To plot the chart, we only need to separate the keys and values. The rest of the code formats the plotting area.


In [52]:
using Plots

dictKeys = collect(keys(yearCountDict))
dictValues = collect(values(yearCountDict))
    
fig = bar(dictKeys, dictValues, xticks=dictKeys, xrotation=90, label="", xlabel="Year", ylabel="Count")
    
fig


Out[52]:
[Figure: bar chart with Year on the x-axis and Count on the y-axis]

In this Notebook, we used the motivation of exploring a .bib file to learn some of the basics of Julia regarding strings, files and dictionaries.