Monday, August 31, 2015

Accessibility

Not that kind of accessibility.

I have a bit of a love/hate relationship with Disney on a variety of fronts, which is not worth going into right now. One of the love aspects recently is the Disney Institute, which is their corporate training wing, where they act as consultants to do third party training to help other organizations achieve the magic they are known for. Of course, I think the reason they can give away their secrets is two-fold: first, I don't think they give away all their secrets and second, they know that most organizations can't or won't implement them. It is a significant investment requiring complete buy-in from the top to the bottom and back up again.

In a few recent blog posts, DI has talked about service recovery, which is how to fix problems when something goes wrong. Empowerment is the key to service recovery. For example, every employee at Disneyland is trained to know that if a guest drops their cheese-filled pretzel or frozen banana on the ground, they can have a new one. It doesn't matter if it's the person who sold it to you or a random dude sweeping up garbage; half-eaten or just licked a little bit of the salt off, doesn't matter. They know the policy - free replacement cheese-filled pretzel - so before you even realize you dropped it, the garbage dude has swept it up and told you to go to the closest stand and just tell them that you need another one. Done. The pretzel person won't bat an eye, because they know the policy, too.

One of the most important points that I think gets lost is that empowerment should actually be empowerment, and not theoretical empowerment. That's where the idea of accessibility comes in. Accessibility means the service recovery solution is readily obtainable. As DI puts it, if a meal voucher is likely to be a recovery solution, "be sure the vouchers are available to employees when needed - not just when the one person with the key to the voucher drawer is present." If the vouchers are not accessible, the empowered employee is going to be less likely to offer that solution, and the unhappy customer is unlikely to want to stick around and waste time waiting for someone to find the key if that solution is offered.

That's also a good way to hide behind fake empowerment, where you tell a customer you wish you could give them something and if it was up to you, you would, but you know that the next level up will deny the request. You might even tell the next level up to deny the request when you put it in, just so you can say you tried even if it wasn't that hard.

Assuming you're not going down the fake empowerment road, the last thing you want to do is tell the person that you wish you could give them a particular solution but that you have to talk to your manager first. This is especially true if you know the customer deserves the solution but there is a chance management will turn down the request. That kind of empowerment is theoretical empowerment - your manager tells you that you are empowered, but there's no solid proof that your empowerment actually exists.

Even if you know they will back you up, it's annoying that everyone's time is being wasted - the customer, the employee, and the manager. How much is all that time and goodwill worth? Probably more than a cheese-filled pretzel.

Friday, July 31, 2015

The Art of Teaching

Another great post by Seth Godin, in which he talks about art. He clarifies that art can be anything that is generous and risky, whether or not it's a painting (or photograph or sculpture, I might add).

Of course, teaching comes to mind. There are all sorts of people weighing in on various sides of whether you can measure what a teacher does. There are bad teachers that sometimes certain metrics may find and sometimes they may not. There are plenty of things that a good teacher does that cannot be measured.

Seth's list of characteristics of what makes art:

  • Human
  • Generous
  • Risky
  • Change
  • Connection
Of course, if we try too hard, we'll end up trying to make a rubric to measure whether or not someone's teaching or other works are "artistic enough," which completely misses the point in the first place.

But think of the good and bad teachers you've had (or that you've been), and consider the 5 factors above. Are they realistic? Do they give more than they ask in return? Do they sometimes try things that don't work but still end up being just as much (or more) of a learning experience as the lesson plans that were executed to perfection? Do they instill real change (for the better)? Is there a real personal touch present, even if but for a moment?

Monday, June 29, 2015

Communication Management

In any project you do, a big piece of the success of the project is communication. As such, a large portion of the role of project manager is, you guessed it, communication. Sometimes people get frustrated by what they see as overcommunication from the PM. Other times people feel a bit like they're floating out there on their own, unsure of how the things they are doing fit in with what everyone else is doing. The PM must find a way to balance these extremes, so everyone gets just what they need (including the project manager).

The key first step is to identify the project stakeholders and perform a stakeholder analysis. Now, the term stakeholder is a bit of a loaded one for many people. For some it is the executive stakeholders who are the customers for whom a system is being built or change is being implemented. Sure, they're stakeholders. But that's a fairly narrow view.

Back it up and think of everyone who has a stake in the successful outcome of a project. Think of anyone who could positively or negatively impact the project. Then think of anyone who could be positively or negatively impacted by the project. Sure, your executive stakeholders or customers are on the list, as is your project sponsor and/or champion, who will probably need more communication than some of the other executives. And, yes, the sponsor/champion needs to be an executive, as they need to have money and political clout to help get past any roadblocks.

This is where the first step of the stakeholder analysis comes in - the power/interest grid. You build a 2x2 matrix with Power on one axis and Interest on the other. You could pick different items for your axes, but these two often will get you as far as you need to go. Your sponsor should be high and right. Not enough power means they're not going to be able to help when times get tough. Not enough interest means they're not going to want to.

Other stakeholders, such as future users of the system, may fit higher or lower on power and interest, depending on what kind of project you're doing. Something that has a status quo that people want to maintain will result in high interest users trying to shut you down. If they're the users, they have relatively high power, as they can sabotage, refuse to help get the system going, or just not use the system after it goes live, even if they are not actually on the project team.

Still others will include people like the dude you always see in the breakroom who is trying to get into a position in your department and really wants to know what is going on, even though at this point the poor soul has nothing to actually offer you. Then you have managers in other departments that could shut you down if they wanted, but they care little if anything about what is going on in your project, so you try to keep away from those people as much as possible. If one of those high power / low interest executives finds out how much your project is costing and wants some of that funding for something in their department, watch out. So a big piece of what you want to watch for is considering what information is relevant to which people, so you can be sure everyone has everything they need and nothing they don't. Even the lowly users who don't have much say in what is going on might be upset if they knew how much the project is costing, but unless they're writing the check, that's not something you talk to them about.
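To make the grid concrete, here is a minimal Python sketch of sorting stakeholders into the four quadrants. The 1-10 scoring scale, the threshold of 5, and the example names and scores are all assumptions made up for illustration; in practice the placement is a judgment call rather than a calculation.

```python
from dataclasses import dataclass


@dataclass
class Stakeholder:
    name: str
    power: int     # hypothetical 1-10 score: ability to help or block the project
    interest: int  # hypothetical 1-10 score: how much they care about the outcome


def quadrant(s: Stakeholder, threshold: int = 5) -> str:
    """Place a stakeholder in one cell of the 2x2 power/interest grid."""
    high_power = s.power > threshold
    high_interest = s.interest > threshold
    if high_power and high_interest:
        return "high power / high interest"
    if high_power:
        return "high power / low interest"
    if high_interest:
        return "low power / high interest"
    return "low power / low interest"


# Illustrative entries only, loosely based on the people described above.
stakeholders = [
    Stakeholder("Sponsor", power=9, interest=9),
    Stakeholder("Other-department manager", power=8, interest=2),
    Stakeholder("Breakroom hopeful", power=1, interest=8),
    Stakeholder("Future user", power=6, interest=7),
]

# Group names by quadrant so each group can get its own communication approach.
grid = {}
for s in stakeholders:
    grid.setdefault(quadrant(s), []).append(s.name)

for label, names in grid.items():
    print(f"{label}: {', '.join(names)}")
```

Grouping by quadrant first makes it easier to decide, later in the communication plan, which reports go to which group.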

Once you've thoroughly gone through who the various stakeholders are, what kinds of things they can offer you, how much they care about what is going on, the methods of communication that would be most effective for each, and the specific details each cares about, it's time to actually create the communication plan.

The plan itself will be based on the stakeholder analysis and three major phases - introducing the project, carrying out the project, and closure.

Project introduction will include things like gaining buy-in from everyone. Sometimes it's little more than a courtesy notification that the project is happening, particularly for low power players. You'd be surprised how often people are surprised by projects that have been started and people who thought they were a key stakeholder are completely left out of the loop. Let people know what is going on, how the project will affect them, and what help you will need from them. If they have high power over your project, something deeper like explaining the ROI or strategic purpose behind the project will be necessary.

As the project begins, you need to check in with people every so often. A person who won't see the system for 2 years until it's completely done and ready to launch and isn't working on the project will get annoyed if they are receiving weekly status reports. Don't CC the entire company on things. Don't throw information out to people that they don't need to know. Be thoughtful and consider both their time and the political fallout of making people angry at you.

Do make your plan specific. Lay out what information team leads and other team members need to report back to the project manager and how often. Then lay out what the project manager will collect and analyze and who that aggregated information will be sent out to. What format will it be in? What are they expected to do with it? Just read it if they choose or provide feedback and approvals? What items don't recur regularly but happen either on a certain date or upon some event occurring? When there are change requests, there should be a plan for getting those communicated to people, even though you don't know when they are going to happen. You just know when they are approved, they need to be communicated quickly so the project team is working on the latest information.

And as the project comes to an end, there is information that needs to be communicated and gathered to close out the project. Often final versions of the recurring communications will be put together. Other information, such as lessons learned and team member performance may not be known until the project actually does end. Whether the project is successful or unsuccessful, there should be closure. In fact, one piece of closure is to communicate about the success (or not) of the project. There can be many lessons learned from a failed project, so don't forget to sit down and talk about how to make sure the same thing doesn't happen again in the future. If you don't document that you were going to compile the approvals of all the project deliverables at the completion of each project phase, you may not be able to go back and collect those all later, due to people either leaving the company, losing interest in the project, forgetting what they agreed to, changing their mind, or otherwise. So know what you'll be communicating at the end so you can be gathering that information throughout.

As you lay out the items you will be communicating to gain buy-in and start the project off on the right foot, the items you will be communicating on a recurring or scheduled basis throughout the project itself, and the items you will gather to provide closure to your stakeholders when the project wraps up, be sure that you refer back regularly to the stakeholder analysis. Don't spend a lot of time on people who have nothing to help you with or who don't care about what you're doing. Be sure you have a sponsor who isn't going to lose interest in you half way through.

Make certain you include everything from the stakeholder analysis in your communication plan; if you know someone cares about the project costs and the communication plan never has you sending them a report on how much is being spent, you're missing something. Possibly even more important, when you're sending information to people, refer to the communication plan and from there back to the stakeholder analysis, and don't send people stuff they don't care about and don't need, as it will begin to burn any goodwill you have with them and make it politically difficult to work with them in the future. Yes, this means that you need to check who's on the CC line of an email before you hit reply-all and coordinate most of the communication centrally.

Saturday, May 30, 2015

Game Changer

The automobile is a source of freedom for all but big city folk who have solid, reasonable options for transportation (and ridiculous traffic and parking fees). So outside a few big cities, no one would be willing to give up their car, right? Maybe. I think everyone knows self-driving cars are coming. To some extent, they're already here, even if not widespread yet. But the natural next step may not even be to purchase a self-driving car but rather rent one when you need it. I, for one, welcome our new taxi-bot overlords.

There is very little remaining to make these viable. Obviously GPS and mapping technologies are involved so the car's computer can find the route to get you from beginning to end, and we've largely handed navigation over to these devices already anyway. When was the last time driving somewhere new that you didn't pull out your GPS or look up the Google Map before leaving? Likewise, many vehicles are coming with sensors that warn the driver of other cars around it already.

But let's take it a step further than just having it do the heavy lifting on the freeway. This is definitely something that can be a game changer in terms of Porter's 5 Forces - talk about bargaining power over your customers - if you can reduce their costs so much that they don't need to buy their own car and make it so they don't have to hassle with parking, that's pretty amazing. You can already call a car to get you with an Uber or Lyft app. It's just combining that system with the self-driving car instead of a professional taxi driver or an amateur Uber/Lyft driver.

What could we do with all the parking lots in front of stores? What will we all turn our garages into when we don't need our own car? You might think that you'll always want to drive your own car, but what happens when insurance rates go up for self-drivers so much due to the fact that they drive unsafely and get into more accidents? Insurance companies already have devices they can put in your car to measure how good of a driver you are by collecting data about your driving habits. They just have to compare your habits to those of the taxi-bots, and your rates skyrocket.

So then ethically, how does this affect us? More tracking of where you have traveled to and from being stored in someone's database (more because it's already happening some). Actually, anywhere you go carrying your cell phone, you're already being tracked and recorded, and in many cities your license plate number is tracked as you drive around town.

One of the big ethical questions is what happens when someone does get hurt or killed? Fewer people will be hurt with self-driving cars/taxis, but instead of it maybe being the fault of the person driving, what if it is the fault of the programming of the vehicle? What if a sensor is dirty and doesn't catch debris on the roadway?

What is a fair trade-off between handing over the control of your travels and the overall benefits to individuals and society? There are some very tricky issues here, but self-driving cars are here, whether owned by individuals or by taxi companies or long-haul trucking companies. How many jobs will be created vs other jobs that will be lost? Could a community taxi-bot take kids around to sports and lessons so the soccer mom doesn't have to anymore? Could the dream of sleeping through a night-time road trip and awaking as you pull up to your destination become a reality? How would that affect the airline industry? Will my youngest never need to learn to drive? There's almost no end to the implications here.

Thursday, April 30, 2015

Project Cancellation

I had an interesting discussion with a student recently regarding cancelling projects. In question was whether it is appropriate to cancel a failing project. The student's position was that a project should never be cancelled. The claim was that, at least at the large company where the student works, they could not afford to cancel a project once it started. If it is failing, then one would investigate the cause and make whatever changes are needed to get back on track.

Of course, you want to track things carefully to be sure any project is progressing as it should. If it gets into trouble, you do a risk assessment and change requests and whatever needs to be done to salvage it. But eventually, if it's actually failing, you cancel it. I think the disagreement came down to perhaps a difference in definition of "failing". If you have been through the process of analyzing what is going on and trying to fix it and it is still doomed to failure, then yes, it needs to be cancelled. If a couple things are just not going as planned, that doesn't mean failure; it means job security for good project managers.

There are many projects that are not cancelled even though they should be because of not much more than pride or attempting to save face. One of the most important concepts I learned about in my MBA program is that of sunk costs. That is, if you’ve already spent the money, it’s gone, sunk, finito. You don’t look back. What you already spent in the past is less important than what is going to happen moving forward. You look at how much it will cost to complete the project or change it or whatever moving forward, and the corresponding opportunity cost (which concept I learned about in undergrad economics), which is to look at whether there is something better you could be doing with that money (or time or any other resources involved) instead. This is sometimes referred to as a good-better-best comparison.

Compulsive gamblers run into a similar issue with not being willing to cut their losses: they lose money, and the more they lose, the more they want to bet to try to win that money back. But that just digs the hole deeper instead of salvaging what remains in order to take the lessons learned and invest more wisely in the future.
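The sunk cost and opportunity cost reasoning can be shown with a small worked example in Python. Every dollar figure and option name below is hypothetical, made up purely for illustration; the point is only that the money already spent never enters the comparison.

```python
def best_option(options):
    """Pick the option with the highest forward-looking net value.

    Each option is a (name, future_cost, expected_benefit) tuple.
    Money already spent is deliberately not an input: it is sunk.
    """
    return max(options, key=lambda o: o[2] - o[1])


# Hypothetical amount already spent; shown only to stress that it is ignored.
sunk_cost = 800_000

# Hypothetical choices going forward (all figures invented for the example).
options = [
    ("finish the failing project", 500_000, 300_000),       # net -200,000
    ("cancel and do nothing", 0, 0),                        # net 0
    ("cancel and fund an alternative", 400_000, 650_000),   # net +250,000
]

name, future_cost, expected_benefit = best_option(options)
print(f"Best choice going forward: {name} (net {expected_benefit - future_cost:,})")
```

Changing sunk_cost to any other value leaves the answer unchanged, which is the whole idea: only future costs and benefits, compared against the next best use of the money, drive the decision.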

Even better than straight up cancelling, however, is to build in several exit gates throughout the project so that upon completion of a phase, a planned review takes place, with the intent of determining whether the project should proceed. This is most common when the first phase is a feasibility study, but it can also be added after a prototype, pilot, or contract negotiation phase. Write up the criteria correctly, and you can find yourself successfully terminating a project by making the determination that a contract is not worth pursuing or that the pilot did not show the expected benefits. Then reallocate resources to something better.

Tuesday, March 24, 2015

Consistency vs Transformation

A project is a temporary endeavor. Its successful completion results in the creation of a new or improved product, service, process, or other result.

Being temporary means it should have a distinct beginning and end. In some project-based organizations, the temptation may be to drag the project on forever as a form of job security. The best job security, however, is being efficient at finishing projects and knowing your successful performance means you’ll always be reassigned once your current project is over.

Operations and processes just keep going on without a distinct beginning or end. An assembly line may be used to build a car from beginning to end, but as a whole, the assembly line is really a process that continually creates new cars over and over. If an inefficiency in the process is found, a project may be undertaken to overhaul the process, but once the new process is in place, it goes on with no planned end in sight.

Operations are important to the consistent functioning of a business. But don't underestimate the transformational power of a good project.

Saturday, February 28, 2015

Cardinal Wolsey



When I am forgotten, as I shall be,
And sleep in dull cold marble, where no mention
Of me more must be heard of, say, I taught thee,
Say, Wolsey, that once trod the ways of glory,
And sounded all the depths and shoals of honour,
Found thee a way, out of his wreck, to rise in;
A sure and safe one, though thy master miss'd it.
Mark but my fall, and that that ruin'd me.
Cromwell, I charge thee, fling away ambition:
By that sin fell the angels; how can man, then,
The image of his Maker, hope to win by it?
Love thyself last: cherish those hearts that hate thee;
Corruption wins not more than honesty.
Still in thy right hand carry gentle peace,
To silence envious tongues. Be just, and fear not:
Take an inventory of all I have;
My robe, and my integrity to heaven, is all
I dare now call mine own. O Cromwell, Cromwell!
Had I but served my God with half the zeal
I served my king, he would not in mine age
Have left me naked to mine enemies.

Image "Cardinal Wolsey Christ Church" by Sampson Strong (circa 1550–1611)

Tuesday, January 6, 2015

Haiku

I jokingly told my daughter who is supposed to do a presentation of some type on the seasons that she should do it as a haiku. I was looking up the "rules" since I couldn't remember how many syllables were supposed to be in each line. I found a site that talked about haiku, with all the rules and a bunch of examples. There are some great ones on that page. I really like the Christmas one about three quarters of the way down the page.

The basics are the 5 | 7 | 5 syllables per line, and it doesn't have to rhyme. What I had either forgotten or not known is that it is supposed to be seasonal, even if not obviously seasonal. And it's supposed to have a twist of some kind. So there are two halves, with some change from one to the other that provides a new perspective. Of course she had to do it, because of the season thing, but I still couldn't convince her, so she's doing a boring poster with a sun and the tilt of the earth across the different seasons.

So I decided to write a haiku for each season. Since they don't generally have titles (which would be kind of cheating on the 17 syllables thing), I grabbed some great Creative Commons licensed pics from Flickr to accompany each. Sure, each pic is worth 1000 words, but no syllables, so here they are with my four haiku:

Frigid, wind-whipped, dark,
Sullen stillness, empty streets.
Introvert's blanket.


Golden flowers bloom.
Wildlife fills the savannah.
Dandelions roar.


She reclines in sand,
Ocean waves in the distance.
Aye, mocking mirage.


Final drops, warmth drained,
He leans into coming cold.
A pile of leaves. Fall.


Photos by: ldandersen | paullew | cleftclips | sixelsid

Tuesday, December 16, 2014

Information Systems Success Models - An Annotated Bibliography

DeLone, W.H. & McLean, E.R. (1992). Information systems success: The quest for the dependent variable. Information Systems Research, 3(1).

IS success is multidimensional and interdependent, so interactions between success dimensions need to be isolated. Success dimensions should be based on goals of the research as well as proven measures where available. The number of factors should be minimized. The key factors included in the model include system quality, information quality, system use, user satisfaction, individual impacts, and organizational impact.

Rai, A., Lang, S.S., & Welker, R.B. (2002). Assessing the validity of IS success models: An empirical test and theoretical analysis. Information Systems Research, 13(1).

IS success models are compared. One major factor that differs among models is the category of IS Use. Some models include Use as a process since it is a prerequisite to other factors, others an indicator of success since people won’t use a system if they haven’t determined it will be useful to them, and of course perceived usefulness vs. measured use. The Technology Acceptance Model suggests that perceived usefulness and ease of use directly impact user behavior and system use.

Seddon, P.B. (1997). A respecification and extension of DeLone and McLean’s model of IS success. Information Systems Research, 8(September).

Standard variance models assert that variance in independent variables predicts variance in dependent variables. Process models, on the other hand, posit that not only is the occurrence of events necessary but that it is a particular sequence of events that leads to a change in the dependent variable. The presented IS success model removes the process component of the DeLone and McLean model. The problematic model contained three meanings of information system use. One meaning is that use provides some benefit to the user. A second, invalid, meaning presented use as a dependent variable of future use (i.e., if the user believes the system will be useful in the future, they will use it now). The third, also invalid, is that use is an event in the process that leads to individual or organizational impact. The proposed model links measures of system and information quality to perceived usefulness and user satisfaction, which in turn lead to expectations of future system usefulness and then use. Observing benefits to other individuals, organizations, and society also impacts perceived usefulness and user satisfaction regardless of system or information quality.

Velasquez, N.F., Sabherwal, R., & Durcikova, A. (2011). Adoption of an electronic knowledge repository: A feature-based approach. Presented at 44th Hawaii International Conference on System Sciences, 4-7 January 2011, Kauai, HI.

This article discusses the types of use for knowledge base users. It utilized a cluster analysis to come up with three types of users: Enthusiastic Knowledge Seekers, Thoughtful Knowledge Providers, and Reluctant Non-adopters. Enthusiastic Knowledge Seekers made up the largest group at 70%. They had less knowledge and experience and shared little if anything of their own but considered the knowledgebase articles to be of high quality and very useful. The Thoughtful Knowledge Providers, 19% of the users, submitted quality articles to the knowledgebase, enjoyed sharing their knowledge with others, had moderate experience, and were intrinsically motivated. The smallest group, Reluctant Non-adopters at 11%, were experts who were highly experienced and adept at knowledge sharing but lacked the time or intrinsic motivation to contribute meaningfully. They considered the knowledgebase to be low quality and did not consider it worth their time to work on improving it.

Thursday, December 4, 2014

Cluster Analysis and Special Probability Distributions - An Annotated Bibliography

Antonenko, P., Toy, S., & Niederhauser, D. (2012). Using cluster analysis for data mining in educational technology research. Educational Technology Research and Development, 60(3), 383-398.

Server log data from online learning environments can be analyzed to examine student behaviors, in terms of pages visited, length of time on a page, order of links clicked, and so on. This analysis is less cognitively taxing to the student than think aloud techniques and to the researcher since there is no coding of behaviors involved. Cluster analysis groups cases such that they are very similar within the cluster and dissimilar to other cases outside the cluster across target variables. It is related to factor analysis, where regression models are created based on a set of variables across cases, but in cluster analysis, cases are then grouped. Proximity indices (squared Euclidean distance or sum of the squared differences across variables) are calculated for every pair of cases. Squaring makes them all positive and accentuates the outliers. Various clustering algorithms are available to then group similar cases. Ward’s is a hierarchical clustering technique that combines cases one at a time from n clusters to 1 cluster and determines which minimizes the standard error, and is used when there is no preconceived idea about the likely number of clusters. Using k-means clustering, a non-hierarchical technique, an empirical rationale for a predetermined number of clusters is tested. It may also be used when there is a large sample size in order to increase efficiency; if no empirical basis exists, the model is run on 3, 4, and 5 clusters. The method calculates k centroids and associates cases with the closest centroid, repeating until the standard error is minimized by allowing cases to move to a different centroid. It may also be possible to use two different kinds of techniques, for example, a Ward’s cluster analysis on a small sample followed by a k-means cluster analysis based on the findings from Ward’s.
After determining the clusters, the characteristics of each cluster should be compared to ensure there is a meaningful difference among them and that there is a meaningful difference in the outcome based on their behaviors, since cluster analysis can find structures in data where none exists. ANOVA may then be used to determine for each cluster how much each variable contributes to variation in the dependent variable. It may be useful to use more than one technique and compare or average them, as different techniques may result in a variation in the results.
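The squared Euclidean proximity index and the k-means loop described above can be sketched in a few lines of plain Python. This is a minimal textbook-style version, not necessarily the exact procedure used in the article; the function names, the toy data (pages visited, minutes on a page), and the fixed random seed are all assumptions for illustration.

```python
import random


def sq_euclidean(a, b):
    """Sum of squared differences across variables for two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(cases, k, iterations=100, seed=0):
    """Assign each case to the nearest of k centroids, repeating until stable."""
    rng = random.Random(seed)
    centroids = rng.sample(cases, k)  # start from k randomly chosen cases
    assignment = None
    for _ in range(iterations):
        # Step 1: associate every case with its closest centroid.
        new_assignment = [
            min(range(k), key=lambda c: sq_euclidean(case, centroids[c]))
            for case in cases
        ]
        if new_assignment == assignment:
            break  # no case moved to a different centroid, so we are done
        assignment = new_assignment
        # Step 2: move each centroid to the mean of its current members.
        for c in range(k):
            members = [case for case, a in zip(cases, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical server-log cases: (pages visited, minutes on a page).
cases = [(2, 1.0), (3, 1.5), (2, 2.0), (20, 9.0), (22, 10.0), (21, 8.5)]
assignment, centroids = k_means(cases, k=2)
print(assignment)
print(centroids)
```

With data this cleanly separated, the first three cases end up in one cluster and the last three in the other; which cluster gets label 0 or 1 depends on the random starting centroids. On real data the follow-up comparison of cluster characteristics still matters, since the algorithm will return k groups whether or not a meaningful structure exists.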

Bain, L.J. & Engelhardt, M. (1991). Special probability distributions. In Introduction to probability and mathematical statistics (2nd Edition). Belmont, CA: Duxbury Press.

A Bernoulli trial has two discrete outcomes whose probabilities add up to 1. A series of n independent Bernoulli trials gives rise to the Binomial distribution, which describes the number of successes (or failures) in those n trials. A Hypergeometric distribution occurs when n samples are taken from a population of N+M without replacement. It can be useful for testing a batch of manufactured products for defects in order to accept or reject the batch. The Geometric distribution describes the number of Bernoulli trials that must occur to achieve the first success. The Negative Binomial distribution describes the number of Bernoulli trials that must occur to achieve n successes. The Poisson distribution describes the probability of n independent events occurring within a fixed interval of time or space. The discrete uniform distribution allows for n possible values, each with equal probability of occurrence.
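The distributions summarized above can be written out as probability mass functions using only Python's standard library. The function names and the parameterizations (for example, treating the hypergeometric population as N items of one kind plus M of another, and counting trials rather than failures for the geometric and negative binomial) are assumptions chosen for this sketch; textbooks differ on these conventions.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent Bernoulli trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeometric_pmf(k, n, N, M):
    """P(k items of the first kind in n draws without replacement
    from a population of N items of the first kind and M of the second)."""
    return comb(N, k) * comb(M, n - k) / comb(N + M, n)


def geometric_pmf(x, p):
    """P(the first success occurs on trial x), x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p


def negative_binomial_pmf(x, r, p):
    """P(the r-th success occurs on trial x), x = r, r + 1, ..."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


def poisson_pmf(k, lam):
    """P(exactly k events) when events occur at average rate lam per interval."""
    return exp(-lam) * lam**k / factorial(k)


def discrete_uniform_pmf(n):
    """Each of n possible values is equally likely."""
    return 1 / n


# Quick sanity checks: probabilities over the full support sum to 1.
print(sum(binomial_pmf(k, 10, 0.3) for k in range(11)))
print(sum(hypergeometric_pmf(k, 4, 5, 10) for k in range(5)))
print(geometric_pmf(1, 0.5), negative_binomial_pmf(1, 1, 0.5))
```

The last line illustrates that the geometric distribution is the special case of the negative binomial with a single required success.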

Blau, B.M., Brough, T.J., & Thomas, D.W. (2013). Corporate lobbying, political connections, and the bailout of banks. Unpublished manuscript, Department of Finance and Economics, Utah State University, Logan, UT.

When measuring a dependent variable with discrete values, an appropriate count regression framework must be used. Poisson, negative binomial, and OLS are possible models to use. Poisson regression uses a distribution where the mean is equal to its variance. If the distribution is over-dispersed or significantly greater than 0, Poisson will not work. The manuscript does not discuss when negative binomial or OLS would be appropriate.

Collins, L.M. & Lanza, S.T. (2010). Latent class and latent transition analysis for the social, behavioral, and health sciences. New York: Wiley.

Latent variables are unobserved but predicted by the observation of multiple observed variables. The latent variable is presumed to cause the observed indicator variables. Different models are used, depending on whether the observed and latent variables are discrete or continuous. Using a discrete latent variable helps organize complex arrays of categorical data. A given construct may be measured using either continuous or discrete variables, so the method used when there is a choice should be based on which best helps address the research questions. When cases are placed into classes, the classes are named by the researcher based on their similar characteristics.

Fisher, W.D. (1958). On grouping for maximum homogeneity. Journal of the American Statistical Association, 53, 789-798.

Grouping or clustering is a useful tool for distinguishing sets of cases based either on prior theory of what the groups should entail or with no initial structure in mind. The goal of grouping is to minimize the variance, or error sum of squares, within the groups. For some small data sets, a visual inspection of the data may allow the researcher to come up with the clusters. In large data sets with evenly dispersed data, this is difficult or impossible.
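
To make the "minimize the error sum of squares" idea concrete, here is a minimal Python sketch for one-dimensional data. It sorts the values and uses dynamic programming to find the split into k contiguous groups with the smallest total within-group sum of squares. The function names and the example numbers are my own assumptions; this is an illustration of the objective, not a reproduction of Fisher's procedure.

```python
def ess(values):
    """Error sum of squares of values around their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_grouping(data, k):
    """Split sorted 1-D data into k contiguous groups with minimum total ESS.
    Assumes 1 <= k <= len(data)."""
    xs = sorted(data)
    n = len(xs)
    inf = float("inf")
    # cost[g][i]: minimum total ESS when xs[:i] is split into g groups
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    cut = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for g in range(1, k + 1):
        for i in range(g, n + 1):
            for j in range(g - 1, i):
                c = cost[g - 1][j] + ess(xs[j:i])
                if c < cost[g][i]:
                    cost[g][i], cut[g][i] = c, j
    # Walk the stored cut points backwards to recover the groups
    groups, i = [], n
    for g in range(k, 0, -1):
        j = cut[g][i]
        groups.append(xs[j:i])
        i = j
    return groups[::-1], cost[k][n]

# Hypothetical data with three visually obvious clusters
groups, total_ess = best_grouping([12, 1, 26, 3, 11, 2, 25, 13], k=3)
```

On a small data set like this one the answer matches what visual inspection would suggest; the point of an algorithm is that it still works when the data are too large or too evenly spread to eyeball.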

Francis, B. (2010). Latent class analysis methods and software. Presented at 4th Economic and Social Research Council Research Methods Festival, 5 - 8 July 2010, St. Catherine’s College, Oxford, UK.

Latent class cluster analysis assigns cases to groups based on statistical likelihood; cases do not have to be assigned to a single discrete class. K-means clustering is problematic because the number of groups has to be specified a priori, each case is assigned to exactly one cluster, and only continuous data are allowed.

Gardner, W., Mulvey, E.P., & Shaw, E.C. (1995). Regression analyses of counts and rates: Poisson, overdispersed Poisson, and negative binomial models. Psychological Bulletin 118(3).

Researchers often use suboptimal strategies when analyzing count data, such as artificially breaking down counts into categories of 5 or 10, but this loses data and statistical power. Another ineffective strategy is to use regular linear regression or OLS. With OLS, illogical values such as negative counts will be predicted, and the model’s variance of values around the mean is not likely to fit well. Another problem with OLS is heteroscedastic error terms, where larger values have larger variances and smaller values have smaller variances. Nonlinear models that allow only positive values and describe the likely dispersion about the mean must be used. Poisson places restrictive assumptions on the size of the variance. The overdispersed Poisson model corrects for the large variances that are common. The negative binomial is another option. In the regular Poisson model, truncated extreme tail values could lead to underdispersion, and a large number of high values could lead to overdispersion. An overdispersion parameter is calculated by dividing Pearson’s chi-squared by the degrees of freedom, and the variance is then modeled as the overdispersion parameter multiplied by the mean. The negative binomial model includes a random component that accounts for individual variances. The negative binomial model allows one to estimate the probability distribution, whereas the overdispersed Poisson does not.
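
The overdispersion parameter described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the fitted means come from some already-estimated model; here I use the simplest possible case, an intercept-only model where every fitted mean is the sample mean. The function name and the counts are hypothetical.

```python
def pearson_dispersion(counts, fitted_means, n_params):
    """Pearson chi-squared divided by the residual degrees of freedom.
    Values well above 1 suggest overdispersion relative to Poisson."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(counts, fitted_means))
    df = len(counts) - n_params
    return chi2 / df

# Hypothetical counts; intercept-only model, so one estimated parameter
counts = [0, 0, 1, 1, 2, 3, 5, 8, 13, 0]
mean = sum(counts) / len(counts)
phi = pearson_dispersion(counts, [mean] * len(counts), n_params=1)

# Overdispersed Poisson variance: dispersion parameter times the mean
adjusted_variance = phi * mean
```

In the intercept-only case the dispersion parameter is just the sample variance divided by the sample mean, so the adjusted variance equals the ordinary sample variance.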

Osgood, D.W. (2000). Poisson-based regression analysis of aggregate crime rates. Journal of Quantitative Criminology 16(1).

The normal approach to analyzing per capita rates of occurrence is to use the OLS model. However, OLS does not provide an effective model when recording a small number of events. For large populations, OLS may work, but for a small number of events in a small population, the result is an overestimated rate of occurrence. Often small counts will be skewed with a floor of 0. The Poisson model corrects for many of these issues with OLS; however, the unlikely assumption that the Poisson’s mean equals its variance must hold. Due to individual variations and correlation between observed values and variance, overdispersion is common. Adjusting the standard errors, and thus the t-test results, for the overdispersion helps correct the model. The negative binomial model combines the Poisson distribution with a gamma distribution that accounts for unexplained variation.
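
A small simulation can show why mixing a gamma distribution into the Poisson produces overdispersion. This is only a sketch under my own assumptions: the mean of 3, the gamma shape of 2, the sample size, and the seed are arbitrary, and the Poisson sampler is Knuth's simple multiplication method, which is only practical for small rates.

```python
import math
import random

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    m = sum(values) / n
    return m, sum((x - m) ** 2 for x in values) / (n - 1)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
mu, shape = 3.0, 2.0     # hypothetical mean and gamma shape

# Plain Poisson: every case shares the same rate
poisson_counts = [poisson_draw(rng, mu) for _ in range(20000)]

# Poisson-gamma mixture: each case gets its own rate, drawn from a
# gamma distribution whose mean is still mu (shape * scale = mu)
mixture_counts = [
    poisson_draw(rng, rng.gammavariate(shape, mu / shape))
    for _ in range(20000)
]

p_mean, p_var = mean_and_variance(poisson_counts)
m_mean, m_var = mean_and_variance(mixture_counts)
```

Both samples have roughly the same mean, but the variance of the mixture is well above its mean, while the plain Poisson variance stays close to its mean. The extra spread is the unexplained case-to-case variation that the gamma component stands in for.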

Romesburg, H.C. (1990). Cluster Analysis for Researchers. Malabar, FL: Robert E. Krieger Publishing Company.

The steps in doing cluster analysis begin with creating the data matrix, including objects and their attributes. The objective is to determine which objects are most similar based on those attributes. An optional step is to standardize the data matrix. A resemblance matrix is then calculated, showing for each pair of objects a similarity coefficient, such as the Euclidean distance. Based on the similarity coefficient, a tree is created by combining similar objects and comparing their average to the other existing objects. Finally, the objects in the data matrix are rearranged so that the closest objects appear next to each other.
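
The steps above can be sketched in Python roughly as follows. This is a minimal illustration under my own assumptions: the object names and attribute values are made up, Euclidean distance is the resemblance coefficient, and merged clusters are represented by their size-weighted average (a centroid-style rule), which is only one of several ways to build the tree.

```python
from math import sqrt
from statistics import mean, pstdev

def standardize(matrix):
    """Rescale each attribute (column) to mean 0 and standard deviation 1."""
    cols = list(zip(*matrix))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - m) / s for v, m, s in zip(row, mus, sds)] for row in matrix]

def euclidean(a, b):
    """Euclidean distance between two attribute vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_tree(names, matrix):
    """Repeatedly merge the two closest clusters into their average."""
    clusters = [(name, row, 1) for name, row in zip(names, matrix)]
    merges = []
    while len(clusters) > 1:
        # Resemblance matrix of the current clusters, searched for its minimum
        d, i, j = min(
            (euclidean(clusters[i][1], clusters[j][1]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        (name_i, row_i, n_i), (name_j, row_j, n_j) = clusters[i], clusters[j]
        # Size-weighted average of the two merged clusters
        avg = [(x * n_i + y * n_j) / (n_i + n_j) for x, y in zip(row_i, row_j)]
        merges.append((name_i, name_j, d))
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [((name_i, name_j), avg, n_i + n_j)]
    return clusters[0][0], merges

# Hypothetical data matrix: four objects, two attributes each
names = ["A", "B", "C", "D"]
data = [[1, 1], [1, 2], [6, 6], [6, 8]]
tree, merges = build_tree(names, standardize(data))
```

The nested tuple in `tree` records which objects were joined at each level, and `merges` keeps the distance at which each join happened, which is the information a dendrogram would display.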

Velasquez, N.F., Sabherwal, R., & Durcikova, A. (2011). Adoption of an electronic knowledge repository: A feature-based approach. Presented at 44th Hawaii International Conference on System Sciences, 4-7 January 2011, Kauai, HI.

This article discusses the ways in which knowledge base users make use of the repository. It uses a cluster analysis to arrive at three types of users. The clustering methods compared were Ward’s, between-groups linkage, within-groups linkage, centroid clustering, and median clustering, and the one with the best fit was used.

Wang, W. & Famoye, F. (1997). Modeling household fertility decisions with generalized Poisson regression. Journal of Population Economics 10.

Poisson and negative binomial models account for non-negative counts of discrete occurrences. The Poisson model requires that the mean and variance of the dependent variable be equal, which is rarely true. When they are not equal, the model's estimates are still consistent, but its standard errors are invalid. The negative binomial model handles counts with overdispersion. When underdispersion is present, a generalized Poisson regression model may be used. Generalized Poisson handles both overdispersion and underdispersion.

Ward, J.H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58, 236-244.

Ward describes a clustering technique that allows for grouping with respect to many variables in a way that minimizes the loss of information within each group. Traditional statistics would take a group of numbers, find the mean, and then calculate the error sum of squares (ESS) of all cases around that one mean. With grouping, the ESS is reduced because each case is compared to its own group mean. The appropriate number of groups can be determined during the grouping process rather than needing to be specified in advance.
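
Here is a minimal Python sketch of that idea, assuming a single variable for simplicity (Ward's method handles many). Every case starts as its own group, and at each step the two groups whose merger adds the least to the total ESS are combined. The function names and the six data values are my own for illustration.

```python
from itertools import combinations

def ess(group):
    """Error sum of squares of a group around its own mean."""
    m = sum(group) / len(group)
    return sum((x - m) ** 2 for x in group)

def ward_steps(values):
    """Return (number of groups, total ESS) after each merge."""
    groups = [[v] for v in values]
    history = [(len(groups), 0.0)]
    while len(groups) > 1:
        # Pick the pair whose merger increases the total ESS the least
        a, b = min(
            combinations(range(len(groups)), 2),
            key=lambda pair: ess(groups[pair[0]] + groups[pair[1]])
            - ess(groups[pair[0]])
            - ess(groups[pair[1]]),
        )
        merged = groups[a] + groups[b]
        groups = [g for i, g in enumerate(groups) if i not in (a, b)] + [merged]
        history.append((len(groups), sum(ess(g) for g in groups)))
    return history

# Hypothetical data: two obvious clumps
history = ward_steps([1, 2, 3, 11, 12, 13])
```

For these six values the total ESS stays small down to two groups and then jumps sharply when the last two groups are forced into one. A jump like that is the signal for where to stop merging, which is how the number of groups falls out of the process instead of being chosen in advance.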