Seamheads.com and The Baseball Gauge Launch Negro Leagues Database
Our friends at Seamheads.com and The Baseball Gauge have launched the Negro Leagues Database, the first comprehensive online statistical encyclopedia of the great black baseball teams and leagues that operated behind the color line in the days of Jim Crow segregation.
The database, created by longtime SABR members Gary Ashwill, Scott Simkus, Kevin Johnson and Dan Hirsch, also collects a vast amount of biographical information about the players, including links to their SABR Baseball Biography Project entries and player pages at Baseball-Reference.com and Retrosheet.org, where applicable.
Ashwill and Simkus have compiled the statistics for the first three seasons of the Negro National League, 1920 to 1922; four seasons of pre-Negro League play, 1916 to 1919; and nine seasons of the Cuban Winter League, 1905 to 1913. These leagues featured some of the best black ballplayers of the era, including Oscar Charleston, Cristóbal Torriente, Bullet Rogan, Joe Williams and Pete Hill.
SABR member Mike Lynch, founder of Seamheads.com, says, “We are in the act of putting this encyclopedia together; it's very much a work in progress, which we'll be adding to little by little, game by game, season by season.”
On Tuesday, we spoke by phone with Ashwill, who blogs regularly on the Negro Leagues at http://agatetype.typepad.com, about his research and the launch of the Negro Leagues Database. Here are some excerpts from our conversation:
SABR.org: What inspired you to get into Negro Leagues research?
Gary Ashwill: It goes back to when I was a kid and my mom bought me Robert Peterson's Only the Ball Was White. That was a great book; I was fascinated by it. And as a kid, I always loved stats and the Baseball Encyclopedia. But I think I was most fascinated by those old editions where they had a special section on the National Association in the 1870s, but they didn't have all the categories. There were a lot of blank spaces, and it seemed kind of mysterious to me. I always wanted to fill out those blank spaces, and the Negro Leagues is kind of a version of that. … And then in the 1980s, (SABR members) John Holway and Dick Clark and Larry Lester published a couple of seasons of stats in the Baseball Research Journal. And they put some stats in the back of the Baseball Encyclopedia, but it was all kind of partial. For various reasons, that project didn't keep going and I was left hanging by it. So it's kind of like: You write the book that you want to read. That's kind of how it happened for me.
SABR: How long have you been working on this database?
GA: I've personally been working on the research for 10 years, so it's been a while for me. But just getting it ready for the website, it's been a little more than a month. I think Kevin (Johnson) first wrote me at the end of July and suggested it. It's been remarkably quick, so I'm really pleased with that. I thought I knew these stats inside and out, but it's pretty neat to look at it in this different form … I'd never run any advanced stats on these guys, like Win Shares and stuff like that. So that's been kind of a revelation for me. I think it's in 1916, there was a pitcher named Juan Padrón who turned out to be the most valuable player in the (pre-)Negro Leagues that year. I had no idea.
SABR: Are the numbers stacking up the way you thought they would?
GA: I think in general, it's about what you'd expect. Which is a good thing, because you don't want it to be completely chaotic and unreliable. You look at, for example, Jimmie Lyons was reputed to be the fastest baserunner before Cool Papa Bell in the 1920s, and you look at the leaders of stolen bases and there's Jimmie Lyons at the top (for those years). … Oscar Charleston, Cristóbal Torriente, Dick Redding, Joe Williams, those guys are all at the top of the (leaderboards). But sometimes you get these little unexpected things, and that's what makes it cool.
SABR: What sources are you drawing from?
GA: It's all original research. Everything I do comes from contemporary newspapers. That's the reason it takes a long time to do this. You know, The Sporting News didn't publish any Negro League stats. At the time, most black newspapers were weekly papers and some were distributed nationally. So they tried to be pretty comprehensive in their coverage. That's a good place to start. So I'll get box scores for maybe half the games, and then I'll have to go to the individual cities — Detroit, Kansas City, Baltimore — and dig up these newspapers for the rest. The main difficulty is the time it takes to track them down.
SABR: Did any of your statistics come from the massive print sources such as The Negro Leagues Book or The Biographical Encyclopedia of the Negro Baseball Leagues?
GA: No, all of our numbers come from box scores that we've got. And the same thing with the biographical data. … We're lucky to live in an era where a lot of databases get digitized, so it makes a lot of kinds of research much easier, such as inspecting Census records and (military) draft cards, etc. I always try to work from contemporary sources, so a lot of my stuff may seem revisionist from (previous) sources, such as name spellings. But I'm trying to gather the best available evidence.
SABR: Negro Leagues players are known to have bounced around a lot, for segregation reasons and otherwise. How are you keeping track of them for the database?
GA: Well, we're working in two veins right now. We're doing the Negro Leagues, for the stuff in the U.S. in the summer. But we're also doing the Cuban Winter League, which a lot of major and minor leaguers went down to play in. So there was some interracial play going on. It's a useful place to see how good these Negro Leaguers were. … We should be able to add some major Cuban (League) stuff to the database fairly soon, and you can get a pretty good idea of how good Pete Hill and Joe Williams and some of the early players were. Scott Simkus is working on the 1930s stuff right now, so I'm interested in seeing how that looks.
SABR: How difficult is it to evaluate advanced metrics for these players?
GA: I think you kind of have to be sabermetrically aware to make sense of all this stuff. You know, schedules are really unbalanced, some teams will play 80 games against what we consider to be the top black teams, some will play 10 … the parks were different … and there were a lot of complex, historical reasons for all that. … In the end, it's always going to be kind of arbitrary and sometimes you have to just draw the line where it makes the most sense. There's not always going to be a right answer or a wrong answer.
The number one thing you have to look at is the sample size: the number of games they play. So you have to do the best you can in terms of interpreting the material, and be aware that it's going to be a little fuzzy. I tend to think it's best to look at groups of seasons; don't just focus on one season to see how good a player was. Look at how he did from 1918 to 1922, and compare how he was with everybody else. You just have to be aware that you have to do a little more work to (properly evaluate). The players in the early Eastern leagues, and in some cases they didn't even have organized leagues, didn't really play as many games against each other, and so guys like Louis Santop and Spott Poles … won't have as many Win Shares as the Midwestern players. You have to keep those factors in mind.
SABR: How can other researchers get involved and help build this database?
GA: We always welcome any kind of help, especially in terms of really difficult-to-access newspapers. The toughest city for us has been Atlantic City, and it's really vital for Negro League history. There were a lot of important games played there in the 1900s and early 1910s. I've been told that paper is only on microfiche, not microfilm, and it's only at one library and they refuse to loan it out. So you have to go to Atlantic City to find that information. … Another thing is that I realized the website was going to need photos. My library's pretty good, so I was able to find more than 300 players. But some of the quality's not that great and there's still a lot of players and teams missing. I'm always interested in photos, especially high-resolution photos, that anyone can contribute.
Gary can be reached by e-mail at firstname.lastname@example.org.
To browse the Negro Leagues Database, visit http://www.seamheads.com/NegroLgs.
— Jacob Pomrenke
This page was last updated September 14, 2011 at 10:57 am MST.