Stat Archive FAQ
- Southpaw Slim
- Kenesaw Mountain Landis
- Posts: 610
- Joined: Sun Jan 30, 2005 7:36 pm
- Location: Oakland, CA
Stat Archive FAQ
OK. I think I finally have this PHP thing figured out well enough to start hand-coding the archive. But before I get too involved in it, it's important that I find out what it is you all want to know about it.
For instance, what stats do you want to be able to see? So far I intend to deliver the basics like AB, H, R, HR, RBI, etc. Basically any stat we keep on our scoresheets. But there are also the complex stats like AVG, APS/OPS, SLG, etc. These are the reason for having the database to begin with. It saves Peter a lot of grief and time he doesn't really have to spend on figuring out our averages.
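As a rough illustration of how the derived stats could come out of the counting stats, here is a minimal Python sketch (the archive itself is planned in PHP, so this only shows the arithmetic). The function names are hypothetical, and the sample line of 25 hits in 38 at-bats with 3 doubles is the May line that comes up later in this thread. OPS also needs an on-base figure, which depends on plate-appearance data, so it is left out here.

```python
def batting_average(hits, at_bats):
    """AVG: hits divided by at-bats (0.0 when there are no at-bats)."""
    return hits / at_bats if at_bats else 0.0


def slugging(hits, doubles, triples, home_runs, at_bats):
    """SLG: total bases divided by at-bats."""
    # Singles are whatever hits are left after the extra-base hits.
    singles = hits - doubles - triples - home_runs
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats if at_bats else 0.0


avg = batting_average(25, 38)
slg = slugging(25, 3, 0, 0, 38)
print(f"AVG {avg:.3f}  SLG {slg:.3f}")
```

Only the counting stats would need to be stored; the averages can be recomputed whenever a page asks for them.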
Furthermore, what special things do you want from the site? I for one would like to see personal player stat pages. By this I mean we'd each have a page with our picture, name, possibly a desired jersey number, hometown, career stats, and other personal info you'd like to share. From there you'd be able to choose to see individual game stats in descending order from the most recent game all the way back to the first one you played. This is how I'd picture the site to be at its best.
However, first things first. In order to be able to sort our stats, we need to have stats in the database to begin with. That's why my first priority at this point is the input page. Since Scott's the one who has all the scoresheets, it'll be tailored to his liking (provided I can write the code to do what he needs). The one wrench that's been holding the whole project at a standstill is the inning-specific nature of our current stat-keeping. I think I've finally overcome that hurdle, though, so everything is going to get rolling again.
So please let me know what it is you'd like the site to do for you. I plan to model it loosely on the MLB and Yahoo! Sports sites; however, I may come up with something completely original. It all depends on what you want.
BTW: With this recent PHP epiphany, the site is currently scheduled to premiere sometime between late June and early August.
I intended to write something to remind everybody of my superior prowess.
- Baseball=Life
- Baseball Deity
- Posts: 1031
- Joined: Sat Jan 29, 2005 11:16 pm
- Location: SF, CA
Hey Nick, this is going to be stat heaven when you are done.
I'm thinking basically what yahoo has in their free fantasy leagues. Hahahah we could even have our friends from all over the country pay us for live stattracker of our games to their PCs.
Just kidding. But really, here's what I'm talking about:
1. boxscores
2. overall (all players total) stats, averages per game
3. individual player stats
* game log
* ab, h, r, 2b, 3b, hr, rbi, k, wg, games played, win/loss record
* splits, like defremery vs curt flood vs raimondi vs overall
* splits, like 1st game vs 2nd game (of doubleheaders)
* splits, like spring/summer season vs fall/winter season
* splits, like march vs april vs may
* splits, batting 1st vs batting 2nd
4. player info: picture/avatar, hometown, # of posts & status (ie "all star"), current leader status (ie "currently most hits" or "currently 2nd in batting average") for a few key stats, player's career or single-game records currently held except when negative like most K
5. Sortable player total stats (ie just click to sort by X stat column header)
6. Leader Boards, like current "record book" but up to 1-5 (ie top 5 batting avg, or top 5 RBI)
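The splits requested in the list above boil down to grouping the same game lines by a different column. A minimal sketch, assuming a hypothetical `games` table with one row per player per game; Python's built-in `sqlite3` stands in for whatever database the archive ends up using, and the stat lines are made up for illustration (only the field names come from this thread).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout: one row per player per game, tagged with the field played on.
conn.execute("CREATE TABLE games (player TEXT, field TEXT, ab INTEGER, h INTEGER)")
conn.executemany(
    "INSERT INTO games VALUES (?, ?, ?, ?)",
    [
        ("Scott", "DeFremery", 5, 3),   # made-up stat lines
        ("Scott", "DeFremery", 6, 3),
        ("Scott", "Curt Flood", 7, 5),
        ("Scott", "Raimondi", 4, 1),
    ],
)

# GROUP BY field yields one (hits, at-bats) total per field: the "field split".
splits = {
    field: (hits, at_bats)
    for field, hits, at_bats in conn.execute(
        "SELECT field, SUM(h), SUM(ab) FROM games WHERE player = ? GROUP BY field",
        ("Scott",),
    )
}
overall = (sum(h for h, _ in splits.values()), sum(ab for _, ab in splits.values()))

for field, (hits, at_bats) in sorted(splits.items()):
    print(f"{field}: {hits}/{at_bats}")
print(f"Overall: {overall[0]}/{overall[1]}")
```

Month, doubleheader-game, and batting-order splits would follow the same pattern with a different grouping column, provided that column is recorded when the game is entered.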
"Baseball is like church, many attend, few understand"
- Leo Durocher
- Baseball=Life
- Baseball Deity
- Posts: 1031
- Joined: Sat Jan 29, 2005 11:16 pm
- Location: SF, CA
Can from here on out.
With the advent of the new plate appearance tracking scoresheet, we now can reliably track OPS.
"Baseball is like church, many attend, few understand"
- Leo Durocher
- Southpaw Slim
- Kenesaw Mountain Landis
- Posts: 610
- Joined: Sun Jan 30, 2005 7:36 pm
- Location: Oakland, CA
I'm excited to see a response so quickly, so I'll just go over it point by point and let everybody know where the archive stands in terms of development.
Baseball=Life wrote: here's what I'm talking about:
1. boxscores
This is the most crucial aspect of the site. Boxscores need to be entered first before anything else can be done. I plan to have them appear on the site as they would on paper... only neater. (That's the beauty of the digital era.)
Baseball=Life wrote:
2. overall (all players total) stats, averages per game
I'm not quite sure exactly what you mean by this, but I do plan to have a "League Average" stat for each category. I can also make AVG/game stats for every category which can either follow a league avg or player's avg.
Baseball=Life wrote:
3. individual player stats
* game log
This is the primary reason why I want the archive up in the first place. I want to see how my hitting holds up from game to game.
Baseball=Life wrote:
* ab, h, r, 2b, 3b, hr, rbi, k, wg, games played win/loss record
* splits, like defremery vs curt flood vs raimondi vs overall
* splits, like 1st game vs 2nd game (of doubleheaders)
* splits, like spring/summer season vs fall/winter season
* splits, like march vs april vs may
* splits, batting 1st vs batting 2nd
Thank you! This is exactly what I'm talking about. If I can find out which splits you want to know, I can add the appropriate categories to the database for sorting later. I already have some code written for field and month splits, but I haven't quite gotten the batting order splits worked out on paper... yet. I also plan to split the game logs into seasons (a la Yahoo! fantasy baseball) with the three most recent seasons showing when we have that many, and an option to view all seasons.
Baseball=Life wrote:
4. player info: picture/avatar, hometown, # of posts & status (ie "all star"), current leader status (ie "currently most hits" or "currently 2nd in batting average") for a few key stats, player's career or single-game records currently held except when negative like most K
Incorporating the forum statistics (# of posts and status) will be the most difficult, so I'll focus on those after everything else is done. However, it is important to me to get some bios up for people to view when we have a full season entered into the database. I'm also intrigued by the idea of current leader standings. It would be tough, but not impossible, to have the site figure out how close you are to the lead in specific categories. I would propose a "Top 5" limit on APS, AVG, H, RBI, R, and HR, and a "Top 3" limit on 2B, 3B, and WG. And also a "Lucky Charm" award to the players with the most ROE and winning percentage. Most excitingly, I plan to formulate an equation to figure out where people stand in the MVP race. (Scott, feel free to correspond with Franz and others to figure out the best equation. I think a balance of APS, RBI, R, and W% make the best combo.) Don't expect to see everything early on, however. It takes a while to set something like that up, so I'll more realistically have this part finished near the end of next season. When I do, I'm sure DeFStAr (also possibly OakStAr or DPDB) will be a force with which to be reckoned.
Baseball=Life wrote:
5. Sortable player total stats (ie just click to sort by X stat column header)
This is the easiest (and therefore most reliable) thing for me to do, so expect it to be first on the list of available features.
Baseball=Life wrote:
6. Leader Boards, like current "record book" but up to 1-5 (ie top 5 batting avg, or top 5 RBI)
Without a doubt, this is a must have. It was the first thing I started working on after sortable stats, and will most definitely be included in the first version of DeFStAr.
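For what it's worth, a "Top 5" or "Top 3" board is just the sortable totals cut off after N rows. A minimal Python sketch, assuming season totals are already summed per player; the `top_n` helper, the player names, and the numbers are all hypothetical.

```python
def top_n(totals, stat, n=5):
    """Return the n leaders in one stat as (player, value) pairs, highest first."""
    ranked = sorted(totals.items(), key=lambda item: item[1][stat], reverse=True)
    return [(name, line[stat]) for name, line in ranked[:n]]


# Made-up season totals keyed by player name.
totals = {
    "Player A": {"H": 25, "RBI": 13},
    "Player B": {"H": 30, "RBI": 9},
    "Player C": {"H": 18, "RBI": 21},
    "Player D": {"H": 12, "RBI": 4},
}

rbi_leaders = top_n(totals, "RBI", n=3)
hit_leaders = top_n(totals, "H", n=2)
print(rbi_leaders)
print(hit_leaders)
```

The same helper covers both limits proposed above by changing `n`; how ties should be broken is left open here.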
Thank you, Scott, for all the input you gave in this one post. I'll gladly take more requests from anybody else. Get creative. I'd really like to know what you want to see from the site. The more stats I can produce, the more excited I'll be. However, I may need to be careful not to overload the archive with scripts in order to keep it running at a decent pace. But I really don't care. Stats are worth waiting a few extra seconds for!
I intended to write something to remind everybody of my superior prowess.
- Southpaw Slim
- Kenesaw Mountain Landis
- Posts: 610
- Joined: Sun Jan 30, 2005 7:36 pm
- Location: Oakland, CA
Update
After taking a little vacation, I'm back in the code. I think it's a good thing I took some time off because there's something I've been unhappy with. Although it took almost two months to figure out how to translate boxscores so they could be entered into a database, the sheer amount of space required for each game was getting on my nerves. As I had figured it, for every inning in which any stat was recorded for a particular player (be it an AB, a ROE, or even a WG if the person had no plate appearance that inning) a record would have to be made which included statistics for 13 categories: player-specific ID, game-specific ID, team the player was on, AB, R, H, 2B, 3B, HR, RBI, K, ROE and WG.
Assuming we average 20 players at 6.5 AB per person per game (which is overshooting a little bit for caution) that adds up to 13,520 records per year! So far this year we've already had just over 1900 recorded plate appearances (including the one recorded LIVE game), and that's considering we started the year playing only twice a month (instead of twice a week.) So as it is now, if we play a double-header every Sunday we'll have approximately 9800 AB per year! This is using a 96 AB/game average without even factoring in ROE (which make up nearly 10% of all PA.) Adding the estimated 980 ROE, we have a realistic estimate of 10,000 records per year.
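The estimates in the paragraph above can be checked with a few lines of arithmetic. This is only a sketch: the 104 games per year is an assumption read off "a double-header every Sunday" (52 Sundays, two games each). With it, the cautious estimate reproduces the 13,520 figure exactly, while the 96 AB/game estimate lands near, rather than exactly on, the "approximately 9800" quoted.

```python
PLAYERS_PER_GAME = 20
AB_PER_PLAYER = 6.5            # the deliberately high figure from the post
OBSERVED_AB_PER_GAME = 96      # the average quoted from this year's games
GAMES_PER_YEAR = 52 * 2        # assumption: one doubleheader every Sunday

# One database record per at-bat under the per-inning scheme.
upper_estimate = PLAYERS_PER_GAME * AB_PER_PLAYER * GAMES_PER_YEAR
observed_estimate = OBSERVED_AB_PER_GAME * GAMES_PER_YEAR

print(int(upper_estimate), observed_estimate)
```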
Now, I don't know much about databases yet, but I'm pretty sure 10,000 records could make stat searches slow down. And that doesn't just mean looking for your specific stats over a period of time. Every time a page is loaded that contains some statistical information, whether it be leader boards, a personal info page, or even the latest boxscore, a request is sent to the database asking for the information on file. The database then searches through all the records looking for matches. These are sent back and turned into the web page you see before your eyes. In fact, if you look at the page you're viewing now, you'll see the characters "php?t=175" or "php?p=1810#1810" at the end of the web address. This tells the forum to show the topic with the index number 175 or post number 1810 out of however many there are currently on the forum.
The call to the stat archive acts in much the same way. For example, if I want to know all the stats for Scott Leathers during the month of May, I would (theoretically) just choose his name from a drop-down list and check a box entitled "May". Then I'd hit a "Submit" button which sends the request to the database for all records with a playerID that matches the specific ID allocated to Scott (let's say "3") in the month of May (which would match the date pattern 2005-05-% where "%" is a wildcard that accepts any day of the month). This request is sent to the database like so:
Code: Select all
SELECT * FROM gamedata WHERE playerID='3' AND date LIKE '2005-05-%';
The ending semicolon is to tell the database to process the request as a command and acts kind of like the Enter key.
The database then returns (SELECT) all records (*) from a table called "gamedata" for Scott (playerID='3') in the month of May, 2005 (date LIKE '2005-05-%') and this information is sent to the browser (IE, Netscape, Firefox, etc.). The browser then uses the information contained in the rest of the code from the web page to figure out what to do. In this hypothetical situation, the result would be displayed in a table game-by-game much like the following... but WAAAAY better to look at:
Code: Select all
-------------------------------------------------------------------
|                   SCOTT LEATHERS IN MAY 2005                    |
-------------------------------------------------------------------
| DATE | POS   | H/AB  |  R | 2B | 3B | HR | RBI | K | WG | NOTES |
-------------------------------------------------------------------
| 5/01 | CF    | 3/5   |  2 | -- | -- | -- |   1 | - | -- | ----- |
| 5/01 | UTL   | 3/6   |  5 | -- | -- | -- |   1 | - | -- | ----- |
| 5/15 | 2B/LF | 5/7   |  4 |  1 | -- | -- |   1 | - | -- | ----- |
| 5/22 | LF/SS | 3/4   |  2 |  1 | -- | -- |   3 | - | -- | ----- |
| 5/22 | 3B    | 4/4   |  3 | -- | -- | -- |   2 | - | -- | ----- |
| 5/29 | SS    | 2/6   |  2 | -- | -- | -- |   1 | - | -- | ----- |
| 5/29 | SS    | 5/6   |  3 |  1 | -- | -- |   4 | - | -- | ----- |
-------------------------------------------------------------------
| MAY TOTALS   | 25/38 | 21 |  3 |  0 |  0 |  13 | 0 |  0 | .658  |
-------------------------------------------------------------------
So it's a great idea to make these tables for each person to look at, but after a few games, you'll start to notice a decline in response speed. I'm trying to remedy that by compacting the records inserted into the database. I then plan to break the records down by inning after retrieving the information from the database. The speed of load time will depend on your individual computer instead of the server itself. At first, there will be no perceptible benefit to this method. In fact, it would start out being a bit slower than the per-inning record method mentioned above (you know, 10,000 records per year) but would prove to be beneficial by only requiring a maximum of 2,100 records per year (including additional pitching statistics for LIVE games.)
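To make the May lookup concrete, here is a minimal runnable sketch in Python, with the built-in `sqlite3` module standing in for the PHP and database setup planned for the archive. The cut-down `gamedata` columns are an assumption, the seven May rows copy the H/AB and RBI figures from the table above, and the two extra rows are made up to show the filter excluding them. Note that the pattern match is written with `LIKE` and `%`, which is how SQL expresses a wildcard in a comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed, cut-down layout of the "gamedata" table (one row per game here).
conn.execute(
    "CREATE TABLE gamedata (playerID INTEGER, date TEXT, AB INTEGER, H INTEGER, RBI INTEGER)"
)
rows = [
    (3, "2005-05-01", 5, 3, 1),
    (3, "2005-05-01", 6, 3, 1),
    (3, "2005-05-15", 7, 5, 1),
    (3, "2005-05-22", 4, 3, 3),
    (3, "2005-05-22", 4, 4, 2),
    (3, "2005-05-29", 6, 2, 1),
    (3, "2005-05-29", 6, 5, 4),
    (3, "2005-06-05", 5, 2, 0),  # made-up June game, should be filtered out
    (8, "2005-05-15", 6, 1, 0),  # made-up game for another player
]
conn.executemany("INSERT INTO gamedata VALUES (?, ?, ?, ?, ?)", rows)

# Placeholders keep the player ID and the month pattern out of the SQL string.
may_rows = conn.execute(
    "SELECT date, AB, H, RBI FROM gamedata "
    "WHERE playerID = ? AND date LIKE ? ORDER BY date",
    (3, "2005-05-%"),
).fetchall()

at_bats = sum(r[1] for r in may_rows)
hits = sum(r[2] for r in may_rows)
rbi = sum(r[3] for r in may_rows)
print(f"{hits}/{at_bats}, {rbi} RBI, AVG {hits / at_bats:.3f}")
```

Passing the values as parameters rather than pasting them into the query text is a design choice worth carrying over to the real input and search pages, since those values will come from a web form.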
I don't know if this means anything to anybody, or if I even managed to explain it in a way that makes sense. If you have any questions at all about what I'm talking about, post them here and I'll more than happily try to explain what it is I'm talking about. The main reason for this post was to reassure everyone that there has been a lot of thought going into the stat archive and making it the most efficient it can be so we can use it for years. Please post your comments here. I'm interested in knowing what everyone has to say about all of this.
-Nick
I intended to write something to remind everybody of my superior prowess.
- Baseball=Life
- Baseball Deity
- Posts: 1031
- Joined: Sat Jan 29, 2005 11:16 pm
- Location: SF, CA
Nick, first of all, we're paying you big bucks to get this done, so hurry the fuck up! Hahahhahaha just kidding, thanks for all the work you have done and are doing for DeFStAr.
So, I honestly need to read your post a few more times before I understand it and can actually respond, so keep checking back.
But right off the bat I want to thank you for putting my May performance as the example.... I think this will really help me in the MVP race, which we all should know is coming down to the final month of the season, this month (June). So everyone, take in Nick's post and reply about DeFStAr, but also take note of my May performance in and of itself.
Finally, thanks for bringing the thread back up so I have been reminded to get with Franz about the MVP modifiers, etc. I will certainly do that. This guy is such a baseball nerd (that says a lot coming from me) that he has developed his own player rankings system. [Derrek Lee is currently #1 in his system.]
Ok, check back for my actual response to the content of your post.
"Baseball is like church, many attend, few understand"
- Leo Durocher
- Southpaw Slim
- Kenesaw Mountain Landis
- Posts: 610
- Joined: Sun Jan 30, 2005 7:36 pm
- Location: Oakland, CA
OMG, Scott. If you can get the formula from Franz so I can put it into the archive, I would owe you Fenton's for life. (Not really, now that I think about it. But I will buy you a Zachary's and a Fenton's sundae.) And also, I had originally used Chris Adams, since his last name starts with A and is always at the top of the drop-down list. But then I noticed I wasn't doing him a favor by posting his stats in detail from such a horrible month. So I figure I can put up with a little more boasting from you about how "great" you are and all that bullshit. But seriously, .658 is fucking incredible. Way to go. And notice you also picked up 13 RBI. That's huge, man. Keep it up, and I can stop worrying about Paul getting to me and focus on staying ahead of you instead.
BTW, thanks for the encouragement, Pat. I need to know my efforts are appreciated sometimes so I don't bash my head through the monitor when I hit a roadblock.
I intended to write something to remind everybody of my superior prowess.
Hey Nick,
It sounds like you're doing some great work on this. I am/was a programmer and have worked with databases quite a bit, and I would be happy to offer you assistance if you'd like, although you seem to be basically on the right track. Here are a few observations and suggestions; excuse me if they're obvious to you already:
1. 10,000 records is really nothing for a database running on any reasonably powered server. Start worrying about querying speed when the record count gets into the millions. (It is a real database such as MySQL, right?)
2. Store only the minimum data needed and no redundant data in each record. (All stats will be calculated in code on the server.) I.e. don't have a field for "hit" and another field for "1B". This is a redundancy which will cause confusion if the data is entered inconsistently. A hit has to be either a "1B", "2B", "3B" or "HR". If all 4 of these fields are empty then check your "ROE" or "FC", etc. fields to find out which non-hit type the at bat was. (Again, I'd be happy to help you with the database design which is the most important part--the foundation--of a project like this.)
Well, I could go on about database design. For instance, it should be a relational database. But I won't go there unless you ask. Again, excuse me if I've stated the obvious.
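To illustrate the "no redundant data" point, here is a minimal sketch of a relational layout, using Python's built-in `sqlite3` in place of MySQL. All table and column names are hypothetical. There is deliberately no "hits" column: hits are derived from the four hit types whenever they are needed. The sample line (5 for 7 with one double) matches the 5/15 game in the May table earlier in the thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each fact lives in exactly one place; batting lines point at players and games.
    CREATE TABLE players (player_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE games   (game_id   INTEGER PRIMARY KEY, played_on TEXT NOT NULL);
    CREATE TABLE batting_lines (
        player_id INTEGER NOT NULL REFERENCES players(player_id),
        game_id   INTEGER NOT NULL REFERENCES games(game_id),
        ab        INTEGER NOT NULL,
        singles   INTEGER NOT NULL,
        doubles   INTEGER NOT NULL,
        triples   INTEGER NOT NULL,
        home_runs INTEGER NOT NULL,
        PRIMARY KEY (player_id, game_id)
    );
    """
)
conn.execute("INSERT INTO players VALUES (3, 'Scott Leathers')")
conn.execute("INSERT INTO games VALUES (1, '2005-05-15')")
conn.execute("INSERT INTO batting_lines VALUES (3, 1, 7, 4, 1, 0, 0)")

# Hits are computed at query time, so they can never disagree with the hit types.
row = conn.execute(
    """
    SELECT p.name, b.singles + b.doubles + b.triples + b.home_runs AS hits, b.ab
    FROM batting_lines AS b
    JOIN players AS p USING (player_id)
    """
).fetchone()
print(row)
```

Whether the lines are stored per game, per inning, or per plate appearance is a separate decision; the point of the sketch is only that derived totals are calculated rather than stored.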
This is going to be fantastic. (Although don't display my stats too prominently...yet!!)
Baseball is 90% mental, the other half is physical --Yogi Berra
You should take Dave up on the database design offer... that stuff is complex.
"forum statistics (# of posts and status) will be the most difficult"
That stuff is all in a mySQL table on this site... we can probably find out how to link to it pretty easily (we just have to assign the right user to the right person).
"forum statistics (# of posts and status) will be the most difficult"
That stuff is all in a mySQL table on this site... we can probably find out how to link to it pretty easily (we just have to assign the right user to the right person).