Python application for generating pseudo-random data
chris1610/barnum-proj
What is Barnum?
===============

Barnum is a Python-based application for quickly and easily creating pseudo-random data, typically used for application testing.

Why did you create Barnum?
==========================

I am developing a shopping cart application in Django and realized that I needed a bunch of data to simulate the store's behavior under somewhat normal production usage. I got tired of always trying to think up names and addresses for customers, so I decided to automate the process a little. Thus Barnum was born.

Why is Barnum unique?
=====================

I was able to find some online systems for generating large amounts of test data, but I could not find any application that had the breadth of data generation capabilities or the ability to interface easily with Django in the way I wanted.

One of the most distinctive aspects of Barnum is that the data is what I'll call "plausible." For example, here is an "identity" randomly generated by Barnum:

    Sid Seymour
    10 Kimbrough Grove Drive
    Arthur ND, 58006
    (701)642-6471
    Who works at: Network Hardware Co as a Personnel Clerk Senior

You should notice a couple of things about this data:

- The first and last names are realistic.
- The street name is also plausible.
- Arthur, ND is a real city, and 58006 is its zip code.
- 701 is an area code used for North Dakota.
- The fictional company name is reasonably believable.
- The job title also makes sense.

Why not just use Random to create strings of letters?
=====================================================

Well, I find that when testing applications, if the data is just a random string of numbers or letters, it gets hard to tell whether something is out of place or "looks wrong." If you just want to generate totally random information, then you probably don't need Barnum!

What type of information does Barnum generate?
==============================================

Here's a list of the types of dummy data Barnum can create:

- First name and/or last name for either gender
- Job title
- Phone number
- Street number and name
- Zip code plus city & state
- Company name
- Credit card number and type (with valid checksum)
- Dates
- Email addresses
- Sample password
- Words (Latin)
- Sentences and/or paragraphs of random Latin words

How do I use it?
================

The gen_data.py script is the primary showcase for how to create random data using Barnum. If you run it from the command line:

    python gen_data.py

you'll see some sample data output. If you'd like to call it from another script, here are a few examples from the interpreter:

    Python 2.4.2 (#1, Feb 9 2006, 05:29:30)
    [GCC 3.4.4 (Gentoo 3.4.4-r1, ssp-3.4.4-1.0, pie-8.7.8)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import barnum.gen_data as gen_data
    >>> gen_data.create_name()
    ('Danilo', 'Rendon')
    >>> gen_data.create_name()
    ('Melodie', 'Kraft')
    >>> gen_data.create_name()
    ('Laverne', 'Hopson')
    >>> gen_data.create_city_state_zip()
    ('36475', 'Repton', 'AL')
    >>> gen_data.create_city_state_zip()
    ('01090', 'West Springfield', 'MA')
    >>> gen_data.create_phone()
    '(907)339-3308'
    >>> gen_data.create_phone('38138')
    '(901)606-5635'
    >>> gen_data.create_sentence()
    'Delenitaugue iriure zzril euismod dolore vulputate iriuredolor iriure eu.'
    >>> gen_data.create_sentence()
    'Consequatvel in blandit praesent veniam in ex illum vulputate feugait molestie.'
    >>> gen_data.cc_number()
    ('visa', ['4532837148746906'])
    >>> gen_data.cc_number()
    ('mastercard', ['5417967544412568'])

You can see that it should be trivial to incorporate this data into any Python script. The possibilities for creating CSVs, raw SQL, Python objects, etc. are practically endless!

Where does the data come from?
==============================

I pulled sample data and existing scripts from a bunch of different sources:
- The names are from 1990 US Census data: https://www.census.gov/genealogy/names/names_files.html
- The street names are from real US streets in a few locales.
- Company names are randomly generated by me.
- Job titles were taken from another census site that I can't seem to find now.
- Zip codes are from https://www.cfdynamics.com/cfdynamics/zipbase/index.cfm
- Random Latin text came from https://www.4guysfromrolla.com/webtech/052800-1.shtml
- The credit card generator is from Graham King: https://www.darkcoding.net/index.php/credit-card-numbers/
- The password generator is from Pradeep Kishore Gowda via the Python Cookbook.

How can I add more data?
========================

If all you'd like to do is add some more seed data to an existing source, edit the appropriate file in the source-data directory and run the convert_data.py script to create a new pickle file.

How can I contribute?
=====================

Just ask. I can't foresee this script needing its own mailing list, so for now, use the ticket system on Google Code to submit a ticket with your suggestion or patch.

Why is this so US focused?
==========================

I needed info for the US only. I had access to this data and knew what I wanted. If you would like to add other countries or info, feel free to contribute!

Can this be used for evil?
==========================

Ummm. Probably not. All of the data is random. The credit card numbers conform to the Luhn (mod 10) checksum formula but are not necessarily valid account numbers. Even if they were, you would need to know the real name, address, and phone number before you could do anything illegal with the data. I think we're all pretty safe.

Where did this name come from?
==============================

Choosing names for projects is kind of fun but kind of a hassle. There needs to be a name, but it can't be anything too stupid. I started off thinking of an acronym, ended up with PT ("Python Testing"), and immediately thought of P.T. Barnum.
I really liked the name because I was using this for Satchmo, a project made in Django, and single-word names seemed cool. I also like the fact that P.T. Barnum was a master at making people believe something was real when it wasn't, which is exactly what this little script does.

Why is it licensed under the GPL?
=================================

I use a couple of other Python scripts that were licensed under the GPL, so I figured it was best to release Barnum under the GPL as well. If you would like another license arrangement, let me know and I'll see if there's something we can do.
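To illustrate the Luhn (mod 10) check mentioned under "Can this be used for evil?", here is a minimal Python sketch. It is not Barnum's own credit card generator (that code came from Graham King); the function names `luhn_sum`, `luhn_is_valid`, `luhn_check_digit`, and `fake_card_number` are hypothetical, and the default "4" prefix is assumed only because the Visa sample in the interpreter session starts with 4.

```python
import random


def luhn_sum(number: str) -> int:
    """Luhn total of a digit string (hypothetical helper, not Barnum's API)."""
    total = 0
    # Walk the digits right to left, doubling every second one.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total


def luhn_is_valid(number: str) -> bool:
    """A number passes the Luhn check when its total is a multiple of 10."""
    return luhn_sum(number) % 10 == 0


def luhn_check_digit(partial: str) -> str:
    """Digit that makes partial + digit Luhn-valid.

    Appending "0" puts the partial digits at the parity they will have once
    the real check digit is added; the check digit itself is never doubled.
    """
    return str((10 - luhn_sum(partial + "0") % 10) % 10)


def fake_card_number(prefix: str = "4", length: int = 16) -> str:
    """Hypothetical generator: random digits after prefix, then a check digit."""
    body_length = length - len(prefix) - 1
    body = prefix + "".join(random.choice("0123456789") for _ in range(body_length))
    return body + luhn_check_digit(body)


# Both numbers from the interpreter session pass the check.
print(luhn_is_valid("4532837148746906"))  # True
print(luhn_is_valid("5417967544412568"))  # True
print(fake_card_number())
```

Passing the check only means the digits are internally consistent; as the README says, it does not make a number a valid account.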