
    Generate dummy data with Faker with Python

    A guide on how to use Faker package in Python to populate a dummy dataset.

    fassil-yehuala
    ·7 min read
    What is dummy data?

    Dummy data is fictitious information that is generated or used to simulate real data in contexts such as testing, development, and training. It is designed to mimic the characteristics and structure of actual data without containing any meaningful or sensitive information.

    Why use dummy data?

    Fictitious data is needed for a variety of purposes. Whether for testing, anonymising sensitive data, or adding “noise” to a training dataset, it helps to have a fake dataset in the same shape as the real data. You may also need dummy data to check what you have developed and to see how your code reacts to different types of input.

    However, finding data in exactly the format you need can be difficult. So, where do you get dummy data for your own application? The Faker package offers an elegant solution to this problem. Faker is an open-source Python library designed to generate many types of synthetic data according to your needs.

    How to populate a database with dummy data?

    In this article, we’ll take a quick tour of the Faker package in Python and see how to use it to create a dummy dataset.

    The Faker library in Python is a popular tool for generating fake data for a variety of uses, such as testing, development, and training machine learning models. It allows users to create dummy data that mimics real-world data in a flexible and customizable manner. Faker can generate data in various formats, including names, addresses, dates, text, and more.


    How to use the Faker library in Python

    Installation and Use

    Faker can generate random data in dozens of languages. Because it is an open-source, community-driven library, it is constantly evolving: providers (generators specific to a certain type of data) are added regularly by the community. Let’s look at how to use it in code.

    The installation can be done via pip with the command:

    pip install Faker

    The following two lines of code initialise Faker. The first line imports the generator class Faker, and the second creates a generator with the default locale, US English. To initialise Faker for another language, pass the locale as an argument (e.g. Faker("de_DE") for German).

    from faker import Faker

    fake = Faker()

    Generating Fakes

    Now you are ready to generate whatever data you want. Each generated value is called a fake. As the name suggests, it is randomly generated data whose purpose is to act as a substitute or placeholder for the actual data. A fake is generated when the method corresponding to the data type is called.

    The name() method can be used to create a full name. Let’s jump into the code and check how these methods work.

    for i in range(5):  # Prints five full names

        print(fake.name())

    >>>Samantha Fernandez

    >>>Denise Barnes

    >>>Jason Strong

    >>>Edward Burton

    >>>Tonya Rocha

    However, if you want only the first or last name instead, you can use the first_name() and last_name() methods.

    fake.first_name() # Returns a first name

    >>>Samuel

    Note that each call to these methods generates a new random name.

    fake.last_name() # Returns last name

    >>>Espinoza

    To create addresses, you can use the address() method.

    fake.address() # Returns an address

    >>>3066 Mary Hills Suite 873

    >>>Lake Stevenport, NV 32423

    Moreover, the fake.sentence() method returns a string containing a random sentence, whereas fake.text() returns a randomly generated paragraph of text.

    fake.sentence() # Returns a random sentence

    >>>Never across staff attention within.

    As can be seen below, fake.text() generates a random paragraph.

    fake.text() # Returns a random text

    >>>From send bed. Could country reveal send role. Guy involve issue picture get election. Sure do memory kitchen candidate fish defense. Try paper forward to build gas human.

    Let’s say you want to generate a list of 5 email addresses. Each time it runs, the code below generates 5 random email addresses.

    for i in range(5):  # Prints 5 random emails

        print(fake.email())

    >>>garciaeric@example.com

    >>>logan01@example.net

    >>>contrerasaustin@example.org

    >>>rpreston@example.org

    >>>brandy16@example.net

    But as the dataset grows, there is a chance that you will get the same email address more than once. To create unique dummy data with the Faker package, you can use the .unique property of the generator.

    for i in range(5):  # Prints 5 unique random emails

        print(fake.unique.email())

    >>>hughesbrian@example.org

    >>>raymondchapman@example.org

    >>>vicki25@example.com

    >>>munozzachary@example.net

    >>>karen44@example.org

    Each time the above code runs, it generates 5 unique email addresses. This is quite helpful when you are generating data such as IDs, which must not be repeated.

    Faker also has a method to generate a dummy profile.

    fake.profile() #Returns a fake profile

    >>>{'address': '64992 Becky Stream Apt. 932\nRebeccaville, WV 34184',

    >>>'birthdate': datetime.date(2000, 3, 24),

    >>>'blood_group': 'O-',

    >>>'company': 'Lopez and Sons',

    >>>'current_location': (Decimal('78.061493'), Decimal('-114.798399')),

    >>>'job': 'Pharmacologist',

    >>>'mail': 'rebeccahansen@yahoo.com',

    >>>'name': 'Autumn Sanchez',

    >>>'residence': '8702 Matthew Circles Apt. 938\nDickersonfurt, WA 82226',

    >>>'sex': 'F',

    >>>'ssn': '534-29-2074',

    >>>'username': 'llowe',

    >>>'website': ['http://hawkins.com/', 'https://wolf.com/']}

    So far we have used Faker generator methods like name(), first_name(), last_name(), email(), etc. Many more such methods are packaged in ‘Providers’. Some are standard providers, while others are developed by the community.

    Standard Providers

    There are many standard providers, such as address, currency, credit_card, date_time, internet, geo, person, profile, and bank, that help create the relevant dummy data. The full list of standard providers and their methods can be found in the Faker documentation.

    Let’s have a look at some examples from faker.providers.address

    for i in range(5):  # Prints 5 country names

        print(fake.country())

    >>>Luxembourg

    >>>Vietnam

    >>>Tonga

    >>>Mozambique

    >>>Austria

    You can also get country codes.

    for i in range(5):  # Prints 5 country codes

        print(fake.country_code())

    >>>ES

    >>>RO

    >>>MH

    >>>MR

    >>>CL

    As stated before, the default language is English and the default country is set to be the United States.

    fake.current_country() #Returns current country

    >>>United States

    When the locale is changed the output of current_country(), current_country_code(), address(), etc will be changed as follows:

    fake = Faker("de_DE")

    fake.current_country_code() #Returns current country code

    >>>DE

    Community Providers

    There are many community providers, such as Credit Score, Air Travel, Vehicle, and Music. You can also create your own provider and add it to your Faker instance. The full list of community providers and their methods can be found in the Faker documentation.

    Let’s have a look at some examples from faker_music. Before you start generating fake music data using this community provider, you need to install the package using pip.

    pip install faker_music

    And then you need to add the provider to your Faker instance:

    from faker_music import MusicProvider

    fake = Faker()

    fake.add_provider(MusicProvider)

    Now you are all set to generate fake music data:

    for i in range(5):  # Prints 5 music genres

        print(fake.music_genre())

    >>>Rock

    >>>World

    >>>Classical

    >>>Pop

    >>>Vocal

    Localised Providers

    You can create localised dummy data by passing the required locale as an argument to the Faker generator. It also supports multiple locales. In that case, all locales must be provided as a Python list, as in the example shown below.

    fake = Faker(['de_DE', 'fr_FR', 'ja_JP'])

    for _ in range(10):

        print(fake.name())

    >>>山本 陽子

    >>>Lina Weinhold

    >>>Dorothee Huhn

    >>>Anika Henck-Hörle

    >>>Ilonka Drubin MBA.

    >>>Philomena Rohleder

    >>>高橋 裕太

    >>>Jacques Dumont Le Perrin

    >>>斎藤 治

    >>>小林 淳

    The default locale is ‘en_US’, i.e. US English. Let’s write code to create 3 addresses in Germany.

    fake = Faker("de_DE")  # Generator for German addresses

    for i in range(3):

        print(fake.address())

    >>>Rafael-Mende-Platz 04

    >>>04196 Steinfurt

    >>>Resi-Atzler-Allee 843

    >>>96746 Coburg

    >>>Scheibeplatz 5/1

    >>>52115 Stollberg

    fake = Faker("de_DE")  # Generator for German federal states

    for i in range(5):

        print(fake.administrative_unit())

    >>>Bremen

    >>>Hessen

    >>>Rheinland-Pfalz

    >>>Nordrhein-Westfalen

    >>>Bayern

    Generating a Dummy Dataset

    We will create a fictitious dataset of 100 people with attributes such as id, name, email, address, date of birth, country of birth, etc. We will use methods from the standard providers to create this data and store it in a pandas DataFrame.

    # Import packages
    from faker import Faker
    from faker_music import MusicProvider
    import pandas as pd

    # Create the Faker generator
    fake = Faker()

    # Register the community music provider (not used for the columns below)
    fake.add_provider(MusicProvider)

    # Generate the requested number of fake records as a dictionary keyed by row index
    def generate_dummy_data(records):
        data = {}
        for i in range(records):
            data[i] = {}
            data[i]["id"] = fake.unique.random_number(8)
            data[i]["name"] = fake.name()
            data[i]["email_address"] = fake.unique.email()
            data[i]["address"] = fake.address()
            data[i]["date_of_birth"] = fake.date_between("-67y", "-18y")
            data[i]["country_of_birth"] = fake.country()
            data[i]["member_since"] = fake.date_time_between("-2y", "now")
        return data

    # Call the function to generate 100 fake records
    fake_data = generate_dummy_data(100)

    # Convert the dictionary to a DataFrame and transpose it so each record becomes a row
    fake_data = pd.DataFrame(fake_data)
    fake_data = fake_data.T

    fake_data


    Conclusion

    Faker is a Python library for generating fake data, and it is practical in many situations. There are several alternatives, but Faker remains the best-known option in Python. It is popular because it makes it easy to create fake records that look real. With a simple loop, you can generate a large amount of dummy data in seconds.

    I hope you enjoyed this article. If you have any questions leave a comment below.
