FB Messenger Image Scraper

Want to bulk download high-resolution images from Facebook Messenger? Read on, or scroll straight to the code. This simple image scraper for Messenger is available to all.

The Story

Well, it’s been quite a long time since my last blog post. Life has been busy and full of ups and downs. But I digress; that’s most likely not why you stumbled upon this post.

So I’ll cut right to the chase and get to the story.

A few weeks ago, an old friend reached out and asked if I knew how to download images from Facebook Messenger. He and his girlfriend had recently attended a wedding ceremony. That evening, she had exchanged hundreds of photos with him. A few days later, he ran into this very problem.

At the time of writing, Messenger does not make it easy to bulk download the high-resolution images shared in a chat.

I was interested. Having not done any Python programming in a while, I figured I’d look into it, once I had some time. Several weeks later, and here we are.

The (brief) Research

Given that I wanted to do this in Python, I needed to find a Python API that allowed me to easily interact with Messenger.

A quick search yielded a few results, but I was quickly seduced by fbchat.

The Implementation

After spending a bit of time re-configuring my Python environment (Python 3.7, pip, PyCharm, venv, etc.) and creating a new repository on GitHub, I got to work.

Thankfully, Python is quite readable, so I encourage you to simply read through the following code if you want to understand how it works. Building the image scraper script was, overall, pretty easy.

GitHub (most up-to-date): https://github.com/cbdelavenne/fb-messenger-media-scraper

import os
import requests
import time
import uuid
import configparser
import datetime

from fbchat import Client, ImageAttachment
from fbchat import FBchatException
from pathlib import Path

politeness_index = 0.5  # ;)
epoch = datetime.datetime(1970, 1, 1)


def download_file_from_url(url, target_path):
    """
    Download image from a given URL to a specified target path.

    :param url: URL of file to download
    :param target_path: Local target path to save the file
    :type url: str
    :type target_path: str
    """
    if url is not None:
        r = requests.get(url)
        with open(target_path, 'wb') as f:
            print('\tDownloading image to {path}'.format(path=target_path))
            f.write(r.content)


def convert_date_to_unix_ms(date, as_int=True):
    """
    Convert a given date string to epoch (int in milliseconds)

    :param date: Date string (preferred format %Y-%m-%d)
    :param as_int: Return unix timestamp as an integer value, instead of a float
    :type date: str
    :type as_int: bool
    :return: int
    """
    try:
        dt = datetime.datetime.strptime(date, '%Y-%m-%d')
        res = ((dt - epoch).total_seconds() * 1000)  # convert to milliseconds

        return int(res) if as_int else res
    except ValueError:
        return None


if __name__ == '__main__':
    config_path = Path('.') / 'config.ini'
    if not config_path.exists():
        raise Exception("Please create config.ini under this script's current directory")

    # Load config file
    config = configparser.ConfigParser()
    config.read(config_path)

    download_path = config.get('Download', 'path')
    if not os.path.exists(download_path):
        raise Exception("The path specified in download_path does not exist ({path}). Please specify a valid path in "
                        "config.ini".format(path=download_path))

    # Initialize FB Client
    fb_email = config.get('Credentials', 'email')
    fb_pw = config.get('Credentials', 'password')
    fb_client = Client(fb_email, fb_pw)

    # Search for latest threads
    thread_search_limit = int(config.get('Threads', 'search_limit'))
    thread_search_before = convert_date_to_unix_ms(config.get('Threads', 'before_date'))

    if thread_search_before is not None:
        threads = fb_client.fetchThreadList(limit=thread_search_limit, before=thread_search_before)
    else:
        threads = fb_client.fetchThreadList(limit=thread_search_limit)

    # Find correct thread for given user URL
    my_thread = None
    for thread in threads:
        if thread.url == config.get('Friend', 'url'):
            my_thread = thread
            break  # stop at the first matching thread

    # Get Messages for my_thread
    if my_thread is not None:
        message_search_limit = int(config.get('Messages', 'search_limit'))
        message_search_before = convert_date_to_unix_ms(config.get('Messages', 'before_date'))

        if message_search_before is not None:
            messages = fb_client.fetchThreadMessages(my_thread.uid, limit=message_search_limit,
                                                     before=message_search_before)
        else:
            messages = fb_client.fetchThreadMessages(my_thread.uid, limit=message_search_limit)

        # Extract Image attachments' full-sized image signed URLs (along with their original file extension)
        full_images = []

        sender_id = None
        if config.getboolean('Media', 'sender_only'):
            sender_id = my_thread.uid

        for message in messages:
            if len(message.attachments) > 0:
                if (sender_id is None) or (sender_id == message.author):
                    for attachment in message.attachments:
                        if isinstance(attachment, ImageAttachment):
                            try:
                                full_images.append({
                                    'extension': attachment.original_extension,
                                    'full_url': fb_client.fetchImageUrl(attachment.uid)
                                })
                            except FBchatException:
                                pass  # ignore errors

        # Download Full Images
        if len(full_images) > 0:
            images_count = len(full_images)

            print('Attempting to download {count} images...................\n'.format(count=images_count))

            for full_image in full_images:
                friend_name = my_thread.name.lower().replace(' ', '_')
                file_uid = str(uuid.uuid4())
                file_ext = full_image['extension']
                img_url = full_image['full_url']

                # Build the path with os.path.join so it is not tied to Windows separators
                file_name = 'fb-image-{name}-{uid}.{ext}'.format(name=friend_name, uid=file_uid, ext=file_ext)
                image_path = os.path.join(download_path, file_name)

                download_file_from_url(img_url, image_path)

                # Sleep half a second between file downloads to avoid getting flagged as a bot
                time.sleep(politeness_index)
        else:
            print('No images to download in the last {count} messages'.format(count=message_search_limit))
    else:
        print('Thread not found for URL provided')
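
The script expects a config.ini next to it. Judging only from the config.get calls above, a file with the following sections and keys should satisfy it. This is a minimal sketch, not the repository's official template: every value is a placeholder, and build_sample_config and render_config are hypothetical helper names.

```python
import configparser
import io


def build_sample_config():
    """Return a ConfigParser holding every section/key the script reads.

    All values are placeholders (assumptions), not real settings.
    """
    config = configparser.ConfigParser()
    config['Credentials'] = {'email': 'you@example.com', 'password': 'your-password'}
    config['Download'] = {'path': '/path/to/existing/folder'}
    config['Threads'] = {'search_limit': '20', 'before_date': '2019-01-31'}
    config['Friend'] = {'url': 'https://www.facebook.com/your.friend'}
    config['Messages'] = {'search_limit': '500', 'before_date': '2019-01-31'}
    config['Media'] = {'sender_only': 'true'}
    return config


def render_config(config):
    """Serialize the config to the text you would save as config.ini."""
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


# Print the text to copy into config.ini
print(render_config(build_sample_config()))
```

The before_date keys use the %Y-%m-%d format that convert_date_to_unix_ms parses. Leaving a before_date value empty makes that function return None, so the script falls back to fetching without a date filter.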

Sample output:

Logging in {EMAIL}...
Login of {EMAIL} successful.
Attempting to download 2 images...................

 Downloading image to c:\\Users\\{USER}\\Downloads\fb-image-{FRIEND_NAME}-50345061-3ff9-4f0a-a6f4-1988a4259a62.png
 Downloading image to c:\\Users\\{USER}\\Downloads\fb-image-{FRIEND_NAME}-9deb01e8-eae2-4915-a28f-ac61d85bea2e.png

Some Caveats

The current version of the script comes with a few caveats:

  1. Finding the chat you’re looking for isn’t easy. First, you need your friend’s profile page URL. Then, the fbchat API seems to be restricted to searching through 20 threads at a time.
  2. Finding the messages you’re looking for isn’t easy either. I’m also not sure what the query limit is for messages.
  3. The script could use some extra error handling for users who aren’t necessarily comfortable editing this code themselves.
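
The first caveat could probably be worked around by paging: ask for one batch of threads and, if the friend is not in it, ask again for threads older than the oldest one seen. The sketch below shows that loop in isolation. It is not part of the original script; find_thread_by_url and fetch_page are hypothetical names, and it assumes each thread object exposes url and last_message_timestamp (in milliseconds) attributes, which you should verify against the fbchat version you have installed.

```python
def find_thread_by_url(fetch_page, target_url, page_size=20, max_pages=10):
    """
    Page backwards through threads until one matches target_url.

    :param fetch_page: callable taking (limit, before) and returning a list of threads
    :param target_url: profile URL to look for
    :param page_size: threads requested per call (20 matches the limit noted above)
    :param max_pages: safety cap on the number of requests
    :return: the matching thread, or None
    """
    before = None
    for _ in range(max_pages):
        page = fetch_page(page_size, before)
        if not page:
            return None  # ran out of threads

        for thread in page:
            if thread.url == target_url:
                return thread

        # Assumption: last_message_timestamp is a millisecond value (int or numeric string)
        oldest = min(int(thread.last_message_timestamp) for thread in page)
        if before is not None and oldest >= before:
            return None  # no progress, avoid requesting the same page forever
        before = oldest

    return None
```

With the script's client, fetch_page could be something like `lambda limit, before: fb_client.fetchThreadList(limit=limit, before=before)`, which reuses the two keyword arguments the script already passes (assuming before=None behaves like omitting it). The same loop shape should apply to messages, although I have not checked which timestamp attribute they expose.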

Coursera’s Game Design and Development (Part 2)

Note: This is a follow up to my previous post on Coursera’s Game Design and Development Specialization. If you’d like to read the first part, check it out here.

Third Course Impressions: Business of Games and Entrepreneurship

The third course in the Game Design and Development specialization, taught by Professor Casey O’Donnell (from Michigan State University), was particularly informative.

As with previous Coursera courses in this series, the course was spread out over four weeks. Each week focused on introducing specific sub-topics related to business and entrepreneurship. Professor O’Donnell addressed serious topics in an informative and concise manner. The course structure and flow are better than they were in the second course of the specialization (also taught by O’Donnell).

Topics covered include:

  • Various funding models used in the game industry historically and today;
  • A basic introduction to legal issues common in the game industry:
    • Intellectual property
    • Copyright
    • Patents and trade secrets
  • Teamwork and working with people;
  • Project management tools and techniques (with a brief introduction to the SCRUM methodology);
  • The qualities, styles, tasks and foundations of leadership;
  • Pitching yourself and your game idea, and demoing your game;
  • Launching a business and choosing the right business partners;
  • Working for hire, communicating and interacting with clients.

Over the course’s four weeks, the assignments were:

  • A SWOT analysis of an existing game franchise to evaluate its Strengths, Weaknesses, Opportunities and Threats (using a provided template document);
  • A production plan document to schedule, budget and determine the key personnel necessary for creating a game of your own design (using a provided template document);
  • A video to present your game idea to other students enrolled in the course (or alternatively a video pitching yourself and your skill set);
  • A competitive analysis document that situates your game versus its competition and the overall market for similar titles.

For the video assignment, I used Movavi Video Suite 15 to record my pitch. I also added snippets of gameplay taken from other game titles to help illustrate my game idea. The Movavi software was surprisingly simple and efficient to use, although I ended up purchasing a license to remove the software’s watermark from my video.

Cyborg Games Studio: opening to my game idea pitch

Throughout the course, links to external resources and videos were provided to get more in-depth information on some of the topics mentioned above. For instance, there are numerous interesting blog posts published on the Gamasutra website that are worth reading through. Another interesting document shared in the course was VALVE’s handbook for new employees.

Professor O’Donnell also recommended the book “The Art of the Start 2.0” by Guy Kawasaki, who also has a number of lectures on YouTube. Also recommended was “The Lean Startup” by Eric Ries, although I have yet to obtain it.

The course’s additional reading has led me to seek out more reading material. A good resource I’ve found for recommendations on entrepreneurship books and websites is Y Combinator’s Startup Library. I’ve since been reading “How to Win Friends & Influence People” by Dale Carnegie.

Pictured: “The Art of the Start 2.0” by Guy Kawasaki, “How to Win Friends & Influence People” by Dale Carnegie, and “The Lean Startup” by Eric Ries.


To be continued…

That’s it for part 2 of 4! Overall, I’d recommend the Business of Games and Entrepreneurship course. It conveys useful and clear information, and the assignments are insightful. There is one more course in the specialization and a capstone project to complete it, so stay tuned for more blog posts!

In the meantime, I am also getting started on Udemy’s Unreal Engine 4 course and will probably make a post on that once I get the chance.

Coursera’s Game Design and Development (Part 1)

A few months ago, I acquired a few eBooks from the O’Reilly website. This led me to purchase a recently released book titled “Building a Game with Unity and Blender” by Lee Zhi Eng. While reading through Zhi Eng’s book, I started looking for an online course on game design and development.

Having used the Coursera and Treehouse learning platforms in the past, I signed up to Coursera’s Game Design and Development Specialization. It is composed of five courses focusing on different aspects of game development and design with the last course being a capstone project (more details on the specialization here).


First Course Impressions: Introduction to Game Development

The first course, Introduction to Game Development, was, as I suspected, there to motivate me to learn more. It’s a basic introduction to Unity 3D and walks students through a number of hands-on exercises with the game engine.

During this course, I got to make a few basic systems and game iterations. The first project was a simple 3D model of our Solar System (not to scale) with a mini-map. Clicking on any planet, either in the main camera view or on the mini-map, would toggle the camera to follow that planet instead of staying centered on the Sun.

Solar System

The second project was a simple game with a set camera angle that followed a ball on a platform. Using the ball, the player has to collect a number of coins while avoiding the enemy cubes falling out of the sky and onto the platform.

Roller Madness

The third project was a basic shooter in a Tron-like setting. The player has a set amount of time to shoot the right boxes to gain points: green boxes give points, white boxes grant additional time, and yellow boxes take away time. A number of boxes are already present when a level starts, but most boxes are generated dynamically, much like the coins and enemies were in the previous project.

Box Shooter

All three projects used mostly prototype and standard Unity assets. Other assets used in the projects (such as certain textures and sounds) were provided via Coursera.


Second Course Impressions: Principles of Game Design

The second course, Principles of Game Design, was focused on coming up with a game idea and documenting a number of things to flesh out this game idea. I decided to work on one continuous game idea, titled ACCURSED, throughout this course and submitted the following documents:

  • High Concept Document
  • Story Bible
  • Game Design Document
  • Non-digital Game Mechanic Prototype

ACCURSED is a project I intend to further work on. Therefore, I do not feel inclined to upload the documents listed above at the moment, except for an excerpt of the Story Bible document:

ACCURSED is a futuristic cooperative first-person shooter that takes place during a space expedition mission to Saturn. Both players are cosmonaut researchers contracted by a private military corporation that operates on one of Saturn’s moons. This PMC is known as CAVAL. The players must ultimately reach the CAVAL headquarters and the established colonies of Saturn on their return journey back from Jupiter.

Each chapter takes place on a satellite of Saturn, starting with Iapetus.

On each satellite, there are stations built specifically to accommodate these long distance journeys. Each station is an opportunity for the players to refuel their ship and stock up on food and other resources. However, re-fueling and recharging the main generators of the ship takes a significant amount of time, usually up to 24 hours.

During this time, the players will have to use their tools and arsenal to survive together, face imminent environmental threats and enemies, and solve puzzles.

This course definitely provides valuable insight into how a game’s story, universe, systems, mechanics and gameplay all come together. It presents a model to follow in order to balance these aspects of the game to create an experience that is ultimately not only fun, but also engaging and exciting.

To be continued…

I’m enrolling in the third course now, and I’m interested to see what I’ll learn (surely a lot). I intend to make a new post to follow up on this one once I’ve completed the third and fourth courses.

Get any GitHub user’s email (Python)

Here’s a silly little Python script (github_email.py) I wrote to get a GitHub user’s email address based on their public event history.

It’s not bulletproof, but generally works:

from __future__ import print_function

import requests
import sys

from optparse import OptionParser


parser = OptionParser()
parser.add_option('-u', '--username', action='store', type='string', dest='username')
(options, args) = parser.parse_args()

gh_api_url = 'https://api.github.com/users/{username}/events/public'.format(username=options.username)

r = requests.get(gh_api_url)

# A 404 means the user does not exist. Check it before raise_for_status(),
# which would otherwise raise and skip the friendly message below.
if r.status_code == 404:
    print('User was not found!')
    sys.exit(0)

r.raise_for_status()

gh_public_events = r.json()

fullname = 'N/A'
email = 'N/A'

for event in gh_public_events:
    commits = event.get('payload', {}).get('commits')
    if not commits:
        continue
    for commit in commits:
        if commit.get('author'):
            fullname = commit.get('author').get('name')
            email = commit.get('author').get('email')
            break
    if email != 'N/A':
        break  # the inner break only leaves the commit loop; stop scanning events too

print('Full Name: {name}'.format(name=fullname))
print('email: {email}'.format(email=email))

You can simply save this script as github_email.py (for example).

Also note that it uses Python 3, a deprecated standard library module (optparse, which has been superseded by argparse), and one third-party library, requests.
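
Since optparse has been superseded, here is a rough sketch of what the same option might look like with argparse. It is not the script's actual code: build_parser is a hypothetical helper, and marking the option as required is my own assumption (the original leaves it optional).

```python
import argparse


def build_parser():
    """Build an argparse parser mirroring the optparse setup above."""
    parser = argparse.ArgumentParser(description="Look up a GitHub user's commit email")
    # required=True is an addition: without a username the API URL cannot be built
    parser.add_argument('-u', '--username', required=True, help='GitHub username to look up')
    return parser


# Passing an explicit list keeps the example independent of the real command line
options = build_parser().parse_args(['-u', 'jbarnette'])
print(options.username)
```

In the script itself, you would call parse_args() with no arguments so that it reads sys.argv, and options.username would be used exactly as before.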

Usage Output

Usage: github_email.py [options]

Options:
  -h, --help            show this help message and exit
  -u USERNAME, --username=USERNAME

Expected Results

If the user exists and has public events:

$ github_email.py -u jbarnette
Full Name: John Barnette
email: jbarnette@github.com

If the user exists, but has no public events, you’ll get:

$ github_email.py -u test
Full Name: N/A
email: N/A

If the user does not exist, you’ll get:

$ github_email.py -u iou18y23123
User was not found!

And that’s how you get (almost) any GitHub user’s email!
It’s useful if you want to reach out to another contributor via email.

Disclaimer

The script is inspired by this blog post: How to Find Almost Any GitHub User’s Email Address

If you’d rather do this process manually, follow the instructions in the post linked above.
