
dbling: The Chrome OS Forensic Tool¶
dbling is a tool for performing forensics in Chrome OS.
Please view the latest version of the documentation on Read the Docs and the latest version of the code on GitHub.
Publication¶
This work is based on the following publication:
- Mike Mabey, Adam Doupé, Ziming Zhao, and Gail-Joon Ahn. “dbling: Identifying Extensions Installed on Encrypted Web Thin Clients”. In: Digital Investigation (2016). The Proceedings of the Sixteenth Annual DFRWS Conference. URL: http://www.sciencedirect.com/science/article/pii/S174228761630038X
Installation¶
Coming soon!
dbling Components¶
dbling is divided into the following main components:
Crawler¶
The Crawler downloads the list of extensions currently available on the Chrome Web Store, determines which extensions are at a version that has already been downloaded, downloads those that have not yet been downloaded, and adds information on the newly downloaded extensions to the database.
The code for the Crawler is under crawl: The Chrome Web Store Crawler.
Template Generator¶
The Template Generator runs concurrently with the Crawler. For each new extension downloaded by the Crawler, the Template Generator calculates the centroid of the extension and stores it in the database. The Template Generator does not run inside Chrome or Chrome OS, so it cannot use the same mechanisms for unpacking and installing extensions that Chrome uses natively. Instead, the primary function of the Template Generator is to mimic Chrome's unpacking and installation behavior as closely as possible.
The code for the Template Generator is implemented alongside the Crawler, but the main function that creates templates is calc_centroid().
Profiler¶
Coming soon!
MERL Exporter¶
Coming soon!
gripper¶
Coming soon!
License¶
dbling is licensed under the MIT License.
dbling API¶
crawl: The Chrome Web Store Crawler¶
tasks¶
Tasks for Celery workers.
Beat Tasks¶
Beat tasks are those that run on a periodic basis, depending on the configuration in celeryconfig.py or any cron jobs set up in the Ansible playbooks. Beat tasks only initiate the workflow by creating the jobs; they don't actually do the work for each task.
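A periodic schedule of this kind might look like the following sketch. The task name and interval here are hypothetical illustrations of the Celery beat pattern, not the actual contents of dbling's celeryconfig.py:

```python
from datetime import timedelta

# Hypothetical beat schedule: beat only *creates* the job on a timer;
# a worker elsewhere picks it up and does the actual work.
beat_schedule = {
    'refresh-crx-list': {
        'task': 'crawl.tasks.start_list_download',  # hypothetical entry-point task
        'schedule': timedelta(hours=24),            # initiate the workflow daily
    },
}
```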
Entry Points¶
Entry points are where an actual worker begins its work. A single task corresponds to a specific CRX file. The task function dictates what operations are performed on the CRX. Each operation is represented by a specific worker function (as described below).
Worker Functions¶
Worker functions each represent a discrete action to be taken on a CRX file.
Helper Tasks and Functions¶
These functions provide additional functionality that doesn't fit in any of the above categories.
db_iface¶
webstore_iface¶
Chrome Web Store interface for dbling.
- exception crawl.webstore_iface.ListDownloadFailedError(*args, **kwargs)[source]¶
  Raised when the list download fails. Initializes RequestException with request and response objects.
  Raised when an extension isn't downloadable.
- exception crawl.webstore_iface.BadDownloadURL[source]¶
  Raised when the ID is valid but we can't download the extension.
- exception crawl.webstore_iface.VersionExtractError[source]¶
  Raised when extracting the version number from the URL fails.
- class crawl.webstore_iface.DownloadCRXList(ext_url, *, return_count=False, session=None)[source]¶
  Generate the list of extension IDs downloaded from Google.
  As a generator, this is designed to be used in a for loop. For example:

  >>> crx_list = DownloadCRXList(download_url)
  >>> for crx_id in crx_list:
  ...     print(crx_id)

  The list of CRXs is downloaded just prior to when the first item is generated. In other words, instantiating this class doesn't start the download; iterating over the instance starts the download. This is significant given that downloading the list is quite time consuming.
  Parameters:
  - ext_url (str) – Specially crafted URL that will let us download the list of extensions.
  - return_count (bool) – When True, returns a tuple of the form (crx_id, job_number), where job_number is the index of the ID plus 1. This way, the job number of the last ID returned will be the same as len(DownloadCRXList).
  - session (requests.Session) – Session object to use when downloading the list. If None, a new requests.Session object is created.
  - download_ids()[source]¶
    Starting point for downloading all CRX IDs. This function creates an event loop and starts the downloads asynchronously.
    Return type: None
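The lazy-download behavior described above (nothing fetched at construction, everything fetched on first iteration) can be sketched in isolation. This is an illustrative pattern, not dbling's actual implementation:

```python
class LazyList:
    """Minimal sketch of the pattern DownloadCRXList is described as using:
    constructing the object does not start the (slow) download; the first
    iteration does."""

    def __init__(self, fetch):
        self._fetch = fetch   # callable that performs the time-consuming download
        self._items = None    # nothing fetched yet

    def __iter__(self):
        if self._items is None:
            # The download happens here, on first iteration, not in __init__
            self._items = list(self._fetch())
        return iter(self._items)

    def __len__(self):
        # Meaningful only after the list has been downloaded
        return 0 if self._items is None else len(self._items)
```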
- crawl.webstore_iface.save_crx(crx_obj, download_url, save_path=None, session=None)[source]¶
  Download the CRX and save it in the save_path directory.
  The saved file will have the format: <extension ID>_<version>.crx
  If save_path isn't given, this defaults to a directory called "downloads" in the CWD.
  Adds the following keys to crx_obj:
  - version: Version number of the extension, as obtained from the final URL of the download. This may differ from the version listed in the extension's manifest.
  - filename: The basename of the CRX file (not the full path).
  - full_path: The location (full path) of the downloaded CRX file.
  Parameters:
  - crx_obj (munch.Munch) – Previously collected information about the extension.
  - download_url (str) – The URL template that already contains the correct Chrome version information and {} where the ID goes.
  - save_path (str or None) – Directory where the CRX should be saved.
  - session (requests.Session or None) – Optional Session object to use for HTTP requests.
  Returns: Updated version of crx_obj with version, filename, and full_path information added. If the download wasn't successful, not all of these may have been added, depending on when it failed.
  Return type: munch.Munch
merl: Matching Extension Ranking List Files¶
Google API Acquisition Tool¶
Access many APIs and acquire as much user identifying information as possible.
Getting Started¶
Gripper depends on an active G Suite account, which requires a domain name for your organization (you can create a new domain while creating your G Suite account if you don't already have one).
Gripper has only been tested in a Linux environment.
- Install Python 3 and pip3
- Install the Google API Python client:
  pip3 install --upgrade google-api-python-client
Prerequisites¶
Installing¶
Running¶
Command line:
Usage: gripper.py drive [options] (created | revised | comment) ...
gripper.py reports [options]
Options:
-c --cached Use a cached version of the data, if available.
-e EMAIL --email=EMAIL
The email address of a user to impersonate. This requires
domain-wide delegation to be activated. See
https://developers.google.com/admin-sdk/reports/v1/guides/delegation
for instructions.
--level=LEVEL The granularity level of the resulting heat map [default: hr]
--start=START -s START
The earliest data to collect. Can be any kind of date string,
as long as it is unambiguous (e.g. "2017"). It can even be
slang, such as "a year ago". Be aware, however, that only the
*day* of the date will be used, meaning time information will
be discarded.
--end=END -e END
The latest data to collect. Same format rules apply for this
as for --start.
--tz=TIMEZONE The timezone to convert all timestamps to before compiling.
This should be a standard timezone name. For reference, the
list that the timezone will be compared against is available
at https://github.com/newvem/pytz/blob/master/pytz/__init__.py.
If omitted, the local timezone of the computer will be used.
Note: If you start this script using ipython (recommended), you'll need to
invoke it like this:
$ ipython3 gripper.py -- [typical arguments]
The reason for this is that ipython interprets any options *before* the ``--``
as being meant for it.
Bugs or Issues¶
If you receive this error:

  Failed to start a local webserver listening on either port 8080
  or port 8090. Please check your firewall settings and locally
  running programs that may be blocking or using those ports.

find the process using the port with lsof -w -n -i tcp:8080 (or lsof -w -n -i tcp:8090, respectively), then terminate it with kill -9 PID. Alternatively, you can click the link provided in the terminal and then copy and paste the key from the webpage that is launched.
API Documentation¶
Please refer to the following documents for information on the API:
apis¶
admin¶
- class google.apis.admin.ReportsAPI(http=None, impersonated_user_email=None, start=None, end=None, timezone=None)[source]¶
  Class to interact with G Suite Admin Reports APIs.
  Documentation for the Python API: https://developers.google.com/resources/api-libraries/documentation/admin/reports_v1/python/latest/
See also: https://developers.google.com/admin-sdk/reports/v1/quickstart/python
Parameters: - http (httplib2.Http) – An Http object for sending the requests. In general, this should be left as None, which will allow for auto-adjustment of the kind of Http object to create based on whether a user’s email address is to be impersonated.
- impersonated_user_email (str) – The email address of a user to impersonate. This requires domain-wide delegation to be activated. See https://developers.google.com/admin-sdk/reports/v1/guides/delegation for instructions.
- start (str) – The earliest data to collect. Can be any kind of date string, as long as it is unambiguous (e.g. “2017”). It can even be slang, such as “a year ago”. Be aware, however, that only the day of the date will be used, meaning time information will be discarded.
- end (str) – The latest data to collect. Same format rules apply for this as for the start parameter.
- timezone (str) – The timezone to convert all timestamps to before compiling. This should be a standard timezone name. For reference, the list that the timezone will be compared against is available at https://github.com/newvem/pytz/blob/master/pytz/__init__.py. If omitted, the local timezone of the computer will be used.
- activity(user_key='all', app_name=None, **kwargs)[source]¶
  Return the last 180 days of activities.
  https://developers.google.com/admin-sdk/reports/v1/reference/activities/list
  The application_name parameter specifies which events are to be retrieved. The possible values include:
  - admin – The Admin console application's activity reports return account information about different types of administrator activity events.
  - calendar – The G Suite Calendar application's activity reports return information about various Calendar activity events.
  - drive – The Google Drive application's activity reports return information about various Google Drive activity events. The Drive activity report is only available for G Suite Business customers.
  - groups – The Google Groups application's activity reports return information about various Groups activity events.
  - gplus – The Google+ application's activity reports return information about various Google+ activity events.
  - login – The G Suite Login application's activity reports return account information about different types of Login activity events.
  - mobile – The G Suite Mobile Audit activity report returns information about different types of Mobile Audit activity events.
  - rules – The G Suite Rules activity report returns information about different types of Rules activity events.
  - token – The G Suite Token application's activity reports return account information about different types of Token activity events.
  Parameters:
  - user_key (str) – The value can be 'all', which returns all administrator information, or a userKey, which represents a user's unique G Suite profile ID or the primary email address of a person or entity.
  - app_name (str) – Name of an application from the list above. If set to None, data will be retrieved from all the applications listed above.
  Returns: JSON
- get_customer_usage_reports(date, customer_id=False)[source]¶
  Get customer usage reports.
  https://developers.google.com/admin-sdk/reports/v1/reference/customerUsageReports/get
  Parameters:
  - date –
  - customer_id –
  Returns: JSON
- get_user_usage_report(date, user_key='all')[source]¶
  Get user usage report.
  https://developers.google.com/admin-sdk/reports/v1/reference/userUsageReport/get
  Parameters:
  - date –
  - user_key –
  Returns: JSON
- class google.apis.admin.DirectoryAPI(**kwargs)[source]¶
  Class to interact with G Suite Admin Directory APIs.
  Documentation for the Python API: https://developers.google.com/resources/api-libraries/documentation/admin/directory_v1/python/latest/
See also: https://developers.google.com/admin-sdk/directory/v1/quickstart/python
- list_chromeos_devices(fields='*')[source]¶
  List up to 100 Chrome OS devices in the organization.
  Reference: https://developers.google.com/admin-sdk/directory/v1/reference/chromeosdevices/list
  Parameters: fields (str) – Comma-separated list of metadata fields to request.
  Returns: The list of Chrome OS devices. See one of the documentation links above for the format of the return value.
  Return type: list
- get_user(user_email)[source]¶
  Get information for a single user specified by their email address.
  https://developers.google.com/admin-sdk/directory/v1/reference/users/get
  Parameters: user_email (str) – The user's email address.
  Returns: JSON
- get_all_users(domain_name)[source]¶
  Return all users in the domain.
  https://developers.google.com/admin-sdk/directory/v1/reference/users/list
  Parameters: domain_name (str) – The name of the domain.
  Returns: JSON
- get_chrome_os_devices_properties(device_id)[source]¶
  Get data pertaining to a single Chrome OS device.
  https://developers.google.com/admin-sdk/directory/v1/reference/chromeosdevices/get
  Parameters: device_id (str) – Unique ID for the device.
  Returns: JSON
- list_customers_mobile_devices_properties()[source]¶
  Get a list of mobile devices.
  https://developers.google.com/admin-sdk/directory/v1/reference/mobiledevices/list
  Returns: JSON
- get_mobile_devices_properties(resource_id)[source]¶
  Get data pertaining to a single mobile device.
  https://developers.google.com/admin-sdk/directory/v1/reference/mobiledevices/get
  Parameters: resource_id (str) – The unique ID the API service uses to identify the mobile device.
  Returns: JSON
- suspend_user_account(user_email)[source]¶
  Suspend a user's account.
  https://developers.google.com/admin-sdk/directory/v1/reference/users/update
  https://developers.google.com/admin-sdk/directory/v1/guides/manage-users
  Parameters: user_email (str) – Email of the user to be suspended.
  Returns: JSON
- unsuspend_user_account(user_email)[source]¶
  Un-suspend a user's account.
  https://developers.google.com/admin-sdk/directory/v1/reference/users/update
  https://developers.google.com/admin-sdk/directory/v1/guides/manage-users
  Parameters: user_email (str) – Email of the user to be un-suspended.
  Returns: JSON
drive¶
- google.apis.drive.DRIVE_BACKUP_FILE = '/home/docs/checkouts/readthedocs.org/user_builds/dbling/checkouts/latest/google/apis/../drive_data_backup.pkl'¶
  Location of pickled data when cached.
- google.apis.drive.SEGMENT_SIZE = 4¶
  Number of hours in a segment. Must evenly divide 24 to avoid issues.
- class google.apis.drive.DriveAPI(http=None, impersonated_user_email=None, start=None, end=None, timezone=None)[source]¶
  Class to interact with Google Drive APIs.
Documentation for the Python API:
Quick start guide:
Parameters: - http (httplib2.Http) – An Http object for sending the requests. In general, this should be left as None, which will allow for auto-adjustment of the kind of Http object to create based on whether a user’s email address is to be impersonated.
- impersonated_user_email (str) – The email address of a user to impersonate. This requires domain-wide delegation to be activated. See https://developers.google.com/admin-sdk/reports/v1/guides/delegation for instructions.
- start (str) – The earliest data to collect. Can be any kind of date string, as long as it is unambiguous (e.g. “2017”). It can even be slang, such as “a year ago”. Be aware, however, that only the day of the date will be used, meaning time information will be discarded.
- end (str) – The latest data to collect. Same format rules apply for this as for the start parameter.
- timezone (str) – The timezone to convert all timestamps to before compiling. This should be a standard timezone name. For reference, the list that the timezone will be compared against is available at https://github.com/newvem/pytz/blob/master/pytz/__init__.py. If omitted, the local timezone of the computer will be used.
- activity(level, what=('files', 'revisions'), use_cached=False, **kwargs)[source]¶
  Compile the user's activity.
  Note about revision history: One of the metadata fields for file revisions is called "keepForever". This indicates whether to keep the revision forever, even if it is no longer the head revision. If not set, the revision will be automatically purged 30 days after newer content is uploaded. This can be set on a maximum of 200 revisions for a file.
  Parameters:
  - level (str) – Level of detail on the activity. Accepted values:
    - 'dy': Activity is summarized by day
    - 'hr': Activity is summarized by hour, X:00:00 to X:59:59
    - 'sg': Activity throughout the day is divided into a number of segments (defined to be 24 divided by SEGMENT_SIZE).
  - what (tuple or list) – Indicates what kind of content to scan for activity. Accepted values:
    - 'created'
    - 'revisions'
    - 'comments'
  - use_cached (bool) – Whether or not to use cached data. When set, this avoids downloading all the file metadata from Google if a cached version of the data is available on disk.
  Returns: A dictionary containing three keys: x, y, and z. Each of these stores a list suitable for passing as the data set for a plot.
  Return type: dict
  Raises: ValueError – When the level or what parameters have an unsupported format or value.
- get_about(fields='*')[source]¶
  Retrieve information about the user's Drive and system capabilities.
  https://developers.google.com/drive/v3/reference/about
  Parameters: fields (string) – Fields to be returned.
  Returns: JSON
- get_changes(spaces='drive', include_team_drives=True, restrict_to_my_drive=False, include_corpus_removals=None, include_removed=None)[source]¶
  Return the changes for a Google Drive account.
  The set of changes as returned by this method is more suited to a file syncing application.
  In the returned dict, the key for changes in the user's regular Drive is an empty string (''). The data for each Team Drive (assuming include_team_drives is True) is stored using a key in the format 'team_drive_X', where X is the ID of the Team Drive. For the form of the JSON data, go to https://developers.google.com/resources/api-libraries/documentation/drive/v3/python/latest/drive_v3.teamdrives.html#list
  https://developers.google.com/drive/v3/reference/changes
  Parameters:
  - spaces (str) – A comma-separated list of spaces to query within the user corpus. Supported values are 'drive', 'appDataFolder' and 'photos'.
  - include_team_drives (bool) – Whether or not to include data from Team Drives as well as the user's Drive.
  - restrict_to_my_drive (bool) – Whether to restrict the results to changes inside the My Drive hierarchy. This omits changes to files such as those in the Application Data folder or shared files which have not been added to My Drive.
  - include_corpus_removals (bool) – Whether changes should include the file resource if the file is still accessible by the user at the time of the request, even when a file was removed from the list of changes and there will be no further change entries for this file.
  - include_removed (bool) – Whether to include changes indicating that items have been removed from the list of changes, for example by deletion or loss of access.
  Returns: All data on changes by the user in JSON format, stored in a dict.
  Return type: dict
- gen_file_data(fields='*', spaces='drive', include_team_drives=True, corpora=None)[source]¶
  Generate the metadata for the user's Drive files.
  This function is a generator, so it yields the metadata for one file at a time. For the format of the dict generated, see https://developers.google.com/resources/api-libraries/documentation/drive/v3/python/latest/drive_v3.files.html#list
  Parameters:
  - fields (str) – The metadata fields to retrieve.
  - spaces (str) – A comma-separated list of spaces to query within the user corpus. Supported values are 'drive', 'appDataFolder' and 'photos'.
  - include_team_drives (bool) – Whether or not to include data from Team Drives as well as the user's Drive.
  - corpora (str) – Comma-separated list of bodies of items (files/documents) to which the query applies. Supported bodies are 'user', 'domain', 'teamDrive' and 'allTeamDrives'. 'allTeamDrives' must be combined with 'user'; all other values must be used in isolation. Prefer 'user' or 'teamDrive' to 'allTeamDrives' for efficiency.
  Returns: The file metadata.
  Return type: dict
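The one-file-at-a-time behavior of a generator like this typically comes from the Drive API's nextPageToken pagination. A generic sketch of that loop follows; the list_page callable and iter_files name are stand-ins for illustration, not the real request object or function:

```python
def iter_files(list_page):
    """Sketch of the nextPageToken pagination pattern the Drive v3 API uses.

    list_page is a callable standing in for a files().list() request; it
    must accept a pageToken keyword and return a dict with 'files' and,
    while more pages remain, 'nextPageToken'."""
    token = None
    while True:
        resp = list_page(pageToken=token)
        for meta in resp.get('files', []):
            yield meta          # one file's metadata at a time
        token = resp.get('nextPageToken')
        if token is None:       # no token means this was the last page
            break
```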
- export_drive_file(file_data, download_path)[source]¶
  Export and convert .g* files to real files, then download them.
  https://developers.google.com/drive/v3/reference/files/export
  Parameters:
  - file_data (JSON) – List of file(s) to be downloaded.
  - download_path – Path where the file will be downloaded.
  Returns: True if the download succeeded, False if it failed.
- export_real_file(file_data, download_path)[source]¶
  Download real files (i.e., not .g* files).
  https://developers.google.com/drive/v3/reference/files/export
  Parameters:
  - file_data (JSON) – List of file(s) to be downloaded.
  - download_path – Path where the file will be downloaded.
  Returns: Nothing.
- download_files(file_list_array=False)[source]¶
  Download files from the user's Drive.
  https://developers.google.com/drive/v3/web/manage-downloads
  Parameters: file_list_array (array) – List of file(s) to be downloaded.
  Returns: Nothing.
- get_app_folder(fields='nextPageToken, files(id, name)')[source]¶
  Return the data in the user's app data folder.
  https://developers.google.com/drive/v3/reference/files/list
  Parameters: fields (string) – Fields to be returned.
  Returns: JSON
- get_photo_data(fields='nextPageToken, files(id, name)')[source]¶
  Return data about the user's photos.
  https://developers.google.com/drive/v3/reference/files/list
  Parameters: fields (string) – Fields to be returned.
  Returns: JSON
- google.apis.drive.crunch(level, **kwargs)[source]¶
  Consolidate the data to the specified level.
  Parameters:
  - data (CalDict) – The data from parsing the Drive metadata.
  - level (str) – Must be one of dy, sg, or hr. For an explanation of these options, see the docstring for DriveAPI.activity().
  - start (datetime.date) – The earliest data to collect.
  - end (datetime.date) – The latest data to collect.
  Returns: Tuple with two elements. The first is a DateRange object which stores the first and last days with activity (the range of dates the data corresponds to) in its start and end attributes, respectively. Both of these attributes are date objects.
  The second element in the returned tuple is a list containing the data for each day. The contents of this list vary based on the value of level:
  - dy: A single list of ints, one for each day.
  - sg: lists of ints. Each list corresponds to a segment, each int corresponds to a day. These lists are in reverse order, meaning the first list represents the last segment of a day.
  - hr: lists of ints. Each list corresponds to an hour, each int corresponds to a day. These lists are in reverse order, meaning the first list represents the last hour of a day.
  Return type: tuple
gmail¶
- class google.apis.gmail.GmailAPI(http=None, impersonated_user_email=None, start=None, end=None, timezone=None)[source]¶
  Class to interact with Google Gmail APIs.
Parameters: - http (httplib2.Http) – An Http object for sending the requests. In general, this should be left as None, which will allow for auto-adjustment of the kind of Http object to create based on whether a user’s email address is to be impersonated.
- impersonated_user_email (str) – The email address of a user to impersonate. This requires domain-wide delegation to be activated. See https://developers.google.com/admin-sdk/reports/v1/guides/delegation for instructions.
- start (str) – The earliest data to collect. Can be any kind of date string, as long as it is unambiguous (e.g. “2017”). It can even be slang, such as “a year ago”. Be aware, however, that only the day of the date will be used, meaning time information will be discarded.
- end (str) – The latest data to collect. Same format rules apply for this as for the start parameter.
- timezone (str) – The timezone to convert all timestamps to before compiling. This should be a standard timezone name. For reference, the list that the timezone will be compared against is available at https://github.com/newvem/pytz/blob/master/pytz/__init__.py. If omitted, the local timezone of the computer will be used.
google¶
- class google.apis.google.GoogleAPI(http=None, impersonated_user_email=None, start=None, end=None, timezone=None)[source]¶
  Interface to the Google API.
See the documentation for subclasses for more detailed information.
Parameters: - http (httplib2.Http) – An Http object for sending the requests. In general, this should be left as None, which will allow for auto-adjustment of the kind of Http object to create based on whether a user’s email address is to be impersonated.
- impersonated_user_email (str) – The email address of a user to impersonate. This requires domain-wide delegation to be activated. See https://developers.google.com/admin-sdk/reports/v1/guides/delegation for instructions.
- start (str) – The earliest data to collect. Can be any kind of date string, as long as it is unambiguous (e.g. “2017”). It can even be slang, such as “a year ago”. Be aware, however, that only the day of the date will be used, meaning time information will be discarded.
- end (str) – The latest data to collect. Same format rules apply for this as for the start parameter.
- timezone (str) – The timezone to convert all timestamps to before compiling. This should be a standard timezone name. For reference, the list that the timezone will be compared against is available at https://github.com/newvem/pytz/blob/master/pytz/__init__.py. If omitted, the local timezone of the computer will be used.
people¶
- class google.apis.people.PeopleAPI(http=None, impersonated_user_email=None, start=None, end=None, timezone=None)[source]¶
  Class to interact with Google People APIs.
Parameters: - http (httplib2.Http) – An Http object for sending the requests. In general, this should be left as None, which will allow for auto-adjustment of the kind of Http object to create based on whether a user’s email address is to be impersonated.
- impersonated_user_email (str) – The email address of a user to impersonate. This requires domain-wide delegation to be activated. See https://developers.google.com/admin-sdk/reports/v1/guides/delegation for instructions.
- start (str) – The earliest data to collect. Can be any kind of date string, as long as it is unambiguous (e.g. “2017”). It can even be slang, such as “a year ago”. Be aware, however, that only the day of the date will be used, meaning time information will be discarded.
- end (str) – The latest data to collect. Same format rules apply for this as for the start parameter.
- timezone (str) – The timezone to convert all timestamps to before compiling. This should be a standard timezone name. For reference, the list that the timezone will be compared against is available at https://github.com/newvem/pytz/blob/master/pytz/__init__.py. If omitted, the local timezone of the computer will be used.
plus¶
- class google.apis.plus.PlusAPI(http=None, impersonated_user_email=None, start=None, end=None, timezone=None)[source]¶
  Class to interact with Google Plus APIs.
Parameters: - http (httplib2.Http) – An Http object for sending the requests. In general, this should be left as None, which will allow for auto-adjustment of the kind of Http object to create based on whether a user’s email address is to be impersonated.
- impersonated_user_email (str) – The email address of a user to impersonate. This requires domain-wide delegation to be activated. See https://developers.google.com/admin-sdk/reports/v1/guides/delegation for instructions.
- start (str) – The earliest data to collect. Can be any kind of date string, as long as it is unambiguous (e.g. “2017”). It can even be slang, such as “a year ago”. Be aware, however, that only the day of the date will be used, meaning time information will be discarded.
- end (str) – The latest data to collect. Same format rules apply for this as for the start parameter.
- timezone (str) – The timezone to convert all timestamps to before compiling. This should be a standard timezone name. For reference, the list that the timezone will be compared against is available at https://github.com/newvem/pytz/blob/master/pytz/__init__.py. If omitted, the local timezone of the computer will be used.
- google.apis.get_api(api, **kwargs)[source]¶
  Shortcut for creating an API object.
  Returns: An instance of the created object.
  Return type: DriveAPI or PlusAPI or PeopleAPI or DirectoryAPI or GmailAPI or ReportsAPI
Other Functions and Classes¶
The following functions and classes are helpers to the code documented elsewhere.
util¶
Google API Client Library Page: https://developers.google.com/api-client-library/python/reference/pydoc
Python Quick Start Page: https://developers.google.com/drive/v3/web/quickstart/python
- class google.util.CalDict[source]¶
  A dict-like class for storing hourly data for a year.
  This is intended to have a set of keys that correspond to years. Since Python's syntax dictates that objects cannot have attributes with names consisting only of numbers (e.g. cal.2017), one solution would be to name the year keys cal.y2017, cal.y2016, etc. This is the intended convention for CalDict objects and aligns with how month and day data is named.
  Once you have created an instance of CalDict, you can easily create the structures necessary to store a year's worth of data like so:

  >>> cal = CalDict()
  >>> cal[2017]

  Just accessing the 2017 key (which is an int) assigns its value to be a dict with 12 keys, one for each month, numbered 1 through 12. Each of those keys points to a dict object with 31 keys, numbered 1 through 31. The day keys point to a list of 24 integers, initialized to 0. This allows you to increment the value for a particular hour immediately after instantiation, like the following, which increments the counter for the 2 PM hour block on August 31, 2016:

  >>> cal2 = CalDict()
  >>> y, m, d = 2016, 8, 31
  >>> cal2[y][m][d][14] += 1

  Since all months in a CalDict instance have 31 days, I recommend you use an external method of validating a particular date before storing or retrieving data.
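The structure described above can be approximated with a defaultdict. This sketch assumes only the lazy-scaffolding behavior documented here; it is not the real CalDict implementation:

```python
from collections import defaultdict

def _empty_year():
    # 12 months -> 31 days each -> 24 hourly counters, all initialized to 0
    return {m: {d: [0] * 24 for d in range(1, 32)} for m in range(1, 13)}

class CalDictSketch(defaultdict):
    """Approximation of CalDict: accessing a year key lazily builds the
    month/day/hour scaffolding, so counters can be incremented immediately."""
    def __init__(self):
        super().__init__(_empty_year)
```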
- google.util.get_credentials(scope=None, application_name=None, secret=None, credential_file=None)[source]¶
  Create the credential file for accessing the Google APIs.
  https://developers.google.com/drive/v3/web/quickstart/python
  Parameters:
  - scope (str) – String of scopes separated by spaces to give access to different Google APIs. Defaults to SCOPES.
  - application_name (str) – Name of this application. Defaults to APPLICATION_NAME.
  - secret (str) – The secret file given from Google. Should be named client_secret.json. Defaults to CLIENT_SECRET_FILE.
  - credential_file (str) – Name of the credential file to be created. Defaults to CREDENTIAL_FILE.
  Returns: Credential object.
  Raises: InvalidCredsError – if the credential file is missing or invalid.
- google.util.set_http(impersonated_user_email=None)[source]¶
  Create and return the Http object used to communicate with Google.
  https://developers.google.com/drive/v3/web/quickstart/python
  Parameters: impersonated_user_email (str) – Email address of the user to be impersonated. This uses domain-wide delegation to do the impersonation.
  Returns: The Http object.
  Return type: httplib2.Http
  Raises: InvalidCredsError – if the credential file is missing or invalid.
-
google.util.
print_json
(obj, sort=False, indent=2)[source]¶ Print the JSON object in a human readable format.
Parameters:
- obj – The JSON object to print.
- sort (bool) – Whether to sort the object's keys before printing. Defaults to False.
- indent (int) – Number of spaces to indent. Defaults to 2.
Return type: None
-
google.util.
convert_mime_type_and_extension
(google_mime_type)[source]¶ Return the conversion type and extension for the given Google MIME type.
Converts the mimeType given from Google to one of our choosing for export conversion. This is necessary to download .g* files.
Information on MIME types:
- https://developers.google.com/drive/v3/web/mime-types
- https://developers.google.com/drive/v3/web/integrate-open
Parameters: google_mime_type (str) – mimeType given from the Google API
Returns: Tuple in the form (conversion type, extension). If no conversion is supported for the given MIME type, the tuple will be (False, False).
Return type: tuple
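The (conversion type, extension) convention can be sketched with a small lookup table. The entries below are illustrative assumptions, not the real table in google.util; only the return shape, including the (False, False) fallback, follows the documented behavior:

```python
# Hypothetical mapping: the real table lives in google.util. Each value
# is (conversion MIME type, file extension) used when exporting.
G_MIME_CONVERSIONS = {
    'application/vnd.google-apps.document': ('application/pdf', '.pdf'),
    'application/vnd.google-apps.spreadsheet': ('text/csv', '.csv'),
    'application/vnd.google-apps.presentation': ('application/pdf', '.pdf'),
}

def convert_mime_type_and_extension(google_mime_type):
    # Fall back to (False, False) when no conversion is supported
    return G_MIME_CONVERSIONS.get(google_mime_type, (False, False))
```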
const
¶-
google.const.
CLIENT_SECRET_FILE
= 'client_secret.json'¶ This file is obtained from Google through the API pages. In the quickstart guide linked from get_credentials(), look for the “Create authorization credentials” subsection.
-
google.const.
CREDENTIAL_FILE
= 'test_creds.json'¶ Name of the file that is made by get_credentials
-
google.const.
APPLICATION_NAME
= 'dbling'¶ Name of the application
-
google.const.
DOWNLOAD_DIRECTORY
= None¶ Optional. If set to a path, the user’s Drive files will be downloaded to that location.
-
google.const.
PAGE_SIZE
= 1000¶ Page size for requests. Specifies the number of records to be returned in a single reply. The accepted range for most requests is [1, 1000].
-
google.const.
SCOPES
= 'https://www.googleapis.com/auth/drive.readonly https://www.googleapis.com/auth/drive.appfolder https://www.googleapis.com/auth/plus.login https://www.googleapis.com/auth/gmail.readonly https://www.googleapis.com/auth/contacts.readonly https://www.googleapis.com/auth/admin.directory.device.chromeos https://www.googleapis.com/auth/admin.directory.user https://www.googleapis.com/auth/admin.directory.device.mobile.readonly https://www.googleapis.com/auth/admin.directory.customer.readonly https://www.googleapis.com/auth/admin.reports.audit.readonly https://www.googleapis.com/auth/admin.reports.usage.readonly'¶ Scope: https://developers.google.com/drive/v3/web/about-auth
plot
¶-
google.plot.
HEATMAP_COLORS
= ('#e7f0fa', '#c9e2f6', '#95cbee', '#0099dc', '#4ab04a', '#ffd73e', '#eec73a', '#e29421', '#e29421', '#f05336', '#ce472e')¶ Color scale used by the heat map
-
google.plot.
STOP_FACTOR
= 80¶ Changes how quickly the colors go to the maximum
-
google.plot.
stop
(i)[source]¶ Return the ith color stop.
In color gradients, the point where a defined color sits (as opposed to the space between the defined colors, where the colors are “graded”) is called a “stop”. This Python function defines an exponential math function that returns floating point values, in the range 0 to 1, that define where the gradient stops should occur. In the heatmap() function, these values are used to determine the color of each cell based on the normalized values of the z parameters.
The number of stops is determined by the number of colors defined in HEATMAP_COLORS. The math function used is below. In it, m = STOP_FACTOR and n = len(HEATMAP_COLORS).

\[\frac{m^{(i/(n - 1))} - 1}{m - 1}\]

Parameters: i (int) – The current stop number. Must be a value between 0 and len(HEATMAP_COLORS) - 1, i.e. [0, n).
Returns: Where the ith color stop should occur. Will always be a value between 0 and 1.
Return type: float
Raises: ValueError – When i isn’t between 0 and len(HEATMAP_COLORS) - 1.
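The formula above can be checked directly in Python. This sketch re-declares the HEATMAP_COLORS and STOP_FACTOR constants documented above (the real stop() lives in google.plot); note that the exponential maps stop 0 to exactly 0 and the last stop to exactly 1:

```python
HEATMAP_COLORS = ('#e7f0fa', '#c9e2f6', '#95cbee', '#0099dc', '#4ab04a',
                  '#ffd73e', '#eec73a', '#e29421', '#e29421', '#f05336',
                  '#ce472e')
STOP_FACTOR = 80

def stop(i):
    """Return the i-th gradient stop, a float in [0, 1]."""
    n = len(HEATMAP_COLORS)
    if not 0 <= i < n:
        raise ValueError('i must be in [0, %d)' % n)
    m = STOP_FACTOR
    # (m^(i/(n-1)) - 1) / (m - 1), per the formula above
    return (m ** (i / (n - 1)) - 1) / (m - 1)
```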
MIME Type Info¶
As specified in the Google Drive API documentation, G Suite and Google Drive use MIME types specific to those services, as follows:
MIME Type | Description |
---|---|
application/vnd.google-apps.audio | |
application/vnd.google-apps.document | Google Docs |
application/vnd.google-apps.drawing | Google Drawing |
application/vnd.google-apps.file | Google Drive file |
application/vnd.google-apps.folder | Google Drive folder |
application/vnd.google-apps.form | Google Forms |
application/vnd.google-apps.fusiontable | Google Fusion Tables |
application/vnd.google-apps.map | Google My Maps |
application/vnd.google-apps.photo | |
application/vnd.google-apps.presentation | Google Slides |
application/vnd.google-apps.script | Google Apps Script |
application/vnd.google-apps.sites | Google Sites |
application/vnd.google-apps.spreadsheet | Google Sheets |
application/vnd.google-apps.unknown | |
application/vnd.google-apps.video | |
application/vnd.google-apps.drive-sdk | 3rd party shortcut |
In addition to the above MIME types, Google Doc formats can be exported as the following MIME types, as described in the Drive documentation:
Google Doc Format | Conversion Format | Corresponding MIME type |
---|---|---|
Documents | HTML | text/html |
 | HTML (zipped) | application/zip |
 | Plain text | text/plain |
 | Rich text | application/rtf |
 | Open Office doc | application/vnd.oasis.opendocument.text |
 | PDF | application/pdf |
 | MS Word document | application/vnd.openxmlformats-officedocument.wordprocessingml.document |
 | EPUB | application/epub+zip |
Spreadsheets | MS Excel | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet |
 | Open Office sheet | application/x-vnd.oasis.opendocument.spreadsheet |
 | PDF | application/pdf |
 | CSV (1st sheet only) | text/csv |
 | TSV (1st sheet only) | text/tab-separated-values |
 | HTML (zipped) | application/zip |
Drawings | JPEG | image/jpeg |
 | PNG | image/png |
 | SVG | image/svg+xml |
 | PDF | application/pdf |
Presentations | MS PowerPoint | application/vnd.openxmlformats-officedocument.presentationml.presentation |
 | Open Office presentation | application/vnd.oasis.opendocument.presentation |
 | PDF | application/pdf |
 | Plain text | text/plain |
Apps Scripts | JSON | application/vnd.google-apps.script+json |
Authors¶
- Daniel Caruso II – Creator
common
: Modules Used Throughout dbling¶
centroid
: Representation of a Centroid¶
clr
: Color Text Easily¶
Color text.
Typical usage:
>>> red('red text', False)
Returns the string “red text” where the text will be red and the background will be the default.
>>> red('red background')
Returns the string “red background” where the text will be the default color and the background will be red.
-
common.clr.
add_color_log_levels
(center=False)[source]¶ Alter log level names to be colored.
Levels are colored to have black text and a background colored as follows:
- Level 50 (Critical): red
- Level 40 (Error): magenta
- Level 30 (Warning): yellow
- Level 20 (Info): blue
- Level 10 (Debug): green
- Level 0 (Not Set): white
Parameters: center (bool) – If the log text should be centered. When set to True, the text will be centered to the width of "CRITICAL", which is 8 characters. This makes it so the level in the log output always takes up the same number of characters.
Return type: None
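A minimal sketch of this technique follows. The ANSI background codes below are my own approximations of the color scheme listed above, not the values used in common.clr, and the function assumes it is called only once (calling it twice would wrap the names a second time):

```python
import logging

# Assumed ANSI background codes approximating the documented scheme:
# red, magenta, yellow, blue, green, white
_LEVEL_BG = {50: 41, 40: 45, 30: 43, 20: 44, 10: 42, 0: 47}

def add_color_log_levels(center=False):
    """Re-register each level name wrapped in ANSI color codes."""
    for level, bg in _LEVEL_BG.items():
        name = logging.getLevelName(level)
        if center:
            name = name.center(len('CRITICAL'))  # pad to 8 characters
        # Black text (30) on the colored background
        logging.addLevelName(level, '\x1b[30;%dm%s\x1b[0m' % (bg, name))
```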
const
: Constant Values¶
Constant values used by dbling.
-
common.const.
IN_PAT_VAULT
= re.compile('^/?home/\\.shadow/[0-9a-z]*?/vault/user/')¶ Regular expression pattern for including only the user’s files
-
common.const.
ENC_PAT
= re.compile('/ECRYPTFS_FNEK_ENCRYPTED\\.([^/]*)$')¶ Regular expression pattern for identifying encrypted files
-
common.const.
SLICE_PAT
= re.compile('.*(/home.*)')¶
-
common.const.
CRX_URL
= 'https://chrome.google.com/webstore/detail/%s'¶ URL used for downloading CRXs
-
common.const.
ISO_TIME
= '%Y-%m-%dT%H:%M:%SZ'¶ ISO format for date time values
-
common.const.
DENTRY_FIELD_BYTES
= 8¶ Number of bytes used by the dir entry fields (preceding the filename)
-
class
common.const.
FType
[source]¶ File types as stored in directory entries in ext2, ext3, and ext4.
-
common.const.
MODE_UNIX
= {32768: 1, 16384: 2, 24576: 4, 40960: 7, 4096: 5, 8192: 3, 49152: 6}¶ Maps the octal values that stat returns from stat.S_IFMT to one of the regular Unix file types
-
common.const.
TYPE_TO_NAME
= {0: '-', 1: 'r', 2: 'd', 3: 'c', 4: 'b', 5: 'p', 6: 's', 7: 'l'}¶ Maps Unix file type numbers to the character used in DFXML to represent that file type
Other file types defined in DFXML schema
- h - Shadow inode (Solaris)
- w - Whiteout (OpenBSD)
- v - Special (Used in The SleuthKit for added “Virtual” files, e.g. $FAT1)
-
class
common.const.
ModeTypeDT
[source]¶ File types as stored in the file’s mode.
In Linux, fs.h defines these values and stores them in bits 12-15 of stat.st_mode, e.g. (i_mode >> 12) & 15. In fs.h, the names are prefixed with DT_, hence the name of this enum class. Here are the original definitions:

#define DT_UNKNOWN      0
#define DT_FIFO         1
#define DT_CHR          2
#define DT_DIR          4
#define DT_BLK          6
#define DT_REG          8
#define DT_LNK          10
#define DT_SOCK         12
#define DT_WHT          14
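The DT_* definitions above translate naturally to a Python IntEnum. This is a sketch of the idea; the mode_type() helper is hypothetical, illustrating the documented bit extraction (i_mode >> 12) & 15:

```python
from enum import IntEnum

class ModeTypeDT(IntEnum):
    """DT_* values from Linux's fs.h, as listed above."""
    UNKNOWN = 0
    FIFO = 1
    CHR = 2
    DIR = 4
    BLK = 6
    REG = 8
    LNK = 10
    SOCK = 12
    WHT = 14

def mode_type(i_mode):
    # The file type lives in bits 12-15 of st_mode
    return ModeTypeDT((i_mode >> 12) & 15)
```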
-
common.const.
ECRYPTFS_SIZE_THRESHOLDS
= (84, 104, 124, 148, 168, 188, 212, 232, 252, -inf)¶ The indices of these correspond to i such that 16*i is the lower bound and (16*(i+1))-1 is the upper bound for file name lengths that correspond to this value. Anything 16*9=144 or longer is invalid.
-
common.const.
ECRYPTFS_FILE_HEADER_BYTES
= 8192¶ Number of bytes used by eCryptfs for its header
-
common.const.
USED_FIELDS
= ('_c_num_child_dirs', '_c_num_child_files', '_c_mode', '_c_depth', '_c_type')¶ Fields used to calculate centroids
-
common.const.
USED_TO_DB
= {'_c_num_child_files': 'num_files', '_c_depth': 'depth', '_c_size': 'size', '_c_ctime': 'ctime', '_c_type': 'type', '_c_mode': 'perms', '_c_num_child_dirs': 'num_dirs'}¶ Mapping of USED_FIELDS to database column names. USED_TO_DB doesn’t have the ttl_files field because it’s not explicitly stored in the graph object.
graph
: Customized Digraph Object¶
sync
: Easy Mutex Creation¶
Context manager for easily using a pymemcache mutex.
The acquire_lock context manager makes it easy to use pymemcache (which uses memcached) to create a mutex for a certain portion of code. Of course, this requires the pymemcache library to be installed, which in turn requires memcached to be installed.
- common.sync.LockUnavailable – Raised when a cached lock is already in use.
-
common.sync.
acquire_lock
(lock_id, wait=0, max_retries=0)[source]¶ Acquire a lock on the given lock ID, or raise an exception.
This context manager can be used as a mutex by doing something like the following:
>>> from time import sleep
>>> job_done = False
>>> while not job_done:
...     try:
...         with acquire_lock('some id'):
...             sensitive_function()
...         job_done = True
...     except LockUnavailable:
...         # Sleep for a couple seconds while the other code runs and
...         # hopefully completes
...         sleep(2)
In the above example, sensitive_function() should only be run if no other code is also running it. A more concise way of writing the above example would be to use the other parameters, like this:

>>> with acquire_lock('some id', wait=2):
...     sensitive_function()
Parameters:
- lock_id (str or bytes) – The ID for this lock. See pymemcache’s documentation on key constraints for more info.
- wait (int) – Indicates how many seconds after failing to acquire the lock to wait (sleep) before retrying. When set to 0 (default), will immediately raise a LockUnavailable exception.
- max_retries (int) – Maximum number of times to retry to acquire the lock before raising a LockUnavailable exception. When set to 0 (default), will always retry. Has essentially no effect if wait is 0.

Raises: LockUnavailable – when a lock with the same ID already exists and wait is set to 0.
util
: Various Utilities for dbling¶
-
common.util.
validate_crx_id
(crx_id)[source]¶ Validate the given CRX ID.
Check that the Chrome extension ID has three important properties:

- It must be a string
- It must have alpha characters only (strictly speaking, these should be lowercase and only from a-p, but checking for this is a little overboard)
- It must be 32 characters long

Parameters: crx_id (str) – The ID to validate.
Raises: MalformedExtID – When the ID doesn’t meet the criteria listed above.
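The three criteria above translate directly into code. A minimal sketch (the MalformedExtID exception is re-declared here for self-containment):

```python
import re

class MalformedExtID(Exception):
    pass

def validate_crx_id(crx_id):
    # 32 alphabetic characters; Chrome itself only uses a-p, but per
    # the docs above any letters are accepted.
    if not (isinstance(crx_id, str)
            and len(crx_id) == 32
            and re.fullmatch(r'[A-Za-z]+', crx_id)):
        raise MalformedExtID(repr(crx_id))
```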
-
common.util.
get_crx_version
(crx_path)[source]¶ Extract and return the version number from the CRX’s path.
The return value from the download() function is in the form: <extension ID>_<version>.crx.

The <version> part of that format is “x_y_z” for version “x.y.z”. To convert to the latter, we need to 1) get the basename of the path, 2) take off the trailing “.crx”, 3) remove the extension ID and ‘_’ after it, and 4) replace all occurrences of ‘_’ with ‘.’.

Parameters: crx_path (str) – The full path to the downloaded CRX, as returned by the download() function.
Returns: The version number in the form “x.y.z”.
Return type: str
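The four steps above can be sketched as:

```python
from os import path

def get_crx_version(crx_path):
    # 1) basename of the path
    name = path.basename(crx_path)
    # 2) take off the trailing '.crx'
    name = name[:-len('.crx')]
    # 3) remove the extension ID and the '_' after it
    version = name.split('_', 1)[1]
    # 4) replace remaining '_' with '.'
    return version.replace('_', '.')
```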
-
common.util.
get_id_version
(crx_path)[source]¶ From the path to a CRX, extract and return the ID and version as strings.
Parameters: crx_path (str) – The full path to the downloaded CRX.
Returns: The ID and version number as a tuple: (id, num)
Return type: tuple(str, str)
-
common.util.
separate_mode_type
(mode)[source]¶ Separate out the values for the mode (permissions) and the file type from the given mode.
Both returned values are integers. The mode is just the permissions (usually displayed in the octal format), and the type corresponds to the standard VFS types:
- 0: Unknown file
- 1: Regular file
- 2: Directory
- 3: Character device
- 4: Block device
- 5: Named pipe (identified by the Python stat library as a FIFO)
- 6: Socket
- 7: Symbolic link
Parameters: mode (int) – The mode value to be separated.
Returns: Tuple of ints in the form: (mode, type)
Return type: tuple(int, int)
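A sketch of this separation using the standard stat module; the numbering matches the VFS type list above:

```python
import stat

# Same VFS type numbering as the list above
_TYPE = {stat.S_IFREG: 1, stat.S_IFDIR: 2, stat.S_IFCHR: 3,
         stat.S_IFBLK: 4, stat.S_IFIFO: 5, stat.S_IFSOCK: 6,
         stat.S_IFLNK: 7}

def separate_mode_type(mode):
    perms = stat.S_IMODE(mode)               # permission bits only
    ftype = _TYPE.get(stat.S_IFMT(mode), 0)  # 0 = unknown file
    return perms, ftype
```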
-
common.util.
calc_chrome_version
(last_version, release_date, release_period=10)[source]¶ Calculate the most likely version number of Chrome.
The calculation is based on the last known version number and its release date, based on the number of weeks (release_period) it usually takes to release the next major version. A list of releases and their dates is available on Wikipedia.
Parameters: - last_version (str) – Last known version number, e.g. “43.0”. Should only have the major and minor version numbers and exclude the build and patch numbers.
- release_date (list) – Release date of the last known version number. Must be a list of three integers: [YYYY, MM, DD].
- release_period (int) – Typical number of weeks between releases.
Returns: The most likely current version number of Chrome in the same format required of the last_version parameter.
Return type: str
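The calculation described above amounts to counting how many release periods have elapsed since the last known release. This sketch adds a `today` parameter (not in the documented signature) so the result is reproducible:

```python
from datetime import date

def calc_chrome_version(last_version, release_date, release_period=10,
                        today=None):
    """Estimate the current Chrome major version.

    `today` is an addition for testability; the real function presumably
    uses the current date.
    """
    major, minor = last_version.split('.')
    today = today or date.today()
    elapsed = (today - date(*release_date)).days
    # One major version bump per release_period weeks
    bumps = elapsed // (release_period * 7)
    return '{}.{}'.format(int(major) + bumps, minor)
```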
-
common.util.
make_download_headers
()[source]¶ Return a dict of headers to use when downloading a CRX.
Returns: Set of HTTP headers as a dict, where the key is the header type and the value is the header content.
Return type: dict[str, str]
-
common.util.
dt_dict_now
()[source]¶ Return a dict of the current time.
Returns: A dict with the following keys: year, month, day, hour, minute, second, microsecond
Return type: dict[str, int]
-
common.util.
dict_to_dt
(dt_dict)[source]¶ Reverse of dt_dict_now().
Parameters: dt_dict (dict) – A dict (such as dt_dict_now() returns) whose keys correspond with the keyword parameters of the datetime constructor.
Returns: A datetime object.
Return type: datetime.datetime
-
class
common.util.
MunchyMunch
(f)[source]¶ Wrapper class to munchify crx_obj parameters.
This wrapper converts either the kwarg crx_obj or the first positional argument (tested in that order) to a Munch object, which allows us to refer to keys in the Munch dictionary as if they were attributes. See the docs on the munch library for more information.
Example usage:

>>> @MunchyMunch
... def test_func(crx_obj):
...     # crx_obj will be converted to a Munch
...     print(crx_obj.id)
Parameters: f – The function to wrap.
-
common.util.
byte_len
(s)[source]¶ Return the length of s in number of bytes.
Parameters: s (str or bytes) – The str or bytes object to test.
Returns: The length of s in bytes.
Return type: int
Raises: TypeError – If s is not a str or bytes.
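A sketch of byte_len; encoding as UTF-8 is an assumption, but it is what makes the byte length differ from the character count for non-ASCII text:

```python
def byte_len(s):
    if isinstance(s, bytes):
        return len(s)
    if isinstance(s, str):
        # UTF-8 length differs from character count for non-ASCII text
        return len(s.encode('utf-8'))
    raise TypeError('expected str or bytes, got %s' % type(s).__name__)
```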
-
common.util.
ttl_files_in_dir
(dir_path, pat='.')[source]¶ Count the files in the given directory.
Will count all files except . and .., including any files whose names begin with . (using the -A option of ls).
Parameters:
- dir_path (str) – Path of the directory whose files should be counted.
- pat (str) – Pattern the file names must match to be counted.
Returns: The number of files in the directory.
Return type: int
Raises: NotADirectoryError – When dir_path is not a directory.
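The same behavior can be sketched without shelling out: os.listdir() already omits . and .. but keeps dotfiles, mirroring ls -A. Treating pat as a regex filter is an assumption of this sketch:

```python
import os
import re

def ttl_files_in_dir(dir_path, pat='.'):
    """Count entries in dir_path whose names match the regex `pat`.

    Sketch only: the real function is documented as using `ls -A`;
    os.listdir() gives the same view of the directory.
    """
    if not os.path.isdir(dir_path):
        raise NotADirectoryError(dir_path)
    return sum(1 for name in os.listdir(dir_path) if re.search(pat, name))
```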
-
common.util.
chunkify
(iterable, chunk_size)[source]¶ Split an iterable into smaller iterables of a certain size (chunk size).
For example, say you have a list that, for whatever reason, you don’t want to process all at once. You can use chunkify() to easily split up the list to whatever size of chunk you want. Here’s an example of what this might look like:

>>> my_list = range(1, 6)
>>> for sublist in chunkify(my_list, 2):
...     for i in sublist:
...         print(i, end=', ')
...     print()

The output of the above code would be:

1, 2,
3, 4,
5,
Idea borrowed from http://code.activestate.com/recipes/303279-getting-items-in-batches/.
Parameters: - iterable – The iterable to be split into chunks.
- chunk_size (int) – Size of each chunk. See above for an example.
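The batching recipe referenced above can be sketched with itertools.islice, yielding lists of at most chunk_size items until the iterable is exhausted:

```python
from itertools import islice

def chunkify(iterable, chunk_size):
    """Yield successive chunks (lists) of up to chunk_size items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```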
Secret Files¶
The secret directory is used to store sensitive information specific to an installation of dbling. The files in this directory have been excluded from the repository for obvious reasons, but should include a creds.py file. It should have a form such as the one displayed below.
import yaml
from os import uname
from os.path import join, dirname
with open(join(dirname(__file__), 'passes.yml')) as passes:
    passwd = yaml.safe_load(passes)  # safe_load avoids arbitrary object construction
crx_save_path = '' # Path where the CRXs should be saved when downloaded
db_info = {  # Database access information
    'uri': '',  # Full URI for accessing the DB. See SQLAlchemy docs for more info.
    'user': '',
    'pass': '',
    'nodes': ['host1', ],  # Host names of machines that should use 127.0.0.1 instead of the value of full_url below
    'full_url': '1.2.3.4',  # IP address of the host with the database (usually the dbling master)
}
# Login info for workers to access the celery server on the dbling master
celery_login = {'user': 'sample_username', 'pass': 'secure_password', 'port': 5672}
admin_emails = (  # Names and email addresses of admins that should receive emails from Celery
    ('Admin Name', 'admin_email@example.com'),
)
sender_email_addr = 'ubuntu@{}'.format(uname().nodename) # Email address Celery should use when sending admin emails
The template above references another file that should be in the secret
directory, passes.yml
. This should have
a form as shown below. Without this file, the Ansible playbooks will not function properly.
---
mysql_rt_pass: '' # MySQL root user password
mysql_dbling_user: 'dbling_dbusr' # MySQL regular user name
mysql_dbling_pass: '' # MySQL regular user password
rabbit_user: 'dbling_crawler' # RabbitMQ user name
rabbit_pass: '' # RabbitMQ user password