Module classical_atlas.pleiades_wrangler

Classical Atlas: A Python Package for Open-Access Geospatial Datasets about the Ancient World. Developed by Annie K. Lamar (Stanford University | kalamar@stanford.edu)

This module contains the methods you will use most often. With these methods, you can parse all Pleiades data, create Pleiad objects representing Pleiades Places, add Places to a networkX graph, add edges representing connections between places, and even add node attributes from other datasets (e.g. ToposText) to the graph.

To quickly obtain a graph of all Pleiades places and connections, use the method get_pleiades_network_shortcut(). You can also call the methods individually if you want to change the default settings. To add data from ToposText, create a graph of Pleiad objects and use add_topos_text_data_to_network(), passing the graph as a parameter.

"""
Classical Atlas : A Python Package for Open-Access Geospatial Datasets about the Ancient World
Developed by Annie K. Lamar (Stanford University | kalamar@stanford.edu)

This module contains the methods you will use most often. With these methods, you can parse all Pleiades data,
create Pleiad objects representing Pleiades Places, add Places to a networkX graph, add edges representing
connections between places, and even add node attributes from other datasets (e.g. ToposText) to the graph.

To quickly obtain a graph of all Pleiades places and connections, use the method **get_pleiades_network_shortcut**. You
can also use the methods individually if you want to change the default settings. To add data from ToposText,
create a graph of Pleiad objects and use **add_topos_text_data_to_network()**, passing the graph as a parameter.
"""
from collections import defaultdict
import networkx as nx
import downloaders
from pleiad import Pleiad
import topos_wrangler


def make_pleiades_objects(download_latest_data=False):
    """
    Parses Pleiades data and returns a list of Pleiad objects. Each Pleiad object represents a single Pleiades place.
    With default settings, this method will use the JSON file included in the release of this Python package. If you
    want to make sure you are using the most recent Pleiades JSON file, set the parameter **download_latest_data=True**.
    Note that downloading recent data will take quite a while.
    """
    if download_latest_data:
        print("Warning: Pleiades data files are large and may require considerable time to download.")
        input("Press Enter to continue...")
        data = downloaders.get_pleiades_data(
            "http://atlantides.org/downloads/pleiades/json/pleiades-places-latest.json.gz")
    else:
        raw_data_file = downloaders.unzip_gz("data/pleiades-places-latest.json.gz")
        data = downloaders.get_df(raw_data_file)
    data = data['@graph']
    pleiades = []
    for row in range(len(data)):
        pleiades.append(Pleiad(data[row]))
    return pleiades


def find_keyword(list_of_pleiads, keyword, print_results=False):
    """Search through text fields of Pleiad objects, locations, and names for a specified keyword.

    Parameters
    ----------
    list_of_pleiads : list
        list of pleiad objects to search through
    keyword : string
        keyword to search for
    print_results : boolean, default=False
        if True, print results before returning

    Returns
    -------
    dictionary
        dictionary of lists; contains pleiad objects, locations, and names relevant to keyword
    """
    found_list = defaultdict(list)
    for pleiad in list_of_pleiads:
        if (pleiad.description and keyword in pleiad.description) \
                or (pleiad.details and keyword in pleiad.details) \
                or (pleiad.title and keyword in pleiad.title):
            found_list['pleiad'].append(pleiad)
        for location in pleiad.locations.keys():
            if (location.title and keyword in location.title) \
                    or (location.location_description and keyword in location.location_description) \
                    or (location.location_details and keyword in location.location_details):
                found_list['location'].append(location)
        for name in pleiad.names.keys():
            if (name.romanized_name and keyword in name.romanized_name) \
                    or (name.description and keyword in name.description) \
                    or (name.language and keyword in name.language):
                found_list['name'].append(name)
    if print_results:
        print("Pleiades relevant to keyword " + keyword + ": ")
        for pl in found_list['pleiad']:
            print("     " + pl.title)
        print("Locations relevant to keyword " + keyword + ": ")
        for loc in found_list['location']:
            print("     " + loc.title)
        print("Names relevant to keyword " + keyword + ": ")
        for name in found_list['name']:
            print("     " + name.romanized_name)
    return found_list


def get_pleiades_as_nodes(list_of_pleiades):
    """
    Add nodes representing places to a graph. Each node is a Pleiades "Place" and has linked Locations and Names.

    Parameters
    ----------
    list_of_pleiades : list
        a list of Pleiad objects

    Returns
    -------
    Graph
        a NetworkX Graph with nodes representing places
    """
    G = nx.Graph()
    for pl in list_of_pleiades:
        G.add_node(pl)
    return G


def add_connections_as_edges(graph):
    """
    Adds edges between places with connections. These connections are taken from the Pleiades metadata. Connection
    type is also included as an edge attribute. More information about connections is available here:
    https://pleiades.stoa.org/help/what-are-connections.

    Parameters
    ----------
    graph : Graph
        a networkX Graph with defined nodes

    Returns
    -------
    Graph
        a networkX Graph with added edges
    """
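    # Index the nodes by Pleiades ID so each connection is resolved with one lookup.
    nodes_by_id = {node.id: node for node in graph.nodes}
    for pl in list(nodes_by_id.values()):
        for connection, details in pl.connections.items():
            match = nodes_by_id.get(connection)
            # Skip connections whose target place is not among the graph's nodes.
            if match is not None:
                graph.add_edge(pl, match, connection_type=details[0])
    return graph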


def get_pleiades_network_shortcut():
    """
    Get a networkX Graph representing the entire Pleiades dataset. This method is a shortcut to parse all Pleiades
    data, create Pleiad objects representing every Pleiades place with linked Locations and Names, add those objects
    to a Graph as nodes, and add edges that represent connections (and their attributes) between Places. This method
    takes no arguments and returns a networkX Graph object.

    Returns
    -------
    graph
        a NetworkX graph with nodes representing Pleiades places and edges representing connections between places
    """
    pleiades = make_pleiades_objects()
    gr = get_pleiades_as_nodes(pleiades)
    gr = add_connections_as_edges(gr)
    return gr


def add_topos_text_data_to_network(graph):
    """
    Add ToposText data to the network. This method parses all ToposText data, matches it to Pleiades IDs when
    possible, and adds a list of textual references as a node attribute. The graph that you pass to this method
    should be a networkX Graph object created with the methods found in the pleiades_wrangler module. Edges are not
    required and do not affect the result if present. The returned Graph is the same except that nodes have one
    added attribute: a list of texts that reference that particular place.

    Parameters
    ----------
    graph : Graph
        a graph with nodes representing Pleiades objects

    Returns
    -------
    Graph
        a Graph with added node attributes of list of textual references
    """
    df = topos_wrangler.get_topos_data()
    topos_refs = topos_wrangler.switch_to_pleiades_ids(df, topos_wrangler.parse_topos_place_refs())
    topos_refs = topos_wrangler.swap_key_value_pairs(topos_refs)
    # Annotate the nodes of the graph that was passed in, so existing nodes and edges are kept.
    for node in graph.nodes:
        if node.id in topos_refs:
            graph.nodes[node]['textual_refs'] = topos_refs[node.id]
    return graph

Functions

def add_connections_as_edges(graph)

Adds edges between places with connections. These connections are taken from the Pleiades metadata. Connection type is also included as an edge attribute. More information about connections is available here: https://pleiades.stoa.org/help/what-are-connections.

Parameters

graph : Graph
a networkX Graph with defined nodes

Returns

Graph
a networkX Graph with added edges
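
The matching step can be sketched without NetworkX or the Pleiades data. The example below is a minimal illustration, not the package's implementation: `Place` is a hypothetical stand-in for a Pleiad object, the IDs and connection labels are invented, and it assumes (as the source code suggests) that `connections` maps a connected place's ID to a list whose first item is the connection type. It indexes places by ID and skips connections whose target is not in the collection.

```python
from dataclasses import dataclass, field


@dataclass
class Place:
    """Hypothetical stand-in for a Pleiad object (illustration only)."""
    id: str
    # Assumed shape: target place ID -> [connection type, ...]
    connections: dict = field(default_factory=dict)


def connection_edges(places):
    """Return (source id, target id, connection type) triples for resolvable connections."""
    by_id = {place.id: place for place in places}  # one lookup table instead of a nested scan
    edges = []
    for place in places:
        for target_id, info in place.connections.items():
            if target_id in by_id:  # ignore targets missing from the collection
                edges.append((place.id, target_id, info[0]))
    return edges


places = [
    Place("1", {"2": ["type_a"], "99": ["type_b"]}),  # "99" is not in the list
    Place("2"),
]
print(connection_edges(places))
```

With a NetworkX graph, each resolved pair could then be passed to `graph.add_edge(source, target, connection_type=...)`.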
def add_topos_text_data_to_network(graph)

Add ToposText data to the network. This method parses all ToposText data, matches it to Pleiades IDs when possible, and adds a list of textual references as a node attribute. The graph that you pass to this method should be a networkX Graph object created with the methods found in the pleiades_wrangler module. Edges are not required and do not affect the result if present. The returned Graph is the same except that nodes have one added attribute: a list of texts that reference that particular place.

Parameters

graph : Graph
a graph with nodes representing Pleiades objects

Returns

Graph
a Graph with added node attributes of list of textual references
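
As a rough sketch of the matching described above, the following example attaches lists of references to place records by ID. It uses plain dictionaries rather than NetworkX and ToposText data; the IDs, titles, and reference strings are invented, and `attach_refs` is a hypothetical helper, not part of the package.

```python
def attach_refs(node_attrs, refs_by_id):
    """Add a 'textual_refs' entry to each node whose ID has references.

    node_attrs : dict mapping place ID -> attribute dict (stand-in for graph nodes)
    refs_by_id : dict mapping place ID -> list of textual references
    """
    for place_id, refs in refs_by_id.items():
        if place_id in node_attrs:  # references without a matching place are skipped
            node_attrs[place_id]["textual_refs"] = list(refs)
    return node_attrs


nodes = {"100": {"title": "Place A"}, "200": {"title": "Place B"}}
refs = {"100": ["Text 1, 2.3", "Text 2, 4.5"], "300": ["Text 3, 1.1"]}
attach_refs(nodes, refs)
print(nodes["100"]["textual_refs"])
print("textual_refs" in nodes["200"])
```

On a NetworkX graph, the equivalent assignment is `graph.nodes[node]["textual_refs"] = refs`, which leaves the existing nodes and edges in place.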
def find_keyword(list_of_pleiads, keyword, print_results=False)

Search through text fields of Pleiad objects, locations, and names for a specified keyword.

Parameters

list_of_pleiads : list
list of pleiad objects to search through
keyword : string
keyword to search for
print_results : boolean, default=False
if True, print results before returning

Returns

dictionary
dictionary of lists; contains pleiad objects, locations, and names relevant to keyword
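
The check applied to each text field is a case-sensitive substring test that skips empty fields. Here is a minimal sketch of that test, using `SimpleNamespace` objects as hypothetical stand-ins for Pleiad objects; the titles and descriptions are invented, and `field_matches` and `matching_titles` are illustrative helpers rather than package functions.

```python
from types import SimpleNamespace


def field_matches(text, keyword):
    """True when the field is non-empty and contains the keyword (case-sensitive)."""
    return bool(text) and keyword in text


def matching_titles(places, keyword):
    """Titles of places whose title, description, or details mention the keyword."""
    return [
        p.title
        for p in places
        if any(field_matches(t, keyword) for t in (p.title, p.description, p.details))
    ]


places = [
    SimpleNamespace(title="Harbor Town", description="A coastal port", details=None),
    SimpleNamespace(title="Hill Fort", description=None, details="inland site"),
]
print(matching_titles(places, "port"))  # matches the first description
print(matching_titles(places, "Port"))  # no match: the test is case-sensitive
```

Because the comparison is case-sensitive, a search for "Port" does not find "port"; lowercasing both the field and the keyword before comparing would be one way to relax that.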
def get_pleiades_as_nodes(list_of_pleiades)

Add nodes representing places to a graph. Each node is a Pleiades "Place" and has linked Locations and Names.

Parameters

list_of_pleiades : list
a list of Pleiad objects

Returns

Graph
a NetworkX Graph with nodes representing places
def get_pleiades_network_shortcut()

Get a networkX Graph representing the entire Pleiades dataset. This method is a shortcut to parse all Pleiades data, create Pleiad objects representing every Pleiades place with linked Locations and Names, add those objects to a Graph as nodes, and add edges that represent connections (and their attributes) between Places. This method takes no arguments and returns a networkX Graph object.

Returns

graph
a NetworkX graph with nodes representing Pleiades places and edges representing connections between places
def make_pleiades_objects(download_latest_data=False)

Parses Pleiades data and returns a list of Pleiad objects. Each Pleiad object represents a single Pleiades place. With default settings, this method will use the JSON file included in the release of this Python package. If you want to make sure you are using the most recent Pleiades JSON file, set the parameter download_latest_data=True. Note that downloading recent data will take quite a while.
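
The source code reads a gzip-compressed JSON file and takes the place records from its `@graph` key. The following sketch shows that parsing step with the standard library only; `load_places` is a hypothetical helper, not the package's `downloaders` module, and the sample payload is invented.

```python
import gzip
import json


def load_places(gz_bytes):
    """Decompress gzip-compressed JSON and return the list stored under '@graph'."""
    document = json.loads(gzip.decompress(gz_bytes).decode("utf-8"))
    return document["@graph"]


# Build a tiny in-memory stand-in for the real data file.
sample = {"@graph": [{"id": "1", "title": "Place A"}, {"id": "2", "title": "Place B"}]}
payload = gzip.compress(json.dumps(sample).encode("utf-8"))

records = load_places(payload)
print(len(records), records[0]["title"])
```

In the package, each such record is passed to the `Pleiad` constructor to build the returned list.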
