Dr. Strangethread

How I stopped worrying and
learned 🐍 threading

@cmheisel / PyATL March 2017

Who am I?

Chris Heisel's site
Pindrop
Well there's your problem
github.com/cmheisel/agile-analytics
  • List of every ticket finished and how long it took to complete
  • Bugs created during the time period
  • 50th, 75th, 95th percentile of how long it took to finish things
  • More nerdy team improvement stats
  1. Do some startup stuff
  2. For each team
    1. Fetch the data
    2. Generate all the reports
    3. For each report
      1. Write the report to Google spreadsheet
  3. Do some cleanup stuff
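The job outlined above can be sketched as plain sequential Python. This is a hypothetical skeleton: `startup`, `fetch_data`, `generate_reports`, `write_report` and `cleanup` are placeholder stubs invented for illustration, not code from the agile-analytics repo.

```python
# Hypothetical, sequential sketch of the report job outlined above.
# Every function here is a placeholder stub, not agile-analytics code.


def startup():
    print("startup")


def fetch_data(team):
    # Stand-in for the slow API call that pulls a team's tickets
    return {"team": team, "tickets": []}


def generate_reports(data):
    # Stand-in for building every report from the fetched data
    return ["lead time", "bugs created", "percentiles"]


def write_report(team, report):
    # Stand-in for writing one report to a Google spreadsheet
    return (team, report)


def cleanup():
    print("cleanup")


def run(teams):
    written = []
    startup()
    for team in teams:
        data = fetch_data(team)
        for report in generate_reports(data):
            written.append(write_report(team, report))
    cleanup()
    return written


if __name__ == "__main__":
    print(run(["team-a", "team-b"]))
```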

Sequentially


import requests
from timeit import default_timer as timer

# The number in each slowyourload.net URL is presumably the delay before it responds
URLS = [
    "http://slowyourload.net/5/https://chrisheisel.com",
    "http://slowyourload.net/4/https://chrisheisel.com",
    "http://slowyourload.net/3/https://chrisheisel.com",
    "http://slowyourload.net/2/https://chrisheisel.com",
    "http://slowyourload.net/1/https://chrisheisel.com",
]


def get_url(url):
    print("GET {}".format(url))
    requests.get(url)
    print("\tDONE GET {}".format(url))


def main():
    print("Sequential ====================")
    start = timer()
    # One request at a time: each call blocks until its response arrives
    for url in URLS:
        get_url(url)
    end = timer()
    duration = (end - start)
    print("DONE in {} seconds".format(duration))


if __name__ == "__main__":
    main()
          

Sequential ====================
GET http://slowyourload.net/5/https://chrisheisel.com
  DONE GET http://slowyourload.net/5/https://chrisheisel.com
GET http://slowyourload.net/4/https://chrisheisel.com
  DONE GET http://slowyourload.net/4/https://chrisheisel.com
GET http://slowyourload.net/3/https://chrisheisel.com
  DONE GET http://slowyourload.net/3/https://chrisheisel.com
GET http://slowyourload.net/2/https://chrisheisel.com
  DONE GET http://slowyourload.net/2/https://chrisheisel.com
GET http://slowyourload.net/1/https://chrisheisel.com
  DONE GET http://slowyourload.net/1/https://chrisheisel.com
DONE in 23.727349369000876 seconds
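To play with the timing offline, `time.sleep` can stand in for the network. This sketch assumes each slowyourload.net URL stalls for roughly the number of seconds in its path (an inference from the URL, not verified here) and scales those delays down 100x. Run one after another, the waits simply add up: 5 + 4 + 3 + 2 + 1 = 15 units. The real run above took longer than 15 seconds, presumably because it also pays for actual HTTP round-trips.

```python
import time
from timeit import default_timer as timer

# Assumption: each slowyourload URL stalls for about N seconds.
# Here time.sleep fakes that wait, scaled down 100x.
DELAYS = [5, 4, 3, 2, 1]
SCALE = 0.01


def fake_get(delay):
    time.sleep(delay * SCALE)


def sequential(delays):
    start = timer()
    for delay in delays:
        fake_get(delay)  # blocks until this "request" is done
    return timer() - start


if __name__ == "__main__":
    elapsed = sequential(DELAYS)
    # The waits add up: 15 units in total, so roughly 0.15 s here
    print("sequential: {:.2f}s (sum of delays: {:.2f}s)".format(
        elapsed, sum(DELAYS) * SCALE))
```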
          
Sequential diagram
As fast as possible
  1. Do some startup stuff
  2. For each team
    1. Fetch the data
    2. Generate all the reports
    3. For each report
      1. Write the report to Google spreadsheet
  3. Do some cleanup stuff

We're gonna talk about...

Threading and parallelism

 
  • Don't care what order the API calls are made in
  • Don't have to take the output from any one API call and feed it into another
  • Do some things only after all API calls are finished
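Those three conditions are exactly what a thread pool's map-then-wait pattern expects. This sketch swaps in `concurrent.futures.ThreadPoolExecutor` from the standard library (not the approach the talk's code uses), with a hypothetical `fake_api_call` simulated by `time.sleep`: the calls are independent, results come back in input order no matter which finishes first, and the summing step only runs once every call is done.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_api_call(n):
    # Hypothetical independent API call: sleeps n * 10 ms, then returns a value
    time.sleep(0.01 * n)
    return n * 10


def fetch_all(inputs):
    # One worker per call (at least one, since max_workers must be > 0)
    with ThreadPoolExecutor(max_workers=max(1, len(inputs))) as pool:
        # map() yields results in input order, whatever order they finish in;
        # list() blocks until every call has returned
        results = list(pool.map(fake_api_call, inputs))
    # Only now, with all calls finished, do the "after" work
    return sum(results)


if __name__ == "__main__":
    print(fetch_all([5, 4, 3, 2, 1]))
```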

In parallel

With threads!


import threading


def main():
    print("Threaded naive ====================")
    start = timer()
    for url in URLS:
        # Each thread runs get_url(url); start() returns immediately
        t = threading.Thread(
            name="get_url - {}".format(url),
            target=get_url,
            args=(url, )
        )
        t.start()
    end = timer()
    duration = (end - start)
    print("DONE in {} seconds".format(duration))
          

Threaded naive ====================
GET http://slowyourload.net/5/https://chrisheisel.com
GET http://slowyourload.net/4/https://chrisheisel.com
GET http://slowyourload.net/3/https://chrisheisel.com
GET http://slowyourload.net/2/https://chrisheisel.com
GET http://slowyourload.net/1/https://chrisheisel.com
DONE in 0.001873285997135099 seconds
  DONE GET http://slowyourload.net/2/https://chrisheisel.com
  DONE GET http://slowyourload.net/5/https://chrisheisel.com
  DONE GET http://slowyourload.net/4/https://chrisheisel.com
  DONE GET http://slowyourload.net/1/https://chrisheisel.com
  DONE GET http://slowyourload.net/3/https://chrisheisel.com
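The naive version "finishes" in under 2 ms because `Thread.start()` only launches the thread and returns. The main thread keeps going, and the interpreter still waits for non-daemon threads before exiting, which is why the DONE GET lines show up after the timer output. A minimal sketch, with a hypothetical `slow_task`, checking that the thread is still alive right after `start()` returns:

```python
import threading
import time


def slow_task():
    # Hypothetical stand-in for a slow request
    time.sleep(0.2)


def start_without_join():
    t = threading.Thread(target=slow_task)
    t.start()  # launches the thread and returns right away
    # The main thread is already here while slow_task is still sleeping
    alive_after_start = t.is_alive()
    # The naive main() never does this; the demo joins to check the end state
    t.join()
    return alive_after_start, t.is_alive()


if __name__ == "__main__":
    print(start_without_join())
```

Under normal conditions `start_without_join()` returns `(True, False)`: alive after `start()`, finished after `join()`.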
          
Naive threaded diagram

def main():
    print("Threaded naive ====================")
    start = timer()
    for url in URLS:
        t = threading.Thread(
            name="get_url - {}".format(url),
            target=get_url,
            args=(url, )
        )
        t.start()

    # Oops: we fall straight through to the timing code below
    # without waiting for any of those threads to finish

    end = timer()
    duration = (end - start)
    print("DONE in {} seconds".format(duration))
          

def main():
    print("Threaded ====================")
    start = timer()
    my_threads = []
    for url in URLS:
        t = threading.Thread(
            name="get_url - {}".format(url),
            target=get_url,
            args=(url, )
        )
        t.start()
        # Keep a handle on every thread so we can wait for it later
        my_threads.append(t)

    # join() blocks until that thread finishes, so this loop
    # only exits once every request is done
    for t in my_threads:
        t.join()

    end = timer()
    duration = (end - start)
    print("DONE in {} seconds".format(duration))
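`get_url` throws its response away, and `Thread` has no built-in way to hand back a return value. One common workaround, sketched here with a hypothetical `fake_fetch`, is to have each thread write into a shared dict guarded by a `threading.Lock`, then read the dict only after every `join()`:

```python
import threading
import time


def fake_fetch(n):
    # Hypothetical stand-in for get_url that returns something useful
    time.sleep(0.01 * n)
    return "response {}".format(n)


def fetch_in_threads(inputs):
    results = {}
    lock = threading.Lock()

    def worker(n):
        value = fake_fetch(n)
        with lock:  # the lock guards the shared dict
            results[n] = value

    threads = [threading.Thread(target=worker, args=(n,)) for n in inputs]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for every worker before reading results
    return results


if __name__ == "__main__":
    print(fetch_in_threads([3, 1, 2]))
```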
          

Threaded ====================
GET http://slowyourload.net/5/https://chrisheisel.com
GET http://slowyourload.net/4/https://chrisheisel.com
GET http://slowyourload.net/3/https://chrisheisel.com
GET http://slowyourload.net/2/https://chrisheisel.com
GET http://slowyourload.net/1/https://chrisheisel.com
  DONE GET http://slowyourload.net/1/https://chrisheisel.com
  DONE GET http://slowyourload.net/2/https://chrisheisel.com
  DONE GET http://slowyourload.net/5/https://chrisheisel.com
  DONE GET http://slowyourload.net/3/https://chrisheisel.com
  DONE GET http://slowyourload.net/4/https://chrisheisel.com
DONE in 8.352946228995279 seconds
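The joined version's 8.35 seconds versus 23.7 sequential fits the usual rule of thumb for I/O-bound work: with one thread per request, wall-clock time is bounded by the slowest request rather than the sum of all of them. Blocking calls like `time.sleep` and socket reads release the GIL, so the threads really do wait side by side. A sketch reusing the same assumed, scaled-down delays:

```python
import threading
import time
from timeit import default_timer as timer

# Same assumption as before: delays in "seconds", scaled down 100x
DELAYS = [5, 4, 3, 2, 1]
SCALE = 0.01


def threaded(delays):
    # One thread per fake request; time.sleep releases the GIL while waiting
    threads = [threading.Thread(target=time.sleep, args=(d * SCALE,))
               for d in delays]
    start = timer()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return timer() - start


if __name__ == "__main__":
    elapsed = threaded(DELAYS)
    # Expect roughly max(DELAYS) * SCALE (0.05 s), not the 0.15 s sum
    print("threaded: {:.2f}s".format(elapsed))
```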
          
Fixed threaded diagram

Ohhhhh yeaaaaaa

Old busted: Threading

New hotness: asyncio


import asyncio
from timeit import default_timer as timer


async def main():
    print("Async ====================")
    loop = asyncio.get_running_loop()
    start = timer()
    futures = []
    for url in URLS:
        # None means "use the loop's default ThreadPoolExecutor",
        # so the blocking get_url calls still run in threads
        future = loop.run_in_executor(None, get_url, url)
        futures.append(future)

    # gather() waits until every future has completed
    await asyncio.gather(*futures)

    end = timer()
    duration = (end - start)
    print("DONE in {} seconds".format(duration))


if __name__ == "__main__":
    # asyncio.run (Python 3.7+) creates the event loop, runs main(), then closes it
    asyncio.run(main())
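On Python 3.9 and later, `asyncio.to_thread` wraps the same idea as `run_in_executor(None, ...)`: the blocking function runs in the loop's default thread pool and you get back something awaitable. A sketch with a hypothetical `blocking_get` in place of `get_url`:

```python
import asyncio
import time


def blocking_get(n):
    # Hypothetical blocking call standing in for get_url
    time.sleep(0.01 * n)
    return n


async def main(delays):
    # asyncio.to_thread (Python 3.9+) runs each call in the default thread
    # pool; gather() waits for all of them and collects their return values
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_get, n) for n in delays))


if __name__ == "__main__":
    print(asyncio.run(main([5, 4, 3, 2, 1])))
```

`gather()` returns results in the order the awaitables were passed in, not the order they finish.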
          

Async ====================
GET http://slowyourload.net/5/https://chrisheisel.com
GET http://slowyourload.net/4/https://chrisheisel.com
GET http://slowyourload.net/3/https://chrisheisel.com
GET http://slowyourload.net/2/https://chrisheisel.com
GET http://slowyourload.net/1/https://chrisheisel.com
  DONE GET http://slowyourload.net/3/https://chrisheisel.com
  DONE GET http://slowyourload.net/5/https://chrisheisel.com
  DONE GET http://slowyourload.net/4/https://chrisheisel.com
  DONE GET http://slowyourload.net/2/https://chrisheisel.com
  DONE GET http://slowyourload.net/1/https://chrisheisel.com
DONE in 8.631839034002041 seconds
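The asyncio version above still leans on threads, since `run_in_executor(None, ...)` hands the blocking `requests.get` calls to a thread pool. To skip threads entirely, each request itself has to be a coroutine that yields to the event loop while it waits; for real HTTP that means an async client such as aiohttp rather than `requests`. In this sketch `asyncio.sleep` stands in for that non-blocking wait, with the same assumed, scaled-down delays:

```python
import asyncio
from timeit import default_timer as timer

# Same assumption as before: delays in "seconds", scaled down 100x
DELAYS = [5, 4, 3, 2, 1]
SCALE = 0.01


async def fake_async_get(delay):
    # asyncio.sleep yields to the event loop instead of blocking a thread
    await asyncio.sleep(delay * SCALE)
    return delay


async def main():
    start = timer()
    # All five coroutines wait concurrently on a single thread
    results = await asyncio.gather(*(fake_async_get(d) for d in DELAYS))
    return results, timer() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, "{:.2f}s".format(elapsed))
```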
          

Ask more questions