newsroom-digital-blogs - Tumblr blog

newsroom-digital-blogs · 7 years ago

Nonprofit Explorer Update: Full Text of 1.9 Million Records

We have updated our Nonprofit Explorer news application, adding raw data from more than 1.9 million electronically filed Form 990 documents dating back to 2010. This new trove includes the full text of more than 132,000 forms for which we did not previously have complete data.

In addition to making the machine-readable XML files available to download, we are publishing the full text of many of these documents as human-readable web pages. These look similar to the PDFs that have appeared on Nonprofit Explorer in the past, but their text can be copied and pasted, and they are easier to browse and analyze.

You can find the XML and HTML of e-filed returns by clicking the buttons labeled “Full Text” and “Raw XML,” which appear on a nonprofit organization’s page under each year for which the data is available.

The release of the XML documents was made possible thanks to a 2015 lawsuit brought by Public.Resource.Org, a nonprofit organization that makes government documents available to the public. The suit compelled the IRS to fulfill Freedom of Information Act requests for electronically filed Form 990 documents in “Modernized e-File” XML format. The IRS began sharing the XML versions of e-filed forms as a public dataset in 2016.

For several years, Public.Resource.Org and its founder, Carl Malamud, have helped ProPublica acquire the page-image versions of Form 990 documents from the IRS. These documents make up the bulk of Nonprofit Explorer.

Malamud sees the release of XML data as a huge improvement.

“XML data is machine-processable,” Malamud wrote in an email to ProPublica. “You can instantly access the value of any specific field in a Form 990 (such as CEO compensation) from a computer program.”

To explain the advantage of XML over a page image, Malamud offered an analogy. The raw XML data is like a spreadsheet, from which you can extract data easily. A page image, by contrast, is as if “you make a printout of the spreadsheet, take a picture on your cellphone of the printout, and post the picture on Instagram.”

“Releasing the e-file data instead is vastly superior and will make the Form 990 a much more useful tool.”
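To make Malamud's point concrete, here is a minimal sketch of pulling one field, officer compensation, out of an e-filed return with Python's standard-library XML parser. The XML fragment is hypothetical and trimmed: the namespace `http://www.irs.gov/efile` is the one IRS e-file documents use, but the element names (`Form990PartVIISectionAGrp`, `PersonNm`, `ReportableCompFromOrgAmt`) are modeled on the post-2013 schema and are an assumption here, since names differ between schema years.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed e-file fragment. Element names are modeled on the
# post-2013 IRS Form 990 schema and vary by year; check real filings.
SAMPLE = """<Return xmlns="http://www.irs.gov/efile">
  <ReturnData>
    <IRS990>
      <Form990PartVIISectionAGrp>
        <PersonNm>Jane Doe</PersonNm>
        <TitleTxt>CEO</TitleTxt>
        <ReportableCompFromOrgAmt>250000</ReportableCompFromOrgAmt>
      </Form990PartVIISectionAGrp>
      <Form990PartVIISectionAGrp>
        <PersonNm>John Roe</PersonNm>
        <TitleTxt>CFO</TitleTxt>
        <ReportableCompFromOrgAmt>180000</ReportableCompFromOrgAmt>
      </Form990PartVIISectionAGrp>
    </IRS990>
  </ReturnData>
</Return>"""

# Map a short prefix to the default namespace so XPath-style lookups work.
NS = {"irs": "http://www.irs.gov/efile"}


def officer_compensation(xml_text: str) -> dict:
    """Return {person name: reported compensation} for each Part VII entry."""
    root = ET.fromstring(xml_text)
    result = {}
    for grp in root.iterfind(".//irs:Form990PartVIISectionAGrp", NS):
        name = grp.findtext("irs:PersonNm", default="", namespaces=NS)
        amount = grp.findtext("irs:ReportableCompFromOrgAmt", default="0", namespaces=NS)
        result[name] = int(amount)
    return result


print(officer_compensation(SAMPLE))
```

With a page image, the same number would first have to be recovered through OCR and layout guessing; with XML it is a single lookup by element name.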

While the XML files provide the most complete and useful data possible for e-filed Form 990 documents, they’re formatted for computer programs to understand, not humans. So the IRS provides stylesheets that a programmer can use to make the documents look more like the paper forms that make up a Form 990 tax return. We adapted open-source code based on those IRS stylesheets to make cosmetic transformations for Form 990 documents from 2013 and later.

Most nonprofits file their tax documents electronically. However, there are still thousands of nonprofit organizations that file them on paper. We will continue to provide PDF versions of these documents in order to make sure we’re providing information for as many nonprofit organizations as possible.

Our work on the XML-based data is just beginning. In the coming months, we will continue to improve Nonprofit Explorer and the Nonprofit Explorer API, providing users with new ways to explore and analyze tax-exempt organizations.

from The ProPublica Nerd Blog http://ift.tt/2uK6IgM via IFTTT

#IFTTT #The ProPublica Nerd Blog

newsroom-digital-blogs · 7 years ago

Development of the live blog at the Guardian | Developer blog

The live blog is one of the Guardian’s signature digital formats. We look at its history and influence on the tools we build

The Guardian has been at the forefront of developing live blogs, starting with blogging sport events in the late 1990s. Now, it provides live online coverage of a wide range of news stories and events.

As a software developer in the editorial tools team, I am interested in understanding how the live blog came into being, because it could help us think about how similar innovations could come about in the future. I’ve been talking to people working in editorial, product and engineering who were involved with different stages of developing the live blog.

from Developer blog | The Guardian http://ift.tt/2hJtHra via IFTTT

#IFTTT #Developer blog | The Guardian

newsroom-digital-blogs · 8 years ago

The New York Times at SRCCON

SRCCON is a conference that brings together developers, data scientists, designers and other people within the news industry to discuss and collaborate on thorny problems. This August, The Times is sending eight people to present sessions.

Drawn Together: Doodling Our Way Toward Stronger Collaboration

Why is it so hard for even the nerdiest among us to work across teams in a newsroom? In this session, we’ll create short comics that illustrate collaboration challenges and (possible) solutions to problems we know and love: failure to communicate early, how to tap the right partners and how to manage time effectively. We’ll show some examples and talk about the scenarios they describe, then guide the group exercise to storyboard our own solutions. Toward the end, we’ll share our work and discuss. No drawing skills required!

Tiff Fehr is an Assistant Editor of Interactive News, where she focuses on building live coverage software.

Tiff is co-leading this session with Becky Bowers from The Wall Street Journal and Darla Cameron from The Washington Post.

Introducing EXTRA: An open source project to classify news text using rules

In this session, we will talk about why news tagging is important and where it can be particularly useful, asking questions like “When is machine learning useful?” and “Where do we get taxonomies from?” Then we will present EXTRA (“EXTraction Rules Apparatus”), an open source project developed by IPTC with the support of Google DNI that allows news editors to precisely identify the categories a piece of news belongs to.

Katerina Iliakopoulou is a Software Engineer on the Personalization Team. She holds a dual master’s degree in Journalism and Computer Science from Columbia University, and she began interning with The Times in 2015. In July 2016, Katerina joined the Personalization Team, where she develops systems that help editors select content for nytimes.com more efficiently, based on user preferences and relevance. She is interested in all things machine learning and natural language processing.
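The idea behind rules-based tagging, as opposed to machine learning, is that an editor writes explicit conditions for each category. The toy classifier below illustrates that idea only; it is not EXTRA's rule language, and the `Rule` class, the categories and the phrases are all hypothetical. Exclusion phrases show why hand-written rules can be precise: "apple pie" should not land in a technology category.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical editor-written rule: match any_of, but none of none_of."""
    category: str
    any_of: tuple
    none_of: tuple = ()


def _contains(text: str, phrase: str) -> bool:
    # Whole-word/phrase match, case-insensitive.
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None


def classify(text: str, rules) -> list:
    """Return the sorted categories whose rules fire for this text."""
    return sorted({
        r.category
        for r in rules
        if any(_contains(text, p) for p in r.any_of)
        and not any(_contains(text, p) for p in r.none_of)
    })


RULES = [
    Rule("politics", ("congress", "senate", "election")),
    Rule("economy", ("inflation", "interest rates", "unemployment")),
    Rule("technology", ("apple", "smartphone"), ("apple pie", "apple orchard")),
]

print(classify("Congress debates unemployment benefits", RULES))
```

Because every decision traces back to a readable rule, an editor can see exactly why a story was tagged and fix a misfire by editing one line.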

Live Coverage for (and From) the Immediate Future

Liveblogs in 2016 were very different from live coverage in 2017. Breaking news formats have evolved. So have story pace and reader preferences. The New York Times has moved away from liveblogs towards a handful of new forms, each under active guidance/editing. The Guardian Mobile Lab is prototyping a future for live coverage that moves beyond the single pageview towards self-updating and sequential notifications, alongside “shifting lenses” to give different perspectives throughout a live event. But even the newest ideas continue to wrestle with being seen as a Product and all that entails. Let’s discuss the latest habits — for newsrooms, readership and notifications — and the future of live coverage tools.

Tiff Fehr is an Assistant Editor of Interactive News, where she focuses on building live coverage software.

Hamilton Boardman is a Senior Editor on the News Desk.

Tiff and Hamilton are co-leading this session with Alastair Coote from The Guardian.

Mentorship at Scale: How The NYT Women in Tech group built a mentorship program for 250+ people in our free time

Everyone agrees that mentorship is a good thing, but formal mentorship programs often fail because they are too time-consuming to run. The Times’ Women in Tech group decided to tackle this problem last year and created a mentorship program (for both women and men!) that was rolled out to the entire digital organization. We will share tips, discuss the lessons we learned and run a condensed version of the goal planning and peer coaching workshops that are part of the program.

Erica Greene is an Engineering Manager for the Community Team and she helps run The Times’ Women in Tech group.

Jessica Kosturko manages software engineers in Digital News Products. She is dedicated to developing her personal leadership skills as well as cultivating leadership in others. Additionally, she is one of the founders of The New York Times Digital Mentorship Program.

newsNerdFluxkit (2017): inspiration and provocations for people who make interactives

In this session, we’ll forget about our newsrooms, tools and data sets for a while and think about the essence of interaction. What inspires us as individuals to reach out and try to affect the world? As part of a group? By what means can we do so? How might we find we are changed in turn? What invites us and what repels us? Where do exploration and interaction diverge?

Britt Binler is an Interactive News Developer working on live coverage tools and prototyping new technologies for the newsroom. Previously, she worked at IBM and the University of Pennsylvania’s SIG Center for Computer Graphics. Having originally pursued study in the arts, Britt continues to research experimental approaches and innovative engagement with technology. She is currently exploring empathy in our hyper-distracted networks and is interested in developing more intuitive archives.

Scott Blumenthal is a Deputy Editor of Interactive News, where he builds and oversees the development of tools that help the newsroom experiment with new story forms and communicate with readers in new ways. Since joining the Times in 2012, he has also contributed to individual features and coverage of tentpole events such as elections, the Oscars and the Olympics (including a three-week stint in Rio last summer).

Should Our Engineers Donate to Campaigns?

You’re on a product, design or engineering team but you work at a news organization. Should you play by the same rules as the newsroom, or is this an infringement on your speech? Reasonable people can disagree, so let’s do that. Let’s disagree and see what we can learn on either end of the spectrum.

Brian Hamman is a Vice President of Engineering leading our Beta Engineering, News Products Engineering and Interactive News teams. Collectively these teams represent more than a hundred cross-functional engineers embedded in the newsroom and product teams working to bring Times journalism to millions of readers each day.

Brian is leading this session with Carrie Brown from CUNY’s Graduate School of Journalism.

Matt Ericson, Associate Managing Editor of News Platforms, and Justin Heideman, Senior Software Engineer for News Products, will also be attending SRCCON.

The New York Times at SRCCON was originally published in Times Open on Medium, where people are continuing the conversation by highlighting and responding to this story.

from Times Open - Medium http://ift.tt/2uVKfPQ via IFTTT

#IFTTT #Times Open - Medium

newsroom-digital-blogs · 8 years ago

Three things I learned while making Vox’s solar eclipse graphic

We recently published a graphic on Vox about the upcoming solar eclipse in August. While I enjoyed working on this graphic, building it certainly put my problem-solving skills to the test. As I drew numerous circles on scratch paper and worked through calculations to determine where to place certain elements, a number of my colleagues expressed interest in what I was doing and wanted to know more about how the graphic works. Here are a few of my takeaways from building this piece.

1. The data you’re looking for is probably already out there

Did you know that the United States Naval Observatory (USNO) has an API that provides detailed eclipse data? Neither did I until I Googled “solar eclipse 2017 API.” Talk about a game changer. Learning that I could get all of the information that I needed to create the animation at the top from one source made my week. Well, really, it made this project.

In order to get the data for each zip code from the API, I needed to know the geographic coordinates for the zip code center. While I could have created this data myself, I didn’t need to. Thanks to erichurst, it already exists.

It turns out there is also an existing data set from DoubleDor that maps zip codes to time zones, and Moment.js has a library for formatting dates based on those time zones.

While we ended up compiling and manipulating these data sets to make them more manageable and reduce page weight, we didn’t have to create any of the data ourselves.
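Once the data sets are joined, showing a reader the local time of an event is a small lookup. The sketch below swaps in plain Python for the Moment.js approach the post mentions, and it makes one simplifying assumption: a pre-joined, hypothetical table of fixed UTC offsets valid for August 2017 (daylight time). A full build would derive offsets from the zip-to-time-zone data rather than hard-coding them. The two zip codes and the 18:00 UTC timestamp are illustrative, not USNO output.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical pre-joined lookup: zip code -> UTC offset (hours) during
# August 2017, i.e. daylight saving time. Example entries only.
ZIP_UTC_OFFSET = {
    "97301": -7,  # Salem, Ore. (Pacific Daylight Time)
    "29401": -4,  # Charleston, S.C. (Eastern Daylight Time)
}


def to_local(zip_code: str, utc_time: datetime) -> str:
    """Format a UTC instant as 24-hour local time for the given zip code."""
    tz = timezone(timedelta(hours=ZIP_UTC_OFFSET[zip_code]))
    return utc_time.astimezone(tz).strftime("%H:%M")


event = datetime(2017, 8, 21, 18, 0, tzinfo=timezone.utc)  # illustrative time
print(to_local("97301", event), to_local("29401", event))
```

Shipping only the columns the page needs (here, one small number per zip code) is one way to keep page weight down.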

2. You don’t have to be a math expert

The animation at the top is essentially one circle moving over another circle along an invisible line, in a loop that repeats every 15 seconds. Figuring out the math to draw this line was complicated, but for the most part doable. The USNO API gave me three important pieces of information that I needed to draw the path of the moon over the sun:

The starting vertex angle (the angle where the moon first touches the sun).

The maximum obscuration percentage.

The ending vertex angle (the angle where the moon last touches the sun).

One important (and interesting) thing to note: The vertex angle is measured counterclockwise from the point on the sun that has the highest local altitude, which differs from how circle angles are typically measured.

Diagram showing how vertex angles are measured as compared to how angles are typically measured on a circle.
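Because vertex angles start at the top of the disk and run counterclockwise, they need converting before standard trigonometry applies. A minimal sketch, assuming vertex angle 0 corresponds to the top of the drawn circle and that the drawing uses SVG coordinates, where y grows downward. The function name `vertex_to_svg` is hypothetical.

```python
import math


def vertex_to_svg(vertex_deg: float, cx: float, cy: float, r: float):
    """Point on a circle (SVG coordinates) for a vertex angle in degrees.

    Assumes vertex angle 0 is the top of the disk and angles increase
    counterclockwise, so vertex angle V is standard angle 90 + V.
    """
    theta = math.radians(90.0 + vertex_deg)
    # SVG's y axis points down, so the sine term is subtracted.
    return (cx + r * math.cos(theta), cy - r * math.sin(theta))


print(vertex_to_svg(0, 100, 100, 50))   # top of the circle
print(vertex_to_svg(90, 100, 100, 50))  # left edge
```

The 90-degree shift moves the zero point from the right-hand side of the circle to the top; flipping the sign of the y term keeps "counterclockwise" counterclockwise on screen.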

Creating the animation

Since the USNO data provided the angles where the moon enters and exits the sun, it wasn’t too difficult to determine the starting and ending center points for the moon. These points would become the starting and ending points on the invisible line that the moon is animated along.

Diagram of calculating the starting and ending points for the line used to animate the moon.

Behind the scenes, the “sun” circle is actually a circle composed of 360 nodes (shoutout to Aaron Bycoffe’s Block, “Placing n elements around a circle with radius r”), which made it easy for me to pull out the coordinates for the entering and exiting angles and compute the line endpoints as shown above.
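The node-based approach can be sketched as follows: place points evenly around the sun, look up the node for an entry or exit angle, then push outward by the moon's radius so the two circles just touch. This is a sketch in the spirit of the Bycoffe Block, not a port of it; it assumes SVG coordinates, angles in degrees counterclockwise from the +x axis (a vertex angle V corresponds to 90 + V), and hypothetical function names.

```python
import math


def circle_nodes(cx, cy, r, n=360):
    """Place n points evenly around a circle, starting at +x and moving
    counterclockwise on screen (SVG y axis points down)."""
    return [
        (cx + r * math.cos(2 * math.pi * i / n), cy - r * math.sin(2 * math.pi * i / n))
        for i in range(n)
    ]


def moon_center_at_contact(nodes, cx, cy, standard_deg, moon_r):
    """Centre of a moon of radius moon_r whose edge touches the sun at the
    node nearest standard_deg (degrees counterclockwise from +x)."""
    n = len(nodes)
    px, py = nodes[round(standard_deg * n / 360) % n]
    dx, dy = px - cx, py - cy
    length = math.hypot(dx, dy)  # equals the sun's radius
    # Step outward along the same ray by moon_r so the circles are tangent.
    return (px + dx / length * moon_r, py + dy / length * moon_r)


sun = circle_nodes(100, 100, 50)
start = moon_center_at_contact(sun, 100, 100, 180, 50)  # entering on the left
end = moon_center_at_contact(sun, 100, 100, 0, 50)      # exiting on the right
print(start, end)
```

With 360 nodes, node i sits at exactly i degrees, which is what makes the lookup a simple index.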

Now that I had the starting and ending points, I needed to determine where to place the line midpoint. Drawing a straight line between the two points would not account for the obscuration percentage or depict the true path of the moon across the sun. In order to place the midpoint, I needed to know how far away from the sun’s center it needed to be and at what angle.

Determining the angle of the midpoint was a matter of finding the angle that bisects the start and end angles and knowing where the current zip code lies relative to the path of totality. If the user would need to travel SW or SE to view the totality, the line midpoint would be placed somewhere below the sun’s center point; if the user would need to travel NW or NE, the midpoint would be placed somewhere above it.

If you’re interested in seeing all of the fun math that went into figuring this out, here’s a look at some of my notes.
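The midpoint-angle rule can be expressed in a few lines. This is a sketch of the rule as described, not the graphic's actual code: it assumes vertex angle 0 is the top of the disk (standard angle 90), bisects the two contact angles by adding unit vectors (which avoids trouble where angles wrap past 360), and then picks whichever of the two opposite bisectors lies on the requested side. The `totality_to_south` flag is a hypothetical input standing in for the SW/SE versus NW/NE check.

```python
import math


def vertex_to_standard(vertex_deg: float) -> float:
    """Assumes vertex angle 0 is the top of the disk (standard angle 90)."""
    return (90.0 + vertex_deg) % 360.0


def midpoint_angle(start_vertex: float, end_vertex: float, totality_to_south: bool) -> float:
    """Standard angle (degrees, counterclockwise from +x, y up) of the path midpoint."""
    a = math.radians(vertex_to_standard(start_vertex))
    b = math.radians(vertex_to_standard(end_vertex))
    # Bisect via the sum of unit vectors to avoid wraparound at 0/360.
    bisector = math.degrees(math.atan2(math.sin(a) + math.sin(b),
                                       math.cos(a) + math.cos(b))) % 360.0
    # The bisecting line points two ways; keep the half the rule asks for.
    is_below = math.sin(math.radians(bisector)) < 0
    if is_below != totality_to_south:
        bisector = (bisector + 180.0) % 360.0
    return bisector


print(midpoint_angle(45, 315, totality_to_south=False))  # about 90: above centre
print(midpoint_angle(45, 315, totality_to_south=True))   # about 270: below centre
```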

After figuring out the angle for the midpoint, I needed to figure out the distance between it and the sun’s center. Given that I knew the percentage that the circles overlapped (the maximum obscuration percentage), I was able to find a formula online that would help me compute the distance between the two points. And since I’m a little rusty on my trigonometry skills, I used Wolfram Alpha to help me compute the percentage overlap of the radii for all obscuration values between 1 percent and 100 percent. The percentage overlap multiplied by the size of the radius (in pixels) allowed me to find this last missing variable.
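Instead of a precomputed table, the center-to-center distance can also be solved numerically. A minimal sketch, assuming the sun and moon are drawn with equal radii and treating obscuration as the fraction of the sun's area that is covered. The overlap of two equal circles whose centers are d apart is 2r²·acos(d/2r) − (d/2)·√(4r² − d²); that area shrinks steadily as d grows, so bisection can invert it. The function names are hypothetical, and the author's own "percentage overlap of the radii" table may define its quantity differently.

```python
import math


def overlap_fraction(d: float, r: float = 1.0) -> float:
    """Fraction of one circle's area covered by an equal circle d away."""
    if d >= 2 * r:
        return 0.0
    if d <= 0:
        return 1.0
    area = 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)
    return area / (math.pi * r * r)


def center_distance(obscuration: float, r: float = 1.0, tol: float = 1e-9) -> float:
    """Distance between centres for an obscuration in [0, 1], by bisection."""
    lo, hi = 0.0, 2 * r
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if overlap_fraction(mid, r) > obscuration:
            lo = mid  # too much overlap: move the circles apart
        else:
            hi = mid
    return (lo + hi) / 2


# USNO reports percentages; divide by 100 and scale by the radius in pixels.
radius_px = 120
print(center_distance(75 / 100) * radius_px)
```

Solving at run time trades a few dozen iterations for not having to ship a 100-row lookup table; either approach gives the same distance.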

After placing this point and drawing the line, I ended up with an animation that looked like this.

3. Simple is okay

When I first created the mock for this graphic, I included a play/pause button on the timeline below the animation. We ended up nixing that idea fairly early on.

Everyone I showed the graphic to in the early stages found it straightforward and easy to understand. The animation loops every 15 seconds, so even if the user misses something the first time, they don’t have to wait long for it to come around again. Adding the play/pause ability for such a simple visualization seemed like a bit much (and would have added extra dev time).

from Vox Media Storytelling Blog - Front Page http://ift.tt/2uc9mfi via IFTTT

#IFTTT #Vox Media Storytelling Blog - Front Page

newsroom-digital-blogs · 8 years ago

Keep an Eye On Your State’s Congressional Delegation

If you’re a user of Represent, our congressional news app, or a developer who uses our Congress API, we’ve got some new features to tell you about.

On Represent, we’ve added new pages for every state’s delegation (here’s Arizona) and redesigned bill category pages, like legislation about environmental protection, to provide more useful information. You also can search the full text of bills by keyword or phrase.

That same full-text search is available in the API. We’ve also added more details to bill and member responses to the API.

Let’s say you’re a reporter in Kentucky covering health care. Your representatives have been at the center of the recent health care debate. Represent already makes it easy to see what lawmakers such as Mitch McConnell, Rand Paul or Andy Barr are individually saying about the effort to repeal and replace the Affordable Care Act, but we’re now making it easier to keep track of all of the members of your state’s delegation in one place.

You can see your state’s current congressional members and a stream of their activities from the past two weeks. The stream shows members’ statements, any activity on a piece of legislation they’ve sponsored, and articles written about them in local and national publications, courtesy of Google News. It is filterable by the member’s party and the type of activity.

To see what McConnell is saying about the Republican health care bill, what WLKY Louisville is publishing about John Yarmuth, or what’s happening with Paul’s latest legislation, you can just check out the Kentucky state delegation page. This makes it easier for local and state reporters to track congressional activity relevant to their audience, and for voters to keep tabs on their representatives.

It can be easy to miss important things that happen on Capitol Hill if they aren’t covered widely in the press. While the House’s effort to pass the American Health Care Act garnered headlines, Congress has also been hard at work voting on other health care-related bills. It was easy for somebody interested in health to miss the House recently passing the Protecting Access to Care Act, which was just as partisan a vote as the AHCA.

To make it easier to track all bills in specific topic areas, we’re launching an update to our bill category pages. The update makes it easier to see where important bills are in the passage process by separating bills that have been signed into law from those that have been recently voted on. We’ve updated the visualizations to more easily compare vote margins, and added more information about sponsors and cosponsors.

Take a look at the Economics and Public Finance page to see how budget proposal bills are faring, or the Government Operations and Politics page to find resolutions urging inquiry into President Trump’s records and finances.

We’ve also updated the recent bill actions feed to include filters for the party, state and chamber of Congress of the bill’s sponsor, making it easier for readers and reporters to track bills important to their interests.

If you want to dive deeper into health-related bills to focus on those mentioning “pre-existing conditions,” Represent has you covered, too. Although such conditions are an important part of health insurance proposals, pre-existing conditions don’t always make it into the titles or summaries of bills. Now you can search the full text of every bill since 2013 for anything you’re interested in -- like “preexisting”.

You’ll find a variety of bills from the current session of Congress, from a Democratic resolution “Urging the President to faithfully carry out the Affordable Care Act” to a Republican’s “Guaranteed Health Coverage for Pre-Existing Conditions Act of 2017” (which would take effect “upon repeal of the Patient Protection and Affordable Care Act”), along with a variety of bills about flood insurance.

Or if it’s lunchtime and your tastes are more light-hearted, you can simply search “pizza” -- and you’ll find the Common Sense Nutrition Disclosure Act of 2017, a bill that aims to change nutrition labeling requirements for “standard menu items” — like pizza — that “come in different flavors, varieties, or combinations, but which are listed as a single menu item.”

The same full-text bill search is now in the Congress API, too, along with a number of beefed-up responses for bills and members. The details about individual bills now contain the complete summary, if available, and we’ve added additional details about sponsors to bill list responses. Subject responses now include values indicating whether a given subject has bills or statements associated with it. Finally, we’ve replaced empty strings with null values where appropriate. You can see a more detailed changelog here.
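For developers, a full-text search call might be prepared as below. This sketch assumes the `bills/search.json` endpoint and `X-API-Key` header from ProPublica's v1 Congress API documentation as published at the time; verify both against current documentation, since the service may have changed. It only builds the request (sending it needs a real key), and it includes a hypothetical helper for code that was written against empty strings and now receives nulls.

```python
import urllib.parse
import urllib.request

# Base URL as documented for v1 of the API; check current docs before use.
API_BASE = "https://api.propublica.org/congress/v1"


def bill_search_request(query: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a full-text bill search request."""
    url = f"{API_BASE}/bills/search.json?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"X-API-Key": api_key})


def blank_to_none(value):
    """Hypothetical helper: treat legacy empty strings and new nulls alike."""
    return None if value == "" else value


req = bill_search_request("pre-existing conditions", "YOUR_API_KEY")
print(req.full_url)
# To send it: urllib.request.urlopen(req), then json.load() the response.
```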

A reminder to users of the Sunlight Congress API that it will be shut down after Aug. 31, 2017. We’re encouraging Sunlight users to switch to the ProPublica Congress API. Bill text search was one of the last major features to add to the ProPublica Congress API, and once we finish API responses for amendments and upcoming bills, the process of merging features from the Sunlight API into our own will be complete. We’ll have more updates to the ProPublica Congress API in the next month.

from The ProPublica Nerd Blog http://ift.tt/2ux6Yl0 via IFTTT

#IFTTT #The ProPublica Nerd Blog

newsroom-digital-blogs · 8 years ago

Open Speaker Series: Simon Sinek on Leadership

Editor’s note: This is a recap from the Open Speaker Series, a regular series of talks held in-house at the Times featuring industry leaders in technology, design, product, organizational culture and leadership.

Simon Sinek

Perhaps best known for his TED talk on leadership, which has been viewed more than 32 million times, author and motivational speaker Simon Sinek visited The Times on May 22 to share his thoughts on leadership, motivation and the power of “Why.”

The event, which was co-sponsored by the Times’ Open Speaker Series and Women in Tech task force, was moderated by Times CTO Nick Rockwell.

Here are five highlights:

On prioritizing people in organizations: “The theme that runs through all of my work is people. It seems so obvious; I shouldn’t have to write a book about the fact that people matter and people come first in any organization and the industry, but I think we have forgotten that. Although I have never met a CEO on the planet, who doesn’t think that people are important and they all say how important people are, the problem is that when you show up in corporate events and you look at the list of priorities, yes people are on the list, but they come fourth or fifth. The reality is that people should always come first. Always.”

On the advantages of putting people first: “… when people feel that the organization knows that they exist and cares about them as human beings their natural biological reaction is to offer their blood, sweat and tears to the organization and to the people who care about them. It’s called loyalty.”

On the importance of leadership training: “Companies should also have robust leadership training programs where people learn skills such as listening, effective confrontation, giving feedback and receiving feedback. Many companies don’t teach leadership.”

On gender equality in leadership roles: “We tend to value male characteristics in leadership, such as aggression and decisiveness, and traditionally female characteristics, such as patience and empathy or caring, tend to be ignored,” he says. “And what you find is that the best leaders tend to embody a good balance of both male and female characteristics and the worst leaders, both male and female, tend to be more masculine.”

On the power of individuals to change organizations: “Instead of complaining of being the victim, you set yourself on the course of a people-first leadership. Be the leader you wish you had.”

Open Speaker Series: Simon Sinek on Leadership was originally published in Times Open on Medium, where people are continuing the conversation by highlighting and responding to this story.

from Times Open - Medium http://ift.tt/2vTMsud via IFTTT

#IFTTT #Times Open - Medium

newsroom-digital-blogs · 8 years ago

How (and Why) We’re Collecting Cook County Jail Data

At ProPublica Illinois, we’ve just restarted a data collection project to get new information about what happens to inmates at one of the country’s largest and most notorious jails.

Cook County Jail has been the subject of national attention and repeated reform efforts since its earliest days. Al Capone famously had “VIP accommodations” there in 1931, with homemade meals and a large cell in the hospital ward that he shared only with his bodyguard. Other prisoners have been more poorly accommodated: In the 1970s, the warden was fired for allegedly beating inmates with his own hands, and the facility was placed under federal oversight in 1974. During the 1980s, the federal government forced the jail to release inmates because of overcrowding. In 2008, the Department of Justice found systematic violation of inmates’ 8th Amendment rights and once again pushed for reforms.

These days, the jail, which has just recently been taken out of the federal oversight program, is under new management. Tom Dart, the charismatic and media-savvy sheriff of Cook County, oversees the facility. Dart has argued publicly for reducing the population and improving conditions at the jail. He’s also called the facility a de facto mental hospital, and said inmates should be considered more like patients, even hiring a clinical psychologist to run the jail.

Efforts to study the jail’s problems date back decades. A 1923 report on the jail by the Chicago Community Trust says, “Indifference of the public to jail conditions is responsible for Chicago’s jail being forty years behind the time.”

The promises to fix it go back just as far. The same 1923 report continues, “But at last the scientific method which has revolutionized our hospitals and asylums is making inroads in our prisons, and Chicago will doubtless join in the process.”

Patterns in the data about the inmate population could shed light on the inner workings of the jail, and help answer urgent questions, such as: How long are inmates locked up? How many court dates do they have? What are the most common charges? Are there disparities in the way inmates are housed or disciplined?

Such detailed data about the inmate population has been difficult to obtain, even though it is a matter of public record. The Sheriff’s Department never answered my FOIA request in 2012 when I worked for the Chicago Tribune.

Around the same time, I started a project at FreeGeek Chicago to teach basic coding skills to Chicagoans through working on data journalism projects. Our crew of aspiring coders and pros wrote code to scrape data from the web that we couldn’t get any other way. Our biggest scraping project was the Cook County Jail website.

Over the years, the project lost momentum. I moved on and out of Chicago and the group dispersed. I turned off the scraper, which had broken for inexplicable reasons, last August.

I moved back home to Chicago earlier this month and found the data situation has improved a little. The Chicago Data Cooperative, a coalition of local newsrooms and civic-data organizations, is starting to get detailed inmate data via Freedom of Information requests. But there’s even more information to get.

So for my first project at ProPublica Illinois, I’m bringing back the Cook County Jail scraper. Along with Wilberto Morales, one of the original developers, we are rebuilding the scraper from scratch to be faster, more accurate and more useful to others interested in jail data. The scraper tracks inmates’ court dates over time and when and where they are moved within the jail complex, among other facts.

Our project complements the work of the Data Cooperative. Their efforts enable the public to understand the flow of people caught up in the system from arrest to conviction. What we’re adding will extend that understanding to what happens during inmates’ time in jail. It’s not clear yet if we’ll be able to track an individual inmate from the Data Cooperative’s data into ours. There’s no publicly available, stable and universal identifier for people arrested in Cook County.

The old scraper ran from June 5, 2014, until July 24, 2016. The new scraper has been running consistently since June 20, 2017. It is nearly feature-complete and robust, writing raw CSVs with all inmates found in the jail on a given day.

Wilberto will lead the effort to develop scripts and tools to take the daily CSVs and load them into a relational database.

We plan to analyze the data with tools such as Jupyter and R and use the data for reporting.

A manifest of daily snapshot files (for more information about those, read on) is available at http://ift.tt/2tv2qdB

How Our Scraper Works, a High-Level Overview

The scraper harvests the original inmate pages from the Cook County Jail website, mirrors those pages and processes them to create daily snapshots of the jail population. Each record in the daily snapshots data represents a single inmate on a single day.

The daily snapshots are anonymized. Names are stripped out, date of birth is converted to age at booking, and a one-way hash is generated from name, birth date and other personal details, so researchers can study recidivism. The snapshot data also contains the booking ID, booking date, race, gender, height, weight, housing location, charges, bail amount, next court date and next court location.
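A minimal sketch of that anonymization step, assuming each scraped record is a Python dict with hypothetical `name`, `birth_date` and `booking_date` keys. The real scraper’s field names, hash inputs and hash algorithm may differ. One caveat: names and birth dates are guessable, so an unsalted hash can be reversed by brute force; the `salt` parameter here is a hypothetical addition that makes that harder.

```python
import hashlib
from datetime import date


def age_at_booking(birth_date: date, booking_date: date) -> int:
    """Whole years of age on the booking date."""
    # subtract one if the birthday hadn't come around yet that year
    before_birthday = (booking_date.month, booking_date.day) < (birth_date.month, birth_date.day)
    return booking_date.year - birth_date.year - int(before_birthday)


def person_hash(name: str, birth_date: date, *other_details, salt: str = "") -> str:
    """One-way SHA-256 digest of identifying details; same inputs give the same digest."""
    parts = [name.strip().upper(), birth_date.isoformat(), *map(str, other_details)]
    return hashlib.sha256((salt + "|".join(parts)).encode("utf-8")).hexdigest()


def anonymize(record: dict, salt: str = "") -> dict:
    """Turn one scraped record (hypothetical keys) into a snapshot row without name or birth date."""
    row = {k: v for k, v in record.items() if k not in ("name", "birth_date")}
    row["age_at_booking"] = age_at_booking(record["birth_date"], record["booking_date"])
    row["person_hash"] = person_hash(record["name"], record["birth_date"], salt=salt)
    return row
```

Because the digest is deterministic, the same person booked twice gets the same `person_hash`, which is what makes recidivism analysis possible without names.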

We don’t make the mirrored inmate pages public, to avoid misuse of personal data for things like predatory mugshot or background check websites.

How Our Scraper Works, the Nerdy Parts

The new scraper code is available on GitHub. It’s written in Python 3 and uses the Scrapy library for scraping.

Data Architecture

When we built our first version of the scraper in 2012, we could use the web interface to search for all inmates whose last name started with a given letter. Our code took advantage of this to collect the universe of inmates in the data management system, simply by running 26 searches and stashing the results.

Later, the Sheriff's Department tightened the possible search inputs and added a CAPTCHA. However, we were still able to access individual inmate pages via their Booking ID. This identifier follows a simple and predictable pattern: YYYY-MMDDXXX where XXX is a zero-padded number corresponding to the order that the inmate arrived that day. For example, an inmate with Booking ID “2017-0502016” would be the 16th inmate booked on May 2, 2017. When an inmate leaves the jail, the URL with that Booking ID starts returning a 500 HTTP status code.
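The booking ID pattern is regular enough to parse and rebuild in a few lines. This is an illustrative sketch; the function names are hypothetical, not taken from the project’s code.

```python
import re
from datetime import date

# YYYY-MMDDXXX, where XXX is the zero-padded order of arrival that day
BOOKING_ID = re.compile(r"^(\d{4})-(\d{2})(\d{2})(\d{3})$")


def parse_booking_id(booking_id: str):
    """Split a booking ID into (booking date, daily sequence number)."""
    m = BOOKING_ID.match(booking_id)
    if m is None:
        raise ValueError(f"not a booking ID: {booking_id!r}")
    year, month, day, seq = (int(g) for g in m.groups())
    return date(year, month, day), seq


def make_booking_id(day: date, seq: int) -> str:
    """Build the booking ID for the seq-th inmate booked on `day`."""
    return f"{day:%Y-%m%d}{seq:03d}"
```

For example, `parse_booking_id("2017-0502016")` gives May 2, 2017 and sequence number 16, matching the example above.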

The old scraper scanned the inmate locator and harvested URLs by checking all of the inmate URLs it already knew about and then incrementing the Booking ID until the server returned a 500 response. The new scraper works much the same way, though we’ve added some failsafes in case our scraper misses one or more days.
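Here’s a rough sketch of that increment-until-500 loop. `fetch_status` is a hypothetical callable that requests an inmate page and returns its HTTP status code, and `max_misses` is a hypothetical tolerance for gaps: since an ID also returns a 500 once its inmate has left the jail, stopping at the very first 500 can miss later bookings from the same day. This isn’t the project’s actual failsafe, just one way to think about the problem.

```python
from datetime import date
from typing import Callable, List


def discover_bookings(day: date, fetch_status: Callable[[str], int],
                      start_seq: int = 1, max_misses: int = 1) -> List[str]:
    """Probe one day's booking IDs in order until the server returns
    HTTP 500 `max_misses` times in a row."""
    found = []
    seq, misses = start_seq, 0
    while misses < max_misses and seq <= 999:  # XXX is three digits
        booking_id = f"{day:%Y-%m%d}{seq:03d}"
        if fetch_status(booking_id) == 500:  # no inmate page for this ID
            misses += 1
        else:
            found.append(booking_id)
            misses = 0
        seq += 1
    return found
```

With a fake `fetch_status` that only knows IDs 001, 002 and 004, the default stops after 002, while `max_misses=3` also picks up 004.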

The new scraper can also use older data to seed scrapes. This reduces the number of requests we need to send and gives us the ability to compare newer records to older ones, even if our data set has missing days.
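Seeding could look something like this, assuming a previous snapshot CSV with a hypothetical `booking_id` column. The known IDs get re-checked first, and the highest sequence number seen for each day tells the incrementing loop where to pick up.

```python
import csv
from typing import Dict, Iterable, List


def seed_booking_ids(snapshot_lines: Iterable[str]) -> List[str]:
    """Booking IDs from an earlier daily snapshot CSV, to be re-checked first."""
    reader = csv.DictReader(snapshot_lines)
    return sorted({row["booking_id"] for row in reader if row["booking_id"]})


def resume_points(booking_ids: Iterable[str]) -> Dict[str, int]:
    """Highest sequence number already seen for each booking day (YYYY-MMDD)."""
    latest: Dict[str, int] = {}
    for booking_id in booking_ids:
        day, seq = booking_id[:-3], int(booking_id[-3:])
        latest[day] = max(seq, latest.get(day, 0))
    return latest
```

In practice you’d pass an open file, e.g. `seed_booking_ids(open("snapshot.csv", newline=""))`; any iterable of CSV lines works.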

Scraping With Scrapy

We’ve migrated from a hodgepodge of Python libraries and scripts to Scrapy. Scrapy’s architecture makes scraping remarkably fast, and it includes safeguards to avoid overwhelming the servers we’re scraping.

Most of the processing is handled by inmate_spider.py. Spiders are perhaps the most fundamental elements that Scrapy helps you create. A spider handles generating URLs for scraping, follows links and parses HTML into structured data.

Scrapy also has a way to create data models, which it calls “Items.” Items are roughly analogous to Django models, but I found Scrapy’s system underdeveloped and difficult to test. It was never clear to me if Items should be used to store raw data and to process data during serialization or if they were basically fancy dicts that I should put clean data into.

Instead, I used a pattern I learned from Norbert Winklareth, one of the collaborators on the original scraper. I wrote about the technique in detail for NPR last year. Essentially, you create an object class that takes a raw HTML string in its constructor. The data model object then exposes parsed and calculated fields suitable for storage.
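A minimal sketch of that pattern using only the standard library’s `html.parser`. The page layout it assumes (a table of alternating “Label:” and value cells) and the field names are hypothetical; the real inmate pages and the project’s model class will differ.

```python
from datetime import date
from html.parser import HTMLParser


class _CellText(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data


class Inmate:
    """Data model built from one raw inmate page (hypothetical layout)."""

    def __init__(self, html: str):
        parser = _CellText()
        parser.feed(html)
        cells = [c.strip() for c in parser.cells]
        # pair each "Label:" cell with the value cell that follows it
        self._fields = {cells[i].rstrip(":"): cells[i + 1] for i in range(0, len(cells) - 1, 2)}

    @property
    def booking_id(self):
        return self._fields.get("Booking ID")

    @property
    def bail_amount(self):
        return self._fields.get("Bail Amount")

    @property
    def booking_date(self):
        # calculated field: derived from the YYYY-MMDDXXX booking ID
        year, rest = self.booking_id.split("-")
        return date(int(year), int(rest[:2]), int(rest[2:4]))
```

Because the model is a plain class that takes a string, tests can build it straight from saved HTML without touching Scrapy.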

Despite several of its systems being a bit clumsy, Scrapy really shines due to its performance. Our original scraper worked sequentially and could harvest pages for the approximately 10,000 inmates under jail supervision in about six hours, though sometimes it took longer. Improvements we made to the scraper got this down to a couple of hours. But in my tests, Scrapy was able to scrape 10,000 URLs in less than 30 minutes.

We follow the golden rule at ProPublica when we’re web scraping: “Do unto other people’s servers as you’d have them do unto yours.” Scrapy’s “autothrottle” system will back off if the server starts to lag, though we haven’t seen any effect so far on the server we’re scraping.
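Turning that on is a matter of project settings. `AUTOTHROTTLE_ENABLED` and the other names below are real Scrapy settings, but the values are illustrative rather than the project’s actual configuration.

```python
# settings.py (excerpt): illustrative values, not the project's actual configuration
AUTOTHROTTLE_ENABLED = True            # adapt delays to the server's response latency
AUTOTHROTTLE_START_DELAY = 5.0         # initial download delay, in seconds
AUTOTHROTTLE_MAX_DELAY = 60.0          # ceiling on the delay when the server is slow
AUTOTHROTTLE_TARGET_CONCURRENCY = 1.0  # average parallel requests to aim for per remote site
CONCURRENT_REQUESTS_PER_DOMAIN = 8     # hard cap on simultaneous requests to one domain
```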

Scrapy’s speed gains are remarkable. It’s possible that these are due in part to increased bandwidth, server capacity and web caching at the Cook County Jail’s site, but in any event, it’s now possible to scrape the data multiple times every day for even higher accuracy.

Pytest

For this project, I also started using a test framework I hadn’t used before.

I’ve mostly used Nose, Unittest and occasionally Doctests for testing in Python. But people seem to like Pytest (including several of the original jail scraper developers) and the output is very nice, so I tried it this time around.

Pytest is pretty slick! You don’t have to write any boilerplate code, so it’s easy to start writing tests quickly. What I found particularly useful is the ability to parameterize tests over multiple inputs.

Take this abbreviated code sample:

```python
import pytest

# get_inmate() is a helper from the project's test suite that loads an
# Inmate model instance from saved sample pages.
testdata = (
    (get_inmate('2015-0904292'), {'bail_amount': '50,000.00'}),
    (get_inmate('2017-0608010'), {'bail_amount': '*NO BOND*'}),
    (get_inmate('2017-0611015'), {'bail_amount': '25,000'}),
)


@pytest.mark.parametrize("inmate,expected", testdata)
def test_bail_amount(inmate, expected):
    assert inmate.bail_amount == expected['bail_amount']
```

In the testdata variable assignment, the get_inmate function loads an Inmate model instance from sample data, and each instance is paired with expected values based on direct observation of the scraped pages. Then, by passing testdata to the @pytest.mark.parametrize(...) decorator, the test function runs once for each pair of values.

There might be a more effective way to do this with Pytest fixtures. Even so, this is a significant improvement over the metaclasses and other fancy Python techniques I’ve used to parameterize tests in the past. Those techniques yield practically unreadable test code, even if they do manage to provide good test coverage for real-world scenarios.

In the future, we hope to use the Moto library to mock out the complex S3 interactions used by the scraper.

How You Can Contribute

We welcome collaborators! Check out the contributing section of the project README for the latest information about contributing. You can check out the issue queue, fork the project, make your contributions and submit a pull request on GitHub!

And if you’re not a coder but you notice something in our approach to the data that we could be doing better, don’t be shy about submitting an issue.

from The ProPublica Nerd Blog http://ift.tt/2uRJ8CN via IFTTT

#IFTTT #The ProPublica Nerd Blog


newsroom-digital-blogs · 8 years ago


Things We Read This Week

Illustration by Kevin Zweerink for The New York Times

Welcome to Things We Read This Week, a weekly post featuring articles from around the internet recommended by New York Times team members. This is where we share articles we read and liked, things that made us think and things we couldn’t stop talking about. We will be taking next week off, but will resume on August 4th.

Our 6 Must Reads for Scaling Yourself as a Leader

As you take on more challenges in your organization, it’s necessary to find ways to give yourself the room to grow without burning out. This article is full of great links to resources that offer methods for managing your time and energy, which will allow you to become the leader you want to be. - Recommended by Modupe Akinnawonu, Product Manager, Android App

Kotlin: the Upstart Coding Language Conquering Silicon Valley

Kotlin has been making a buzz in the world of JVM languages over the past few years. At this year’s I/O, Google announced first-class support for Kotlin in Android. This article covers companies that use Kotlin and how it has become a great replacement for Java both for client and server-side development. I found it interesting to read a more technical piece from Wired. — Recommended by Mikhail Nakhimovich, Lead Engineer, Android App

How To Eliminate Organizational Debt

Just when you thought it was safe… you also have organizational debt. - Recommended by Nick Rockwell, Chief Technology Officer

Things We Read This Week was originally published in Times Open on Medium, where people are continuing the conversation by highlighting and responding to this story.

from Times Open - Medium http://ift.tt/2uiLybd via IFTTT

#IFTTT #Times Open - Medium


newsroom-digital-blogs · 8 years ago


Open Speaker Series: Camille Fournier on Organizational Culture

The Open Speaker Series and Women in Tech recently co-sponsored a conversation with Camille Fournier, founding CTO of Rent the Runway and the author of The Manager’s Path and Ask the CTO. Camille is currently Managing Director of Platform Engineering at Two Sigma.

The discussion centered on three topics — career, management advice and promoting diversity in tech.

Here are five highlights:

On learning and leadership: “I think the most important thing new people in tech can do is get comfortable looking dumb. The most important thing that experienced people in tech can do is get comfortable letting people ask dumb questions and not shaming them for asking dumb questions.”

On developing a diverse organization: “I found that when I was more flexible in where I looked, I found really amazing talent that had more non-traditional backgrounds, and they were more creative and actually worked better with the product and the team that we needed to build.”

On dealing with bureaucracy: “I do encourage always digging in on the bottlenecks and inefficiencies in process, and asking the question and raising the issue and seeing what happens.”

On improving technical interviews: “I don’t think that someone has cracked the code of how to give the best, most accurate interview […] I think that questioning, ‘what are we even looking for?’, in an interview is a good thing to do. I definitely think questioning, ‘how do we determine who is qualified to interview with us, for which roles?’, is another good thing to do.”

On self-improvement: “I am way smarter because I know a hundred people smarter than me that are willing, that I have helped out myself and who can then teach me things in return. Don’t expect to know it or do it all yourself. You’re never going to be able to successfully do it all yourself, but relying on those around you — and being there for those around you — delegating your brain out a little bit. You’d be surprised what people will do if you just ask them nicely.”

Open Speaker Series: Camille Fournier on Organizational Culture was originally published in Times Open on Medium, where people are continuing the conversation by highlighting and responding to this story.

from Times Open - Medium http://ift.tt/2tjU5sU via IFTTT

#IFTTT #Times Open - Medium


newsroom-digital-blogs · 8 years ago


Authenticating Email Using DKIM and ARC, or How We Analyzed the Kasowitz Emails


It has become a common scenario: A reporter gets a newsworthy email forwarded out of the blue. But is the email legit? It turns out there are a few technical tools you can use to check on an email, in tandem with the traditional ones like calling for confirmation. I used some of these techniques last week to help authenticate some emails forwarded to my colleague Justin Elliott. Those emails were sent by Marc Kasowitz, one of President Trump’s personal attorneys.

This post is a very brief introduction to the tools I used, which you can also use when you need to authenticate an email message.

There’s a cryptographic technique that can tell us if an email message that you or your source has received matches what was sent. It comes in two similar flavors. One’s called “DomainKeys Identified Mail,” or DKIM, and the other is “Authenticated Received Chain,” or ARC. You can use them to authenticate emails that come in over the transom. It takes a tiny bit of command-line work and maybe a little coaxing of your source, but it can offer you a mathematical guarantee that the email you have on your screen is identical to the one that the source received, with no possibility of intermediate tampering.

To understand it, we need to do a little bit of e-spelunking into how email and cryptography work.

An email message has two parts: the body, which is the text of the message, and the headers, which are kind of like the outside of an envelope on a piece of snail mail. The headers include stuff you are familiar with, like To, From and the Subject lines. But they also include a lot of other, more obscure fields that aren’t shown in Gmail or Outlook. For instance, one of the fields contains what is essentially a tracking log for the email, recording the path it took from the sender’s email service to the service hosting your own email.

The obscure header we’re interested in is called the DKIM Signature. It’s kind of like the shipper’s packing list. The DKIM Signature field contains two things: First, a set of instructions for making a summary of the email, mushing up some of the headers and the message itself, and, second, a version of that summary — technically, a “hash” — that’s cryptographically signed by the sending server.
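To see those instructions for yourself, you can pull the DKIM-Signature header out of a raw message with Python’s standard `email` module. In the tags, `d=` is the signing domain, `s=` the selector, `h=` the list of headers that were signed, `bh=` the hash of the body and `b=` the signature itself. This is a minimal sketch for inspection only; it doesn’t verify anything, and the function name is hypothetical.

```python
from email import message_from_bytes
from typing import Dict, List


def dkim_tags(raw_message: bytes) -> List[Dict[str, str]]:
    """Return the tag=value pairs of each DKIM-Signature header in a raw message."""
    msg = message_from_bytes(raw_message)
    signatures = []
    for header in msg.get_all("DKIM-Signature", []):
        # unfold continuation lines; whitespace inside tag values is not significant
        flat = "".join(str(header).split())
        tags = {}
        for part in flat.split(";"):
            if "=" in part:
                name, value = part.split("=", 1)  # base64 values may end in '='
                tags[name] = value
        signatures.append(tags)
    return signatures
```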

It’s meant to give the receiving server the ability to see if the contents of the email changed in transit, the digital equivalent of detecting whether the mailman steamed open the envelope and modified the contents of a letter. We can put it to good use as journalists by creating our own version of the hash and then decrypting the one made by the sending server. If the hash we create from those instructions matches the decrypted one from the message exactly, we have mathematical proof that our email is the same as the one that was sent/received.
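Part of “creating our own version of the hash” is easy to sketch: the body hash in the `bh=` tag. This assumes the signature uses `a=rsa-sha256` with the “simple” body canonicalization (the part of `c=` after the slash) and no `l=` length tag. Relaxed canonicalization, and the header signature in `b=`, which needs the public key, are left to a real library such as dkimpy.

```python
import base64
import hashlib


def simple_body(body: bytes) -> bytes:
    """DKIM "simple" body canonicalization (RFC 6376, section 3.4.3)."""
    # assumption: a saved file may use bare LF line endings; mail on the wire uses CRLF
    body = body.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")
    # ignore all empty lines at the end of the body
    while body.endswith(b"\r\n\r\n"):
        body = body[:-2]
    # an empty body, or one without a trailing CRLF, gets a CRLF added
    if not body.endswith(b"\r\n"):
        body += b"\r\n"
    return body


def body_hash(body: bytes) -> str:
    """Base64 SHA-256 digest, comparable to bh= for a=rsa-sha256 with c=.../simple."""
    return base64.b64encode(hashlib.sha256(simple_body(body)).digest()).decode("ascii")
```

Notice that extra blank lines at the very end of the body don’t change the digest, because simple canonicalization strips them, but a single added trailing space changes it completely.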

The inverse isn’t true. That is, if the hash we create isn’t the same as the hash in the DKIM Signature field, it doesn’t necessarily mean the messages are completely different or that the message was tampered with. Some email servers are a little wonky and make small changes to an email, like adding or removing spaces at the end of a line. Changes like these aren’t at all uncommon, and even a tiny one will totally throw off the cryptographic comparison. So if the hashes don’t match, it’s possible the email was tampered with, but you can’t draw that conclusion from a DKIM hash mismatch alone.

There are other reasons why verification might fail on a genuine email. Older emails, in particular, are less likely to validate, because the public key used to decrypt the summary of the email may have been changed since the message was sent. (Remember, DKIM is meant to be checked when the email is received, not months or years later.)

So that’s DKIM. Now for ARC. ARC is similar to DKIM, but instead of being used by the sending server, it’s used by intermediaries in the email process, like listservs or servers that receive email. Many emails that arrive in Gmail are signed by Google, but this is a new development: the ARC protocol isn’t even formally approved yet.

Some emails will have both DKIM signatures and ARC signatures. Some will have only one. For instance, the email our source received only had an ARC signature, put there by Google when it arrived in Gmail. It didn’t have a DKIM signature, because the email server used by the sender’s law firm doesn’t include them. And some have neither; both of these systems are slathered on top of the original email system like sunscreen — and, also like sunscreen, some people don’t use them.
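A quick way to see which of these a message carries is to count the headers. Each intermediary that handles a message adds an ARC set of three headers (ARC-Seal, ARC-Message-Signature and ARC-Authentication-Results), so counting ARC-Seal headers counts the sets, and each seal’s `d=` tag names who added it. A minimal sketch using the standard library; the function name is hypothetical.

```python
import re
from email import message_from_bytes


def signature_summary(raw_message: bytes) -> dict:
    """Count DKIM signatures and ARC sets, and list the domains that added ARC seals."""
    msg = message_from_bytes(raw_message)
    arc_seals = msg.get_all("ARC-Seal", [])
    sealers = []
    for seal in arc_seals:
        # d= names the domain that added this seal, e.g. google.com for Gmail
        m = re.search(r"(?:^|;)\s*d=([^;\s]+)", str(seal))
        if m:
            sealers.append(m.group(1))
    return {
        "dkim_signatures": len(msg.get_all("DKIM-Signature", [])),
        "arc_sets": len(arc_seals),
        "arc_sealers": sealers,
    }
```

A message like the one our source received would show zero DKIM signatures and one ARC set sealed by Google.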

What DKIM and ARC Prove (and What They Can’t Prove)

While a validated DKIM signature guarantees that you have the same email that was sent, a validated ARC signature can guarantee that you have the same email that was received by the receiving server. In practice, this was perfect for us, because we needed to know for sure that the email that was forwarded to us was exactly the one our source originally received.

Just because a message you’ve been forwarded matches what was sent or received, that doesn’t mean it’s completely authentic. DKIM and ARC can’t tell you whether the sender’s server was hacked or misconfigured.

And neither technique guarantees that the sender is who they say they are. It’s theoretically possible for me to create my own email server that pretends to be hillaryclinton.com. There’s a system called Sender Policy Framework (SPF) that validates whether a sending server is really allowed to send email on behalf of a given domain. Read up on that if you think that scenario is a possibility.
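You don’t have to run an SPF check yourself to see whether one happened. Gmail and many other providers record the checks they ran at delivery in an Authentication-Results header (the format is defined in RFC 8601), which survives in the original message. A minimal sketch that reads it, with two caveats: this is the receiving server’s recorded claim, not something you’ve verified independently, and exact header contents vary by provider.

```python
import re
from email import message_from_bytes


def auth_results(raw_message: bytes) -> dict:
    """Read the receiving server's recorded check results from Authentication-Results."""
    msg = message_from_bytes(raw_message)
    # the topmost header is the one added most recently, i.e. by the final receiver
    header = msg.get("Authentication-Results")
    if header is None:
        return {}
    # drop parenthesized comments, which can contain nested results like "(i=1 spf=pass ...)"
    text = re.sub(r"\([^)]*\)", "", str(header))
    results = {}
    for method, result in re.findall(r"\b(dkim|spf|dmarc|arc)=(\w+)", text):
        results.setdefault(method, result)  # keep the first result for each method
    return results
```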

DKIM and ARC also can’t confirm that the person who typed the email was the person whose name is on the account, instead of somebody else with access to it. (An email that the sender “signed” with a different kind of encryption tool like PGP or S/MIME would have cryptographic proof that it came from a computer belonging to the sender, but that’s beyond the scope of this post. The emails we were analyzing didn’t have a PGP signature, and most people don’t use these tools.)

How to Check DKIM and ARC

You’ll need a little bit of command-line knowledge and to have Python installed on your computer. I’ll assume you have both, and I’ll also assume that you’re dealing with an email forwarded by a source you are still in contact with.

You’re also going to need an original copy of the email. A forwarded version won’t work at all — the headers we care about are stripped out. That means that the emails that Donald Trump Jr. tweeted can’t be verified using these techniques (but then, I suppose, he authenticated them for us).

You’ll need to find out the service your source uses to receive email — Gmail, Outlook, Yahoo, etc. — and then find their instructions on how to forward a message as an attachment. I’m including the Gmail ones here.

Here’s how you or your source can get the original message in Gmail:

Open the message.

Find the little down arrow next to the reply button and click it.

Click the Show Original button.

In the new tab that opens, you’ll see the source of the message, including all the headers that Gmail hides from you by default. Your source should click Download Original and email you the downloaded file as an attachment.

Now that you have the email you want to authenticate as an attachment, you’ll need two Python libraries.

dnspython. We’ll use this to fetch the decryption key that’s used to guarantee that the two summaries match. This library grabs the key from the DNS system (it’s not included in the message, which makes it harder to spoof). You should be able to install this using pip or easy_install.

dkimpy. This is a Python library for authenticating DKIM and ARC signatures. You can grab it at http://ift.tt/1PS8rJZ.
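dkimpy fetches the key for you, but it helps to know where it lives. By the DKIM convention (RFC 6376), a signature with `s=sel1` and `d=example.com` points to a TXT record at `sel1._domainkey.example.com`. This sketch assumes dnspython 2.x, whose resolver function is `resolve` (older 1.x releases called it `query`); the function names are hypothetical.

```python
from typing import List


def dkim_key_name(selector: str, domain: str) -> str:
    """DNS name where a signer publishes its public key, from the s= and d= tags."""
    return f"{selector}._domainkey.{domain}"


def fetch_key_record(selector: str, domain: str) -> List[str]:
    """Fetch the TXT record(s) for the key; needs dnspython 2.x and network access."""
    import dns.resolver  # from the dnspython package

    answers = dns.resolver.resolve(dkim_key_name(selector, domain), "TXT")
    # long keys are split into several character-strings; join them back together
    return [b"".join(rdata.strings).decode("ascii") for rdata in answers]
```

The record it returns typically looks like `v=DKIM1; k=rsa; p=...`, where `p=` holds the public key itself.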

Once these are installed:

Using the command line, go to the directory where you unzipped dkimpy. On Mac and Linux, that’s probably something like cd ~/Downloads/dkimpy-0.6.2.

Make sure you know where the email message (sent as an attachment) is. It might also be in Downloads, so let’s assume the path is ~/Downloads/original_msg.txt. The file extension might be .eml or .msg instead. That’s fine, too.

Execute the signature validation tool, providing the original message as an argument. That’s going to look something like one of these two commands:

python dkimverify.py < original_msg.txt

python arcverify.py < original_msg.txt

Interpret the results. If the command comes back saying arc verification: cv=b'pass' success (for ARC) or signature ok (for DKIM), then we know the message is the same as sent or received, as the case may be. If the response is “signature verification failed” or “Message is not ARC signed,” we don’t know if the email’s been tampered with or not. (Seriously, you can’t conclude that it has been tampered with. You just don’t know.)
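If you’d rather call dkimpy from your own script than run its command-line tools, something like this should work. It assumes, based on the bundled dkimverify.py and arcverify.py scripts, that `dkim.verify` returns a boolean and `dkim.arc_verify` returns a `(cv, results, comment)` tuple; check against the version you install. `interpret` is a hypothetical helper that encodes the rule above: a failure means “unknown,” not “tampered.”

```python
from typing import List, Tuple


def verify_original(path: str) -> Tuple[bool, bytes]:
    """Run dkimpy's DKIM and ARC checks on a downloaded original message."""
    import dkim  # from the dkimpy package; performs DNS lookups for the keys

    with open(path, "rb") as f:
        raw = f.read()
    dkim_ok = dkim.verify(raw)
    cv, _results, _comment = dkim.arc_verify(raw)
    return dkim_ok, cv


def interpret(dkim_ok: bool, cv: bytes) -> List[str]:
    """Translate results into what they do, and don't, let you conclude."""
    findings = []
    if dkim_ok:
        findings.append("DKIM pass: same as the message that was sent")
    if cv == b"pass":
        findings.append("ARC pass: same as the message that was received")
    if not findings:
        findings.append("unverified: cannot tell whether the message was altered")
    return findings
```

For example, `interpret(*verify_original("original_msg.txt"))` returns a short list of findings for the message.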

I hope you find these tools useful. Seeing as how the phrase “email scandal” could refer to any number of different political brouhahas over the past two years, it’s clear that email verification is a process that we’re all going to have to get more familiar with.

from The ProPublica Nerd Blog http://ift.tt/2uzfqmd via IFTTT

#IFTTT #The ProPublica Nerd Blog


newsroom-digital-blogs · 8 years ago


Managing a Team with a Co-Lead

Illustration by Kevin Zweerink

Ben Solwitz and I co-lead the Digital Subscriptions Backend team at The New York Times. Our team develops core services for digital subscriptions, including the (in)famous paywall and the billing and subscription management system that has enabled The Times to grow to almost 2 million digital-only subscribers. We each have six direct reports, but we treat our team like one big team with twelve engineers. I often joke that while our developers do pair programming, Ben and I do pair managing.

Just like pair programming, pair managing gives the opportunity to solve problems and bounce ideas around with someone who has shared context and goals.

Gaining New Perspectives

Last year, I observed that several team members were working late at night or on weekends, but when I asked everyone individually whether they were overworked they all said no. So I asked them whether they thought their teammates were overworked, and they all said yes! I was certain everyone had too much work but wouldn’t admit it to me.

Then I talked about it with Ben, and he pointed out that the people working the most had recently taken on Tech Lead roles with more responsibility for bigger projects. Ben suggested that our tech leads might be working extra hard because they felt a greater sense of ownership over their work. That was a perspective I hadn’t considered before, so we decided that we didn’t need to intervene after all.

Working Through Disagreement

Earlier this year, Ben told his direct reports to set their goals using OKRs (“Objectives & Key Results”, a framework for defining ambitious objectives and measurable outcomes), but I told mine I didn’t care how they set goals as long as they found their goal setting method helpful. We realized we were giving different direction on goal setting and we argued about this for a couple days.

In the end, I realized that OKRs weren’t out of line with what I wanted, even though the process was more disciplined. We decided that using a structured approach would work better if everyone on the team was doing it, so I got on board. Ben and I planned and ran OKR workshops together for the whole team, then we had our direct reports write and share their individual OKRs with the team. Despite our initial disagreement, I felt very happy to work together to find a solution that allowed Ben and me to effectively co-lead our team.

How to Make Pair Managing Work

Be Co-Leaders

It’s important for both managers to agree to co-lead the team and commit to doing so. It’s not necessary to set all terms and expectations up front — your partnership should be able to evolve — but you do need to take on mutual ownership for leadership of your team.

Pair management can only work if both managers believe that success is not zero sum. Managers should be invested in helping each other thrive.

Talk to each other. A lot.

Make time to talk to each other every day. The trick is to develop a habit of sharing the information that shapes your thoughts and decisions as a manager. This helps you build shared context, which makes solving problems together more productive and makes it easier to agree.

When you read an article that illuminates new ideas, have your co-lead read it too. Discuss what you learned from the reading and decide together how to apply it to your team.

Identify a mode of communication that lets you promptly share information that impacts your team. Both managers should be able to represent the team at any given time, so it’s important to stay aware of new information as it comes up. Whether it’s Slack, email or hallway chats, choosing a method of communication that works for both you and your co-lead will help keep you informed.

Set up a weekly one-on-one meeting to cover big topics and long-term planning. A one-on-one is a chance to focus on your “important but not urgent” work. You can also use the time for peer mentorship: ask yourself and your co-lead what’s going well and what needs improvement, what you like best about your job or where you see yourself in three years.

Just like in pair programming, there is communication overhead. You will have to explain yourself and justify your decisions more frequently and more thoroughly. But talking things through often results in higher quality solutions and keeps you both focused on what’s most important.

Share goals, values and responsibilities

Define shared values and a common vision for what it means to be a good manager and what a great team looks like. Your hiring process and your team mission are great opportunities to work together to make those shared values more explicit. You can work together to identify the competencies your team needs, and design your job descriptions and interview process to match; you can write or revise your team mission statement together.

Sharing goals and values helps maintain alignment, but sharing responsibilities is important too. Pair managing lets each manager take advantage of their unique strengths. You’ll each be good at different things, so you can split up the responsibilities and cover them better than you could alone.

Build relationships with the whole team

Get to know everyone on the team. You’ll naturally have regular contact with your own direct reports, but your co-lead’s direct reports are part of your team too, so find ways to form relationships with them. Occasional one-on-ones with them can be useful, and socializing as a team is another way to build rapport with everyone.

Pair managing yields many of the same benefits as pair programming:

Increased quality of work — talking things through with a co-lead results in clearer articulation of complexities and risks, and a more thorough understanding of issues.

Better transfer of skills — managers can share skills too, from ideas for more effective one-on-one meetings to time management techniques.

Improved engagement — pair managing requires active collaboration, which fosters a sense of belonging and builds community at work. Co-leads can also help each other stay focused on the most important work.

For me, pair managing also reinforces my values as a manager. Collaboration and openness are important to me, and pair managing lets me practice and demonstrate those values every day. The more my co-lead and I are willing to share, stay open to new ideas and work together to find solutions, the better we do at our job.

Charlyn Gee is an Engineering Manager at the New York Times. Her team builds the backend services behind NYT’s Digital Subscriptions business.

Managing a Team with a Co-Lead was originally published in Times Open on Medium, where people are continuing the conversation by highlighting and responding to this story.

from Times Open - Medium http://ift.tt/2varMyF via IFTTT

#IFTTT #Times Open - Medium


newsroom-digital-blogs · 8 years ago

Text

Managing a Team with a Co-Lead

Illustration by Kevin Zweerink

Just like pair programming, pair managing gives the opportunity to solve problems and bounce ideas around with someone who has shared context and goals.

