
PhillyDB: Falling In and Out of Love with DynamoDB

Submitted by admin on Sun, 06/09/2013 - 13:44

Join PhillyDB and PhillyAWS for an evening of database and AWS presentations.

Our feature presentation is "Falling In and Out of Love with DynamoDB"

Amazon's DynamoDB offers the opportunity to offload the cost of maintenance and deployment of a web-scale NoSQL database, but comes with its own challenges. Tim Gross will discuss application and schema design as well as operations and cost-control measures in the context of his experience in building production applications with DynamoDB. He'll cover some hard-learned lessons and whether DynamoDB is really the best option for common real-world use cases.

Tim Gross develops software and infrastructure for DramaFever, an online television network, where his current focus is building distributed web services on the AWS and AppEngine platforms, working on libraries to improve caching and data access scalability, analytics, and devops.

We're also lining up some lightning talks. If you'd like to add one on a database or AWS topic, contact me or Aaron Feng.

Hope to see you there!

PhillyDB will meet on the 16th floor of the MSB. Note that the meeting may be listed at the front desk under the name 'Philly Tech Groups' or 'Technically Philly Groups'.

Schedule:

6:00 - pizza and socializing
6:30 - presentations
8:00 - beverages at Nodding Head Brewpub

June 18, 2013 - Municipal Services Building

Register at the PhillyDB site

PhillyDB: To DB or not to DB? That is the Question

Submitted by admin on Sat, 04/20/2013 - 14:42

The role of flat file data formats in genomics and other science domains.

"You have a data management problem and you decide to use a database. Now you have two problems."

Seriously though, adding a service layer to an application that needs to store and crunch a stream of structured data would be overly complicated and most of the time unnecessary. This is the reason that most scientific algorithms, which one would assume would benefit greatly from databases, don't actually use a database service. More often than not, they invent their own file formats to complete the task at hand. In this short talk, we'll look at a few trends in file formats that happened in genomics, look at the motivations of each, and try to come out a saner person because of it.
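As a rough illustration of the point above, here is a minimal sketch of crunching a stream of structured records straight from a flat file with no database service involved. The three-column tab-delimited layout and the name `count_by_column` are invented for this example and do not correspond to any real genomics format.

```python
import csv
import io
from collections import Counter

# Hypothetical tab-delimited records (chromosome, position, base); the
# layout is invented for illustration and is not a real genomics format.
SAMPLE = "chr1\t100\tA\nchr1\t101\tG\nchr2\t55\tA\n"


def count_by_column(lines, column):
    """Stream rows one at a time and tally the values found in one column."""
    counts = Counter()
    for row in csv.reader(lines, delimiter="\t"):
        counts[row[column]] += 1
    return counts


# io.StringIO stands in for an open file handle here.
per_chromosome = count_by_column(io.StringIO(SAMPLE), 0)
print(per_chromosome)
```

Because rows are read one at a time, memory use stays flat no matter how large the file grows, which is part of the appeal of this approach.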

The talk will be presented by:

Angel Pizarro
Director, Joint Informatics Group
Institute for Translational Medicine and Therapeutics
Penn Genome Frontiers Institute

Additionally, we've got a couple of lightning talks lined up. If you can add one, we'd love to hear it!

This is a participating event in:

Philly Tech Week

Big Data Week

This event is being hosted by our friends at NextFab Studio - Philly's largest and most capable maker space. Come by early (or hang out afterwards) for a tour of the facility/equipment, and to find out how you can get in on the coolest place in town!

April 23 · 6:30 PM - NextFab Studio

Register at the PhillyDB site

PhillyDB: Introduction to Graph Databases

Submitted by admin on Thu, 04/04/2013 - 19:57

We'll introduce Neo4j, an open source graph database that you can embed, deploy as a server, or use in the cloud. Starting with graph databases 101, you'll leave with an understanding of graph-like thinking, when to use a graph, and what great new magic you'll be able to add to your toolbox.

If you are dealing with highly connected data, writing crazy stored procedures, battling monster join tables or just want a new perspective on databases, you'll want to come to this meet-up. Learn the secret sauce behind Neo4j, and see some example Neo4j projects to help you get started.
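To give a feel for graph-like thinking, here is a small sketch in plain Python. It does not use Neo4j or its API. The `KNOWS` data and the `within_hops` helper are made up for illustration. The idea is that a "friends of friends" question becomes a short traversal over relationships rather than a chain of self-joins on a join table.

```python
from collections import deque

# Hypothetical "knows" relationships stored as an adjacency map.
KNOWS = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": [],
    "erin": ["frank"],
    "frank": [],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search: nodes reachable from start in at most max_hops steps."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                frontier.append((neighbor, hops + 1))
    return found


friends_of_friends = within_hops(KNOWS, "alice", 2)
print(friends_of_friends)
```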

PhillyDB will meet on the 16th floor of the MSB. Note that the meeting may be listed at the front desk under the name 'Technically Philly Groups'.

Schedule:

6:00 - pizza and socializing
6:30 - presentations
8:00 - beverages at Nodding Head Brewpub

Speaker: Max De Marzi
Max is a graph database enthusiast. He built the Neography Ruby Gem, a REST API wrapper for the Neo4j graph database. He blogs about all the things you can do with Neo4j at http://maxdemarzi.com

April 16 · 6:00 PM - Municipal Services Building

Register at the PhillyDB site

PhillyDB: HBase and MapR M7

Submitted by admin on Wed, 03/13/2013 - 17:08

This presentation will provide an overview of HBase as well as MapR's M7 NoSQL database.

We will begin with a discussion of the basic HBase architecture and the problems it solves. We will then discuss how MapR's M7, like M5, adds innovative features that provide tangible advantages to HBase users while maintaining API compatibility.

Keys Botzum is a Senior Principal Technologist with MapR Technologies. He has over 15 years of experience in large-scale distributed system design. Mr. Botzum has worked with a variety of distributed technologies, including Sun RPC, DCE, CORBA, Java EE, AFS, and DFS. Recently, he has been focusing on Hadoop and related technologies. Previously he was a Senior Technical Staff Member with IBM and a respected author of many articles on WebSphere Application Server as well as a book. He holds a Master's degree in Computer Science from Stanford University and a B.S. in Applied Mathematics/Computer Science from Carnegie Mellon University.

Mr. Botzum has published numerous papers on WebSphere and WebSphere security on IBM DeveloperWorks WebSphere. He is also an author of the book IBM WebSphere: Deployment and Advanced Configuration

http://www.amazon.com/exec/obidos/tg/detail/-/0131468626/ref=cm_ea_pl_pr....

To learn more about MapR Technologies and their market leading Hadoop distribution, see http://www.mapr.com.

PhillyDB will meet on the 16th floor of the MSB. Note that the meeting may be listed at the front desk under the name 'Technically Philly Groups'.

Schedule:

6:00 - 6:30 - Food, networking
6:30 - 8:00 - Presentation
8:15 - ??? - Socializing/networking @ Nodding Head Brewpub

March 19 · 6:00 PM - Municipal Services Building

Register at the PhillyDB site

PhillyDB: Business Discovery Using QlikView

Submitted by admin on Thu, 02/14/2013 - 11:27

Business Discovery is user-driven business intelligence that helps people make decisions based on multiple sources of insight: data, people, and the environment. Users can create and share knowledge and analysis in groups and across organizations.

Locally headquartered (Radnor) QlikTech pioneered Business Discovery with its flagship software QlikView.

This presentation will help you understand the capabilities of QlikView: what makes it unique, the Associative Experience, and the underlying architecture that has kept QlikView on Gartner’s Magic Quadrant for each of the last three years.

The presentation will include a hands-on demonstration of:

- the Desktop Client for application authoring
- the QlikView web AccessPoint for selecting and using QlikView documents
- application architecture (deploying for enterprise scalability, integrating with authentication services...)

Tony Strano has sixteen years of experience designing and developing data warehouse databases and business intelligence reporting for companies such as Oracle, Towers Watson, GSI Commerce, and Northrop Grumman. He has delivered solutions using Hyperion tools (such as Essbase), Oracle and SQL Server databases, IBM WebSphere DataStage, and Brio reporting on both the Windows and UNIX platforms. Tony is a Technical Architect for QlikView in the Performance and Scalability group at QlikView Consulting Services.

Schedule:

6:00 - pizza/socializing/networking
6:30 - presentation
8:00 - beverages at Nodding Head Brewpub

February 19 · 6:00 PM - Municipal Services Building

Register at the PhillyDB site

PhillyDB: Beyond Batch

Submitted by admin on Sun, 01/06/2013 - 13:14

The venerable MapReduce framework has allowed Hadoop to prove its worth in the big data space, and to store and analyze much larger data sets than was possible before. But there is a lot of activity in the big data ecosystem currently surrounding other major categories of workflows beyond batch.

These emerging tools include low latency i/o (HBase), interactive queries (Drill), stream processing (Storm), and text processing / indexing (Solr). This talk discusses some of the more interesting developments in Drill and Storm, their capabilities, and how they are being put to use in real world situations.

Brad Anderson has been wrangling data for 20 years, first with enterprise data warehouses and more recently building and using non-relational big data tools. Previously, he worked on a large-scale video-on-demand platform in Erlang, helped Cloudant build its hosted NoSQL offering based on CouchDB, and organized the NoSQL East 2009 conference in Atlanta. He has recently worked with Cascading, Storm and Neo4j, and has contributed code to the HBase open-source project. Brad has founded or co-founded four technology companies, and his first company operates today as Mirus Restaurant Solutions.

(Note that Brad's original Apache Drill talk had to be postponed due to weather. We're excited about this new, expanded version!)

January 15 · 6:00 PM - Municipal Services Building

Register at the PhillyDB site

PhillyDB: Postgrespalooza

Submitted by admin on Fri, 11/16/2012 - 00:14

Hey folks, it's been a busy month for the organizers of the PhillyDB and Philly PostgreSQL groups. So we're joining forces for our November meeting. Wonder Twin powers, activate! Form of: Postgrespalooza!

John Ashmead will talk about PostgreSQL & PostGIS and Jim Mlodgenski will talk about PostgreSQL Foreign Data Wrappers.

Using PostgreSQL Foreign Data Wrappers

As more and more alternative data stores come into use, it becomes increasingly difficult to use and report on the data scattered across them. PostgreSQL has a feature called Foreign Data Wrappers that allows external data sources to be queried from PostgreSQL and look like a standard table. Using Foreign Data Wrappers, users can create a report that joins data residing in MySQL, CouchDB and MongoDB all in a single query.

In this talk, we'll discuss how to set up a Foreign Data Wrapper for various data sources, the pros and cons of using them, and, time permitting, a little about how to write one.

How to Install PostgreSQL & PostGIS on an Intel Mac

And why you might want to do this. There are at least five different ways to do the install: from the raw source, using Stack Builder, using a prebuilt distribution, using macports, & using fink. Each approach has its pluses & minuses. I've just sweated thru the installs & learned a lot about the pluses and the minuses of each approach. I'll describe what the issues are & which approach might be best in particular cases. A fair amount of what I learned applies not only to the Mac and Postgres but to Unixoid machines and open source databases in general. I'll try to turn my pain into your gain.

Schedule:

6:00 - pizza and socializing
6:30 - presentations
8:00 - beverages at Nodding Head Brewpub

November 20, 2012 · 6:00 PM - Municipal Services Building

Register at the PhillyDB site

PhillyDB: MapR Hadoop

Submitted by admin on Thu, 10/04/2012 - 12:48

MapR provides a unique Hadoop distribution that addresses enterprise requirements for dependability, ease of use, and integration. MapR can be used on standalone clusters, as well as being an option on AWS EMR and Google Compute Engine.

In this presentation we will briefly introduce Hadoop, then discuss MapR's unique reliability and manageability features. We'll conclude with a discussion of MapR's support for industry standard interfaces, including NFS. The NFS interface enables users to leverage standard file-based applications, and makes it easier to get data into and out of the cluster. This talk covers the motivation for supporting industry-standard interfaces as well as several real-world use cases. In addition, this talk explains the technical details behind these capabilities and how they actually work.

Keys Botzum is a Senior Principal Technologist with MapR Technologies. He has over 15 years of experience in large-scale distributed system design. Mr. Botzum has worked with a variety of distributed technologies, including Sun RPC, DCE, CORBA, Java EE, AFS, and DFS. Recently, he has been focusing on Hadoop and related technologies. Previously he was a Senior Technical Staff Member with IBM and a respected author of many articles on WebSphere Application Server as well as a book. He holds a Master's degree in Computer Science from Stanford University and a B.S. in Applied Mathematics/Computer Science from Carnegie Mellon University.

Mr. Botzum has published numerous papers on WebSphere and WebSphere security on IBM DeveloperWorks WebSphere. He is also an author of the book IBM WebSphere: Deployment and Advanced Configuration

http://www.amazon.com/exec/obidos/tg/detail/-/0131468626/ref=cm_ea_pl_pr....

To learn more about MapR Technologies and their market leading Hadoop distribution, see http://www.mapr.com.

Schedule:

6:00 - 6:30 - Food, networking
6:30 - 8:00 - Presentation

9/18/2012

Our September meetup will feature two presentations.

First up is Jim Mlodgenski from StormDB:

Geo-spatial Queries with StormDB

With the global explosion of mobile devices, location-aware applications are now commonplace, but their potential has been limited by the ability to quickly process Geo-spatial data. StormDB is a database service built on the open source project Postgres-XC. Postgres-XC is a horizontally scalable clustering technology based on PostgreSQL that allows for read and write scalability while remaining transparent to your application. Data is automatically sharded across a number of different data nodes while still maintaining full ACID compliance and consistency.
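For readers new to sharding, the sketch below shows one common way rows can be spread across data nodes: hashing a distribution key. It is only an illustration of the general idea and makes no claim about how StormDB or Postgres-XC place data internally. The node names, the `device_id` key, and both helper functions are hypothetical.

```python
import hashlib

# Hypothetical node names; a real cluster would be configured elsewhere.
DATA_NODES = ["node_a", "node_b", "node_c"]


def node_for_key(key, nodes=DATA_NODES):
    """Pick a data node from a stable hash of the row's distribution key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return nodes[int.from_bytes(digest[:8], "big") % len(nodes)]


def shard_rows(rows, key_field):
    """Group rows by the node that would store them."""
    placement = {node: [] for node in DATA_NODES}
    for row in rows:
        placement[node_for_key(row[key_field])].append(row)
    return placement


# Ten made-up location rows keyed by a hypothetical device_id.
rows = [{"device_id": i, "lat": 39.95, "lon": -75.16} for i in range(10)]
placement = shard_rows(rows, "device_id")
for node, stored in placement.items():
    print(node, len(stored))
```

Because the hash is deterministic, the same key always routes to the same node, so a lookup by key only needs to touch one node.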

Learn how StormDB can provide the ability to:

- Process large amounts of Geo-spatial data quickly
- Horizontally scale data without giving up consistency or availability
- Create a production ready database in the Cloud
- Free developers and administrators from worrying about the data layer

Jim Mlodgenski is CEO and Co-Founder at StormDB. He is a technology leader with extensive experience in developing enterprise class technology solutions utilizing open source software. In addition to his role at StormDB, Jim is also an avid advocate of PostgreSQL as one of the organizers of the New York City and Philly PostgreSQL User Groups. Prior to StormDB, Jim was Founder of Cirrus Technologies, a professional services company focused on helping move database centric applications to the Cloud. Before that, Jim was Chief Architect at EnterpriseDB.

Our second presentation features Bradley Anderson from MapR:

An Introduction to Apache Drill

Wikipedia's entry on Big Data begins:

“In information technology, big data is a loosely-defined term used to describe data sets so large and complex that they become awkward to work with using on-hand database management tools.”

This isn't a particularly satisfactory answer, and not simply because it is both self-referential and vacuous.

Okay, maybe that is the reason. Regardless, everyone reading this deserves a more informative answer.

To that end, the talk will start off with a mercifully brief operational definition of Big Data. We will then look at a few examples that would appear to fit this definition, and why they are generally recognized as being “big data challenged” when “traditional database technology” is applied.

The bulk of the talk, then, will consist of a description, explanation and cross comparison of the three primary approaches to Big Data, namely:

1) Parallel, shared-nothing (columnar) database clusters
2) Distributed key/value stores (NoSQL)
3) Map/Reduce (Hadoop) implementations
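
As a taste of the third approach, here is a minimal single-process sketch of the map/reduce pattern using a word count. It is plain Python rather than Hadoop, and the function names and sample documents are invented for illustration. A real framework would run the map and reduce phases in parallel across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data is data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(word_counts)
```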

Questions will be welcome at any time during the talk. In fact, if you have any burning ones up front, you are welcome to submit them beforehand and, time permitting, they will be addressed during the course of the talk.

Admittedly this is an ambitious agenda, but since the talk will be completely devoid of Powerpoint animations and sound effects, that extra time can be put to effective use.

Howie Rosenshine has spent more years in “the industry” than he cares to admit. He also has a Masters degree in Computer Science from Penn, where his interests were functional programming and databases. (This he is perfectly happy to admit.)

His formative years included working with device control and assembly language and Logic Programming/Knowledge Representation (Prolog).
