Roberto Congiu's blog

Streaming Data to HTTP using Akka Streams with Exponential Backoff on 429 Too Many Requests

March 12, 2019
by rcongiu

HTTP/REST is probably the most used protocol to exchange data between different services, especially in today's microservice world...

Tagged: akka, akka-streams, big data, bigdata, scala

Posted in: programming

Basic Authorization and htaccess style authentication on the Play! Framework and Silhouette

August 18, 2018
by Roberto Congiu

Silhouette is probably the best library to implement authentication and authorization within the Play Framework (Git repo: https://github.com/rcongiu/play-silhouette-basic-auth). It is very powerful, as you can manage a common identity from multiple providers, so you can have users logging into your site from Google, Facebook, JWT, and many other methods. It also allows you to fine […]

Posted in: programming

Custom Window Function in Spark to create Session IDs

October 29, 2017
by Roberto Congiu

(Note: crossposted from my Nuvolatech Blog.) If you’ve worked with Spark, you have probably written some custom UDFs or UDAFs. UDFs are ‘User Defined Functions’ that let you introduce complex logic in your queries/jobs, for instance to calculate a digest for a string, or to use a Java/Scala library in your queries.

Posted in: programming

Creating Nested data (Parquet) in Spark SQL/Hive from non-nested data

April 4, 2015
by Roberto Congiu

Sometimes you need to create denormalized data from normalized data, for instance if you have data that looks like CREATE TABLE flat ( propertyId string, propertyName string, roomname1 string, roomsize1 string, roomname2 string, roomsize2 int, .. ) but you want something like CREATE TABLE nested ( propertyId string, propertyName string, rooms array<struct<roomname:string,roomsize:int>> ) […]
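To make the flat-to-nested reshaping concrete, here is a minimal sketch in plain Python (not Spark SQL or Hive, and not the code from the post). The column names follow the excerpt; the helper name `nest_rooms`, the `max_rooms` parameter, and the sample values are assumptions made only for illustration.

```python
def nest_rooms(flat_row, max_rooms=2):
    """Collect roomnameN/roomsizeN column pairs into a list of room structs.

    Hypothetical helper: it only illustrates the shape of the transformation.
    """
    rooms = []
    for i in range(1, max_rooms + 1):
        name = flat_row.get(f"roomname{i}")
        size = flat_row.get(f"roomsize{i}")
        if name is None:
            continue  # skip room slots that are not filled in
        rooms.append({
            "roomname": name,
            # sizes may arrive as strings in the flat table, so normalize to int
            "roomsize": int(size) if size is not None else None,
        })
    return {
        "propertyId": flat_row["propertyId"],
        "propertyName": flat_row["propertyName"],
        "rooms": rooms,
    }


# Made-up sample row with the flat layout from the excerpt.
flat = {
    "propertyId": "p1",
    "propertyName": "Beach House",
    "roomname1": "kitchen",
    "roomsize1": "12",
    "roomname2": "bedroom",
    "roomsize2": 20,
}
nested = nest_rooms(flat)
print(nested)
```

Each numbered column pair becomes one element of the `rooms` array, which is the same shape as the `array<struct<roomname:string,roomsize:int>>` column in the nested table.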

Tagged: big data, parquet, programming, scala, spark

Posted in: programming

Panna Cotta, my recipe.

January 10, 2015
by Roberto Congiu

Panna cotta is one of my favorite desserts and one you can enjoy at many Italian restaurants here in LA. It looks and sounds fancy, but it’s incredibly easy to make if you just get the right ingredients, in particular the gelatin. It is also very important to get very fresh ingredients, since it’s basically […]

Posted in: cooking

Structured data in Hive: a generic UDF to sort arrays of structs

September 17, 2013
by Roberto Congiu

Introduction: Hive has a rich and complex data model that supports maps, arrays and structs, which can be mixed and matched, leading to arbitrarily nested structures, like in JSON. I wrote about a JSON SerDe in another post and if you use it, you know it can lead to pretty complicated nested tables. Unfortunately, Hive […]

Posted in: programming

A JSON read/write SerDe for Hive

July 11, 2011
by Roberto Congiu

Today I finished coding another SerDe for Hive which, with my employer’s permission, I published on GitHub here: https://github.com/rcongiu/Hive-JSON-Serde.git. Since the code is still fresh in my mind, I thought I’d write another article on how to write a SerDe, since the official documentation on how to do it is scarce and you’d have to […]

Tagged: hive, java

Posted in: programming

Writing a Hive SerDe for LWES event files

October 27, 2009
by Roberto Congiu

I am currently working to set up an OLAP data warehouse using Hive on top of Hadoop. We have a considerable amount of data that comes from the ad servers on which we need to perform various kinds of analysis. Writing a map-reduce job is not difficult in principle – it’s just time consuming and […]

Tagged: hive, java

Posted in: programming

Data Warehousing Books

October 27, 2009
by Roberto Congiu

With the constant increase in the quantity of data that companies collect and need to process, Data Warehousing is a job sector that’s expanding even in the recession. It is also living a second youth, thanks to a number of open source projects that have been slowly but surely gaining popularity in a manner similar […]

Tagged: java

Posted in: programming

Joins in Hadoop using CompositeInputFormat

June 7, 2009
by Roberto Congiu

One of the first questions that a ‘traditional’ ETL engineer asks when learning Hadoop is, “How do I do a join?” For instance, how can we do in Hadoop something like querying for the names of all employees who are in a California city: SELECT e.name, c.name from employees e INNER JOIN cities c […]
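As a rough illustration of what such a join has to compute, here is a small in-memory hash join in plain Python. It is not Hadoop code and does not use CompositeInputFormat; the join key `city_id`, the `state` field, and the sample records are assumptions, since the excerpt's query is cut off before its join condition.

```python
def employees_in_state(employees, cities, state="CA"):
    """Inner-join employees to cities and keep only cities in the given state.

    Hypothetical helper: field names other than `name` are assumed.
    """
    # Build side: index the matching cities by their key.
    cities_by_id = {c["city_id"]: c for c in cities if c["state"] == state}
    # Probe side: keep each employee whose city key is in the index.
    return [
        (e["name"], cities_by_id[e["city_id"]]["name"])
        for e in employees
        if e["city_id"] in cities_by_id
    ]


# Made-up sample data.
cities = [
    {"city_id": 1, "name": "Los Angeles", "state": "CA"},
    {"city_id": 2, "name": "Seattle", "state": "WA"},
]
employees = [
    {"name": "Ann", "city_id": 1},
    {"name": "Bob", "city_id": 2},
    {"name": "Cy", "city_id": 1},
]
result = employees_in_state(employees, cities)
print(result)
```

This only works when one side fits in memory. Map-side joins over sorted, identically partitioned inputs, which is what the post's title refers to, avoid that limitation but need the inputs prepared accordingly.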

Tagged: hadoop, java

Posted in: programming
