Linq to Objective-C

In the past few months I have been immersing myself into the world of iOS development. Whilst the Objective-C language takes a little getting used to, with its odd syntax and memory management rules, it is certainly a fun platform to develop for. I do miss a few C# language features, such as events, but for the most part I am a happy iOS developer.

... that is apart from Linq!

tree

A few weeks back I was writing some examples for a future release of the ShinobiControls Grid. The example in question required a datasource of people grouped by the first letter of their surname. Despite the use of predicates and key valued sorting, the code looked like this:

- (NSArray*) createGroupedTestData
{
    NSMutableArray* groupedData = [[NSMutableArray alloc] init];

    // create some people
    NSArray* people = [PersonDataSource generatePeople:50];

    // create a set of letters - based on the first letter of the surname
    NSMutableSet* groups = [[NSMutableSet alloc] init];
    for(PersonDataObject* person in people)
    {
        [groups addObject:[person.surname substringToIndex:1]];
    }

    // create the groups
    for(NSString* letter in groups)
    {
        PersonGroup* group = [[PersonGroup alloc] init];
        group.letter = letter;

        // locate the person objects that belong in this group
        NSPredicate* predicate = [NSPredicate predicateWithBlock:^BOOL(id evaluatedObject, NSDictionary *bindings) {
            PersonDataObject* person = (PersonDataObject*)evaluatedObject;
            return [[person.surname substringToIndex:1] isEqualToString:letter];
        }];
        group.items = [people filteredArrayUsingPredicate:predicate];

        [groupedData addObject:group];
    }

    // sort the groups
    NSSortDescriptor* sortDescriptor = [[NSSortDescriptor alloc] initWithKey:@"letter" ascending:YES];
    NSArray* sortedGroupedData = [groupedData sortedArrayUsingDescriptors:@[sortDescriptor]];

    return sortedGroupedData;
}

That's a pretty unwieldy lump in my opinion. With Linq I am used to performing this sort of logic with a single query!

So, I decided to go ahead and implement a bunch of Linq methods for Objective-C, which results in a much cleaner implementation:

- (NSArray*) createGroupedTestData
{

    // create some people
    NSArray* people = [PersonDataSource generatePeople:50];

    Selector firstLetterOfSurname = ^id(id person){
        return [[person surname] substringToIndex:1];
    };

    Selector groupForLetter = ^id(id letter) {
        PersonGroup* group = [[PersonGroup alloc] init];
        group.letter = letter;
        group.items = [people where:^BOOL(id person) {
            return [[[person surname] substringToIndex:1] isEqualToString:letter];
        }];
        return group;
    };

    return [[[[people   select:firstLetterOfSurname]
                        sort]
                        distinct]
                        select:groupForLetter];
}

Not only is the above code more concise, which is nearly always a good thing (unless you are talking regex-concise!), it is also much more readable. The 'select, distinct, sort, select' query makes it much easier to determine what the resulting output of this code will be.

In this blog post I'll explain what Linq is (for those of you who have not heard of it before), and how I implemented Linq-to-Objective-C. If you just want to use the code, you can grab a copy from github.

An introduction to Linq

If you are a seasoned .NET developer, you can skip this section, however, if you have not heard of Linq before, then read on ...

Linq was added to the C# language back in 2007. The name 'linq' stands for Language Integrated Query, and as the name implies, its original aim was to more tightly integrate queries into the C# language. As an example, when querying database developers find themselves writing queries using SQL statements embedded within literal strings:

string query = "SELECT * FROM Person WHERE age < 10"

The problem with the above code is that the SQL statements are not checked by the compiler, and as a result, errors will only surface at runtime. This is not a problem that is peculiar to C# - you will find SQL statements embedded within strings in virtually every language.

With Linq queries are no longer literal strings, they are instead constructed using keywords that are part of the C# language itself:

var query =  from p in PersonCollection
               where p.age < 10
               select p;

This allows the compiler to check your query syntax, resulting in less error prone code.

C# provides a couple of different syntaxes for Linq, the query syntax, shown above - and the fluent syntax, where method invocations are chained together.

The two are functionally equivalent:

var query =  from p in PersonCollection
               where p.age < 10
               select p;

var query = PersonCollection.Where(p => p.age < 10);

Whilst the application of Linq for querying database is obvious, Linq can be used to query practically anything. The .NET framework supplies Linq to SQL, XML and Objects - while the community has added Linq to CSV, JSON, DataSets, Facebook, and my own contribution Linq-to-VisualTree.

I've found myself using Linq in virtually every .NET application I have ever written - and I miss it!

Fortunately the ~25 different Linq extensions methods are actually quite easy to implement, as the C# guru Jon Skeet demonstrated in an epic series of blog posts. So, why not bring this API to Objective-C?

Linq to Objective-C

In this section I'll rattle through the implementation of a number of Linq methods, for the full set, head to github.

Where

The Linq 'where' method filters an array of objects based on a predicate that returns true for any object that should be included in the output array. Using a block for the predicate, gives the following method signature:

typedef BOOL (^Predicate)(id);

/** Filters a sequence of values based on a predicate.

 @param The function to test each source element for a condition.
 @return The elements from the input sequence that satisfy the condition.
 */
- (NSArray*) where:(Predicate)predicate;

Before diving straight into the implementation, I'll demonstrate its usage via a unit test. Given the following array of Person objects:

- (NSArray*) createTestData
{
    return @[[Person personWithName:@"bob" age:@25],
    [Person personWithName:@"frank" age:@45],
    [Person personWithName:@"ian" age:@35],
    [Person personWithName:@"jim" age:@25],
    [Person personWithName:@"joe" age:@55]];
}

Where Person is a simple object with name and age properties. The test I created for the where method is as follows:

- (void)testWhere
{
    NSArray* input = [self createTestData];

    NSArray* peopleWhoAre25 = [input where:^BOOL(id person) {
        return [[person age] isEqualToNumber:@25];
    }];

    STAssertEquals(peopleWhoAre25.count, 2U, @"There should have been 2 items returned");
    STAssertEquals([peopleWhoAre25[0] name], @"bob", @"Bob is 25!");
    STAssertEquals([peopleWhoAre25[1] name], @"jim", @"Jim is 25!");
}

Which simply tests that the simple where query successfully returns all Person instances with an age of 25.

The implementation of where is really quite trivial:

- (NSArray *)where:(Predicate)predicate
{
    NSMutableArray* result = [[NSMutableArray alloc] init];
    for(id item in self) {
       if (predicate(item)) {
           [result addObject:item];
       }
    }
    return result;
}

We're off to a good start!

So, how does this compare to the existing Objective-C APIs? Through the use of predicates it is quite easy to perform the same 'people who are 25' query. You can use block syntax:

NSPredicate* predicate = [NSPredicate predicateWithBlock:^BOOL(id person, NSDictionary *bindings) {
    return [[person age] isEqualToNumber:@25];
}];
NSArray* peopleWhoAre25 = [input filteredArrayUsingPredicate:predicate];

Or the more concise string-based predicates:

NSArray* peopleWhoAre25 = [input filteredArrayUsingPredicate:[NSPredicate predicateWithFormat:@"age=25"]];

With the Objective-C support for predicates, this Linq-style where method might not be the most useful, but the others are more interesting.

Select

The Linq 'select' method allows you to perform transformation, which is known as a projection. Each element is acted upon by a selector function, with the returned result being used to populate the output sequence. Here's the method signature:

typedef id (^Selector)(id);

/** Projects each element of a sequence into a new form.

 @param selector Projects each element of a sequence into a new form.
 @return An array whose elements are the result of invoking the transform function on each element of source.
 */
- (NSArray*) select:(Selector)selector;

Again, I'll demonstrate the using via a unit test. This example uses the select method to extract the name property for each person objects in the source array:

- (void)testSelect
{
    NSArray* input = [self createTestData];

    NSArray* names = [input select:^id(id person) {
        return [person name];
    }];

    STAssertEquals(names.count, 5U, nil);
    // 'spot' check a few values
    STAssertEquals(names[0], @"bob", nil);
    STAssertEquals(names[4], @"joe", nil);
}

Again, the implementation is really quite simple:

- (NSArray *)select:(Selector)selector
{
    NSMutableArray* result = [[NSMutableArray alloc] init];
    for(id item in self) {
        [result addObject:selector(item)];
    }
    return result;
}

Sort

Linq doesn't actually have a sort method, instead it provides OrderBy and ThenBy methods. Because I don't want to get into multi-property sorting, I'll substitute both of these for a single 'sort' function.

/** Sorts the elements of a sequence in ascending order.

@return An array whose elements are sorted in ascending order. */- (NSArray*) sort;

/** Sorts the elements of a sequence in ascending order by using a specified keySelector.

@param keySelector A selector that provides the 'key' which the array should by sorted by. @return An array whose elements are sorted in ascending order. */- (NSArray*) sort:(Selector)keySelector;

The first method simply sorts the array by invoking the compare method on each object. The second uses a selector to extract the key that is then used for the comparison.

The unit tests are as follows:

- (void)testSort
{
    NSArray* input = @[@21, @34, @25];

    NSArray* sortedInput = [input sort];

    STAssertEquals(sortedInput.count, 3U, nil);
    STAssertEqualObjects(sortedInput[0], @21, nil);
    STAssertEqualObjects(sortedInput[1], @25, nil);
    STAssertEqualObjects(sortedInput[2], @34, nil);
}

- (void)testSortWithKeySelector
{
    NSArray* input = [self createTestData];

    NSArray* sortedByName = [input sort:^id(id person) {
        return [person name];
    }];

    STAssertEquals(sortedByName.count, 5U, nil);
    STAssertEquals([sortedByName[0] name], @"bob", nil);
    STAssertEquals([sortedByName[1] name], @"frank", nil);
    STAssertEquals([sortedByName[2] name], @"ian", nil);
    STAssertEquals([sortedByName[3] name], @"jim", nil);
    STAssertEquals([sortedByName[4] name], @"joe", nil);
}

As you can see, one simply sorts an array of NSNumber instances, whereas the other sorts an array of People instances via the name property using the key selector.

The implementation is as follows:

- (NSArray *)sort:(Selector)keySelector
{
    return [self sortedArrayUsingComparator:^NSComparisonResult(id obj1, id obj2) {
        id valueOne = keySelector(obj1);
        id valueTwo = keySelector(obj2);
        NSComparisonResult result = [valueOne compare:valueTwo];
        return result;
    }];
}

- (NSArray *)sort
{
    return [self sort:^id(id item) { return item;} ];
}

The sort-with-keySelector method makes use of the NSArray sort method that sorts via the result returned by a block. The 'natural' sort simply uses an identity selector in order to re-use the same implementation.

OfType

The Linq of-type method is use to filter a list of objects to remove all of those which are not of a specific type. Here's the method signature:

/** Filters the elements of an an array based on a specified type.

 @param type The type to filter the elements of the sequence on.
 @return An array whose elements are all of the given type.
 */
- (NSArray*) ofType:(Class)type;

And here is the associated test:

- (void)testOfType
{
    NSArray* mixed = @[@"foo", @25, @"bar", @33];

    NSArray* strings = [mixed ofType:[NSString class]];

    STAssertEquals(strings.count, 2U, nil);
    STAssertEqualObjects(strings[0], @"foo", nil);
    STAssertEqualObjects(strings[1], @"bar", nil);
}

Which filters out the NSString instances from a mixed array.

The implementation is quite trivial, making use of the where method which was detailed above:

- (NSArray *)ofType:(Class)type
{
    return [self where:^BOOL(id item) {
        return [[item class] isSubclassOfClass:type];
    }];
}

The ofType method is quite useful for tasks such as finding all the subviews of a UIView that are of a specific type.

Distinct

The distinct method returns the distinct elements from an array, in other words, it removes duplicates based on an equality comparison. The signature is as follows:

/** Returns distinct elements from a sequence.

 @return An array of distinct elements.
 */
- (NSArray*) distinct;

The unit test is as follows:

- (void)testDistinct
{
    NSArray* names = @[@"bill", @"bob", @"bob", @"brian", @"bob"];

    NSArray* distinctNames = [names distinct];

    STAssertEquals(distinctNames.count, 3U, nil);
    STAssertEqualObjects(distinctNames[0], @"bill", nil);
    STAssertEqualObjects(distinctNames[1], @"brian", nil);
    STAssertEqualObjects(distinctNames[2], @"bob", nil);
}

The implementation uses an NSMutableArray to construct an array of unique objects:

- (NSArray *)distinct
{
    NSMutableArray* distinctSet = [[NSMutableArray alloc] init];
    for (id item in self) {
        if (![distinctSet containsObject:item]) {
            [distinctSet addObject:item];
        }
    }
    return distinctSet;
}

Note that the use of containsObject on an NSMutableArray is a potential performance issue for large arrays of data. A better method might make use of an NSSet which uses the object's hash in order to more quickly determine whether an item is already contained within a set.

Select Many

The Linq select-many method is projection where the selector returns an array of objects. The select-many function flattens the returned arrays into a single array. For a good visual explanation of select-many check out this blog post.

Here is the method signature:

/** Projects each element of a sequence to an NSArray and flattens the resulting sequences into one sequence.

 @param transform A transform function to apply to each element, this should return an NSArray.
 @return An array whose elements are the result of invoking the one-to-many transform function on each element of the input sequence.
 */
- (NSArray*) selectMany:(Selector)transform;

The unit test takes an array of string, projecting each element to an array via the componentsSeparatedByString method:

- (void)testSelectMany
{
    NSArray* data = @[@"foo, bar", @"fubar"];

    NSArray* components = [data selectMany:^id(id string) {
        return [string componentsSeparatedByString:@", "];
    }];

    STAssertEquals(components.count, 3U, nil);
    STAssertEqualObjects(components[0], @"foo", nil);
    STAssertEqualObjects(components[1], @"bar", nil);
    STAssertEqualObjects(components[2], @"fubar", nil);
}

A more realistic example might take an array of orders, projecting each order to the order-items that it contains, with the result being all of the order-items over all orders ... this could of course be followed by a 'distinct' query which would find all the distinct (i.e. unique) items ordered for a collection of orders.

The implementation involves a couple of nested for-in loops:

- (NSArray *)selectMany:(Selector)transform
{
    NSMutableArray* result = [[NSMutableArray alloc] init];
    for(id item in self) {
        for(id child in transform(item)){
            [result addObject:child];
        }
    }
    return result;
}

Aggregate

The next Linq method, aggregate, is an interesting one because it doesn't return an array, instead it applies an accumulator function to the elements of an array, returning a single result. The signature is as follows:

/** Applies an accumulator function over a sequence.

 @param accumulator An accumulator function to be invoked on each element.
 @return The final accumulator value.
 */
- (id) aggregate:(Accumulator)accumulator;

The unit test uses this method to convert an array of strings into a CSV:

- (void)testAggregate
{
    NSArray* names = @[@"bill", @"bob", @"brian"];

    id aggregate = [names aggregate:^id(id item, id aggregate) {
        return [NSString stringWithFormat:@"%@, %@", aggregate, item];
    }];

    STAssertEqualObjects(aggregate, @"bill, bob, brian", nil);
}

Although you can use aggregate to do all sorts of things, such as find the maximum value for an array of integers.

The implementation is pretty simple, seeding the aggregate with the initial value and using the accumulator for subsequent items:

- (id)aggregate:(Accumulator)accumulator
{
    id aggregate = nil;
    for (id item in self) {
        if (aggregate == nil) {
            aggregate = item;
        } else {
            aggregate = accumulator(item, aggregate);
        }
    }
    return aggregate;
}

A brief note on deferred execution

In C# Linq methods do not act on arrays, instead they act upon the IEnumerable interface. This allows them to query anything that can be enumerated, which not only includes arrays, but can also include infinite sources of data (such as a stream of random numbers). The implementation of Linq relies on the 'yield' keyword, which results in the C# compiler creating a state machine to implement the IEnumerable interface.

The practical implication of all this very clever stuff is that Linq queries are not executed immediately. As an example, when you create a query as follows:

var query = PersonCollection.Where(p => p.Age < 10)
                            .OrderBy(p => p.Name);

Nothing will actually happen until you start to enumerate the 'query' variable. In other words, you have to 'pull' data from the query.

This results in a feature which is often called 'deferred execution', where queries are executed on demand. This can yield tremendous performance benefits, allowing queries to terminate early.

Objective-C does not have 'yield' keyword, making it hard to implement Linq in such a way as to allow deferred execution. I did actually manage to create an implementation of a few query methods where each were presented as a state machine. Also, the Obj-C for-in iterator uses the NSFastEnumeration, which I was able to implement giving deferred execution and early termination. However, I decided to stick with the much simpler NSArray implementation here. If anyone is interested in the more complex version, give me a shout - I'd be happy to share.

Conclusions

Creating a Linq-style API for Objective-C has proven to be relatively simple (and fun!). I have provided implementations for some of the most common and useful Linq methods, however the .NET Linq APIs for querying collections have a great many more methods than the ones I have covered.

If you are interested, why not pop over to github, fork and add a few more? I'll probably add to this project myself over time.

Regards, Colin E.

Scott Logic / Altogether Smarter

Linq to Objective-C

An introduction to Linq

Linq to Objective-C

Where

Select

Sort

OfType

Distinct

Select Many

Aggregate

A brief note on deferred execution

Conclusions

Read more

Want to receive more insights?

Linq to Objective-C

An introduction to Linq

Linq to Objective-C

Where

Select

Sort

OfType

Distinct

Select Many

Aggregate

A brief note on deferred execution

Conclusions

Read more

Want to receive more insights?

Categories