
Rockfish helps companies leverage synthetic data


For years, Vyas Sekar would call Muckai Girish, an old friend from high school, to talk about potential startup ideas and get Girish’s opinion. The two would usually talk through an idea and end the conversation on it. When Sekar called Girish with an idea involving synthetic data in early 2022, the conversation didn’t just end when they hung up the phone.

Sekar and his colleague at Carnegie Mellon University, Giulia Fanti, have been working on building synthetic data to solve the crisis of reproducibility, or inability to reproduce data, in academia. While Sekar saw the need for a solution in academia, Girish knew that his clients at the time were facing the same problem. After talking with a few companies, the thesis was further validated.

“At the time, I felt this was very real and there was an opportunity,” CEO Girish told TechCrunch. “So that’s what started us and in the next two months we talked to some investors, people we knew, and more importantly companies and realized that this was a significant problem and worth putting, you know, a whole life behind to this.”

The result is Rockfish, a startup that uses generative AI to create synthetic data for operational workflows to help companies break down their data silos. Rockfish integrates with database providers including AWS and Azure, among others, and helps users choose the best configuration for their data based on company policies or uses for the data.

Synthetic data is becoming an increasingly hot topic in the AI world, but there was already growing momentum for it when the company launched in June 2022. Girish said Rockfish wanted to make sure it built a product that was differentiated from its peers, and a solution that companies would use every day, not just every now and then.

That’s why the company’s product is designed to constantly ingest data and is focused on operational data, which includes data on things like financial transactions, cybersecurity and supply chains. These areas are constantly producing data for businesses and are also constantly changing. Girish believes that focus here helps Rockfish stand out from other competitors.

The company now works with a handful of enterprise customers, Girish said, including streaming analytics platform Conviva, in addition to government departments including the US Army and the US Department of Defense.

Rockfish announces a $4 million seed round led by Emergent Ventures with participation from Foster Ventures, TEN13, and Dallas VC, among others. This brings the company’s total funding to around $6 million.

Anupam Rastogi, a managing partner at Emergent Ventures, told TechCrunch that he had been tracking Sekar long before Rockfish was founded. He said that what prompted the firm to invest was “team, market and product, in that order.” Additionally, Rockfish’s focus on building for businesses made it a better fit for Emergent than some of the other players in the space.

“The team is high-quality data scientists, multiple PhDs,” said Rastogi. “This is an area that we think is very technically sophisticated and having that technical force around the table is really critical. They have done a lot of fundamental work in the space, not only in the company, but the whole industry.”

While Rockfish hopes its focus will help give it a moat among competitors, it doesn’t change the fact that synthetic data will likely be an increasingly crowded market. AI companies are turning to synthetic data as many players think the market has exhausted other AI training data.

There are already numerous startups trying to tackle the market, including Tonic AI, which has raised more than $45 million in venture funding; Mostly AI, which has raised $31 million in VC funding; and Hazy, which raised $14.5 million before being acquired by SAS in 2024, just to name a few.

Girish said the company is looking to add to its approach to synthetic data by incorporating other types of models such as state space models, mathematical models that use state variables. The company is also looking to improve its end-to-end functions.

“It’s not like you’re taking random data from the internet and generating synthetic data,” Girish said. “There is no guarantee that it will do well. But if you put all this together for business, it is actually very relevant and realistic. So that is the key to it, and then being able to do this on a constant basis is what we find useful.”


